Perl vs Python vs Ruby: Fasta reading using Bio packages

Since all the languages I mentioned in my previous post have Bio packages which can parse fasta files, I did a quick comparison of the performance of the three implementations. Here are the implementations, they are highly similar.



Python Hg19.fa 65.15s user 11.84s system 99% cpu 1:17.00 total
fastaLengths-bio.rb Hg19.fa 56.07s user 14.18s system 99% cpu 1:10.26 total Hg19.fa 46.85s user 13.11s system 99% cpu 59.969 total

This highlights a major implementation deficiency in the perl and ruby bio projects for reading fasta files as the results here are the exact reverse of the simple parsers from my previous post. This performance regression is due to the bioperl SeqIO method attempting to identify the sequence as dna or protein every time next_seq is called, setting the type in the SeqIO constructor brings the perl implementation back in the lead by a fair margin.

Perl 2 Hg19.fa 38.50s user 10.76s system 99% cpu 49.267 total