Benchmark

Datasets and results are described at http://shenwei356.github.io/seqkit/benchmark

The benchmark must be run on a Linux-like operating system.

Install software

Software

  1. seqkit. (Go). Version v0.3.1.1.
  2. fasta_utilities. (Perl). Version 3dcc0bc. Requires many dependencies to be installed.
  3. fastx_toolkit. (Perl). Version 0.0.13. Cannot handle multi-line FASTA files.
  4. seqmagick. (Python). Version 0.6.1.
  5. seqtk. (C). Version 1.1-r92-dirty.

A Python script, memusg, was used to measure the running time and peak memory usage of each process.
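For readers who want to spot-check a single command without memusg, the same two quantities can be sketched with GNU time. This is an alternative under stated assumptions, not the script used for the published results: it assumes GNU time is installed at /usr/bin/time, and `measure` is a hypothetical helper name.

```shell
# Sketch only: record elapsed time and peak memory of one command with GNU time.
# "measure" is a hypothetical helper; the benchmark itself uses memusg.
# Assumes GNU time at /usr/bin/time (the shell's built-in "time" lacks -f and -o).
measure() {
    out=$1; shift
    # %e = elapsed wall-clock seconds, %M = maximum resident set size in KB.
    # The result is written to "$out" as one tab-separated line.
    /usr/bin/time -f '%e\t%M' -o "$out" "$@"
}

# Assumed usage:
# measure sleep.time.tsv sleep 1
# cat sleep.time.tsv
```

The numbers from GNU time and memusg are collected differently, so they should be treated as comparable only in rough terms.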

Attention: fasta_utilities uses the Perl module Term-ProgressBar, which causes it to fail when run from the benchmark script run_benchmark_00_all.pl. To work around this, edit the source code of ProgressBar.pm (on my system, the path is /usr/share/perl5/vendor_perl/Term/ProgressBar.pm) and add the line below after line 535:

$config{bar_width} = 1 if $config{bar_width} < 1;

The edited code is:

} else {
  $config{bar_width}  = $target;
  $config{bar_width} = 1 if $config{bar_width} < 1;   # new line
  die "configured bar_width $config{bar_width} < 1"
  if $config{bar_width} < 1;
}
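The manual edit above could also be scripted. The following is a minimal sketch, assuming the file path and line number quoted above apply to your system; both can differ between distributions and Term-ProgressBar versions, so inspect the file before changing it. `insert_after_line` is a hypothetical helper, not part of this repository.

```shell
# Sketch only: insert one line of text after line N of a file, keeping a backup.
# "insert_after_line" is a hypothetical helper, not part of the benchmark scripts.
insert_after_line() {
    file=$1; lineno=$2; text=$3
    cp -p "$file" "$file.bak" || return 1    # keep the original as FILE.bak
    # The text is passed through the environment so awk prints it verbatim.
    TEXT=$text awk -v n="$lineno" '{ print } NR == n { print ENVIRON["TEXT"] }' \
        "$file.bak" > "$file"
}

# Assumed usage (likely needs root; path and line number are taken from the text above):
# insert_after_line /usr/share/perl5/vendor_perl/Term/ProgressBar.pm 535 \
#     '  $config{bar_width} = 1 if $config{bar_width} < 1;'
```

The backup copy makes it easy to restore the original module if the line number turns out not to match.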

Clone this repository

git clone https://github.com/shenwei356/seqkit
cd seqkit/benchmark

Data preparation

Download the datasets listed at http://shenwei356.github.io/seqkit/benchmark/#datasets

Or download the complete test data archive seqkit-benchmark-data.tar.gz (2.2G), uncompress it, and move the files into the directory seqkit/benchmark:

wget http://app.shenwei.me/data/seqkit/seqkit-benchmark-data.tar.gz
tar -zxvf seqkit-benchmark-data.tar.gz
mv seqkit-benchmark-data/* seqkit/benchmark

Run tests

A Perl script, run.pl, is used to run the tests automatically and generate data for plotting.

$ perl run.pl -h
Usage:

1. Run all tests:

perl run.pl run_benchmark*.sh --outfile benchmark.5test.tsv

2. Run one test:

perl run.pl run_benchmark_04_remove_duplicated_seqs_by_name.sh -o benchmark.rmdup.tsv

3. Custom repeate times:

perl run.pl -n 3 run_benchmark_04_remove_duplicated_seqs_by_name.sh -o benchmark.rmdup.tsv

To compare performance between the different software tools, run:

./run.pl run_benchmark*.sh -n 3 -o benchmark.5tests.tsv

To test performance of other functions in seqkit, run:

./run.pl run_test*.sh -n 1 -o benchmark.seqkit.tsv
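If one result file per test is preferred over a single combined file, the single-test form from the usage text can be looped. The sketch below uses only the `-n` and `-o` options shown above; `run_each` and `RUNNER` are hypothetical names, and the output naming scheme is an assumption chosen for illustration.

```shell
# Sketch only: run each benchmark script on its own and write one TSV per test.
# RUNNER and run_each are hypothetical names; by default the documented
# "perl run.pl" is called with the documented -n and -o options.
RUNNER=${RUNNER:-perl run.pl}

run_each() {
    repeats=$1; shift
    for script in "$@"; do
        name=${script##*/}             # drop any directory part
        name=${name%.sh}               # drop the .sh suffix
        name=${name#run_benchmark_}    # drop the common prefix
        # $RUNNER is unquoted on purpose so "perl run.pl" splits into two words.
        $RUNNER -n "$repeats" "$script" -o "benchmark.$name.tsv" || return 1
    done
}

# Assumed usage, from inside seqkit/benchmark:
# run_each 3 run_benchmark*.sh
```

Each resulting file can then be passed to plot.R separately with `-i`.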

Plot results

The R libraries dplyr, ggplot2, scales, ggthemes, and ggrepel are required.

Plot the results of the five tests:

./plot.R -i benchmark.5tests.tsv

Plot the results of the tests of other seqkit functions:

./plot.R -i benchmark.seqkit.tsv --width 5 --height 3

./plot.R -i benchmark.5tests.tsv --width 8 --height 3 --lx 0.75 --ly 0.3