The index structure has changed since commit 6743183. Rebuild the index if you are using a later commit.
The MC tag was added to the output SAM file in commit a591e22. Output should match that of the original bwa-mem version 0.7.17.
As of commit e0ac59e, the repository includes the git submodule safestringlib. To fetch it, pass --recursive when cloning, or run "git submodule init" and "git submodule update" in an already cloned repository (see below for more details).
# Use precompiled binaries (recommended)
curl -L https://github.com/bwa-mem2/bwa-mem2/releases/download/v2.0pre2/bwa-mem2-2.0pre2_x64-linux.tar.bz2 \
| tar jxf -
bwa-mem2-2.0pre2_x64-linux/bwa-mem2 index ref.fa
bwa-mem2-2.0pre2_x64-linux/bwa-mem2 mem ref.fa read1.fq read2.fq > out.sam
# Compile from source (not recommended for general users)
# Get the source
git clone --recursive https://github.com/bwa-mem2/bwa-mem2
cd bwa-mem2
# Or
git clone https://github.com/bwa-mem2/bwa-mem2
cd bwa-mem2
git submodule init
git submodule update
# Compile and run
make
./bwa-mem2
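If the repository was cloned without --recursive, the build may fail because the safestringlib submodule is empty. The following is a minimal sketch of a pre-build check, not part of bwa-mem2 itself: the helper name has_uninitialized_submodule is hypothetical, and it relies only on the fact that "git submodule status" prefixes uninitialized submodules with "-".

```shell
# Hypothetical helper: reads `git submodule status` output on stdin and
# succeeds if any submodule line starts with '-', i.e. is not initialized.
has_uninitialized_submodule() {
  grep -q '^-'
}

# Usage sketch, run from inside the cloned bwa-mem2 directory:
#   if git submodule status | has_uninitialized_submodule; then
#     git submodule update --init
#   fi
#   make
```

Here "git submodule update --init" combines the separate init and update steps shown above.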
Bwa-mem2 is the next version of the bwa-mem algorithm in bwa. It produces alignments identical to those of bwa and is ~1.3-3.1x faster, depending on the use case, dataset, and machine.
The original bwa was developed by Heng Li (@lh3). Performance enhancement in bwa-mem2 was primarily done by Vasimuddin Md (@yuk12) and Sanchit Misra (@sanchit-misra) from Parallel Computing Lab, Intel. Bwa-mem2 is distributed under the MIT license.
For general users, it is recommended to use the precompiled binaries from the
release page. These binaries were compiled with the Intel compiler and
run faster than gcc-compiled binaries. The precompiled binaries also
indirectly support CPU dispatch: the bwa-mem2 binary automatically chooses
the most efficient implementation based on the SIMD instruction set available
on the running machine. The precompiled binaries were generated on a CentOS6
machine using the following command line:
make CXX=icpc multi
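To see which SIMD instruction sets a Linux machine reports, and therefore which implementation the dispatcher could plausibly pick, something like the sketch below may help. The helper name simd_flags is hypothetical, the flag names are the ones used in Linux /proc/cpuinfo, and the list of flags is an assumption about what is relevant rather than a statement of what bwa-mem2 checks.

```shell
# Hypothetical helper: print the SIMD-related CPU flags found in
# cpuinfo-style text on stdin, one per line, de-duplicated.
# The flag list is an assumption, not taken from the bwa-mem2 source.
simd_flags() {
  grep -o -w -E 'sse4_1|sse4_2|avx|avx2|avx512f|avx512bw' | LC_ALL=C sort -u
}

# Usage on Linux:
#   simd_flags < /proc/cpuinfo
```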
The usage is exactly the same as that of the original BWA MEM tool. Here is a brief synopsis. Run ./bwa-mem2 for available commands.
# Indexing the reference sequence (Requires 28N GB memory where N is the size of the reference sequence).
./bwa-mem2 index [-p prefix] <in.fasta>
Where
<in.fasta> is the path to reference sequence fasta file and
<prefix> is the prefix of the names of the files that store the resultant index. Default is in.fasta.
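As a rough guide to the 28N GB figure above, the sketch below estimates the indexing memory from a size in bytes. It assumes that N is the reference size in GB (taken here as 10^9 bytes), which the text does not state explicitly, and it uses the FASTA file size as a proxy for the sequence size, which slightly overestimates because of headers and newlines. The function name estimate_index_mem_gb is hypothetical.

```shell
# Hypothetical helper: estimate index-build memory as 28 * N GB,
# ASSUMING N is the reference size in GB (1 GB = 10^9 bytes).
estimate_index_mem_gb() {
  awk -v bytes="$1" 'BEGIN { printf "%.1f\n", 28 * bytes / 1e9 }'
}

# Usage: pass the size of the FASTA file in bytes.
#   estimate_index_mem_gb "$(wc -c < ref.fa)"
```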
# Mapping
# Run "./bwa-mem2 mem" to get all options
./bwa-mem2 mem -t <num_threads> <prefix> <reads.fq/fa> > out.sam
Where <prefix> is the prefix specified when creating the index or the path to the reference fasta file in case no prefix was provided.
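The relationship between the index prefix and the mapping command can be sketched as follows. The function print_bwa_mem2_cmds is a hypothetical helper that only prints the two commands instead of running them; it falls back to the FASTA path when no prefix is given, mirroring the default described above.

```shell
# Hypothetical helper: print (do not run) the index and mem commands so that
# both use the same prefix. Arguments: <ref.fa> <reads> <threads> [prefix].
print_bwa_mem2_cmds() {
  ref=$1; reads=$2; threads=$3; prefix=${4:-$ref}
  if [ "$prefix" = "$ref" ]; then
    # No prefix given: the index files are named after the FASTA file.
    echo "./bwa-mem2 index $ref"
  else
    echo "./bwa-mem2 index -p $prefix $ref"
  fi
  echo "./bwa-mem2 mem -t $threads $prefix $reads > out.sam"
}

# Usage:
#   print_bwa_mem2_cmds ref.fa reads.fq 8
#   print_bwa_mem2_cmds ref.fa reads.fq 8 my_index
```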
Datasets:
Reference Genome: human_g1k_v37.fasta
Alias | Dataset source | No. of reads | Read length |
---|---|---|---|
D1 | Broad Institute | 2 x 2.5M | 151 bp |
D2 | SRA: SRR7733443 | 2 x 2.5M | 151 bp |
D3 | SRA: SRR9932168 | 2 x 2.5M | 151 bp |
D4 | SRA: SRX6999918 | 2 x 2.5M | 151 bp |
Machine details:
Processor: Intel(R) Xeon(R) 8280 CPU @ 2.70GHz
OS: CentOS Linux release 7.6.1810
Memory: 100GB
Notes:
- If you are using a machine with multiple sockets and NUMA domains, use numactl to run on a single socket (NUMA domain) for maximum performance. E.g., on a dual-socket, dual-NUMA system, the command numactl -m 0 -N 0 <bwa-mem2> runs bwa-mem2 on the first socket and NUMA domain.
- The following charts show bwa-mem2 performance on 1 thread and 56 threads with single-end and paired-end reads.
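For the numactl note above, one way to choose a thread count that fits a single NUMA node is sketched below. The helper cpus_on_node is hypothetical; it assumes input in the "cpu,node" format printed by "lscpu -p=CPU,NODE", where header lines start with "#". Matching the thread count to the CPUs on one node is a suggestion, not something the benchmark setup above prescribes.

```shell
# Hypothetical helper: count logical CPUs that belong to a NUMA node,
# reading "cpu,node" pairs on stdin (format of `lscpu -p=CPU,NODE`).
cpus_on_node() {
  awk -F, -v node="$1" '!/^#/ && $2 == node { n++ } END { print n + 0 }'
}

# Usage sketch (Linux, requires lscpu and numactl):
#   threads=$(lscpu -p=CPU,NODE | cpus_on_node 0)
#   numactl -m 0 -N 0 ./bwa-mem2 mem -t "$threads" ref.fa read1.fq read2.fq > out.sam
```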
Vasimuddin Md, Sanchit Misra, Heng Li, Srinivas Aluru. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE Parallel and Distributed Processing Symposium (IPDPS), 2019.