Minimap2 Python Binding

Minimap2 is a fast and accurate pairwise aligner for genomic and transcribed nucleotide sequences. This module wraps minimap2 and provides a convenient interface to calling minimap2 in Python.

Installation

The minimap2 module can be installed directly with:

git clone https://github.com/lh3/minimap2
cd minimap2
python setup.py install

or with pip:

pip install --user minimap2

Usage

The following Python program shows the key functionality of this module:

import minimap2 as mm
a = mm.Aligner("test/MT-human.fa")  # load or build index
if not a: raise Exception("ERROR: failed to load/build index")
for hit in a.map("GGTTAAATACAGACCAAGAGCCTTCAAAGCCCTCAGTAAGTTGCAATACTTAATTTCTGT"):
        print("{}\t{}\t{}\t{}".format(hit.ctg, hit.r_st, hit.r_en, hit.cigar_str))

It builds an index from the specified sequence file (or loads an index if a pre-built index is specified), aligns a sequence against it, traverses each hit and prints them out.

APIs

Class minimap2.Aligner

Aligner(fn_idx_in, preset=None, ...)

Arguments:

fn_idx_in: index or sequence file name. Minimap2 automatically tests the file type. If a sequence file is provided, minimap2 builds an index. The sequence file can be optionally gzip'd.
preset: minimap2 preset. Currently, minimap2 supports the following presets: sr for single-end short reads; map-pb for PacBio read-to-reference mapping; map-ont for Oxford Nanopore read mapping; splice for long-read spliced alignment; asm5 for assembly-to-assembly alignment; asm10 for full genome alignment of closely related species. Note that the Python module does not support all-vs-all read overlapping.
k: k-mer length, no larger than 28
w: minimizer window size, no larger than 255
min_cnt: mininum number of minimizers on a chain
min_chain_score: minimum chaing score
bw: chaining and alignment band width
best_n: max number of alignments to return
n_threads: number of indexing threads; 3 by default
fn_idx_out: name of file to which the index is written

map(seq)

This method maps seq against the index. It yields a generator, generating a series of Alignment objects.

Class minimap2.Alignment

This class has the following properties:

ctg: name of the reference sequence the query is mapped to
ctg_len: total length of the reference sequence
r_st and r_en: start and end positions on the reference
q_st and q_en: start and end positions on the query
strand: +1 if on the forward strand; -1 if on the reverse strand
mapq: mapping quality
NM: number of mismatches and gaps in the alignment
blen: length of the alignment, including both alignment matches and gaps
trans_strand: transcript strand. +1 if on the forward strand; -1 if on the reverse strand; 0 if unknown
is_primary: if the alignment is primary (typically the best and the first to generate)
cigar_str: CIGAR string
cigar: CIGAR returned as an array of shape (n_cigar,2). The two numbers give the length and the operator of each CIGAR operation.

An Alignment object can be converted to a string in the following format:

q_st  q_en  strand  ctg  ctg_len  r_st  r_en  blen-NM  blen  mapq  cg:Z:cigar_str

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.rst

README.rst

Minimap2 Python Binding

Installation

Usage

APIs

Class minimap2.Aligner

Class minimap2.Alignment

Files

README.rst

Latest commit

History

README.rst

File metadata and controls

Minimap2 Python Binding

Installation

Usage

APIs

Class minimap2.Aligner

Class minimap2.Alignment