bam2m5: Convert bam file to Pacbio m5 format

Description

The Pacbio .m5 alignment format is used by several assembly tools such as pbdagcon and sparc, yet only blasr can output it. sam/bam is still the standard output format for alignment. This project provides a converter from sam/bam to m5, connecting general Pacbio alignment results with downstream analysis software that needs m5 input.

Usage

Usage: python3 bam2m5.py <in.bam> <ref.fa> [--score <score_scheme>] <out.m5>

in.bam

Input bam file. It should be sorted by coordinate for efficiency.

Note: because read coordinates are used in the m5 file, if the bam was generated by blasr, the -clipping (soft|hard) parameter should be used.

ref.fa

Reference file.

The index <ref.fa>.fai is needed. If you do not already have one, the program will build it for you, in which case write permission to the directory that contains <ref.fa> is needed.
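The index requirement above can be checked before a long run. This is a minimal sketch, not part of bam2m5: the function name `fai_status` and its three return values are hypothetical, and it only inspects the file system, it does not build the index.

```python
import os


def fai_status(ref_path):
    """Report whether <ref.fa>.fai exists or could be created.

    Hypothetical helper for illustration; returns one of
    'present', 'can_build', or 'blocked'.
    """
    fai_path = ref_path + ".fai"
    if os.path.isfile(fai_path):
        # An index already sits next to the reference.
        return "present"
    # The index would be written next to the reference,
    # so the containing directory must be writable.
    ref_dir = os.path.dirname(os.path.abspath(ref_path))
    if os.access(ref_dir, os.W_OK):
        return "can_build"
    return "blocked"
```

If the result is 'blocked', one option is to copy or symlink the reference into a writable directory before running the converter.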

score_scheme
  • scoring parameters used for the alignment, in the format match,mismatch,gap_open,gap_extend.
  • e.g. -5,6,10,0 means a score of -5 for a match, 6 for a mismatch, 10 for a gap open and 0 for each base of gap extension.
  • the default score schemes of some common aligners:
    • blasr -sam: -5,6,10,0
    • blasr -m 5: -5,6,0,5
    • bwa mem -x pacbio: 1,-1,-1,-1
  • note that blasr and bwa use opposite signs for their scores
  • the default value is -5,6,0,5, equivalent to the blasr -m 5 scheme
  • if the score field in the m5 file is used by downstream analysis, you may choose the blasr -m 5 scheme for compatibility with blasr -m 5 results, no matter which score scheme the alignment software actually used.
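To make the match,mismatch,gap_open,gap_extend format concrete, here is a small sketch of how such a string could be parsed and applied to event counts. It is an illustration only: `ScoreScheme`, `parse_score_scheme` and `alignment_score` are hypothetical names, not bam2m5 functions, and how gap bases are split between "open" and "extend" events is left to the caller because the source does not specify it.

```python
from typing import NamedTuple


class ScoreScheme(NamedTuple):
    match: int
    mismatch: int
    gap_open: int
    gap_extend: int


def parse_score_scheme(text):
    """Parse 'match,mismatch,gap_open,gap_extend' into a ScoreScheme."""
    parts = text.split(",")
    if len(parts) != 4:
        raise ValueError("expected 4 comma-separated integers: %r" % text)
    return ScoreScheme(*(int(p) for p in parts))


def alignment_score(scheme, n_match, n_mismatch, n_gap_open, n_gap_extend):
    """Sum the per-event scores; the caller decides how gaps are counted."""
    return (scheme.match * n_match
            + scheme.mismatch * n_mismatch
            + scheme.gap_open * n_gap_open
            + scheme.gap_extend * n_gap_extend)


# Same alignment under two of the schemes listed above. Assumption: a
# single 3-base gap is counted as 1 open event and 3 extend events.
blasr_m5 = parse_score_scheme("-5,6,0,5")
bwa_pacbio = parse_score_scheme("1,-1,-1,-1")
score_blasr = alignment_score(blasr_m5, 100, 2, 1, 3)
score_bwa = alignment_score(bwa_pacbio, 100, 2, 1, 3)
```

Under these counts the blasr -m 5 scheme gives a negative total and the bwa scheme a positive one, which is the sign difference noted in the list above: lower is better for blasr, higher is better for bwa.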
out.m5
output m5 file

example: python3 bam2m5.py align.sorted.bam ref.fa -5,6,0,5 align.sorted.m5

Dependency

Python >= 3.0:
this script is written in python3; python2 support may be added later
BioUtil >= 0.2:
python package that handles reading bam and fasta files. Use pip3 install BioUtil to install this package.
cython >= 0.24:
used to speed up the code

Install

  1. Download the file, unzip it, and change into the resulting directory.
  2. Run python3 cython_build.py
  3. (Optional) Run cd test; bash run_test.sh to test the build (blasr is needed).
  4. You can now invoke bam2m5.py.

Authors

Yu XU, [email protected]

License

These scripts are released under the GPL2 license.
