SAMsift

SAMsift is a program for advanced filtering and tagging of SAM/BAM alignments using Python expressions.

Getting started

git clone https://github.com/karel-brinda/samsift
cd samsift
# keep only alignments with alignment score >94
samsift/samsift -i tests/test.bam -o filtered.sam -f 'AS>94'
# add tags 'ln' with sequence length and 'ab' with average base quality
samsift/samsift -i tests/test.bam -o with_ln_ab.sam -c 'ln=len(SEQ);ab=1.0*sum(QUAL)/ln'

Installation

Using Bioconda:

# add all necessary Bioconda channels
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

# install samsift
conda install samsift

Using pip from PyPI:

pip install --upgrade samsift

Using pip from GitHub:

pip install --upgrade git+https://github.com/karel-brinda/samsift

Command-line parameters

usage: samsift.py [-h] [-v] [-i file] [-o file] [-f py_expr] [-c py_code]
                  [-d py_expr] [-t py_expr]

Program: samsift (advanced filtering and tagging of SAM/BAM alignments using Python expressions)
Version: 0.1.0
Author:  Karel Brinda <[email protected]>

optional arguments:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit
  -i file        input SAM/BAM file [-]
  -o file        output SAM/BAM file [-]
  -f py_expr     filter [True]
  -c py_code     code to be executed (e.g., assigning new tags) [None]
  -d py_expr     debugging expression to print [None]
  -t py_expr     debugging trigger [True]

Algorithm

for ALIGNMENT in ALIGNMENTS:
        if eval(DEBUG_TRIGGER):
                print(eval(DEBUG_EXPR))
        if eval(FILTER):
                exec(CODE)
                print(ALIGNMENT)
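
The loop above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SAMsift's actual implementation: plain dictionaries stand in for alignments, and the function name sift and its parameters are hypothetical.

```python
# Hypothetical sketch of the loop above; dicts stand in for SAM/BAM alignments.
def sift(alignments, filter_expr="True", code=None,
         debug_expr=None, debug_trigger="True"):
    """Return the records that pass filter_expr, after running code on each."""
    kept = []
    for record in alignments:
        # The record's fields become the variables visible to the expressions.
        namespace = dict(record)
        if debug_expr is not None and eval(debug_trigger, namespace):
            print(eval(debug_expr, namespace))
        if eval(filter_expr, namespace):
            if code is not None:
                exec(code, namespace)  # may create or overwrite variables
            # eval/exec insert "__builtins__" into the namespace; drop it again.
            kept.append({k: v for k, v in namespace.items()
                         if k != "__builtins__"})
    return kept


reads = [
    {"QNAME": "r1", "POS": 120, "SEQ": "ACGT", "AS": 97},
    {"QNAME": "r2", "POS": 15000, "SEQ": "ACGTAC", "AS": 90},
]
filtered = sift(reads, filter_expr="AS>94", code="ln=len(SEQ)")
print(filtered)
```

Since the expressions are passed to eval and exec, they run as arbitrary Python code and should only come from a trusted source.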

All Python expressions can access variables mirroring the fields of the alignment section of the SAM specification, i.e., QNAME, FLAG, RNAME, POS (1-based), MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, and QUAL. For instance, to keep only reads with POS smaller than 10000, use

samsift -i tests/test.bam -f 'POS<10000'

The pysam representation of the current alignment (class pysam.AlignedSegment) is available through the variable a. Because pysam coordinates are 0-based, the previous example is equivalent to

samsift -i tests/test.bam -f 'a.reference_start+1<10000'

All SAM tags are translated to variables of the same name. For instance, if the alignment score is provided through the AS tag (as defined in the Sequence Alignment/Map Optional Fields Specification), then alignments with a score smaller than or equal to the sequence length can be removed using

samsift -i tests/test.bam -f 'AS>len(SEQ)'
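
The tag-to-variable translation can be pictured as merging the optional tags into the same namespace as the mandatory fields before the filter is evaluated. The sketch below is only an illustration: the helper names build_namespace and passes are hypothetical, and how SAMsift itself treats an alignment that lacks a tag is not stated here. In this sketch a missing tag simply raises Python's NameError.

```python
# Hypothetical helpers illustrating tags exposed as variables.
def build_namespace(fields, tags):
    """Merge mandatory SAM fields and optional tags into one namespace."""
    namespace = dict(fields)
    namespace.update(tags)  # a tag such as AS becomes a variable named AS
    return namespace


def passes(filter_expr, fields, tags):
    """Evaluate a filter expression against one alignment's fields and tags."""
    return bool(eval(filter_expr, build_namespace(fields, tags)))


fields = {"QNAME": "r1", "POS": 120, "SEQ": "ACGTACGT"}

kept = passes("AS>len(SEQ)", fields, {"AS": 12})     # 12 > 8
dropped = passes("AS>len(SEQ)", fields, {"AS": 8})   # 8 > 8 is false

try:
    passes("AS>len(SEQ)", fields, {})  # no AS tag on this alignment
    missing_tag_error = False
except NameError:
    missing_tag_error = True
```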

If CODE is provided, all two-letter variables are back-translated to tags. For instance, a tag ab carrying the average base quality can be added by

samsift -i tests/test.bam -c 'ab=1.0*sum(QUAL)/len(SEQ)'
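
One way to picture the back-translation is to run the code in a namespace and then collect every variable whose name has the shape of a SAM tag. The sketch below assumes that QUAL is available as a list of integer base qualities. The helper run_code and the pattern TAG_NAME are hypothetical and not taken from SAMsift's source.

```python
import re

# SAM tag names are two characters: a letter followed by a letter or digit.
TAG_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9]$")


def run_code(code, fields, tags):
    """Execute code on one alignment and return the resulting tag dictionary."""
    namespace = {**fields, **tags}
    exec(code, namespace)
    # Keep only variables that look like tags; this skips SEQ, QUAL and
    # the "__builtins__" entry that exec adds to the namespace.
    return {k: v for k, v in namespace.items() if TAG_NAME.match(k)}


fields = {"SEQ": "ACGT", "QUAL": [30, 40, 20, 30]}  # QUAL as ints: an assumption
new_tags = run_code("ln=len(SEQ);ab=1.0*sum(QUAL)/ln", fields, {"AS": 97})
print(new_tags)
```

Existing tags such as AS survive unchanged, while ln and ab appear as new tags because their names are two characters long.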

Similar programs

samtools view can filter alignments based on FLAGS, read group tags, and CIGAR strings.
sambamba view supports, in addition to the SAMtools features, filtering using simple Perl expressions. However, it is not possible to compare different tags.
bamPals adds tags XB, XE, XP and XL.
SamJavascript can filter alignments using JavaScript expressions.

Author

Karel Brinda <[email protected]>
