pythseq/ISEScan at commit d2c4afe5b9ad7925917b4f0b244215a8886eb32c

forked from xiezhq/ISEScan

A Python pipeline to identify IS (insertion sequence) elements in genomes and metagenomes.

ISEScan is a Python package to identify IS elements in genome sequences.

# Input and output

> It requires three input files:

1) A genome sequence file in FASTA format, one sequence per file.
2) clusters.faa.hmm, a file of profile hidden Markov models shipped with ISEScan.
3) clusters.single.faa, also shipped with ISEScan.
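Because the genome file must hold exactly one sequence, a multi-sequence FASTA file (for example a metagenome assembly) has to be split before scanning. The sketch below is a generic FASTA splitter, not part of ISEScan; the helper names `read_fasta` and `split_fasta`, the 70-column line width and the choice of naming each output file after the first word of its header are our own assumptions.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, outdir, width=70):
    """Write every record of a multi-sequence FASTA file to its own .fna file.

    Each output file is named after the first word of the record's header
    (an assumption; rename if your IDs contain characters unsafe in file names).
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for n, (header, seq) in enumerate(read_fasta(path), start=1):
        words = header.split()
        seq_id = words[0] if words else "record%d" % n
        out = outdir / (seq_id + ".fna")
        with open(str(out), "w") as handle:
            handle.write(">" + header + "\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")
        written.append(out)
    return written
```

For example, `split_fasta("assembly.fna", "genomes")` (both names hypothetical) leaves one single-sequence file per record in `genomes`, each of which can then be passed to ISEScan.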

> When the run ends, ISEScan reports the IS elements identified in the genome sequence and writes all output files to a user-specified directory, for example ./prediction. The output files include:

1) one *.sum file, which summarizes the number of identified IS element copies in each family
2) one *.gff file, which lists each identified IS element copy with its family classification and its TIR (terminal inverted repeat)
3) one *.out file, which lists each identified IS element copy with full details, one IS element copy per line
4) one *.is.fna file, a FASTA file containing the nucleotide sequence of each identified IS element copy
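For a quick overview of the *.is.fna output, the sketch below counts the IS copies and reports their lengths. It relies only on standard FASTA layout and takes the first word of each header as the copy ID; it assumes nothing else about how ISEScan formats the headers. The function names `is_copy_lengths` and `summarize` are hypothetical.

```python
def is_copy_lengths(path):
    """Return a list of (copy ID, length in bp) for each record in an *.is.fna file."""
    records = []
    seq_id, length = None, 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seq_id is not None:
                    records.append((seq_id, length))
                # First word of the header is used as the ID.
                seq_id = (line[1:].split() or [""])[0]
                length = 0
            elif line and seq_id is not None:
                length += len(line)
    if seq_id is not None:
        records.append((seq_id, length))
    return records


def summarize(records):
    """Count the copies and give total, shortest and longest length in bp."""
    lengths = [n for _, n in records]
    return {
        "copies": len(lengths),
        "total_bp": sum(lengths),
        "min_bp": min(lengths, default=0),
        "max_bp": max(lengths, default=0),
    }
```

Calling `summarize(is_copy_lengths(...))` on the NC_012624.fna.is.fna file from the example run gives a one-line sanity check before looking at the .out file in detail.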

# How to run it?

ISEScan should run on any platform on which the required packages are installed, but we recommend running it on Linux.

> ISEScan requires the packages below to be installed. For each one we list the name, the recommended version, and where to download it:

1) Python 3.3.3 or later, https://www.python.org/downloads/
2) numpy-1.8.0 or later, https://sourceforge.net/projects/numpy/files/NumPy/1.8.0/
3) scipy-0.13.1 or later, https://sourceforge.net/projects/scipy/files/scipy/0.13.1/
4) fastcluster, latest version recommended, https://pypi.python.org/pypi/fastcluster
5) FragGeneScan 1.19 or later, https://sourceforge.net/projects/fraggenescan/
6) HMMER-3.1b2 or later, http://hmmer.org/download.html
7) BLAST 2.2.31 or later, https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
8) SSW Library: the latest version does not work well with ISEScan, so use the compatible version shipped with ISEScan instead, https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library
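Before configuring ISEScan, a short check like the one below can report which of the listed Python packages cannot be imported and which external programs are not on your PATH. The executable names are assumptions based on the programs named in the constants.py configuration step; if you point constants.py at absolute paths instead of relying on PATH, a program reported missing here may still be usable. The SSW library is not checked because it ships with ISEScan.

```python
import importlib.util
import shutil

# Python packages from the requirement list above.
PYTHON_MODULES = ("numpy", "scipy", "fastcluster")
# Assumed executable names for the external programs that constants.py asks
# you to configure (FragGeneScan, HMMER's phmmer/hmmsearch, BLAST+).
EXECUTABLES = ("FragGeneScan", "phmmer", "hmmsearch", "blastn", "makeblastdb")


def missing_dependencies(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return modules that cannot be found and programs that are not on PATH."""
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    missing += [name for name in executables if shutil.which(name) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Not found:", ", ".join(problems))
    else:
        print("All listed dependencies were found.")
```

`find_spec` only locates a module without importing it, so the check is quick and has no side effects.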

# Before you start, make sure you have done the two things below.

1) You have all required packages installed on your computer.
2) Configure the packages. If you haven't done so yet, follow the instructions below before running ISEScan.
2.1) Open constants.py and find the two lines marked with 'Config packages'.
2.2) Set the paths to FragGeneScan, phmmer, hmmsearch, blastn and makeblastdb.
2.3) Save and close constants.py.

# Let's try an example: NC_012624.fna.
# The command below scans NC_012624.fna (a genome sequence of Sulfolobus_islandicus_Y_N_15_51) and writes all results to the prediction directory.

python3 isPredictSingle.py NC_012624.fna proteome hmm
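To scan several genomes, the same command can be issued once per file. This is a hypothetical wrapper, not a feature of ISEScan: it only reproduces the argument order of the example command (genome file, then proteome, then hmm), assumes it is started from the ISEScan directory, and needs Python 3.5 or later for `subprocess.run`.

```python
import subprocess
import sys
from pathlib import Path


def build_command(genome, proteome_dir="proteome", hmm_dir="hmm", python=sys.executable):
    """Return the argument list for one run, in the order used by the example:
    python3 isPredictSingle.py <genome.fna> proteome hmm
    """
    return [python, "isPredictSingle.py", str(genome), str(proteome_dir), str(hmm_dir)]


def run_all(genome_dir, pattern="*.fna"):
    """Hypothetical batch helper: run ISEScan on each matching file, one at a time.

    Must be started from the ISEScan directory so that isPredictSingle.py,
    proteome and hmm resolve as relative paths.
    """
    for genome in sorted(Path(genome_dir).glob(pattern)):
        cmd = build_command(genome)
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first failed run
```

Runs are sequential on purpose: each one already spends most of its time in HMMER, and stopping at the first failure makes a misconfigured path in constants.py easy to spot.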

# Wait for it to finish. This may take a while, because ISEScan uses HMMER to scan each protein sequence translated from the genome sequence against 496 profile HMMs.
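Assuming clusters.faa.hmm is in the HMMER3 text format read by hmmsearch, each model in it carries a `NAME` line and ends with a `//` line. The sketch below lists the model names in such a file, which shows how many profiles the search will use. It is a plain line scan written for illustration, not a full HMMER3 parser; HMMER's own `hmmstat` program reports the same models with more detail.

```python
def hmm_names(path):
    """Return the NAME of every profile HMM in a HMMER3 text file."""
    names = []
    with open(path) as handle:
        for line in handle:
            # In HMMER3 files only the per-model header line starts with "NAME ".
            if line.startswith("NAME "):
                names.append(line.split(None, 1)[1].strip())
    return names


if __name__ == "__main__":
    names = hmm_names("clusters.faa.hmm")
    print(len(names), "profile HMMs")
```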

After ISEScan finishes, you will find three important files in the prediction directory: NC_012624.fna.sum, NC_012624.fna.gff and NC_012624.fna.is.fna. NC_012624.fna.sum summarizes the IS copies in each IS family, NC_012624.fna.gff lists each IS element copy and its TIR, and NC_012624.fna.is.fna holds the nucleotide sequence of each IS element copy.
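To work with the .gff file programmatically, a minimal reader like the one below splits each feature line into the nine standard GFF3 columns and parses the key=value attributes. This assumes ISEScan writes standard tab-separated GFF3; the feature types and attribute keys it uses for IS copies, families and TIRs are not documented here, so inspect your own output before relying on specific names.

```python
from collections import Counter

GFF_COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


def read_gff(path):
    """Yield one dict per feature line of a GFF3 file."""
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("##FASTA"):
                break  # embedded sequences follow; no more features
            if not line or line.startswith("#"):
                continue  # skip blank lines, comments and directives
            fields = line.split("\t")
            if len(fields) != 9:
                continue  # skip malformed lines
            record = dict(zip(GFF_COLUMNS, fields))
            record["start"], record["end"] = int(fields[3]), int(fields[4])
            # "key=value;key=value" -> dict; a bare "." yields an empty dict.
            record["attributes"] = dict(
                item.split("=", 1) for item in fields[8].split(";") if "=" in item
            )
            yield record


def count_feature_types(path):
    """Count features by the GFF 'type' column."""
    return Counter(rec["type"] for rec in read_gff(path))
```

Running `count_feature_types` on NC_012624.fna.gff from the prediction directory shows which feature types the file actually contains, which is a sensible first step before filtering on them.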

October 18, 2016