IIBacFinder

IIBacFinder (class II bacteriocin finder) is designed to detect class II bacteriocins (small, unmodified antimicrobial peptides).

Workflow:

Overview of IIBacFinder specializing in detecting unmodified bacteriocins

Schematic flow of bacteriocin mining in IIBacFinder

Installation

Download the latest version of IIBacFinder from zenodo. Of note, do not clone the package directly from the GitHub repository as it is incomplete

wget -O IIBacFinder.tar.gz https://zenodo.org/records/14292149/files/IIBacFinder.tar.gz?download=1
tar -zxvf IIBacFinder.tar.gz

Download and clone the execution environment for IIBacFinder

# download
wget -O env_IIBacFinder.tar.gz https://zenodo.org/records/14292149/files/env_IIBacFinder.tar.gz?download=1
# clone
mkdir -p $path/env_IIBacFinder
tar -xzf env_IIBacFinder.tar.gz -C $path/env_IIBacFinder
source $path/env_IIBacFinder/bin/activate
conda unpack

$path is where the IIBacFinder environment will be unpacked and cloned.

install signalp6

Due to license restrictions, this recipe cannot distribute signalp6 directly.

Please download signalp-6.0d.fast.tar.gz from:
https://services.healthtech.dtu.dk/cgi-bin/sw_request?software=signalp&version=6.0&packageversion=6.0h&platform=fast

After registering online, you will receive the link for package download via email, and then download the package locally.

Assuming you have downloaded the package locally, then run the following command to complete the installation:

signalp6-register signalp-*tar.gz

This will copy signalp6 into your conda environment. After this step, the installation of IIBacFinder will be complete.

Let's say that you need to deactivate the environment after the prediction, you can run

source $path/env_IIBacFinder/bin/deactivate

Running IIBacFinder

To confirm the successful installation and view all options, execute the command below

python $PATH/IIBacFinder/scripts/predict.py -h

$PATH is the directory where IIBacFinder was placed.

usage: IIBacFinder [-h] -i INDIR [-e HMMEXCUTE] [-a AMPEP] [-r RCMD] [-s HMMSCAN] -o OUTDIR [-t THRESHOLD] [-p] [-m {single,meta}] [-v]

Detecting class II bacteriocins from genomes

optional arguments:
  -h, --help            show this help message and exit
  -i INDIR, --inDir INDIR
                        Input path to folder containg FASTA file, whose suffix could in list of ['.fas', '.fa', '.fasta', '.faa', '.fna']
  -e HMMEXCUTE, --hmmexcute HMMEXCUTE
                        The excutive hmmsearch, defualt: hmmsearch
  -a AMPEP, --ampep AMPEP
                        The excutive ampep, defualt: ampep
  -r RCMD, --Rcmd RCMD  The excutive Rscript, defualt: Rscript
  -s HMMSCAN, --hmmscan HMMSCAN
                        The excutive hmmscan, defualt: hmmscan
  -o OUTDIR, --outDir OUTDIR
                        The path to the folder storing output files
  -t THRESHOLD, --threshold THRESHOLD
                        Number of threshols used, default: 20
  -p, --prodigal_short  Whether perform gene prediction using prodigal short (default: TRUE), toggle to close. This is indispensable for FASTA files.
  -m {single,meta}, --prodigal_p {single,meta}
                        Select procedure model (single or meta) in 'prodigal_short', identical to the parameter '-p'. Default is single
  -v, --version         Print out the version and exit.

Key parameters:

-i: Input directory where FASTA files for prediction are located.
-o: Output directory
-p: If the input FASTA files are genome sequences, this parameter should remain at its default setting to run gene prediction using prodigal_short. If the input FASTA files are prediction results from prodigal_short, this parameter should be specified to skip the prodigal_short prediction step.
-m: The prediction model for prodigal_short prediction, see more details at prodigal.

Running a demo:

python $PATH/IIBacFinder/scripts/predict.py -i $PATH/IIBacFinder/test_fasta/ -o test_prediction

Prediction results can be found in test_prediction.

Output files

IIBacFinder generates output files explained below:

Intermediate folders
- prodigal_out
  
  Gene prediction by prodigal-short
- prediction_domain
  
  Prediction results based on precursor rules
- prediction_geneContext
  
  Prediction results based on context gene rules
- region_annotation
  
  Domain scanning for bactericoin gene clusters
- diamond_alingment
  
  Blast results for predicted bacteriocin precursors against publicly available AMP sequences
- leader_prediction
  
  Results of leader prediction by SignalP 6.0 and modified NLPPrecursor
Data output folders
- results
  
  Prediction results of each input FASTA file
- region_plot
  
  Visualizations of predicted bacteriocin gene clusters, including three files with formats in .svg, .tsv, and .gbk
Output summary files
- overall_result.tsv (most important)
  - CDs: CDs ID in genome annotation files by prodigal-short which could be found at prodigal_out
  - Rules: Prediction rule, one of Domain (prediction based on precursor rules), GeneContext (prediction based on context gene rules), and Both (prediction based on both precursor and context gene rules )
  - Domain_rule: The ID of self-build precursor domains which could be found at IIBacFinder/domains/classII-related.hmm
  - Domain_Evalue: E-value of hmmsearch query results
  - Domain_Bitscore: Bitscore of hmmsearch query results
  - Context_rule: Context gene rules used for bacteriocin prediction, which could be found at IIBacFinder/domains/rule.gene.context.hmm
  - PFAM_domain: Domain annotation of precursor sequence against PFAM database, which could be found at IIBacFinder/domains/Pfam-A.hmm
  - NCBI_domain: Domain annotation of precursor sequence against NCBI database, which could be found at IIBacFinder/domains/hmm_PGAP.LIB
  - Sequence: Predicted bacteriocin precursor sequence
  - Length: Length of predicted precursor sequence
  - Description: Putative description of predicted precursor sequence
  - Potential_Leader_Type: Inferred leader type
  - Genome: Genome ID
  - Region: Predicted gene cluster regions whose details could be found in `region_plot``
  - Contig: Contig ID
  - Start: Start position of predicted precursor sequence
  - End: End position of predicted precursor sequence
  - Strand: Strand of predicted precursor sequence
  - Partial_index: Completeness of predicted precursor sequence, which was annotated by prodigal-short
  - Start_type: Start codon of predicted precursor sequence, which was annotated by prodigal-short
  - RBS_motif: RBS motif of predicted precursor sequence, which was annotated by prodigal-short
  - Including_elements: Predicted elements associated with bacteriocin biosynthesis in the gene cluster region
  - Uniq_ID: Assigned a unique ID for each predicted precursor sequence
  - leader_sec: Predicted leader sequence of precursor sequence with sec type
  - core_sec: Predicted core sequence of precursor sequence with sec type
  - leader_gg: Predicted leader sequence of precursor sequence with double-glycine type
  - core_gg: Predicted core sequence of precursor sequence with double-glycine type
  - Predicted_mature_peptide: Predicted mature sequence of bacteriocin sequence
  - Confidence: Assigned confidence level
  - Length__core: Length of the predicted mature sequence
  - Charge (pH=7)__core: Charge of the predicted mature sequence annotated by peptides.py
  - Isoelectric_point__core: Isoelectric point of the predicted mature sequence annotated by peptides.py
  - Molecular_weight (monoisotopic)__core: Molecular weight of the predicted mature sequence annotated by peptides.py
  - Aliphatic_index__core: Aliphatic index of the predicted mature sequence annotated by peptides.py
  - Boman__core: Boman index of the predicted mature sequence annotated by peptides.py
  - Instability_index__core: Instability index of the predicted mature sequence annotated by peptides.py
  - Hsp_len: The length of best hit when querying predicted bacteriocin sequence against publically available AMP sequences
  - Hsp_identity: The identity of the best hit
  - Coverage_q: The coverage of the best hit compared to predicted bacteriocin sequence. For example, 100.0 (54/54) represents 100% coverage with the best hit and predicted bacteriocin being 54AAs and 54AAs, respectively.
  - Coverage_s: The coverage of the best hit compared to AMP sequence. For example, 78.3 (54/69) represents 78.3% coverage with the best hit and known AMP being 54AAs and 69AAs, respectively.
  - AMP_seq: AMP sequence
  - AMP_accession: AMP accession ID in the publicly available database
  - AMP_name: AMP description in the database
  - AMP_database: AMP database, including APD3, DRAMP, DBAASP, and dbAMP2
- all.precusors.gg.fa and all.precusors.gg.json
  
  Predicted bacteriocin precursor sequences with a putative double-glycine leader
- all.precusors.sec.fa and all.precusors.sec.json
  
  Predicted bacteriocin precursor sequences with a putative sec leader
- all.precusors.fa
  
  Overall predicted bacteriocin precursor sequences

Notes:

IIBacFinder may overlook certain precursors due to its prediction threshold, especially for glycine-type bacteriocins, which can sometimes have multiple precursors within a single gene cluster. Therefore, it is advisable to double-check the predicted gene cluster instead of relying solely on precursor prediction.
As with all bioinformatics tools, it is important not to place complete trust in the predictions. Conducting a manual check of the precursor and context gene is advisable to exclude false positives.

Reference:

Zhang, Dengwei, et al. "Systematically investigating and identifying unmodified bacteriocins in the human gut microbiome." bioRxiv (2024): 2024-07.

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
AMP_database		AMP_database
IMGs		IMGs
filter		filter
models		models
scripts		scripts
test_fasta		test_fasta
test_prediction		test_prediction
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

IIBacFinder

Installation

Running IIBacFinder

Running a demo:

Output files

Notes:

Reference:

About

Releases

Packages

Languages

ZhangDengwei/IIBacFinder

Folders and files

Latest commit

History

Repository files navigation

IIBacFinder

Installation

Running IIBacFinder

Running a demo:

Output files

Notes:

Reference:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages