detettore – a program to detect transposable element polymorphisms

September 2021

detettore uses reference-aligned paired-end reads to search for:

TE insertion polymorphisms (TIPs), i.e. TEs absent in the reference genome but present in a sequenced individual
TE absence polymorphisms (TAPs), i.e. TEs present in the reference but absent in a sequenced individual

About

detettore was developed and tested with plant (Brachypodium distachyon, 272 Mb) and bacterial (Mycobacterium tuberculosis, 4.4 Mb) genomes.

New in version 2:

all output in VCF format
output of invariant sites
genotype calling based on genotype likelihoods
compatibility with minimap2 and other mappers
inference of TIPs from single-end reads

Installation

detettore is written in Python 3 and available on PyPI. The only non-Python dependency is minimap2, which can be downloaded here: https://github.com/lh3/minimap2.

To avoid conflicts with dependencies, it is best to install detettore in a virtual environment:

# Create environment called detettore
conda create -n detettore python=3.7

# Activate environment (older conda versions use "source activate detettore")
conda activate detettore

# Install detettore
pip install detettore

Or when not using Anaconda:

# Create environment, where <location> is the path to the environment
virtualenv -p python3.7 <location>

# Activate environment
source <location>/bin/activate

# Install detettore
pip install detettore

After installation, three commands should be callable from the command line: detettore, combinevcf, and bamstats.
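
A quick way to confirm this is to check that each command resolves on the PATH. The snippet below is a minimal sketch; the helper name missing_commands is hypothetical and not part of detettore. It prints every command it cannot find, so empty output means everything is in place.

```shell
# Print each of the given commands that cannot be found on the PATH.
# (missing_commands is a hypothetical helper, not shipped with detettore.)
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# The three detettore commands plus the non-Python dependency minimap2.
missing_commands detettore combinevcf bamstats minimap2
```

If minimap2 is listed, it still has to be installed separately, since pip only installs the Python parts.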

Usage

Single sample

Basic usage illustrated with the data in the example folder.

detettore \
  -b example/reads.bam \
  -r example/reference.fasta \
  -a example/TE_annotation.gff \
  -t example/TE_consensusLib.fasta \
  -o example \
  -m tips taps \
  --require_split \
  --include_invariant

Explanation of command line parameters

Parameter	Explanation
Input/Output
-b	bam/cram file with reference-aligned paired-end or single-end reads.
-r	Reference genome in fasta format.
-t	TE consensus sequences in fasta format.
-a	TE annotation in bed or gff format.
-o	Sample name, used as a prefix for output files.
Program settings
-m	The module to run (tips, taps, or both, as above).
-c	Number of CPUs.
--region	Restrict search to region chromosome:start:end.
--include_invariant	Include conserved TEs in vcf output.
--require_split	Discard variant candidates if no splitread evidence is present.
--keep	Keep intermediate files.
Thresholds
-q	Minimum mapping quality of reference-aligned anchor reads.
-lDR	Minimum alignment length for discordant read-pair target hits. [50]
-lSR	Minimum alignment length for splitread target hits. [20]

Points to consider:

Chromosome names must be consistent in the different files, which can be a problem when files are downloaded from different sources.
The bam file should be as unfiltered as possible: files containing only properly paired or uniquely mapping reads, while useful for SNP calling, lack the information required by detettore.
With single-end reads, only TIPs will be detected, while TAPs will be output as missing data (./. in the VCF).
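
The first point can be checked before running detettore. The following is a rough sketch that assumes the annotation is tab-separated with the sequence name in the first column (true for both GFF and BED); the function name names_missing_from_reference is hypothetical. It lists sequence names that occur in the annotation but not among the FASTA headers of the reference.

```shell
# Print sequence names found in the annotation (column 1 of a GFF/BED file)
# that do not occur among the FASTA headers of the reference.
# (names_missing_from_reference is a hypothetical helper.)
names_missing_from_reference() {
  fasta=$1
  annot=$2
  ref_names=$(mktemp)
  annot_names=$(mktemp)
  # FASTA headers: drop ">" and anything after the first whitespace.
  grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | LC_ALL=C sort -u > "$ref_names"
  # Annotation: skip comment and empty lines, keep the first column.
  awk -F'\t' '!/^#/ && NF {print $1}' "$annot" | LC_ALL=C sort -u > "$annot_names"
  # comm -13 keeps only the lines unique to the second file.
  LC_ALL=C comm -13 "$ref_names" "$annot_names"
  rm -f "$ref_names" "$annot_names"
}

# Usage with the example data:
#   names_missing_from_reference example/reference.fasta example/TE_annotation.gff
```

Empty output means every annotated sequence name exists in the reference. The names in the bam header could be compared in the same way, for example after extracting them with samtools view -H, which is a separate tool and not a detettore dependency.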

Output

The command above will produce a file called example.vcf.gz containing TIPs and TAPs, and a log file example.log. If --keep was set, a folder sample_tmp will also be present, containing ...
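
To get a first impression of the result, the genotype calls in the VCF can be tallied with standard tools. This is a minimal sketch, assuming a single-sample VCF in which the sample is the tenth column and GT is the first FORMAT field, as the VCF specification requires when genotypes are present; count_genotypes is a hypothetical helper name.

```shell
# Count how often each genotype (e.g. 0/0, 1/1, ./.) occurs in a gzipped
# single-sample VCF. Assumes the sample is column 10 and GT comes first.
# (count_genotypes is a hypothetical helper.)
count_genotypes() {
  gzip -dc "$1" |
    awk -F'\t' '!/^#/ { split($10, f, ":"); n[f[1]]++ }
                END   { for (g in n) print g "\t" n[g] }' |
    LC_ALL=C sort
}

# Usage with the output of the command above:
#   count_genotypes example.vcf.gz
```

With single-end reads, TAP records are expected to show up under ./. in this tally.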

Multiple samples

Here is an example workflow to call TE polymorphisms on multiple samples. GNU parallel is used to run 10 samples in parallel, with one CPU per sample. A different approach might be required on computing clusters with job queuing.

# Set input files which do not change for different samples
ref=/path/to/reference.fasta
annot=/path/to/TE_annotation.gff
telib=/path/to/TE_consensus.fasta

# Create a list of detettore commands, looping through a list containing paths to bam files.
# The files are assumed to be named sample.bam
while read -r bampath;
do

  sample=$(basename "$bampath" | cut -d'.' -f1)

  echo "\
  detettore \
    -b $bampath \
    -r $ref \
    -a $annot \
    -t $telib \
    -o $sample \
    -m tips taps \
    -c 1 \
    --require_split \
    --include_invariant"\
  >> run_detettore.cmds

done < paths_to_bam.txt

# Use GNU parallel to run 10 commands simultaneously. Save stderr and stdout to log files.
parallel -j10 < run_detettore.cmds 2> err.log > stdout.log
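
Once the jobs have finished, it can be worth checking that every sample produced a VCF. The sketch below assumes the outputs were written to the current directory as sample.vcf.gz, following the -o prefix used in the workflow above; missing_outputs is a hypothetical helper name.

```shell
# Print the sample names from a list of bam paths for which no non-empty
# <sample>.vcf.gz exists in the current directory.
# (missing_outputs is a hypothetical helper.)
missing_outputs() {
  while read -r bampath; do
    sample=$(basename "$bampath" | cut -d'.' -f1)
    [ -s "$sample.vcf.gz" ] || echo "$sample"
  done < "$1"
}

# Usage:
#   missing_outputs paths_to_bam.txt
```

Samples reported here can then be looked up in err.log.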

Combine VCFs

Single VCF files can be combined with the command combinevcf. Some basic filtering can be applied in this step.

# Combine VCF files
combinevcf <Path to folder containing *.vcf.gz output of detettore>

# Show filtering options
combinevcf -h

Licence

GNU General Public License v3. See LICENSE.txt for details.
