Tools for phylogenomic analyses

This repository includes several Python scripts that facilitate some steps of a phylogenomic analysis pipeline. They were used in particular for the following work:

Marlétaz F, Peijnenburg KTCA, Goto T, Satoh N, Rokhsar DS. A new spiralian phylogeny refines the position of the enigmatic arrow worms. in preparation

cross-conta.py

Index hopping causes a small fraction of reads to cross-contaminate Illumina libraries sequenced on the same lane. This is a minor problem for quantitative approaches, but in de novo assembly it can produce mislabelled assembled transcripts. This tool follows the same line of reasoning as described in Simion et al. (2018). Briefly, for a given transcriptome, it generates read counts against all libraries sequenced on the same lane using kallisto, and then filters out every transcript that has a higher count in another library than in the one(s) it belongs to.

usage: cross-conta.py [options] <ctrl>

  ctrl        Control file including names of assembly and paired reads for each library

optional arguments:
  -p NPROC    Number of threads (default: 8)
  -f FOLD     Fold-enrichment to discard contig (default: 2)
  -m MINCOV   Minimal coverage of contig by corresp. reads (default: 2)
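
The filtering rule described above can be sketched as follows. This is only an illustration, not the code of cross-conta.py: the function name, the input layout (a dict of per-library counts, such as could be collected from kallisto output) and the exact way `fold` and `mincov` are combined are assumptions, and a raw read count stands in for coverage.

```python
def filter_cross_contaminants(counts, own_libs, fold=2.0, mincov=2.0):
    """Split transcripts into kept and discarded lists.

    counts   : dict mapping transcript id -> {library name: read count}
    own_libs : set of library names the assembly was built from
    fold     : discard when the best foreign count is >= fold * own count
               (assumed reading of the -f option)
    mincov   : discard when the own count is below this value
               (assumed reading of the -m option, count used as a proxy)
    """
    kept, discarded = [], []
    for tid, per_lib in counts.items():
        # Best support from the libraries the transcript should belong to
        own = max((c for lib, c in per_lib.items() if lib in own_libs), default=0.0)
        # Best support from any other library on the same lane
        foreign = max((c for lib, c in per_lib.items() if lib not in own_libs), default=0.0)
        if own < mincov or foreign >= fold * own:
            discarded.append(tid)
        else:
            kept.append(tid)
    return kept, discarded


# Hypothetical counts for three transcripts assembled from libA
counts = {
    "tr1": {"libA": 120.0, "libB": 3.0},   # well supported by its own library
    "tr2": {"libA": 4.0, "libB": 90.0},    # far better supported by libB
    "tr3": {"libA": 1.0, "libB": 0.0},     # too little support overall
}
kept, discarded = filter_cross_contaminants(counts, own_libs={"libA"})
print(kept)
print(discarded)
```

With these made-up numbers, only tr1 is retained: tr2 is attributed to libB and tr3 falls below the minimal support.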

phyloStrata.py

This utility computes various statistics on a collection of alignments and applies some filters.

Dependencies: the ete3 library, NumPy and Biopython, all of which can be installed with conda.

Briefly, it checks the monophyly of each clade mentioned in the taxonomic list, computes the mutational saturation of each alignment, and excludes taxa whose divergence to the root is higher than a threshold. The usage is very simple:

Usage: phyloStrata.py <taxlist> <suffix> <fasta files...>

taxlist must list the taxa present in the alignments, each followed by a generic clade name and separated from it by a tab. The monophyly of the taxa in each clade will be checked. The suffix will be used for the output file. The tree files need to be named taxon.xx.xx and the corresponding fasta files taxon.al.hc.tr.fa.
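
To make the taxlist format and the monophyly check more concrete, here is a minimal, dependency-free sketch. It is not the script's implementation: it assumes one `taxon<TAB>clade` pair per line, represents a rooted tree as nested tuples of leaf names, and uses made-up example taxa. In practice ete3 offers its own `check_monophyly` method on trees, which is likely more convenient.

```python
from collections import defaultdict


def parse_taxlist(lines):
    """Map each clade name to the set of taxa listed under it.

    Assumes each non-empty line is 'taxon<TAB>clade'.
    """
    clades = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        taxon, clade = line.split("\t")[:2]
        clades[clade.strip()].add(taxon.strip())
    return dict(clades)


def leaf_sets(tree):
    """Return the leaf set of `tree` plus the leaf sets of all its nodes."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, [leaves]
    all_leaves, nodes = frozenset(), []
    for child in tree:
        child_leaves, child_nodes = leaf_sets(child)
        all_leaves |= child_leaves
        nodes.extend(child_nodes)
    nodes.append(all_leaves)
    return all_leaves, nodes


def is_monophyletic(tree, taxa):
    """True if the taxa present in the rooted tree form exactly one subtree."""
    present, nodes = leaf_sets(tree)
    target = frozenset(taxa) & present
    return bool(target) and target in nodes


# Illustrative input only: four taxa assigned to two clades
taxlist = [
    "Sagitta\tChaetognatha\n",
    "Spadella\tChaetognatha\n",
    "Lottia\tMollusca\n",
    "Octopus\tMollusca\n",
]
tree = ((("Sagitta", "Spadella"), "Lottia"), "Octopus")

clades = parse_taxlist(taxlist)
result = {name: is_monophyletic(tree, taxa) for name, taxa in clades.items()}
print(result)
```

In this toy tree the two chaetognaths group together while the two molluscs do not, so only the first clade passes the check.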

concatenate-red.py

This script builds a concatenated alignment from the filtered alignments and the statistics file generated by phyloStrata.py.

Usage: concatenate-red.py <taxlist> <suffix> <fasta files...>
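
The general idea of concatenation can be sketched as below. This is a simplified illustration under assumptions, not the script itself: alignments are given as plain dicts of taxon to aligned sequence, taxa missing from a gene are padded with gaps, and the use of the statistics file is left out. The function name and the returned partition table are hypothetical.

```python
def concatenate(alignments, gap="-"):
    """Concatenate alignments (dicts of taxon -> aligned sequence).

    Returns the supermatrix as a dict and the 1-based, inclusive
    coordinates of each input alignment within it.
    """
    # Union of all taxa seen in any alignment, in a stable order
    taxa = sorted({t for aln in alignments for t in aln})
    pieces = {t: [] for t in taxa}
    partitions, start = [], 1
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in an alignment must share one length")
        length = lengths.pop()
        for t in taxa:
            # Pad taxa absent from this alignment with gap characters
            pieces[t].append(aln.get(t, gap * length))
        partitions.append((start, start + length - 1))
        start += length
    supermatrix = {t: "".join(parts) for t, parts in pieces.items()}
    return supermatrix, partitions


# Two toy alignments with partially overlapping taxa
aln1 = {"A": "MKV", "B": "MRV"}
aln2 = {"A": "LL", "C": "LI"}
supermatrix, partitions = concatenate([aln1, aln2])
print(supermatrix)
print(partitions)
```

Keeping track of the partition coordinates is a common choice because downstream tools often need per-gene boundaries for partitioned models.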
