This repository includes several Python scripts that facilitate some steps of a phylogenomic analysis pipeline. In particular, they were used for the following work:
Marlétaz F, Peijnenburg KTCA, Goto T, Satoh N, Rokhsar DS. A new spiralian phylogeny refines the position of the enigmatic arrow worms. In preparation.
Index hopping causes a small fraction of reads to cross-contaminate Illumina libraries sequenced on the same lane. This is a minor problem for quantitative approaches, but in de novo assembly it can produce mislabelled assembled transcripts. This tool follows the same line of reasoning as Simion et al. (2018). Briefly, for a given transcriptome, it uses kallisto to compute read counts from every library sequenced on the same lane, and then filters out each transcript whose count in another library is higher than in the library it was assembled from.
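As a rough illustration of the counting step, the sketch below gathers kallisto's per-library estimates into a single count table. It assumes that the reads of each library were quantified separately against the same transcriptome with `kallisto quant`, which writes an `abundance.tsv` with `target_id` and `est_counts` columns. The function name `load_counts` and the library-to-file mapping are hypothetical and not part of cross-conta.py.

```python
import csv
from typing import Dict, Iterable


def load_counts(abundances: Dict[str, Iterable[str]]) -> Dict[str, Dict[str, float]]:
    """Build a {transcript: {library: est_counts}} table.

    `abundances` maps a (hypothetical) library name to the lines of the
    abundance.tsv that kallisto quant wrote for that library's reads.
    An open file handle works as well as a list of strings.
    """
    counts: Dict[str, Dict[str, float]] = {}
    for library, lines in abundances.items():
        reader = csv.DictReader(lines, delimiter="\t")
        for row in reader:
            # est_counts is kallisto's estimated number of reads for this target
            counts.setdefault(row["target_id"], {})[library] = float(row["est_counts"])
    return counts
```

For example, `load_counts({"A": open("A/abundance.tsv"), "B": open("B/abundance.tsv")})` yields, for every transcript of the assembly, its read count in libraries A and B.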
usage: cross-conta.py [options] <ctrl>
ctrl Control file listing the assembly and the paired read files for each library
optional arguments:
-p NPROC Number of threads (default: 8)
-f FOLD Fold-enrichment to discard contig (default: 2)
-m MINCOV Minimal coverage of a contig by its corresponding reads (default: 2)
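The filtering decision can be sketched as follows, using the default values of `-f` and `-m`. This is an assumed rule, not necessarily the exact logic of cross-conta.py: a transcript is discarded if its own library supports it with fewer than `min_cov` reads, or if another library has more than `fold` times its own count. The function name `flag_contaminants` is hypothetical, and MINCOV is treated here as a minimum read count.

```python
from typing import Dict, Set


def flag_contaminants(counts: Dict[str, Dict[str, float]], own_library: str,
                      fold: float = 2.0, min_cov: float = 2.0) -> Set[str]:
    """Return the IDs of transcripts to discard from own_library's assembly.

    `counts` maps transcript -> {library: read count}. The rule below is an
    assumption made for illustration.
    """
    discarded: Set[str] = set()
    for transcript, per_library in counts.items():
        own = per_library.get(own_library, 0.0)
        # Highest count observed for this transcript in any other library
        best_other = max((c for lib, c in per_library.items() if lib != own_library),
                         default=0.0)
        # Poorly supported by its own reads, or much more abundant elsewhere
        if own < min_cov or best_other > fold * own:
            discarded.add(transcript)
    return discarded
```

Raising `fold` makes the filter more permissive, because a transcript is only discarded when the foreign signal clearly dominates its own.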
phyloStrata.py computes various statistics over a collection of alignments and applies several filters.
Dependencies: the ete3 library, NumPy and Biopython, all of which can be installed with conda.
Briefly, it checks the monophyly of each clade listed in the taxonomic list, computes the mutational saturation of each alignment, and excludes taxa whose divergence from the root exceeds a threshold. Usage is simple:
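The three checks can be sketched without ete3 as follows. Assumptions: the tree is a nested tuple of leaf names; monophyly is tested in the unrooted sense, so a group passes if it or its complement forms a clade; saturation is measured as the least-squares slope of uncorrected p-distances against patristic distances, which is one common index, and phyloStrata.py may compute it differently. Root-to-tip distances are taken as given (with ete3 they could come from `tree.get_distance(leaf)`). All function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple, Union

# A tree as nested tuples of leaf names, e.g. (("a", "b"), ("c", "d"))
Tree = Union[str, Tuple["Tree", ...]]


def collect_clades(tree: Tree, out: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Append the leaf set of every node to `out`; return this node's leaves."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def is_monophyletic(tree: Tree, taxa: Set[str]) -> bool:
    """Unrooted check; taxa missing from the tree are ignored."""
    clades: List[FrozenSet[str]] = []
    everything = collect_clades(tree, clades)
    group = frozenset(taxa) & everything
    rest = everything - group
    # A split of an unrooted tree appears as a clade or as the complement of one
    return any(c == group or c == rest for c in clades)


def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected p-distance over sites where neither sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def saturation_slope(p_dists: Sequence[float], patristic: Sequence[float]) -> float:
    """Least-squares slope of p-distances regressed on patristic distances."""
    n = len(p_dists)
    mean_x = sum(patristic) / n
    mean_y = sum(p_dists) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(patristic, p_dists))
    var = sum((x - mean_x) ** 2 for x in patristic)
    return cov / var


def taxa_to_exclude(root_to_tip: Dict[str, float], threshold: float) -> Set[str]:
    """Taxa whose divergence from the root exceeds the threshold."""
    return {taxon for taxon, dist in root_to_tip.items() if dist > threshold}
```

A slope well below 1 means observed differences grow more slowly than inferred branch lengths, which is the usual signature of saturation.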
Usage: phyloStrata.py <taxlist> <suffix> <fasta files...>
taxlist: a list of the taxa present in the alignments, each paired with a generic clade name and separated from it by a tab. The monophyly of the taxa assigned to each clade will be checked.
suffix: used to name the output file.
Tree files must be named taxon.xx.xx, and the corresponding FASTA files taxon.al.hc.tr.fa.
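The input conventions above could be handled roughly as follows. Assumptions: each taxlist line reads `taxon<TAB>clade` (the column order is inferred from the description), and the FASTA file is found by replacing everything after the first dot of the tree file name. Both helper names are hypothetical.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, Set


def read_taxlist(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Group taxa by clade from 'taxon<TAB>clade' lines (column order assumed)."""
    clades: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        taxon, clade = line.split("\t")[:2]
        clades[clade].add(taxon)
    return dict(clades)


def fasta_for_tree(tree_path: str) -> str:
    """Map a tree file named <taxon>.xx.xx to <taxon>.al.hc.tr.fa in the same directory."""
    directory, name = os.path.split(tree_path)
    prefix = name.split(".", 1)[0]
    return os.path.join(directory, prefix + ".al.hc.tr.fa")
```

Each clade set returned by `read_taxlist` is what the monophyly check would be run on.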
concatenate-ext.py builds a concatenated alignment from the filtered alignments and the statistics file generated by phyloStrata.py.
Usage: concatenate-ext.py <taxlist> <suffix> <fasta files...>
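The core of a concatenation step can be sketched as below. This is a generic supermatrix builder, not the script's actual code: it ignores the statistics file, pads taxa absent from an alignment with gaps, and returns 1-based partition coordinates. The function name `concatenate` and the input layout `{alignment_name: {taxon: sequence}}` are assumptions.

```python
from typing import Dict, List, Tuple


def concatenate(alignments: Dict[str, Dict[str, str]]
                ) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Build a supermatrix from {alignment_name: {taxon: aligned_sequence}}.

    Returns the concatenated sequences and (name, start, end) partitions
    with 1-based inclusive coordinates.
    """
    taxa = sorted({t for aln in alignments.values() for t in aln})
    pieces: Dict[str, List[str]] = {t: [] for t in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for name in sorted(alignments):
        aln = alignments[name]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: expected sequences of one common length")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa missing from this alignment are filled with gaps
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((name, start, start + length - 1))
        start += length
    return {t: "".join(p) for t, p in pieces.items()}, partitions
```

The partition list can then be written out in whatever format the downstream tree-building program expects.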