
Generating a consensus

Ryan Wick edited this page Jan 5, 2021 · 20 revisions

Requirements

Before running this step, you'll need to have completed all of the previous steps. If each of your cluster directories contains 2_all_seqs.fasta, 3_msa.fasta and 4_reads.fastq files, you should be ready!
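As a quick sanity check before starting, something like the following sketch could confirm that a cluster directory contains the three files named above. The function name check_cluster_ready is hypothetical and not part of Trycycler; it only tests that the files exist, not that their contents are valid.

```shell
# Hypothetical helper (not part of Trycycler): check that a cluster
# directory holds the three input files this step expects.
check_cluster_ready() {
    cluster_dir=$1
    for f in 2_all_seqs.fasta 3_msa.fasta 4_reads.fastq; do
        if [ ! -f "$cluster_dir/$f" ]; then
            # Report the first missing file and signal failure.
            echo "missing: $cluster_dir/$f" >&2
            return 1
        fi
    done
    echo "ready: $cluster_dir"
}

# Example call (uncomment to run):
# check_cluster_ready trycycler/cluster_001
```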

Concept

The final step of Trycycler is to generate a consensus contig sequence for each cluster. It does this by converting the MSA into a graph form, containing "same" chunks (where all the input sequences agree) and "different" chunks (where there are two or more options). It then chooses the most popular option for each different chunk (see How variants are chosen for the consensus sequence for more details). When there is a tie between options, Trycycler aligns the reads to the alternative sequences and chooses the option with the best read alignment scores.
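To make the "most popular option" idea concrete, here is a toy sketch of a majority vote over the options for a single "different" chunk. It is not Trycycler's code: the function name pick_majority is hypothetical, the options are assumed to be non-empty strings passed as arguments, and a tie is only reported rather than resolved (Trycycler resolves ties with read alignments, as described above).

```shell
# Toy illustration only (not Trycycler's implementation).
# Prints the most common argument, or TIE if the top two counts are equal.
pick_majority() {
    printf '%s\n' "$@" \
        | sort \
        | uniq -c \
        | sort -k1,1nr \
        | awk '
            NR == 1 { best = $1; seq = $2; next }  # highest count comes first
            NR == 2 && $1 == best { tie = 1 }      # runner-up has the same count
            END { if (tie) print "TIE"; else print seq }'
}

# Example calls (uncomment to run):
# pick_majority ACGT ACGT ACCT
# pick_majority ACGT ACCT
```

In the first example call two of the three input sequences agree on ACGT, so that option wins; in the second the options are split evenly, which is the situation where read alignment would be needed.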

Running Trycycler consensus

The Trycycler consensus command must be run separately for each of your good clusters.

Assuming your Trycycler output directory is named trycycler and your good clusters are numbers 1, 2 and 3, these are the commands you would run:

trycycler consensus --cluster_dir trycycler/cluster_001
trycycler consensus --cluster_dir trycycler/cluster_002
trycycler consensus --cluster_dir trycycler/cluster_003
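If you have many good clusters, the same commands can be wrapped in a loop. The sketch below assumes you pass in only the cluster directories you have judged to be good; the function name run_consensus is hypothetical and not part of Trycycler.

```shell
# Hypothetical wrapper: run trycycler consensus on each listed cluster
# directory in turn, stopping at the first one that fails.
run_consensus() {
    for cluster_dir in "$@"; do
        trycycler consensus --cluster_dir "$cluster_dir" || return 1
    done
}

# Example call (uncomment to run):
# run_consensus trycycler/cluster_001 trycycler/cluster_002 trycycler/cluster_003
```

Listing the directories explicitly, rather than globbing trycycler/cluster_*, avoids accidentally running the command on a cluster you decided to exclude.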

Settings

  • --linear: use this option if your input contigs are not circular. It will disable the circularisation steps when aligning reads and choosing variants.
  • --min_aligned_len: reads with fewer than this many bases aligned (default = 1000) will be ignored.
  • --min_read_cov: reads with less than this percentage of their length aligned (default = 90.0) will be ignored.
  • --threads: the number of threads Trycycler will use for read alignment. This only affects speed, so you'll probably want to use as many threads as you have available.
  • --verbose: use this flag to display extra output. For every read-assessed variant, this will show the alternative sequences and their read alignment scores.
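One way to follow the --threads advice is to ask the system how many processors are online and pass that number along. This is a sketch under assumptions: the helper name available_threads is hypothetical, and getconf _NPROCESSORS_ONLN is a common extension (available on Linux with glibc and on macOS) rather than something guaranteed everywhere, so the function falls back to 1.

```shell
# Hypothetical helper: print the number of online processors,
# falling back to 1 if the system cannot report it.
available_threads() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Example call (uncomment to run):
# trycycler consensus --cluster_dir trycycler/cluster_001 --threads "$(available_threads)"
```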

Output

When finished, you should have a 7_final_consensus.fasta file in each of your cluster directories. If you have multiple clusters, you can combine their consensus sequences into a single FASTA file like this:

cat trycycler/cluster_*/7_final_consensus.fasta > trycycler/consensus.fasta
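To check the combined file, you could count its FASTA records and compare that with what you expect. The helper name count_fasta_records below is hypothetical; it simply counts lines beginning with ">". If each cluster contributes one consensus contig, the count should match the number of clusters you combined.

```shell
# Hypothetical helper: count the sequences in a FASTA file by
# counting its header lines (those starting with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# Example call (uncomment to run):
# count_fasta_records trycycler/consensus.fasta
```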

This is the end of Trycycler's pipeline! However, you might want to polish your consensus sequences to further improve their accuracy.
