How circularisation repair works
Most bacterial replicons are circular, which is relevant for Trycycler in two ways: getting a clean circularisation (no gap or overlap) and getting a consistent starting point. Both are handled as part of the Trycycler reconcile command.
Trycycler attempts to circularise each contig sequence using each of the other sequences as a reference. Specifically, it aligns the start and end of the contig to the other sequences and uses those alignments to determine whether the contig is already circular, needs sequence added or needs sequence removed.
In the following examples, sequence A is the one we are trying to circularise and sequence B is the other reference sequence.
Ideally, A's end is immediately followed by A's start in B. If this is the case, that means A is already circular and there's nothing more to do.
It may be that A's end and start are both found in B, but with a gap in between. This implies that A is missing some sequence in its circularisation. Trycycler will fill in this gap using the sequence between the hits in B.
If A's end and start overlap in B, that implies that A has too much sequence – i.e. some sequence is duplicated at its start/end. In this case, Trycycler will trim A's end to give it a clean circularisation.
If the gap between A's end and start in B is too large, that implies that A is missing a lot of sequence. Trycycler will fail to circularise A in this case. It probably makes sense to exclude A and try running Trycycler reconcile again.
Conversely, A's start might come well before A's end in B. This implies that A has quite a lot of overlap. Trycycler may be able to resolve this by trimming the start/end of A, but it might not. If this happens, you can try to manually trim A and then run Trycycler reconcile again. Or else you can simply exclude A.
If A's start and end are found in multiple places in B, this will also cause Trycycler to fail circularisation. This suggests that A begins/ends in a repeat sequence, which is not necessarily a problem with the assembly but does make circularisation difficult. In such cases, simply excluding A is probably in order.
If A's start or end is not found in B, that will also cause a failure to circularise. This suggests that either A contains spurious sequence or B is missing sequence. When this causes a circularisation failure, it's best to exclude A.
If A and B have the same start/end, then there is no information for fixing A's circularisation. This sometimes happens with two input assemblies from the same assembler. It's usually not a problem, as A's circularisation can be repaired using one of the other sequences instead.
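The cases above can be summarised as a small decision function. This is only an illustrative sketch, not Trycycler's actual code: the function name, the outcome strings and the `MAX_ADD`/`MAX_TRIM` thresholds are all hypothetical, the alignment hits are assumed to be supplied as plain positions in B, and B is treated as a linear string for simplicity.

```python
from typing import List, Optional, Tuple

MAX_ADD = 1000   # hypothetical limit on how much sequence may be added
MAX_TRIM = 1000  # hypothetical limit on how much sequence may be trimmed


def repair_circularisation(a: str, b: str, end_hits: List[int],
                           start_hits: List[int]) -> Tuple[str, Optional[str]]:
    """Return (outcome, repaired sequence or None).

    end_hits:   positions in B immediately after where A's end aligns.
    start_hits: positions in B where A's start begins to align.
    """
    if not end_hits or not start_hits:
        return "fail: start or end not found", None
    if len(end_hits) > 1 or len(start_hits) > 1:
        return "fail: multiple hits", None

    end_pos, start_pos = end_hits[0], start_hits[0]
    if end_pos == len(b) and start_pos == 0:
        # A and B share the same start/end, so B cannot help here.
        return "no information: same start/end", None

    gap = start_pos - end_pos  # positive = gap, negative = overlap
    if gap == 0:
        return "already circular", a
    if 0 < gap <= MAX_ADD:
        # Fill the gap with the sequence between the two hits in B.
        return f"added {gap} bp", a + b[end_pos:start_pos]
    if -MAX_TRIM <= gap < 0:
        # Remove the duplicated sequence from A's end.
        return f"trimmed {-gap} bp", a[:gap]
    return "fail: gap or overlap too large", None
```

For example, with A's end hit finishing at position 5 in B and A's start hit beginning at position 7, the two bases `b[5:7]` would be appended to A.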
Trycycler will conduct all pairwise circularisations. For example, if you have four input assemblies (A, B, C and D), Trycycler will attempt to circularise sequence A using sequences B, C and D. It will attempt to circularise sequence B using sequences A, C and D, and so on.
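To make "all pairwise circularisations" concrete, here is a minimal sketch that lists every (sequence to circularise, reference) pair. The function name is hypothetical and this is not taken from Trycycler's source; it only illustrates the ordering described above.

```python
from itertools import permutations
from typing import List, Tuple


def circularisation_pairs(names: List[str]) -> List[Tuple[str, str]]:
    """Each pair is (sequence being circularised, reference sequence)."""
    return list(permutations(names, 2))


pairs = circularisation_pairs(["A", "B", "C", "D"])
# Four assemblies give 4 x 3 = 12 ordered pairs, starting with
# ("A", "B"), ("A", "C"), ("A", "D"), ("B", "A"), ...
```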
This means there can be multiple ways to circularise a sequence. For example, A might be circularised in three ways: 20 bp added from B, 21 bp added from C and 19 bp added from D. To choose which is the best option, Trycycler aligns the reads to the circularisation junction (this is why reads must be given as a command line parameter to Trycycler reconcile). Whichever circularisation option results in the highest total alignment score is chosen as the final one.
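The selection step might be sketched as follows. This is a toy illustration under several assumptions: `junction`, `toy_score` and `choose_circularisation` are hypothetical names, the `flank` size is arbitrary, and the scoring function is a crude stand-in (exact substring match) for a real read aligner. Only the final rule, taking the option with the highest total score, comes from the description above.

```python
from typing import Callable, Dict, List, Tuple


def junction(seq: str, flank: int) -> str:
    """Sequence spanning the circularisation point: end of seq, then start."""
    return seq[-flank:] + seq[:flank]


def toy_score(read: str, region: str) -> int:
    # Stand-in for a real alignment score: an exact match scores the
    # read length, anything else scores zero.
    return len(read) if read in region else 0


def choose_circularisation(candidates: Dict[str, str], reads: List[str],
                           flank: int = 5,
                           score: Callable[[str, str], int] = toy_score
                           ) -> Tuple[str, Dict[str, int]]:
    """Return the candidate with the highest total score, plus all totals."""
    totals = {
        name: sum(score(read, junction(seq, flank)) for read in reads)
        for name, seq in candidates.items()
    }
    best = max(totals, key=totals.get)
    return best, totals


candidates = {
    "from_B": "GATTACAGGCTC",
    "from_C": "GATTACAGGCTCA",  # one extra base at the junction
    "from_D": "GATTACAGGCT",    # one base missing at the junction
}
reads = ["CTCGAT", "GCTCGATT"]  # toy reads spanning the junction
best, totals = choose_circularisation(candidates, reads)
```

In this toy case only the `from_B` junction is supported by the reads, so it is the option chosen.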
A circular sequence can potentially start at any point on either strand and still be a valid assembly. However, when reconciling multiple alternative contigs, it is necessary to make all sequences consistent with each other – i.e. start at the same point and on the same strand.
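As an illustration of what "same point and same strand" means in practice, the sketch below rotates a circular contig (and flips it to the other strand if needed) so that it begins with a given starting sequence. The function names are hypothetical and exact string matching is assumed, which a real implementation would probably replace with alignment.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def rotate_to_start(seq: str, start_seq: str) -> str:
    """Rotate circular seq so it begins with start_seq on the forward strand."""
    if not start_seq:
        raise ValueError("starting sequence must not be empty")
    for strand in (seq, reverse_complement(seq)):
        # Extend the strand so matches spanning the end/start are found.
        extended = strand + strand[:len(start_seq) - 1]
        pos = extended.find(start_seq)
        if pos != -1:
            return strand[pos:] + strand[:pos]
    raise ValueError("starting sequence not found on either strand")


# Two representations of the same circular sequence:
contig_1 = "ACAGGCTCGATT"                      # different starting point
contig_2 = reverse_complement("ACAGGCTCGATT")  # opposite strand
# Both become "GATTACAGGCTC" after rotate_to_start(..., "GATTAC").
```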
By convention, Trycycler will try to start the contigs at a replication initiator protein gene sequence like dnaA. For more detail, see Starting sequences for circular replicons. To be a suitable starting point, the starting sequence must be in each of the contigs and only occur once in each contig.
If a replication initiator protein gene sequence can't be found, Trycycler will randomly select a subsequence which is present in each of the contigs only once and use that as the starting sequence.
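The fallback could look roughly like this. It is a sketch under stated assumptions rather than Trycycler's implementation: the function names, the subsequence length `k`, the `max_tries` limit and the use of exact matching are all hypothetical, and occurrences are counted on both strands of each circular contig.

```python
import random
from typing import List, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_circular(seq: str, sub: str) -> int:
    """Count occurrences of sub in circular seq (forward strand only)."""
    extended = seq + seq[:len(sub) - 1]
    return sum(extended.startswith(sub, i) for i in range(len(seq)))


def occurrences(seq: str, sub: str) -> int:
    """Count occurrences of sub on both strands of circular seq."""
    return count_circular(seq, sub) + count_circular(reverse_complement(seq), sub)


def random_unique_start(contigs: List[str], k: int, rng: random.Random,
                        max_tries: int = 1000) -> Optional[str]:
    """Pick a random k-mer from the first contig that occurs exactly once
    in every contig; return None if none is found within max_tries."""
    first = contigs[0]
    for _ in range(max_tries):
        i = rng.randrange(len(first))
        candidate = (first + first)[i:i + k]  # allow wrap-around
        if all(occurrences(c, candidate) == 1 for c in contigs):
            return candidate
    return None
```

A subsequence chosen this way satisfies the same requirement as a replication initiator gene: it is in each of the contigs and occurs only once in each.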