Update quick-start.rst
skoren committed Mar 2, 2016
1 parent 2ccebfc commit e53b0ed
Showing 1 changed file with 28 additions and 15 deletions.
43 changes: 28 additions & 15 deletions documentation/source/quick-start.rst
@@ -65,19 +65,32 @@ Output files are described in the next section.
Find the Output
~~~~~~~~~~~~~~~~~~~~~~

Outputs from the assembly tasks are in:

- ecoli*/ecoli.layout
- ecoli*/ecoli.gfa
- ecoli*/ecoli.contigs.fasta
- ecoli*/ecoli.bubbles.fasta
- ecoli*/ecoli.unassembled.fasta

The canu progress chatter records statistics such as an input read histogram, corrected read histogram, and overlap types. The layout provides information on where each read ended up in the final assembly. The `GFA <http://lh3.github.io/2014/07/19/a-proposal-of-the-grapical-fragment-assembly-format/>`_ is the assembly graph generated by Canu. The fasta output is split into three types:

- contigs: everything which could be assembled and is part of the primary assembly. This includes both unique and repetitive elements
- bubbles: alternate paths in the graph which could not be merged into the primary assembly.
- unassembled: reads which could not be incorporated into the primary or bubble assemblies.
The canu progress chatter records statistics such as an input read histogram, corrected read histogram, and overlap types. Outputs from the assembly tasks are in:

ecoli*/ecoli.layout
   The layout provides information on where each read ended up in the final assembly, including its contig and position. It also includes the consensus sequence for each contig.

ecoli*/ecoli.gfa
   The `GFA <http://lh3.github.io/2014/07/19/a-proposal-of-the-grapical-fragment-assembly-format/>`_ is the assembly graph generated by Canu. Currently this includes the contigs, associated bubbles, and any overlaps which were not used by the assembly.

The fasta output is split into three types:

ecoli*/asm.contigs.fasta
   Everything which could be assembled and is part of the primary assembly. This includes both unique and repetitive elements. Each contig has several flags included on the fasta def line. These include:

   =============== ====== ====================================================
   tag             values definition
   =============== ====== ====================================================
   len             int    length in bp
   reads           int    number of sequences comprising the contig
   suggestRepeat   yes/no whether the contig is a repetitive element or unique
   suggestCircular yes/no currently unused
   =============== ====== ====================================================

ecoli*/asm.bubbles.fasta
   Alternate paths in the graph which could not be merged into the primary assembly.

ecoli*/asm.unassembled.fasta
   Reads which could not be incorporated into the primary or bubble assemblies.
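
The flags listed for the contigs fasta can be read programmatically. The following is a minimal sketch, not part of Canu: it assumes the def line is the sequence name followed by whitespace-separated ``key=value`` pairs, which this document does not confirm. The helper name ``parse_defline`` and the example def line (name and values) are hypothetical.

```python
def parse_defline(defline):
    """Split a fasta def line into (name, flags).

    Assumption: the line looks like '>name key=value key=value ...'.
    Tokens without '=' are ignored.
    """
    fields = defline.lstrip(">").split()
    name = fields[0]
    flags = {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if sep:
            flags[key] = value

    # Convert the documented tags to the types given in the table.
    for key in ("len", "reads"):
        if key in flags:
            flags[key] = int(flags[key])
    for key in ("suggestRepeat", "suggestCircular"):
        if key in flags:
            flags[key] = flags[key] == "yes"
    return name, flags


# Hypothetical def line, made up for illustration only.
example = ">tig00000001 len=4641652 reads=1000 suggestRepeat=no suggestCircular=no"
name, flags = parse_defline(example)
print(name, flags)
```

Keeping unknown tags as plain strings means the sketch keeps working if further flags are added to the def line later.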


Correct, Trim and Assemble, Manually
@@ -185,6 +198,7 @@ After the run completes, we can check the assembly statistics::
tgStoreDump -sizes -s 12100000 -T yeast/unitigging/asm.tigStore 2 -G yeast/unitigging/asm.gkpStore

::

lenSuggestRepeat sum 1135952 (genomeSize 12100000)
lenSuggestRepeat num 159
lenSuggestRepeat ave 7144
@@ -232,8 +246,7 @@ Changes
Known Issues
-------------------

- Bogart (unitigger) has false positives in repeat breaking. Currently, the workaround is to increase the minimum overlap size to avoid detecting false repeats ca
used by short overlaps. Canu will automatically do this for large (>100MB) genomes.
- Bogart (unitigger) has false positives in repeat breaking. Currently, the workaround is to increase the minimum overlap size to avoid detecting false repeats caused by short overlaps. Canu will automatically do this for large (>100MB) genomes.
- LSF support has had limited testing.
- Large memory usage during unitig consensus calling on unitigs over 100MB in size (a 140Mb contig uses approximately 75GB).
- Distributed file systems (such as GPFS) cause issues with memory mapped files, slowing down parts of Canu, including meryl (0-mercounts) and falcon-sense (2-correction).
