Commit

Merge remote-tracking branch 'upstream/master'
sgblanch committed Aug 28, 2017
2 parents 6bae842 + e234bb3 commit 37a2b48
Showing 42 changed files with 3,066 additions and 531 deletions.
1 change: 1 addition & 0 deletions addCopyrights-BuildData.pl
@@ -34,6 +34,7 @@
$stoppingCommits{"1ef335952342ef06ad1651a888f09c312f54dab8"} = 1; # 18 MAY 2016
$stoppingCommits{"bbbdcd063560e5f86006ee6b8b96d2d7b80bb750"} = 1; # 21 NOV 2016
$stoppingCommits{"64459fe33f97f6d23fe036ba1395743d0cdd03e4"} = 1; # 17 APR 2017
+$stoppingCommits{"9e9bd674b705f89817b07ff30067210c2d180f42"} = 1; # 14 AUG 2017

open(F, "< logs") or die "Failed to open 'logs': $!\n";

1,029 changes: 1,029 additions & 0 deletions addCopyrights.dat

Large diffs are not rendered by default.

30 changes: 26 additions & 4 deletions addCopyrights.pl
@@ -2,13 +2,36 @@

use strict;

-my @dateStrings = ( "???", "JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT", "NOV", "DEC" );
+# To run this:
+#
+# Update the copyright data file by appending info on new commits:
+# perl addCopyrights-BuildData.pl >> addCopyrights.dat
+#
+# Update copyright on each file, writing to new files:
+# perl addCopyrights.pl -test
+#
+# Update copyright on specific files by listing them at the end:
+# perl addCopyrights.pl -test src/bogart/bogart.C
+#
+# All files get rewritten, even if there are no changes. If not running in 'test' mode
+# you can use git to see what changes, and to verify they look sane.
+#
+# Once source files are updated, update addCopyrights-BuildData.pl with the last
+# commit hash and commit those changes (both the dat and pl).
+#

#
# If set, rename original files to name.ORIG, rewrite files with updated copyright text.
# If not, create new name.MODIFIED files with updated copyright text.
#

my $doForReal = 1;

+if ($ARGV[0] eq "-test") {
+shift @ARGV;
+$doForReal = 0;
+}

#
# The change data 'addCopyrights.dat' contains lines of two types:
#
@@ -26,7 +49,6 @@
# of the original name need to be updated to the new name.
#


sub toList (@) {
my @all = sort { $a <=> $b } @_;
my $ret;
@@ -76,6 +98,8 @@ ($@)
my @AC = @_;
my @AClist;

+my @dateStrings = ( "???", "JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEP", "OCT", "NOV", "DEC" );

my %dates;

foreach my $ac (@AC) {
@@ -405,8 +429,6 @@ ($@)
if ($doForReal) {
my $perms = `stat -f %p $file`; chomp $perms; $perms = substr($perms, -3);

-#rename "$file", "$file.ORIG";
-
open(F, "> $file") or die "Failed to open '$file' for writing: $!\n";
print F @lines;
close(F);
4 changes: 2 additions & 2 deletions documentation/source/conf.py
@@ -55,9 +55,9 @@
# built documents.
#
# The short X.Y version.
-version = '1.5'
+version = '1.6'
# The full version, including alpha/beta/rc tags.
-release = '1.5'
+release = '1.6'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
30 changes: 20 additions & 10 deletions documentation/source/faq.rst
@@ -54,29 +54,39 @@ What parameters should I use for my reads?

**Nanopore R7 1D** and **Low Identity Reads**
With R7 1D sequencing data, and generally for any raw reads lower than 80% identity, five to
-ten rounds of error correction are helpful. To run just the correction phase, use options
-``-correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high``. Use the output of
-the previous run (in ``asm.correctedReads.fasta.gz``) as input to the next round.
+ten rounds of error correction are helpful::

-Once corrected, assemble with ``-nanopore-corrected <your data> correctedErrorRate=0.3 utgGraphDeviation=50``
+canu -p r1 -d r1 -correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high -nanopore-raw your_reads.fasta
+canu -p r2 -d r2 -correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high -nanopore-raw r1/r1.correctedReads.fasta.gz
+canu -p r3 -d r3 -correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high -nanopore-raw r2/r2.correctedReads.fasta.gz
+canu -p r4 -d r4 -correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high -nanopore-raw r3/r3.correctedReads.fasta.gz
+canu -p r5 -d r5 -correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high -nanopore-raw r4/r4.correctedReads.fasta.gz
+
+Then assemble the output of the last round, allowing up to 30% difference in overlaps::
+
+canu -p asm -d asm correctedErrorRate=0.3 utgGraphDeviation=50 -nanopore-corrected r5/r5.correctedReads.fasta.gz

**Nanopore R7 2D** and **Nanopore R9 1D**
-Increase the maximum allowed difference in overlaps from the default of 4.5% to 7.5% with
-``correctedErrorRate=0.075``
+Increase the maximum allowed difference in overlaps from the default of 14.4% to 22.5% with
+``correctedErrorRate=0.225``

**Nanopore R9 2D** and **PacBio P6**
-Slightly decrease the maximum allowed difference in overlaps from the default of 4.5% to 4.0%
-with ``correctedErrorRate=0.040``
+Slightly decrease the maximum allowed difference in overlaps from the default of 14.4% to 12.0%
+with ``correctedErrorRate=0.120``

**Early PacBio Sequel**
Based on exactly one publicly released *A. thaliana* `dataset
<http://www.pacb.com/blog/sequel-system-data-release-arabidopsis-dataset-genome-assembly/>`_,
slightly decrease the maximum allowed difference from the default of 4.5% to 4.0% with
``correctedErrorRate=0.040 corMhapSensitivity=normal``. For recent Sequel data, the defaults
-are appropriate.
+seem to be appropriate.

**Nanopore R9 large genomes**
-Due to some systematic errors, the identity estimate used by Canu for correction can be an over-estimate of true error, inflating runtime. For recent large genomes (>1gbp) we've used ``'corMhapOptions=--threshold 0.8 --num-hashes 512 --ordered-sketch-size 1000 --ordered-kmer-size 14'``. This can be used with 30x or more of coverage, below that the defaults are OK.
+Due to some systematic errors, the identity estimate used by Canu for correction can be an
+over-estimate of true error, inflating runtime. For recent large genomes (>1gbp) with more
+than 30x coverage, we've used ``'corMhapOptions=--threshold 0.8 --num-hashes
+512 --ordered-sketch-size 1000 --ordered-kmer-size 14'``. This is not needed for below 30x
+coverage.
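
Editor's note on the chained correction rounds in the FAQ change: each round reads the corrected output of the previous one. The following is only a sketch of that naming pattern, written in C++ to match the rest of the repository. The function name `correctionCommands` is hypothetical and not part of Canu; the options and file names are copied from the FAQ text, and the round count is a parameter of this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of Canu): builds the command line for each
// correction round, feeding round N's corrected reads into round N+1.
static std::vector<std::string>
correctionCommands(const std::string &rawReads, int rounds) {
  // Options copied verbatim from the FAQ text.
  const std::string opts =
    "-correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high";

  std::vector<std::string> cmds;
  std::string input = rawReads;

  for (int r = 1; r <= rounds; r++) {
    std::string name = "r" + std::to_string(r);

    cmds.push_back("canu -p " + name + " -d " + name + " " + opts +
                   " -nanopore-raw " + input);

    // The next round reads this round's output, e.g. r1/r1.correctedReads.fasta.gz
    input = name + "/" + name + ".correctedReads.fasta.gz";
  }

  return cmds;
}
```

Calling `correctionCommands("your_reads.fasta", 5)` yields the five command lines listed in the FAQ; the final assembly command is left out of the sketch because it uses different options.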


My assembly continuity is not good, how can I improve it?
40 changes: 20 additions & 20 deletions src/AS_UTL/AS_UTL_reverseComplement.C
@@ -37,10 +37,10 @@
static
char
inv[256] = {
-0, 0, 0, 0, 0, 0, 0, 0, // 0x00 -
-0, 0, 0, 0, 0, 0, 0, 0, // 0x08 -
-0, 0, 0, 0, 0, 0, 0, 0, // 0x10 -
-0, 0, 0, 0, 0, 0, 0, 0, // 0x18 -
+0, 0, 0, 0, 0, 0, 0, 0, // 0x00 -
+0, 0, 0, 0, 0, 0, 0, 0, // 0x08 -
+0, 0, 0, 0, 0, 0, 0, 0, // 0x10 -
+0, 0, 0, 0, 0, 0, 0, 0, // 0x18 -
0, 0, 0, 0, 0, 0, 0, 0, // 0x20 - !"#$%&'
0, 0, 0, 0, 0, 0, 0, 0, // 0x28 - ()*+,-./
0, 0, 0, 0, 0, 0, 0, 0, // 0x30 - 01234567
@@ -53,22 +53,22 @@ inv[256] = {
0, 0, 0, 0, 0, 0, 0, 0, // 0x68 - hijklmno
0, 0, 0, 0,'a', 0, 0, 0, // 0x70 - pqrstuvw
0, 0, 0, 0, 0, 0, 0, 0, // 0x78 - xyz{|}~
0, 0, 0, 0, 0, 0, 0, 0, // 0x80 -
0, 0, 0, 0, 0, 0, 0, 0, // 0x88 -
0, 0, 0, 0, 0, 0, 0, 0, // 0x90 -
0, 0, 0, 0, 0, 0, 0, 0, // 0x98 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xa0 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xa8 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xb0 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xb8 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xc0 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xc8 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xd0 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xd8 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xe0 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xe8 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xf0 -
0, 0, 0, 0, 0, 0, 0, 0 // 0xf8 -
0, 0, 0, 0, 0, 0, 0, 0, // 0x80 -
0, 0, 0, 0, 0, 0, 0, 0, // 0x88 -
0, 0, 0, 0, 0, 0, 0, 0, // 0x90 -
0, 0, 0, 0, 0, 0, 0, 0, // 0x98 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xa0 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xa8 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xb0 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xb8 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xc0 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xc8 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xd0 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xd8 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xe0 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xe8 -
0, 0, 0, 0, 0, 0, 0, 0, // 0xf0 -
0, 0, 0, 0, 0, 0, 0, 0 // 0xf8 -
};
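
Editor's note: the `inv[256]` table maps each byte to its complement, with 0 for bytes that have no complement (the visible row maps `t` to `'a'`). A minimal sketch of how such a table can drive a reverse-complement, assuming only the four bases in both cases; the names `makeComplementTable` and `reverseComplement` are illustrative, and leaving unmapped bytes unchanged is an assumption of this sketch rather than the documented behaviour of the Canu routine.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>

// Build a 256-entry complement table. Bytes without a complement stay 0,
// mirroring the layout of the table in the diff.
static std::array<char, 256>
makeComplementTable(void) {
  std::array<char, 256> inv{};          // value-initialised: all entries 0

  const char *from = "ACGTacgt";
  const char *to   = "TGCAtgca";

  for (int i = 0; from[i] != 0; i++)
    inv[static_cast<unsigned char>(from[i])] = to[i];

  return inv;
}

// Reverse the sequence, then complement each base via the table.
// Assumption: bytes with no table entry (e.g. 'n') are left as they are.
static void
reverseComplement(std::string &seq) {
  static const std::array<char, 256> inv = makeComplementTable();

  std::reverse(seq.begin(), seq.end());

  for (char &c : seq) {
    char r = inv[static_cast<unsigned char>(c)];
    if (r != 0)
      c = r;
  }
}
```

Indexing through `unsigned char` matters here: a plain `char` may be signed, and a negative index would read outside the table.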


92 changes: 76 additions & 16 deletions src/AS_UTL/timeAndSize.C
@@ -19,6 +19,10 @@
* are Copyright 2014 Battelle National Biodefense Institute, and
* are subject to the BSD 3-Clause License
*
+* Brian P. Walenz beginning on 2017-AUG-10
+* are a 'United States Government Work', and
+* are released in the public domain
+*
* File 'README.licenses' in the root directory of this distribution contains
* full conditions and disclaimers for each license.
*/
@@ -31,8 +35,6 @@





double
getTime(void) {
struct timeval tp;
@@ -41,16 +43,78 @@ getTime(void) {
}


-uint64
-getProcessSizeCurrent(void) {
-struct rusage ru;
-uint64 sz = 0;
+
+static
+bool
+getrusage(struct rusage &ru) {

errno = 0;
+
if (getrusage(RUSAGE_SELF, &ru) == -1) {
-fprintf(stderr, "getProcessSizeCurrent()-- getrusage(RUSAGE_SELF, ...) failed: %s\n",
+fprintf(stderr, "getrusage(RUSAGE_SELF, ...) failed: %s\n",
strerror(errno));
+return(false);
+}
+
+return(true);
+}
+
+
+
+static
+bool
+getrlimit(struct rlimit &rl) {
+
+errno = 0;
+
+if (getrlimit(RLIMIT_DATA, &rl) == -1) {
+fprintf(stderr, "getrlimit(RLIMIT_DATA, ...) failed: %s\n",
+strerror(errno));
-} else {
+return(false);
+}
+
+return(true);
+}
+
+
+
+double
+getCPUTime(void) {
+struct rusage ru;
+double tm = 0;
+
+if (getrusage(ru) == true)
+tm = ((ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1000000.0) +
+(ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1000000.0));
+
+return(tm);
+}
+
+
+
+double
+getProcessTime(void) {
+struct timeval tp;
+static double st = 0.0;
+double tm = 0;
+
+if (gettimeofday(&tp, NULL) == 0)
+tm = tp.tv_sec + tp.tv_usec / 1000000.0;
+
+if (st == 0.0)
+st = tm;
+
+return(tm - st);
+}
+
+
+
+uint64
+getProcessSize(void) {
+struct rusage ru;
+uint64 sz = 0;
+
+if (getrusage(ru) == true) {
sz = ru.ru_maxrss;
sz *= 1024;
}
@@ -59,18 +123,14 @@ getProcessSizeCurrent(void) {
}



uint64
getProcessSizeLimit(void) {
-struct rlimit rlp;
+struct rlimit rl;
uint64 sz = ~uint64ZERO;

-errno = 0;
-if (getrlimit(RLIMIT_DATA, &rlp) == -1) {
-fprintf(stderr, "getProcessSizeLimit()-- getrlimit(RLIMIT_DATA, ...) failed: %s\n",
-strerror(errno));
-} else {
-sz = rlp.rlim_cur;
-}
+if (getrlimit(rl) == true)
+sz = rl.rlim_cur;

return(sz);
}
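
Editor's note: this file now routes every `getrusage()` call through one error-reporting wrapper and derives CPU time from the user and system `timeval` fields. A standalone sketch of the same idea follows, assuming a POSIX system. The names `queryUsage`, `timevalToSeconds` and `cpuSeconds` are illustrative, chosen to avoid overloading the system call names as the commit does.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <sys/resource.h>
#include <sys/time.h>

// Wrap getrusage() so callers only have to check a bool; the error
// message is printed once, here.
static bool
queryUsage(struct rusage &ru) {
  errno = 0;

  if (getrusage(RUSAGE_SELF, &ru) == -1) {
    fprintf(stderr, "getrusage(RUSAGE_SELF, ...) failed: %s\n", strerror(errno));
    return false;
  }

  return true;
}

// tv_usec counts microseconds, so divide by one million.
static double
timevalToSeconds(const struct timeval &tv) {
  return tv.tv_sec + tv.tv_usec / 1000000.0;
}

// Total CPU seconds (user + system) used by this process; 0 on failure.
static double
cpuSeconds(void) {
  struct rusage ru;

  if (queryUsage(ru) == false)
    return 0.0;

  return timevalToSeconds(ru.ru_utime) + timevalToSeconds(ru.ru_stime);
}
```

Keeping the microsecond conversion in one helper avoids repeating the divisor at each call site, where a dropped zero is easy to miss.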
11 changes: 9 additions & 2 deletions src/AS_UTL/timeAndSize.H
@@ -19,13 +19,20 @@
* are Copyright 2014 Battelle National Biodefense Institute, and
* are subject to the BSD 3-Clause License
*
+* Brian P. Walenz beginning on 2017-AUG-10
+* are a 'United States Government Work', and
+* are released in the public domain
+*
* File 'README.licenses' in the root directory of this distribution contains
* full conditions and disclaimers for each license.
*/

#include "AS_global.H"

-double getTime(void);
+double getTime(void);
+
+double getCPUTime(void);
+double getProcessTime(void);

-uint64 getProcessSizeCurrent(void);
+uint64 getProcessSize(void);
uint64 getProcessSizeLimit(void);
4 changes: 4 additions & 0 deletions src/AS_UTL/writeBuffer.H
@@ -19,6 +19,10 @@
* are a 'United States Government Work', and
* are released in the public domain
*
+* Sergey Koren beginning on 2017-MAY-17
+* are a 'United States Government Work', and
+* are released in the public domain
+*
* File 'README.licenses' in the root directory of this distribution contains
* full conditions and disclaimers for each license.
*/
6 changes: 6 additions & 0 deletions src/AS_global.C
@@ -39,6 +39,7 @@
#include "canu_version.H"

#include "AS_UTL_stackTrace.H"
+#include "timeAndSize.H"

#ifdef X86_GCC_LINUX
#include <fpu_control.h>
@@ -105,6 +106,11 @@ AS_configure(int argc, char **argv) {
AS_UTL_installCrashCatcher(argv[0]);


+// Set the start time.
+
+getProcessTime();


//
// Et cetera.
//
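
Editor's note: the bare `getProcessTime()` call in `AS_configure()` works because the function records its start time in a function-local static on first use, so later calls return seconds elapsed since that first call. A minimal sketch of the pattern, assuming POSIX `gettimeofday()`; the name `elapsedSinceFirstCall` is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/time.h>

// Returns wall-clock seconds elapsed since the first call to this function.
// The first call latches the start time and therefore returns 0.
static double
elapsedSinceFirstCall(void) {
  static double start = 0.0;   // persists across calls; 0.0 means "not set yet"
  struct timeval tp;
  double now = 0.0;

  if (gettimeofday(&tp, NULL) == 0)
    now = tp.tv_sec + tp.tv_usec / 1000000.0;   // microseconds to seconds

  if (start == 0.0)
    start = now;

  return now - start;
}
```

The design relies on the first call happening early, which is why the commit places it in the configuration routine. Using 0.0 as the "unset" sentinel is adequate for wall-clock time but would not be for a clock that can legitimately read zero.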
6 changes: 3 additions & 3 deletions src/bogart/AS_BAT_DropDeadEnds.C
@@ -15,7 +15,7 @@
*
* Modifications by:
*
-* Brian P. Walenz beginning on 2017-MAY-22
+* Brian P. Walenz beginning on 2017-MAY-31
* are a 'United States Government Work', and
* are released in the public domain
*
@@ -232,7 +232,7 @@ dropDeadEnds(AssemblyGraph *AG,
continue;

// At least one read needs to be kicked out. Make new tigs for everything.

char fnMsg[80] = {0}; Unitig *fnTig = NULL;
char nnMsg[80] = {0}; Unitig *nnTig = NULL; int32 nnOff = INT32_MAX;
char lnMsg[80] = {0}; Unitig *lnTig = NULL;
@@ -263,7 +263,7 @@ dropDeadEnds(AssemblyGraph *AG,
for (uint32 cc=0, tt=0; tt<tig->ufpath.size(); tt++) {
ufNode &read = tig->ufpath[tt];

-if (read.ident == fn) {
+if (read.ident == fn) {
sprintf(fnMsg, "first read %9u to tig %7u --", read.ident, fnTig->id());
fnTig->addRead(read, -read.position.min(), false);
