Last updated: February 3rd 2014
We are happy to announce version 1.5rc1, a.k.a. Praline Brownie
NEWS
1. iMetAMOS now available.
2. Updated 64-bit frozen Linux binary now available (fixed FastQC issue)
3. Generic framework for adding assemblers/classifiers
4. SRA run identifiers supported.
5. Remote and compressed input files are supported.
6. Numerous bug fixes
*On deck: kmher and Viritas
MetAMOS single file binary
In an attempt to further simplify the MetAMOS installation process, we are happy to announce the availability of a 'frozen' MetAMOS binary for Linux-x86_64 platforms. Along with this binary comes a significantly reduced list of prerequisites:
- Java 1.6 (or newer)
- Perl 5.8.8 (or newer)
- 64-bit *nix OS or Mac OSX 10.7+ (you may need to install MacPorts for full functionality)
First, select your flavor (DBs below are required but provided separately):
Linux 64-bit: ftp://ftp.cbcb.umd.edu/pub/data/treangen/MA_fb_v1.5rc1_linux.tar.gz
OSX 64-bit: ftp://ftp.cbcb.umd.edu/pub/data/treangen/MA_fb_v1.5rc1_OSX.tar.gz
Then add the toppings:
ALL DBS: ftp://ftp.cbcb.umd.edu/pub/data/treangen/allDBs.tar.gz
LIGHT DBS: ftp://ftp.cbcb.umd.edu/pub/data/treangen/minDBs.tar.gz
Finally, run a quick test:
./run_test.sh
Extraction will take a moment. You will get a "No DBs found ERROR!" if you have not downloaded any DBs. We recommend trying the Light version first; if you need the extended DBs (for BLAST, FCP, QUAST, etc.), grab the "ALL DBS" tarball. This should be a one-time download. Your existing DBs may also work, provided they are in the same format. Further details on the expected DBs are available on the readthedocs page.
**Note: please use caution! These binaries eat up disk space quickly. Please ensure you have ample free space (100GB+) before downloading and using them.
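If you want to check this before downloading, the short Python sketch below uses the standard library's shutil.disk_usage to compare the free space in a directory against the 100GB figure from the note above. It is not part of MetAMOS; has_enough_space is a hypothetical helper name, and GB is taken here to mean 10^9 bytes.

```python
import shutil

# 100GB+ comes from the note above; GB is taken as 10**9 bytes here.
REQUIRED_BYTES = 100 * 10**9


def has_enough_space(path=".", required=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has at least `required` bytes free."""
    return shutil.disk_usage(path).free >= required


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 10**9
    verdict = "looks sufficient" if has_enough_space() else "is below 100GB"
    print(f"{free_gb:.1f} GB free in the current directory: {verdict}")
```

Run it from the directory where you plan to unpack the binary and the DBs.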
A) HARDWARE REQUIREMENTS
B) SOFTWARE REQUIREMENTS
C) INSTALLING MetAMOS
D) QUICK START
E) WORKFLOWS
F) GENERIC TOOL
G) TEST SUITE
H) EXAMPLE OUTPUT
I) CONTACT
J) CITE
K) ISSUES
A) HARDWARE REQUIREMENTS
MetAMOS was designed to work on any standard 64-bit Linux environment. To use MetAMOS for tutorial/teaching purposes, a minimum of 8 GB of RAM is required. To get started on real data sets, a minimum of 32 GB of RAM is recommended, and anywhere from 64 to 1000 GB may be necessary for larger datasets. In our experience, 64 GB is a good place to start for most datasets of 50-100 million reads (128 GB of memory is now available on High Memory Instances at Amazon Elastic Compute Cloud).
B) SOFTWARE REQUIREMENTS
If you are using the frozen binary, you can skip this section. The MetAMOS frozen binary includes its dependencies and only requires perl 5.8+ and java 1.6+.
The main prerequisites for installing/running MetAMOS are python 2.6+, perl 5.8+, and java 1.6+. Depending on your platform/Linux distribution, you might also need to download and install the following BEFORE running INSTALL.py:
- git
- gcc
- automake
- python-tools
- python-devel
- zlib-devel
- numpy
- freetype, freetype-devel
- libpng-devel
- matplotlib
- curl
- R
Additional software will be downloaded by MetAMOS as needed. Additionally, there is some software that MetAMOS can incorporate into its pipeline that we are not allowed to distribute, such as MetaGeneMark and Newbler. To get a license to use MetaGeneMark, please visit: http://exon.gatech.edu/license_download.cgi. Once the tool is installed, add it to your PATH variable and MetAMOS will then enable its use in the pipeline.
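A quick way to see which of the command-line prerequisites are already visible on your PATH is Python's shutil.which. This is a sketch, not a MetAMOS tool: missing_executables is a hypothetical helper, and it only checks executables (git, gcc, automake, curl, R, java, perl), not library or development packages such as zlib-devel or libpng-devel. The same check works for a separately licensed tool once you know its executable name.

```python
import shutil

# Executables from the prerequisite lists above. Library and -devel packages
# (zlib-devel, libpng-devel, numpy, matplotlib, ...) cannot be detected this way.
PREREQ_EXECUTABLES = ["git", "gcc", "automake", "curl", "R", "java", "perl"]


def missing_executables(names=PREREQ_EXECUTABLES):
    """Return the names from `names` that shutil.which cannot find on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    print("missing: " + ", ".join(missing) if missing else "all executables found")
```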
C) INSTALLING MetAMOS
To download the software release package, go here. You can also browse the repository and click on Downloads. Once downloaded, simply unpack the files and open the MetAMOS directory. Once inside the MetAMOS directory, run:
python INSTALL.py
This downloads and installs the external dependencies, which may take minutes or hours depending on your connection speed. MetAMOS supports workflows that install subsets of tools for faster installation. By default, only the core dependencies are installed. To install iMetAMOS, run:
python INSTALL.py iMetAMOS
You can run:
python INSTALL.py -h
to get a listing of available workflows and programs. You can specify either workflows or programs as arguments to INSTALL.py. For example, to install the core workflow plus PhyloSift, run
python INSTALL.py core phylosift
To install the programs that are part of the optional workflow, run:
python INSTALL.py optional
If all dependencies are downloaded (including optional/deprecated ones), this will take quite a while to complete (depending on internet connection and server, anywhere from a few hours to 1-2 days).
D) QUICK START
Before you get started using MetAMOS/iMetAMOS, a brief review of its design will help clarify its intended use. MetAMOS has two main components:
- initPipeline
- runPipeline
Below is a simple example of running iMetAMOS to assemble an SRA dataset:
initPipeline -q -1 SRR987657 -d projectDir -W iMetAMOS
runPipeline -d projectDir -p 16
The first component, initPipeline, is for creating new projects and also initializing sequence libraries. Currently interleaved & non-interleaved fasta, fastq, and SFF files are supported. Input files can be compressed (bzip2, gzip) and can reside on remote servers (in this case the full URL must be specified). SRA run identifiers are also supported.
The file-type flags (-f, -q, and -s) must be specified before the file. Once specified, they remain in effect until a different file type is specified.
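To make the rule concrete, here is a minimal Python sketch of how such "sticky" format flags can be resolved. It is an illustration under simplifying assumptions, not initPipeline's actual parser: classify_inputs is a hypothetical name, only -f/-q/-s and the file options -1/-2/-m are tracked, and fastq is assumed until a flag is seen (matching the defaults listed below).

```python
# Format flags and the file options they apply to (simplified subset).
FORMAT_FLAGS = {"-f": "fasta", "-q": "fastq", "-s": "sff", "--sff": "sff"}
FILE_OPTIONS = {"-1", "-2", "-m"}


def classify_inputs(argv):
    """Return (option, path, format) for every input file in `argv`.

    A format flag stays in effect for all later file options until another
    format flag appears; fastq is assumed before any flag is seen.
    """
    current = "fastq"
    result = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in FORMAT_FLAGS:
            current = FORMAT_FLAGS[arg]
        elif arg in FILE_OPTIONS and i + 1 < len(argv):
            # A comma-separated list names several files for the same option.
            for path in argv[i + 1].split(","):
                result.append((arg, path, current))
            i += 1  # the file list has been consumed
        i += 1
    return result


if __name__ == "__main__":
    args = "-q -1 a.fastq.1 -2 a.fastq.2 -f -1 b.fasta -d projectDir".split()
    for entry in classify_inputs(args):
        print(entry)
```

Options the sketch does not know about (such as -d projectDir or -i 300:500) are simply stepped over.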
usage: initPipeline -f/-q -1 file.fastq.1 -2 file.fastq.2 -d projectDir -i 300:500
options: -1, -2, -c, -d, -f, -h, -i, -l, -m, -o, -q, -s, -W
- -1: either non-paired file of reads or first file in pair, can be list of multiple separated by a comma
- -2: second paired read file, can be list of multiple separated by a comma
- -c: fasta file containing contigs
- -d: output project directory (required)
- -f: boolean, reads are in fasta format (default is fastq)
- -h: display help message
- -i: insert size of library, can be list separated by commas for multiple libraries
- -l: SFF linker type
- -m: interleaved file of paired reads
- -o: reads are in outtie orientation (default innie)
- -q: boolean, reads are in fastq format (default is fastq)
- -s/--sff: boolean, reads are in SFF format (default is fastq)
- -W: string, workflow name (-W iMetAMOS will run iMetAMOS). A workflow can specify parameters as well as data. A workflow can be immutable, in which case any command-line parameters will be ignored. Otherwise, command-line parameters take priority over workflow defaults.
For example, to input:
(non-interleaved fastq, single library)
initPipeline -q -1 file.fastq.1 -2 file.fastq.2 -d projectDir -i 300:500
(non-interleaved fasta, single library)
initPipeline -f -1 file.fasta.1 -2 file.fasta.2 -d projectDir -i 300:500
(interleaved fastq, single library)
initPipeline -q -m file.fastq.12 -d projectDir -i 300:500
(interleaved fastq, multiple libraries)
initPipeline -q -m file.fastq.12,file2.fastq.12 -d projectDir -i 300:500,1000:2000
(interleaved fastq, multiple libraries, existing assembly)
initPipeline -q -m file.fastq.12,file2.fastq.12 -c file.contig.fa -d projectDir -i 300:500,1000:2000
(non-interleaved remote fastq, single library)
initPipeline -q -1 ftp://ftp.cbcb.umd.edu/pub/data/metamos/gage-b-rb.miseq.1.fastq.gz -2 ftp://ftp.cbcb.umd.edu/pub/data/metamos/gage-b-rb.miseq.2.fastq.gz -d projectDir -i 300:500
(unpaired SRA run using iMetAMOS)
initPipeline -1 <SRA RUN ID> -d projectDir -W iMetAMOS
(paired-end SRA run using iMetAMOS)
initPipeline -m <SRA RUN ID> -d projectDir -i 300:500 -W iMetAMOS
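The -i values pair up with the comma-separated file lists in the examples above. The Python sketch below assumes each -i entry is MIN:MAX and that entries are matched to files by position; both are readings of the examples rather than documented behavior, and parse_insert_sizes and pair_libraries are hypothetical helpers.

```python
def parse_insert_sizes(spec):
    """Parse an -i value such as '300:500,1000:2000' into [(300, 500), (1000, 2000)]."""
    sizes = []
    for item in spec.split(","):
        low_text, high_text = item.split(":")
        low, high = int(low_text), int(high_text)
        if low > high:
            raise ValueError(f"minimum exceeds maximum in {item!r}")
        sizes.append((low, high))
    return sizes


def pair_libraries(files, spec):
    """Pair each comma-separated file with the insert-size range at the same position."""
    names = files.split(",")
    sizes = parse_insert_sizes(spec)
    if len(names) != len(sizes):
        raise ValueError(f"{len(names)} files but {len(sizes)} insert sizes")
    return list(zip(names, sizes))


if __name__ == "__main__":
    print(pair_libraries("file.fastq.12,file2.fastq.12", "300:500,1000:2000"))
```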
The second component, runPipeline, takes a project directory as input and runs the following steps by default:
- Preprocess
- Assemble
- FindORFs
- Validate
- FindRepeats
- Abundance
- Annotate
- FunctionalAnnotation
- Scaffold
- Propagate
- FindScaffoldORFs
- Classify
- Postprocess
usage info:
usage: runPipeline [options] -d projectdir
- -h = : print help [this message]
- -j = : just output all of the programs and citations then exit (default = NO)
- -v = : verbose output? (default = NO)
- -d = : directory created by initPipeline (REQUIRED)
[options]: [pipeline_opts] [misc_opts]
[pipeline_opts]: options that affect the pipeline execution
Pipeline consists of the following steps:
Preprocess, Assemble, FindORFS, MapReads, Validate, Abundance, Annotate, Scaffold, Propagate, Classify, Postprocess
Each of these steps can be referred to by the following options:
- -f = : force this step to be run (default = NONE)
- -s = : start at this step in the pipeline (default = Preprocess)
- -e = : end at this step in the pipeline (default = Postprocess)
- -n = : step to skip in pipeline (default=NONE)
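As an illustration of how -s, -e, and -n combine, the Python sketch below selects the steps to run from the default step order listed earlier for runPipeline. It is a hypothetical helper, not MetAMOS code, and it matches step names exactly (case-sensitively).

```python
# Default step order, as listed earlier for runPipeline.
STEPS = ["Preprocess", "Assemble", "FindORFs", "Validate", "FindRepeats",
         "Abundance", "Annotate", "FunctionalAnnotation", "Scaffold",
         "Propagate", "FindScaffoldORFs", "Classify", "Postprocess"]


def selected_steps(start="Preprocess", end="Postprocess", skip=()):
    """Return the steps from `start` to `end` inclusive, minus any named in `skip`."""
    lo, hi = STEPS.index(start), STEPS.index(end)  # ValueError for unknown names
    if lo > hi:
        raise ValueError(f"{start} comes after {end}")
    skipped = set(skip)
    return [step for step in STEPS[lo:hi + 1] if step not in skipped]


if __name__ == "__main__":
    print(selected_steps("Annotate", "Classify", skip=["FunctionalAnnotation"]))
```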
For each step, you can fine-tune the execution as follows:
[Preprocess]
- -t = : enable filter of input reads (default = metAMOS, options = metAMOS, EA-UTILS, PBcR for PacBio sequences)
- -q = : produce FastQC quality report for reads with quality information (fastq or sff)? (default = NO)
[Assemble]
- -a = : genome assembler to use (default = SOAPdenovo). This can also be a comma-separated list of assemblers (for example: soap,velvet); in this case, all selected assemblers will be run and the best one selected for subsequent analysis
- -k = : k-mer size to be used for assembly (default = auto-selected). This can also be a comma-separated list of k-mers to use
- -o = : min overlap length
[MapReads]
- -m = : read mapper to use? (default = bowtie)
- -i = : save bowtie (i)ndex (default = NO)
- -b = : create library specific per bp coverage of assembled contigs (default = NO)
[FindORFS]
- -g = : gene caller to use (default=FragGeneScan)
- -l = : min contig length to use for ORF call (default = 300)
- -x = : min contig coverage to use for ORF call (default = 3X)
[Validate]
- -X = : comma-separated list of validators to run on the assembly. (default = lap, supported = reapr,orf,lap,ale,quast,frcbam,freebayes,cgal,n50)
- -S = : comma-separated list of scores to use to select the winning assembly. By default, all validation tools specified by -X will be run. For each score, an optional weight can be specified as SCORE:WEIGHT. For example, LAP:1,CGAL:2 (supported = all,lap,ale,cgal,snp,frcbam,orf,reapr,n50)
[Annotate]
- -c = : classifier to use for annotation (default = FCP)
- -u = : annotate unassembled reads (default = NO)
[Classify]
- -z = : taxonomic level to categorize at (default = class)
[misc_opts]: Miscellaneous options
- -r = : retain the AMOS bank (default = NO)
- -p = : number of threads to use (be greedy!) (default=1)
- -4 = : 454 data (default = NO)
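The -S value of the Validate step can be read as a list of SCORE or SCORE:WEIGHT items. The Python sketch below parses that syntax; the weighted_total function that follows is only a toy illustration of how weights could combine scores, assuming every score is already scaled so that larger is better. It is not how MetAMOS actually ranks assemblies, and both helper names are hypothetical.

```python
def parse_score_weights(spec):
    """Parse an -S value like 'LAP:1,CGAL:2' into {'lap': 1.0, 'cgal': 2.0}.

    A score given without ':WEIGHT' gets weight 1.0.
    """
    weights = {}
    for item in spec.split(","):
        name, _, weight = item.partition(":")
        name = name.strip().lower()
        if name:
            weights[name] = float(weight) if weight.strip() else 1.0
    return weights


def weighted_total(scores, weights):
    """Toy combination: sum of score * weight; a missing score counts as 0."""
    return sum(scores.get(name, 0.0) * w for name, w in weights.items())


if __name__ == "__main__":
    weights = parse_score_weights("LAP:1,CGAL:2")
    assemblies = {"soap": {"lap": 0.2, "cgal": 0.5}, "velvet": {"lap": 0.4, "cgal": 0.3}}
    print(max(assemblies, key=lambda a: weighted_total(assemblies[a], weights)))
```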
For example, to enable read filtering:
-t
and to enable meta-IDBA as the assembler:
-a metaidba
And to use PhyloSift to annotate:
-c phylosift
Any single step in the pipeline can be skipped by passing the following parameter to runPipeline:
-n,--skipsteps=Step1,..
MetAMOS reruns steps based on timestamp information, so if the input files for a step in the pipeline haven't changed since the last run, that step will be skipped automatically. However, you can force any step in the pipeline to run by passing the following parameter to runPipeline:
-f,--force=Step1,..
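The timestamp rule works much like make. Here is a minimal Python sketch of that decision, assuming a step is described by lists of input and output file paths; needs_rerun is a hypothetical helper, not MetAMOS's internal function.

```python
import os


def needs_rerun(inputs, outputs, force=False):
    """Make-style check: should a step run again?

    True if forced, if the step declares no outputs, if any output is missing,
    or if the newest input was modified after the oldest output.
    """
    if force or not outputs:
        return True
    if not all(os.path.exists(path) for path in outputs):
        return True
    newest_input = max((os.path.getmtime(path) for path in inputs), default=0.0)
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    return newest_input > oldest_output
```

Comparing against the oldest output is the conservative choice: if any output predates the latest input change, the step is rerun.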
MetAMOS stores a summary of the input libraries in pipeline.ini in the working directory. The pipeline.conf file stores the list of programs available to MetAMOS. Finally, pipeline.run stores the selected parameters and programs for the current run. MetAMOS also stores detailed logs of all commands executed by the pipeline in Logs/COMMANDS.log and a log for each step of the pipeline in Logs/.log
Upon completion, all of the final results will be stored in the Postprocess/out directory. A component, create_summary.py, takes this directory as input and generates an HTML page with summary statistics and a few plots. An optional component, create_plots.py, takes one or more Postprocess/out directories as input and generates comparative plots.
E) WORKFLOWS
A workflow is a text file that specifies the command-line options and input sequences required to run MetAMOS. A workflow may optionally inherit options/data from other workflows. A workflow may also be immutable if its parameters should not be modifiable by a user. An example workflow is below:
inherit:isolate
modify:True
command: -q -u -r -v -I -c kraken -p 16 -a spades,velvet-sc,abyss,ray,edena,sga,masurca,soap,soap2,velvet -t metamos -n FunctionalAnnotation -f Postprocess -z phylum
asmcontigs: /Users/skoren/Personal/Research/metAMOS/Test/test.asm,ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Candidatus_Carsonella_ruddii_uid58773/NC_008512.fna
lib1format: fasta
lib1mated: True
lib1innie: True
lib1interleaved: True
lib1f1: /Users/skoren/Personal/Research/metAMOS/Test/carsonella_pe_filt.fna.gz,2000,5000,3500,500
The available options are:
- inherit - any other workflows to inherit from. In this case, the workflow inherits options from the isolate workflow
- modify - whether users are allowed to specify command-line parameters at runtime. If false, command-line options are ignored
- command - command-line options to specify for runPipeline
- asmcontigs - optional, pre-assembled contigs to include in analysis. Can be remote file. Multiple files can be separated using commas.
- lib#format - input type for lib #. Can be fasta/fastq/sff
- lib#mated - whether the library is mated or not
- lib#innie - whether the mates are in the innie (Illumina paired-end) format or not (Illumina mate-pair)
- lib#interleaved - whether the input sequences are in a single file or in two separate files
- lib#f1 - the name of the input file, along with library min, max, mean, stdev
An arbitrary number of libraries may be specified in the above format. The below example shows an unmated library:
lib1format: fasta
lib1mated: False
lib1innie: False
lib1interleaved: False
lib1frg: /Users/skoren/Personal/Research/metAMOS/Test/carsonella_pe_filt.fna.gz
as well as a non-interleaved library:
lib1format: fasta
lib1mated: True
lib1innie: True
lib1interleaved: False
lib1f1: /Users/skoren/Personal/Research/metAMOS/Test/carsonella_pe_1.fna.gz,2000,5000,3500,500
lib1f2: /Users/skoren/Personal/Research/metAMOS/Test/carsonella_pe.2.fna.gz,2000,5000,3500,500
Workflows may be shared between users, as long as the input files are accessible (i.e. they are on a remote server or the systems share a file system). Workflow files should be placed in the metAMOS/workflows directory or the working directory where MetAMOS is launched.
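For readers who want to generate or inspect workflow files programmatically, here is a minimal Python sketch of parsing the key: value format above, resolving inherit, and splitting a lib#f1 entry. It is an illustration, not MetAMOS's loader: the helper names are hypothetical, workflows are passed in as a name-to-text dict rather than looked up in metAMOS/workflows, child values simply override inherited ones, and inheritance cycles are not detected.

```python
def parse_workflow(text):
    """Parse 'key: value' lines into a dict.

    Splits on the first ':' only, so values may contain URLs (ftp://...).
    """
    options = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(":")
            options[key.strip()] = value.strip()
    return options


def resolve(name, workflows):
    """Merge workflow `name` with the workflows it inherits from (child values win)."""
    own = parse_workflow(workflows[name])
    merged = {}
    for parent in filter(None, own.get("inherit", "").split(",")):
        merged.update(resolve(parent.strip(), workflows))
    merged.update(own)
    return merged


def library_file(value):
    """Split a lib#f1/lib#f2 value into (path, min, max, mean, stdev)."""
    path, *stats = value.split(",")
    low, high, mean, sd = (float(x) for x in stats)
    return path, low, high, mean, sd
```

library_file only applies to mated entries such as lib1f1; an unmated lib1frg entry carries just a path.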
F) GENERIC TOOL
MetAMOS allows new tools to be added to the ASSEMBLE and ANNOTATE steps without requiring code changes. Adding a tool is a three-step process, plus a fourth step for ANNOTATE tools.
1) Add the tool name to metAMOS/Utilities/config/<STEPNAME>.generic. For example, if you want to add a new assembler, you would modify ASSEMBLE.generic. This file contains one tool name per line. The tool name is arbitrary text and will be used by MetAMOS to look up detailed configuration. The current ASSEMBLE.generic looks like:
% cat Utilities/config/ASSEMBLE.generic
abyss
sga
spades
ray
masurca
mira
edena
idba-ud
You can add multiple versions of an assembler. In this documentation, we will add SOAPdenovo v1.05 in addition to the above tools. First, we will add soap_v105 to the end of ASSEMBLE.generic:
% cat Utilities/config/ASSEMBLE.generic
abyss
sga
spades
ray
masurca
mira
edena
idba-ud
soap_v105
2) Write a configuration file for the tool. The configuration file specifies input requirements for the program as well as a name, output, and executable location. Within configuration files, several keywords may be specified that are updated at runtime. The list of currently supported keywords can be found at the end of this section. In the above example, MetAMOS would expect a file named soap_v105.spec.
Below is an example configuration file used for Ray:
% cat Utilities/config/ray.spec
[CONFIG]
maxlibs 1
input FASTQ
name Ray
output [PREFIX]_ray/Contigs.fasta
scaffoldOutput [PREFIX]_ray/Scaffolds.fasta
location cpp/[MACHINE]/Ray/bin
threads -n
paired_interleaved -i [FIRST]
paired -p [FIRST] [SECOND]
commands rm -rf [RUNDIR]/ray && \
[MPI] [THREADS] Ray \
-o [RUNDIR]/[PREFIX]_ray [INPUT]
unpaired -s [FIRST]
[Ray]
k [KMER]
The [CONFIG] section is the generic configuration section; you can specify step-specific configuration later on. Here, most properties are specified: where the tool is located, what its output is, and what input it requires:
- input - the type of input (FASTQ in this case)
- name - the full name of the tool you want to report later on. This can be arbitrary text.
- output - where the output contigs from the tool are. For assemblers, this is contigs. [PREFIX] is a keyword for the MetAMOS prefix for the assembly when it is run. This is assumed to be relative to the MetAMOS run directory.
- scaffoldOutput - where the output scaffolds from the tool are, if available.
- backupOutput - some assemblers fail to generate their final output on some datasets. In this case, this can specify preliminary contig output which will only be used if the main output is not available.
- location - path to the executable. This is relative to metAMOS/Utilities. You can specify [MACHINE] to substitute your machine type into the executable path (e.g. Linux-x86_64). Your PATH will be searched if the tool is not found in the specified location.
- threads - the parameter to pass number of threads to use for the program, if available
- paired - how to pass paired-end (assumed innie) non-interleaved data (FIRST refers to left mates, SECOND to right)
- paired_interleaved - how to pass paired-end (assumed innie) interleaved data. FIRST refers to the interleaved file.
- mated - how to pass mate-pair (assumed outtie) non-interleaved data (FIRST refers to left mates, SECOND to right)
- mated_interleaved - how to pass mate-pair data (assumed outtie) interleaved mates
- unpaired - how to pass fragment data to the program. FIRST refers to the unmated file.
- commands - an arbitrary list of commands to run to execute the tool. Multiple lines are supported with the \ character. Multiple commands can be specified using &&. In the above example, rm -rf will run first followed by Ray. Common useful keywords are:
- [PREFIX] - the prefix to use for output
- [RUNDIR] - where the program is running
- [KMER] - the selected k-mer to use for assembly
- [MEM] - available memory
- [THREADS] - the threads parameter and number of threads requested by the user
- [INPUT] - the formatted input based on the libraries provided to metAMOS
The [Ray] section is a step-specific configuration. Its name is based on the executable name used in commands above. By default, parameters are passed prefixed with -, so here Ray will be run with -k [KMER].
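The spec format above can be parsed in a few lines. This Python sketch (hypothetical helpers, not MetAMOS's own parser) splits each line into a key and a value, joins lines that end in a backslash into one value, and turns a step-specific section such as [Ray] into -key value arguments. It assumes keyword placeholders like [KMER] are substituted later.

```python
def parse_spec(text):
    """Parse a .spec file into {section: {key: value}}.

    Each line is 'key value...'; a trailing backslash continues the value
    on the next line, as in the multi-line 'commands' entry.
    """
    sections, current, pending = {}, None, ""
    for raw in text.splitlines():
        line = pending + raw.strip()
        pending = ""
        if line.endswith("\\"):
            pending = line[:-1].rstrip() + " "  # drop the backslash, keep one space
            continue
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None:
            raise ValueError(f"line outside any [SECTION]: {line!r}")
        key, _, value = line.partition(" ")
        current[key] = value.strip()
    return sections


def step_arguments(spec, executable):
    """Turn a step-specific section (e.g. [Ray]) into '-key value' arguments."""
    return " ".join(f"-{k} {v}".rstrip() for k, v in spec.get(executable, {}).items())
```

For the ray.spec shown above, step_arguments(parse_spec(text), "Ray") yields "-k [KMER]".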
Some assemblers (SOAPdenovo, MaSuRCA, etc) require an input configuration file rather than taking parameters on the command line. In this case, we need both a spec and template file (soap_v105.spec and soap_v105.template) which will get updated at runtime and passed to the assembler. The [CONFIG] section then includes a config option which specifies the template and the keyword [INPUT] will pass the configuration file rather than library information.
Below is an example spec file and template for SOAPdenovo:
% cat Utilities/config/soap_v105.spec
[CONFIG]
input FASTQ
name soap_v105
threads -p
output [PREFIX]/[PREFIX].asm.contig
location cpp/[MACHINE]/SOAPdenovo_1.05/
scaffoldOutput [PREFIX]/[PREFIX].asm.scafSeq
config config/soap_v105.template
mated rank=[LIB]\navg_ins=[MEAN]\nreverse_seq=1\nasm_flags=2\nq1=[FIRST]\nq2=[SECOND]
paired rank=[LIB]\navg_ins=[MEAN]\nreverse_seq=0\nasm_flags=3\nq1=[FIRST]\nq2=[SECOND]
unpaired rank=[LIB]\navg_ins=0\nq=[FIRST]
commands rm -rf [PREFIX] && \
mkdir [PREFIX] && \
SOAPdenovo all -s [INPUT] -o [PREFIX]/[PREFIX].asm -K [KMER] [THREADS]
% cat Utilities/config/soap_v105.template
#maximal read length
max_rd_len=150
[LIB]
[INPUT]
Here, the config template is specified (again relative to metAMOS/Utilities) and the [INPUT] keyword will be replaced by the library information at run time.
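A minimal Python sketch of that runtime substitution, assuming one spec entry (such as the paired line) is formatted per library and the results are joined into the template's [INPUT] slot. In the spec, \n inside an entry is a literal backslash-n that becomes a line break. The helper names are hypothetical; other template lines, including SOAPdenovo's own [LIB] header, are left untouched.

```python
import re

# Matches keywords such as [LIB], [MEAN], [FIRST], [SECOND].
KEYWORD = re.compile(r"\[([A-Z_]+)\]")


def substitute(text, values):
    """Replace [KEYWORD] tokens with values; unknown keywords are left as-is."""
    return KEYWORD.sub(lambda m: str(values.get(m.group(1), m.group(0))), text)


def render_config(template, lib_entry, libraries):
    """Fill the template's [INPUT] with one formatted entry per library.

    `lib_entry` is the raw spec value; `libraries` is a list of keyword dicts.
    """
    entries = [substitute(lib_entry.replace("\\n", "\n"), lib) for lib in libraries]
    return template.replace("[INPUT]", "\n".join(entries))


if __name__ == "__main__":
    paired = r"rank=[LIB]\navg_ins=[MEAN]\nreverse_seq=0\nasm_flags=3\nq1=[FIRST]\nq2=[SECOND]"
    template = "#maximal read length\nmax_rd_len=150\n[LIB]\n[INPUT]"
    lib = {"LIB": 1, "MEAN": 350, "FIRST": "lib1.1.fq", "SECOND": "lib1.2.fq"}
    print(render_config(template, paired, [lib]))
```

The literal \n is expanded before substitution so that values (such as file paths) are never altered by it.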
3) Add a citation for the tool. Citations are tab-delimited and specify the lower-case tool alias, full tool name, and citation information. For example:
soap_v105 SOAPdenovo v1.05 Li Y, Hu Y, Bolund L, Wang J: State of the art de novo assembly of human genomes from massively parallel sequencing data. Human Genomics 2010, 4:271-277.
The citation will be automatically printed by MetAMOS whenever a run uses the specified tool.
4) For ANNOTATE tools, we also need a way to convert the output to Krona. By default, MetAMOS will look for an Import<toolName>.pl script. If one is not found, it will rely on a generic import, which assumes a tab-delimited format:
contig/readID NCBI Taxonomy ID
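If you are writing your own converter for a new classifier, the generic format is one sequence ID and one NCBI taxonomy ID per line, separated by a tab. The small Python sketch below produces that text from a dictionary; format_generic_import is a hypothetical helper, and where the resulting file must be placed is not covered here.

```python
def format_generic_import(assignments):
    """Format {sequence_id: ncbi_taxid} as 'id<TAB>taxid' lines, one per sequence."""
    lines = []
    for seq_id, taxid in assignments.items():
        if "\t" in seq_id or "\n" in seq_id:
            raise ValueError(f"sequence id contains a tab or newline: {seq_id!r}")
        lines.append(f"{seq_id}\t{int(taxid)}")  # int() rejects non-numeric IDs
    return "\n".join(lines) + "\n" if lines else ""
```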
The currently supported list of keywords:
- MEM - max memory limit
- LIB - library identifier (i.e. 1, 2, 3, etc)
- INPUT - replace with input to the program (a collection of input files or libraries depending on the step or a configuration file)
- MACHINE - replaced with Linux-x86_64, Darwin-x86_64, etc
- FIRST - replaced with the left mates for non-interleaved mated libraries; otherwise, with the interleaved or unpaired reads
- SECOND - replaced with right mates, in paired non-interleaved libs
- ORIENTATION - replaced with the word innie or outtie
- ORIENTATION_FIGURE - replaced with ---> <--- or <--- ---> for pe and mp, respectively
- MEAN - replaced with library mean
- SD - replaced with library standard dev
- MIN - replaced with library min
- MAX - replaced with library max
- THREADS - replaced with thread parameter specified and requested number of threads
- KMER - the kmer requested
- OFFSET - the phred offset (33/64) of the input files
- PREFIX - the desired prefix for the program output
- DB - the location of the MetAMOS DBs (i.e. Utilities/DB)
- RUNDIR - the location where the program is running (i.e. MetAMOS run directory)
- LOCATION - the location where the program executable lives
- TECHNOLOGY - the type of sequencing data (454, Illumina, etc)
G) TEST SUITE
We have developed a set of scripts for testing the various features of MetAMOS. All of these regression test scripts are available inside the /Test directory and include all necessary datasets to run them. Here is a brief listing of the test scripts we currently include:
*Test initPipeline ./Test/test_create.sh
*Vanilla test ./Test/run_test.sh
*Test PhyloSift ./Test/test_amphora.sh
*Test Minimus ./Test/test_minimus.sh
*Test Preprocess filtration of non-interleaved fastq files ./Test/test_filter_noninterleaved_fastq.sh
*Test iMetAMOS ./Test/test_ima.sh
*Test SRA download ./Test/test_sra.sh
*Test Newbler (if available) ./Test/test_newbler.sh
*Test CA (fasta) ./Test/test_ca_fasta.sh
*Test CA (fastq) ./Test/test_ca.sh
*Test SOAPdenovo ./Test/test_soap.sh
*Test MetaVelvet ./Test/test_metavelvet.sh
*Test SparseAssembler ./Test/test_sparse.sh
*Test Velvet ./Test/test_velvet.sh
*Test FCP ./Test/test_fcp.sh
*Test Spades ./Test/test_spades.sh
*Test BLAST ./Test/test_blast.sh
H) EXAMPLE OUTPUT
MetAMOS generates an interactive web page once a run successfully completes (example: http://www.cbcb.umd.edu/~sergek/imetamos/gageb/Postprocess/out/html/summary.html).
This includes summary statistics and taxonomic information based on Krona [1]. The easiest way to interact with the results is through the web interface. The Postprocess/out directory contains the results of the analysis. By default, MetAMOS uses the prefix "proba" (Galician for "test"), so output files are named proba.*.
- abundance.krona.html - Krona [1] plot of abundances using the tool selected for abundance (MetaPhyler [2] by default)
- annotate.krona.html - Krona [1] plot of abundances using the tool selected for classification (Kraken [3] by default)
- asm.scores - Validation scores for each assembly/kmer combination run. The header contains information on the scores generated
- best.asm - The name of the assembly/kmer combination that was selected as the best
- .classified - Subdirectory containing each level of the selected taxonomy (class by default) and the contigs/reads/ORFs belonging to each
- .original.annots - Tab-delimited taxonomic level assignments for each contig/unassembled read. Class IDs correspond to NCBI taxonomy IDs.
- .original.reads.annots - Tab-delimited taxonomic level assignments as above, where contigs are replaced with their constituent sequences.
- .propagated.annots - Tab-delimited file as above after assembly graph-based propagation of assignments to contigs.
- .propagated.reads.annots - Tab-delimited file as above after propagation, with contigs replaced by their constituent reads.
- html - HTML output from the pipeline. summary.html contains an interactive results view.
- proba.bnk - AMOS bank format of the assembly that can be visualized using Hawkeye.
- proba.classify.txt - The raw output of the abundances using the tool selected for abundance estimation (MetaPhyler [2] by default)
- proba.ctg.cnt - The number of sequences mapped to each assembly contig
- proba.ctg.cvg - The coverage of each assembly contig
- proba.ctg.fa - The assembled contigs
- proba.hits - The raw output of the contig/unassembled read classifications using the selected tool (Kraken [3] by default)
- proba.lib1.contig.reads - The per-library assignment of sequences to contigs
- proba.lib1.unaligned.fasta - The per-library unassembled sequences
- proba.scf.fa - The assembled scaffolds
- proba.motifs.fa - The motifs within scaffolds identified by Bambus 2
- proba.orf.faa - The protein sequences of identified open reading frames (ORFs) in the assembly and unassembled reads
- proba.orf.fna - The nucleotide (fasta) sequences of identified open reading frames (ORFs) in the assembly and unassembled reads
- proba.scf.orf.faa - The protein sequences of identified open reading frames (ORFs) in the scaffolds
- proba.scf.orf.fna - The nucleotide (fasta) sequences of identified open reading frames (ORFs) in the scaffolds
- ref.fasta - The recruited reference genome used for validation (iMetAMOS only)
- ref.name - The name of the recruited reference genome (iMetAMOS only)
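For a quick summary outside the web page, the Python sketch below counts sequences per taxonomy ID in one of the tab-delimited .annots files. It assumes the first column is the contig/read ID and the second the NCBI taxonomy ID, which matches the descriptions above but has not been checked against a real run; count_assignments is a hypothetical helper.

```python
import csv
from collections import Counter


def count_assignments(path):
    """Count sequences per taxonomy ID in a tab-delimited .annots file.

    Assumes column 1 is the contig/read ID and column 2 the NCBI taxonomy ID;
    extra columns and blank lines are ignored.
    """
    counts = Counter()
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) >= 2 and row[0]:
                counts[row[1]] += 1
    return counts
```

Pass the path of one of the .annots files in Postprocess/out and call .most_common(10) on the result to list the ten most frequent taxonomy IDs.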
Additional details for each step are available under /out. This includes the raw output (as well as any intermediate files) of any tools run during that step. For example, Annotate/out/proba.prokka includes the full Prokka annotation output. Assemble/out/abyss*/ contains the intermediate files output by ABySS. Additionally, since MetAMOS stores all of its results in an AMOS bank, the assemblies can be visualized with Hawkeye.
[1] Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30;12:385. PMID: 21961884
[2] Liu B, Gibbons T, Ghodsi M, Treangen T, Pop M. Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences. BMC Genomics. 2011;12 Suppl 2:S4. Epub 2011 Jul 27.
[3] Wood DE, Salzberg SL. Rapid phylogenetic sequence classification through repeated exact alignment. In preparation.
I) CONTACT
If you encounter any problems/bugs, please check the known issues page (https://github.com/treangen/MetAMOS/issues?direction=desc&sort=created&state=open) to see whether the problem has already been documented.
If not, please report the issue either using the contact information below or by submitting a new issue online. Please include information on your run, any output produced by runPipeline, the pipeline.* files, and the Logs/<LAST_STEP> file (if not too large).
Who to contact to report bugs, complaints, or feature requests:
Todd Treangen: [email protected]
Sergey Koren: [email protected]
J) CITE
Treangen TJ*, Koren S*, Sommer DD, Liu B, Astrovskaya I, Ondov B, Darling AE, Phillippy AM, Pop M. MetAMOS: a modular and open source metagenomic assembly and analysis pipeline. Genome Biol. 2013 Jan 15;14(1):R2. PMID: 23320958.
url: http://genomebiology.com/content/pdf/gb-2013-14-1-r2.pdf
*Indicates both authors contributed equally to this work
K) ISSUES
Here is a link to known, open issues.