A Nextflow pipeline for running the ARTIC network's fieldbioinformatics tools (https://github.com/artic-network/fieldbioinformatics), with a focus on ncov2019
WARNING - THIS REPO IS UNDER ACTIVE DEVELOPMENT AND ITS BEHAVIOUR MAY CHANGE AT ANY TIME.
PLEASE ENSURE THAT YOU READ BOTH THE README AND THE CONFIG FILE AND UNDERSTAND THE EFFECT OF THE OPTIONS ON YOUR DATA!
This Nextflow pipeline automates the ARTIC network nCoV-2019 novel coronavirus bioinformatics protocol. It is being developed to aid the harmonisation of the analysis of sequencing data generated by the COG-UK project. It will turn SARS-CoV-2 sequencing data (Illumina or Nanopore) into consensus sequences and provide other helpful outputs to assist the project's sequencing centres with submitting data.
$ nextflow run seqeralabs/ncov2019-artic-nf \
--illumina \
-process.queue={AWS_BATCH_QUEUE} \
-work-dir {AWS_S3_BUCKET}/work \
-profile batch
$ nextflow run seqeralabs/ncov2019-artic-nf \
--directory $baseDir/data/nanopore/ebov-flongle \
--nanopolish \
-process.queue={AWS_BATCH_QUEUE} \
-work-dir {AWS_S3_BUCKET} \
-profile batch
An up-to-date version of Nextflow is required because the pipeline is written in DSL2. Following the instructions at https://www.nextflow.io/ to download and install Nextflow should get you a recent-enough version.
This repo contains both Singularity and Dockerfiles. You can build the Singularity containers locally by running scripts/build_singularity_containers.sh and use them with -profile singularity.
The containers will be available from Docker/Singularityhub shortly.
The repo contains environment.yml files from which the correct conda env is built automatically if -profile conda is specified in the command. Although you'll need conda installed, this is probably the easiest way to run this pipeline.
By default, the pipeline runs on the local machine. You can specify -profile slurm to use a SLURM cluster.
You can use multiple profiles at once, separating them with a comma. This is described in the Nextflow documentation.
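
As a minimal sketch of combining profiles, the snippet below assembles a command that uses the conda and slurm profiles together and prints it for review before you run it. The repository name is taken from the examples above; the run directory is a placeholder (an assumption), not a path that ships with the repo.

```shell
# Placeholder input path (assumption); replace with your own run directory.
RUN_DIR="/path/to/illumina_run"

# Profiles are comma-separated, with no spaces between them.
PROFILES="conda,slurm"

CMD="nextflow run seqeralabs/ncov2019-artic-nf --illumina --directory $RUN_DIR -profile $PROFILES"

# Print the assembled command so it can be checked before it is run.
echo "$CMD"
```

Once the printed line looks right, paste it into your terminal to launch the run.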
Configuration options are set in conf/base.config. They are described there and set to sensible defaults (as suggested in the nCoV-2019 novel coronavirus bioinformatics protocol).
The only required option is --directory, which should point to either:
- a Nanopore run output directory
- an Illumina sequencing run output directory
Set the output directory with --outdir.
Use --minimap to swap to minimap for mapping (nanopore only) and --barcode if you ran with barcodes (nanopore only).
Use --nanopolish or --medaka to run these workflows. If your sequencing run was barcoded, use --barcode to run demultiplexing with porechop via artic demultiplex. Use --directory to specify the nanopore output directory, usually coded something like: <date>_<time>_<position>_<flowcell>_<ID_STRING>.
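
The options above could be put together as in the following sketch for a barcoded run processed with medaka. Both paths are placeholders (assumptions), and the conda profile is only one of the profiles described earlier; swap in whichever you use.

```shell
# Placeholder paths (assumptions); substitute your own.
RUN_DIR="/path/to/nanopore_run"
OUT_DIR="results"

# --medaka selects the medaka workflow; --barcode enables demultiplexing.
CMD="nextflow run seqeralabs/ncov2019-artic-nf --medaka --barcode --directory $RUN_DIR --outdir $OUT_DIR -profile conda"

# Print the assembled command so it can be checked before it is run.
echo "$CMD"
```

Replace --medaka with --nanopolish to use the nanopolish workflow instead.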
The Illumina workflow leans heavily on the excellent ivar for primer trimming and consensus making. This workflow will be updated to follow ivar, as it's also in very active development! Use --illumina to run the Illumina workflow. Use --directory to point to an Illumina output directory, usually coded something like: <date>_<machine_id>_<run_no>_<some_zeros>_<flowcell>. The workflow will recursively grab all fastq files under this directory, so be sure that what you want is in there, and what you don't, isn't!
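
Because the search is recursive, it can help to preview which files sit under the run directory before launching. The sketch below lists them with find. The helper name list_fastqs and the '*.fastq*' pattern are assumptions for illustration; the pipeline's own file matching may differ.

```shell
# Run directory to inspect; defaults to the current directory if RUN_DIR is unset.
RUN_DIR="${RUN_DIR:-.}"

# List every regular file below the given directory whose name contains ".fastq".
# The pattern is an assumption and may not match the pipeline's own glob exactly.
list_fastqs() {
    find "$1" -type f -name '*.fastq*' | sort
}

list_fastqs "$RUN_DIR"
```

If unexpected files appear in the listing, move them out of the run directory before starting the workflow.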
Important config options are:
Option | Description |
---|---|
allowNoprimer | Allow reads that don't have primer sequence? Ligation prep = false, nextera = true |
illuminaKeepLen | Length of Illumina reads to keep after primer trimming |
illuminaQualThreshold | Sliding window quality threshold for keeping reads after primer trimming (Illumina) |
mpileupDepth | Mpileup depth for ivar |
ivarFreqThreshold | ivar frequency threshold for variant |
ivarMinDepth | Minimum coverage depth to call variant |
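
One possible way to change these options without editing conf/base.config is to put overrides in a separate file and pass it with Nextflow's -c option. The sketch below assumes the options live in the params scope, which you should confirm in conf/base.config. The file name custom.config, the input path, and the values shown are arbitrary illustrations, not recommended settings.

```shell
# Write a small override file. The params scope and the values are
# assumptions for illustration only; check conf/base.config first.
cat > custom.config <<'EOF'
params {
    ivarMinDepth = 20
    illuminaKeepLen = 30
}
EOF

# -c layers an extra config file on top of the pipeline's own configuration.
CMD="nextflow -c custom.config run seqeralabs/ncov2019-artic-nf --illumina --directory /path/to/illumina_run"

# Print the assembled command so it can be checked before it is run.
echo "$CMD"
```

Keeping overrides in their own file leaves the repo's defaults untouched and makes it easy to record which settings were used for a given run.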