Version v0.4.10
SARS-CoV-2 variant calling and consensus assembly pipeline for ARTIC v3 amplicons sequenced on Illumina or Oxford Nanopore platforms
Build the Docker image:
docker build -t covid19 .
Run the pipeline in the Docker image:
docker \
run \
--rm \
--workdir /data \
--volume `pwd`:/data \
--entrypoint /bin/bash \
--env prefix=test-covid19 \
--env reference=reference/nCoV-2019.reference.fasta \
--env input_fastq=data/twist-target-capture/RNA_control_spike_in_10_6_100k_reads.fastq.gz \
--env primer_bed_file=reference/artic-v1/ARTIC-V3.bed \
covid19 \
jobscript.sh
For Oxford Nanopore:
docker \
run \
--rm \
--workdir /data \
--volume `pwd`:/data \
--entrypoint /bin/bash \
--env prefix=test-covid19 \
--env INSTRUMENT_VENDOR="Oxford Nanopore" \
--env reference=reference/nCoV-2019.reference.fasta \
--env input_fastq=data/twist-target-capture/RNA_control_spike_in_10_6_100k_reads.fastq.gz \
--env primer_bed_file=reference/artic-v1/ARTIC-V3.bed \
covid19 \
jobscript.sh
This currently produces a consensus.fa file, a variants.vcf, a BAM file (covid19.bam), nextstrain results (nextstrain.json) and pangolin results (pangolin.csv).
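As a quick sanity check after a run, something like the following sketch could confirm that the outputs listed above exist and are non-empty. The function name check_outputs is hypothetical and not part of the pipeline. The file names are taken verbatim from the list above; whether the real outputs carry the configured prefix or land in a subdirectory is not stated here, so adjust the names if they differ.

```shell
# Hypothetical helper (not part of the pipeline): report which of the
# expected output files are missing or empty in a given directory.
# Usage: check_outputs [output_dir]   (defaults to the current directory)
check_outputs() {
  dir="${1:-.}"
  missing=0
  # File names are assumed from the README text; adjust if yours are prefixed.
  for f in consensus.fa variants.vcf covid19.bam nextstrain.json pangolin.csv; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  # Returns 0 when every file is present and non-empty, 1 otherwise.
  return "$missing"
}
```

Calling check_outputs . from the directory mounted at /data would print any missing file to stderr and return a non-zero status, which makes it easy to chain after the docker run command.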
To run tests, run pytest.
This repository includes a local requirements.txt file for quickly running golden output tests across a variety of datasets. It is set up to use GitHub Actions to automatically build the Docker image and run those tests, so that parameter and pipeline changes don't introduce regressions in variant calls or consensus sequence generation.
Currently, the following integration tests are run:
- Simulated Illumina data from the SARS-CoV-2 reference including simulated variants across the genome
- Example Twist hybrid capture data (Illumina)
- Example ARTIC v1 amplicon sequencing data (Illumina)
The repository also uses pre-commit to keep things clean and orderly. To get started, first install the requirements (Python 3 required): pip install -r requirements.txt. Then install the pre-commit hooks: pre-commit install --install-hooks. Note that you'll also need shellcheck installed on your system (brew install shellcheck on a Mac).
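Before installing the hooks, it can help to confirm that the tools mentioned above are on your PATH. The following is a minimal sketch; missing_tools is a hypothetical helper name, and the tool list in the usage note is an assumption based on the setup steps described here.

```shell
# Hypothetical helper: print the name of each given tool that is not on PATH.
# Prints nothing when every tool is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For example, missing_tools python3 pip pre-commit shellcheck lists whichever of those commands still need to be installed.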
Many thanks are due across the community, including but not limited to:
- @tseemann, @gkarthik, @nickloman, and many others for quick discussions on optimal SNP calling for both amplicon (ARTIC primers) and non-amplicon sequencing approaches
- @nickloman, @joshquick, @rambaut, @k-florek and others working on the ARTIC protocol for SARS-CoV-2
- @pangolin and @nextclade for surveillance tools
- Voigt lab for dnaplotlib