This pipeline is in the early stages of development and is not fully tested!
The CPS extractor pipeline is a Nextflow pipeline designed to process Streptococcus pneumoniae FASTA sequences, extract the capsular locus sequence (CPS), and check it for disruptive mutations.
The pipeline is designed to be easy to set up and use, and is suitable for use on local machines and high-performance computing (HPC) clusters alike. Once you have downloaded the necessary Docker/Singularity images, the pipeline can be used offline unless you change the selection of any database or container image.
The development of this pipeline is part of the GPS Project (Global Pneumococcal Sequencing Project).
- A POSIX-compatible system (e.g. Linux, macOS, Windows with WSL) with Bash 3.2 or later
- Java 11 or later, up to 21 (OpenJDK or Oracle Java)
- Docker or Singularity/Apptainer
- On Linux, Singularity/Apptainer or Docker Engine is recommended over Docker Desktop for Linux, which is known to cause permission issues when running the pipeline.
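
To verify that your environment meets these requirements, you can check the installed versions with standard commands (shown below for both container engine options; run whichever applies to your setup):

```bash
# Check the Bash version (3.2 or later is required)
bash --version

# Check the Java runtime version (11 up to 21 is required)
java -version

# Check that a supported container engine is available
docker --version        # if using Docker
singularity --version   # if using Singularity/Apptainer
```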
- Only assembled genome sequences in FASTA format are supported
- Each sample is expected to be a single FASTA file following this file name pattern:
  - `*.{fa,fasta}`
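
For illustration only (the sample names below are hypothetical), an input directory could look like this:

```
input/
├── sample_1.fasta
├── sample_2.fa
└── sample_3.fasta
```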
- By default, Docker is used as the container engine and all processes are executed on the local machine. To change this, you can use Nextflow's built-in `-profile` option to switch to another available profile.

  > ℹ️ `-profile` is a built-in Nextflow option; it takes only one leading `-` (dash).

  ```
  nextflow run . -profile [profile name]
  ```
- Available profiles:

  | Profile Name | Details |
  | --- | --- |
  | `standard` (Default) | Docker is used as the container engine. Processes are executed locally. |
  | `singularity` | Singularity is used as the container engine. Processes are executed locally. |
  | `lsf` | The pipeline should be launched from an LSF cluster head node with this profile. Singularity is used as the container engine. Processes are submitted to your LSF cluster via `bsub` by the pipeline. (Tested on the Wellcome Sanger Institute farm5 LSF cluster only) |
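
For example, to run the pipeline locally with Singularity, or to launch it from an LSF cluster head node:

```bash
# Run locally using Singularity as the container engine
nextflow run . -profile singularity

# Launch from an LSF cluster head node; processes are submitted to LSF via bsub by the pipeline
nextflow run . -profile lsf
```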
```
Usage:
nextflow run . [option] [value]

--bakta_threads [INT]             Threads used for Bakta. Default: 4
--bakta_db [PATH]                 Path to the Bakta database. Default: /data/pam/software/bakta/v5
--blastdb [PATH]                  Path to the BLAST database. Default: cps_blastdb
--input [PATH]                    Path to the input directory that contains the sequences to be processed {.fa,.fasta}. Default: input
--output [PATH]                   Path to the output directory where the results are saved. Default: output
--prodigal-training-file [PATH]   Path to the Prodigal training file used in annotation. Default: all.trn
--version                         Alternative workflow for getting versions of the pipeline, container images, tools and databases
--help                            Print this help message
```
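
As an illustration, a typical run that overrides some of the defaults might look like the following (the paths are hypothetical placeholders; replace them with your own):

```bash
# Hypothetical paths; adjust to your environment
nextflow run . \
    --input /path/to/assemblies \
    --output /path/to/results \
    --bakta_db /path/to/bakta_db \
    --bakta_threads 8
```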