This pipeline is in the early stages of development and is not fully tested!
The CPS extractor pipeline is a Nextflow pipeline designed to process Streptococcus pneumoniae FASTA sequences, extract the capsular locus sequence (CPS), and check it for disruptive mutations.
The pipeline is designed to be easy to set up and use, and is suitable for use on local machines and high-performance computing (HPC) clusters alike. Once you have downloaded the necessary Docker/Singularity images, the pipeline can be used offline unless you change the selection of any database or container image.
The development of this pipeline is part of the GPS Project (Global Pneumococcal Sequencing Project).
- A POSIX-compatible system (e.g. Linux, macOS, Windows with WSL) with Bash 3.2 or later
- Java 11 or later, up to 21 (OpenJDK or Oracle Java)
- Docker or Singularity/Apptainer
- On Linux, Singularity/Apptainer or Docker Engine is recommended over Docker Desktop for Linux, which is known to cause permission issues when running the pipeline.
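
To verify that your environment meets these requirements, you can check the installed versions with standard commands (shown below for both container engine options; run whichever applies to your setup):

```bash
# Check the Bash version (3.2 or later is required)
bash --version

# Check the Java runtime version (11 up to 21 is required)
java -version

# Check that a supported container engine is available
docker --version        # if using Docker
singularity --version   # if using Singularity/Apptainer
```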
- Only assembled genome sequences in FASTA format are supported
- Each sample is expected to be a single FASTA file following this file name pattern:
  - `*.{fa,fasta}`
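
For illustration only (the sample names below are hypothetical), an input directory could look like this:

```
input/
├── sample_1.fasta
├── sample_2.fa
└── sample_3.fasta
```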
- By default, Docker is used as the container engine and all processes are executed on the local machine. To change this, you can use Nextflow's built-in `-profile` option to switch to another available profile.

  > ℹ️ `-profile` is a built-in Nextflow option; it takes only one leading `-` (dash).

  ```
  nextflow run . -profile [profile name]
  ```
- Available profiles:

  | Profile Name | Details |
  | --- | --- |
  | `standard` (Default) | Docker is used as the container engine. Processes are executed locally. |
  | `singularity` | Singularity is used as the container engine. Processes are executed locally. |
  | `lsf` | The pipeline should be launched from an LSF cluster head node with this profile. Singularity is used as the container engine. Processes are submitted to your LSF cluster via `bsub` by the pipeline. (Tested on the Wellcome Sanger Institute farm5 LSF cluster only) |
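
For example, to run the pipeline locally with Singularity, or to launch it from an LSF cluster head node:

```bash
# Run locally using Singularity as the container engine
nextflow run . -profile singularity

# Launch from an LSF cluster head node; processes are submitted to LSF via bsub by the pipeline
nextflow run . -profile lsf
```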
```
Usage:
nextflow run . [option] [value]

--bakta_threads [INT]             Threads used for Bakta. Default: 4
--bakta_db [PATH]                 Path to the Bakta database. Default: /data/pam/software/bakta/v5
--blastdb [PATH]                  Path to the BLAST database. Default: cps_blastdb
--input [PATH]                    Path to the input directory that contains the sequences to be processed {.fa,.fasta}. Default: input
--output [PATH]                   Path to the output directory where the results are saved. Default: output
--prodigal-training-file [PATH]   Path to the Prodigal training file used in annotation. Default: all.trn
--version                         Alternative workflow for getting versions of the pipeline, container images, tools and databases
--help                            Print this help message
```
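
As an illustration, a typical run that overrides some of the defaults might look like the following (the paths are hypothetical placeholders; replace them with your own):

```bash
# Hypothetical paths; adjust to your environment
nextflow run . \
    --input /path/to/assemblies \
    --output /path/to/results \
    --bakta_db /path/to/bakta_db \
    --bakta_threads 8
```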