
lpembleton/sorting-hat



sorting-hat

Introduction

sorting-hat is a bioinformatics pipeline that demultiplexes restriction-enzyme genotyping-by-sequencing (GBS) data. The sequencing data are assumed to be single-end and to carry a single internal barcode, as in the GBS protocol (and its derivatives) described by Elshire et al. 2011.

The pipeline also computes standard QA/QC stats.

Pipeline Summary:

For Restriction Enzyme GBS Demultiplexing:

  1. Prepare keyfiles for Cutadapt input (awk)
  2. Trim the fastq file to remove the common adapter before looking for internal barcodes in step 3 (Cutadapt)
  3. Demultiplex the fastq file using the provided barcode keys (Cutadapt)
  4. Read quality and adapter trimming (fastp)
  5. Read QC (FastQC)
  6. Reporting (MultiQC)
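
To make step 3 concrete, here is a minimal Python sketch of what demultiplexing on an internal barcode means: each read is assigned to the sample whose barcode it starts with, and the barcode is removed. It illustrates the idea only. It is not how the pipeline or Cutadapt implements it (Cutadapt, for instance, can tolerate mismatches, whereas this sketch requires an exact match). The function name `demultiplex` and the `"unknown"` bin are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to the sample whose barcode starts the read.

    reads    -- iterable of (read_name, sequence) tuples
    barcodes -- dict mapping sample name -> barcode sequence
    Returns a dict mapping sample name -> list of (read_name, trimmed_sequence).
    Reads that match no barcode are collected under the key "unknown".
    """
    # Try longer barcodes first so a short barcode that is a prefix of a
    # longer one does not claim reads that belong to the longer barcode.
    ordered = sorted(barcodes.items(), key=lambda kv: len(kv[1]), reverse=True)
    bins = defaultdict(list)
    for name, seq in reads:
        for sample, barcode in ordered:
            if seq.startswith(barcode):  # exact match only (assumption)
                bins[sample].append((name, seq[len(barcode):]))
                break
        else:
            bins["unknown"].append((name, seq))
    return dict(bins)


# Toy data: barcodes as in the keyfile example, reads invented for illustration.
example_barcodes = {"sample01": "AACT", "sample02": "CGGT"}
example_reads = [
    ("read1", "AACTCAGCTTT"),
    ("read2", "CGGTCAGCAAA"),
    ("read3", "GGGGCAGCCCC"),
]
result = demultiplex(example_reads, example_barcodes)
```

In this toy run, `read1` lands in `sample01` with its four-base barcode removed, and `read3` lands in the `"unknown"` bin because it starts with neither barcode.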

Input Requirements:

The pipeline expects a CSV samplesheet as input, which should contain the run ID, the lane number, the path to the keyfile and the path to the fastq file. It should look similar to:

runid,lane,key,fastq_1
seq01,1,seq01-keyfile.txt,seq01_S2_L001_R1_001.fastq.gz
seq02,2,seq02-keyfile.txt,seq02_S3_L002_R1_001.fastq.gz

Note: the column names are important and must match the header shown above.
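
Because the column names matter, it can help to check a samplesheet before launching a run. The following Python sketch is one way to do that. It is not part of the pipeline, and the function name `read_samplesheet` is hypothetical. It only checks that the four columns named above are present and non-empty.

```python
import csv
import io

# Column names taken from the samplesheet example above.
REQUIRED_COLUMNS = ["runid", "lane", "key", "fastq_1"]


def read_samplesheet(handle):
    """Parse a samplesheet and return its rows as a list of dicts.

    Raises ValueError if a required column is missing from the header
    or if any required field is empty in a data row.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"samplesheet is missing column(s): {', '.join(missing)}")

    rows = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        empty = [c for c in REQUIRED_COLUMNS if not (row[c] or "").strip()]
        if empty:
            raise ValueError(f"line {line_no}: empty value for {', '.join(empty)}")
        rows.append({c: row[c].strip() for c in REQUIRED_COLUMNS})
    return rows


# Usage with the example samplesheet from this README.
example = io.StringIO(
    "runid,lane,key,fastq_1\n"
    "seq01,1,seq01-keyfile.txt,seq01_S2_L001_R1_001.fastq.gz\n"
    "seq02,2,seq02-keyfile.txt,seq02_S3_L002_R1_001.fastq.gz\n"
)
rows = read_samplesheet(example)
```

To check a file on disk, pass an open handle instead, for example `read_samplesheet(open("samplesheet.csv", newline=""))`.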

The keyfile itself should be a tab-separated text file (no header) containing the sample name and the associated barcode sequence. It should look similar to:

sample01    AACT
sample02    CGGT
sample03    TGCG
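
Step 1 of the pipeline summary prepares the keyfile for Cutadapt using awk. As a rough illustration of that kind of conversion, the Python sketch below turns keyfile text into a FASTA file of anchored 5' barcodes (each sequence prefixed with `^`), a format Cutadapt accepts for demultiplexing. Whether the pipeline produces exactly this output is an assumption. The function name `keyfile_to_fasta` is hypothetical, and sample names are assumed to contain no whitespace.

```python
def keyfile_to_fasta(keyfile_text):
    """Convert keyfile text (sample<TAB>barcode per line) to FASTA text.

    Each barcode is written with a leading '^', which marks it as an
    anchored 5' adapter in Cutadapt's adapter FASTA format.
    """
    records = []
    for line_no, line in enumerate(keyfile_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()  # splits on tabs (or any whitespace)
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(fields)}")
        sample, barcode = fields
        barcode = barcode.upper()
        if set(barcode) - set("ACGT"):
            raise ValueError(f"line {line_no}: barcode {barcode!r} is not A/C/G/T only")
        records.append(f">{sample}\n^{barcode}\n")
    return "".join(records)


# Usage with the example keyfile from this README.
fasta = keyfile_to_fasta("sample01\tAACT\nsample02\tCGGT\nsample03\tTGCG\n")
```

The same check also catches a keyfile that accidentally includes a header line, since a header's second field is unlikely to consist only of A, C, G and T.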

Usage

nextflow run main.nf --input samplesheet.csv --outdir <OUTDIR> -profile local

If no --outdir is provided, the pipeline creates one called 'demultiplexed' in the associated directory.

Pipeline Output

This pipeline outputs demultiplexed fastq files. The fastq files produced by quality and adapter trimming are not retained, as most downstream pipelines include this process in their initial steps. fastp and FastQC processing is only included to generate sequence quality reports that help users determine the appropriate next steps.

Credits

sorting-hat (the nf pipeline, not the Harry Potter hat) was originally written by LWPembleton.

A lot of inspiration and structure was taken from the Nextflow documentation, the fantastic nf-core community and modules.

Nextflow enables reproducible computational workflows.

Paolo Di Tommaso, Maria Chatzou, Evan Floden, Pablo Prieto Barja, Emilio Palumbo & Cedric Notredame.

P. Di Tommaso, et al. Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316–319 (2017) doi:10.1038/nbt.3820

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Citations
