sorting-hat

Introduction

sorting-hat is a bioinformatics pipeline that demultiplexes restriction-enzyme genotyping-by-sequencing (GBS) data. The sequencing data are assumed to be single-end reads with a single internal barcode, as in the GBS protocol described by Elshire et al. 2011 and its derivatives.

The pipeline also computes standard QA/QC stats.

Pipeline Summary:

For Restriction Enzyme GBS Demultiplexing:

1. Prepare keyfiles for cutadapt input (awk)
2. Trim the common adapter from the fastq file before searching for internal barcodes in step 3 (Cutadapt)
3. Demultiplex the fastq file using the provided barcode keys (Cutadapt)
4. Read quality and adapter trimming (fastp)
5. Read QC (FastQC)
6. Reporting (MultiQC)
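Step 3 delegates barcode matching to cutadapt, which can tolerate mismatches. As a rough illustration only (not sorting-hat's implementation), the Python sketch below assigns each read to the sample whose barcode exactly matches the start of the read and then trims that barcode off. Exact matching and the longest-barcode-first ordering are simplifying assumptions, and `demultiplex` is a hypothetical helper name.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign reads to samples by an exact barcode match at the 5' end.

    reads: iterable of (read_id, sequence) tuples.
    barcodes: dict mapping sample name -> barcode sequence.
    Returns a dict mapping sample name (or "unknown") -> list of
    (read_id, trimmed_sequence) tuples. Samples with no reads are absent.
    """
    # Check longer barcodes first so that a short barcode which happens to be a
    # prefix of a longer one does not claim its reads (GBS barcodes vary in length).
    ordered = sorted(barcodes.items(), key=lambda item: len(item[1]), reverse=True)
    assigned = defaultdict(list)
    for read_id, seq in reads:
        for sample, barcode in ordered:
            if seq.startswith(barcode):
                # Remove the barcode so downstream steps see only the insert.
                assigned[sample].append((read_id, seq[len(barcode):]))
                break
        else:
            # No barcode matched: keep the read untouched in an "unknown" bin.
            assigned["unknown"].append((read_id, seq))
    return dict(assigned)


if __name__ == "__main__":
    keys = {"sample01": "AACT", "sample02": "CGGT", "sample03": "TGCG"}
    reads = [("r1", "AACTCAGCTTGA"), ("r2", "TGCGCAGTTACC"), ("r3", "GGGGCAGTTT")]
    for sample, hits in demultiplex(reads, keys).items():
        print(sample, hits)
```

Reads that match no barcode end up in an "unknown" bin, which mirrors the usual practice of keeping unassigned reads for inspection rather than discarding them silently.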

Input Requirements:

The pipeline expects a CSV samplesheet as input containing the run ID, the lane number, the path to the keyfile and the path to the fastq file. It should look similar to:

runid,lane,key,fastq_1
seq01,1,seq01-keyfile.txt,seq01_S2_L001_R1_001.fastq.gz
seq02,2,seq02-keyfile.txt,seq02_S3_L002_R1_001.fastq.gz

Note: the column names are important; use the headers exactly as shown.
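Because the column names matter, it can help to check a samplesheet before launching. The Python sketch below is a hypothetical pre-flight check (`check_samplesheet` is not part of sorting-hat). It only verifies that the four headers shown above are present, that every field is non-empty and that the lane is an integer; these rules are assumptions inferred from the example, not the pipeline's own validation.

```python
import csv
import io

# Header names taken from the example samplesheet above.
REQUIRED_COLUMNS = ["runid", "lane", "key", "fastq_1"]


def check_samplesheet(handle):
    """Return a list of problems found in a samplesheet (empty if none)."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        # Without the expected headers, row-level checks are meaningless.
        return [f"missing column(s): {', '.join(missing)}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column in REQUIRED_COLUMNS:
            # Short rows yield None for absent fields, so treat None as empty.
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty '{column}'")
        if row["lane"] and not row["lane"].strip().isdigit():
            problems.append(f"line {line_no}: lane '{row['lane']}' is not an integer")
    return problems


if __name__ == "__main__":
    sheet = io.StringIO(
        "runid,lane,key,fastq_1\n"
        "seq01,1,seq01-keyfile.txt,seq01_S2_L001_R1_001.fastq.gz\n"
    )
    print(check_samplesheet(sheet))  # []
```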

The keyfile itself should be a tab-separated text file (no header) containing the sample name and the associated barcode sequence. It should look similar to:

sample01    AACT
sample02    CGGT
sample03    TGCG
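Step 1 of the pipeline uses awk to turn this keyfile into cutadapt input, but the exact format it writes is not described here. The Python sketch below assumes a FASTA barcode file, one record per sample, which cutadapt can read for demultiplexing (for example via -g ^file:barcodes.fasta, where ^ anchors each barcode to the 5' end of the read). `keyfile_to_fasta` is a hypothetical name, and its checks (exactly two tab-separated fields, A/C/G/T only, unique sample names and barcodes) are assumptions rather than rules stated by sorting-hat.

```python
import re

# Assumption: barcodes contain only unambiguous bases.
VALID_BARCODE = re.compile(r"[ACGT]+")


def keyfile_to_fasta(lines):
    """Convert keyfile lines ("sample<TAB>barcode") into FASTA text."""
    seen_samples, seen_barcodes = set(), set()
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated fields, got {len(fields)}"
            )
        sample, barcode = fields[0].strip(), fields[1].strip().upper()
        if not VALID_BARCODE.fullmatch(barcode):
            raise ValueError(f"line {line_no}: invalid barcode {barcode!r}")
        if sample in seen_samples or barcode in seen_barcodes:
            raise ValueError(f"line {line_no}: duplicate sample or barcode")
        seen_samples.add(sample)
        seen_barcodes.add(barcode)
        # The FASTA header is the name cutadapt substitutes for {name}
        # in a demultiplexing output template.
        records.append(f">{sample}\n{barcode}\n")
    return "".join(records)


if __name__ == "__main__":
    example = ["sample01\tAACT\n", "sample02\tCGGT\n", "sample03\tTGCG\n"]
    print(keyfile_to_fasta(example), end="")
```

Written to barcodes.fasta, this output could be combined with an output template such as -o "{name}.fastq.gz" so that each sample gets its own file; how sorting-hat actually invokes cutadapt may differ.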

Usage

nextflow run main.nf --input samplesheet.csv --outdir <outdir> -profile local

If --outdir is not provided, the pipeline creates an output directory called 'demultiplexed' in the launch directory.

Pipeline Output

This pipeline outputs demultiplexed fastq files. The fastq files produced by quality and adapter trimming are not retained, as most downstream pipelines include this process in their initial steps. Fastp and FastQC processing is included only to generate sequence quality reports that help users determine the appropriate next steps.
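Alongside the MultiQC report, a quick way to eyeball how evenly reads were split across samples is to count the reads in each demultiplexed file. The sketch below assumes gzipped FASTQ outputs with exactly four lines per record and a "*.fastq.gz" naming pattern; the pattern and the helper names `count_fastq_reads` and `read_counts` are hypothetical, since the pipeline's output file naming is not described here.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a FASTQ file (gzipped if the name ends in .gz).

    Assumes the standard four lines per record; no wrapped sequences.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def read_counts(directory, pattern="*.fastq.gz"):
    """Map each matching file name in `directory` to its read count."""
    return {p.name: count_fastq_reads(p) for p in sorted(Path(directory).glob(pattern))}


if __name__ == "__main__":
    import sys

    # Usage: python read_counts.py <directory with demultiplexed fastq.gz files>
    for name, n in read_counts(sys.argv[1] if len(sys.argv) > 1 else ".").items():
        print(f"{name}\t{n}")
```

Counting lines rather than parsing records keeps the check fast and dependency-free; a file whose line count is not a multiple of four is flagged as likely truncated.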

Credits

sorting-hat (the Nextflow pipeline, not the Harry Potter hat) was originally written by LWPembleton.

A lot of inspiration and structure was taken from the Nextflow documentation and from the fantastic nf-core community and its modules.

Citations

Nextflow enables reproducible computational workflows.
Paolo Di Tommaso, Maria Chatzou, Evan Floden, Pablo Prieto Barja, Emilio Palumbo & Cedric Notredame.
Nature Biotechnology 35, 316–319 (2017). doi:10.1038/nbt.3820

The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nature Biotechnology, published online 13 February 2020. doi:10.1038/s41587-020-0439-x
