giraffe-svs-nf

Genotyping structural variants with pangenomes

Introduction

This is a nextflow pipeline for genotyping structural variants from short reads using a pangenome reference and various tools in the VG toolkit.

It requires the following inputs:

giraffe indices for a pangenome reference graph, including gbz, min, and dist
paired-end short reads for a set of sample you want to genotype
the reference to genotype against, in fasta format --- must be the same one chosen as the reference when building the pangenome
a sample sheet in csv format giving sample IDs and absolute paths to fastqs for each sample to be genotyped (see Preparing data)

and produces a single gzipped vcf containing genotype calls for all of the samples.

This pipeline is based on the process used to genotype SVs in Sirén et al. (2021).

Installation

On Lewis

If you are using the Lewis cluster and are a member of the warrenlab-group, you can just install nextflow in a conda environment if you haven't already, and then run the pipeline with the argument -profile lewis.

With conda

If you have conda installed and in your bin, nextflow can create an environment to run all tasks in if you run the pipeline with the argument -profile conda.

Requirements

If using one of the methods above, you don't need to worry about installing the requirements yourself, but here they are in case you do:

Preparing data

You'll need to make a sample sheet in csv format with columns named sample_id, r1, and r2. The sample ID should be a unique identifier for that individual, and absolute paths to forward and reverse fastq files should be specified in the r1 and r2 columns, respectively. If there are multiple pairs of forward and reverse fastq file for a single sample, you can have multiple rows for that sample.

Here is an example sample sheet:

sample_id,r1,r2
princess,/data/reads/princess_R1.fastq.gz,/data/reads/princess_R2.fastq.gz
junebug,/data/reads/junebug_R1.fastq.gz,/data/reads/junebug_R2.fastq.gz
oreo,/data/reads/oreo_L1_R1.fastq.gz,/data/reads/oreo_L1_R2.fastq.gz
oreo,/data/reads/oreo_L2_R1.fastq.gz,/data/reads/oreo_L2_R2.fastq.gz
freddie,/data/reads/freddie_R1.fastq.gz,/data/reads/freddie_R2.fastq.gz

Running

To run the pipeline, use the following command:

nextflow run WarrenLab/giraffe-svs-nf \
     --sample-sheet samples.csv \
     --gbz-index index/Amex-pg.full.gbz \
     --dist-index index/Amex-pg.full.dist \
     --min-index index/Amex-pg.full.min \
     --ref-fasta index/surface.fa
     --ref-path surface

--ref-path is the name of the reference path in the pangenome graph, as specified when building it.

N.B. You do not need to download or clone the code in this repository. The command above tells nextflow fetch the pipeline code from this repository and run it.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
LICENSE		LICENSE
README.md		README.md
conda.yml		conda.yml
main.nf		main.nf
nextflow.config		nextflow.config

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

giraffe-svs-nf

Introduction

Installation

On Lewis

With conda

Requirements

Preparing data

Running

About

Releases

Packages

Languages

License

WarrenLab/giraffe-svs-nf

Folders and files

Latest commit

History

Repository files navigation

giraffe-svs-nf

Introduction

Installation

On Lewis

With conda

Requirements

Preparing data

Running

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages