Serverless Variant Calling

Datasets

Trypanosome

Reference genome (FASTA, 35 MB): (Link)[https://tritrypdb.org/tritrypdb/app/downloads/Current_Release/TbruceiTREU927/fasta/data/]
Sequence Reads:
- SRR6052133* (FASTQGZip, 668 MB): (Link)[https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR6052133&display=download]

Human

Reference genome (FASTA GZip, 905 MB): (Link)[http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/]
Sequence Reads:
- SRR15068323 (FASTQGZip, 1.2 GB): (Link)[https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR15068323&display=data-access]
- ERR9856489 (FASTQGZip, 12.1 GB): (Link)[https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=ERR9856489&display=data-access]

Bos taurus

Reference genome (FASTA GZip, 781 MB): (Link)[https://www.ensembl.org/Bos_taurus/Info/Index]
Sequence Reads:
- SRR934415* (FASTQGZip, 16.5 GB): (Link)[https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR934415&display=data-access]

*The pipeline currently support only single-end sequences. Use only the first sequence for paired-end reads.

Setup

Setup Lithops for AWS backend.
Build the runtime in the dockerfile directory :

$ lithops runtime build -f lambda.Dockerfile serverless-genomics:1

Configure Lithops to use the built runtime (e.g. serverless-genomics:1). The required runtime memory, ephemeral disk size and timeout will depend on the number of partitions and
Create an S3 bucket with the required input datasets.
Use VariantCallingPipeline to execute the pipeline. Provide the necessary parameters. You can see a complete list in file serverlessgenomics/pipeline.py. The required parameters are:
- run_id: ID of a specific run. It can be reused for failed runs. You must change the ID if different input data are used.
- fasta_path: Path of the input sequence read (s3://...).
- fasta_chunks: Number of FASTA partitions.
- fastq_path: Path of the input reference genome (s3://...).
- fastq_chunks: Number of FASTQ partitions.
- storage_bucket: Temporary data S3 bucket name. It must exist.
Call VariantCallingPipeline.preprocess() for the pre-processing stage, VariantCallingPipeline.alignment() to run the alignment phase, VariantCallingPipeline.reduce() for the alignment phase or VariantCallingPipeline.run_pipeline() for a complete execution.

Article

You can read more about this pipeline in the published article Scaling a Variant Calling Genomics Pipeline with FaaS, presented in WoSC '23: Proceedings of the 9th International Workshop on Serverless Computing, part of MIDDLEWARE 2023 24th ACM/IFIP International Middleware Conference: https://dl.acm.org/doi/10.1145/3631295.3631403 (Preprint)

Name		Name	Last commit message	Last commit date
Latest commit History 174 Commits
dockerfile		dockerfile
serverlessgenomics		serverlessgenomics
.gitignore		.gitignore
README.md		README.md
abort_mpu.py		abort_mpu.py
cli.py		cli.py
cost_estimator.py		cost_estimator.py
example.py		example.py
generate_stats.py		generate_stats.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Serverless Variant Calling

Datasets

Setup

Article

About

Releases

Packages

Contributors 5

Languages

CLOUDLAB-URV/serverless-genomics-variant-calling

Folders and files

Latest commit

History

Repository files navigation

Serverless Variant Calling

Datasets

Setup

Article

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Languages

Packages