A software package and executable bioinformatics workflow for the analysis of recombinant adeno-associated virus (rAAV) products by PacBio long-read sequencing.
- For a full explanation of the methods and example results on public PacBio datasets, see the preprint paper on bioRxiv: Standardized Nomenclature and Reporting for PacBio HiFi Sequencing and Analysis of rAAV Gene Therapy Vectors
- For a summary of technical methods, AAV type/subtype definitions, and interpretation, see: Design and definitions
- For answers to frequently asked questions, see the FAQ
LAAVA can be used as an end-to-end Nextflow workflow, an interactive Docker container, or individual scripts in this codebase.
This code can be run as a standard Nextflow workflow. When run this way, the workflow will automatically pull in the analysis scripts and their dependencies as a Docker image.
To get started, create a JSON file with your parameter values, similar to params-local-ss.json in this repo, and run it with:
nextflow run -profile local -params-file <your-params-file.json> main.nf
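As a sketch of this step, the snippet below writes a minimal parameter file and then launches the workflow. The parameter keys shown here are hypothetical placeholders, not LAAVA's actual parameter names; copy the real keys and values from params-local-ss.json in this repo.

```shell
# Write a minimal params file.
# NOTE: these keys are illustrative placeholders -- take the real
# parameter names from params-local-ss.json in this repo.
cat > my-params.json <<'EOF'
{
  "seq_reads_file": "sample.hifi_reads.bam",
  "vector_fa": "construct.fasta",
  "vector_type": "ss"
}
EOF

# Then launch the workflow (requires Nextflow and Docker installed):
# nextflow run -profile local -params-file my-params.json main.nf
```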
For exploratory analysis or troubleshooting, you can also run the laava Docker image directly on the command line as an interactive container.
Assuming you have Docker installed, fetch the container image:
docker pull ghcr.io/formbio/laava:latest
Then run it interactively in your current directory:
docker run -v $(pwd):/data -w /data -it ghcr.io/formbio/laava:latest bash
You can also download or clone the repo to use the scripts directly.
$ git clone https://github.com/formbio/laava.git
There are several ways to satisfy the script dependencies locally.
The laava_dev.dockerfile in this repo installs the scripts' dependencies, but not the scripts themselves, into a Docker container image that you can then use to run the local copies of the scripts. This allows you to edit the code in this repo in place and run it within the container environment without rebuilding the container.
To build the container image with the name laava_dev (you can use another name if you prefer):
docker build -t laava_dev:latest -f laava_dev.dockerfile .
To run the container in the current working directory:
docker run -v $(pwd):$(pwd) -w $(pwd) -it laava_dev:latest bash
This opens a Bash shell with the scripts in the PATH, and the original working directory mounted in place.
The conda channels and dependencies are listed in the configuration file conda_env.yml.
With this environment and a LaTeX installation (via e.g. apt), you'll have all the
dependencies you need to run LAAVA scripts directly on Linux, and nearly everything
you need on Mac.
The same environment is also used by the Docker container images internally.
First, install conda via Miniconda or Anaconda.
Next, use the YAML configuration file to create a new conda environment and install its dependencies:
$ conda env create -f conda_env.yml
Finally, once installation completes, activate the new environment:
$ source activate laava
At this point the prompt should change to (laava) $, and the executable scripts should be available in your PATH.
This repo includes small test files based on PacBio's public AAV sequencing examples.
The test/ subdirectory in this repo contains small example input files and a Makefile that runs the scripts to reanalyze them and produce example HTML and PDF reports.
If you have Docker, Nextflow, and Make available, you can run a variety of tests from the top directory of this repo.
- make test -- run both test samples using the Docker image directly, skipping Nextflow, and check the results quantitatively.
- make sc -- run the example self-complementary AAV (scAAV) sample with the Nextflow pipeline. This takes about 1-2 minutes.
- make ss -- run the example single-stranded AAV (ssAAV) sample. This takes about 2-3 minutes, including an additional flip/flop analysis step.
- make all -- run both example AAV samples using Nextflow.
- make min -- run the scAAV sample with the minimum number of required parameters, exercising the default behavior including guessing the construct vector type (sc/ss).
- make folder -- run both samples via folder input, exercising a batch processing mode.
Each of these commands will generate example HTML and PDF reports from the test datasets included in the repo, which you can view locally.