Home
This wiki contains instructions for running a demo of Griffin on a small bam file. The demo has two steps, which can be run independently:
- GC correction
- Nucleosome profiling
To run this demo, we recommend that you use a conda environment to install the correct package versions. Testing was performed using conda 4.10.3 (https://docs.conda.io/en/latest/miniconda.html).
This demo uses relative paths that assume a specific file structure within your Griffin folder. You can use different paths for your work if desired (e.g., if you keep your reference genome in a different location); just update the config.yaml to reflect the new paths.
Instructions for initializing the conda environment. Use this conda environment for all steps.
conda create --name griffin_demo python=3.7.4
conda activate griffin_demo
#or: source activate griffin_demo
pip install snakemake==5.19.2
conda install pandas=1.3.2
conda install scipy=1.7.1
conda install -c bioconda pyBigWig=0.3.17
pip install matplotlib==3.4.1
conda install -c bioconda samtools=1.13
conda install -c bioconda bedtools=2.29.2
pip install pybedtools==0.8.0 #also installs pysam-0.19.0
conda install numpy=1.21.2
conda install nomkl numpy scipy
#this prevents the error discussed in issue #14 (Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD)
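Once the installs finish, it can be worth confirming that the command-line tools are visible on your PATH inside the environment. The following is a minimal sketch; the helper name `check_tool` is illustrative and not part of Griffin.

```shell
# Report whether a command is available on PATH (helper name is illustrative).
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "MISSING: $1"
        return 1
    fi
}

# Tools installed by the commands above; the loop keeps going after a miss.
for tool in snakemake samtools bedtools; do
    check_tool "$tool" || true
done
```

Any tool reported as MISSING usually means the environment is not activated or the corresponding install step failed.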
Total time to run griffin_GC_and_mappability_correction: ~45 minutes
-
If you haven't already activated the conda environment, activate it:
conda activate griffin_demo
-
Copy snakemakes/griffin_GC_and_mappability_correction/ to a location where you would like to do the analysis (in this demo, we will use a directory called run_demo):
mkdir run_demo
cp -r snakemakes/griffin_GC_and_mappability_correction run_demo
-
Download the reference genome from the link below (if you don't have wget, you can open the link in a browser), unzip it, and put it in Ref (the download may take ~10 minutes):
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
gunzip hg38.fa.gz
mv hg38.fa Ref/
-
Download the mappability track (1.2 GB) from the link below (if you don't have wget, you can open the link in a browser) and put it in Ref (the download may take ~15 minutes):
wget https://hgdownload.soe.ucsc.edu/gbdb/hg38/hoffmanMappability/k100.Umap.MultiTrackMappability.bw
mv k100.Umap.MultiTrackMappability.bw Ref/
-
Convert the demo cram file to bam and create an index (takes ~1 minute):
samtools view -b -T Ref/hg38.fa -o demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.cram
samtools index demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam
-
Navigate to the folder with the snakefile:
cd run_demo/griffin_GC_and_mappability_correction/
-
Open the samples.yaml (run_demo/griffin_GC_and_mappability_correction/config/samples.yaml) and update the path to the demo bam file:
samples:
  Healthy_demo: ../../demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam
-
If you do NOT want the snakemake to use the default 8 CPUs, open the cluster_slurm.yaml (run_demo/griffin_GC_and_mappability_correction/config/cluster_slurm.yaml) and edit the ncpus parameter (line 18 and line 23) for the GC_counts step (other parameters in this file are not used unless launching to a slurm cluster). Increasing the CPUs will parallelize genomic regions to make the analysis run faster, as long as your computer has the CPUs available.
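To decide on a sensible value for ncpus, you can first check how many CPUs the machine reports. This is a small sketch using standard utilities; the function name `available_cpus` is illustrative.

```shell
# Print the number of online CPUs; falls back to getconf where nproc is absent.
available_cpus() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN
    fi
}

echo "CPUs available: $(available_cpus)"
```

Setting ncpus higher than this number is unlikely to speed up the run.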
-
Run the snakemake (expected runtime: ~15 minutes with 8 CPUs):
snakemake -s griffin_GC_and_mappability_correction.snakefile --cores 1 -np #dry run to print a list of jobs
snakemake -s griffin_GC_and_mappability_correction.snakefile --cores 1 #runs one job at a time
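If the dry run complains about missing input files, the paths in config/samples.yaml are a likely cause, because they are relative to the folder containing the snakefile. The sketch below lists each path and flags those that do not exist. It assumes the flat "name: path" layout shown above and paths without spaces; `check_sample_paths` is an illustrative helper, not part of Griffin.

```shell
# List sample paths from a flat "name: path" samples.yaml and flag missing files.
# Assumes paths contain no spaces; helper name is illustrative.
check_sample_paths() {
    yaml_file="$1"
    rc=0
    while IFS= read -r sample_path; do
        [ -n "$sample_path" ] || continue
        if [ -f "$sample_path" ]; then
            echo "ok: $sample_path"
        else
            echo "missing: $sample_path"
            rc=1
        fi
    done <<EOF
$(awk '/^[ \t]+[^:#]+:[ \t]*[^ \t]/ { print $NF }' "$yaml_file")
EOF
    return "$rc"
}

# Run from the folder that contains the snakefile, since the paths are relative.
if [ -f config/samples.yaml ]; then
    check_sample_paths config/samples.yaml || true
fi
```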
-
The outputs should be identical to the expected outputs in demo/griffin_GC_correction_demo_files/expected_results/:
Healthy_demo.GC_bias.txt md5sum: 29d34798c67edad2c371cedb94b3a8b8
Total time to run the griffin_nucleosome_profiling demo: ~15 minutes
-
If you haven't already activated the conda environment, activate it:
conda activate griffin_demo
-
Copy snakemakes/griffin_nucleosome_profiling/ to a location where you would like to do the analysis (in this demo, we will use a directory called run_demo):
mkdir -p run_demo #-p avoids an error if you have already made this directory
cp -r snakemakes/griffin_nucleosome_profiling run_demo
-
If you haven't already downloaded the reference genome (during the GC correction demo above), download it from the link below (if you don't have wget, you can open the link in a browser), unzip it, and put it in Ref (the download may take a few minutes; alternatively, you can symlink an existing copy into the Ref folder):
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
gunzip hg38.fa.gz
mv hg38.fa Ref/
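If you already keep a copy of hg38.fa elsewhere, the symlink option might look like the sketch below. The helper name `link_reference` and the example source path are hypothetical; substitute the real location of your copy.

```shell
# Link an existing hg38.fa into Ref/ instead of downloading it again.
# Helper name and the example source path are illustrative.
link_reference() {
    src="$1"
    ref_dir="${2:-Ref}"
    if [ ! -f "$src" ]; then
        echo "not found: $src"
        return 1
    fi
    mkdir -p "$ref_dir"
    # Resolve to an absolute path so the link stays valid from any directory.
    src_abs="$(cd "$(dirname "$src")" && pwd)/$(basename "$src")"
    ln -sf "$src_abs" "$ref_dir/$(basename "$src")"
}

# Example (hypothetical location of an existing copy):
# link_reference /data/genomes/hg38.fa Ref
```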
-
If you haven't already downloaded the mappability track (during the GC correction demo above), download it from the link below (if you don't have wget, you can open the link in a browser) and put it in Ref (the download may take ~15 minutes):
wget https://hgdownload.soe.ucsc.edu/gbdb/hg38/hoffmanMappability/k100.Umap.MultiTrackMappability.bw
mv k100.Umap.MultiTrackMappability.bw Ref/
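The "if you haven't already downloaded" condition can be scripted so that repeated runs skip files that are already in place. The sketch below also falls back to curl when wget is unavailable; this fallback is an addition, not something the original instructions mention, and `fetch_if_missing` is an illustrative name. An interrupted download can leave a partial file behind, which this check would treat as present, so delete partial files before retrying.

```shell
# Download a URL to a destination unless a non-empty file is already there.
# Helper name is illustrative; the curl fallback is an assumption of this sketch.
fetch_if_missing() {
    url="$1"
    dest="$2"
    if [ -s "$dest" ]; then
        echo "already present: $dest"
        return 0
    fi
    mkdir -p "$(dirname "$dest")"
    if command -v wget >/dev/null 2>&1; then
        wget -O "$dest" "$url"
    elif command -v curl >/dev/null 2>&1; then
        curl -L -o "$dest" "$url"
    else
        echo "neither wget nor curl found; download $url in a browser"
        return 1
    fi
}

# Example, using the mappability track URL from the step above:
# fetch_if_missing https://hgdownload.soe.ucsc.edu/gbdb/hg38/hoffmanMappability/k100.Umap.MultiTrackMappability.bw Ref/k100.Umap.MultiTrackMappability.bw
```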
-
If you haven't already converted the demo cram file to a bam file, convert it (takes ~1 minute):
samtools view -b -T Ref/hg38.fa -o demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.cram
samtools index demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam
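The conversion can be made safe to rerun by skipping it when an up-to-date bam file already exists. The following sketch assumes you run it from the top of the Griffin folder, as in the commands above; `needs_conversion` is an illustrative helper.

```shell
# Return 0 when the bam is absent, empty, or older than the cram it came from.
needs_conversion() {
    bam_file="$1"
    cram_file="$2"
    [ ! -s "$bam_file" ] || [ "$cram_file" -nt "$bam_file" ]
}

bam=demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam
cram=demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.cram
if [ -f "$cram" ] && needs_conversion "$bam" "$cram"; then
    samtools view -b -T Ref/hg38.fa -o "$bam" "$cram"
    samtools index "$bam"
else
    echo "skipping conversion (bam is up to date or cram not found)"
fi
```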
-
Navigate to the folder with the snakefile:
cd run_demo/griffin_nucleosome_profiling/
-
Open the sites.yaml (run_demo/griffin_nucleosome_profiling/config/sites.yaml) and update the path to the demo sites file:
site_lists:
  CTCF_demo: ../../demo/griffin_nucleosome_profiling_demo_files/sites/CTCF.hg38.1000.txt
-
Open the samples.GC.yaml (run_demo/griffin_nucleosome_profiling/config/samples.GC.yaml) and update the path to the demo bam file and GC correction file (if you have run the GC correction demo, you can use the path to your results instead):
samples:
  Healthy_demo:
    bam: ../../demo/bam/Healthy_GSM1833219_downsampled.sorted.mini.bam
    GC_bias: ../../demo/griffin_GC_correction_demo_files/expected_results/Healthy_demo.GC_bias.txt
-
Run the snakemake (expected runtime: ~1 minute):
snakemake -s griffin_nucleosome_profiling.snakefile --cores 1 -np #dry run to print a list of jobs
snakemake -s griffin_nucleosome_profiling.snakefile --cores 1
-
The outputs should be identical to the expected outputs in demo/griffin_nucleosome_profiling_demo_files/expected_results/:
Healthy_demo.GC_corrected.coverage.tsv md5sum: bb7c6b730d44bf201f4091836aa970d4
Healthy_demo.uncorrected.coverage.tsv md5sum: 9ade7b4d069ba39b2d00a5f0609d41cd
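As an alternative to checking digests one at a time, you can compare your outputs byte for byte against the expected_results folder. The sketch below assumes your output files sit in a single directory and share file names with the expected files; the output directory in the commented example is a placeholder, and `compare_to_expected` is an illustrative name.

```shell
# Compare each expected file with the same-named file in an output directory.
# Helper name is illustrative; assumes matching file names in a flat directory.
compare_to_expected() {
    expected_dir="$1"
    actual_dir="$2"
    rc=0
    for expected_file in "$expected_dir"/*; do
        [ -f "$expected_file" ] || continue
        name=$(basename "$expected_file")
        if cmp -s "$expected_file" "$actual_dir/$name" 2>/dev/null; then
            echo "identical: $name"
        else
            echo "differs or missing: $name"
            rc=1
        fi
    done
    return "$rc"
}

# Example (the second argument is a placeholder for your own output folder):
# compare_to_expected demo/griffin_nucleosome_profiling_demo_files/expected_results path/to/your/outputs
```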