GWA mapping with C. elegans
- R-v3.6.0
- nextflow-v19.07.0
- BCFtools-v1.9
- plink-v1.9
- R-cegwas2
- R-tidyverse-v1.2.1
- R-correlateR
- R-rrBLUP-v4.6
- R-RSpectra-v0.13-1
- R-ggbeeswarm-v0.6
- R-qtl-v1.46-2
- R-sommer-v4.1.1
- R-genetics-v1.3.8.1.2
git clone https://github.com/AndersenLab/cegwas2-nf.git
cd cegwas2-nf
nextflow main.nf --traitfile=test_traits.tsv --vcf=bin/WI.20180527.impute.vcf.gz --p3d=TRUE --sthresh=BF
nextflow main.nf --help
- will display the help message
--traitfile
- A tab-delimited (.tsv) file that contains trait information. Each phenotype file should be in the following format (replace trait_name with the phenotype of interest):
strain | trait_name_1 | trait_name_2 |
---|---|---|
JU258 | 32.73 | 19.34 |
ECA640 | 34.065378 | 12.32 |
... | ... | ... |
ECA250 | 34.096 | 23.1 |
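Before launching the pipeline, it can help to sanity-check the trait file against the layout shown above. The following is a minimal sketch, not part of cegwas2-nf: the function name `check_traitfile` is hypothetical, and the rule that every trait value must parse as a number is an assumption (relax it if your version of the pipeline accepts missing values).

```python
import csv
import io


def check_traitfile(text):
    """Return a list of problems found in tab-delimited trait data (empty if none).

    Hypothetical helper: the numeric-only rule is an assumption, not pipeline behavior.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows or not rows[0]:
        return ["file is empty or has no header"]

    problems = []
    header = rows[0]
    if header[0] != "strain":
        problems.append("first column must be named 'strain'")
    if len(header) < 2:
        problems.append("no trait columns found")

    # Data lines start at line 2 of the file.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
            continue
        for name, value in zip(header[1:], row[1:]):
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: {name} is not numeric: {value!r}")
    return problems
```

To check a file on disk, read it first, for example `check_traitfile(open("test_traits.tsv").read())`.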
--p3d
- This determines what type of kinship correction to perform prior to mapping. TRUE corresponds to the EMMAx method and FALSE corresponds to the slower EMMA method. We recommend running with --p3d=TRUE to make sure all of the required files are present and in the proper format, then running with --p3d=FALSE for a more exact mapping. Default: FALSE.
--sthresh
- This determines the significance threshold required for performing post-mapping analysis of a QTL. BF corresponds to Bonferroni correction, EIGEN corresponds to correcting for the number of independent markers in your data set, and user-specified corresponds to a user-defined threshold, where you replace user-specified with a number. For example, --sthresh=4 will set the threshold to a -log10(p) value of 4. We recommend using the strict BF correction as a first pass to see what the resulting data looks like. If the pipeline stops at the summarize_maps process, no significant QTL were discovered with the input threshold. You might want to consider lowering the threshold if this occurs.
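To make the three threshold choices concrete, here is a small sketch of how a -log10(p) cutoff could be derived. It is an illustration, not the pipeline's code: the function names are hypothetical, the significance level of 0.05 is an assumption (the text above does not state it), and treating a value equal to the cutoff as significant is also an assumption.

```python
import math


def corrected_threshold(n_tests, alpha=0.05):
    """-log10(p) cutoff after dividing alpha by the number of tests.

    alpha=0.05 is an assumed default. Pass the total marker count for a
    Bonferroni-style (BF) cutoff, or the number of independent tests for
    an EIGEN-style cutoff.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return -math.log10(alpha / n_tests)


def is_significant(p_value, threshold):
    """True if the marker's -log10(p) reaches the cutoff (inclusive by assumption)."""
    return -math.log10(p_value) >= threshold
```

With a user-specified threshold such as --sthresh=4, the cutoff is simply the number given, so `is_significant(p, 4.0)` would apply it directly. Because the number of independent tests is never larger than the number of markers, the EIGEN-style cutoff is never stricter than the BF one.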
--vcf
- A VCF file with variant data. All strains with phenotypes should be represented in the VCF used for mapping. There should also be a tabix-generated index file (.tbi) in the same folder as the specified VCF file that has the same name as the VCF except for the addition of the .tbi extension (generated using tabix -p vcf vcfname.vcf.gz). If this flag is not used, a VCF for the C. elegans species will be downloaded from CeNDR.
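The index naming rule above can be checked before a run. This is a minimal sketch with hypothetical helper names; it only looks for a file with the expected name and does not verify that the index is valid or up to date.

```python
from pathlib import Path


def expected_index_path(vcf_path):
    """Path of the tabix index expected next to the VCF: same name plus '.tbi'."""
    vcf = Path(vcf_path)
    return vcf.with_name(vcf.name + ".tbi")


def has_tabix_index(vcf_path):
    """True if the expected .tbi file exists in the same folder as the VCF."""
    return expected_index_path(vcf_path).is_file()
```

For the test command shown earlier, `expected_index_path("bin/WI.20180527.impute.vcf.gz")` points at `bin/WI.20180527.impute.vcf.gz.tbi`.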
--freqUpper
- Upper bound for variant allele frequency for a variant to be considered for burden mapping. Default = 0.5
--minburden
- The number of strains that must share a variant for that variant to be considered for burden mapping. Default = 2
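As an illustration of how the two burden-mapping filters combine, the sketch below decides whether a single variant would be kept. It is not taken from the pipeline: the function name, the 0/1 encoding of genotypes, the handling of missing calls, and the inclusive comparisons are all assumptions. Only the default values (0.5 and 2) come from the descriptions above.

```python
def passes_burden_filter(genotypes, freq_upper=0.5, min_burden=2):
    """Decide whether a variant qualifies for burden mapping.

    genotypes: one entry per strain, 1 if the strain carries the variant
    allele, 0 if not, None if the call is missing (assumed encoding).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    carriers = sum(called)
    frequency = carriers / len(called)
    # Both bounds are treated as inclusive, which is an assumption.
    return carriers >= min_burden and frequency <= freq_upper
```

For example, a variant carried by 2 of 5 strains (frequency 0.4) passes with the defaults, while a variant seen in a single strain does not.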
--refflat
- Genomic locations for genes used for burden mapping. A default generated from WS245 is provided in the repository's bin directory.
--genes
- Genomic locations for genes formatted for plotting purposes. A default generated from WS245 is provided in the repository's bin directory.
--fix_names
- This will query the CeNDR strain set and resolve any discrepancies between your strain set and isotype names on CeNDR. This is important if you are not providing your own VCF. If you provide your own VCF that contains the strains you phenotyped, you do not need to fix strain names (Default = "fix", change to anything but fix to skip).
Get_GenoMatrix_Eigen.R
- Takes a genotype matrix and chromosome name as input and identifies the number of significant eigenvalues.
Fix_Isotype_names.R
- Takes sample names present in phenotype data and changes them to isotype names found on CeNDR when the --traitdir flag is used.
Run_Mappings.R
- Performs GWA mapping using the rrBLUP R package and the EMMA or EMMAx algorithm for kinship correction. Generates a Manhattan plot and a phenotype-by-genotype plot for peak positions.
Summarize_Mappings.R
- Generates a plot of all QTL identified in the nextflow pipeline.
Finemap_QTL_Intervals.R
- Runs EMMA/EMMAx on the QTL region of interest. Generates a fine-map plot, colored by LD with the peak QTL SNV found from the genome-wide scan.
plot_genes.R
- Runs SnpEff and generates a gene plot.
makeped.R
- Converts trait .tsv files to .ped format for burden mapping.
rvtest
- Executable to run burden mapping; can be found at the RVtests homepage.
plot_burden.R
- Plots the results from burden mapping.
Fix_Isotype_names_bulk.R
- Takes sample names present in phenotype data and changes them to isotype names found on CeNDR when the --traitfile flag is used.
Genotype_Matrix
  ├── Genotype_Matrix.tsv
  ├── total_independent_tests.txt
Mappings
  ├── Data
      ├── traitname_processed_mapping.tsv
      ├── QTL_peaks.tsv
  ├── Plots
      ├── traitname_manplot.pdf
      ├── traitname_pxgplot.pdf
      ├── Summarized_mappings.pdf
Fine_Mappings
  ├── Data
      ├── traitname_snpeff_genes.tsv
  ├── Plots
      ├── traitname_qtlinterval_finemap_plot.pdf
      ├── traitname_qtlinterval_gene_plot.pdf
BURDEN
  ├── VT
      ├── Data
          ├── traitname.VariableThresholdPrice.assoc
      ├── Plots
          ├── traitname_VTprice.pdf
  ├── SKAT
      ├── Data
          ├── traitname.Skat.assoc
      ├── Plots
          ├── traitname_SKAT.pdf
Genotype_Matrix.tsv
- LD-pruned genotype matrix used for GWAS and construction of the kinship matrix
total_independent_tests.txt
- Number of independent tests determined through spectral decomposition of the genotype matrix
traitname_processed_mapping.tsv
- Processed mapping data frame for each trait mapped
QTL_peaks.tsv
- List of significant QTL identified across all traits
traitname_manplot.pdf
- Manhattan plot for each trait that was analyzed. Two significance threshold lines are present, one for the Bonferroni-corrected threshold and another for the spectral decomposition threshold.
traitname_pxgplot.pdf
- Phenotype-by-genotype split at peak QTL positions for every significant QTL identified
Summarized_mappings.pdf
- A summary plot of all QTL identified
traitname_snpeff_genes.tsv
- Fine-mapping data frame for all significant QTL
traitname_qtlinterval_finemap_plot.pdf
- Fine-map plot of the QTL interval, colored by marker LD with the peak QTL identified from the genome-wide scan
traitname_qtlinterval_gene_plot.pdf
- Variant annotation plot overlaid with gene CDS for the QTL interval
traitname.VariableThresholdPrice.assoc
- Genome-wide burden mapping result using VT price, see the RVtests homepage
traitname.Skat.assoc
- Genome-wide burden mapping result using SKAT, see the RVtests homepage
traitname_VTprice.pdf
- Genome-wide burden mapping Manhattan plot for VT price
traitname_SKAT.pdf
- Genome-wide burden mapping Manhattan plot for SKAT
main_mediation.nf
- Mapping and mediation.
mediation.nf
- Mediation on main.nf results.
mediation_STDcegwas2.nf
- Mediation on standard cegwas2-nf results.
--transcripteQTL
- eQTL peak file
--transcript_exp
- Expression file, input of eQTL calling
--traitfile
--cegwas2dir
- Randomly sample 1 trait from trait file
- Permute the trait 200 times
- Run EMMA mapping with the BF and EIGEN thresholds, respectively
- Among all the -log10(p) values that passed the threshold, get the 5% FDR
- Call expression QTL with EMMA and draw eQTL map
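The first two steps of the list above (sampling one trait and permuting it 200 times) can be sketched as follows. This is an illustration only: the function names and the fixed seed are hypothetical choices, and the mapping and FDR steps are left to the pipeline because their exact procedure is not described here.

```python
import random


def pick_trait(trait_names, rng):
    """Randomly sample one trait name from the trait file's columns."""
    return rng.choice(list(trait_names))


def permute_trait(strains, values, n_permutations=200, seed=0):
    """Shuffle trait values across strains n_permutations times.

    Returns a list of {strain: value} dicts. Each permutation keeps the same
    set of values but breaks the link between strain and phenotype.
    The seed is an illustrative default so runs are repeatable.
    """
    rng = random.Random(seed)
    permutations = []
    for _ in range(n_permutations):
        shuffled = list(values)
        rng.shuffle(shuffled)
        permutations.append(dict(zip(strains, shuffled)))
    return permutations
```

Each returned dict could then be written out as a single-trait file and mapped like any other phenotype.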
--pos
- transcript or gene positions in the genome