Quality control, annotation, association test, and post analysis

Sirius-Yang/IMDs_WES

Large-scale whole exome sequencing analyses identified protein-coding variants associated with immune-mediated diseases in 350770 adults

This repository contains code for quality control, annotation, and association testing of common variants (logistic regression) and rare variants (SKAT).

Each directory includes a README.txt file with more detailed information. We also provide key input and result files (excluding those in bed/bim/fam format) to make the analyses easier to visualize and understand.


Workflow: QC -> Annotation -> Association -> PostAnalysis

The analysis workflow begins with quality control (QC), Step1 through Step4.

Following QC, variants are annotated: the primary analysis uses SnpEff annotations for rare variants and ANNOVAR annotations for common variants, while the case-control enrichment analysis uses VEP annotations.

Next, variant-level and gene-based association tests are conducted separately for common and rare variants, and sensitivity analyses are performed to assess the robustness of the results.

Finally, several post-analyses are performed, including burden heritability (BHR) and genetic correlation analysis, Cox survival analysis, gene expression analysis, Mendelian randomization (MR) analysis, annotation of amino acid alterations, proteome-wide analysis, and PheWAS. We have uploaded the main code for this section.


Plot 1:

Study Design. Created with Adobe, no analysis or code involved.

Plot 2:

  • (A) Results of gene-based collapsing analysis for rare variants (main result 1). All code provided (QC, SnpEff, GRM.sh, SAIGE.sh).
  • (B) Results of case-control enrichment. All code provided (QC, VEP, Case_Control.py).
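The README does not describe what Case_Control.py computes internally. As a rough illustration of the idea behind Plot 2, here is a minimal sketch, assuming that qualifying rare variants in a gene are collapsed into a per-sample carrier indicator and that enrichment is then tested on the resulting 2x2 carrier-by-status table with Fisher's exact test. All function names are hypothetical, and this is not the repository's implementation (the main rare-variant analysis uses SAIGE).

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: gene-level collapsing plus a case-control enrichment
# test. Not taken from the repository's Case_Control.py.


def collapse_gene(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """Collapse qualifying variants in one gene into a per-sample 0/1
    carrier indicator. genotypes[v][i] is the alt-allele count (0/1/2)
    of sample i at qualifying variant v."""
    n_samples = len(genotypes[0])
    return [int(any(variant[i] > 0 for variant in genotypes)) for i in range(n_samples)]


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table
    [[a, b], [c, d]] = [[case carriers, case non-carriers],
                        [control carriers, control non-carriers]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers among the cases,
        # holding all row and column totals fixed.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    p_value = 0.0
    # Sum over every table with the same margins that is no more likely
    # than the observed one.
    for x in range(max(0, row1 - (n - col1)), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p_value += p
    return min(1.0, p_value)


def case_control_enrichment(carrier: Sequence[int], is_case: Sequence[bool]) -> Tuple[float, float]:
    """Return (odds ratio, two-sided Fisher p-value) for carrier status in cases vs controls."""
    table = [[0, 0], [0, 0]]  # rows: case, control; columns: carrier, non-carrier
    for carries, case in zip(carrier, is_case):
        table[0 if case else 1][0 if carries else 1] += 1
    (a, b), (c, d) = table
    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = math.inf if a * d else math.nan
    return odds_ratio, fisher_exact_two_sided(a, b, c, d)


if __name__ == "__main__":
    carrier = collapse_gene([[1, 0, 1, 0, 0, 0, 0, 0],
                             [0, 1, 2, 0, 1, 0, 0, 0]])
    is_case = [True] * 4 + [False] * 4
    print(carrier, case_control_enrichment(carrier, is_case))
```

In this toy example, 3 of 4 cases and 1 of 4 controls carry a qualifying variant, giving an odds ratio of 9. At biobank scale, a regression framework that adjusts for covariates and relatedness (as SAIGE does for Plot 2A) is usually preferred to a raw 2x2 test.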

Plot 3:

  • (A) Results of variant-level association tests for common variants. All code provided (QC, Annovar, common.sh, clump.sh).
  • (B) Convergence of GWAS signals. Main GWAS code provided (GWAS.sh).
  • (C) Pleiotropy effects of common variants. Summarizes results from 3A, no additional code provided.
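The README does not document the parameters used in clump.sh. The following minimal sketch illustrates greedy LD clumping of association results, assuming a precomputed table of pairwise LD r² values. The thresholds (p < 5e-8, r² ≥ 0.1, ±250 kb window) are illustrative assumptions, not the study's settings. The sketch also uses a single p-value threshold for both index and clumped variants, which is a simplification of tools like PLINK's `--clump`, where these thresholds can be set separately.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical sketch of greedy LD clumping; thresholds are illustrative only.


def greedy_clump(
    variants: List[Tuple[str, int, float]],  # (variant_id, position_bp, p_value), one chromosome
    r2: Dict[FrozenSet[str], float],         # pairwise LD r^2 keyed by unordered variant pair
    p_threshold: float = 5e-8,
    r2_threshold: float = 0.1,
    window_bp: int = 250_000,
) -> Dict[str, List[str]]:
    """Return {index_variant: [variants clumped into it]}."""
    # Only variants passing the p-value threshold take part; strongest first.
    ordered = sorted((v for v in variants if v[2] < p_threshold), key=lambda v: v[2])
    assigned: Set[str] = set()
    clumps: Dict[str, List[str]] = {}
    for vid, pos, _ in ordered:
        if vid in assigned:
            continue  # already absorbed by a stronger index variant
        assigned.add(vid)
        members = []
        for other_id, other_pos, _ in ordered:
            if other_id in assigned or abs(other_pos - pos) > window_bp:
                continue
            # Absorb variants in LD with the index variant at or above the threshold;
            # pairs missing from the LD table are treated as unlinked.
            if r2.get(frozenset((vid, other_id)), 0.0) >= r2_threshold:
                members.append(other_id)
                assigned.add(other_id)
        clumps[vid] = members
    return clumps


if __name__ == "__main__":
    hits = [("rsA", 1_000, 1e-10), ("rsB", 2_000, 1e-9), ("rsC", 500_000, 1e-8), ("rsD", 3_000, 0.01)]
    ld = {frozenset(("rsA", "rsB")): 0.8, frozenset(("rsA", "rsD")): 0.9}
    print(greedy_clump(hits, ld))
```

Here rsB is absorbed into the rsA clump, rsC lies outside the window and forms its own clump, and rsD is never considered because it fails the p-value threshold.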

Plot 4:

  • (A) Burden heritability. All code provided (Heritage.R).
  • (B) Genetic correlations of IMDs. All code provided (Correlation.R).

Plot 5:

  • (A) Protein expression levels between mutation carriers and non-carriers. A simple t-test.
  • (B) Mendelian randomization (MR) analysis. All code provided (MR.R).
  • (C) Annotation of amino acid alterations.
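The README describes Plot 5A only as "a simple t-test" and does not list the estimators used in MR.R. The following minimal sketch shows two common choices, both assumptions on our part. The first is Welch's unequal-variance t statistic for protein levels in carriers vs non-carriers. The second is the fixed-effect inverse-variance weighted (IVW) MR estimate from per-variant summary statistics, where the estimate is the sum of `bx * by / se_y^2` divided by the sum of `bx^2 / se_y^2`. Function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

# Hypothetical sketches; the t-test variant and MR estimator are assumptions.


def welch_t(carriers: Sequence[float], non_carriers: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n1, n2 = len(carriers), len(non_carriers)
    v1 = statistics.variance(carriers) / n1      # squared standard error of mean, carriers
    v2 = statistics.variance(non_carriers) / n2  # squared standard error of mean, non-carriers
    t = (statistics.mean(carriers) - statistics.mean(non_carriers)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error, using
    first-order weights bx^2 / se_y^2 on the per-variant ratio estimates by / bx."""
    numerator = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    total_weight = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return numerator / total_weight, math.sqrt(1 / total_weight)


if __name__ == "__main__":
    print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
    print(ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01]))
```

A p-value for the t statistic would come from the t distribution with `df` degrees of freedom (for example via `scipy.stats.ttest_ind(..., equal_var=False)`). In practice, MR analyses usually report IVW alongside pleiotropy-robust estimators such as MR-Egger or weighted median.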

Plot 6:

  • (A) PheWAS analysis of rare variants. Similar to Plot 2A, no additional code provided.
  • (B) PheWAS analysis of common variants. Similar to Plot 3A, no additional code provided.
  • (C) Protein-protein interaction (PPI) network and clusters generated with STRING. Web-based, no code used.
  • (D) Single-cell expression analysis.