Large-scale whole exome sequencing analyses identified protein-coding variants associated with immune-mediated diseases in 350770 adults
This repository contains code for quality control, annotation, and association tests for common (logistic regression) and rare (SKAT) variants.
Each directory includes a README.txt file with more detailed information. We also provide key input and result files (excluding those in bed/bim/fam formats) to aid visualization and understanding.
Workflow: QC -> Annotation -> Association -> PostAnalysis
The complete analysis workflow begins with quality control (QC) from Step1 to Step4.
Following QC, variants are annotated: the primary analysis uses SnpEff to annotate rare variants and ANNOVAR to annotate common variants, while the case-control enrichment analysis uses VEP annotations.
Subsequently, variant-level and gene-based association tests are conducted separately for common and rare variants, and sensitivity analyses are performed to assess the robustness of the results.
Finally, several post-analyses are performed, including BHR burden heritability and genetic correlation analyses, Cox survival analysis, gene expression analysis, MR analysis, annotation of amino acid alterations, proteome-wide analysis, and PheWAS analysis. Key code for this section has been uploaded.
Plot 1:
Study Design. Created with Adobe, no analysis or code involved.
Plot 2:
- (A) Results of gene-based collapsing analysis for rare variants, main result 1. All code provided (QC, SnpEff, GRM.sh, SAIGE.sh).
- (B) Results of case-control enrichment. All code provided (QC, VEP, Case_Control.py).
Plot 3:
- (A) Results of variant-level association test for common variants. All code provided (QC, Annovar, common.sh, clump.sh).
- (B) Convergence of GWAS signals. Main GWAS code provided (GWAS.sh).
- (C) Pleiotropic effects of common variants. Summarizes results from 3A; no additional code provided.
Plot 4:
- (A) Burden heritability. All code provided (Heritage.R).
- (B) Genetic correlations of IMDs. All code provided (Correlation.R).
Plot 5:
- (A) Protein expression levels between mutation carriers and non-carriers. Compared using a simple t-test.
- (B) MR analysis. All code provided (MR.R).
- (C) Annotation of amino acid alterations.
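
The carrier versus non-carrier comparison in Plot 5 (A) is described only as a simple t-test. The following is a minimal sketch of such a comparison, not the code used in the study. It assumes Welch's unequal-variance form of the test, and the function name `welch_t` and the expression values are hypothetical, made up for illustration. It returns the test statistic and degrees of freedom; a p-value would come from the t distribution in a statistics library.

```python
import math
import statistics


def welch_t(carriers, non_carriers):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Hypothetical helper: compares mean protein expression between
    mutation carriers and non-carriers without assuming equal variances.
    """
    n1, n2 = len(carriers), len(non_carriers)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")

    mean1 = statistics.fmean(carriers)
    mean2 = statistics.fmean(non_carriers)
    # Sample variances (n - 1 denominator), each scaled by its group size
    se1 = statistics.variance(carriers) / n1
    se2 = statistics.variance(non_carriers) / n2

    t_stat = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_stat, df


if __name__ == "__main__":
    # Made-up normalized expression values, for illustration only
    carriers = [2.1, 2.5, 1.9, 2.8]
    non_carriers = [1.0, 1.4, 0.8, 1.2, 1.1]
    t_stat, df = welch_t(carriers, non_carriers)
    print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

Welch's form is chosen here because carrier groups for rare variants are typically much smaller than non-carrier groups, so equal variances are hard to justify; a pooled-variance test would be a small change to the denominator.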
Plot 6:
- (A) PheWAS analysis of rare variants. Similar to Plot 2A, no additional code provided.
- (B) PheWAS analysis of common variants. Similar to Plot 3A, no additional code provided.
- (C) PPI and clusters performed by STRING. Web-based API, no code used.
- (D) Single-cell expression analysis.