A Data-Driven Sliding-window Pairwise Comparative Approach for the Estimation of Transmission Fitness of SARS-CoV-2 Variants and Construction of the Evolution Fitness Landscape
Preprint Link to the Manuscript Data Source
This repository contains the Differential Population Growth Rate (DPGR) model, a novel sliding-window, data-driven approach to estimate and compare the transmission fitness of SARS-CoV-2 variants. DPGR utilizes pairwise comparisons across specific time windows to quantify relative fitness advantages between viral strains, creating a dynamic fitness landscape that reflects the evolutionary trends of Variants of Concern (VOCs) such as Omicron and Delta. This generalized approach is appilcable to any viral surveillance dataset of exponentially growing viral populations.
By addressing key issues with sampling biases and allowing for real-time epidemiological insights, DPGR provides a robust framework that complements transmission fitness estimation.
Clone this repository to run locally:
git clone https://github.com/QinLab/DPGR2024.git
cd DPGR
Use Conda to create an environment named myenv
and activate it:
conda create --name myenv python=3.8
conda activate myenv
Now, install dependencies from requirements.txt:
pip install -r requirements.txt
Here, we highlight some key insights from applying the DPGR model:
Omicron consistently exhibited higher transmission fitness over Delta across various geographic levels (countries and continents).
As the method is pairwise, we can directly estimate the transmission fitness of the adjacent variants like DPGRAlpha,Beta, DPGRBeta, Delta and DPGRDelta.Omicron. To infer the transmission fitness of the non-adjacent variants, DPGRAlpha,Delta, DPGRAlpha, Omicron, the property of logarithms (log(a/b) = log(a/c) + log(c/b) is used. Thus, we can infer the growth advantages of non-adjacent variants. The initial variant Alpha is considered as the root variant. Using the pairwise distance matrix, the relative transmission fitness landscape and transmission fitness stair are constructed.
Here, is the Pairwise Distance matrix of SARS-CoV-2 Variants of Concern for the WHO labels:
The fitness landscape generated through pairwise DPGR analyses shows how each successive variant of concerns of SARS-CoV-2 dominates over the other. This model provides a clear visualization of the evolutionary trajectory in fitness between major VOCs.
Fitness Stair of the WHO Varinats of Concern(VOCs)
Fitness Stair of the Omicron Sublineages(BA.1* -BA.5*)
The neighbor Join tree helps to understand the dynamic evolution of variants in terms of relative transmission fitness.
Neighbor Join Tree for the WHO Varinats for United States:
├── Code # Jupyter notebooks for analysis and visualization
├── Figures # Visualizations
├── Files # Estimated Fitness values
├── requirements.txt # Dependency list
└── README.md # Project documentation