GMWI2 (Gut Microbiome Wellness Index 2) is a robust and biologically interpretable predictor of health status based on gut microbiome taxonomic profiles.
On a stool metagenome sample, this command-line tool performs four major steps:
- Quality control
  - Removal of overrepresented sequences (probable adapter sequences) using FastQC
  - Removal of human DNA contaminants (reads that map to GRCh38/hg38) using Bowtie2
  - Removal of adapter sequences and low-quality reads using Trimmomatic
- Taxonomic profiling using MetaPhlAn3 (v3.0.13) with the mpa_v30_CHOCOPhlAn_201901 marker database
- Transformation of taxonomic relative abundances into a binary presence/absence profile
- Computation of the GMWI2 score using a Lasso-penalized logistic regression model trained on a meta-dataset of 8,069 stool shotgun metagenomes labeled with health status
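The last two steps can be illustrated with a small Python sketch. It is not the packaged implementation: the presence cutoff, taxon names, coefficients, and intercept are made-up placeholders, and it assumes the score is the linear predictor of the logistic regression model (the intercept plus the coefficients of the taxa that are present).

```python
from typing import Dict

# Hypothetical presence cutoff; the cutoff used by the real tool is not stated here.
PRESENCE_THRESHOLD = 0.0


def to_presence_absence(abundances: Dict[str, float],
                        threshold: float = PRESENCE_THRESHOLD) -> Dict[str, int]:
    """Map each taxon's relative abundance to 1 (present) or 0 (absent)."""
    return {taxon: int(value > threshold) for taxon, value in abundances.items()}


def linear_score(presence: Dict[str, int],
                 coefficients: Dict[str, float],
                 intercept: float) -> float:
    """Intercept plus the coefficients of all taxa marked present.

    Taxa without a coefficient (e.g. zeroed out by the Lasso penalty) contribute 0.
    """
    return intercept + sum(
        coefficients.get(taxon, 0.0)
        for taxon, present in presence.items()
        if present
    )


# Made-up values, for illustration only.
example_abundances = {"taxon_A": 12.5, "taxon_B": 0.0, "taxon_C": 0.3}
example_coefficients = {"taxon_A": 0.5, "taxon_B": -0.75}
example_presence = to_presence_absence(example_abundances)
example_score = linear_score(example_presence, example_coefficients, intercept=0.25)
```

Because the profile is binary, only which taxa are detected matters in this sketch, not how abundant they are.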
If you use GMWI2, please cite:
Gut Microbiome Wellness Index 2 Enhances Health Status Prediction from Gut Microbiome Taxonomic Profiles. Chang and Gupta et al., Nature Communications (2024).
GMWI2 is supported for macOS and Linux, and has been tested on the following systems:
- macOS Big Sur 11.7.10
- CentOS Linux 7 (Core)
To avoid dependency conflicts, create an isolated conda environment and install the GMWI2 package into it. Installation via conda/mamba automatically installs GMWI2 and its dependencies. Be sure to complete the fourth and final step below so that the databases are downloaded and installed. Installation should take ~30 minutes.
- Create new conda environment and install mamba
conda create --name gmwi2_env -c conda-forge mamba python=3.8
- Activate environment
conda activate gmwi2_env
- Install GMWI2 package with mamba
mamba install -c bioconda -c conda-forge gmwi2=1.6
- Download/install databases (and verify that the package was installed correctly) by running GMWI2 on a tiny simulated stool metagenome. This tool automatically installs databases during the first run (should take ~20 minutes). To avoid issues in downloading databases, please run this step before submitting multiple concurrent batch jobs.
# download the tiny stool metagenome
wget https://raw.githubusercontent.com/danielchang2002/GMWI2/main/example/tiny/tiny_1.fastq
wget https://raw.githubusercontent.com/danielchang2002/GMWI2/main/example/tiny/tiny_2.fastq
gmwi2 -f tiny_1.fastq -r tiny_2.fastq -n 16 -o tiny
Try downloading and running GMWI2 on a real example stool metagenome from the pooled dataset used to develop GMWI2 (should take ~20 minutes).
Input: Two (forward/reverse) raw fastq (or fastq.gz) files generated from paired-end stool metagenome reads
Output: The GMWI2 (Gut Microbiome Wellness Index 2) score
usage: gmwi2 [-h] -n NUM_THREADS -f FORWARD -r REVERSE -o OUTPUT_PREFIX [-v]
Example usage:
$ ls
.
├── forward.fastq
└── reverse.fastq
$ gmwi2 -f forward.fastq -r reverse.fastq -n 8 -o output_prefix
$ ls
.
├── forward.fastq
├── reverse.fastq
├── output_prefix_GMWI2.txt
├── output_prefix_GMWI2_taxa.txt
└── output_prefix_metaphlan.txt
The three output files are:
(i) output_prefix_GMWI2.txt: GMWI2 score
(ii) output_prefix_GMWI2_taxa.txt: A list of the taxa present in the sample used to compute GMWI2
(iii) output_prefix_metaphlan.txt: Raw MetaPhlAn3 taxonomic profiling output
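For scripted runs, the call above can be wrapped in Python. The sketch below uses only the flags shown in the usage line; the helper names (`build_gmwi2_command`, `expected_outputs`, `run_gmwi2`) are hypothetical, and it assumes the score file contains a single number.

```python
import subprocess
from pathlib import Path
from typing import List


def build_gmwi2_command(forward: str, reverse: str, threads: int, prefix: str) -> List[str]:
    """Assemble the gmwi2 call using only the flags from the usage line."""
    return ["gmwi2", "-f", forward, "-r", reverse, "-n", str(threads), "-o", prefix]


def expected_outputs(prefix: str) -> List[Path]:
    """Paths of the three files gmwi2 writes for a given output prefix."""
    suffixes = ("_GMWI2.txt", "_GMWI2_taxa.txt", "_metaphlan.txt")
    return [Path(prefix + suffix) for suffix in suffixes]


def run_gmwi2(forward: str, reverse: str, threads: int, prefix: str) -> float:
    """Run gmwi2 and return the score read from <prefix>_GMWI2.txt."""
    subprocess.run(build_gmwi2_command(forward, reverse, threads, prefix), check=True)
    missing = [path for path in expected_outputs(prefix) if not path.exists()]
    if missing:
        raise FileNotFoundError(f"missing output files: {missing}")
    # Assumption: the score file holds a single numeric value.
    return float(expected_outputs(prefix)[0].read_text().strip())
```

Calling `run_gmwi2("forward.fastq", "reverse.fastq", 8, "output_prefix")` would mirror the shell example, provided `gmwi2` is on the `PATH` of the active conda environment.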
optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

required named arguments:
  -n NUM_THREADS, --num_threads NUM_THREADS
                        number of threads
  -f FORWARD, --forward FORWARD
                        forward-read of metagenome (.fastq/.fastq.gz)
  -r REVERSE, --reverse REVERSE
                        reverse-read of metagenome (.fastq/.fastq.gz)
  -o OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX
                        prefix to designate output file names
To merge GMWI2 score output files from multiple samples into a single csv file, please run:
echo "Sample,GMWI2" > merged.csv && for file in *GMWI2.txt; do echo "$(basename "$file" | awk -F "_GMWI2.txt" '{print $1}'),$(cat "$file")" >> merged.csv; done
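The same merge can be done in Python if a shell one-liner is inconvenient. This is a rough equivalent rather than part of the package: `merge_scores` is a hypothetical helper, it matches files ending in `_GMWI2.txt`, and it copies each file's contents verbatim into the second column.

```python
import csv
from pathlib import Path

SUFFIX = "_GMWI2.txt"


def merge_scores(directory: str, out_csv: str = "merged.csv") -> int:
    """Write a Sample,GMWI2 table from every *_GMWI2.txt file; return the row count."""
    rows = []
    for path in sorted(Path(directory).glob("*" + SUFFIX)):
        sample = path.name[: -len(SUFFIX)]  # strip the suffix to get the sample name
        rows.append((sample, path.read_text().strip()))
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Sample", "GMWI2"])
        writer.writerows(rows)
    return len(rows)
```

For example, `merge_scores(".")` writes `merged.csv` in the current directory, with one row per sample sorted by file name.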
We highly recommend that you use the conda tool to compute GMWI2 scores, as the tool checks that you use the correct MetaPhlAn version and marker database!
However, if you have already run MetaPhlAn on your metagenomes (and are sure that you used the correct MetaPhlAn version and marker database!) and would like to compute GMWI2 scores directly from the MetaPhlAn output files, please run the following:
# download script
wget https://raw.githubusercontent.com/danielchang2002/GMWI2/refs/heads/main/src/gmwi2_metaphlan_output.py
# download linear model
wget https://raw.githubusercontent.com/danielchang2002/GMWI2/refs/heads/main/src/GMWI2/GMWI2_databases/GMWI2_model.joblib
# run script on MetaPhlAn output
python3 gmwi2_metaphlan_output.py metagenome_metaphlan_output.txt GMWI2_model.joblib output_prefix
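Before scoring existing MetaPhlAn output, it may be worth confirming which marker database produced it. The sketch below assumes that the profile's first line is a `#`-prefixed header naming the database (for example `#mpa_v30_CHOCOPhlAn_201901`). The helper names are hypothetical, and the check does not verify the MetaPhlAn version itself.

```python
EXPECTED_DATABASE = "mpa_v30_CHOCOPhlAn_201901"


def database_from_header(path: str) -> str:
    """Return the first header line of a MetaPhlAn profile without its leading '#'."""
    with open(path) as handle:
        first_line = handle.readline().strip()
    if not first_line.startswith("#"):
        raise ValueError(f"{path} does not start with a '#' header line")
    return first_line[1:]


def uses_expected_database(path: str) -> bool:
    """True if the profile's header names the marker database GMWI2 expects."""
    return database_from_header(path) == EXPECTED_DATABASE
```

A profile for which `uses_expected_database("metagenome_metaphlan_output.txt")` returns `False` was likely generated with a different marker database and should be re-profiled instead of being scored directly.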
Please use the Colab notebook linked above to reproduce all downstream analyses on the pooled dataset. See the manuscript directory for more details.
The top image was generated via OpenAI DALL·E 2 using the prompt: "3D render of GPU chip in the form of a poop emoji, digital art". The image was then widened using the Runway Infinite Image tool.