aeijpe
diff --git a/.gitignore b/.gitignore
+8
diff --git a/LICENSE.md b/LICENSE.md
+175
diff --git a/README.md b/README.md
+95
diff --git a/docs/fig1.jpg b/docs/fig1.jpg
1.12 MB
diff --git a/docs/heatmap.png b/docs/heatmap.png
1.94 MB
diff --git a/docs/joint_logo.png b/docs/joint_logo.png
286 KB
diff --git a/docs/mmp.png b/docs/mmp.png
157 KB
diff --git a/docs/mmp_logo.png b/docs/mmp_logo.png
601 KB
diff --git a/environment.yml b/environment.yml
+149
diff --git a/src/__init__.py b/src/__init__.py
diff --git a/src/configs/H2T_default/config.json b/src/configs/H2T_default/config.json
+8
diff --git a/src/configs/OT_default/config.json b/src/configs/OT_default/config.json
+21
diff --git a/src/configs/PANTHER_default/config.json b/src/configs/PANTHER_default/config.json
+15
diff --git a/src/configs/ProtoCount_default/config.json b/src/configs/ProtoCount_default/config.json
+8
diff --git a/src/data_csvs/rna/hallmarks/BLCA/rna_clean.csv b/src/data_csvs/rna/hallmarks/BLCA/rna_clean.csv
+361
diff --git a/src/data_csvs/rna/hallmarks/BRCA/rna_clean.csv b/src/data_csvs/rna/hallmarks/BRCA/rna_clean.csv
+940
diff --git a/src/data_csvs/rna/hallmarks/COADREAD/rna_clean.csv b/src/data_csvs/rna/hallmarks/COADREAD/rna_clean.csv
+321
diff --git a/src/data_csvs/rna/hallmarks/KIRC/rna_clean.csv b/src/data_csvs/rna/hallmarks/KIRC/rna_clean.csv
+607
diff --git a/src/data_csvs/rna/hallmarks/LUAD/rna_clean.csv b/src/data_csvs/rna/hallmarks/LUAD/rna_clean.csv
+577
diff --git a/src/data_csvs/rna/hallmarks/STAD/rna_clean.csv b/src/data_csvs/rna/hallmarks/STAD/rna_clean.csv
+337
@@ -0,0 +1,8 @@
+**/__pycache__/
+**/embeddings/
+**/wandb/
+.ipynb_checkpoints
+src/results/
+.DS_Store
+.vscode
+
@@ -0,0 +1,95 @@
+# MMP
+
+
+  <b>Multimodal Prototyping for cancer survival prediction</b>, ICML 2024.
+  <br><em>Andrew H. Song, Richard J. Chen, Guillaume Jaume, Anurag Vaidya, Alexander S. Baras, Faisal Mahmood</em>
+
+<img src="docs/mmp_logo.png" width="250px" align="right" />
+
+[Paper](https://openreview.net/pdf?id=3MfvxH3Gia) | [Cite](#acknowledgements)
+
+**Abstract:** Multimodal survival methods combining gigapixel histology whole-slide images (WSIs) and
+transcriptomic profiles are particularly promising for patient prognostication and stratification.
+Current approaches involve tokenizing the WSIs into smaller patches (> 10k patches) and transcriptomics into gene groups, which are then integrated using a Transformer for predicting outcomes. However, this process generates many
+tokens, which leads to high memory requirements for computing attention and complicates post-hoc interpretability analyses. Instead, we hypothesize that we can: (1) effectively summarize the morphological content of a WSI by
+condensing its constituting tokens using morphological prototypes, achieving more than 300× compression; and (2) accurately characterize cellular functions by encoding the transcriptomic profile with biological pathway prototypes, all
+in an unsupervised fashion. 
+
+We introduce the **M**ulti**M**odal **P**rototyping framework (**MMP**), in which the resulting multimodal tokens are processed by a fusion network, using either a Transformer or an optimal transport cross-alignment, which now operates on a small and fixed number of tokens without approximations. Extensive evaluation shows that our framework outperforms state-of-the-art methods with much less computation while unlocking new interpretability analyses.
+
+**MMP** (a.k.a. **M**ulti**M**odal **P**anther) is a multimodal extension of our companion work **PANTHER** (*CVPR 2024*, [paper](https://openaccess.thecvf.com/content/CVPR2024/html/Song_Morphological_Prototyping_for_Unsupervised_Slide_Representation_Learning_in_Computational_Pathology_CVPR_2024_paper.html), [code](https://github.com/mahmoodlab/PANTHER)), so we encourage you to check it out!
+
+<img src="docs/fig1.jpg" width="1400px" align="center" />
+
+## Updates
+- 07/02/2024: The first version of the MMP codebase is now live!
+
+## Installation
+Please run the following command to create the MMP conda environment:
+```shell
+conda env create -f environment.yml
+```
+
+## MMP Walkthrough
+MMP can largely be broken down into four steps:
+
+**Step 1**: Construct histology prototypes (across the specific cancer cohort) and aggregate tissue patch tokens to each prototype for each patient.\
+**Step 2**: Construct pathway prototypes and aggregate transcriptomic expression tokens to each prototype for each patient.\
+**Step 3**: Fuse the aggregated histology and pathway embeddings and perform the downstream task.\
+**Step 4**: Visualization.
+
+### Step 1. Morphology prototype construction
+For instructions on **Step 1**, please refer to the instructions in [PANTHER](https://github.com/mahmoodlab/PANTHER).
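To give a rough feel for what "aggregating patch tokens to each prototype" means, here is a minimal NumPy sketch. It is a simplified stand-in that uses hard nearest-prototype assignment followed by averaging; it is not PANTHER's actual aggregation, and the function name, the number of patches, and the number of prototypes are all hypothetical. Only the 768-dimensional feature size is taken from the `in_dim` field of the configs in this repository.

```python
import numpy as np

def aggregate_to_prototypes(tokens, prototypes):
    """Average the tokens assigned to each prototype (hard nearest-prototype assignment)."""
    # Pairwise squared Euclidean distances, shape (n_tokens, n_prototypes).
    d2 = (
        (tokens ** 2).sum(axis=1)[:, None]
        - 2.0 * tokens @ prototypes.T
        + (prototypes ** 2).sum(axis=1)[None, :]
    )
    nearest = d2.argmin(axis=1)  # index of the closest prototype for every token
    # Start from a copy so that prototypes with no assigned token keep their own value.
    pooled = np.array(prototypes, dtype=float)
    for c in range(prototypes.shape[0]):
        members = tokens[nearest == c]
        if len(members) > 0:
            pooled[c] = members.mean(axis=0)
    return pooled

rng = np.random.default_rng(0)
patch_tokens = rng.normal(size=(500, 768))  # hypothetical: 500 patch features of one slide
prototypes = patch_tokens[:16].copy()       # hypothetical: 16 prototypes
slide_embedding = aggregate_to_prototypes(patch_tokens, prototypes)
print(slide_embedding.shape)  # (16, 768)
```

Whatever the number of patches, the slide is summarized by a fixed number of prototype-level tokens, which is the source of the compression described in the abstract.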
+
+### Step 2. Pathway prototype construction
+First, download the pancancer-normalized TCGA transcriptomic expression data from the Xena database.\
+Next, using the **hallmark oncogene sets** (located in `src/data_csvs/rna/metadata/hallmarks_signatures.csv`), we keep only the genes that belong to the hallmark pathways. Note that MMP can be extended to other pathways as well.
+Detailed instructions can be found in the [notebook](src/preprocess_pancancer_TCGA_normalized_RNA.ipynb).
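The filtering idea can be sketched with pandas as follows. This is an illustration on toy data, not the notebook's code: the layout assumed for the signatures table (one column per pathway, gene symbols as values, padded with missing entries) is an assumption, and the helper `genes_per_pathway` and the toy expression values are hypothetical.

```python
import pandas as pd

# Toy stand-in for the signatures table (assumed layout: one column per pathway).
signatures = pd.DataFrame({
    "HALLMARK_HYPOXIA": ["VEGFA", "ADM", "LDHA"],
    "HALLMARK_APOPTOSIS": ["CASP3", "BAX", None],
})

# Toy stand-in for the expression matrix: rows are patients, columns are genes.
rna = pd.DataFrame(
    {"VEGFA": [1.2, 0.4], "CASP3": [0.1, 2.3], "TP53": [0.7, 0.9], "LDHA": [3.1, 2.8]},
    index=["patient_1", "patient_2"],
)

def genes_per_pathway(signatures, available):
    """Map each pathway to those of its genes that are present in the expression data."""
    return {
        pathway: [g for g in signatures[pathway].dropna() if g in available]
        for pathway in signatures.columns
    }

pathway_genes = genes_per_pathway(signatures, set(rna.columns))

# Keep only the expression columns that belong to at least one pathway.
kept = sorted({g for genes in pathway_genes.values() for g in genes})
rna_hallmark = rna[kept]

print(pathway_genes)
print(rna_hallmark.columns.tolist())
```

Genes that appear in no pathway (here `TP53`) are dropped, and pathway genes missing from the expression data (here `ADM` and `BAX`) are skipped.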
+
+### Step 3. Multimodal Fusion
+We can run a downstream task as follows (the data splits for the TCGA cohorts used in our study can be found in `src/splits/survival`):
+```shell
+cd src
+./scripts/survival/brca_surv.sh 0 mmp
+``` 
+where [mmp](src/scripts/survival/mmp.sh) is a bash script that contains argument examples.
+
+
+
+MMP currently supports:
+- **Prototype-based multimodal fusion**: Two approaches are available, `model_mm_type=coattn` (Transformer-based full-attention) or `model_mm_type=coattn_mot` (OT-based cross-attention).
+  - For the histology aggregation approach, you can specify PANTHER or OT (`model_histo_type=PANTHER,default` or `model_histo_type=OT,default`)
+- **SurvPath**: Adapted from [SurvPath](https://github.com/mahmoodlab/SurvPath). Specify `model_mm_type=survpath` and `model_histo_type=mil,default`.
+  - Example script available in [survpath](src/scripts/survival/survpath.sh).
+- **Unimodal prototype baselines**: Use either `model_mm_type=histo` (histology prototypes only) or `model_mm_type=gene` (pathway prototypes only).
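As a rough illustration of the idea behind OT-based cross-alignment between the two small token sets, the sketch below computes an entropic-regularized transport plan with plain Sinkhorn iterations in NumPy. It is a generic textbook formulation under stated assumptions, not the code behind `coattn_mot`: the function names, the cosine cost, the uniform marginals, the token counts and dimensions, and the values of `eps` and `n_iter` are all illustrative choices.

```python
import numpy as np

def cosine_cost(x, y):
    """Cost matrix of 1 - cosine similarity between rows of x and rows of y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T

def sinkhorn_plan(cost, eps=0.1, n_iter=100):
    """Entropic-regularized OT plan between two uniform distributions."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # uniform mass on the first token set
    b = np.full(m, 1.0 / m)   # uniform mass on the second token set
    K = np.exp(-cost / eps)   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)       # rescale rows toward marginal a
        v = b / (K.T @ u)     # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
histo_tokens = rng.normal(size=(16, 32))    # hypothetical: 16 histology prototype tokens
pathway_tokens = rng.normal(size=(50, 32))  # hypothetical: 50 pathway tokens

plan = sinkhorn_plan(cosine_cost(histo_tokens, pathway_tokens))

# Barycentric projection: each histology token receives a plan-weighted mix of pathway tokens.
aligned = (plan / plan.sum(axis=1, keepdims=True)) @ pathway_tokens
print(plan.shape, aligned.shape)
```

Because both token sets are small and fixed in size, the full cost matrix and transport plan fit comfortably in memory, which is the practical benefit of fusing prototype-level tokens.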
+
+
+
+### Step 4. Visualization
+
+The [notebook](src/visualization/mmp_visualization.ipynb) explains how to visualize the prototype assignment map and the histology => pathway and pathway => histology interactions. Currently only `model_mm_type=coattn` is supported.
+
+<img src='docs/heatmap.png' width="1400px" align="center"/>
+
+## MMP future directions
+As emphasized in the paper, multimodal survival analysis is a challenging clinical task that has seen significant interest in the biomedical, computer vision, and machine learning communities. Though multimodal integration generally outperforms unimodal baselines, we note that the development of better unimodal baselines may (or may not) close the performance gap for certain cancer types, which remains an area for further exploration.
+
+## Acknowledgements
+If you find our work useful in your research, or if you use parts of this code, please cite our paper:
+
+```bibtex
+@inproceedings{song2024multimodal,
+  title={Multimodal Prototyping for cancer survival prediction},
+  author={Song, Andrew H and Chen, Richard J and Jaume, Guillaume and Vaidya, Anurag Jayant and Baras, Alexander and Mahmood, Faisal},
+  booktitle={Forty-first International Conference on Machine Learning},
+  year={2024}
+}
+```
+
+The code for **MMP** was adapted and inspired by the fantastic works of [PANTHER](https://openaccess.thecvf.com/content/CVPR2024/html/Song_Morphological_Prototyping_for_Unsupervised_Slide_Representation_Learning_in_Computational_Pathology_CVPR_2024_paper.html), [SurvPath](https://github.com/mahmoodlab/SurvPath) and [CLAM](https://github.com/mahmoodlab/CLAM). Boilerplate code for setting up supervised MIL benchmarks was developed by Ming Y. Lu and Tong Ding.
+
+## Issues 
+- Please open a new issue thread or, for urgent blockers, report directly to `[email protected]`.
+- Immediate response to minor issues may not be available.
+
+<img src="docs/joint_logo.png" />
@@ -0,0 +1,149 @@
+name: mmp
+channels:
+  - pytorch
+  - nvidia
+  - conda-forge
+  - defaults
+dependencies:
+  - _libgcc_mutex=0.1=main
+  - _openmp_mutex=5.1=1_gnu
+  - asttokens=2.4.1=pyhd8ed1ab_0
+  - blas=1.0=openblas
+  - bzip2=1.0.8=h5eee18b_5
+  - ca-certificates=2024.2.2=hbcca054_0
+  - comm=0.2.2=pyhd8ed1ab_0
+  - cudatoolkit=11.4.1=h8ab8bb3_9
+  - debugpy=1.6.7=py310h6a678d5_0
+  - decorator=5.1.1=pyhd8ed1ab_0
+  - entrypoints=0.4=pyhd8ed1ab_0
+  - exceptiongroup=1.2.0=pyhd8ed1ab_2
+  - executing=2.0.1=pyhd8ed1ab_0
+  - faiss-gpu=1.7.4=py3.10_hc0239a3_0_cuda11.4
+  - ipykernel=6.29.3=pyhd33586a_0
+  - ipython=8.22.2=pyh707e725_0
+  - jedi=0.19.1=pyhd8ed1ab_0
+  - jupyter_client=7.3.4=pyhd8ed1ab_0
+  - jupyter_core=5.7.2=py310hff52083_0
+  - ld_impl_linux-64=2.38=h1181459_1
+  - libfaiss=1.7.4=h13c3c6d_0_cuda11.4
+  - libffi=3.4.4=h6a678d5_0
+  - libgcc-ng=11.2.0=h1234567_1
+  - libgfortran-ng=11.2.0=h00389a5_1
+  - libgfortran5=11.2.0=h1234567_1
+  - libgomp=11.2.0=h1234567_1
+  - libopenblas=0.3.21=h043d6bf_0
+  - libsodium=1.0.18=h36c2ea0_1
+  - libstdcxx-ng=11.2.0=h1234567_1
+  - libuuid=1.41.5=h5eee18b_0
+  - matplotlib-inline=0.1.7=pyhd8ed1ab_0
+  - ncurses=6.4=h6a678d5_0
+  - nest-asyncio=1.6.0=pyhd8ed1ab_0
+  - numpy=1.26.4=py310heeff2f4_0
+  - numpy-base=1.26.4=py310h8a23956_0
+  - openssl=3.0.13=h7f8727e_0
+  - packaging=24.0=pyhd8ed1ab_0
+  - parso=0.8.4=pyhd8ed1ab_0
+  - pexpect=4.9.0=pyhd8ed1ab_0
+  - pickleshare=0.7.5=py_1003
+  - pip=23.3.1=py310h06a4308_0
+  - platformdirs=4.2.0=pyhd8ed1ab_0
+  - prompt-toolkit=3.0.42=pyha770c72_0
+  - psutil=5.9.0=py310h5eee18b_0
+  - ptyprocess=0.7.0=pyhd3deb0d_0
+  - pure_eval=0.2.2=pyhd8ed1ab_0
+  - pygments=2.17.2=pyhd8ed1ab_0
+  - python=3.10.14=h955ad1f_0
+  - python-dateutil=2.9.0=pyhd8ed1ab_0
+  - python_abi=3.10=2_cp310
+  - pyzmq=25.1.2=py310h6a678d5_0
+  - readline=8.2=h5eee18b_0
+  - setuptools=68.2.2=py310h06a4308_0
+  - six=1.16.0=pyh6c4a22f_0
+  - sqlite=3.41.2=h5eee18b_0
+  - stack_data=0.6.2=pyhd8ed1ab_0
+  - tk=8.6.12=h1ccaba5_0
+  - tornado=6.1=py310h5764c6d_3
+  - traitlets=5.14.3=pyhd8ed1ab_0
+  - typing_extensions=4.11.0=pyha770c72_0
+  - wcwidth=0.2.13=pyhd8ed1ab_0
+  - wheel=0.41.2=py310h06a4308_0
+  - xz=5.4.6=h5eee18b_0
+  - zeromq=4.3.5=h6a678d5_0
+  - zlib=1.2.13=h5eee18b_0
+  - pip:
+    - absl-py==2.1.0
+    - appdirs==1.4.4
+    - certifi==2024.2.2
+    - charset-normalizer==3.3.2
+    - click==8.1.7
+    - contourpy==1.2.1
+    - cycler==0.12.1
+    - docker-pycreds==0.4.0
+    - ecos==2.0.13
+    - einops==0.7.0
+    - filelock==3.13.4
+    - fonttools==4.51.0
+    - fsspec==2024.3.1
+    - gitdb==4.0.11
+    - gitpython==3.1.43
+    - grpcio==1.62.2
+    - h5py==3.11.0
+    - huggingface-hub==0.22.2
+    - idna==3.7
+    - intel-openmp==2024.1.0
+    - jinja2==3.1.3
+    - joblib==1.4.0
+    - kiwisolver==1.4.5
+    - markdown==3.6
+    - markupsafe==2.1.5
+    - matplotlib==3.8.4
+    - mkl==2024.1.0
+    - mpmath==1.3.0
+    - networkx==3.3
+    - numexpr==2.10.0
+    - nvidia-cublas-cu12==12.1.3.1
+    - nvidia-cuda-cupti-cu12==12.1.105
+    - nvidia-cuda-nvrtc-cu12==12.1.105
+    - nvidia-cuda-runtime-cu12==12.1.105
+    - nvidia-cudnn-cu12==8.9.2.26
+    - nvidia-cufft-cu12==11.0.2.54
+    - nvidia-curand-cu12==10.3.2.106
+    - nvidia-cusolver-cu12==11.4.5.107
+    - nvidia-cusparse-cu12==12.1.0.106
+    - nvidia-nccl-cu12==2.19.3
+    - nvidia-nvjitlink-cu12==12.4.127
+    - nvidia-nvtx-cu12==12.1.105
+    - nystrom-attention==0.0.12
+    - osqp==0.6.5
+    - pandas==2.2.2
+    - pillow==10.3.0
+    - protobuf==4.25.3
+    - pyparsing==3.1.2
+    - pytz==2024.1
+    - pyyaml==6.0.1
+    - qdldl==0.1.7.post2
+    - regex==2024.4.16
+    - requests==2.31.0
+    - safetensors==0.4.3
+    - scikit-learn==1.3.2
+    - scikit-survival==0.22.2
+    - scipy==1.11.4
+    - seaborn==0.13.2
+    - sentry-sdk==1.45.0
+    - setproctitle==1.3.3
+    - smmap==5.0.1
+    - sympy==1.12
+    - tbb==2021.12.0
+    - tensorboard==2.16.2
+    - tensorboard-data-server==0.7.2
+    - threadpoolctl==3.4.0
+    - tokenizers==0.19.1
+    - torch==2.2.2
+    - torchvision==0.17.2
+    - tqdm==4.66.2
+    - transformers==4.40.0
+    - triton==2.2.0
+    - tzdata==2024.1
+    - urllib3==2.2.1
+    - wandb==0.16.6
+    - werkzeug==3.0.2
@@ -0,0 +1,8 @@
+{
+  "in_dim": 768,
+  "n_classes": 2,
+  "out_size": 8,
+  "load_proto": false,
+  "proto_path": ".",
+  "fix_proto": false
+}
@@ -0,0 +1,21 @@
+{
+  "in_dim": 768,
+  "n_classes": 2,
+  "n_filters": 1024,
+  "len_motifs": 1,
+  "subsamplings": 1,
+  "kernel_args": 0.4,
+  "weight_decay": 0.0001,
+  "embed_ratio": 16,
+  "ot_eps": 0.1,
+  "heads": 1,
+  "out_size": 4,
+  "out_type": "param_cat",
+  "max_iter": 100,
+  "distance": "euclidean",
+  "fit_bias": false,
+  "alternating": false,
+  "load_proto": false,
+  "proto_path": ".",
+  "fix_proto": true
+}
@@ -0,0 +1,15 @@
+{
+  "in_dim": 768,
+  "n_classes": 2,
+  "heads": 1,
+  "em_iter": 1,
+  "tau": 0.001,
+  "ot_eps": 0.1,
+  "n_fc_layers": 0,
+  "dropout": 0.25,
+  "out_type": "param_cat",
+  "out_size": 8,
+  "load_proto": false,
+  "proto_path": ".",
+  "fix_proto": false
+}
@@ -0,0 +1,8 @@
+{
+  "in_dim": 768,
+  "n_classes": 2,
+  "out_size": 8,
+  "load_proto": true,
+  "proto_path": ".",
+  "fix_proto": false
+}