This code originates from MEAN, with our modifications detailed in the change log.
This document describes how to run the three main experiments using the provided scripts:
- Evaluation on SAbDab
- Antigen-binding CDR-H3 Redesign
- Affinity Optimization
Before running any experiments, follow the setup instructions in the original README.md:
- Install dependencies using `bash scripts/setup.sh`
- Download structure data from SAbDab
- Place the structure data in `all_structures/imgt`
Evaluation on SAbDab trains and evaluates models on different CDR combinations. First, prepare the k-fold data splits:
bash scripts/prepare_data_kfold.sh summaries/sabdab_summary.tsv all_structures/imgt
# Train all CDR combinations for each CDRH type
GPU=0 bash run_all_cdrs.sh
This will:
- Process CDRH1-3 directories
- Train models for all CDR combinations (1, 2, 3, 1-2, 1-3, 2-3, 1-2-3)
- Skip combinations where checkpoints already exist
- Save checkpoints in `summaries/cdrh{i}/CDR{combination}/ckpt/`
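The loop-and-skip behaviour described above can be sketched in Bash as follows. This is a hypothetical illustration, not the contents of `run_all_cdrs.sh`: the function name `pending_runs`, the use of `_` to join CDR indices (taken from the `summaries/` layout in this README), and the rule that a non-empty `ckpt/` directory means a run is finished are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the skip-if-checkpoint-exists loop; NOT the actual run_all_cdrs.sh.
# Assumptions: runs live in <root>/cdrh{i}/CDR{combination}/ckpt, combination
# indices are joined with "_" (e.g. CDR1_2), and a non-empty ckpt/ means "done".

COMBINATIONS=(1 2 3 1_2 1_3 2_3 1_2_3)

# Print "cdrh<i> CDR<combination>" for every run that would still be trained.
pending_runs() {
    local root="$1"
    local i combo ckpt
    for i in 1 2 3; do
        for combo in "${COMBINATIONS[@]}"; do
            ckpt="${root}/cdrh${i}/CDR${combo}/ckpt"
            # Skip runs whose checkpoint directory already has content.
            if [ -n "$(ls -A "$ckpt" 2>/dev/null)" ]; then
                continue
            fi
            echo "cdrh${i} CDR${combo}"
        done
    done
}
```

For example, `pending_runs summaries` prints one line per combination that has no checkpoint yet. Checking the directory contents rather than just its existence means an empty `ckpt/` left by an interrupted run is still treated as pending.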
# Evaluate all trained models
GPU=0 bash run_all_cdrs_eval.sh
This will:
- Evaluate each trained model
- Generate results for each CDR combination
- Save results in the corresponding output directories
For Antigen-binding CDR-H3 Redesign, first prepare the RAbD data:
bash scripts/prepare_data_rabd.sh summaries/rabd_summary.jsonl all_structures/imgt summaries/sabdab_all.json
# Train and evaluate all CDR combinations
GPU=0 bash run_all_cdrs_rabd.sh
This will:
- Train models for each CDR combination
- Target CDRH3 for redesign
- Save results in `summaries/cdrh3/CDR{combination}/`
For Affinity Optimization, first prepare the SKEMPI v2 data:
bash scripts/prepare_data_skempi.sh summaries/skempi_v2_summary.jsonl all_structures/imgt summaries/sabdab_all.json
# Run pretraining, ITA training, and evaluation for all combinations
GPU=0 bash run_all_cdrs_opt.sh
This will:
- Run pretraining if needed
- Perform ITA (Iterative Target Augmentation) training
- Generate and evaluate optimized sequences
- Save results in `summaries/CDR{combination}/`
- All scripts support the `GPU` environment variable to specify which GPU to use. Set `GPU=-1` to run on CPU.
- Results and checkpoints are organized by:
  - CDR type (cdrh1/2/3)
  - CDR combination (1, 2, 3, 1-2, etc.)
  - Model type and mode
- Each script will skip combinations where checkpoints already exist.
- Use `MODE=100` for heavy chain only, or `MODE=111` for full context (default).
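A wrapper that honours these two variables might resolve them roughly as below. This is a hypothetical sketch, not code from the repository: the function names, the default `GPU=0`, and the `cpu` / `cuda:<index>` device strings are assumptions; only `GPU=-1` selecting the CPU and the two documented `MODE` values come from this README.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of resolving GPU/MODE settings; not taken from the repository's scripts.
# Documented: GPU=-1 selects CPU; MODE=111 (default) is full context, MODE=100 heavy chain only.
# Assumed: GPU defaults to 0 and maps to a "cuda:<index>" device string.

resolve_settings() {
    local gpu="${GPU:-0}"      # assumed default GPU index
    local mode="${MODE:-111}"  # documented default mode
    local device
    if [ "$gpu" = "-1" ]; then
        device="cpu"
    else
        device="cuda:${gpu}"
    fi
    echo "device=${device} mode=${mode}"
}

# Map the documented MODE values to their meaning.
describe_mode() {
    case "$1" in
        100) echo "heavy chain only" ;;
        111) echo "full context" ;;
        *)   echo "undocumented mode" ;;
    esac
}
```

For example, `GPU=-1 MODE=100 resolve_settings` prints `device=cpu mode=100`. Because the variables are read with `${VAR:-default}`, an unset or empty value falls back to the default, which is why the scripts can be run without setting `MODE` explicitly.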
summaries/
├── cdrh1/
│   ├── CDR1/
│   ├── CDR1_2/
│   └── ...
├── cdrh2/
│   ├── CDR1/
│   ├── CDR1_2/
│   └── ...
└── cdrh3/
    ├── CDR1/
    ├── CDR1_2/
    └── ...
Each CDR combination directory contains:
- `ckpt/`: model checkpoints
- Results and evaluation logs
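To see at a glance which combinations in this layout already hold checkpoints, a small helper along these lines could be used. It is a hypothetical sketch (the name `report_layout` is invented here) that assumes the `cdrh*/CDR*/ckpt/` structure shown above and treats a non-empty `ckpt/` as trained; it does not inspect results or evaluation logs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): report which CDR combination
# directories under a summaries root contain checkpoints.
# Assumes the <root>/cdrh*/CDR*/ckpt/ layout; a non-empty ckpt/ counts as "trained".

report_layout() {
    local root="$1"
    local dir status
    for dir in "$root"/cdrh*/CDR*/; do
        [ -d "$dir" ] || continue   # glob matched nothing: skip the literal pattern
        dir="${dir%/}"              # drop the trailing slash added by the glob
        if [ -n "$(ls -A "$dir/ckpt" 2>/dev/null)" ]; then
            status="trained"
        else
            status="no checkpoint"
        fi
        echo "${dir#"$root"/}: ${status}"
    done
}
```

Running `report_layout summaries` prints lines such as `cdrh1/CDR1: trained`, one per combination directory that exists on disk.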