PROTACFold is a comprehensive toolkit for analyzing and predicting Proteolysis Targeting Chimera (PROTAC) structures using AlphaFold 3. PROTACs are heterobifunctional molecules that induce targeted protein degradation by forming ternary complexes between a protein of interest (POI) and an E3 ubiquitin ligase. This toolkit provides methods for accurate prediction, evaluation, and analysis of these complex structures to advance PROTAC drug discovery.
- Overview
- Features
- Installation
- Directory Structure
- Usage
- Key Metrics
- Predicted Structures
- Tools
- Data Sources
- License
- Acknowledgments
- Citation
- AlphaFold 3 Integration: Streamlined setup and usage of AlphaFold 3 for PROTAC ternary complex prediction
- Multiple Ligand Representation Methods: Support for both Chemical Component Dictionary (CCD) and SMILES formats
- Comprehensive Structure Analysis: Calculate RMSD, DockQ scores, and TM-scores, and collect AlphaFold 3 pTM and ipTM confidence scores, to evaluate model quality
- Molecular Property Analysis: Calculate and analyze physicochemical properties of PROTACs using RDKit
- Advanced Visualization: Interactive plots and statistical analysis of prediction metrics
- Benchmark Capabilities: Compare predictions with experimental structures and other computational methods
- Format Conversion: Tools for converting between different molecular structure formats (PDB, CIF)
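For readers unfamiliar with the TM-score, the sketch below shows the standard per-residue formula for an already superposed model: TM = (1/L) * sum_i 1 / (1 + (d_i / d0)^2), with d0 = 1.24 * (L - 15)^(1/3) - 1.8, where L is the reference length and d_i the distance between aligned residue pairs. This is an illustration only, not PROTACFold's implementation: dedicated programs such as TM-score also search for the superposition that maximizes this sum, which is skipped here. The function names `tm_d0` and `tm_score_fixed` are hypothetical, and the 0.5 Å floor on d0 for short chains follows common practice.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def tm_d0(length: int) -> float:
    """Distance scale d0 for a reference of `length` residues, floored at 0.5 Å."""
    if length <= 21:
        # The formula drops below 0.5 Å (or is undefined) for very short chains.
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(model: Sequence[Coord], reference: Sequence[Coord], target_length: int) -> float:
    """TM-score for pre-superposed, pre-aligned residue pairs (no superposition search)."""
    if len(model) != len(reference):
        raise ValueError("model and reference must contain the same aligned residues")
    d0 = tm_d0(target_length)
    total = 0.0
    for m, r in zip(model, reference):
        d = math.dist(m, r)  # Euclidean distance between the aligned residues
        total += 1.0 / (1.0 + (d / d0) ** 2)
    # Normalizing by the reference length penalizes unaligned residues.
    return total / target_length
```

Because the sum is normalized by the reference length rather than the number of aligned pairs, a model that covers only part of the reference cannot reach a score of 1.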
- Python 3.11+
- CUDA-compatible GPU (for AlphaFold 3)
- Docker (recommended for AlphaFold 3 setup)
We use AlphaFold 3 inference code available from Google DeepMind.
Detailed step-by-step instructions for setting up AlphaFold 3 with Docker are in our installation guide, which is tailored to PROTACFold users. The official AlphaFold 3 documentation is also available for reference.
- Clone the repository:
git clone https://github.com/NilsDunlop/PROTACFold.git
cd PROTACFold
- Install Python dependencies:
pip install -r requirements.txt
- data/: Contains datasets and analysis results
  - af3_input/: Input files for AlphaFold 3 (SMILES and CCD formats)
  - af3_results/: Consolidated results from AlphaFold 3 predictions
  - plots/: Generated visualizations
  - hal_04732948/: Data from Pereira et al., 2024 for comparison
- utils/: Utility scripts for structure analysis and property calculation
- notebooks/: Jupyter notebooks for analysis and visualization
- docs/: Documentation including installation guides and images
Use AlphaFold 3 to predict the structure of PROTAC-mediated ternary complexes:
- Prepare your input JSON files in either CCD or SMILES format (see examples in data/af3_input/)
- Run AlphaFold 3 using Docker (see installation guide)
- Analyze results using the provided utility scripts
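The difference between the two ligand representations lies only in how the PROTAC entry is written in the AlphaFold 3 input JSON. The sketch below builds such an input as a Python dict, assuming the public AlphaFold 3 input format ("dialect": "alphafold3", protein entries with "sequence", ligand entries with either "ccdCodes" or "smiles"). The helper `make_af3_input`, the chain IDs A/B/C, and the placeholder sequences are hypothetical; compare against the files in data/af3_input/ before relying on it. Real ternary complexes may also need additional protein chains (for example E3 adaptor subunits), which this sketch omits.

```python
import json
from typing import Dict, List, Optional


def make_af3_input(
    name: str,
    poi_seq: str,
    e3_seq: str,
    *,
    ccd_code: Optional[str] = None,
    smiles: Optional[str] = None,
    seeds: Optional[List[int]] = None,
) -> Dict:
    """Hypothetical helper: AF3 input for a POI + E3 ligase + PROTAC ternary complex."""
    if (ccd_code is None) == (smiles is None):
        raise ValueError("give exactly one of ccd_code or smiles")
    ligand: Dict = {"id": "C"}
    if ccd_code is not None:
        ligand["ccdCodes"] = [ccd_code]  # CCD representation of the PROTAC
    else:
        ligand["smiles"] = smiles  # SMILES representation of the PROTAC
    return {
        "name": name,
        "sequences": [
            {"protein": {"id": "A", "sequence": poi_seq}},  # protein of interest
            {"protein": {"id": "B", "sequence": e3_seq}},  # E3 ligase
            {"ligand": ligand},
        ],
        "modelSeeds": list(seeds) if seeds else [1],
        "dialect": "alphafold3",
        "version": 1,
    }


if __name__ == "__main__":
    # Placeholder sequences and SMILES; substitute your own POI, E3 ligase and PROTAC.
    example = make_af3_input("example_smiles", "MPEPTIDE", "MSEQ", smiles="CCO")
    print(json.dumps(example, indent=2))
```

Writing the same complex once with `ccd_code=...` and once with `smiles=...` produces a matched pair of inputs for a CCD vs. SMILES comparison.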
# Calculate RMSD between predicted and reference structures
python utils/rmsd_calculator.py --pred path/to/prediction.pdb --ref path/to/reference.pdb
# Calculate DockQ score for protein-protein interface quality assessment
python utils/compute_dockq.py --pred path/to/prediction.pdb --ref path/to/reference.pdb
# Calculate molecular properties from SMILES
python utils/molecular_properties.py --input data/smiles_file.csv --output results.csv
# Compare prediction metrics across multiple models
python utils/compare_predictions.py --input_dir path/to/predictions --output results.csv
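RMSD between a prediction and a reference is normally reported after optimal rigid-body superposition. The sketch below is a generic Kabsch-algorithm implementation with NumPy, not the code in utils/rmsd_calculator.py, whose atom selection and superposition choices may differ. It assumes you have already parsed both structures (for example with Biopython or gemmi) into matched (N, 3) coordinate arrays with identical atom ordering; `kabsch_rmsd` is a hypothetical name.

```python
import numpy as np


def kabsch_rmsd(pred: np.ndarray, ref: np.ndarray) -> float:
    """RMSD after optimal rigid superposition of `pred` onto `ref` (Kabsch algorithm).

    Both inputs are (N, 3) arrays of matched atom coordinates, e.g. C-alpha atoms.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if pred.shape != ref.shape or pred.ndim != 2 or pred.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched coordinates")
    # Remove translation by centring both point sets on their centroids.
    p = pred - pred.mean(axis=0)
    q = ref - ref.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    aligned = p @ rot.T
    return float(np.sqrt(((aligned - q) ** 2).sum(axis=1).mean()))
```

The reflection check matters: without it, a mirror image of the reference could be "superposed" with an improper rotation and report an artificially low RMSD.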
Explore the Jupyter notebooks for comprehensive analysis workflows:
jupyter notebook notebooks/af3_analysis.ipynb
The notebooks demonstrate:
- Comparative analysis of CCD vs. SMILES-based predictions
- Correlation between confidence metrics (pTM/ipTM) and structural quality
- Component-wise analysis (POI vs E3 ligase interfaces)
- Molecular property distribution of successful PROTACs
PROTACFold evaluates predictions using multiple metrics:
- DockQ Score: Quality measure for protein-protein docking interfaces
- RMSD: Root Mean Square Deviation between predicted and experimental structures
- pTM/ipTM: AlphaFold confidence metrics (predicted TM-score and interface predicted TM-score) for overall and interface quality
- Molecular Descriptors: Physicochemical properties of PROTAC molecules
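DockQ combines three CAPRI interface measures: the fraction of native contacts (Fnat), the interface RMSD (iRMS), and the ligand RMSD (LRMS, where "ligand" is the smaller protein partner, not the PROTAC). The sketch below shows only how these are combined into the final score and mapped to the usual quality classes; computing Fnat, iRMS and LRMS from structures is left to the DockQ tool itself. The function names are hypothetical.

```python
def rms_scaled(rms: float, d: float) -> float:
    """Map an RMSD (Å) to (0, 1]; equals 0.5 when rms == d."""
    return 1.0 / (1.0 + (rms / d) ** 2)


def dockq(fnat: float, irms: float, lrms: float) -> float:
    """DockQ = mean of Fnat, scaled iRMS (d = 1.5 Å) and scaled LRMS (d = 8.5 Å)."""
    if not 0.0 <= fnat <= 1.0:
        raise ValueError("fnat must lie in [0, 1]")
    return (fnat + rms_scaled(irms, 1.5) + rms_scaled(lrms, 8.5)) / 3.0


def dockq_class(score: float) -> str:
    """CAPRI-style quality class for a DockQ score."""
    if score < 0.23:
        return "incorrect"
    if score < 0.49:
        return "acceptable"
    if score < 0.80:
        return "medium"
    return "high"
```

For example, a model with Fnat = 0.5, iRMS = 1.5 Å and LRMS = 8.5 Å scores exactly 0.5, which falls in the "medium" class.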
All predicted structures, as well as two replicas of a 300 ns MD simulation of complex 9B9W, are available on Zenodo. See below an example of the predicted structure of complex 7PI4, with the ground truth in grey and the AF3 prediction in gold.
- AlphaFold 3 - DeepMind's state-of-the-art protein structure prediction model
- DockQ - Quality measure for protein-protein docking models
This project integrates data from:
This project is licensed under the MIT License - see the LICENSE file for details.
- The AlphaFold team at Google DeepMind
- Developers of open-source tools used in this project (RDKit, DockQ)
- PyMOL for visualization
- Contributors to PROTAC databases and experimental data
If you use PROTACFold in your research, please cite the preprint: Enhancing PROTAC Ternary Complex Prediction with Ligand Information in AlphaFold 3