A toolkit developed to predict and analyze PROTAC-mediated protein degradation complexes using AlphaFold3.

NilsDunlop/PROTACFold

PROTACFold

PROTACFold Workflow

Overview

PROTACFold is a comprehensive toolkit for analyzing and predicting Proteolysis Targeting Chimera (PROTAC) structures using AlphaFold 3. PROTACs are heterobifunctional molecules that induce targeted protein degradation by forming ternary complexes between a protein of interest (POI) and an E3 ubiquitin ligase. This toolkit provides methods for accurate prediction, evaluation, and analysis of these complex structures to advance PROTAC drug discovery.

Features

  • AlphaFold 3 Integration: Streamlined setup and usage of AlphaFold 3 for PROTAC ternary complex prediction
  • Multiple Ligand Representation Methods: Support for both Chemical Component Dictionary (CCD) and SMILES formats
  • Comprehensive Structure Analysis: Calculate RMSD, DockQ scores, pTM, ipTM, and TM-scores for evaluating model quality
  • Molecular Property Analysis: Calculate and analyze physicochemical properties of PROTACs using RDKit
  • Advanced Visualization: Interactive plots and statistical analysis of prediction metrics
  • Benchmark Capabilities: Compare predictions with experimental structures and other computational methods
  • Format Conversion: Tools for converting between different molecular structure formats (PDB, CIF)

Installation

Prerequisites

  • Python 3.11+
  • CUDA-compatible GPU (for AlphaFold 3)
  • Docker (recommended for AlphaFold 3 setup)

Using Docker (Recommended)

We use AlphaFold 3 inference code available from Google DeepMind.

Detailed instructions for setting up AlphaFold 3 with Docker are in the installation guide. The official AlphaFold 3 documentation is a useful reference as well, but our guide gives step-by-step instructions tailored to PROTACFold users.

Manual Installation

  1. Clone the repository:
git clone https://github.com/NilsDunlop/PROTACFold.git
cd PROTACFold
  2. Install Python dependencies:
pip install -r requirements.txt

Directory Structure

  • data/: Contains datasets and analysis results
    • af3_input/: Input files for AlphaFold 3 (SMILES and CCD formats)
    • af3_results/: Consolidated results from AlphaFold 3 predictions
    • plots/: Generated visualizations
    • hal_04732948/: Data from Pereira et al., 2024 for comparison
  • utils/: Utility scripts for structure analysis and property calculation
  • notebooks/: Jupyter notebooks for analysis and visualization
  • docs/: Documentation including installation guides and images

Usage

PROTAC Structure Prediction

Use AlphaFold 3 to predict the structure of PROTAC-mediated ternary complexes:

  1. Prepare your input JSON files in either CCD or SMILES format (see examples in data/af3_input/)
  2. Run AlphaFold 3 using Docker (see installation guide)
  3. Analyze results using the provided utility scripts
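As a rough illustration of step 1, the sketch below builds an AlphaFold 3 input JSON for a POI, an E3 ligase, and one ligand given either as SMILES or as CCD codes. The top-level fields follow the AlphaFold 3 input format as we understand it. The helper name `build_af3_input`, the chain IDs, the job name, the sequences, and the SMILES string are placeholders chosen for illustration and are not taken from data/af3_input/.

```python
import json


def build_af3_input(name, poi_seq, e3_seq, ligand, seeds=(1,)):
    """Return an AlphaFold 3 input dict for a POI / E3 ligase / ligand job.

    `ligand` is {"smiles": "..."} or {"ccdCodes": ["..."]} (exactly one of them).
    Hypothetical helper, not part of PROTACFold.
    """
    if len({"smiles", "ccdCodes"} & set(ligand)) != 1:
        raise ValueError("ligand needs exactly one of 'smiles' or 'ccdCodes'")
    return {
        "name": name,
        "modelSeeds": list(seeds),
        "sequences": [
            {"protein": {"id": "A", "sequence": poi_seq}},  # protein of interest
            {"protein": {"id": "B", "sequence": e3_seq}},   # E3 ligase
            {"ligand": {"id": "L", **ligand}},              # PROTAC
        ],
        "dialect": "alphafold3",
        "version": 1,
    }


# Placeholder sequences and SMILES, for illustration only.
job = build_af3_input("example_ternary", "MKTAYIAK", "MSEQNNTE", {"smiles": "CCO"})
af3_json = json.dumps(job, indent=2)
print(af3_json)
```

Switching to the CCD representation only changes the ligand argument, for example `{"ccdCodes": ["ATP"]}`; the real files in data/af3_input/ remain the reference for actual runs.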

Analyzing Prediction Results

# Calculate RMSD between predicted and reference structures
python utils/rmsd_calculator.py --pred path/to/prediction.pdb --ref path/to/reference.pdb

# Calculate DockQ score for protein-protein interface quality assessment
python utils/compute_dockq.py --pred path/to/prediction.pdb --ref path/to/reference.pdb

# Calculate molecular properties from SMILES
python utils/molecular_properties.py --input data/smiles_file.csv --output results.csv

# Compare prediction metrics across multiple models
python utils/compare_predictions.py --input_dir path/to/predictions --output results.csv
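For readers who want to see what the RMSD value measures, here is a minimal sketch of the definition itself. It assumes the two structures are already superposed and that atoms are matched one-to-one in the same order. It is not the implementation in utils/rmsd_calculator.py, which may select atoms and align structures differently. The function name `rmsd` and the toy coordinates are illustrative.

```python
import math


def rmsd(pred, ref):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superposed and atoms are matched
    pairwise; no alignment is performed here.
    """
    if not pred or len(pred) != len(ref):
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(pred, ref)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(pred))


# Toy coordinates: every atom is shifted by 1 Å along z.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ref = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(pred, ref))  # 1.0
```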

Visualization and Analysis

Explore the Jupyter notebooks for comprehensive analysis workflows:

jupyter notebook notebooks/af3_analysis.ipynb

The notebooks demonstrate:

  • Comparative analysis of CCD vs. SMILES-based predictions
  • Correlation between confidence metrics (pTM/ipTM) and structural quality
  • Component-wise analysis (POI vs E3 ligase interfaces)
  • Molecular property distribution of successful PROTACs
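The correlation analysis listed above can be approximated with a plain Pearson coefficient. The sketch below is a hedged illustration, not code from the notebooks: the function name `pearson` is hypothetical and the ipTM and DockQ values are invented toy numbers, not PROTACFold results.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented toy values for illustration only (not real predictions).
iptm = [0.2, 0.4, 0.6, 0.8]
dockq = [0.1, 0.2, 0.3, 0.4]
print(round(pearson(iptm, dockq), 3))
```

A rank-based coefficient such as Spearman's may be a better fit when the relationship is monotonic but not linear.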

Key Metrics

PROTACFold evaluates predictions using multiple metrics:

  • DockQ Score: Quality measure for protein-protein docking interfaces
  • RMSD: Root Mean Square Deviation between predicted and experimental structures
  • pTM/ipTM: AlphaFold confidence metrics for overall and interface quality
  • Molecular Descriptors: Physicochemical properties of PROTAC molecules
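To make the DockQ score more concrete, the sketch below combines its three ingredients as defined in the original DockQ publication: the fraction of native contacts (Fnat), the interface RMSD (iRMSD, scaled with 1.5 Å), and the ligand RMSD (LRMSD, scaled with 8.5 Å). It assumes these three values have already been computed elsewhere. The function names are illustrative, and this is not the code in utils/compute_dockq.py.

```python
def scaled_rmsd(rmsd_value, d):
    """Map an RMSD in Å to the range (0, 1]; 1 means perfect agreement."""
    return 1.0 / (1.0 + (rmsd_value / d) ** 2)


def dockq(fnat, irmsd, lrmsd):
    """Average of Fnat and the scaled iRMSD and LRMSD terms."""
    if not 0.0 <= fnat <= 1.0:
        raise ValueError("fnat must lie in [0, 1]")
    return (fnat + scaled_rmsd(irmsd, 1.5) + scaled_rmsd(lrmsd, 8.5)) / 3.0


# A perfect model recovers all native contacts with zero deviation.
print(dockq(1.0, 0.0, 0.0))  # 1.0
```

Scores range from 0 to 1, with higher values indicating a better match to the reference interface.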

Predicted Structures

All predicted structures, as well as two replicas of a 300 ns MD simulation of complex 9B9W, are available on Zenodo. The example below shows the predicted structure of complex 7PI4, with the experimental (ground-truth) structure in grey and the AF3 prediction in gold.

PDB ID 7PI4

Tools

Protein Structure Prediction

  • AlphaFold 3 - DeepMind's state-of-the-art protein structure prediction model

Structure Analysis and Comparison

  • DockQ - Quality measure for protein-protein docking models

Visualization and Chemoinformatics

  • PyMOL - Molecular visualization system
  • RDKit - Open-source chemoinformatics toolkit

Data Sources

This project integrates data from:

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • The AlphaFold team at Google DeepMind
  • Developers of open-source tools used in this project (RDKit, DockQ)
  • PyMOL for visualization
  • Contributors to PROTAC databases and experimental data

Citation

If you use PROTACFold in your research, please cite the preprint "Enhancing PROTAC Ternary Complex Prediction with Ligand Information in AlphaFold 3".
