PPIFold

Automated pipeline for massive PPI prediction and figure creation.

PPIFold is a tool for analyzing Protein-Protein Interactions from AlphaPulldown, with automated pre- and post-processing. It is used to generate PPI predictions for multiple systems without wasting time on generating initial files and sorting results. It predicts the best homo-oligomer for a protein and the best interface for interacting with specific proteins. This allows for the prediction of massive multimeric complexes with numerous PPIs.

Requirements

AlphaFold database
Conda
SignalP5 (optional)
Singularity and Singularity Image

Installations

Installation of the AlphaFold database :

sudo apt install aria2
git clone https://github.com/deepmind/alphafold.git
cd alphafold
scripts/download_all_data.sh /<Directory> > download.log 2> download_all.log

SignalP5 installation (optional) :

https://services.healthtech.dtu.dk/services/SignalP-5.0/9-Downloads.php

tar -xvzf signalp-5.0b.Linux.tar.gz
cd signalp-5.0b/
sudo cp bin/signalp /usr/local/bin
sudo cp -r lib/* /usr/local/lib

Note

If you do not want to use SignalP, set --use_signalP to False.

Singularity installation :

https://docs.sylabs.io/guides/3.0/user-guide/installation.html#install-on-linux

Download Singularity image (score generation) :

https://github.com/KosinskiLab/AlphaPulldown?tab=readme-ov-file#03-installation-for-the-downstream-analysis-tools

singularity build alpha-analysis_jax_0.4.sif alpha_analysis_jax0.4.def

PPIFold installation :

conda create -n PPIFold -c omnia -c bioconda -c conda-forge python==3.11 openmm==8.0 pdbfixer==1.9 kalign2 networkx hhsuite hmmer
conda activate PPIFold
pip install PPIFold
pip install -U "jax[cuda12]"

Pipeline

Initial Files

You need two initial files :

test.txt
This file needs to be a ".txt" file.
The initial file can be set up using UniProt IDs, FASTA sequences, or both.
UniProt IDs need to be on the same line, separated by commas.

Ex :
UniprotID1,UniprotID2,UniprotID3...

The FASTA sequence needs to start with ">", followed by the protein name.

Ex :
>Name
MFKRSGSLSLALMSSFCSSSLATPLSSAEFDHVARKCAPSVATSTLAAIAK
VESRFDPLAIHDNTTGETLHWQDHTQATQVVRHRLDARHSLDVGLMQINSR
NFSMLGLTPDGALKACPSLSAAANMLKSRYAGGETIDEKQIALRRAISAYN
TGNFIRGFANGYVRKVETAAQSLVPALIEPPQDDHKALKSEDTWDVWGSYQ
RRSQEDGVGGSIAPQPPDQDNGKSADDNQVLFDLY
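
To make the mixed format concrete, here is a minimal Python sketch of how such a file could be read. It is not PPIFold's own parser: the function name `parse_input_file` is hypothetical, and it assumes that any line containing a comma (or appearing before the first ">" header) is a list of UniProt IDs, while lines following a header belong to that sequence.

```python
from typing import Dict, List, Optional, Tuple


def parse_input_file(text: str) -> Tuple[List[str], Dict[str, str]]:
    """Split input text into UniProt IDs and named FASTA sequences (illustrative only)."""
    uniprot_ids: List[str] = []
    sequences: Dict[str, str] = {}
    current_name: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A FASTA header opens a new record; the protein name follows ">".
            current_name = line[1:].strip()
            sequences[current_name] = ""
        elif "," in line or current_name is None:
            # Assumption: comma-separated lines, or lines before any header,
            # hold UniProt IDs.
            uniprot_ids.extend(i.strip() for i in line.split(",") if i.strip())
            current_name = None
        else:
            # Sequence lines are concatenated under the current header.
            sequences[current_name] += line
    return uniprot_ids, sequences


ids, seqs = parse_input_file("O50331,O50333\n>Name\nMFKRSGSLSL\nALMSSFCSSS\n")
print(ids)
print(seqs)
```

A single UniProt ID placed after a FASTA record, without a comma, would be read as sequence under this assumption, so IDs are best listed first.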

conf.txt
The conf.txt file must contain all the paths.

Path_Uniprot_ID : Path and name of the initial file.
Path_AlphaFold_Data : Path to the AlphaFold database (default: ./alphadata).
Path_Singularity_Image : Path and name of the Singularity image.
Path_Pickle_Feature : Path to your feature folder (default: ./feature).
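
The sketch below shows one way to check a conf.txt before launching a run. It assumes each entry is written as "Key : value" on its own line, which mirrors the listing above but is not confirmed as the exact syntax PPIFold expects. The helpers `parse_conf` and `unset_keys` are hypothetical names, not part of PPIFold.

```python
from typing import Dict, List

# The four keys documented above.
EXPECTED_KEYS = (
    "Path_Uniprot_ID",
    "Path_AlphaFold_Data",
    "Path_Singularity_Image",
    "Path_Pickle_Feature",
)


def parse_conf(text: str) -> Dict[str, str]:
    """Read 'Key : value' lines into a dictionary (assumed layout)."""
    conf: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, value = line.split(":", 1)
        conf[key.strip()] = value.strip()
    return conf


def unset_keys(conf: Dict[str, str]) -> List[str]:
    """Return the documented keys that are absent or empty."""
    return [key for key in EXPECTED_KEYS if not conf.get(key)]


example = "Path_Uniprot_ID : ./test.txt\nPath_AlphaFold_Data : ./alphadata\n"
print(unset_keys(parse_conf(example)))
```

Since Path_AlphaFold_Data and Path_Pickle_Feature have documented defaults, an unset value for those two is not necessarily an error.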

Arguments

To use PPIFold, simply run the PPIFold command in the folder containing conf.txt and test.txt.

PPIFold --use_mmseq Boolean --make_multimers Boolean --max_aa Integer --use_signalP Boolean --org String

Optional arguments

--use_mmseq : Enable or disable MMseqs2 for feature generation. Set to True by default.
--make_multimers : Set to True by default. Set it to False if you only want to generate features and MSAs.
--max_aa : Maximum length, in amino acids, of a model that your GPU can generate (depends on VRAM). Set to 2000 by default (for 24 GB of VRAM).
--use_signalP : Use SignalP if your proteins can be periplasmic. Set to True by default.
--org : If you use SignalP, select the organism (gram-, gram+, arch or euk). Set to gram- by default.
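
When PPIFold is launched from a script, the arguments above can be assembled programmatically. The following is a minimal sketch under stated assumptions: `build_ppifold_command` is a hypothetical helper, the flag names and defaults are taken from the list above, and Boolean values are assumed to be passed as the literal strings True and False.

```python
from typing import List

# Organism choices listed for --org.
ALLOWED_ORGS = ("gram-", "gram+", "arch", "euk")


def build_ppifold_command(
    use_mmseq: bool = True,
    make_multimers: bool = True,
    max_aa: int = 2000,
    use_signalP: bool = True,
    org: str = "gram-",
) -> List[str]:
    """Build the PPIFold argument list without running it (illustrative helper)."""
    if org not in ALLOWED_ORGS:
        raise ValueError(f"org must be one of {ALLOWED_ORGS}, got {org!r}")
    return [
        "PPIFold",
        "--use_mmseq", str(use_mmseq),
        "--make_multimers", str(make_multimers),
        "--max_aa", str(max_aa),
        "--use_signalP", str(use_signalP),
        "--org", org,
    ]


# Features and MSAs only, without SignalP.
command = build_ppifold_command(make_multimers=False, use_signalP=False)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True, cwd=run_dir)`, where `run_dir` is the folder containing conf.txt and test.txt.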

Tip

Save all your pickle files in the same directory.

Results

This pipeline applies cutoffs on PAE (10), iQ-score (35) and hiQ-score (50).
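
As an illustration of how these cutoffs could be applied to a table of predictions, here is a short Python sketch. The direction and inclusiveness of each comparison are assumptions (PAE at or below its cutoff, scores at or above theirs), and `passes_cutoffs` is a hypothetical name rather than a PPIFold function.

```python
# Cutoff values stated in this README.
PAE_CUTOFF = 10.0
IQ_SCORE_CUTOFF = 35.0
HIQ_SCORE_CUTOFF = 50.0


def passes_cutoffs(pae: float, score: float, homo_oligomer: bool = False) -> bool:
    """Return True when a prediction satisfies the PAE and score cutoffs.

    Assumption: a lower PAE is better and a higher score is better.
    The hiQ-score cutoff is used for homo-oligomers, the iQ-score cutoff otherwise.
    """
    score_cutoff = HIQ_SCORE_CUTOFF if homo_oligomer else IQ_SCORE_CUTOFF
    return pae <= PAE_CUTOFF and score >= score_cutoff


print(passes_cutoffs(pae=4.2, score=41.0))
print(passes_cutoffs(pae=4.2, score=41.0, homo_oligomer=True))
```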

Figures

MSA depth
All aligned homologous sequences for O50333.

The y-axis represents the number of homologous sequences, the x-axis represents the positions in the sequence. The color represents the sequence identity.

Residue interaction table
Table of distances between residues of O50331 and O50333.

Chains represent different proteins. Two residues in contact are specified, along with their distances. Distances are calculated from the center of mass of the residues. The distance threshold is 10 angstroms, and the PAE threshold is 5.
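
The center-of-mass distance criterion can be sketched in plain Python as follows. This is an illustration, not PPIFold's implementation: the data layout (residue name mapped to a list of atoms), the function names and the approximate atomic masses are all assumptions, and the PAE filter is left out.

```python
import math
from typing import Dict, List, Tuple

# Approximate atomic masses in daltons for elements common in proteins.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}

Atom = Tuple[str, float, float, float]  # (element, x, y, z) in angstroms


def center_of_mass(atoms: List[Atom]) -> Tuple[float, float, float]:
    """Mass-weighted mean position of a residue's atoms."""
    total_mass = sum(ATOMIC_MASS[atom[0]] for atom in atoms)
    x, y, z = (
        sum(ATOMIC_MASS[atom[0]] * atom[axis] for atom in atoms) / total_mass
        for axis in (1, 2, 3)
    )
    return (x, y, z)


def residue_contacts(
    chain_a: Dict[str, List[Atom]],
    chain_b: Dict[str, List[Atom]],
    threshold: float = 10.0,
) -> List[Tuple[str, str, float]]:
    """List residue pairs whose centers of mass lie within the threshold."""
    contacts = []
    for name_a, atoms_a in chain_a.items():
        com_a = center_of_mass(atoms_a)
        for name_b, atoms_b in chain_b.items():
            distance = math.dist(com_a, center_of_mass(atoms_b))
            if distance <= threshold:
                contacts.append((name_a, name_b, distance))
    return contacts


chain_a = {"ALA1": [("C", 0.0, 0.0, 0.0), ("C", 2.0, 0.0, 0.0)]}
chain_b = {"GLY5": [("N", 4.0, 0.0, 0.0)], "SER9": [("O", 30.0, 0.0, 0.0)]}
print(residue_contacts(chain_a, chain_b))
```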

Distogram
Distance map between the residues of O50331 and O50333.

The x and y axes represent interacting proteins. Pixels inside the black squares represent intra-protein residue distances, while pixels outside represent inter-protein residue distances. The color represents the distance in angstroms: blue indicates a short distance between two residues, and yellow indicates a large distance.

Interaction network
Protein-protein interaction network with iQ-score and homo-oligomers (hiQ-score) predictions.

This network represents interactions between R388 proteins. Each interaction is represented by a line connecting two proteins, colored according to the corresponding iQ-score. A loop on a protein indicates its best homo-oligomer, the one with the highest hiQ-score.

iQ-Score heatmap
Heatmap of iQ-score between each PPI.

Color represents the iQ-score, with a better iQ-score indicated by a lighter color. The black boxes represent either poor PAE, homo-oligomers, or overly large total protein length.

Protein interface
Amino acid sequence with the different interfaces used in interactions.
Each interface with a protein is represented by all contact residues, which are colored. The last interaction represents the interface used in homo-oligomerization. If two proteins use the same interface, they will have the same colors.

Generated Files

OOM_int.txt
A text file containing interactions that are too large, based on --max_aa.
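
The size check behind this file can be pictured with the sketch below. It assumes that the size of an interaction is the sum of the two sequence lengths and that a pair is set aside when this sum exceeds --max_aa. The function name `split_pairs_by_size` is hypothetical, and PPIFold's actual criterion may differ.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def split_pairs_by_size(
    lengths: Dict[str, int], max_aa: int = 2000
) -> Tuple[List[Pair], List[Pair]]:
    """Separate protein pairs into those that fit within max_aa and those that do not."""
    feasible: List[Pair] = []
    too_large: List[Pair] = []
    for first, second in combinations(sorted(lengths), 2):
        # Assumption: the modelled size is the combined length of both proteins.
        if lengths[first] + lengths[second] > max_aa:
            too_large.append((first, second))
        else:
            feasible.append((first, second))
    return feasible, too_large


feasible, too_large = split_pairs_by_size({"A": 1500, "B": 600, "C": 300})
print(feasible)
print(too_large)
```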

Shallow_MSA.txt
A text file containing proteins with an MSA depth lower than 100 sequences.

Warning

Results for proteins with fewer than 100 sequences in the MSA are not reliable enough to validate or invalidate predicted PPIs.
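
A depth check of this kind can be sketched as follows. It assumes the alignment is available as FASTA-like or A3M text with one ">" header per sequence. Both `msa_depth` and `shallow_proteins` are hypothetical names, and the way PPIFold counts sequences internally is not shown here.

```python
from typing import Dict, List


def msa_depth(msa_text: str) -> int:
    """Count sequences in a FASTA-like alignment by counting header lines."""
    return sum(1 for line in msa_text.splitlines() if line.startswith(">"))


def shallow_proteins(msas: Dict[str, str], min_depth: int = 100) -> List[str]:
    """Return the names of proteins whose MSA has fewer than min_depth sequences."""
    return sorted(name for name, text in msas.items() if msa_depth(text) < min_depth)


# Toy alignments: one with 120 sequences, one with 2.
deep = "".join(f">seq{i}\nMKVL\n" for i in range(120))
shallow = ">query\nMKVL\n>hit1\nMKIL\n"
print(shallow_proteins({"P1": deep, "P2": shallow}))
```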

table.cyt
A file for manually generating a network in Cytoscape.

_summary.signalp5
A file that summarizes the signal peptides for all proteins.

.pdb file
Model structure, with residues colored according to their interaction interface.

Example

Complete test.txt with your proteins and conf.txt with all your paths.
Activate your Conda environment.
Run the command in the directory containing both files.
Command :

PPIFold

Qrouger/PPIFold
