- Python v3.7.6 (Anaconda installation recommended)
- PyTorch v1.4.0
- scikit-learn v0.23.2
- Biopython v1.78
- RDKit 2020.09.1
- seaborn v0.11.0
- Matplotlib v3.3.2
- pandas v1.1.3
- SciPy v1.5.2
- NumPy v1.20.2
To use the trained deep learning model for prediction, run the following commands in a terminal:
- (1). Download the DLKcat package:
  `git clone https://github.com/SysBioChalmers/DLKcat`
- (2). Install the required Python packages:
  `pip install numpy requests torch torchvision rdkit-pypi scikit-learn`
- (3). Change directory to `DeeplearningApproach` under the DLKcat package:
  `cd DLKcat/DeeplearningApproach`
- (4). Unzip the `input.zip` file under the `Data` directory:
  `unzip Data/input.zip`
- (5). Change directory to `Code/example` under the DLKcat package:
  `cd Code/example`
- (6). Run the trained deep learning model on your own data with a single command. One input file must be prepared; see `Code/example/input.tsv` for the expected format. Each entry needs a protein sequence and either a substrate (compound) name or a substrate (compound) SMILES; the SMILES is recommended. If the substrate SMILES is difficult to find, provide the substrate name and leave the SMILES field blank:
  `python prediction_for_input.py input.tsv`
- The prediction results (`output.tsv`) are then written to the `Code/example` directory.
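As a rough illustration of preparing the input file described in step (6), the sketch below writes a tab-separated file with one row per enzyme-substrate pair. The column names, the helper `write_input_tsv`, and the example entries are assumptions made for illustration, not taken from the DLKcat code; check the header of `Code/example/input.tsv` and adjust `COLUMNS` to match it before use.

```python
import csv

# Assumed column names (hypothetical): verify against Code/example/input.tsv.
COLUMNS = ["Substrate Name", "Substrate SMILES", "Protein Sequence"]


def write_input_tsv(path, entries):
    """Write (name, smiles, sequence) entries to a tab-separated file.

    A protein sequence is required for every entry, and at least one of
    substrate name or substrate SMILES must be given. Returns the number
    of data rows written.
    """
    rows = []
    for name, smiles, sequence in entries:
        if not sequence:
            raise ValueError("protein sequence is required")
        if not (name or smiles):
            raise ValueError("provide a substrate name or a substrate SMILES")
        # A missing name or SMILES is written as an empty field.
        rows.append([name or "", smiles or "", sequence])

    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Placeholder sequences for illustration only, not real enzymes.
    example_entries = [
        ("ethanol", "CCO", "MKTAYIAKQR"),   # name and SMILES both given
        ("pyruvate", "", "MSLNDKVAIV"),     # SMILES left blank
    ]
    write_input_tsv("my_input.tsv", example_entries)
```

The resulting file could then be passed to `prediction_for_input.py` in place of `input.tsv`, provided its header matches the example file.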
For running analysis and regenerating all figures:
- To regenerate all of the figures, unzip the `input.zip` file (`Data/input.zip`) and run the corresponding figure functions in the `Code/analysis` directory.
- For data collection and cleaning from the BRENDA database:
  - Run `brenda_retrieve.py` to access the web client and retrieve the dataset from the BRENDA database.
  - Run `brenda_download.py` to read all data in the retrieved files and output all EC files.
  - Run `findMaxKvalues_AllOrgs.py` to read all EC files and find the maximum value for each substrate for the chosen microorganism.
  - Run `brenda_kcat_preprocess.py` to combine the Kcat data from all EC files into one file.
  - Run `brenda_kcat_clean.py` to clean the dataset from the BRENDA database.
  - Run `brenda_sequence.py` to retrieve a protein sequence from the BRENDA database for a single example.
  - Run `brenda_sequence_organism.py` to obtain the protein sequences for all data based on EC number and organism, and output them into one file for further use.
  - Run `brenda_get_smiles.py` to get canonical SMILES from substrate names for the BRENDA data using the PubChem API.
- For data collection and cleaning from the SABIO-RK database:
  - Run `sabio_download.py` to access the web client and download the dataset from the SABIO-RK database.
  - Run `sabio_kcat_unisubstrate.py` to read all data from the downloaded files and output them into one file for further use.
  - Run `sabio_kcat_clean_unisubstrate.py` to clean the data by unifying all entries.
  - Run `sabio_kcat_clean.py` to clean the SABIO-RK data.
  - Run `sabio_kcat_unisubstrate_mutant.py` to annotate the enzyme type, i.e., wildtype or mutant.
  - Run `uniprot_sequence.py` to obtain protein sequences by UniProt protein ID.
  - Run `sabio_get_smiles.py` to get canonical SMILES from substrate names for the SABIO-RK data and output one file for further use.
- For combining the datasets obtained from the BRENDA and SABIO-RK databases:
  - Run `combination_brenda_sabio.py` to preliminarily combine the Kcat data from the BRENDA and SABIO-RK databases.
  - Run `combination_database_data.py` to write all the combined data into one file for deep learning and further analysis.
- For construction and evaluation of the deep learning model:
  - To see how the deep learning pipeline is constructed, check the corresponding functions in the `Code/model` directory.
- For prediction of 343 yeast/fungi species via the deep learning model:
  - To obtain prediction results for 343 yeast/fungi species based on the trained deep learning model, unzip the `input.zip` file (`Data/input.zip`) and run the corresponding function in the `Code/prediction` directory.
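To make the BRENDA step that keeps the maximum value for each substrate of a chosen organism more concrete, here is a minimal sketch of that kind of reduction. It is not the code in `findMaxKvalues_AllOrgs.py`: the function name, the record keys (`organism`, `substrate`, `value`), and the toy records are hypothetical, and the real script reads its data from the EC files.

```python
def max_value_per_substrate(records, organism):
    """Return {substrate: highest value} for a single organism.

    `records` is an iterable of dicts with the hypothetical keys
    "organism", "substrate", and "value" (a number or numeric string).
    """
    best = {}
    for record in records:
        # Skip entries that belong to other organisms.
        if record["organism"] != organism:
            continue
        substrate = record["substrate"]
        value = float(record["value"])
        # Keep only the largest value seen so far for this substrate.
        if substrate not in best or value > best[substrate]:
            best[substrate] = value
    return best


if __name__ == "__main__":
    # Toy records with made-up numbers, for illustration only.
    toy_records = [
        {"organism": "Organism A", "substrate": "substrate 1", "value": "1.5"},
        {"organism": "Organism A", "substrate": "substrate 1", "value": "3.0"},
        {"organism": "Organism B", "substrate": "substrate 1", "value": "10.0"},
    ]
    print(max_value_per_substrate(toy_records, "Organism A"))
```

A plain dictionary is enough here because each substrate only needs its running maximum; a pandas `groupby(...).max()` would be an equivalent alternative for larger tables.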