This repository contains the source code for ECRECer, an EC number prediction tool that implements our paper "Enzyme Commission Number Prediction and Benchmarking with Hierarchical Dual-core Multitask Learning Framework".
Detailed information about the framework can be found in our papers:
1. Zhenkun Shi, Qianqian Yuan, Ruoyu Wang, Haoran Li, Xiaoping Liao*, Hongwu Ma* (2022). ECRECer: Enzyme Commission Number Recommendation and Benchmarking based on Multiagent Dual-core Learning. arXiv preprint arXiv:2202.03632.
2. Zhenkun Shi, Rui Deng, Qianqian Yuan, Zhitao Mao, Ruoyu Wang, Haoran Li, Xiaoping Liao*, Hongwu Ma* (2023). Enzyme Commission Number Prediction and Benchmarking with Hierarchical Dual-core Multitask Learning Framework. Research.
If you only want to use the tool to predict EC numbers, please visit the ECRECer website at https://ecrecer.biodesign.ac.cn
We provide a Docker image and a Singularity image so that users can run ECRECer locally.
Docker image:
# 1. pull ecrecer docker image
docker pull kingstdio/ecrecer
# 2. run ecrecer docker image
# gpu version:
sudo docker run -it -d --gpus all --name ecrecer -v ~/:/home/ kingstdio/ecrecer #~/ is your fasta file folder
# cpu version:
sudo docker run -it -d --name ecrecer -v ~/:/home/ kingstdio/ecrecer #~/ is your fasta file folder
# 3. run ECRECer prediction
sudo docker exec ecrecer python /ecrecer/production.py -i /home/input_fasta_file.fasta -o /home/output_tsv_file.tsv -mode h -topk 10
#-topk: top k predicted EC numbers
#-mode p: prediction mode, predict EC numbers only
#-mode r: recommendation mode, recommend EC numbers with predicted probabilities, the higher the better
#-mode h: hybrid mode, use prediction, recommendation and sequence alignment methods
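For scripted runs, the prediction call in step 3 can be assembled programmatically. The following is a minimal sketch, not part of ECRECer itself: the helper name `build_docker_predict_cmd` and its default arguments are assumptions made for illustration, while the script path, flags and mode letters are taken from the commands above.

```python
import shlex

VALID_MODES = {"p", "r", "h"}  # mode letters listed in the comments above


def build_docker_predict_cmd(fasta_name, tsv_name, mode="h", topk=10,
                             container="ecrecer", mount="/home"):
    """Return the `docker exec` argument list for one prediction run.

    fasta_name and tsv_name are file names inside the folder that was
    mounted to `mount` when the container was started (hypothetical helper).
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if int(topk) < 1:
        raise ValueError("topk must be a positive integer")
    return [
        "docker", "exec", container,
        "python", "/ecrecer/production.py",
        "-i", f"{mount}/{fasta_name}",
        "-o", f"{mount}/{tsv_name}",
        "-mode", mode,
        "-topk", str(int(topk)),
    ]


cmd = build_docker_predict_cmd("input_fasta_file.fasta", "output_tsv_file.tsv")
print(shlex.join(cmd))  # shows the command without executing it

# To actually run it (requires the container started in step 2):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.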
Singularity image:
# 1. pull ecrecer singularity image
# Image ~= 11GB, may take a while to download
wget -c https://tibd-public-datasets.s3.us-east-1.amazonaws.com/ecrecer/sifimages/ecrecer.sif
# 2. run ecrecer singularity image
# gpu version:
singularity run --nv ecrecer.sif python /ecrecer/production.py -i input_fasta_file.fasta -o output_tsv_file.tsv -mode h -topk 10
# cpu version:
singularity run ecrecer.sif python /ecrecer/production.py -i input_fasta_file.fasta -o output_tsv_file.tsv -mode h -topk 10
#-topk: top k predicted EC numbers
#-mode p: prediction mode, predict EC numbers only
#-mode r: recommendation mode, recommend EC numbers with predicted probabilities, the higher the better
#-mode h: hybrid mode, use prediction, recommendation and sequence alignment methods
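Before starting a long run, it can help to sanity-check the input FASTA file. The sketch below is an optional pre-check and not part of ECRECer: the function names are hypothetical, and the allowed alphabet (the 20 standard amino-acid letters) is an assumption, since the characters ECRECer itself accepts are not stated here.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard residues only


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_problem_records(records, allowed=STANDARD_AA):
    """Return headers of records that are empty or contain letters outside `allowed`."""
    return [h for h, seq in records if not seq or set(seq) - allowed]


# Made-up example input, for illustration only.
example = """>seq1 demo protein
MKTAYIAKQR
QISFVKSHFS
>seq2 contains a digit
MKT1AY
"""
records = read_fasta(example)
print(find_problem_records(records))
```

For a real file, pass the result of `open("input_fasta_file.fasta").read()` to `read_fasta`.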
- Python >= 3.6
- scikit-learn
- XGBoost
- conda
- JupyterLab
- ...
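A quick way to confirm that the listed requirements are available in the active environment is sketched below. The helper `check_environment` is hypothetical and only covers the Python version and the two libraries named above; `sklearn` and `xgboost` are the import names of the scikit-learn and XGBoost packages.

```python
import importlib.util
import sys

REQUIRED_MODULES = ["sklearn", "xgboost"]  # import names of scikit-learn and XGBoost


def check_environment(min_python=(3, 6), modules=REQUIRED_MODULES):
    """Return a dict mapping each requirement to True (satisfied) or False."""
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for requirement, ok in check_environment().items():
    print(f"{requirement}: {'ok' if ok else 'MISSING'}")
```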
Clone the repository and create the conda environment from env.yaml:
git clone git@github.com:kingstdio/ECRECer.git
cd ECRECer
conda env create -f env.yaml
Download and prepare the dataset using the.
Alternatively, download the preprocessed data directly from the AWS public dataset and put it in rootfolder/data/datasets/
python benchmark_train.py
python benchmark_test.py
python benchmark_evaluation.py
python production.py -i input_fasta_file -o output_tsv_file -mode [p|r] -topk 5
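The TSV written by production.py can be loaded for further analysis with the standard library. This is a minimal sketch under two assumptions: that the first row of the file is a header, and that the helper name `load_tsv` is free to choose. The column layout of the real output is not documented here, so the demo uses placeholder columns.

```python
import csv
import io


def load_tsv(handle):
    """Read a tab-separated file into (column_names, rows_as_dicts)."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows


# Placeholder content standing in for output_tsv_file; the real columns may differ.
demo = io.StringIO("col_a\tcol_b\nx1\ty1\nx2\ty2\n")
columns, rows = load_tsv(demo)
print(columns)
print(len(rows))

# With a real file:
# with open("output_tsv_file", newline="") as fh:
#     columns, rows = load_tsv(fh)
```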
If you find these methods useful for your research, please cite the following paper:
@article{shi2023enzyme,
title={Enzyme Commission Number Prediction and Benchmarking with Hierarchical Dual-core Multitask Learning Framework},
author={Shi, Zhenkun and Deng, Rui and Yuan, Qianqian and Mao, Zhitao and Wang, Ruoyu and Li, Haoran and Liao, Xiaoping and Ma, Hongwu},
journal={Research},
year={2023},
publisher={AAAS}
}