TransGEM

Transformer-based model from gene expression to molecules.

TransGEM is a phenotype-based de novo drug design model that can generate new bioactive molecules without relying on disease target information.

Graphical abstract

Setup

Install the environment

  • Create a conda environment:
conda env create -f environment.yaml
  • Activate the environment:
conda activate TransGEM

Download data

The data related to this study can be downloaded here.

Usage

TransGEM training

  • on the subLINCS dataset
python train.py --data_path ./data/ --dataset subLINCS --gene_encoder tenfold_binary --gpu cuda:0 --epochs 200
  • on the HCC515 dataset
python train.py --data_path ./data/ --dataset HCC515 --gene_encoder tenfold_binary --gpu cuda:0 --epochs 200

TransGEM fine-tuning

python ft_train.py --data_path ./data/ --dataset HCC515 --gene_encoder tenfold_binary --gpu cuda:0

Trained TransGEM testing

python test.py --data_path ./data/ --dataset subLINCS --gene_encoder tenfold_binary --gpu cuda:0

Fine-tuned TransGEM testing

python ft_test.py --data_path ./data/ --dataset HCC515 --gene_encoder tenfold_binary --gpu cuda:0

TransGEM application

  • for prostate cancer
python app.py --data_path ./data/ --dataset PC --cell_line PC3 --gene_encoder tenfold_binary --gpu cuda:0 --seq_num 1000
  • for non-small cell lung cancer
python app.py --data_path ./data/ --dataset nsclc --cell_line A549 --gene_encoder tenfold_binary --gpu cuda:0 --seq_num 1000

Model options

  • usage:
python train.py --help
  • optional arguments:
-h, --help            show this help message and exit
--data_path           directory of the input data
--out_path            directory where the training results are written
--dataset             dataset used by the model (subLINCS/HCC515/PC/nsclc)
--gene_encoder        encoding form of the gene expression (value/one_hot/binary/tenfold_binary)
--gpu                 CUDA device id
--hidden_dim          hidden size of the transformer decoder
--ff_dim              dimension of the feed-forward layer
--PE_dropout          dropout of the positional encoding
--TF_dropout          dropout of the transformer layers
--TF_N                number of transformer decoder layers
--TF_H                number of transformer decoder heads
--TF_act              activation function of the transformer layers
--batch_size          batch size
--epochs              number of epochs
--lr                  learning rate of Adam
--cell_line           cell line name of the disease
--pad_idx             id of the pad symbol
--start_idx           id of the start symbol
--end_idx             id of the end symbol
--max_len             maximum length of a generated molecule
--vocab_size          vocabulary size
--k                   number of candidate molecules kept at each beam search step
--alpha               weight balancing the length and score of molecules generated by beam search
--seq_num             number of molecules ultimately retained
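The --k and --alpha options control the beam search decoder. A common way to trade off sequence score against sequence length (a sketch of the standard length-normalization technique, not necessarily TransGEM's exact formula; function names here are hypothetical) is to divide the summed token log-probability by length raised to the power alpha:

```python
def length_penalized_score(log_prob_sum: float, length: int, alpha: float) -> float:
    """Score a candidate sequence; higher is better.
    Dividing by length**alpha keeps long sequences from being
    unfairly penalized for accumulating more negative log-probs
    (alpha=0 disables the normalization)."""
    return log_prob_sum / (length ** alpha)

def rank_beams(beams, alpha=0.6):
    """beams: list of (token_list, summed_log_prob) pairs.
    Returns the beams sorted best-first under the length penalty."""
    return sorted(
        beams,
        key=lambda b: length_penalized_score(b[1], len(b[0]), alpha),
        reverse=True,
    )
```

With alpha > 0, a longer SMILES candidate with a slightly lower total log-probability can still outrank a shorter one, which matters when generating whole molecules rather than fragments.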
  • Model parameters for the 4 encoding forms

| Encoding form  | hidden_dim | ff_dim | TF_N | TF_H |
|----------------|------------|--------|------|------|
| value          | 64         | 2048   | 6    | 8    |
| one_hot        | 64         | 512    | 6    | 8    |
| binary         | 64         | 512    | 6    | 8    |
| tenfold_binary | 64         | 512    | 6    | 8    |
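The name tenfold_binary suggests scaling each gene expression value by ten before converting it to a fixed-width binary vector. The sketch below is one plausible reading of that encoder, not TransGEM's actual implementation; the function name, bit width, and sign-bit convention are all assumptions:

```python
def tenfold_binary(value: float, n_bits: int = 8):
    """Hypothetical sketch of a 'tenfold binary' gene encoder:
    scale the expression value by 10, round to an integer, and
    emit a sign bit (1 = negative) followed by the magnitude in
    fixed-width binary, most significant bit first."""
    scaled = int(round(value * 10))
    sign = 1 if scaled < 0 else 0
    mag = min(abs(scaled), 2 ** (n_bits - 1) - 1)  # clip to fit the width
    bits = [(mag >> i) & 1 for i in reversed(range(n_bits - 1))]
    return [sign] + bits
```

Under this reading, an expression value of 1.23 becomes the integer 12 and encodes as [0, 0, 0, 0, 1, 1, 0, 0]; consult the repository's encoder code for the real scheme.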
