Create README.md
wengong-jin authored Apr 10, 2020
1 parent 9ae10ef commit d6d308e
30 changes: 30 additions & 0 deletions generation/README.md
# Molecule Generation

This folder contains the molecule generation scripts. The polymer generation experiment in the paper can be reproduced with the following steps:

## Motif Extraction
Extract substructure vocabulary from a given set of molecules:
```
python get_vocab.py --min_frequency 100 --ncpu 8 < data/polymers/all.txt > vocab.txt
```
Replace `data/polymers/all.txt` with your own molecule data file.
The `--min_frequency` option sets an occurrence threshold: with the value above, any large motif with fewer than 100 occurrences in the dataset is discarded. Discarded motifs are decomposed into simple rings and bonds. Use `--ncpu` to set the number of jobs for multiprocessing.
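
To illustrate the thresholding that `--min_frequency` describes, here is a minimal sketch in plain Python. It is not the code in `get_vocab.py`: the function name `split_by_frequency` and the toy motif strings are assumptions made up for illustration, and the real script additionally decomposes the discarded motifs into rings and bonds.

```python
from collections import Counter


def split_by_frequency(motifs, min_frequency):
    """Hypothetical helper: partition motifs by how often they occur.

    Returns (kept, discarded). A motif is kept when it occurs at least
    min_frequency times; every other motif is discarded.
    """
    counts = Counter(motifs)
    kept = {m for m, c in counts.items() if c >= min_frequency}
    discarded = set(counts) - kept
    return kept, discarded


# Toy data (made up): one ring seen three times, a rarer fragment seen once.
toy_motifs = ["c1ccccc1", "c1ccccc1", "c1ccccc1", "C1CCOC1"]
kept, discarded = split_by_frequency(toy_motifs, min_frequency=2)
print(sorted(kept))       # ['c1ccccc1']
print(sorted(discarded))  # ['C1CCOC1']
```

A lower threshold keeps more large motifs in the vocabulary, while a higher one pushes more of them into the ring-and-bond decomposition.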

## Data Preprocessing
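Preprocess the dataset using the vocabulary extracted in the first step. The command below uses `data/polymers/inter_vocab.txt`; to use a vocabulary you extracted yourself, pass that file (for example `vocab.txt`) to `--vocab` instead: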
```
python preprocess.py --train data/polymers/train.txt --vocab data/polymers/inter_vocab.txt --ncpu 8
mkdir train_processed
mv tensor* train_processed/
```
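
Where a POSIX shell is not available, the `mkdir` and `mv` steps can be reproduced in Python. This is a sketch under the assumption, taken from the `mv tensor*` command, that `preprocess.py` writes its output files with a `tensor` prefix into the current directory; the function name `collect_tensors` is hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path


def collect_tensors(src_dir=".", dest_dir="train_processed"):
    """Hypothetical helper mirroring `mkdir train_processed; mv tensor* train_processed/`.

    Moves every regular file in src_dir whose name starts with "tensor"
    into dest_dir and returns the list of new paths.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    moved = []
    for path in sorted(src.glob("tensor*")):
        if path.is_file():  # skip directories that happen to match
            target = dest / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Calling `collect_tensors()` from the directory where preprocessing ran leaves the processed files in `train_processed/`, which is the path the training step expects.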

## Training
```
mkdir ckpt/
python gnn_train.py --train train_processed/ --vocab data/polymers/inter_vocab.txt --save_dir ckpt/
```

## Sample Molecules
```
python sample.py --vocab ../data/polymers/inter_vocab.txt --model ckpt/inter-h250z24b0.1/model.19
```
