updates for methods chapter
rschwess committed Dec 15, 2021
1 parent f92cb67 commit db503c2
Showing 8 changed files with 47 additions and 38 deletions.
8 changes: 6 additions & 2 deletions formatted_data_links/README.md
@@ -8,14 +8,18 @@
### Links

* convolutional filter weights for transfer learning, obtained from training a [deepHaem](https://github.com/rschwess/deepHaem) CNN
* [human](http://userweb.molbiol.ox.ac.uk/datashare/rschwess/deepC/data_links/saved_conv_weights_dhw_5layer_1k_pool.npz) trained on 932 chromatin features
* [human](http://userweb.molbiol.ox.ac.uk/datashare/rschwess/deepC/data_links/saved_conv_weights_human_deepc_arch.npy.npz) trained on 932 chromatin features
* [mouse](http://userweb.molbiol.ox.ac.uk/datashare/rschwess/deepC/data_links/saved_conv_weights_mouse_deepc_arch.npy.npz) trained on 1022 chromatin features

* example HiC [skeleton](http://userweb.molbiol.ox.ac.uk/datashare/rschwess/deepC/data_links/example_skeleton_gm12878_5kb_chr17.bed) chr17 5kb GM12878 primary

* example GM12878 [HiC sparse matrix](http://userweb.molbiol.ox.ac.uk/datashare/rschwess/deepC/data_links/gm12878_primary_chr17_5kb.contacts.KRnorm.matrix.gz) KRnorm (Rao et al.)

* [formatted data](http://userweb.molbiol.ox.ac.uk/datashare/rschwess/deepC/data_links/data_GM12878_5kb_regression.txt.tar.gz) ready for deepC training (also see `./data_links`)
* formatted Hi-C skeleton data ready for deepC training
* [GM12878 at 5kb resolution](http://userweb.molbiol.ox.ac.uk/datashare/rschwess/deepC/data_links/data_GM12878_5kb_regression.txt.tar.gz)
* [K562 at 5kb resolution](http://userweb.molbiol.ox.ac.uk/datashare/rschwess/deepC/data_links/data_K562_5kb_regression.txt.tar.gz)
* [IMR90 at 5kb resolution](http://userweb.molbiol.ox.ac.uk/datashare/rschwess/deepC/data_links/data_IMR90_5kb_regression.txt.gz)


* [hg19 chr17](http://userweb.molbiol.ox.ac.uk/datashare/rschwess/deepC/data_links/hg19_chr17_fasta_for_test.tar.gz) fasta and index for tutorial
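
The weight files linked above are NumPy `.npz` archives. Before passing one to training as a seed file, it can help to check which arrays it contains. The snippet below is a minimal sketch of such a check, not part of deepC: `summarize_npz` is a hypothetical helper, and the array names and shapes stored in the real files are not documented here, so the code only reports whatever it finds.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz archive."""
    # np.load on an .npz returns a lazy archive; the context manager closes it.
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Usage (assumes the human weights file was downloaded to the working directory):
# for name, shape in summarize_npz("saved_conv_weights_human_deepc_arch.npy.npz").items():
#     print(name, shape)
```

The number of convolutional layers reported should be compatible with the `seed_scheme` used for training, although how deepC maps archive entries onto layers is not shown in this commit.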

@@ -553,14 +553,14 @@ if(opt$plot.tracks){
print(paste('Combining', num.sub.plots, 'sub plots ...'))

prio.list <- list()
if(opt$plot.hic){ prio.list <- c(prio.list, list(p.hic + upper_overlay_theme)) }
if(opt$plot.skeleton){ prio.list <- c(prio.list, list(p.skel + upper_overlay_theme)) }
if(opt$plot.deepc.ref){ prio.list <- c(prio.list, list(p.ref + upper_overlay_theme)) }
if(opt$plot.deepc.var){ prio.list <- c(prio.list, list(p.var + upper_overlay_theme)) }
if(opt$plot.deepc.diff){ prio.list <- c(prio.list, list(p.diff + upper_overlay_theme)) }
if(opt$plot.tracks){ prio.list <- c(prio.list, list(p.tracks + upper_overlay_theme)) }
# set lower theme for bottom plot
prio.list[[length(prio.list)]] <- prio.list[[length(prio.list)]] + lower_overlay_theme
if(opt$plot.hic){ prio.list <- c(prio.list, list(p.hic + lower_overlay_theme)) }
if(opt$plot.skeleton){ prio.list <- c(prio.list, list(p.skel + lower_overlay_theme)) }
if(opt$plot.deepc.ref){ prio.list <- c(prio.list, list(p.ref + lower_overlay_theme)) }
if(opt$plot.deepc.var){ prio.list <- c(prio.list, list(p.var + lower_overlay_theme)) }
if(opt$plot.deepc.diff){ prio.list <- c(prio.list, list(p.diff + lower_overlay_theme)) }
if(opt$plot.tracks){ prio.list <- c(prio.list, list(p.tracks + lower_overlay_theme)) }
## set lower theme for bottom plot
##prio.list[[length(prio.list)]] <- prio.list[[length(prio.list)]] + lower_overlay_theme

p.combined <- plot_grid(plotlist=prio.list,
nrow = num.sub.plots,
1 change: 1 addition & 0 deletions models/README.md
@@ -25,6 +25,7 @@ recognize the model this way.
* [GM12878_primary](http://datashare.molbiol.ox.ac.uk/public/rschwess/deepC/models/model_deepCregr_5kb_GM12878_primary.tar.gz)
* [GM12878_combined](http://datashare.molbiol.ox.ac.uk/public/rschwess/deepC/models/model_deepCregr_5kb_GM12878_combined.tar.gz)
* [K562](http://datashare.molbiol.ox.ac.uk/public/rschwess/deepC/models/model_deepCregr_5kb_K562.tar.gz)
* [IMR90](http://datashare.molbiol.ox.ac.uk/public/rschwess/deepC/models/model_deepCregr_5kb_IMR90.tar.gz)
* mouse [mES](http://datashare.molbiol.ox.ac.uk/public/rschwess/deepC/models/model_deepCregr_5kb_mouse_ES.tar.gz)

* 10 kb resolution
@@ -525,8 +525,8 @@ def run_training():

train_chrom_d, train_chroms, train_position, train_regr_bin, test_chrom_d, test_chroms, test_position, test_regr_bin, valid_chrom_d, valid_chroms, valid_position, valid_regr_bin = read_train_file(FLAGS.data_file, NUM_CLASSES, test_chromosomes, validation_chromosomes, FLAGS.store_dtype)

print(test_chroms)
print(valid_chroms)
# print(test_chroms)
# print(valid_chroms)

# # test a input
# print('Test chromosomes')
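
`read_train_file` returns separate train, test and validation arrays based on the two held-out chromosome lists. The following is a minimal sketch of that kind of split, not the repository's implementation: `split_by_chromosome` and the `(chrom, position, values)` record layout are hypothetical, and only the comma-separated chromosome strings mirror the training script's settings.

```python
def split_by_chromosome(records, test_chromosomes, validation_chromosomes):
    """Partition (chrom, position, values) records into train/test/valid lists.

    Chromosome lists are comma-separated strings, e.g. 'chr12,chr13'.
    """
    test_set = set(test_chromosomes.split(","))
    valid_set = set(validation_chromosomes.split(","))

    # A chromosome must not be held out for both purposes.
    overlap = test_set & valid_set
    if overlap:
        raise ValueError(f"chromosomes in both held-out sets: {sorted(overlap)}")

    splits = {"train": [], "test": [], "valid": []}
    for record in records:
        chrom = record[0]
        if chrom in test_set:
            splits["test"].append(record)
        elif chrom in valid_set:
            splits["valid"].append(record)
        else:
            splits["train"].append(record)
    return splits
```

Holding out whole chromosomes, rather than random windows, avoids overlapping sequence windows leaking between the training and evaluation sets.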
22 changes: 11 additions & 11 deletions tutorials/README.md
@@ -85,20 +85,20 @@ files e.g. via `cat coords_and_hic_skeleton_5kb_chr*_IMR90.bed >training_set_IMR
Example command:

```
Rscript ./wrapper_preprocess_hic_data.R \
--hic.matrix=chr20_5kb.contacts.KRnorm.matrix \
Rscript ./deepC/helper_for_preprocessing_and_analysis/wrapper_preprocess_hic_data.R \
--hic.matrix=chr17_5kb.GM12878.KRnorm.matrix \
--chromosome.sizes=hg19_chrom_sizes.txt \
--sample=IMR90 \
--sample=GM12878 \
--bin.size=5000 \
--window.size=1005000 \
--chrom=chr17 \
--helper=../../repositories/deepC/helper_for_preprocessing_and_analysis \
--helper=./deepC/helper_for_preprocessing_and_analysis \
--plot.hic \
--plot.skeleton \
--plot.start=1e+06 \
--plot.end=7000000 \
--plot.height=8 \
--plot.width=10
--plot.start=2e+06 \
--plot.end=2000000 \
--plot.height=6 \
--plot.width=8
```

Use `--help` flag for detailed parameter explanations.
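
The wrapper is run once per chromosome, and the per-chromosome output files are then concatenated into a single training set (the `cat` one-liner mentioned earlier in this README). Where a shell glob is inconvenient, the same step can be sketched in Python. This is an illustrative equivalent, not a deepC script: `concatenate_skeleton_files` is a hypothetical helper and the output path is left to the caller.

```python
import glob
import os
import shutil


def concatenate_skeleton_files(pattern, out_path):
    """Concatenate all files matching `pattern` into `out_path`.

    Files are appended in sorted (lexical) order, like a shell glob.
    Returns the list of input paths that were used.
    """
    out_abs = os.path.abspath(out_path)
    # Skip the output file itself in case it matches the pattern.
    paths = sorted(p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs)
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")

    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return paths


# Usage (pattern taken from the README; choose the output name as needed):
# concatenate_skeleton_files("coords_and_hic_skeleton_5kb_chr*_IMR90.bed", "training_set.txt")
```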
@@ -158,7 +158,7 @@ Set plot titles, track names and track colours as needed. And link to the helper

Example command:
```
Rscript wrapper_plot_deepc_predictions.R --sample=imr90_test \
Rscript wrapper_plot_deepc_predictions.R --sample=gm12878_test \
--out.dir='.' \
--bin.size 5000 \
--window.size 1005000 \
@@ -173,8 +173,8 @@ Rscript wrapper_plot_deepc_predictions.R --sample=imr90_test \
--plot.deepc.ref \
--plot.deepc.var \
--fill.deepc.var \
--hic.preprocessed=coords_and_hic_skeleton_5kb_chr17_IMR90_notrans.bed \
--skeleton.input=coords_and_hic_skeleton_5kb_chr17_IMR90.bed \
--hic.preprocessed=coords_and_hic_skeleton_5kb_chr17_GM12878_no_transform.bed \
--skeleton.input=coords_and_hic_skeleton_5kb_chr17_GM12878.bed \
--deepc.ref.input=test_predict_out/class_predictions_predict_provided_1_chr17_71000000_71999999.txt \
--deepc.var.input=test_variant_out/class_predictions_predict_variant_provided_1_chr17_71706322_71706671.txt \
--plot.tracks \
12 changes: 7 additions & 5 deletions tutorials/example_script_deepc_predict.sh
@@ -1,7 +1,7 @@
#!/bin/bash

# # explanation
# python ../tensorflow1_version/run_deploy_shape_deepCregr.py --input example_region_short.bed \ # input deepC variant bed-like file
# python ./deepC/tensorflow2.1plus_compatibility_version/run_deploy_shape_deepCregr.py --input example_region_short.bed \ # input deepC variant bed-like file
# --out_dir ./test_predict_out \ # output directory
# --name_tag predict \ # name tag to add to output files
# --model ./model_deepCregr_5kb_GM12878_primary/model \ # trained deepC model downloaded and extracted
@@ -14,11 +14,12 @@
# --run_on gpu # specify to run on gpu or cpu (cpu takes significantly longer)

# actually run
python ../tensorflow1_version/run_deploy_shape_deepCregr.py --input example_region_short.bed \
python ./deepC/tensorflow2.1plus_compatibility_version/run_deploy_shape_deepCregr.py \
--input example_region_short.bed \
--out_dir ./test_predict_out \
--name_tag predict \
--model ./model_deepCregr_5kb_GM12878_primary/model \
--genome ./hg19_chr17_fasta_for_test/hg19_chr17.fa \
--genome ./hg19.fa \
--use_softmasked=False \
--bp_context 1005000 \
--add_window 500000 \
@@ -27,11 +28,12 @@ python ../tensorflow1_version/run_deploy_shape_deepCregr.py --input example_regi
--run_on gpu

# Run in terminal
python ../tensorflow1_version/run_deploy_shape_deepCregr.py --input example_variant.bed \
python ./deepC/tensorflow2.1plus_compatibility_version/run_deploy_shape_deepCregr.py \
--input example_variant.bed \
--out_dir ./test_variant_out \
--name_tag predict_variant \
--model ./model_deepCregr_5kb_GM12878_primary/model \
--genome ./hg19_chr17_fasta_for_test/hg19_chr17.fa \
--genome ./hg19.fa \
--bp_context 1005000 \
--add_window 500000 \
--num_classes 201 \
15 changes: 8 additions & 7 deletions tutorials/example_script_deepc_train.sh
@@ -1,8 +1,8 @@
#!/bin/bash

# GLOBAL RUN OPTIONS ==========================================================
SCRIPT_PATH="/path/to/deepC/current_version"
DATA_FILE="./data_for_training.txt"
SCRIPT_PATH="./deepC/tensorflow2.1plus_compatibility_version"
DATA_FILE="./minimal_training_set_example_IMR90.txt"

# Select Test and Validation Chromosomes
# Test chromosomes will be checked after each epoch
@@ -12,7 +12,7 @@ test_chromosomes='chr12,chr13'
validation_chromosomes='chr16,chr17'

# Settings ===================================================================
report_every='20' # report training loss every X steps
report_every='10' # report training loss every X steps
num_classes='201' # number of classes (output vector entries), 201 for 5kb models
# 101 for 10 kb models
bp_context='1005000' # bp context processed (1 Mb + 1x bin_size)
@@ -42,12 +42,12 @@ max_pool_scheme='4,5,5,5,2,1' # max pooling widths
dilation_scheme='2,4,8,16,32,64,128,256,1' # dilation rates
dilation_units='100' # dilation units/filters throughout
dilation_width='3'
dilation_residual=True # whether to use residual connections in the dilated layers
dilation_residual='True' # whether to use residual connections in the dilated layers

# Transfer learning settings
seed_weights=True # whether to use seeding / transfer learning at all
seed_scheme='1,1,1,1,1,0' # specify which layers to seed (1: seed, 0: not seed)
seed_file='./saved_conv_weights_dhw_5layer_1k_pool.npz' # phase I trained filters; download via the GitHub link
seed_file='./saved_conv_weights_human_deepc_arch.npy.npz' # phase I trained filters; download via the GitHub link

# Other
shuffle=True
@@ -59,6 +59,8 @@ use_softmasked=False # specify if to use soft masked bases from the fasta file
# and not block the remaining
GPU=0

train_dir='./minimal_imr90_training'

# Run ==========================================================================
python ${SCRIPT_PATH}/run_training_deepCregr.py \
--data_file ${DATA_FILE} \
@@ -83,7 +85,6 @@ python ${SCRIPT_PATH}/run_training_deepCregr.py \
--dilation_units ${dilation_units} \
--dilation_width ${dilation_width} \
--dilation_residual=${dilation_residual} \
--dilation_residual_dense=${dilation_residual_dense} \
--epsilon ${epsilon} \
--seed_weights=${seed_weights} \
--seed_scheme ${seed_scheme} \
@@ -96,4 +97,4 @@ python ${SCRIPT_PATH}/run_training_deepCregr.py \

# To continue training from a previous checkpoint use the flags:
# --model "./my_run/best_checkpoint-10000" \
# --reload_model 'True'
# --reload_model=True
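
The settings in this script tie `num_classes` and `bp_context` to the bin size: 201 classes and 1005000 bp for 5 kb models, 101 classes for 10 kb models, with `bp_context` described as 1 Mb plus one bin. The sketch below derives both values from the bin size so the two cannot drift apart. The formula `num_classes = 1 Mb / bin_size + 1` is an inference from those two documented cases, not something stated in the source, and `window_settings` is a hypothetical helper.

```python
def window_settings(bin_size, span_bp=1_000_000):
    """Derive bp_context and num_classes from the Hi-C bin size.

    Assumes bp_context = span + one bin and num_classes = span / bin_size + 1,
    which reproduces the documented 5 kb (201) and 10 kb (101) settings.
    """
    if bin_size <= 0 or span_bp % bin_size != 0:
        raise ValueError("bin_size must be positive and divide the span evenly")
    return {
        "bp_context": span_bp + bin_size,
        "num_classes": span_bp // bin_size + 1,
    }


print(window_settings(5000))   # {'bp_context': 1005000, 'num_classes': 201}
print(window_settings(10000))  # {'bp_context': 1010000, 'num_classes': 101}
```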
7 changes: 4 additions & 3 deletions tutorials/tutorial_train_a_model.md
@@ -68,7 +68,7 @@ dilation_residual=True # if to use residual connections in the dil layers
# Transfer learning settings
seed_weights=True # whether to use seeding / transfer learning at all
seed_scheme='1,1,1,1,1,0' # specify which layers to seed (1: seed, 0: not seed)
seed_file='./saved_conv_weights_dhw_5layer_1k_pool.npz' # phase I trained filters; download via the GitHub link
seed_file='./saved_conv_weights_human_deepc_arch.npy.npz' # phase I trained filters; download via the GitHub link

# Other
shuffle=True
@@ -80,6 +80,8 @@ use_softmasked=False # specify if to use soft masked bases from the fasta file
# and not block the remaining
GPU=0

train_dir='./minimal_training_example_run'

# Run ==========================================================================
python ${SCRIPT_PATH}/run_training_deepCregr.py \
--data_file ${DATA_FILE} \
@@ -104,7 +106,6 @@ python ${SCRIPT_PATH}/run_training_deepCregr.py \
--dilation_units ${dilation_units} \
--dilation_width ${dilation_width} \
--dilation_residual=${dilation_residual} \
--dilation_residual_dense=${dilation_residual_dense} \
--epsilon ${epsilon} \
--seed_weights=${seed_weights} \
--seed_scheme ${seed_scheme} \
@@ -119,7 +120,7 @@ python ${SCRIPT_PATH}/run_training_deepCregr.py \
To continue training from a previous checkpoint use the flags:
```bash
--model "./my_run/best_checkpoint-10000" \
--reload_model 'True'
--reload_model=True

```

