A tool for RNA secondary structure prediction with multiple types of experimental probing data. It can use experimental pairing probabilities to restrain the partition function, and predict the structure with maximum restrained expected accuray based on a MEA algorithm, MaxExpect (Lu et al., 2009 RNA ). It is based on the RNAstructure (http://rna.urmc.rochester.edu/RNAstructure.html) package. It also provides preprocessing scripts for transforming the SHAPE, PARS and DMS-seq data into pairing probability according a posterior probabilistic model. Moreover, it also contains a utility for optimizing the parameters of RME by RME-Optimize.
For updates, please refer to https://github.com/lulab/RME
Yang Wu, Binbin Shi et al., (2015) Improved prediction of RNA secondary structure by integrating the free energy model with restraints derived from experimental probing data.
RME is free for non-commercial research. For commercial use, please contact the authors.
Please download the software from https://github.com/lulab/RME/releases
- Linux
- Perl (version >= 5.10.1)
- R (version >= 3.1.0)
- R packages: argparse mixtools reshape MASS evir
- Bioconductor package: preprocessCore
First of all, enter the directory of RME and type:
make
Then the executables will be placed in the bin
directory.
You can add the bin
directory to your PATH
variable for convinient use:
export PATH=$PATH:/path/to/RME/bin
Optionally, you can install RME to another location by typing:
make install PREFIX=/path/to/install
Example files for reference structures:
example/dat/structure/*.ct
All structure files are in CT format (http://rna.urmc.rochester.edu/Text/File_Formats.html). And the files are nominated by the RNA name (also appeared in the data files) and a .ct suffix. For test RNAs without reference structures, the 5th column in the CT file can be meaningless (e.g. set to be a consistent number). See example/dat/structure/README
for details.
Example files for RNA sequences
example/dat/sequence/*.fa
All data files are in FASTA format (http://rna.urmc.rochester.edu/Text/File_Formats.html). And the files are nominated by the RNA name (also appeared in the data files) and a .fa suffix. See example/dat/sequence/README
for details.
Example files for training sets:
example/dat/data/SHAPE.train.23SrRNA.data
example/dat/data/PARS.train.25SrRNA.data
example/dat/data/DMSseq.train.25SrRNA.data
Example files for test sets:
example/dat/data/SHAPE.test.data
example/dat/data/PARS.test.data
example/dat/data/DMSseq.test.data
For SHAPE data with one reactivity value for each nucleotide, training files contain 5 columns separated by a TAB. The first lines of the files specify the column name (given in parentheses).
Column | Description |
---|---|
1 | name of RNA (RNA) |
2 | index on the RNA, 1-based (Index) |
3 | reactivity on this base (Reactivity) |
4 | base information in capitals, ACGU (Base) |
5 | structure information for this base, 0 stands for single-standed bases, 1 strands for paired bases (Structure) |
For PARS and DMS-seq data with two read counts for each nucleotdie, training files contain 6 columns by a TAB. The first lines of the files specify the column name (given in parentheses).
Column | Description |
---|---|
1 | name of RNA (RNA) |
2 | index on the RNA, 1-based (Index) |
3 | read count 1 for this base, V1 for PARS data and control for DMS-seq data (V1 or Control) |
4 | read count 2 for this base, S1 for PARS data and vivo for DMS-seq data (S1 or Vivo) |
5 | base information in capitals, ACGU (Base) |
6 | structure information for this base, 0 stands for single-standed bases, 1 strands for paired bases (Structure) |
Files for test RNAs are similar with the training files except that the last column (structure information) is missing.
Note that bases without probing signals are not included in these files. See example/dat/data/README
for details.
The preprocessing scripts accepts 3 input parametes: a data file for training (prepared in step 1c), a data file for test (prepared in step 1c) and the directory for output files. Meanwhile, two options are required for specifying the data type (-d) and the directory of structure files (prepared in step 1a) (-s).
RME-Preprocess -d SHAPE -s example/dat/structure example/dat/data/SHAPE.train.23SrRNA.data example/dat/data/SHAPE.test.data example/processing-data
RME-Preprocess -d PARS -s example/dat/structure example/dat/data/PARS.train.25SrRNA.data example/dat/data/PARS.test.data example/processing-data
RME-Preprocess -d DMSseq -s example/dat/structure example/dat/data/DMSseq.train.25SrRNA.data example/dat/data/DMSseq.test.data example/processing-data
Outfiles contains pairing probabilities and structure information of all bases in the training RNAs (useful for optimizing RME parameters, step 4)
example/processing-data/SHAPE.for-optimize-parameter.txt
example/processing-data/PARS.for-optimize-parameter.txt
example/processing-data/DMSseq.for-optimize-parameter.txt
Outfiles contains pairing probabilities of all bases in the test RNAs (used for structure prediction, step 3)
example/processing-data/SHAPE.for-test.txt
example/processing-data/PARS.for-test.txt
example/processing-data/DMSseq.for-test.txt
We provide several methods for calculating the prior probability of the paired bases, including using a constant (0.535, the default setting), calculating from the training RNAs (train) and calculating from partition function for each RNA (partition). You can select one of them by specify the -r option. Note that when using the partition mode, you should also specify the directory of the sequence files (prepared in step 1b) by -q.
RME-Preprocess -r constant -d SHAPE -s example/dat/structure example/dat/data/SHAPE.train.23SrRNA.data example/dat/data/SHAPE.test.data example/processing-data
RME-Preprocess -r train -d PARS -s example/dat/structure example/dat/data/PARS.train.25SrRNA.data example/dat/data/PARS.test.data example/processing-data
RME-Preprocess -r partition -q example/dat/sequence -d DMSseq -s example/dat/structure example/dat/data/DMSseq.train.25SrRNA.data example/dat/data/DMSseq.test.data example/processing-data
Meanwhile, for SHAPE data, we provided three types of normalization methods, quantile normalization (Bolstad et al., 2003) (quantile, the default setting), 2%/8% rule normalization (Ouyang et al., 2013) (28rule) and no normalization (no). You can select one of them for SHAPE data by specify the -n option.
RME-Preprocess -n quantile -d SHAPE -s example/dat/structure example/dat/data/SHAPE.train.23SrRNA.data example/dat/data/SHAPE.test.data example/processing-data
RME-Preprocess -n 28rule -d SHAPE -s example/dat/structure example/dat/data/SHAPE.train.23SrRNA.data example/dat/data/SHAPE.test.data example/processing-data
RME-Preprocess -n no -d SHAPE -s example/dat/structure example/dat/data/SHAPE.train.23SrRNA.data example/dat/data/SHAPE.test.data example/processing-data
You can get more information about the usage of RME-Preprocess
by typing RME-Preprocess -h
.
The RME software accepts 2 input parameters, a data file containing pairing probability for each test RNA (prepared in Step 2b) and a directory for output files. Meanwhile, we recommended to use the -d option to specify the data type, then RME would use default parameter values for that data type. And a -p option specifying the cpu number is quite useful for running in parallel.
mkdir example/prediction/
cd example/prediction/
RME -d SHAPE -p 8 ../processing-data/SHAPE.for-test.txt SHAPE
RME -d PARS -p 8 ../processing-data/PARS.for-test.txt PARS
RME -d DMSseq -p 8 ../processing-data/DMSseq.for-test.txt DMSseq
cd ../..
Outfiles of the predicted structures in CT format
example/prediction/SHAPE/*.ct
example/prediction/PARS/*.ct
example/prediction/DMSseq/*.ct
RME allows users to set the parameters in other ways. For example, parameters can be recorded in a file (example: example/optimize-param/*.params.txt
) (see step 4b for instructions) and fed to RME by using the -i
option.
mkdir example/prediction/
cd example/prediction/
RME -i ../optimize-param/SHAPE.params.txt -p 8 ../processing-data/SHAPE.for-test.txt SHAPE
RME -i ../optimize-param/PARS.params.txt -p 8 ../processing-data/PARS.for-test.txt PARS
RME -i ../optimize-param/DMSseq.params.txt -p 8 ../processing-data/DMSseq.for-test.txt DMSseq
cd ../..
Moreover, the parameters can be set directly by using the -m
, --gamma1
and --gamma2
options. Users are free to adjust the relative contribution of structure probing data by adjusting this values.
RME combined two rounds of BPPM updating for improved structure prediction, called RME-pre and RME-post, respectively. We provided the -pre
and -post
options to enable the users to select either part for structure prediction. This allows the users to test the importance of integrating RME-pre
and RME-post
into RME
.
Moreover, RME-post introduced an w(i,j)
term to penalize spurious pairs that usually involving in shorter helices. We also provided the --no-helix-weight
option to enable the users to run RME without this term. This allows the users to test the importance of the term for RME-post
or RME
.
You can get more information about usage of RME
by typing RME -h
.
This step is optional. The above 3 sections are sufficient for using RME. This section is for advanced users who want to optimize the RME parameters for their own training data. In this case, try RME-Optimize
!
RME-Optimize
accepts two parameters: one is the data file containg pairing probability for the training RNAs (prepared in step 2b), and the other is the output file for recording the optimal parameters for your own data. You can use the --details
option to specify a file for recording the whole optimizing process. For optimizing in parallel, you can specify the -p
option to define the cpu number for running. Besides, you can adjust the grid for parameter search by using the --m-range
, --gamma1-range
and --gamma2-range
(providing the min, max and step for grid search seperated by a colon).
mkdir example/optimize-param/
cd example/optimize-param/
RME-Optimize ../processing-data/SHAPE.for-optimize-parameter.txt SHAPE.params.txt --m-range 0:1:0.05 --details SHAPE.details.txt -p 4
RME-Optimize ../processing-data/PARS.for-optimize-parameter.txt PARS.params.txt --m-range 0:1:0.02 --details PARS.details.txt -p 4
RME-Optimize ../processing-data/DMSseq.for-optimize-parameter.txt DMSseq.params.txt --m-range 0:1:0.02 --details DMSseq.details.txt -p 4
cd ../..
You can get more information about usage of RME-Optimize
by typing RME-Optimize -h
.
Output files containing the optimal parameters, which can be used in step 3c
example/optimize-param/SHAPE.params.txt
example/optimize-param/PARS.params.txt
example/optimize-param/DMSseq.params.txt
Output files recording the detail of the optimizing processes
example/optimize-param/SHAPE.details.txt
example/optimize-param/PARS.details.txt
example/optimize-param/DMSseq.details.txt
structure prediction files
example/prediction/*/*.ct
pairing probability files
example/processing-data/*/*.prob