A common biological question when dealing with ChIP-seq data is to identify transcription factor binding sites. This problem is called motif discovery. Motif discovery algorithms produce a large number of motifs (e.g., MEME) and they don’t evaluate motifs systematically. Therefore, several methods (e.g., PMID: 24335146) have been developed to select high-confidence motifs.
The motif selection problem seeks to identify a minimal set of regulatory motifs that characterize sequences of interest (e.g. ChIP-seq binding regions). The output motifs represent putative binding sites for primary transcription factors (ChIP-ed factors) and co-factors.
Please see the documentation for more details.
The following example has been tested in Ubuntu x64.
Step 0. Download the GLPK library
https://www.gnu.org/software/glpk/
Step 1. Compile the C code
.. code:: bash
cd RILP
g++ -std=c++11 RIPL_simplex2_07_31.cc -o msdc -L/PATH/glpk/lib -I/PATH/glpk/include -lglpk
Step 2. Run the RILP method
.. code:: bash
./msdc foreground.list background.list motif_mapping.fimo
Output
The output is a list of selected motifs (i.e., subset of the input motifs) defined in motif_mapping.fimo
.
This folder contains all raw inputs (i.e., binary matrices) and the Fisher exact test p-values for every motif.
Please use the detailed installation steps below
https://www.anaconda.com/distribution/
conda create -n set_cover python=2.7
conda activate set_cover
or source activate set_cover
- Pandas:
conda install -c anaconda pandas
- Joblib:
conda install -c anaconda joblib
- scikit-learn:
conda install -c anaconda scikit-learn
cd Evams
Note that Evams imports conf.py
from the same directory. You can add Evams
to your env path by doing export PATH=$PATH:/path_to_Evams_folder/
Evams2 -h
Evams2 -jid test -confFile example.conf
In example.conf
, please change input to individual files in the Data folder.
Configuration files (.conf
) used in this paper are provided in the conf_paper folder. In the same folder, running log files including CPU time and memory used are also provided.
-jid
is job ID. An output folder will be created using job ID, in which all results will be put.