In this study, we propose a novel predictor of DNase I hypersensitive sites (DHSs), which have been shown to mark functionally specific regions of chromatin structure. Inspired by ideas from protein identification, we apply four methods (CKSNP, EIIP, PC, and Z-curve) to convert variable-length sequences into feature vectors of equal dimension. These methods capture complementary dinucleotide properties: the context of surrounding residues, average electron-ion interaction pseudopotentials, physicochemical properties, and global distribution, respectively. We then train XGBoost, a gradient boosting framework chosen for both its predictive performance and its speed. After computing feature importance and reducing dimensionality, we obtain an efficient predictor that uses 39 features and achieves strong evaluation metrics. We evaluated the same pipeline on two datasets of DHS and non-DHS sequences, and it performed well on both.
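The encoding step described above can be illustrated with a minimal sketch. It is not the repository's implementation: the function names, the `max_gap` parameter, and the normalisation by sequence length are assumptions made for illustration. It shows how a k-spaced nucleotide pair composition and a Z-curve summary each map a sequence of any length to a fixed number of features.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 16 ordered nucleotide pairs: AA, AC, AG, AT, CA, ...
PAIRS = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def kspaced_pair_composition(seq, k):
    """Frequency of each nucleotide pair whose members are separated by k residues.

    Hypothetical helper: k = 0 gives ordinary dinucleotide composition.
    """
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    windows = len(seq) - k - 1  # number of (i, i + k + 1) position pairs
    for i in range(max(windows, 0)):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # skip ambiguous symbols such as N
            counts[pair] += 1
    total = max(windows, 1)  # avoid division by zero on very short input
    return [counts[p] / total for p in PAIRS]


def z_curve(seq):
    """Three global Z-curve components, divided by length (an assumption here)."""
    seq = seq.upper()
    n = max(len(seq), 1)
    a, c, g, t = (seq.count(base) for base in NUCLEOTIDES)
    return [
        ((a + g) - (c + t)) / n,  # purine vs. pyrimidine
        ((a + c) - (g + t)) / n,  # amino vs. keto
        ((a + t) - (c + g)) / n,  # weak vs. strong hydrogen bonding
    ]


def encode(seq, max_gap=2):
    """Concatenate pair compositions for gaps 0..max_gap with the Z-curve terms."""
    features = []
    for k in range(max_gap + 1):
        features.extend(kspaced_pair_composition(seq, k))
    features.extend(z_curve(seq))
    return features
```

With `max_gap=2`, every sequence yields 3 × 16 + 3 = 51 values regardless of its length, so the resulting vectors can be stacked into a matrix and passed to a classifier such as XGBoost.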
YanZheng-16/CEPZ_DHS