YanZheng-16/CEPZ_DHS

CEPZ_DHS

In this study, we propose a novel predictor for DNase I hypersensitive sites (DHSs), which have been shown to indicate specific functions in the chromatin structure. Inspired by ideas from protein identification, we apply four encoding methods, CKSNP, EIIP, PC, and Z-curve, to convert variable-length gene sequences into fixed-dimension feature vectors. These methods capture complementary properties of dinucleotides: information about surrounding residues, average electron-ion interaction pseudopotentials, physicochemical properties, and global distribution, respectively. We then train the model with XGBoost, a boosting framework that balances predictive performance and speed. After computing feature importance and reducing the dimensionality, we obtain an efficient predictor that uses only 39 features while retaining strong metrics. We evaluated the same pipeline on two datasets of DHS and non-DHS sequences, and it performed well on both.
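As an illustration of how sequences of different lengths can be mapped to vectors of equal length, here is a minimal sketch of three of the encodings. It is not the code from this repository. It assumes that CKSNP refers to a k-spaced nucleotide pair composition, that the Z-curve is the standard frequency-based three-component form, and that the EIIP values are the commonly cited per-nucleotide ones. The function names and the `max_gap` parameter are hypothetical, and the PC encoding and the XGBoost step are omitted.

```python
from itertools import product

BASES = "ACGT"
# Commonly cited EIIP values per nucleotide (assumption: the repository may use others).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]  # the 16 possible base pairs


def z_curve(seq):
    """Frequency-based Z-curve components (x, y, z) of a DNA sequence."""
    n = len(seq)
    f = {b: seq.count(b) / n for b in BASES}
    x = (f["A"] + f["G"]) - (f["C"] + f["T"])  # purine vs. pyrimidine
    y = (f["A"] + f["C"]) - (f["G"] + f["T"])  # amino vs. keto
    z = (f["A"] + f["T"]) - (f["G"] + f["C"])  # weak vs. strong hydrogen bonding
    return [x, y, z]


def k_spaced_pairs(seq, k):
    """Normalized counts of base pairs separated by exactly k positions."""
    total = len(seq) - k - 1  # number of (i, i + k + 1) position pairs
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(total):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # skip pairs containing ambiguous bases such as N
            counts[pair] += 1
    return [counts[p] / total for p in PAIRS]


def mean_eiip(seq):
    """Average EIIP value over the sequence; unknown bases contribute 0."""
    return sum(EIIP.get(b, 0.0) for b in seq) / len(seq)


def encode(seq, max_gap=1):
    """Concatenate the encodings into one vector whose length ignores len(seq)."""
    seq = seq.upper()
    features = z_curve(seq) + [mean_eiip(seq)]
    for k in range(max_gap + 1):
        features += k_spaced_pairs(seq, k)
    return features
```

The vector length depends only on `max_gap` (3 + 1 + 16 per gap value), so sequences of any length longer than `max_gap + 1` produce features of the same dimension, which can then be passed to a classifier.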
