Abstract
Enhancers are DNA regions that control gene expression. They may lie upstream or downstream of a gene, or within one of its introns, but are usually located far from the genes they regulate. Enhancers can be identified within DNA sequences by combining experimental and computational approaches. Experimental techniques such as ChIP-seq and ATAC-seq identify genomic regions that are associated with transcription factors or accessible to regulatory proteins, whereas computational techniques predict enhancers from sequence features and epigenetic modifications. In this study, we developed a multi-classifier stacked ensemble (MCSE-enhancer) model that accurately identifies enhancers. We used feature descriptors derived from various physicochemical properties as input to six baseline classifiers and built a stacked classifier that outperformed previous enhancer classification techniques in terms of accuracy, specificity, sensitivity, and Matthews correlation coefficient. Our model achieved an accuracy of 81.5%, a 2–3% improvement over existing models. © 2023 Elsevier Ltd
Original language | British English |
---|---|
Journal | J. Mol. Biol. |
Volume | 435 |
Issue number | 23 |
DOIs | |
State | Published - 2023 |
Keywords
- bioinformatics
- computational biology
- DNA sequences
- enhancers
- meta classification
- Computational Biology
- DNA
- Enhancer Elements, Genetic
- Genomics
- Transcription Factors
- transcription factor
- Article
- catboost classifier
- classifier
- controlled study
- enhancer region
- ensemble learning
- extra tree classifier
- extreme gradient boosting
- false negative result
- false positive result
- feature extraction
- gene identification
- intermethod comparison
- light gradient boosting
- machine learning
- measurement accuracy
- multi classifier stacked ensemble
- multilayer perceptron
- physical chemistry
- random forest
- sensitivity and specificity
- genetics
- genomics
- procedures