Abstract
This chapter introduces a prediction approach that combines a biophysically motivated intelligent voting model with a powerful randomized meta-learning technique, using amino acid (AA) sequence information only, for accurate and efficient prediction of proline cis-trans isomerization (CTI). The proposed model is based on random forest data modeling and evolutionary information. To assess the predictive performance of each model accurately, the authors adopted a cross-validation scheme for model evaluation. Experimental results demonstrate that the proposed methods achieve a lower test error than the most widely used support vector machine (SVM) models. The results also show that using pure evolutionary information, in the form of position-specific scoring matrix (PSSM) scores, as input substantially reduces the error rate during model learning, which suggests that noise present in the input data set (i.e., predicted secondary information) may significantly degrade the performance of the models.
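As a rough illustration of the inputs and the evaluation scheme the abstract mentions, the sketch below builds PSSM-based feature vectors for each proline in a sequence and produces cross-validation splits. It is not the chapter's implementation: the function names, the windowing around each proline, the zero-padding at sequence ends, and the default window size are all assumptions made for illustration. A real PSSM has 20 score columns per residue; any column count works here.

```python
from typing import Iterator, List, Sequence, Tuple


def pssm_window(pssm: Sequence[Sequence[float]], center: int, half_width: int) -> List[float]:
    """Flatten the PSSM rows around `center` into one feature vector.

    Hypothetical helper: positions outside the sequence are zero-padded.
    """
    n_cols = len(pssm[0])
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(pssm):
            features.extend(pssm[pos])
        else:
            features.extend([0.0] * n_cols)
    return features


def proline_features(
    sequence: str, pssm: Sequence[Sequence[float]], half_width: int = 2
) -> List[Tuple[int, List[float]]]:
    """Return (position, feature vector) for every proline ('P') in the sequence.

    The window size is an assumed parameter, not a value taken from the chapter.
    """
    return [
        (i, pssm_window(pssm, i, half_width))
        for i, aa in enumerate(sequence)
        if aa == "P"
    ]


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for a simple interleaved k-fold split."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        yield train, test
```

Each feature vector could then be passed to any classifier (for example a random forest or an SVM), with the test error averaged over the folds returned by `k_fold_indices`.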
Original language | British English |
---|---|
Title of host publication | Pattern Recognition in Computational Molecular Biology |
Subtitle of host publication | Techniques and Approaches |
Publisher | Wiley |
Pages | 236-248 |
Number of pages | 13 |
ISBN (Electronic) | 9781119078845 |
ISBN (Print) | 9781118893685 |
State | Published - 28 Dec 2015 |
Keywords
- Amino acid sequence
- Intelligent voting model
- Position-specific-scoring matrix
- Proline cis-trans isomerization prediction
- Random forest data modeling
- Randomized meta-learning technique