Abstract
This study proposes a new method for determining the functions of an unannotated protein. The proteins and amino acid residues mentioned in biomedical texts associated with an unannotated protein p can be considered characteristic terms for p, and they are highly predictive of the potential functions of p. Similarly, the proteins and amino acid residues mentioned in biomedical texts associated with proteins annotated with a functional category f can be considered characteristic terms of f. We introduce an information extraction system called IFP-IFC that predicts the functions of an unannotated protein p by representing p and each functional category f as a vector of weights. Each weight reflects the degree of association between a characteristic term and p (or between a characteristic term and f). First, IFP-IFC constructs a network whose nodes represent the functional categories and whose edges represent the interrelationships between them. It then determines the functions of p by performing random walks with restarts on this network, with the vector of p acting as the walker. Finally, p is assigned to the functional categories of the nodes that the walker visits most often. We evaluated IFP-IFC by comparing it experimentally with two other systems; the results showed marked improvement.
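The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the IFP-IFC implementation. It assumes that the restart distribution is the cosine similarity between the weight vector of p and the weight vector of each functional category, that the network is given as a weighted adjacency matrix, and that the restart probability is 0.5. The function names, category names, and toy numbers are all hypothetical.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine similarity between two weight vectors (0.0 if either is all zeros)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def random_walk_with_restart(adjacency, restart, restart_prob=0.5,
                             tol=1e-10, max_iter=1000):
    """Iterate r = (1 - c) * W r + c * e until the scores stop changing.

    W is the column-normalised adjacency matrix, e the restart distribution,
    and c the restart probability (an assumed value, not from the paper).
    """
    adjacency = np.asarray(adjacency, dtype=float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    transition = adjacency / col_sums      # each column now sums to 1
    restart = np.asarray(restart, dtype=float)
    restart = restart / restart.sum()
    scores = restart.copy()
    for _ in range(max_iter):
        updated = (1 - restart_prob) * (transition @ scores) + restart_prob * restart
        if np.abs(updated - scores).sum() < tol:
            return updated
        scores = updated
    return scores


def predict_categories(protein_vec, category_vecs, adjacency, top_k=2):
    """Rank functional categories by how often the walker visits their nodes."""
    names = list(category_vecs)
    restart = np.array([cosine_similarity(protein_vec, category_vecs[n])
                        for n in names])
    if restart.sum() == 0:                 # no shared terms: fall back to uniform
        restart = np.ones(len(names))
    scores = random_walk_with_restart(adjacency, restart)
    order = np.argsort(scores)[::-1][:top_k]
    return [(names[i], float(scores[i])) for i in order]


# Toy data: three hypothetical categories over three characteristic terms.
category_vecs = {
    "kinase activity": [1.0, 0.0, 0.0],
    "DNA binding": [0.0, 1.0, 0.0],
    "transport": [0.0, 0.0, 1.0],
}
# Hypothetical category network: edges link interrelated categories.
adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
protein_vec = [0.9, 0.1, 0.0]              # weight vector of the unannotated protein p

ranking = predict_categories(protein_vec, category_vecs, adjacency)
print(ranking)
```

The restart term keeps the walker close to categories whose characteristic terms overlap with those of p, while the edges let related categories share score. How IFP-IFC actually weights terms and builds edges is described in the paper, not in this sketch.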
| Field | Value |
|---|---|
| Original language | British English |
| Article number | 7585111 |
| Pages (from-to) | 157-167 |
| Number of pages | 11 |
| Journal | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
| Volume | 15 |
| Issue number | 1 |
| State | Published - 2018 |
Keywords
- Biomedical text
- Information extraction
- Protein annotation
- Protein function prediction
- Text mining