TY - JOUR
T1 - Inferring the functions of proteins from the interrelationships between functional categories
AU - Taha, Kamal
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2018
Y1 - 2018
N2 - This study proposes a new method to determine the functions of an unannotated protein. The proteins and amino acid residues mentioned in biomedical texts associated with an unannotated protein p can be considered as characteristic terms for p, which are highly predictive of the potential functions of p. Similarly, proteins and amino acid residues mentioned in biomedical texts associated with proteins annotated with a functional category f can be considered as characteristic terms of f. We introduce in this paper an information extraction system called IFP-IFC that predicts the functions of an unannotated protein p by representing p and each functional category f by a vector of weights. Each weight reflects the degree of association between a characteristic term and p (or a characteristic term and f). First, IFP-IFC constructs a network whose nodes represent the different functional categories and whose edges represent the interrelationships between the nodes. Then, it determines the functions of p by employing random walks with restarts on this network. The walker is the vector of p. Finally, p is assigned to the functional categories of the nodes in the network that are visited most by the walker. We evaluated the quality of IFP-IFC by comparing it experimentally with two other systems. Results showed marked improvement.
AB - This study proposes a new method to determine the functions of an unannotated protein. The proteins and amino acid residues mentioned in biomedical texts associated with an unannotated protein p can be considered as characteristic terms for p, which are highly predictive of the potential functions of p. Similarly, proteins and amino acid residues mentioned in biomedical texts associated with proteins annotated with a functional category f can be considered as characteristic terms of f. We introduce in this paper an information extraction system called IFP-IFC that predicts the functions of an unannotated protein p by representing p and each functional category f by a vector of weights. Each weight reflects the degree of association between a characteristic term and p (or a characteristic term and f). First, IFP-IFC constructs a network whose nodes represent the different functional categories and whose edges represent the interrelationships between the nodes. Then, it determines the functions of p by employing random walks with restarts on this network. The walker is the vector of p. Finally, p is assigned to the functional categories of the nodes in the network that are visited most by the walker. We evaluated the quality of IFP-IFC by comparing it experimentally with two other systems. Results showed marked improvement.
KW - Biomedical text
KW - Information extraction
KW - Protein annotation
KW - Protein function prediction
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=85041860521&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2016.2615608
DO - 10.1109/TCBB.2016.2615608
M3 - Article
C2 - 27723600
AN - SCOPUS:85041860521
SN - 1545-5963
VL - 15
SP - 157
EP - 167
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 1
M1 - 7585111
ER -