Extracting various classes of data from biological text using the concept of existence dependency

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

One of the key goals of biological natural language processing (NLP) is the automatic extraction of information from biomedical publications. Most current constituency and dependency parsers overlook the semantic relationships between the constituents that make up a sentence and may not be well suited to capturing complex long-distance dependencies. In this paper, we propose EDC-EDC, a hybrid constituency-dependency parser for information extraction in biological NLP. EDC-EDC aims to advance the state of the art in biological text mining by applying novel computational linguistic techniques that overcome the limitations of current constituency and dependency parsers outlined above: 1) it determines the semantic relationship between each pair of constituents in a sentence using novel semantic rules; and 2) it applies a semantic relationship extraction model that extracts information from constituents appearing in different structural forms within sentences. EDC-EDC can be used to extract different types of data from biological texts for purposes such as protein function prediction, genetic network construction, and protein-protein interaction detection. We evaluated the quality of EDC-EDC by comparing it experimentally with six systems; the results showed a marked improvement.
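The abstract notes that the same biological fact can surface in different structural forms within a sentence. As a rough illustration of why that matters for protein-protein interaction detection, here is a minimal sketch. It is not EDC-EDC's method, which relies on semantic rules over parsed constituents rather than regular expressions. It assumes entity names are capitalized alphanumeric tokens and that only three relations occur. The `Interaction` type, the `NOMINAL_TO_VERB` lexicon, the patterns, and `extract` are all hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple


class Interaction(NamedTuple):
    agent: str     # constituent performing the action
    relation: str  # relation normalized to its verb form
    target: str    # constituent acted upon


# Hypothetical lexicon mapping nominalizations to their verb form.
NOMINAL_TO_VERB = {"activation": "activates", "inhibition": "inhibits", "binding": "binds"}

# Assumption: entity names are approximated as capitalized alphanumeric tokens.
ENTITY = r"\b([A-Z][A-Za-z0-9-]*)"

# Verbal form: "<agent> <verb> <target>", e.g. "MDM2 inhibits TP53"
VERBAL = re.compile(ENTITY + r" (activates|inhibits|binds) " + ENTITY)
# Nominalized form: "<noun> of <target> by <agent>", e.g. "activation of ERK by MEK"
NOMINAL = re.compile(r"(activation|inhibition|binding) of " + ENTITY + r" by " + ENTITY)


def extract(sentence: str) -> list[Interaction]:
    """Return interactions found in either structural form, normalized to one shape."""
    found = []
    # Verbal matches already list agent, verb, target in that order.
    for agent, verb, target in VERBAL.findall(sentence):
        found.append(Interaction(agent, verb, target))
    # Nominalized matches list the target before the agent, so reorder them.
    for noun, target, agent in NOMINAL.findall(sentence):
        found.append(Interaction(agent, NOMINAL_TO_VERB[noun], target))
    return found
```

For example, `extract("MEK activates ERK, and inhibition of TP53 by MDM2 was observed.")` yields one interaction from each form, both in the same agent/relation/target shape. A fixed pattern list like this breaks as soon as a sentence uses a form it does not anticipate. That brittleness is the kind of limitation a parser that reasons about the relationships between constituents aims to avoid.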

Original language: British English
Article number: 7014223
Pages (from-to): 1918-1928
Number of pages: 11
Journal: IEEE Journal of Biomedical and Health Informatics
Volume: 19
Issue number: 6
State: Published - 1 Nov 2015

Keywords

  • Biological natural language processing (NLP)
  • Biomedical literature
  • Dependency parsers
  • Information extraction
  • Text mining
