
Integrating text-mining approaches to identify entities and extract events from the biomedical literature

The amount of biomedical literature available is increasing at an exponential rate and is becoming increasingly difficult to navigate. Text-mining methods can potentially mitigate this problem, through the systematic and large-scale extraction of structured information from inherently unstructured biomedical text. This thesis reports the development of four text-mining systems that, by building on each other, have enabled the extraction of information about a large number of published statements in the biomedical literature. The first system, LINNAEUS, enables highly accurate detection ('recognition') and identification ('normalization') of species names in biomedical articles. Building on LINNAEUS, we implemented a range of improvements in the GNAT system, enabling high-throughput gene/protein detection and identification. Using gene/protein identification from GNAT, we developed the Gene Expression Text Miner (GETM), which extracts information about gene expression statements. Finally, building on GETM as a pilot project, we constructed the BioContext integrated event extraction system, which was used to extract information about over 11 million distinct biomolecular processes in 10.9 million abstracts and 230,000 full-text articles. The ability to detect negated statements in the BioContext system enables the preliminary analysis of potential contradictions in the biomedical literature. All tools (LINNAEUS, GNAT, GETM, and BioContext) are available under open-source software licenses, and LINNAEUS and GNAT are available as online web services. All extracted data (36 million BioContext statements, 720,000 GETM statements, 72,000 contradictions, 37 million mentions of species names, 80 million mentions of gene names, and 57 million mentions of anatomical location names) is available for bulk download. In addition, the data extracted by GETM and BioContext is available to biologists through easy-to-use search interfaces.

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:553476
Date: January 2012
Creators: Gerner, Lars Martin Anders
Contributors: Bergman, Casey; Nenadic, Goran
Publisher: University of Manchester
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: https://www.research.manchester.ac.uk/portal/en/theses/integrating-textmining-approaches-to-identify-entities-and-extract-events-from-the-biomedical-literature(44f8e79a-3782-4687-85c7-eee1fda5cb76).html
