
Detecting biomedical relations using distant supervision

This work concerns the detection of relationships between key pieces of information in biomedical publications, such as treatments for diseases or side effects of drugs. Given a sentence containing some medical concepts, the goal is to determine how those concepts relate to each other. Supervised machine learning methods are a popular way to address this problem and often provide reliable results. These methods require manually labelled examples, from which they learn the characteristics of particular relationships in order to detect similar information in unlabelled data. However, manually labelled data is not always available, and producing it is time-consuming and expensive.

The main objective of this thesis is to explore distant supervision, a method that uses prior knowledge to generate such labelled examples automatically, in order to detect relationships between key facts. First, relation extraction using a limited amount of training data is explored to detect adverse drug effects in natural language. Then, the work focuses on automatically labelling data using a large biomedical knowledge base, the Unified Medical Language System (UMLS). The effectiveness of a popular evaluation method that does not require manually labelled data is examined in more detail. In particular, the work investigates whether UMLS is suitable for labelling data automatically so that similar information can be detected in natural language. Finally, a method to reduce falsely labelled instances in the automatically generated data is presented and is found to improve the detection of relationships.
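The core idea of distant supervision can be illustrated with a minimal sketch, assuming a toy knowledge base of concept pairs and naive lowercase substring matching for concept detection. Both the knowledge-base entries (`KB`) and the function name `distant_label` are hypothetical and for illustration only; they are not the thesis's actual UMLS-based pipeline. Every sentence that mentions both concepts of a known pair is labelled with that pair's relation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (concept_a, concept_b) -> relation label.
# Illustrative entries only; the thesis uses UMLS as its knowledge source.
KB: Dict[Tuple[str, str], str] = {
    ("aspirin", "headache"): "may_treat",
    ("warfarin", "bleeding"): "causes_side_effect",
}

# Hypothetical unlabelled sentences.
SENTENCES: List[str] = [
    "Aspirin is often taken to relieve a headache.",
    "Patients on warfarin should be monitored for bleeding.",
    "Aspirin did not cause a headache in any of the volunteers.",
    "Metformin lowers blood glucose.",
]


def distant_label(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str, str]]:
    """Label each sentence mentioning both concepts of a KB pair with that relation.

    Concept detection is naive lowercase substring matching, an assumption
    made for brevity; a real system would use a concept recogniser.
    """
    labelled = []
    for sentence in sentences:
        text = sentence.lower()
        for (a, b), relation in kb.items():
            if a in text and b in text:
                labelled.append((sentence, a, b, relation))
    return labelled


if __name__ == "__main__":
    for sentence, a, b, relation in distant_label(SENTENCES, KB):
        print(f"{relation}({a}, {b}): {sentence}")
```

The third sentence shows why automatically generated labels are noisy: it mentions both "aspirin" and "headache" and is therefore labelled `may_treat`, even though it does not express a treatment relationship. False positives of this kind motivate a step that reduces falsely labelled instances before training.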

Identifier: oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:695993
Date: January 2015
Creators: Roller, Roland
Contributors: Stevenson, Mark
Publisher: University of Sheffield
Source Sets: Ethos UK
Detected Language: English
Type: Electronic Thesis or Dissertation
Source: http://etheses.whiterose.ac.uk/13892/
