The automatic extraction of temporal information from written texts is pivotal for many Natural Language Processing applications such as question answering, text summarisation and information retrieval. However, Temporal Information Extraction (TIE) is a challenging task because of the amount of types of expressions (durations, frequencies, times, dates) and their high morphological variability and ambiguity. As far as the approaches are concerned, the most common among the existing ones is rule-based, while data-driven ones are under-explored. This thesis introduces a novel domain-independent data-driven TIE strategy. The identification strategy is based on machine learning sequence labelling classifiers on features selected through an extensive exploration. Results are further optimised using an a posteriori label-adjustment pipeline. The normalisation strategy is rule-based and builds on a pre-existing system. The methodology has been applied to both specific (clinical) and generic domain, and has been officially benchmarked at the i2b2/2012 and TempEval-3 challenges, ranking respectively 3rd and 1st. The results prove the TIE task to be more challenging in the clinical domain (overall accuracy 63%) rather than in the general domain (overall accuracy 69%).Finally, this thesis also presents two applications of TIE. One of them introduces the concept of temporal footprint of a Wikipedia article, and uses it to mine the life span of persons. In the other case, TIE techniques are used to improve pre-existing information retrieval systems by filtering out temporally irrelevant results.
Identifer | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:728126 |
Date | January 2016 |
Creators | Filannino, Michele |
Publisher | University of Manchester |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | https://www.research.manchester.ac.uk/portal/en/theses/datadriven-temporal-information-extraction-with-applications-in-general-and-clinical-domains(34d7e698-f8a8-4fbf-b742-d522c4fe4a12).html |
Page generated in 0.0022 seconds