Return to search

Domain Adaptation for Hypernym Discovery via Automatic Collection of Domain-Specific Training Data / Domänanpassning för identifiering av hypernymer via automatisk insamling av domänspecifikt träningsdata

Identifying semantic relations in natural language text is an important component of many knowledge extraction systems. This thesis studies the task of hypernym discovery, i.e discovering terms that are related by the hypernymy (is-a) relation. Specifically, this thesis explores how state-of-the-art methods for hypernym discovery perform when applied in specific language domains. In recent times, state-of-the-art methods for hypernym discovery are mostly made up by supervised machine learning models that leverage distributional word representations such as word embeddings. These models require labeled training data in the form of term pairs that are known to be related by hypernymy. Such labeled training data is often not available when working with a specific language domain. This thesis presents experiments with an automatic training data collection algorithm. The algorithm leverages a pre-defined domain-specific vocabulary, and the lexical resource WordNet, to extract training pairs automatically. This thesis contributes by presenting experimental results when attempting to leverage such automatically collected domain-specific training data for the purpose of domain adaptation. Experiments are conducted in two different domains: One domain where there is a large amount of text data, and another domain where there is a much smaller amount of text data. Results show that the automatically collected training data has a positive impact on performance in both domains. The performance boost is most significant in the domain with a large amount of text data, with mean average precision increasing by up to 8 points.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-157693
Date January 2019
CreatorsPalm Myllylä, Johannes
PublisherLinköpings universitet, Institutionen för datavetenskap
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0411 seconds