Return to search

Evaluating distributional models of compositional semantics

Distributional models (DMs) are a family of unsupervised algorithms that represent the meaning of words as vectors. They have been shown to capture interesting aspects of semantics. Recent work has sought to compose word vectors in order to model phrases and sentences. The most commonly used measure of a compositional DM's performance to date has been the degree to which it agrees with human-provided phrase similarity scores. The contributions of this thesis are three-fold. First, I argue that existing intrinsic evaluations are unreliable as they make use of small and subjective gold-standard data sets and assume a notion of similarity that is independent of a particular application. Therefore, they do not necessarily measure how well a model performs in practice. I study four commonly used intrinsic datasets and demonstrate that all of them exhibit undesirable properties. Second, I propose a novel framework within which to compare word- or phrase-level DMs in terms of their ability to support document classification. My approach couples a classifier to a DM and provides a setting where classification performance is sensitive to the quality of the DM. Third, I present an empirical evaluation of several methods for building word representations and composing them within my framework. I find that the determining factor in building word representations is data quality rather than quantity; in some cases only a small amount of unlabelled data is required to reach peak performance. Neural algorithms for building single-word representations perform better than counting-based ones regardless of what composition is used, but simple composition algorithms can outperform more sophisticated competitors. Finally, I introduce a new algorithm for improving the quality of distributional thesauri using information from repeated runs of the same non deterministic algorithm.

Identiferoai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:687087
Date January 2016
CreatorsBatchkarov, Miroslav Manov
PublisherUniversity of Sussex
Source SetsEthos UK
Detected LanguageEnglish
TypeElectronic Thesis or Dissertation
Sourcehttp://sro.sussex.ac.uk/id/eprint/61062/

Page generated in 0.0024 seconds