Return to search

Functional distributional semantics : learning linguistically informed representations from a precisely annotated corpus

The aim of distributional semantics is to design computational techniques that can automatically learn the meanings of words from a body of text. The twin challenges are: how do we represent meaning, and how do we learn these representations? The current state of the art is to represent meanings as vectors - but vectors do not correspond to any traditional notion of meaning. In particular, there is no way to talk about 'truth', a crucial concept in logic and formal semantics. In this thesis, I develop a framework for distributional semantics which answers this challenge. The meaning of a word is not represented as a vector, but as a 'function', mapping entities (objects in the world) to probabilities of truth (the probability that the word is true of the entity). Such a function can be interpreted both in the machine learning sense of a classifier, and in the formal semantic sense of a truth-conditional function. This simultaneously allows both the use of machine learning techniques to exploit large datasets, and also the use of formal semantic techniques to manipulate the learnt representations. I define a probabilistic graphical model, which incorporates a probabilistic generalisation of model theory (allowing a strong connection with formal semantics), and which generates semantic dependency graphs (allowing it to be trained on a corpus). This graphical model provides a natural way to model logical inference, semantic composition, and context-dependent meanings, where Bayesian inference plays a crucial role. I demonstrate the feasibility of this approach by training a model on WikiWoods, a parsed version of the English Wikipedia, and evaluating it on three tasks. The results indicate that the model can learn information not captured by vector space models.

Identiferoai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:763682
Date January 2018
CreatorsEmerson, Guy Edward Toh
ContributorsCopestake, Ann
PublisherUniversity of Cambridge
Source SetsEthos UK
Detected LanguageEnglish
TypeElectronic Thesis or Dissertation
Sourcehttps://www.repository.cam.ac.uk/handle/1810/284882

Page generated in 0.0027 seconds