Distributional models of meaning represent a word as a vector of co-occurrence counts with every other word in the vocabulary. Adding compositionality to these models addresses the fact that no text corpus, regardless of its size, can provide reliable co-occurrence statistics for anything but very short text constituents. The purpose of a compositional distributional model is to provide a function that composes the vectors for the words within a sentence, in order to create a vectorial representation that reflects the meaning of the sentence. Using the abstract mathematical framework of category theory, Coecke, Sadrzadeh and Clark showed that this function can directly depend on the grammatical structure of the sentence, providing an elegant mathematical counterpart of the formal semantics view. The framework is general and compositional but stays abstract to a large extent.
This thesis contributes to ongoing research related to the above categorical model in three ways. Firstly, I propose a concrete instantiation of the abstract framework based on Frobenius algebras (joint work with Sadrzadeh). The theory addresses shortcomings of previous proposals, extends the coverage of the language, and is supported by experimental work that improves on existing results. The proposed framework describes a new class of compositional models that find intuitive interpretations for a number of linguistic phenomena.
Secondly, I propose and evaluate in practice a new compositional methodology which explicitly deals with the different levels of lexical ambiguity (joint work with Pulman). A concrete algorithm is presented, based on the separation of vector disambiguation from composition in an explicit prior step. Extensive experimental work shows that the proposed methodology indeed results in more accurate composite representations, both for the framework of Coecke et al. in particular and for every other class of compositional models in general.
As a last contribution, I formalize the explicit treatment of lexical ambiguity in the context of the categorical framework by resorting to categorical quantum mechanics (joint work with Coecke). In the proposed extension, the concept of a distributional vector is replaced with that of a density matrix, which compactly represents a probability distribution over the possible meanings of a word. Composition takes the form of quantum measurements, leading to interesting analogies between quantum physics and linguistics.
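The density-matrix idea can be illustrated with a minimal sketch. It is not the construction used in the thesis; it only assumes the standard textbook definition of a density matrix as a probability-weighted sum of projectors onto normalized state vectors. The three-dimensional sense vectors, the sense probabilities, and the helper names (`normalize`, `outer`, `density_matrix`, `trace`, `purity`) are all hypothetical and chosen for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def outer(v):
    """Outer product |v><v| of a real vector with itself."""
    return [[a * b for b in v] for a in v]

def density_matrix(senses):
    """Build rho = sum_i p_i |v_i><v_i| from (probability, sense vector) pairs.

    Assumes the probabilities sum to 1; each vector is normalized first.
    """
    dim = len(senses[0][1])
    rho = [[0.0] * dim for _ in range(dim)]
    for p, vec in senses:
        proj = outer(normalize(vec))
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += p * proj[i][j]
    return rho

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

def purity(rho):
    """tr(rho^2): equals 1 for a single sense, less than 1 for a mixture."""
    n = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n))

# Hypothetical ambiguous word with two senses (toy numbers, not corpus data).
bank = density_matrix([(0.6, [3.0, 1.0, 0.0]), (0.4, [0.0, 1.0, 2.0])])
# Hypothetical unambiguous word with a single sense.
river = density_matrix([(1.0, [0.0, 1.0, 3.0])])

print("trace(bank)   =", trace(bank))
print("purity(bank)  =", purity(bank))
print("purity(river) =", purity(river))
```

In this sketch the trace is 1 for both words, while the purity separates them: the single-sense word gives a purity of 1 and the two-sense mixture gives a value below 1, which is one simple way to read a density matrix as encoding a degree of ambiguity.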
Identifier | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:658422 |
Date | January 2014 |
Creators | Kartsaklis, Dimitrios |
Contributors | Sadrzadeh, Mehrnoosh; Coecke, Bob; Pulman, Stephen |
Publisher | University of Oxford |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://ora.ox.ac.uk/objects/uuid:1f6647ef-4606-4b85-8f3b-c501818780f2 |