Return to search

Finding Meaning in Context Using Graph Algorithms in Mono- and Cross-lingual Settings

Making computers automatically find the appropriate meaning of words in context is an interesting problem that has proven to be one of the most challenging tasks in natural language processing (NLP). Widespread potential applications of a possible solution to the problem could be envisaged in several NLP tasks such as text simplification, language learning, machine translation, query expansion, information retrieval and text summarization. Ambiguity of words has always been a challenge in these applications, and the traditional endeavor to solve the problem of this ambiguity, namely doing word sense disambiguation using resources like WordNet, has been fraught with debate about the feasibility of the granularity that exists in WordNet senses. The recent trend has therefore been to move away from enforcing any given lexical resource upon automated systems from which to pick potential candidate senses,and to instead encourage them to pick and choose their own resources. Given a sentence with a target ambiguous word, an alternative solution consists of picking potential candidate substitutes for the target, filtering the list of the candidates to a much shorter list using various heuristics, and trying to match these system predictions against a human generated gold standard, with a view to ensuring that the meaning of the sentence does not change after the substitutions. This solution has manifested itself in the SemEval 2007 task of lexical substitution and the more recent SemEval 2010 task of cross-lingual lexical substitution (which I helped organize), where given an English context and a target word within that context, the systems are required to provide between one and ten appropriate substitutes (in English) or translations (in Spanish) for the target word. In this dissertation, I present a comprehensive overview of state-of-the-art research and describe new experiments to tackle the tasks of lexical substitution and cross-lingual lexical substitution. In particular I attempt to answer some research questions pertinent to the tasks, mostly focusing on completely unsupervised approaches. I present a new framework for unsupervised lexical substitution using graphs and centrality algorithms. An additional novelty in this approach is the use of directional similarity rather than the traditional, symmetric word similarity. Additionally, the thesis also explores the extension of the monolingual framework into a cross-lingual one, and examines how well this cross-lingual framework can work for the monolingual lexical substitution and cross-lingual lexical substitution tasks. A comprehensive set of comparative investigations are presented amongst supervised and unsupervised methods, several graph based methods, and the use of monolingual and multilingual information.

Identiferoai:union.ndltd.org:unt.edu/info:ark/67531/metadc271899
Date05 1900
CreatorsSinha, Ravi Som
ContributorsMihalcea, Rada, 1974-, McCarthy, Diana, Tarau, Paul, Ruiz, Miguel
PublisherUniversity of North Texas
Source SetsUniversity of North Texas
LanguageEnglish
Detected LanguageEnglish
TypeThesis or Dissertation
FormatText
RightsPublic, Sinha, Ravi Som, Copyright, Copyright is held by the author, unless otherwise noted. All rights Reserved.

Page generated in 0.0019 seconds