A Graph Approach to Measuring Text Distance

Text comparison is a key step in many natural language processing (NLP)
applications in which texts can be classified on the basis of their semantic
distance (how similar or different the texts are). For example, comparing the
local context of an ambiguous word with that of a known word can help identify
the sense of the ambiguous word. Typically, a distributional measure is used
to capture the implicit semantic distance between two pieces of text. In this
thesis, we propose an alternative: a novel measure of the semantic distance
between texts that combines distributional information with
relational/ontological knowledge within a network-flow formalism, without
treating the two components as separate and orthogonal pieces of information.
First, we represent each text as a collection of frequency-weighted concepts
within a relational thesaurus. Then, we use a network-flow method that
efficiently measures the semantic distance between two texts by exploiting the
inherently graphical structure of an ontology. We evaluate our method on a
variety of NLP tasks.
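The abstract does not give the thesis's exact formulation, so the following is only a minimal sketch under stated assumptions. It treats the distance between two texts as a minimum-cost flow (a transshipment, in the spirit of the earth mover's distance) that moves one text's normalized concept frequencies onto the other's through the edges of an ontology graph. The toy ontology, its unit edge costs, and the names `MinCostFlow` and `flow_distance` are hypothetical illustrations, not the thesis's code or data.

```python
from collections import Counter


class MinCostFlow:
    """Successive-shortest-path min-cost flow (Bellman-Ford), integer capacities."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # edge: [to, residual_cap, cost, rev_index]

    def add_edge(self, u, v, cap, cost):
        # Forward edge plus a zero-capacity reverse edge for the residual graph.
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t, max_flow):
        flow, total_cost = 0, 0
        while flow < max_flow:
            # Bellman-Ford shortest path in the residual graph (handles negative reverse costs).
            dist = [float("inf")] * self.n
            prev = [None] * self.n  # (node, edge index) used to reach each node
            dist[s] = 0
            updated = True
            while updated:
                updated = False
                for u in range(self.n):
                    if dist[u] == float("inf"):
                        continue
                    for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                        if cap > 0 and dist[u] + cost < dist[v]:
                            dist[v] = dist[u] + cost
                            prev[v] = (u, i)
                            updated = True
            if dist[t] == float("inf"):
                break  # no augmenting path remains
            # Bottleneck capacity along the shortest path.
            push, v = max_flow - flow, t
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            # Augment: reduce forward residuals, increase reverse residuals.
            v = t
            while v != s:
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            total_cost += push * dist[t]
        return flow, total_cost


def flow_distance(edges, text_a, text_b):
    """Hypothetical text distance: cheapest way to move text_a's concept mass to text_b's.

    edges: (concept, concept, cost) triples of an undirected toy ontology.
    text_a, text_b: Counter of concept frequencies for each text.
    """
    concepts = sorted({c for e in edges for c in e[:2]} | set(text_a) | set(text_b))
    index = {c: i for i, c in enumerate(concepts)}
    s, t = len(concepts), len(concepts) + 1
    total_a, total_b = sum(text_a.values()), sum(text_b.values())
    total = total_a * total_b  # scale both texts to the same integer mass
    mcf = MinCostFlow(len(concepts) + 2)
    for c, f in text_a.items():
        mcf.add_edge(s, index[c], f * total_b, 0)  # supply from text A
    for c, f in text_b.items():
        mcf.add_edge(index[c], t, f * total_a, 0)  # demand of text B
    for u, v, cost in edges:  # flow may travel either way along an ontology edge
        mcf.add_edge(index[u], index[v], total, cost)
        mcf.add_edge(index[v], index[u], total, cost)
    flow, cost = mcf.solve(s, t, total)
    if flow < total:
        raise ValueError("the two texts' concepts are not connected in the ontology")
    return cost / total  # average cost per unit of normalized frequency mass


if __name__ == "__main__":
    toy = [("entity", "animal", 1), ("animal", "dog", 1), ("animal", "cat", 1),
           ("entity", "vehicle", 1), ("vehicle", "car", 1)]
    print(flow_distance(toy, Counter({"dog": 1}), Counter({"cat": 1})))  # 2.0
    print(flow_distance(toy, Counter({"dog": 2, "cat": 1}),
                        Counter({"cat": 1, "dog": 1, "car": 1})))
```

In this sketch the ontology edges carry the flow directly, so concept-to-concept distances emerge as path lengths in the graph rather than being precomputed for every concept pair. That choice is an assumption made for brevity; richer concept-to-concept distances would need extra edges or a different network.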

In our task-based evaluation, we find that our method performs well on two of
three tasks. We introduce a novel measure intended to capture how well our
network-flow method performs on a dataset (represented as a collection of
frequency-weighted concepts). Our analysis shows that an integrated approach,
rather than a purely distributional or purely graphical analysis, better
explains this inconsistency in performance.

Finally, we address a complexity issue arising from the overhead required to
incorporate more sophisticated concept-to-concept distances into the
network-flow framework. We propose a graph transformation method that
generates a pared-down network requiring less processing time. Our analysis
indicates that the transformation yields a significant speed improvement
without seriously hampering performance.

Identifier: oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OTU.1807/17257
Date: 26 February 2009
Creators: Tsang, Vivian
Contributors: Stevenson, Suzanne
Source Sets: Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language: en_ca
Detected Language: English
Type: Thesis
Format: 1307533 bytes, application/pdf

Page generated in 0.0127 seconds