This paper presents and evaluates an algorithm to translate a dependency treebank into a Combinatory Categorial Grammar (CCG) lexicon. The dependency relations between a head and a child in a dependency tree are exploited to determine how CCG categories should be derived by making a functional distinction between adjunct and argument relations. Derivations for an English (CoNLL08 shared task treebank) and for an Italian (Turin University Treebank) dependency treebank are performed, each requiring a number of preprocessing steps.
In order to determine the adequacy of the lexicons, dubbed DepEngCCG and DepItCCG, they are compared via two methods to preexisting CCG lexicons derived from similar or equivalent sources (CCGbank and TutCCG). First, a number of metrics are used to compare the state of the lexicon, including category complexity and category growth. Second, to measures the potential applicability of the lexicons in NLP tasks, the derived English CCG lexicon and CCGbank are compared in a sentiment analysis task. While the numeric measurements show promising results for the quality of the lexicons, the sentiment analysis task fails to generate a usable comparison. / text
Identifer | oai:union.ndltd.org:UTEXAS/oai:repositories.lib.utexas.edu:2152/ETD-UT-2010-12-2563 |
Date | 21 February 2011 |
Creators | Brewster, Joshua Blake |
Source Sets | University of Texas |
Language | English |
Detected Language | English |
Type | thesis |
Format | application/pdf |
Page generated in 0.0019 seconds