
Web-assisted anaphora resolution

This dissertation investigates the utility of the web for anaphora resolution. Aside from offering a highly accurate, web-based method for pleonastic it detection, which eliminates up to 4% of errors in pronominal anaphora resolution, it also introduces a web-assisted model for definite description anaphoricity determination and a prototype system of anaphora resolution that uses the web for virtually all subtasks.

The thesis starts with a thorough analysis of the relationship between anaphora and definiteness, a study that bridges the gap between previously reported empirical studies of definite description anaphora and the linguistic theories developed around the concept of definiteness. Various naturally occurring definite descriptions found in the WSJ corpus are analyzed from the perspectives of both familiarity and uniqueness, and a new classification scheme for definite descriptions is developed.

With the fundamental issues solved, the rest of the thesis focuses on the various ways the web can be exploited for the purpose of anaphora resolution. This thesis presents methods of high-precision, high-recall anaphoricity determination for both pronouns and definite descriptions. Evaluation results suggest that the performance of the pleonastic it identification module is on par with casually trained human annotators. When used together with a pronominal anaphora resolution system, the module offers a statistically significant performance gain of 4%. The performance of the anaphoricity determination module for definite descriptions, which benefits from both the insight gained from the study on anaphora and definiteness and the significantly expanded coverage offered by the web, is also one of the highest among existing studies.

The thesis also introduces a web-centric anaphora resolution system. Aside from serving as the information source for implementing selectional restrictions and discovering hyponym/synonym relationships, the web is additionally used for gender/number determination and many other auxiliary tasks, such as determining the semantic subjects of as-prepositions, identifying antecedents for certain empty categories, and assigning appropriate labels for proper names using information available from the text itself. With a design that specifically leaves room for the application of verb-argument and genitive co-occurrence statistics, the web-based features provide statistically significant gains to the system's performance.

/ Software Engineering and Intelligent Systems

Identifier: oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:AEU.10048/933
Date: 06 1900
Creators: Li, Yifan
Contributors: Musilek, Petr (Electrical and Computer Engineering), Reformat, Marek (Electrical and Computer Engineering), Zadrozny, Slawomir (Polish Academy of Sciences), Fair, Ivan (Electrical and Computer Engineering), Sutton, Richard (Computing Science), Pedrycz, Witold (Electrical and Computer Engineering)
Source Sets: Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
Language: en_US
Detected Language: English
Type: Thesis
Format: 2551171 bytes, application/pdf
Relation: Yifan Li, Petr Musilek, Marek Reformat, Loren Wyard-Scott (2009). Identification of Pleonastic It Using the Web. J. Artif. Intell. Res. (JAIR) 34: 339-389
