1

Interpretation of anaphoric expressions in the Lolita system

Urbanowicz, Agnieszka Joanna January 1998
This thesis addresses the issue of anaphora resolution in the large scale natural language system, LOLITA. The work described here involved a thorough analysis of the system’s initial performance, the collection of evidence for and the design of the new anaphora resolution algorithm, and subsequent implementation and evaluation of the system. Anaphoric expressions are elements of a discourse whose resolution depends on other elements of the preceding discourse. The processes involved in anaphora resolution have long been the subject of research in a variety of fields. The changes carried out to LOLITA first involved substantial improvements to the core, lower level modules which form the basis of the system. A major change specific to the interpretation of anaphoric expressions was then introduced. A system of filters, in which potential candidates for resolution are filtered according to a set of heuristics, has been changed to a system of penalties, where candidates accumulate points throughout the application of the heuristics. At the end of the process, the candidate with the smallest penalty is chosen as a referent. New heuristics, motivated by evidence drawn from research in linguistics, psycholinguistics and AI, have been added to the system. The system was evaluated using a procedure similar to that defined by MUC6 (DARPA 1995). Blind and open tests were used. The first evaluation was carried out after the general improvements to the lower level modules; the second after the introduction of the new anaphora algorithm. It was found that the general improvements led to a considerable rise in scores in both the blind and the open test sets. As a result of the anaphora specific improvements, on the other hand, the rise in scores on the open set was larger than the rise on the blind set. In the open set the category of pronouns showed the most marked improvement. 
It was concluded that it is the work carried out on the basic, lower-level modules of a large-scale system which leads to the biggest gains. It was also concluded that considerable extra advantage can be gained by using the new weights-based algorithm together with the generally improved system.
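
The shift from filters to penalties described in this abstract can be illustrated with a minimal sketch. It is not LOLITA's implementation: the `Candidate` fields, the three heuristics, and every penalty weight are hypothetical and chosen only for illustration. Each heuristic adds points to a candidate instead of eliminating it, and the candidate with the smallest total is chosen as the referent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """A potential referent (hypothetical fields, for illustration only)."""
    text: str
    gender: str     # e.g. "fem", "masc", "neut"
    number: str     # "sing" or "plur"
    distance: int   # sentences between candidate and anaphor


# A heuristic maps (anaphor features, candidate) to a non-negative penalty.
Heuristic = Callable[[Dict[str, str], Candidate], float]


def gender_mismatch(anaphor: Dict[str, str], cand: Candidate) -> float:
    # Assumed weight: disagreement is penalised heavily but not excluded.
    return 10.0 if anaphor["gender"] != cand.gender else 0.0


def number_mismatch(anaphor: Dict[str, str], cand: Candidate) -> float:
    return 10.0 if anaphor["number"] != cand.number else 0.0


def recency(anaphor: Dict[str, str], cand: Candidate) -> float:
    # Assumed weight: one point per sentence of distance.
    return 1.0 * cand.distance


def score(anaphor: Dict[str, str], candidates: List[Candidate],
          heuristics: List[Heuristic]) -> List[Tuple[Candidate, float]]:
    """Accumulate penalties over all heuristics for every candidate."""
    return [(c, sum(h(anaphor, c) for h in heuristics)) for c in candidates]


def resolve(anaphor: Dict[str, str], candidates: List[Candidate],
            heuristics: List[Heuristic]) -> Candidate:
    """Return the candidate with the smallest accumulated penalty."""
    return min(score(anaphor, candidates, heuristics), key=lambda p: p[1])[0]


if __name__ == "__main__":
    she = {"gender": "fem", "number": "sing"}
    candidates = [
        Candidate("John", "masc", "sing", 0),
        Candidate("Mary", "fem", "sing", 1),
        Candidate("the girls", "fem", "plur", 0),
    ]
    rules = [gender_mismatch, number_mismatch, recency]
    print(resolve(she, candidates, rules).text)
```

Because no candidate is ever discarded, a referent is still returned when every candidate violates some heuristic, which a strict filter chain would not allow.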
2

Topic indexing and retrieval for open domain factoid question answering

Ahn, Kisuh January 2009
Factoid Question Answering is an exciting area of Natural Language Engineering that has the potential to replace one major use of search engines today. In this dissertation, I introduce a new method of handling factoid questions whose answers are proper names. The method, Topic Indexing and Retrieval, addresses two issues that prevent current factoid QA systems from realising this potential: they can’t satisfy users’ demand for almost immediate answers, and they can’t produce answers based on evidence distributed across a corpus. The first issue arises because the architecture common to QA systems does not scale easily to heavy use, since so much of the work is done on-line: text retrieved by information retrieval (IR) undergoes expensive and time-consuming answer extraction while the user awaits an answer. If QA systems are to become as heavily used as popular web search engines, this massive process bottleneck must be overcome. The second issue, how to make use of the distributed evidence in a corpus, is relevant when no single passage in the corpus provides sufficient evidence for an answer to a given question. QA systems commonly look for a text span that contains sufficient evidence to both locate and justify an answer. But this will fail in the case of questions that require evidence from more than one passage in the corpus. The Topic Indexing and Retrieval method developed in this thesis addresses both these issues for factoid questions with proper name answers by restructuring the corpus in such a way that it enables direct retrieval of answers using off-the-shelf IR. The method has been evaluated on 377 TREC questions with proper name answers and 41 questions that require multiple pieces of evidence from different parts of the TREC AQUAINT corpus. With regard to the first evaluation, scores of 0.340 in Accuracy and 0.395 in Mean Reciprocal Rank (MRR) show that Topic Indexing and Retrieval performs well for this type of question.
A second evaluation compares performance on a corpus of 41 multi-evidence questions by a question-factoring baseline method that can be used with the standard QA architecture and by my Topic Indexing and Retrieval method. The superior performance of the latter (MRR of 0.454 against 0.341) demonstrates its value in answering such questions.
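
For readers unfamiliar with the two measures reported in this abstract, the sketch below shows how Accuracy and MRR are commonly computed over ranked answer lists. It assumes Accuracy means the fraction of questions whose top-ranked answer is correct, and that a question with no correct answer in its list contributes a reciprocal rank of 0; the thesis may define either differently. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

# One run per question: (ranked system answers, set of acceptable answers).
Run = Tuple[Sequence[str], Set[str]]


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, answer in enumerate(ranked, start=1):
        if answer in gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Run]) -> float:
    """Average reciprocal rank over all questions."""
    return sum(reciprocal_rank(r, g) for r, g in runs) / len(runs)


def accuracy(runs: List[Run]) -> float:
    """Fraction of questions whose top-ranked answer is correct (assumed definition)."""
    hits = sum(1 for r, g in runs if r and r[0] in g)
    return hits / len(runs)


if __name__ == "__main__":
    # Toy data, not taken from the thesis.
    runs = [
        (["Paris", "Lyon"], {"Paris"}),   # correct at rank 1
        (["Rome", "Oslo"], {"Oslo"}),     # correct at rank 2
        (["Bern"], {"Zurich"}),           # no correct answer
    ]
    print(f"Accuracy: {accuracy(runs):.3f}")
    print(f"MRR:      {mean_reciprocal_rank(runs):.3f}")
```

Under these definitions MRR is never lower than Accuracy, since every top-ranked hit contributes 1 to both and lower-ranked hits add only to MRR; this is consistent with the reported pair of 0.340 and 0.395.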
