• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Retrieving Definitions from Scientific Text in the Salmon Fish Domain by Lexical Pattern Matching

Gabbay, Igal 01 1900 (has links)
While an information retrieval system takes as input a user query and returns a list of relevant documents chosen from a large collection, a question answering system attempts to produce an exact answer. Recent research, motivated by the question answering track of the Text REtrieval Conference (TREC) has focused mainly on answering ‘factoid’ questions concerned with names, places, dates etc. in the news domain. However, questions seeking definitions of terms are common in the logs of search engines. The objective of this project was therefore to investigate methods of retrieving definitions from scientific documents. The subject domain was salmon, and an appropriate test collection of articles was created, pre-processed and indexed. Relevant terms were obtained from salmon researchers and a fish database. A system was built which accepted a term as input, retrieved relevant documents from the collection using a search engine, identified definition phrases within them using a vocabulary of syntactic patterns and associated heuristics, and produced as output phrases explaining the term. Four experiments were carried out which progressively extended and refined the patterns. The performance of the system, measured using an appropriate form of precision, improved over the experiments from 8.6% to 63.6%. The main findings of the research were: (1) Definitions were diverse despite the documents’ homogeneity and found not only in the Introduction and Abstract sections but also in the Methods and References; (2) Nevertheless, syntactic patterns were a useful starting point in extracting them; (3) Three patterns accounted for 90% of candidate phrases; (4) Statistically, the ordinal number of the instance of the term in a document was a better indicator of the presence of a definition than either sentence position and length, or the number of sentences in the document. Next steps include classifying terms, using information extraction-like templates, resolving basic anaphors, ranking answers, exploiting the structure of scientific papers, and refining the evaluation process.

Page generated in 0.1261 seconds