Global ETD Search

Focused Retrieval

Traditional information retrieval applications, such as Web search, return atomic units of retrieval, which are generically called "documents". Depending on the application, a document may be a Web page, an email message, a journal article, or any similar object. In contrast to this traditional approach, focused retrieval helps users pinpoint their exact information needs by returning results at the sub-document level. These results may consist of predefined document components, such as pages, sections, and paragraphs, or they may consist of arbitrary passages, comprising any sub-string of a document. If a document is marked up with XML, a focused retrieval system might return individual XML elements or ranges of elements. This thesis proposes and evaluates a number of approaches to focused retrieval, including methods based on XML markup and methods based on arbitrary passages. It considers the best unit of retrieval, explores methods for efficient sub-document retrieval, and evaluates formulae for sub-document scoring. Focused retrieval is also considered in the specific context of Wikipedia, where methods for automatic vandalism detection and automatic link generation are developed and evaluated.

http://hdl.handle.net/10012/5645

Information Retrieval

Computer Science

Identifier	oai:union.ndltd.org:WATERLOO/oai:uwspace.uwaterloo.ca:10012/5645
Date	January 2010
Creators	Itakura, Kalista Yuki
Source Sets	University of Waterloo Electronic Theses Repository
Language	English
Detected Language	English
Type	Thesis or Dissertation
