Global ETD Search

Searching and ranking structured documents

It is common to see documents with explicit structure marked up in languages such as XML. Queries, on the other hand, typically have no structure. There is a clear mismatch: although documents contain structure, it is typically not used in information retrieval.
An efficient index structure for document-centric searching is proposed and its efficiency is discussed. It is shown to be at worst linear with respect to the number of occurrences of a given search term. The algorithm is then extended to accommodate element-centric information retrieval.
Ranking algorithms for structured documents are examined. Genetic Algorithms are used to learn different weights for each structure present in a document. Applying these weights as part of a ranking function is shown to yield significant precision improvements for some functions. Genetic Programming is then used to learn an entire ranking function. This function is shown to be portable between document collections.
A query language for structured information retrieval is proposed. Use of this language in the 2004 INEX workshop resulted in a large decrease in query errors.
Structured information retrieval is now a viable alternative to its unstructured counterpart. A successful query language, efficient indexing structures, and improved ranking functions are all presented.
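
As a rough illustration of two ideas in the abstract (a lookup whose cost grows with the number of occurrences of the search term, and per-structure weights applied inside a ranking function), the following Python sketch may help. It is not the thesis's index or ranking function: the names `build_index` and `rank`, the sample documents, and the weight values are all hypothetical, and the weights are fixed by hand here rather than learned with a Genetic Algorithm.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a list of (doc_id, tag) postings, one per occurrence."""
    index = defaultdict(list)
    for doc_id, elements in docs.items():
        for tag, text in elements:
            for term in text.lower().split():
                index[term].append((doc_id, tag))
    return index


def rank(index, term, weights, default_weight=1.0):
    """Score documents by summing a per-tag weight over the term's postings.

    Only the postings for `term` are visited, so the work done is
    proportional to the number of occurrences of that term.
    """
    scores = defaultdict(float)
    for doc_id, tag in index.get(term.lower(), []):
        scores[doc_id] += weights.get(tag, default_weight)
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy collection: each document is a list of (tag, text) elements.
docs = {
    "d1": [("title", "structured retrieval"), ("body", "ranking of xml documents")],
    "d2": [("title", "xml basics"), ("body", "xml markup and xml queries")],
}

# Hypothetical hand-picked weights; the thesis learns such weights instead.
weights = {"title": 3.0, "body": 1.0}

index = build_index(docs)
results = rank(index, "xml", weights)
print(results)
```

In this sketch an occurrence in a title counts three times as much as one in the body, so a document that mentions the term in its title can outrank one that only mentions it in running text.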

http://adt.otago.ac.nz./public/adt-NZDU20070403.110440

information retrieval

query languages (computer science)

algorithms

computer programs

Identifier	oai:union.ndltd.org:ADTP/217501
Date	January 2007
Creators	Trotman, Andrew, n/a
Publisher	University of Otago. Department of Computer Science
Source Sets	Australasian Digital Theses Program
Language	English
Detected Language	English
Rights	http://policy01.otago.ac.nz/policies/FMPro?-db=policies.fm&-format=viewpolicy.html&-lay=viewpolicy&-sortfield=Title&Type=Academic&-recid=33025&-find), Copyright Andrew Trotman
