Return to search

Suffix Trees for Document Retrieval

This thesis presents a look at the suitability of Suffix Trees for full text indexing and retrieval. Typically suffix trees are built on a character level, where the tree records which characters follow each other character. By building suffix trees for documents based on words instead of characters, the resulting tree effectively indexes every word or sequence of words that occur in any of the documents. Ukkonnen's algorithm is adapted to build word-level suffix trees. But the primary focus is on developing Algorithms for searching the suffix tree for exact and approximate, or fuzzy, matches to arbitrary query strings. A proof-of-concept implementation is built and compared to a Lucene index for retrieval over a subset of the Reuters RCV1 data set.

Identiferoai:union.ndltd.org:CALPOLY/oai:digitalcommons.calpoly.edu:theses-1818
Date01 June 2012
CreatorsReck, Ryan
PublisherDigitalCommons@CalPoly
Source SetsCalifornia Polytechnic State University
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceMaster's Theses

Page generated in 0.0196 seconds