Information retrieval research usually deals with globally visible, static document collections. Practical applications, in contrast, like file system search and enterprise search, have to cope with highly dynamic text collections and have to take into account user-specific access permissions when generating the results to a search query.
The goal of this thesis is to close the gap between information retrieval research and the requirements exacted by these real-life applications. The algorithms and data structures presented in this thesis can be used to implement a file system search engine that is able to react to changes in the file system by updating its index data in real time. File changes (insertions, deletions, or modifications) are reflected by the search results within a few seconds, even under a very high system workload. The search engine exhibits a low main memory consumption. By integrating security restrictions into the query processing logic, as opposed to applying them in a postprocessing step, it produces search results that are guaranteed to be consistent with the access permissions defined by the file system.
The techniques proposed in this thesis are evaluated theoretically, based on a Zipfian model of term distribution, and through a large number of experiments, involving text collections of non-trivial size --- varying between a few gigabytes and a few hundred gigabytes.
Identifer | oai:union.ndltd.org:WATERLOO/oai:uwspace.uwaterloo.ca:10012/3149 |
Date | January 2007 |
Creators | Buettcher, Stefan |
Source Sets | University of Waterloo Electronic Theses Repository |
Language | English |
Detected Language | English |
Type | Thesis or Dissertation |
Format | 3516169 bytes, application/pdf |
Page generated in 0.0017 seconds