Index size savings from three techniques are measured: 1) eliminating
common, low-information words found in a "stop list" (such as of, the, and
at), 2) truncating terms by removing word suffixes (such as -s, -ed, and
-ing), and 3) simple data compression. Savings are measured on two
moderately large collections of text and are reported for the techniques
used individually and in combination. The impact on query performance in
terms of speed, recall, and precision is estimated. / Graduation date: 1992
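The three techniques named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the stop list, the suffix list, the three-character minimum, the serialization format, and the use of zlib are all assumptions made for illustration, and every function name here is hypothetical.

```python
import zlib

# Hypothetical stop list and suffix list, chosen only for illustration.
STOP_WORDS = {"of", "the", "at", "a", "and", "in", "to"}
SUFFIXES = ("ing", "ed", "s")


def strip_suffix(term):
    """Remove the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def build_index(docs, use_stop_list=False, use_truncation=False):
    """Map each term to the sorted list of document ids that contain it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            term = word.strip(".,;:!?\"'()")
            if not term:
                continue
            if use_stop_list and term in STOP_WORDS:
                continue  # technique 1: drop low-information words
            if use_truncation:
                term = strip_suffix(term)  # technique 2: truncate terms
            index.setdefault(term, set()).add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def serialize(index):
    """Flatten the index to bytes, one 'term:id,id,...' entry per line."""
    lines = [f"{term}:{','.join(map(str, ids))}"
             for term, ids in sorted(index.items())]
    return "\n".join(lines).encode("utf-8")


def index_size(index, compress=False):
    """Size in bytes of the serialized index, optionally zlib-compressed."""
    data = serialize(index)
    # Technique 3: general-purpose compression (zlib is an assumed stand-in).
    return len(zlib.compress(data)) if compress else len(data)


if __name__ == "__main__":
    docs = ["The index of the terms.", "Indexing terms at the index."]
    for stop in (False, True):
        for trunc in (False, True):
            idx = build_index(docs, use_stop_list=stop, use_truncation=trunc)
            print(stop, trunc, len(idx), index_size(idx))
```

Comparing `index_size` across the four stop-list and truncation combinations, with and without `compress=True`, mirrors the individual and combined measurements the abstract describes. On inputs as small as this demo, compression overhead can exceed the savings, so meaningful numbers need a realistically large collection.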
Identifier | oai:union.ndltd.org:ORGSU/oai:ir.library.oregonstate.edu:1957/36920 |
Date | 19 February 1992 |
Creators | Jacobson, Bryan L. |
Contributors | Bregar, William S. |
Source Sets | Oregon State University |
Language | en_US |
Detected Language | English |
Type | Thesis/Dissertation |