Return to search

Experimentation with inverted indexes for dynamic document collections

This report aims to asses the efficiency of various inverted indexes when the indexed document collection is dynamic. To achieve this goal, we experiment with three different overall structures: Remerge, hierarchical indexes and a naive B-tree index. An efficiency model is also developed. The resulting estimates for each structure from the efficiency model are compared to the actual results. We introduce two modifications to existing methods. The first is a new scheme for accumulating an index in memory during sort-based inversion. Even though the memory characteristics of this modified scheme are attractive, our experiments suggest that other proposed methods are more efficient. We also introduce a modification to the hierarchical indexes, which makes them more flexible. Tf-idf is used as the ranking scheme in all tested methods. Approximations to this scheme are suggested to make it more efficient in an updatable index. We conclude that in our implementation, the hierarchical index with the modification we have suggested performs best overall. We also conclude that the tf-idf ranking scheme is not fit for updatable indexes. The major problem with using the scheme is that it becomes difficult to make documents searchable immediately without sacrificing update speed.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ntnu-9645
Date January 2007
CreatorsBjørklund, Truls Amundsen
PublisherNorges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, Institutt for datateknikk og informasjonsvitenskap
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0018 seconds