Implementation and evaluation of Norwegian Analyzer for use with DotLucene

<p>This work focused on improving retrieval performance for search in Norwegian document collections. The initiator of the thesis, InfoFinder Norge, wanted a Norwegian analyzer for DotLucene. The standard analyzer used previously did not support stopword elimination or stemming for Norwegian. The Norwegian Analyzer and the standard analyzer were applied in turn to the same document collection before indexing and querying, and the respective results were compared to identify efficiency improvements. An evaluation method based on Term Relevance Sets was investigated and applied to DotLucene with each of the two analyzers. The Term Relevance Sets methodology was also compared with common relevance-judgment measures and found useful for evaluating IR systems. The evaluation results for the Norwegian analyzer and the standard analyzer gave clear indications that stopword elimination and stemming for Norwegian documents improve retrieval efficiency. Term Relevance Set-based evaluation was found reliable when its results were compared with precision measurements. Precision increased by 16% with the Norwegian Analyzer compared with a standard analyzer that has no content-preprocessing support for Norwegian. Term Relevance Set evaluation using 10 on-topic terms and 10 off-topic terms gave a 44% increase in $tScore$. The results show that counting term occurrences in the content of retrieved documents can be used to gain confidence that documents are either relevant or not relevant.</p>
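To illustrate the two ideas the abstract relies on, here is a minimal Python sketch under stated assumptions: a toy Norwegian analyzer (lowercasing, stopword elimination, naive suffix-stripping stemming) and a term-occurrence score in the spirit of Term Relevance Sets. The stopword list, the suffix list, and the names `stem`, `analyze`, and `trs_score` are hypothetical. The stemmer is not the Snowball Norwegian stemmer, the scoring formula is a simplified stand-in rather than the published tScore definition, and neither reproduces the thesis's DotLucene implementation.

```python
import re

# Hypothetical, deliberately small Norwegian (Bokmål) stopword list.
STOPWORDS = {
    "og", "i", "det", "på", "som", "er", "en", "et", "ei", "til", "av",
    "at", "med", "for", "har", "de", "ikke", "den", "jeg", "var", "om",
}

# Hypothetical inflectional suffixes, longest first (NOT the Snowball stemmer).
SUFFIXES = ("hetene", "heten", "ene", "ane", "ets", "ens",
            "et", "en", "er", "ar", "e", "a", "s")
MIN_STEM = 3  # never cut a word below three characters


def stem(word: str) -> str:
    """Strip the longest suffix that still leaves at least MIN_STEM characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list[str]:
    """Lowercase, split into letter runs, drop stopwords, then stem."""
    tokens = re.findall(r"[a-zæøå]+", text.lower())
    return [stem(tok) for tok in tokens if tok not in STOPWORDS]


def trs_score(docs: list[str], on_topic: list[str], off_topic: list[str]) -> float:
    """Simplified, hypothetical Term-Relevance-Set-style score.

    For each retrieved document, count occurrences of (stemmed) on-topic and
    off-topic terms and score it (on - off) / (on + off): +1 for purely
    on-topic content, -1 for purely off-topic content, 0 if neither occurs.
    Returns the mean over all retrieved documents.
    """
    on = {stem(t.lower()) for t in on_topic}
    off = {stem(t.lower()) for t in off_topic}
    if not docs:
        return 0.0
    total = 0.0
    for doc in docs:
        tokens = analyze(doc)
        on_hits = sum(tok in on for tok in tokens)
        off_hits = sum(tok in off for tok in tokens)
        if on_hits + off_hits:
            total += (on_hits - off_hits) / (on_hits + off_hits)
    return total / len(docs)
```

For example, `analyze("Bilene og bilen er i garasjen")` yields `["bil", "bil", "garasj"]`: the stopwords are dropped and both inflected forms of *bil* (car) reduce to the same term, which is what lets a query for one form match documents containing the other. Scoring the same terms against retrieved documents then rewards result lists dominated by on-topic vocabulary.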

ntnudaim

SIF2 Computer Science

Intelligent Systems

Identifier	oai:union.ndltd.org:UPSALLA/oai:DiVA.org:ntnu-10092
Date	January 2006
Creators	Olsen, Bjørn Harald
Publisher	Norwegian University of Science and Technology, Department of Computer and Information Science, Institutt for datateknikk og informasjonsvitenskap
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, text
