Return to search

Weighting Document Genre in Enterprise Search

The creation of an Enterprise Search system involves many challenges that are not present in Web search. Searching a corporate collection is influenced both by the structure of the data present in the collection and by the policies of the corporation. These structures and policies may differ from corporation to corporation, and from collection to collection.

In particular, an Enterprise Search system must take a document's genre into account. Examples of document genre within a corporate collection might include FAQs, white papers, technical reports, memos, emails and chat messages. Depending on an individual's current work task, it might be appropriate to give one genre a greater weight than another during the processing of a search request. Moreover, this weighting may change as the individual's work task changes.

The work presented in this thesis adapts the Okapi BM25 scoring function to weight term frequency based on the relevance of a document genre to a work task. The method utilizes two user-provided resources, relevance judgments and clickthrough data, to estimate a realistic weight for each task-genre relationship. Using this approach, the method matches the purpose of each user search request with the purpose of each document. Therefore, the proper documents are returned to the user and her/his need can be fulfilled.

The method has been incorporated into a prototype search engine, X-site, currently deployed on a corporate intranet. X-Site is a contextual search engine that uses the relationships between work tasks and document genres to improve search precision for software engineers. The system provides a customized and user-controlled means of refining search results to suit the task context of a user. Through X-Site, each employee can make a single search request and has access to documents from the Internet, a corporate intranet, and Lotus Notes databases.

Identiferoai:union.ndltd.org:WATERLOO/oai:uwspace.uwaterloo.ca:10012/3160
Date16 July 2007
CreatorsYeung, Peter Chun Kai
Source SetsUniversity of Waterloo Electronic Theses Repository
LanguageEnglish
Detected LanguageEnglish
TypeThesis or Dissertation
Format2135222 bytes, application/pdf

Page generated in 0.0013 seconds