The aim of this thesis is to describe passage retrieval (PR) on the basis of results from various empirical experiments, and to critically examine different approaches to PR. The main questions addressed are: (1) What characterizes PR? (2) What approaches have been proposed? (3) How well do the approaches work in experimental information retrieval (IR)? PR is a research topic in information retrieval that, instead of retrieving the full text of documents, which can lead to information overload for the user, tries to retrieve the most relevant passages in the documents. The technique was investigated by studying a number of central articles in the research field. PR approaches can be divided into three types based on how the documents are segmented. First, the text can be divided according to its semantics, at the points where the topic changes. Second, the text can be divided according to the explicit structure of the documents, with the help of, for example, a markup language such as SGML. Third, the text can be divided into parts containing a fixed number of words; this method is called unmotivated segmentation. The study showed that unmotivated segmentation resulted in the best retrieval effectiveness, although the results are difficult to compare because of the different evaluation methods and test collections used. A combination of full-text retrieval and PR also showed improved results. / Thesis level: D
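As an illustration of the third approach, unmotivated segmentation can be sketched as splitting a text into windows of a fixed number of words. The function name `fixed_window_passages`, the default window size of 200 words, and the optional `overlap` parameter are assumptions made for this sketch; they are not taken from the thesis or from the studies it reviews.

```python
def fixed_window_passages(text, window_size=200, overlap=0):
    """Split text into passages of at most `window_size` words.

    Hypothetical helper for illustration only. `overlap` is the number
    of words shared between consecutive passages (0 means no overlap).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")

    words = text.split()            # naive whitespace tokenization
    step = window_size - overlap    # distance between passage starts
    passages = []
    for start in range(0, len(words), step):
        passages.append(" ".join(words[start:start + window_size]))
        # Stop once a window has reached the end of the text.
        if start + window_size >= len(words):
            break
    return passages


if __name__ == "__main__":
    for passage in fixed_window_passages("a b c d e f g", window_size=3):
        print(passage)
```

Each passage could then be indexed and ranked separately, and a document score could be combined with its best passage score, which is one way to read the combination of full-text retrieval and PR mentioned above.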
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:hb-18347 |
Date | January 2000 |
Creators | Åkesson, Mattias |
Publisher | Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, University College of Borås. Swedish School of Library and Information Science |
Source Sets | DiVA Archive at Uppsala University |
Language | Swedish |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | Magisteruppsats i biblioteks- och informationsvetenskap vid Bibliotekshögskolan/Biblioteks- och informationsvetenskap, 1404-0891 ; 2000:49 |