CLIR, Cross-Lingual Information Retrieval, is a field of research that can be highly useful in web search and for several other applications. Extensive research has been done on possible CLIR implementations, but as of yet there are no open source frameworks or applications readily available. The thesis focuses on building such a framework and evaluating it for use on the Norwegian/Spanish language pair.

The framework implemented uses query translation to submit queries to existing information retrieval (IR) implementations, and the framework itself holds no low-level IR algorithms. Experiments were performed on a small parallel corpus of Norwegian and Spanish texts, using the Xapian and PostgreSQL IR implementations. A comprehensive comparison of possible configurations was done, and certain measures were shown to be effective when searching for documents in either language.

The framework is implemented in a modular architecture, allowing the suggested additions and amendments to be implemented as add-on components. This is the main intent of the framework, and eases the process of building support for additional languages as well. For easing the adoption of the framework, additional components and data may be beneficial.

Some improvements are also possible for the tested language pair, through obtaining larger data sets or implementing certain language specific algorithms. Of particular interest is implementing effective decompounding of Norwegian compound words and phrase translation support. Suggestions are also made for how the system can be used to perform CLIR tasks in other languages.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ntnu-18357 |
Date | January 2012 |
Creators | Neergaard, Morten Minde |
Publisher | Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |