Cross Language Information Retrieval for Languages with Scarce Resources

Our generation has experienced one of the most dramatic changes in how society communicates. Today, online information is available on almost any imaginable topic. However, most of this information is available in only a few dozen languages. In this thesis, I explore the use of parallel texts to enable cross-language information retrieval (CLIR) for languages with scarce resources. I use the Bible as the source of parallel text. I evaluate different variables and their impact on the resulting CLIR system, specifically: (1) the CLIR results when using different amounts of parallel text; (2) the effect of paraphrasing on the quality of the CLIR output; (3) the impact on accuracy of translating the query versus translating the collection of documents; and finally (4) how the results are affected by the use of different dialects. The results show that all of these variables have a direct impact on the quality of the CLIR system.

Identifier: oai:union.ndltd.org:unt.edu/info:ark/67531/metadc12157
Date: 05 1900
Creators: Loza, Christian
Contributors: Mihalcea, Rada, 1974-; Tarau, Paul; Ruiz, Miguel E.
Publisher: University of North Texas
Source Sets: University of North Texas
Language: English
Detected Language: English
Type: Thesis or Dissertation
Format: Text
Rights: Public, Copyright, Loza, Christian, Copyright is held by the author, unless otherwise noted. All rights reserved.
