Our generation has experienced one of the most dramatic changes in how society communicates. Today, we have online information on almost any imaginable topic. However, most of this information is available in only a few dozen languages. In this thesis, I explore the use of parallel texts to enable cross-language information retrieval (CLIR) for languages with scarce resources. To build the parallel text, I use the Bible. I evaluate different variables and their impact on the resulting CLIR system, specifically: (1) the CLIR results when using different amounts of parallel text; (2) the effect of paraphrasing on the quality of the CLIR output; (3) the impact on accuracy of translating the query versus translating the collection of documents; and finally (4) how the results are affected by the use of different dialects. The results show that all of these variables have a direct impact on the quality of the CLIR system.
Identifier | oai:union.ndltd.org:unt.edu/info:ark/67531/metadc12157 |
Date | 05 1900 |
Creators | Loza, Christian |
Contributors | Mihalcea, Rada, 1974-, Tarau, Paul, Ruiz, Miguel E. |
Publisher | University of North Texas |
Source Sets | University of North Texas |
Language | English |
Detected Language | English |
Type | Thesis or Dissertation |
Format | Text |
Rights | Public, Copyright, Loza, Christian, Copyright is held by the author, unless otherwise noted. All rights reserved. |