Using information retrieval systems to gain access to documents in languages other than English is becoming an increasingly significant problem. Rules, theories, algorithms, and retrieval methods designed and developed for English and other morphologically similar languages may or may not apply in the linguistic environments of other languages. The problem is particularly acute in languages that differ radically from English on account of morphological rules. This thesis compares the effects of two indexing and retrieval techniques (stemming and root retrieval) on information retrieval in Arabic through an exploratory study of the handling of Arabic words by an English search engine. It also investigates how best to adapt existing English-language information retrieval systems for use with Arabic-language texts, and specifically to process words and their morphological variations. Search experiments, using 2000 Arabic documents and 40 Arabic search terms (nouns), were conducted with a Web search engine developed for English, AltaVista, to compare the performances of stemming and root retrieval and to investigate the possibility of adapting this engine for use with Arabic text. The results of the experiments show that more effective retrieval can be accomplished through stemming, and that it is possible to adapt the engine for use with Arabic without the need to develop root-retrieval features.
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:QMM.38247 |
Date | January 2001 |
Creators | Moukdad, Haidar. |
Contributors | Large, Andrew (advisor) |
Publisher | McGill University |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | English |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Format | application/pdf |
Coverage | Doctor of Philosophy (Graduate School of Library and Information Studies.) |
Rights | All items in eScholarship@McGill are protected by copyright with all rights reserved unless otherwise indicated. |
Relation | alephsysno: 001871882, proquestno: NQ78741, Theses scanned by UMI/ProQuest. |