In this thesis, cross-lingual information retrieval (CLIR) approaches are comparatively evaluated
for Turkish and English texts. As a complementary study, knowledge-based methods
for word sense disambiguation (WSD), one of the most important components of CLIR,
are compared for Turkish words.
Two CLIR approaches are used in this study: query translation and sense indexing. In the query
translation approach, queries are translated using automatic and manual word sense disambiguation
methods and the Google translation service. In the sense indexing approach,
documents are indexed by the meanings of words instead of the words themselves, and
documents are likewise retrieved by the meanings of the query words. To identify
the intended meaning of query terms, manual and automatic word sense disambiguation
methods are used and compared with each other.
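The sense indexing idea can be illustrated with a minimal sketch. It is not the thesis implementation: the sense inventory `SENSE_OF`, the sense IDs, the toy documents, and the helper names are all hypothetical. Word sense disambiguation is replaced here by a one-sense-per-word lookup, so the ambiguity that a real WSD step would resolve is ignored.

```python
from collections import defaultdict

# Hypothetical toy sense inventory: (language, word) -> shared sense ID.
# A real system would choose the sense with a WSD method; this lookup
# assumes one sense per word.
SENSE_OF = {
    ("tr", "banka"): "bank.financial",
    ("en", "bank"): "bank.financial",
    ("tr", "kredi"): "loan",
    ("en", "loan"): "loan",
    ("en", "river"): "river",
}

def to_senses(tokens, lang):
    """Map tokens to sense IDs, dropping words missing from the toy inventory."""
    return [SENSE_OF[(lang, t)] for t in tokens if (lang, t) in SENSE_OF]

def build_sense_index(docs, lang):
    """Build an inverted index from sense ID to the set of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sense in to_senses(text.lower().split(), lang):
            index[sense].add(doc_id)
    return index

def retrieve(query, query_lang, index):
    """Rank documents by how many distinct query senses they contain."""
    scores = defaultdict(int)
    for sense in set(to_senses(query.lower().split(), query_lang)):
        for doc_id in index.get(sense, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

docs = {"d1": "bank loan", "d2": "river bank", "d3": "river"}
index = build_sense_index(docs, "en")
ranked = retrieve("banka kredi", "tr", index)
```

Because both the Turkish query and the English documents are mapped to the same sense IDs, no query translation step is needed in this sketch; matching happens entirely in the sense space.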
Knowledge-based WSD methods that use different gloss enrichment techniques are compared
for Turkish words. Turkish WordNet is used as the primary knowledge base, and English
WordNet and Turkish Wikipedia are employed as enrichment resources. The meanings of
words are identified more clearly by using the semantic relations defined in the WordNets and Turkish
Wikipedia. In addition, when calculating the semantic relatedness of senses, the cosine similarity
metric is used as an alternative to word overlap count. The effects of using the cosine similarity
metric are observed for each of the WSD methods that use different knowledge bases.
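The two relatedness measures mentioned above can be contrasted with a small sketch. The abstract does not specify the exact algorithm, so this assumes a simplified gloss-matching scheme in which the context is scored against each candidate sense's gloss. The glosses, sense labels, and function names are invented for illustration and are in English only for readability.

```python
import math
from collections import Counter

def overlap_count(text_a, text_b):
    """Number of distinct words shared by the two texts."""
    return len(set(text_a.split()) & set(text_b.split()))

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the bag-of-words count vectors."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def disambiguate(context, sense_glosses, score):
    """Return the sense whose gloss scores highest against the context."""
    return max(sense_glosses, key=lambda s: score(context, sense_glosses[s]))

# Hypothetical glosses for two senses of "bank".
glosses = {
    "bank.financial": "institution that keeps money in an account",
    "bank.river": "sloping land along the side of a river",
}
context = "deposit money in the account"

by_overlap = disambiguate(context, glosses, overlap_count)
by_cosine = disambiguate(context, glosses, cosine_similarity)
```

Overlap count grows with gloss length, so an enriched (longer) gloss can win simply by containing more words. Cosine similarity divides by the vector lengths, which is one plausible reason to try it alongside gloss enrichment.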
Identifier | oai:union.ndltd.org:METU/oai:etd.lib.metu.edu.tr:http://etd.lib.metu.edu.tr/upload/12611903/index.pdf |
Date | 01 April 2010 |
Creators | Boynuegri, Akif |
Contributors | Birturk, Aysenur |
Publisher | METU |
Source Sets | Middle East Technical Univ. |
Language | English |
Detected Language | English |
Type | M.S. Thesis |
Format | text/pdf |
Rights | To liberate the content for public access |