Cross-Lingual Information Retrieval in the Medical Domain Shadi Saleh In recent years, there has been an exponential growth of the digital content available on the Internet, which has correlated with the increasing number of non-English Internet users due to the spread of the Internet across the globe. This raises the importance of unlocking resources for those who want to look up information not limited to the languages they understand. For example, those who want to use the Internet to find medical content related to their health conditions (self-diagnosis) but they do not have access to resources in their language. Cross-Lingual Information Retrieval (CLIR) breaks the lan- guage barriers by allowing search for documents written in a language different from the query language. This thesis tackles the task of CLIR in the medical domain and investigates the two main approaches: query translation (QT) where queries are machine translated to the language of documents and document translation (DT) where documents are translated to the language of queries. We proceed with our research by employing Statistical Machine Translation (SMT) systems that are tuned for the QT approach and the DT approach in the medical domain for seven European languages (Czech, German, French, Spanish, Hungarian, Polish and Swedish) and...
Identifer | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:437025 |
Date | January 2020 |
Creators | Saleh, Shadi |
Contributors | Pecina, Pavel, Hanbury, Allan, Kliegr, Tomáš |
Source Sets | Czech ETDs |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/doctoralThesis |
Rights | info:eu-repo/semantics/restrictedAccess |
Page generated in 0.0016 seconds