Title: Processing of Turkic Languages Author: Sibel Ciddi Department: Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague Supervisor: RNDr. Daniel Zeman, Ph.D. Abstract: This thesis presents several methods for the morpholog- ical processing of Turkic languages, such as Turkish, which pose a specific set of challenges for natural language processing. In order to alleviate the problems with lack of large language resources, it makes the data sets used for morphological processing and expansion of lex- icons publicly available for further use by researchers. Data sparsity, caused by highly productive and agglutinative morphology in Turkish, imposes difficulties in processing of Turkish text, especially for meth- ods using purely statistical natural language processing. Therefore, we evaluated a publicly available rule-based morphological analyzer, TRmorph, based on finite state methods and technologies. In order to enhance the efficiency of this analyzer, we worked on expansion of lexicons, by employing heuristics-based methods for the extraction of named entities and multi-word expressions. Furthermore, as a prepro- cessing step, we introduced a dictionary-based recognition method for tokenization of multi-word expressions. This method complements...
Identifer | oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:323086 |
Date | January 2014 |
Creators | Ciddi, Sibel |
Contributors | Zeman, Daniel, Hlaváčová, Jaroslava |
Source Sets | Czech ETDs |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/masterThesis |
Rights | info:eu-repo/semantics/restrictedAccess |
Page generated in 0.0019 seconds