Return to search

Zpracování turkických jazyků / Processing of Turkic Languages

Title: Processing of Turkic Languages Author: Sibel Ciddi Department: Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague Supervisor: RNDr. Daniel Zeman, Ph.D. Abstract: This thesis presents several methods for the morpholog- ical processing of Turkic languages, such as Turkish, which pose a specific set of challenges for natural language processing. In order to alleviate the problems with lack of large language resources, it makes the data sets used for morphological processing and expansion of lex- icons publicly available for further use by researchers. Data sparsity, caused by highly productive and agglutinative morphology in Turkish, imposes difficulties in processing of Turkish text, especially for meth- ods using purely statistical natural language processing. Therefore, we evaluated a publicly available rule-based morphological analyzer, TRmorph, based on finite state methods and technologies. In order to enhance the efficiency of this analyzer, we worked on expansion of lexicons, by employing heuristics-based methods for the extraction of named entities and multi-word expressions. Furthermore, as a prepro- cessing step, we introduced a dictionary-based recognition method for tokenization of multi-word expressions. This method complements...

Identiferoai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:323086
Date January 2014
CreatorsCiddi, Sibel
ContributorsZeman, Daniel, Hlaváčová, Jaroslava
Source SetsCzech ETDs
LanguageEnglish
Detected LanguageEnglish
Typeinfo:eu-repo/semantics/masterThesis
Rightsinfo:eu-repo/semantics/restrictedAccess

Page generated in 0.0018 seconds