Global ETD Search

Zpracování tureckých jazyků / Processing of Turkic Languages

This thesis presents several combined methods for the morphological processing of Turkic languages, such as Turkish, which pose a specific set of challenges for computational processing, and also aims to make larger data sets publicly available. Because of the highly productive, agglutinative morphology of Turkish, data sparsity, together with the lack of publicly available large data sets, makes natural language processing difficult, especially when relying on purely statistical methods. Therefore, we evaluate a publicly available rule-based morphological analyzer, TRmorph, which is based on finite-state transducers (FSTs). To enhance the efficiency of this analyzer and to expand its lexicon, we combine statistical and heuristics-based methods for named entity processing (including the construction of gazetteers), morphological disambiguation, and multiword expression processing. Experimental results obtained so far indicate that the heuristic methods provide a promising increase in coverage for text processed by TRmorph, while the statistical approach serves as a back-up for more fine-grained tasks that the pattern-based heuristic approach may not capture. In this way, our proposed combined approach enhances the efficiency of a morphological analyzer based purely on FST...

http://www.nusl.cz/ntk/nusl-320990

Identifier	oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:320990
Date	January 2013
Creators	Ciddi, Sibel
Contributors	Zeman, Daniel, Lopatková, Markéta
Source Sets	Czech ETDs
Language	English
Detected Language	English
Type	info:eu-repo/semantics/masterThesis
Rights	info:eu-repo/semantics/restrictedAccess
