In this thesis, we mainly investigate the influence of using unsupervised morphological segmentation as features on the dependency parsing of morphologically rich languages such as Finnish, Estonian, Hungarian, Turkish, Uyghur, and Kazakh. Studying the morphology of these languages is of great importance for the dependency parsing of morphologically rich languages since dependency relations in a sentence of these languages mostly rely on morphemes rather than word order. In order to investigate our research questions, we have conducted a large number of parsing experiments both on MaltParser and UDPipe. We have generated the supervised morphology and the predicted POS tags from UDPipe, and obtained the unsupervised morphological segmentation from Morfessor, and have converted the unsupervised morphological segmentation into features and added them to the UD treebanks of each language. We have also investigated the different ways of converting the unsupervised segmentation into features and studied the result of each method. We have reported the Labeled Attachment Score (LAS) for all of our experimental results. The main finding of this study is that dependency parsing of some languages can be improved simply by providing unsupervised morphology during parsing if there is no manually annotated or supervised morphology available for such languages. After adding unsupervised morphological information with predicted POS tags, we get improvement of 4.9%, 6.0%, 8.7%, 3.3%, 3.7%, and 12.0% on the test set of Turkish, Uyghur, Kazakh, Finnish, Estonian, and Hungarian respectively on MaltParser, and the parsing accuracies have been improved by 2.7%, 4.1%, 8.2%, 2.4%, 1.6%, and 2.6% on the test set of Turkish, Uyghur, Kazakh, Finnish, Estonian, and Hungarian respectively on UDPipe when comparing the results from the models which do not use any morphological information during parsing.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-354459 |
Date | January 2018 |
Creators | Yusupujiang, Zulipiye |
Publisher | Uppsala universitet, Institutionen för lingvistik och filologi |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.0021 seconds