In this work, we explore using word alignment in a parallel corpus to project linguistic annotations, such as part-of-speech tags and dependency relations, from high-resource languages to low-resource languages. We use a parallel corpus of Bible translations comprising 1,444 translations in 986 languages, and we annotate the source languages (English, French, German, and Czech) with a well-developed parser. These annotations are then projected onto the low-resource languages through the word alignments. Next, we design and refine a procedure that detects verbs and the subjects and objects linked to each verb, and then identifies and counts the resulting word orders. We evaluate the output against data from The World Atlas of Language Structures (WALS). The evaluation set includes several Central African languages whose word order differs between positive and negative clauses, and the method gives acceptable results. We discuss these results and propose further languages that may have different word orders in positive and negative clauses. Consulting grammar books confirms this feature for one of three such languages. Finally, we describe possible ways to improve the method's performance.
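The projection and word-order counting steps summarized above can be illustrated with a small sketch. This is not the thesis's implementation. It assumes Universal Dependencies-style source annotations (UPOS tags, `nsubj`/`obj` relations) and word alignments given as (source index, target index) pairs. It also uses a simple "first link wins" rule when a token has several alignment links. The names `Token`, `project`, `clause_orders`, and the toy sentence are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Token:
    form: str
    upos: str    # Universal POS tag, e.g. "VERB" (assumed UD-style annotation)
    head: int    # 0-based index of the head token; -1 marks the root or no head
    deprel: str  # dependency relation, e.g. "nsubj", "obj"


def project(src: List[Token], tgt_forms: List[str],
            alignment: List[Tuple[int, int]]) -> List[Optional[Token]]:
    """Copy POS tags and dependency arcs from source to target via alignment links.

    Simplification (hypothetical): when a token has several links, only the
    first one is used; unaligned target tokens stay None.
    """
    src_to_tgt: Dict[int, int] = {}
    tgt_to_src: Dict[int, int] = {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, t)
        tgt_to_src.setdefault(t, s)

    projected: List[Optional[Token]] = [None] * len(tgt_forms)
    for t, s in tgt_to_src.items():
        tok = src[s]
        # Re-attach the arc to wherever the source head was aligned.
        head = src_to_tgt.get(tok.head, -1) if tok.head >= 0 else -1
        if head == t:  # head and dependent collapsed onto one target token
            head = -1
        projected[t] = Token(tgt_forms[t], tok.upos, head, tok.deprel)
    return projected


def clause_orders(tokens: List[Optional[Token]]) -> Counter:
    """Count S/V/O orders for verbs that have both a subject and an object."""
    counts: Counter = Counter()
    for v, tok in enumerate(tokens):
        if tok is None or tok.upos != "VERB":
            continue
        deps = [(i, d.deprel) for i, d in enumerate(tokens)
                if d is not None and d.head == v]
        subj = [i for i, rel in deps if rel == "nsubj"]
        obj = [i for i, rel in deps if rel == "obj"]
        if subj and obj:
            # Sort the three positions and read off the labels, e.g. "SOV".
            order = sorted([(subj[0], "S"), (v, "V"), (obj[0], "O")])
            counts["".join(label for _, label in order)] += 1
    return counts


# Toy example: English "John reads books" aligned to a hypothetical
# verb-final target sentence "John books reads".
source = [
    Token("John", "PROPN", 1, "nsubj"),
    Token("reads", "VERB", -1, "root"),
    Token("books", "NOUN", 1, "obj"),
]
target_forms = ["John", "books", "reads"]
links = [(0, 0), (1, 2), (2, 1)]

projected = project(source, target_forms, links)
print(clause_orders(projected))  # Counter({'SOV': 1})
```

Aggregating `clause_orders` over all verses of a translation gives a per-language distribution of orders that could be compared with a WALS value. Separating positive from negative clauses would additionally require a per-clause polarity flag, for example one derived from a projected negation marker. This sketch does not show that step.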
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-505920 |
Date | January 2023 |
Creators | Lyu, Chen |
Publisher | Uppsala universitet, Institutionen för lingvistik och filologi |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |