1 |
Bayesian Models for Multilingual Word Alignment
Östling, Robert, January 2015
In this thesis I explore Bayesian models for word alignment, how they can be improved through joint annotation transfer, and how they can be extended to parallel texts in more than two languages. In addition to these general methodological developments, I apply the algorithms to problems from sign language research and linguistic typology. In the first part of the thesis, I show how Bayesian alignment models estimated with Gibbs sampling are more accurate than previous methods for a range of different languages, particularly for languages with few digital resources available—which is unfortunately the state of the vast majority of languages today. Furthermore, I explore how different variations to the models and learning algorithms affect alignment accuracy. Then, I show how part-of-speech annotation transfer can be performed jointly with word alignment to improve word alignment accuracy. I apply these models to help annotate the Swedish Sign Language Corpus (SSLC) with part-of-speech tags, and to investigate patterns of polysemy across the languages of the world. Finally, I present a model for multilingual word alignment which learns an intermediate representation of the text. This model is then used with a massively parallel corpus containing translations of the New Testament, to explore word order features in 1001 languages.
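To make the phrase "Bayesian alignment models estimated with Gibbs sampling" concrete, here is a minimal sketch of a collapsed Gibbs sampler for word alignment. It is not the thesis's model: it assumes a much simpler IBM Model 1-style setup (no NULL word, no word-order or fertility component) with a symmetric Dirichlet prior on the translation distributions. The function name `gibbs_align` and the parameters `alpha`, `iterations` and `seed` are hypothetical names chosen for this illustration.

```python
import random
from collections import defaultdict


def gibbs_align(bitext, iterations=50, alpha=0.01, seed=0):
    """Collapsed Gibbs sampling for a simplified Bayesian word aligner.

    bitext: list of (source_tokens, target_tokens) pairs.
    Returns, per sentence pair, a list giving for each target position
    the index of the source token it is currently linked to.
    Assumption: IBM Model 1-style model with a symmetric Dirichlet(alpha)
    prior over each source word's translation distribution.
    """
    rng = random.Random(seed)
    target_vocab_size = len({t for _, tgt in bitext for t in tgt})
    pair_count = defaultdict(int)    # times source word s is linked to target word t
    source_count = defaultdict(int)  # times source word s is linked to anything

    # Start from a random alignment and collect its counts.
    alignments = []
    for src, tgt in bitext:
        links = [rng.randrange(len(src)) for _ in tgt]
        for j, i in enumerate(links):
            pair_count[src[i], tgt[j]] += 1
            source_count[src[i]] += 1
        alignments.append(links)

    for _ in range(iterations):
        for (src, tgt), links in zip(bitext, alignments):
            for j, t in enumerate(tgt):
                # Remove the current link so it does not influence its own resampling.
                old = src[links[j]]
                pair_count[old, t] -= 1
                source_count[old] -= 1
                # Posterior predictive probability of each candidate source word.
                weights = [
                    (pair_count[s, t] + alpha)
                    / (source_count[s] + alpha * target_vocab_size)
                    for s in src
                ]
                i = rng.choices(range(len(src)), weights=weights)[0]
                # Record the newly sampled link.
                links[j] = i
                pair_count[src[i], t] += 1
                source_count[src[i]] += 1
    return alignments
```

Calling `gibbs_align([("das haus".split(), "the house".split()), ("das buch".split(), "the book".split())])` returns one list of source indices per sentence pair. The result is a single sample from the sampler's final state rather than a best alignment; in practice one would average over several samples.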
2 |
Waimaha verb morphology / Verbmorfologi i waimaha
Dahlgren, Bea, January 2024
Waimaha is a Tucanoan language spoken in the Northwest Amazon Basin in Colombia and Brazil by a few hundred speakers. Languages in this region have a complex verb morphology based on affixation, where several suffixes may be attached to a root to create information-heavy words. This study aims to describe Waimaha's verb morphology and so provide a base on which to build a full grammatical description, which the language does not yet have. The study analyzes Waimaha verb morphology through distributional analysis of its morphemes with the help of parallel texts, and compares the findings with related languages. The results show that Waimaha is quite similar to other Tucanoan languages, with a few minor differences in future tense constructions and affix ordering.
3 |
Vícejazyčnost v "A Clockwork Orange" a jeho překladech. / Multilingualism in "A Clockwork Orange" and its translations.
Janák, Petr, January 2015
The paper explores intratextual multilingualism in A Clockwork Orange (ACO) by Anthony Burgess and in two of its translations, into Czech and German. It analyses 180 words from Nadsat, the invented language in ACO, to reveal how lexical creativity is manifested in translation, i.e. whether and how the lexical creativity present in the original text changes in the translations. Changes in lexical creativity are linked to normalisation (a translation universal) and to the functions of the invented language. An existing classification of forms and functions of intratextual multilingualism is applied to invented languages and, in particular, to Nadsat. The analysis of Nadsat and its counterparts in the translations is quantitative and is conducted using the concordancers AntConc and ParaConc. It examines the frequency of Nadsat words, their distribution throughout the text, and the way their meaning is conveyed to the reader. These data are then used to compare Nadsat with the invented languages that replace it in the Czech and German translations. The analysis shows that in both translations the number of invented lemmas is lower than in the original, and that the number is significantly lower in the German translation (UO) than in the Czech translation (MP). In total, MP...
4 |
Selected topics in the grammar and lexicon of Matal / Utvalda ämnen inom matals grammatik och ordförråd
Verdizade, Allahverdi, January 2018
This thesis describes basic grammatical features and the lexicon of Matal, a Chadic language spoken by around 18 000 people in northern Cameroon. A translation of the New Testament is used as a parallel text for the purposes of this study. The identified language structures are compared with other Chadic languages. The results show that Matal is overall typical of the language family, except for the pronominal system, which lacks a clusivity distinction. Nouns and adjectives have a limited morphology, expressing only number as a grammatical category, whereas verbs have many categories that are expressed morphologically, by prefixation and suffixation. For finite verb forms, subject prefixes are obligatory. Tense is expressed either by altered tone in the stem vowel or morphologically. Several verbal suffixes with number and person variants have been identified, although their functions have not been entirely clarified. A system of complex adpositions that make extensive use of grammaticalized body-part concepts has also been investigated, within which the phenomenon of preposition agreement has been identified. Basic syntactic features, such as word order, negation and topicalization, are also addressed. The analysis of the lexicon demonstrates that the basic vocabulary is mainly inherited from earlier stages of the language, but that a large number of lexical loans in various semantic domains have also entered Matal.
5 |
Selected Topics in the Grammar of Nalca
Svärd, Erik, January 2013
The present study analyzes a selection of topics in the grammar of Nalca (Mek language; Papua), with a focus on verbs and nominals. No published grammar or dictionary is available for Nalca, but a translation of the New Testament was used as a parallel text. The results showed that Nalca is split-ergative, strongly suffixing and agglutinating, with subject-object-verb (SOV) as the dominant word order. Verbs consist of a stem and a series of suffixes expressing tense/aspect/mood, negation, number and person. The case alignment is ergative-absolutive for nouns, for which syntactic function is indicated by a series of postpositions. These postpositions agree with nouns in gender. Ergativity was not observed for pronouns; while the results were inconclusive, pronouns appeared to show a nominative-accusative case alignment. The numeral system is an extended body-part system with base 27. Many of the features found in Nalca are comparable with other Mek languages, with the gender system and split-ergativity being two major exceptions. Finally, the use of the New Testament as a parallel text proved successful: it yielded a basic description of the grammar of Nalca, although further investigation is needed.