Automatsko određivanje vrsta riječi u morfološki složenom jeziku / Automatic parts of speech determination in a morphologically complex language

The study tested the extent to which our cognitive system can rely on phonotactic information, i.e., possible/permissible combinations of phonemes/graphemes, in tasks of automatic processing and production of words in languages with rich inflectional morphology. To answer this question, three studies were conducted. In the first study, support vector machines (SVM) were used to discriminate inflected parts of speech (PoS). In the second study, inflected word forms were produced with memory-based learning (MBL). Based on the results of the second study, a behavioral experiment was conducted as the third study, to test the cognitive plausibility of the MBL model and of the information it used.
PoS discrimination was based on permissible sequences of two and three characters/sounds (bigrams and trigrams), whose frequencies of occurrence within individual grammatical types were calculated depending on their position in a word: at the beginning, at the end, within the word, and all positions together. Maximum accuracy was approximately 95%, obtained with bigrams from all positions and an SVM with an RBF kernel function. Such high accuracy suggests that the distribution of bigrams is characteristic of the different types of inflected words. Interestingly, the least informative were bigrams at the end and at the beginning of words.
The MBL model was used in the task of automatic production of inflected forms: for a given word, the requested inflected form was generated from phonotactic information in the last four syllables. On a sample of 89024 inflected words taken from the Frequency Dictionary of Serbian Language (daily press), using the leave-one-out method and a constant nearest-neighbor set size (k = 7), the achieved accuracy was about 92%. We identified several factors that contributed to this accuracy, in particular part of speech, grammatical type, formation method, number of examples within one grammatical type, number of exceptions, and number of phonological alternations. The visual lexical decision experiment showed that words the MBL model processed incorrectly also took participants longer to process. We therefore concluded that the MBL model may be cognitively plausible. In addition, we reconfirmed the informativeness of phonotactic information, this time in a human comprehension task. Overall, the findings from the three studies support a significant role for phonotactic information in both the processing and the production of morphologically complex words. The results also suggest that this information needs to be taken into account when discussing the emergence of larger language units and patterns.
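The two ingredients named in the abstract, position-dependent bigrams and nearest-neighbor evaluation with leave-one-out, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the overlap similarity, and the toy word list are assumptions made for illustration, and a plain k-nearest-neighbor vote over whole-word bigrams stands in for both the SVM and the MBL inflection model so that the example needs only the Python standard library.

```python
from collections import Counter


def positional_bigrams(word):
    """Return (position, bigram) pairs: 'initial', 'medial' or 'final'."""
    grams = [word[i:i + 2] for i in range(len(word) - 1)]
    last = len(grams) - 1
    tagged = []
    for i, gram in enumerate(grams):
        if i == 0:
            tagged.append(("initial", gram))
        if 0 < i < last:
            tagged.append(("medial", gram))
        if i == last:
            tagged.append(("final", gram))
    return tagged


def overlap(a, b):
    """Similarity = number of shared positional bigrams (multiset overlap)."""
    shared = Counter(positional_bigrams(a)) & Counter(positional_bigrams(b))
    return sum(shared.values())


def loo_accuracy(examples, k=7):
    """Leave-one-out accuracy of a k-nearest-neighbor majority vote."""
    correct = 0
    for i, (word, label) in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]  # hold out one example
        nearest = sorted(rest, key=lambda ex: overlap(word, ex[0]),
                         reverse=True)[:k]
        votes = Counter(lab for _, lab in nearest)
        correct += votes.most_common(1)[0][0] == label
    return correct / len(examples)


# Hypothetical toy data (not from the thesis corpus).
TOY = [("radim", "verb"), ("pevam", "verb"), ("citam", "verb"),
       ("gledam", "verb"), ("zena", "noun"), ("kuca", "noun"),
       ("knjiga", "noun"), ("skola", "noun")]

print(positional_bigrams("kuca"))
print(loo_accuracy(TOY, k=3))
```

The thesis reports k = 7 on 89024 words; the toy call uses k = 3 only because the illustrative list has eight items.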

Identifier	oai:union.ndltd.org:uns.ac.rs/oai:CRISUNS:(BISIS)94868
Date	24 July 2015
Creators	Dimitrijević Strahinja
Contributors	Milin Petar, Filipović-Đurđević Dušica, Kostić Aleksandar
Publisher	Univerzitet u Novom Sadu, Filozofski fakultet u Novom Sadu, University of Novi Sad, Faculty of Philosophy at Novi Sad
Source Sets	University of Novi Sad
Language	Serbian
Detected Language	English
Type	PhD thesis
