
Automatsko određivanje vrsta riječi u morfološki složenom jeziku / Automatic part-of-speech determination in a morphologically complex language

Dimitrijević Strahinja 24 July 2015
The study tested the extent to which our cognitive system can rely on phonotactic information, i.e., the possible/permissible combinations of phonemes/graphemes, in tasks of automatic word perception and production in languages with rich inflectional morphology.

To answer this question, three studies were conducted. In the first, support vector machines (SVM) were used to discriminate between inflected parts of speech (PoS). In the second, inflected word forms were produced with memory-based learning (MBL). Based on the results of the second study, a behavioral experiment was conducted as the third study to test the cognitive plausibility of the MBL model and of the information it used.

The inflected PoS were discriminated using permissible sequences of two and three graphemes/phonemes (bigrams and trigrams), whose frequencies of occurrence within individual grammatical types were calculated according to their position in a word: at the beginning, at the end, within the word, and all positions together. The maximum accuracy was approximately 95%, obtained with all bigrams irrespective of position, using an SVM with an RBF kernel function. Such high accuracy suggests that different types of inflected words have characteristic bigram distributions. By contrast, bigrams at the end and at the beginning of words were the least informative.

The MBL model was used for automatic production of inflected forms: for a given word, the requested inflected form was generated from phonotactic information in its last four syllables. On a sample of 89024 inflected words taken from the Frequency dictionary of Serbian language (daily press), using the leave-one-out method and a constant neighborhood size (k = 7), the model achieved an accuracy of about 92%. Several factors affected this accuracy, including part of speech, grammatical type, formation method, the number of examples within a grammatical type, the number of exceptions, and the number of phonological alternations.

In the visual lexical decision experiment, words that the MBL model had processed incorrectly also took participants longer to process. This suggests that memory-based learning may be cognitively plausible. The experiment also confirmed the cognitive plausibility of phonotactic information, this time in a language comprehension task.

Overall, the findings of the three studies support a significant role for phonotactic information in both the perception and production of morphologically complex words. The results also suggest that this information should be taken into account when discussing the emergence of larger linguistic units and patterns.
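
As an illustration only (this is not the thesis's implementation), the Python sketch below shows two ideas the abstract describes: splitting a word's bigrams by position, and leave-one-out evaluation of a k-nearest-neighbour classifier of the kind memory-based learning builds on. The function names, the overlap distance, the majority vote, and the toy data are all assumptions made for the example; the abstract does not state the feature encoding or the distance metric that were actually used.

```python
from collections import Counter


def positional_bigrams(word):
    """Group a word's character bigrams by position (hypothetical helper)."""
    grams = [word[i:i + 2] for i in range(len(word) - 1)]
    return {
        "initial": grams[:1],    # bigram at the beginning of the word
        "medial": grams[1:-1],   # bigrams strictly inside the word
        "final": grams[-1:],     # bigram at the end of the word
        "all": grams,            # every bigram, irrespective of position
    }


def overlap_distance(a, b):
    """Number of feature positions at which two examples differ (assumed metric)."""
    return sum(x != y for x, y in zip(a, b))


def leave_one_out_accuracy(examples, k=7):
    """Classify each example from all the others and return the share correct.

    examples: list of (feature_tuple, label) pairs.
    k: number of nearest neighbours that vote (the abstract reports k = 7).
    """
    correct = 0
    for i, (features, label) in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]  # hold out the current example
        # sorted() is stable, so equally distant neighbours keep their input order
        nearest = sorted(rest, key=lambda ex: overlap_distance(features, ex[0]))[:k]
        votes = Counter(neighbour_label for _, neighbour_label in nearest)
        predicted = votes.most_common(1)[0][0]
        correct += predicted == label
    return correct / len(examples)


# Toy data, invented for the example: three symbolic features per item
# and a made-up class label ("A" or "B").
toy = [
    (("a", "a", "x"), "A"), (("a", "a", "y"), "A"), (("a", "a", "z"), "A"),
    (("b", "b", "x"), "B"), (("b", "b", "y"), "B"), (("b", "b", "z"), "B"),
]
```

Calling `leave_one_out_accuracy(toy, k=3)` scores the toy set by holding out one item at a time. With only six items a neighbourhood of 7 would take in every remaining example, so the toy call uses a smaller `k` than the abstract's setting.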
