1 |
Mathematical modelling of some aspects of stressing a Lithuanian text / Kai kurių lietuvių kalbos teksto kirčiavimo aspektų matematinis modeliavimas. Anbinderis, Tomas, 02 July 2010
This dissertation addresses one component of a speech synthesizer, automatic stressing of text, and two related tasks: disambiguation of homographs (words that can be stressed in several ways) and the search for clitics (unstressed words).
A method that uses decision trees to find letter sequences that unambiguously determine a word's stress was applied to stressing Lithuanian text. The decision trees were built from a large corpus of stressed words, and stressing rules were formulated from letter sequences at the beginning, at the end, and in the middle of a word. The proposed algorithm achieves an accuracy of about 95.5%.
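The letter-sequence idea can be illustrated with a minimal Python sketch. It is a simplified stand-in, not the author's decision-tree method: it only inspects word endings (the dissertation also uses beginnings and middles of words), and the toy corpus, the `max_len` parameter, and the label convention (position of the stressed letter counted from the end of the word) are all hypothetical.

```python
from collections import defaultdict


def learn_ending_rules(corpus, max_len=4):
    """Keep every word ending (up to max_len letters) seen with exactly one stress label."""
    seen = defaultdict(set)
    for word, label in corpus.items():
        for k in range(1, min(max_len, len(word)) + 1):
            seen[word[-k:]].add(label)
    # An ending becomes a rule only if it is unambiguous in the training corpus.
    return {ending: next(iter(labels)) for ending, labels in seen.items() if len(labels) == 1}


def predict_stress(word, rules, max_len=4):
    """Return the label of the longest matching unambiguous ending, or None if no rule fires."""
    for k in range(min(max_len, len(word)), 0, -1):
        label = rules.get(word[-k:])
        if label is not None:
            return label
    return None


# Hypothetical toy data: invented strings, NOT real Lithuanian stress patterns.
# Label = position of the stressed letter counted from the end of the word.
TOY_CORPUS = {"tamas": 2, "ramas": 2, "tolis": 4, "rolis": 4, "kalas": 4}

rules = learn_ending_rules(TOY_CORPUS)
print(predict_stress("samas", rules))  # → 2 (matched ending "amas")
print(predict_stress("bunas", rules))  # → None ("as" is ambiguous in the toy corpus)
```

A decision tree generalises this lookup by learning which letter positions to test; the longest-match table is only the simplest way to show why a letter sequence that is unambiguous in the corpus is enough to fix a word's stress.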
The homograph disambiguation algorithm proposed by the author is based on the frequencies of lexemes and morphological features obtained from a corpus of about one million words; such methods had not previously been applied to Lithuanian. The work shows that the frequencies of morphological features are more important than lexeme frequencies. The proposed algorithm selects the correct stress variant with an accuracy of 85.01%.
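A minimal Python sketch of frequency-based homograph disambiguation follows. The abstract does not give the scoring formula, so the linear combination of relative frequencies, the `morph_weight` value, and all names and counts below are hypothetical; the default weight simply favours morphological features over lexemes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StressVariant:
    stressed_form: str  # the word as stressed in this reading
    lexeme: str         # lemma of this reading
    morph_tag: str      # morphological features of this reading


def choose_variant(variants, lexeme_freq, morph_freq, morph_weight=0.7):
    """Return the variant with the highest weighted relative corpus frequency."""
    lex_total = sum(lexeme_freq.values()) or 1
    morph_total = sum(morph_freq.values()) or 1

    def score(v):
        p_lex = lexeme_freq.get(v.lexeme, 0) / lex_total
        p_morph = morph_freq.get(v.morph_tag, 0) / morph_total
        return morph_weight * p_morph + (1 - morph_weight) * p_lex

    return max(variants, key=score)


# Hypothetical homograph with two stress variants and invented corpus counts.
variants = [
    StressVariant("form-stress-A", "lexeme_a", "noun.sg.gen"),
    StressVariant("form-stress-B", "lexeme_b", "noun.pl.nom"),
]
lexeme_freq = {"lexeme_a": 30, "lexeme_b": 70}
morph_freq = {"noun.sg.gen": 600, "noun.pl.nom": 400}

print(choose_variant(variants, lexeme_freq, morph_freq).lexeme)  # → lexeme_a
```

Here the morphological counts outweigh the lexeme counts (scores of about 0.51 vs 0.49); with `morph_weight=0.2` the lexeme counts dominate and `lexeme_b` wins instead.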
In addition, the author proposes four types of methods for finding clitics in Lithuanian text, based on (1) recognition of combinational forms, (2) the statistical frequency with which a word is stressed or unstressed, (3) certain grammar rules, and (4) the stress distribution of adjacent words (rhythm). The dissertation explains how to combine all of these methods into a single algorithm. On the test data, the ratio of errors to all words was 4.1%, and the ratio of errors to unstressed words was 18.8%.
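The abstract does not say how the clitic-detection methods are combined. One plausible arrangement, sketched below in Python purely as an assumption, is a priority cascade: each detector answers "clitic", "stressed", or abstains, and the first non-abstaining answer wins. The two detectors, their thresholds, and the toy tokens are hypothetical stand-ins for the frequency-based and grammar-rule methods; `error_ratios` shows how the two reported figures (errors relative to all words and relative to unstressed words) are computed.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A detector inspects tokens[i] (it may also look at neighbours) and returns
# True (clitic, i.e. unstressed), False (stressed), or None (no decision).
Detector = Callable[[Sequence[str], int], Optional[bool]]


def frequency_detector(unstressed_rate: Dict[str, float], hi: float = 0.8, lo: float = 0.2) -> Detector:
    """Decide only for words whose corpus unstressed rate is clearly high or clearly low."""
    def detect(tokens: Sequence[str], i: int) -> Optional[bool]:
        rate = unstressed_rate.get(tokens[i])
        if rate is None or lo < rate < hi:
            return None
        return rate >= hi
    return detect


def one_letter_rule(tokens: Sequence[str], i: int) -> Optional[bool]:
    """Toy stand-in for a grammar rule: one-letter words are treated as clitics."""
    return True if len(tokens[i]) == 1 else None


def tag_clitics(tokens: Sequence[str], detectors: List[Detector], default: bool = False) -> List[bool]:
    """Apply detectors in priority order; the first decisive answer wins."""
    tags = []
    for i in range(len(tokens)):
        decision = default
        for detect in detectors:
            answer = detect(tokens, i)
            if answer is not None:
                decision = answer
                break
        tags.append(decision)
    return tags


def error_ratios(predicted: Sequence[bool], gold: Sequence[bool]) -> Tuple[float, float]:
    """Return (errors / all words, errors / unstressed words); gold True means unstressed."""
    errors = sum(p != g for p, g in zip(predicted, gold))
    unstressed = sum(gold)
    if unstressed == 0:
        raise ValueError("gold data must contain at least one unstressed word")
    return errors / len(gold), errors / unstressed


tokens = ["aa", "b", "cc", "dd"]           # invented tokens
rates = {"aa": 0.9, "cc": 0.5, "dd": 0.1}  # invented unstressed rates
predicted = tag_clitics(tokens, [frequency_detector(rates), one_letter_rule])
print(predicted)                                          # → [True, True, False, False]
print(error_ratios(predicted, [True, True, True, False]))  # → (0.25, 0.3333333333333333)
```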
|
3 |
Ambiguity in XiTsonga. Hlongwana, Colfar, January 2015
Thesis (M.A. (Translation Studies and Linguistics)), University of Limpopo, 2015.
The aim of this study is to investigate ambiguity in Xitsonga. There are many kinds of ambiguity, but the study focuses mainly on lexical and structural ambiguity. Lexical ambiguity occurs at the word level and is caused by homonyms (homophones and homographs) and polysemes. Structural ambiguity occurs at the sentence level and manifests itself in the structure of the sentence. Data were collected through the researcher's own observation as a native Xitsonga speaker. Words and sentences with multiple meanings in Xitsonga were listed, and tree diagrams were used to illustrate and resolve the ambiguities. The study reveals that, like other languages, Xitsonga has words and sentences with two or more meanings.
KEYWORDS
AMBIGUITY, LEXICAL AMBIGUITY, STRUCTURAL AMBIGUITY, HOMONYM, HOMOPHONES, HOMOGRAPHS, POLYSEMES.
|
4 |
Exploring the effect of stimulus list composition on the Cognate Facilitation Effect in bilingual lexical decision: A study of Danish-Swedish bilinguals. Anagnostopoulou, Revekka Christina, January 2022
Cognate words share orthographic and semantic representations across languages: kniv (‘knife’) in Danish means the same as kniv in Swedish. Their shared form and meaning give cognates a special status in the bilingual mental lexicon, and there is robust evidence that, because of this status, they are processed faster than non-cognate words. This effect, called the Cognate Facilitation Effect, is strong evidence that bilinguals do not have two separate mental lexicons but one integrated lexicon for both of their languages, with nonselective access. The present study replicates Vanlangendonck et al. (2020) with a different language pair. Early and late Danish-Swedish bilinguals were recruited to examine the effect of stimulus list composition on the Cognate Facilitation Effect in two experiments: a language-specific visual lexical decision task containing control words from the participants’ L2 (Swedish), cognates, interlingual homographs, and pseudowords; and a second task in which half of the pseudowords were replaced by Danish (L1) words that required a “no” response. This change from a pure to a mixed list was expected to increase response competition and turn cognate facilitation into inhibition. However, the results showed a null Cognate Facilitation Effect for both early and late bilinguals. These findings are discussed in terms of the assumptions of the BIA+ model of bilingual lexical processing, and it is suggested that the presence of language-specific diacritics in the stimulus list hindered the emergence of the Cognate Facilitation Effect.
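To illustrate what the effect measures, the sketch below computes a cognate effect as the difference between mean lexical decision times for non-cognate control words and for cognates. It is a deliberate simplification with invented reaction times; published analyses usually rely on statistical models of trial-level data rather than a single difference of means.

```python
from statistics import mean


def cognate_effect(rt_cognates_ms, rt_controls_ms):
    """Mean control RT minus mean cognate RT, in milliseconds.

    Positive -> facilitation (cognates answered faster),
    negative -> inhibition, close to zero -> a null effect.
    """
    return mean(rt_controls_ms) - mean(rt_cognates_ms)


# Invented reaction times, for illustration only.
print(cognate_effect([540, 560, 550], [600, 590, 610]))  # → 50
```

In these terms, the mixed-list manipulation was expected to push the score from positive (facilitation) towards negative (inhibition), whereas the study found scores near zero.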
|
5 |
語境限制與第二語言能力對雙語詞彙觸接的影響:日中雙語者的眼動研究證據 / The influence of contextual constraint and L2 proficiency on bilingual lexical access: evidence from eye movements of Japanese-Chinese bilinguals. 翁翊倫, Weng, Yi Lun, date unknown
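Researchers have held two opposing views of bilingual lexical access. The selective access hypothesis holds that when bilinguals access words, only the target language that fits the context is activated; the non-selective access hypothesis holds that the representations of both of a bilingual's languages are activated simultaneously, producing competition or facilitation. Many studies now support the view that bilingual lexical access is non-selective. However, most of these studies used the priming paradigm and ignored the role that context plays in bilingual lexical access; moreover, the participants in most experiments were highly fluent in their second language, so the influence of individual differences in L2 proficiency on lexical access remains unclear. In addition, few bilingual studies have examined non-alphabetic writing systems. This study therefore investigates, from the perspective of a non-alphabetic script, the roles of contextual constraint and Chinese proficiency in bilingual lexical access. The experiment manipulated the degree of contextual constraint (high, low) and word type (cognates, i.e. same form and same meaning; interlingual homographs, i.e. same form and different meaning; and Chinese-only words), with Japanese-Chinese bilinguals as participants. Sentence contexts were either neutral or biased toward the Chinese meaning of the target word, and eye tracking recorded lexical access in real time, in order to examine how Chinese proficiency affects the effects of cross-language same-form words in high- and low-constraint contexts. The study also analysed the data with two different indices of Chinese proficiency, high/low group assignment and eye-movement performance, and compared the results to determine which index more accurately reflects participants' ability to read Chinese text.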
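The results show that bilingual lexical access is non-selective and that Chinese proficiency and contextual constraint influence lexical access, strengthening or weakening the cross-language same-form word effects. First, in the high/low group analysis, the proficiency index interacted with the various effects mainly at the late stage of lexical processing: the high-proficiency group showed an interlingual homograph effect in high-constraint contexts and no effect in low-constraint contexts, whereas the low-proficiency group showed a significant cognate effect in both high- and low-constraint contexts. Second, the analysis that used eye-movement performance as the proficiency index clearly showed that Chinese proficiency already interacts with contextual constraint and the cross-language same-form word effects at the early stage of lexical access, indicating that eye-movement performance can serve as one index of Chinese proficiency. In sum, the different analyses all indicate that bilingual lexical access is non-selective and that context and Chinese proficiency play important roles in meaning retrieval: bilinguals with higher Chinese proficiency are influenced by contextual constraint already at the early stage of lexical access, whereas less proficient bilinguals are influenced by context only at a later stage.
For decades, psycholinguists have debated how the two language systems are organized in the bilingual brain and how bilinguals retrieve lexical representations. The selective access hypothesis predicts that the two languages are independent in the brain and that bilinguals activate only one lexicon at a time while reading or speaking. Alternatively, the non-selective access hypothesis predicts that the two languages share an integrated conceptual representation, so representations from both languages are accessed simultaneously during comprehension. So far, many bilingual studies have demonstrated that bilingual lexical access is non-selective. However, these studies usually used paradigms such as priming and lexical decision, in which words are presented in isolation, ignoring the role of context in bilingual lexical access. The monolingual literature makes clear that lexical ambiguity resolution is influenced by the surrounding sentence context. While most previous studies investigated highly proficient bilinguals, the same question about non-selective access can also be asked of less proficient bilinguals. Moreover, most results come from alphabetic writing systems, such as English-French or Dutch; only a few studies have examined non-alphabetic systems. Finally, because bilingual experience is dynamic, developing instruments that capture its relevant dimensions is a challenge for researchers.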
The present study examined whether Japanese-Chinese bilingual lexical access is non-selective and whether context and L2 proficiency modulate word recognition. The experiment manipulated contextual constraint (high or low) and target word type (cognates, interlingual homographs, or Chinese words) and used eye-movement recordings to investigate the effects of contextual constraint on bilingual lexical access as Japanese-Chinese bilinguals read Chinese sentences; participants' L1 and L2 proficiency were also measured. The study further compared two indices of language proficiency, class level and eye-movement measures, to determine which is more accurate.
The results support the non-selective access hypothesis: both sentence context and L2 proficiency affected bilingual lexical access. In the class-level analysis, L2 proficiency interacted significantly with other effects in the late processing stage. Eye-movement measures reflecting early processing of target words showed significant interlingual homograph interference and cognate facilitation in the more proficient bilinguals. In the less proficient bilinguals, however, only cognate facilitation was observed, for high-constraint sentences, and no effect was found in low-constraint sentences. In contrast, the eye-movement index analysis showed that L2 proficiency interacted significantly with other effects in the early processing stage, suggesting that L2 reading proficiency can be measured with eye-movement indices. In summary, both sentence context and L2 proficiency can modulate bilingual lexical access. Early processing is non-selective, and bilinguals with higher L2 proficiency can make use of sentence context earlier in processing than less proficient bilinguals when reading L2 sentences.
|