1. Unknown word sequences in HPSG
Mielens, Jason David (06 October 2014)
This work investigates the properties of unknown words in HPSG, in particular multi-word unknown expressions, i.e. sequences of consecutive unknown words. It first presents a study of the relative frequency of multi-word unknown expressions and then surveys the efficacy of a variety of techniques for handling them. The techniques include modified versions of methods from the existing unknown-word prediction literature as well as novel techniques, and they are evaluated with particular attention to how they perform on sentences with many unknown words and long unknown sequences.
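The frequency study described above needs a way to count unknown words that occur in runs rather than in isolation. A minimal sketch follows, assuming a token counts as unknown when its lowercased form is missing from a fixed lexicon (in an HPSG setting, "unknown" would mean absent from the grammar's lexicon). The function names `unknown_runs` and `multiword_unknown_share` are hypothetical. This is not the thesis's actual procedure.

```python
from typing import List, Optional, Set, Tuple


def unknown_runs(tokens: List[str], lexicon: Set[str]) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, of maximal runs of unknown tokens.

    Assumption: a token is 'unknown' when its lowercased form is not in `lexicon`.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tok in enumerate(tokens):
        if tok.lower() not in lexicon:
            if start is None:      # a new unknown run begins here
                start = i
        elif start is not None:    # a known word closes the current run
            runs.append((start, i))
            start = None
    if start is not None:          # the run extends to the end of the sentence
        runs.append((start, len(tokens)))
    return runs


def multiword_unknown_share(sentences: List[List[str]], lexicon: Set[str]) -> float:
    """Fraction of unknown tokens that sit inside a run of two or more unknown words."""
    unknown = in_multiword = 0
    for tokens in sentences:
        for start, end in unknown_runs(tokens, lexicon):
            length = end - start
            unknown += length
            if length >= 2:
                in_multiword += length
    return in_multiword / unknown if unknown else 0.0
```

For the sentence "The Zorblax Quux saw a frindle in Qwerty" with the lexicon {"the", "saw", "a", "in"}, the runs are (1, 3), (5, 6) and (7, 8). Two of the four unknown tokens therefore fall inside a multi-word sequence.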
2. Automatic Multi-word Term Extraction and its Application to Web-page Summarization
Huo, Weiwei (20 December 2012)
In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language- and domain-independent and requires no training data. It can be applied to tasks such as text summarization, information retrieval, and document classification.
We further explore the potential of multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web pages and generate summaries by aligning document words with these multi-word terms. Our system applies machine translation techniques to learn the alignment process from a training set, and it focuses on selecting high-quality multi-word terms from human-written summaries to produce suitable web-page summaries.
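The extraction model above pairs association ("glue") measures with LocalMaxs. The thesis's three new measures are not reproduced here. The minimal sketch below uses SCP_f (symmetric conditional probability with fair dispersion) as a stand-in, because it is a glue measure commonly paired with LocalMaxs in the literature. It also simplifies probability estimation to n-gram count divided by token count.

The selection rule follows the usual statement of LocalMaxs:

* An n-gram is kept when its glue is at least that of the (n-1)-grams it contains and strictly greater than that of every (n+1)-gram containing it.
* A bigram only needs to satisfy the second condition.

All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Gram = Tuple[str, ...]


def count_ngrams(tokens: List[str], max_n: int) -> Counter:
    """Count every contiguous n-gram of length 1..max_n."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def scp_f(gram: Gram, counts: Counter, total: int) -> float:
    """SCP_f glue: p(gram)^2 divided by the mean of p(left) * p(right) over all split points."""
    def p(g: Gram) -> float:
        return counts[g] / total  # simplification: every n-gram normalised by token count

    n = len(gram)
    avp = sum(p(gram[:i]) * p(gram[i:]) for i in range(1, n)) / (n - 1)
    return p(gram) ** 2 / avp


def local_maxs(tokens: List[str], max_n: int = 4) -> List[Gram]:
    """Return the n-grams (2 <= n <= max_n) whose glue is a local maximum."""
    counts = count_ngrams(tokens, max_n + 1)  # (n+1)-grams are needed for the upper check
    total = len(tokens)
    glue = {g: scp_f(g, counts, total) for g in counts if len(g) >= 2}

    # Map each n-gram to the (n+1)-grams that extend it by one word on either side.
    extensions: Dict[Gram, List[Gram]] = {}
    for g in glue:
        if len(g) >= 3:
            extensions.setdefault(g[:-1], []).append(g)
            extensions.setdefault(g[1:], []).append(g)

    selected: List[Gram] = []
    for g, score in glue.items():
        if len(g) > max_n:
            continue
        larger = max((glue[e] for e in extensions.get(g, [])), default=0.0)
        if len(g) == 2:
            keep = score > larger
        else:
            smaller = max(glue[g[:-1]], glue[g[1:]])
            keep = score >= smaller and score > larger
        if keep:
            selected.append(g)
    return selected
```

On the toy input "new york x new york y new york z", the bigram ("new", "york") is selected. On corpora this small, n-grams seen only once can also pass the test, so real use would need a large corpus and, typically, a frequency cutoff.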
3. Víceslovné lexikální jednotky v Calvinově Il sentiero dei nidi di ragno a jejich protějšky v českém překladu / Multi-Word Expressions in Calvino's Il sentiero dei nidi di ragno and Their Equivalents in Czech Translation
Ebrová, Agáta (January 2021)
This diploma thesis is part of a broader phraseological project, CREAMY (Calvino REpertoire for the Analysis of Multilingual PhraseologY), carried out at the University of Rome La Sapienza. The aim of the thesis was to compare the Italian multi-word expressions in Italo Calvino's novel Il sentiero dei nidi di ragno with their counterparts in the Czech translation by Libor Piruchta, using a data set obtained through the phraseological web database CREAMY. Entering part of the Czech entries into the database was an integral part of the work. The thesis is divided into a theoretical and a practical part. The first chapter of the theoretical part provides basic information about the CREAMY project and the web application of the same name, which is the main research tool used within the project. The second chapter deals with the basic typological properties of the languages studied, with emphasis on morphosyntax and word formation. The third chapter is devoted to multi-word expressions and how they are conceived in the Italian and Czech linguistic traditions. The introductory chapter of the practical part describes how entries are processed in the CREAMY application. In this chapter, we present two specific examples of processed entries, but we also point out the...
4. Lietuvių kalbos samplaikos / Multi-word lexemes in the Lithuanian language
Kovalevskaitė, Jolanta (12 April 2012)
The object of the study is multi-word lexemes (samplaikos in Lithuanian), defined as stable combinations of two or more inflected or uninflected words. These combinations are perceived grammatically and semantically as a single lexical unit and most often function as a non-independent (functional) part of speech. The goal of the dissertation is to investigate the autonomy of Lithuanian multi-word lexemes as lexical units characterized by fixedness of form and content. The data were extracted and analysed using two monolingual corpora (the non-annotated Corpus of the Contemporary Lithuanian Language and the morphologically annotated Lithuanian language corpus) and the parallel German-Lithuanian corpus. The research methods applied were descriptive, corpus-based, statistical, and contrastive.
The statements to be defended are as follows:
1. According to the broad conception of phraseology, multi-word lexemes are a subtype of multi-word units. They are considered an object of phraseology because corpus analysis has confirmed their frequency and fixedness.
2. Multi-word lexemes vary in their degree of fixedness. An analysis of the collocation strength between the elements of a multi-word lexeme and of deviations from its morphological paradigm indicates that its degree of fixedness is largely determined by its composition.
3. The context of multi-word lexemes is characterized either by stability or by variability. More stable multi-word lexemes occur in variable contexts, which makes them more autonomous. Less stable multi-word lexemes, whose context is more restricted, are less autonomous.
4. More autonomous multi-word lexemes are more likely to function as translation units than less autonomous ones. The more autonomous a multi-word lexeme is, the... [to full text]
5. Multi-word lexemes in the Lithuanian Language / Lietuvių kalbos samplaikos
Kovalevskaitė, Jolanta (12 April 2012)
6. Collocation Segmentation for Text Chunking / Teksto skaidymas pastoviųjų junginių segmentais
Daudaravičius, Vidas (04 February 2013)
Segmentation is a widely used paradigm in text processing and is performed with rule-based, statistical, and hybrid methods. This dissertation introduces a new type of segmentation, collocation segmentation, together with a new method for performing it. It applies them to three different text processing tasks.
In lexicography, collocation segmentation makes it possible to analyse very large text archives objectively and quickly, detecting the terminology in use and tracking how the importance of these automatically identified terms changes over time. This analysis helps identify major methodological shifts in the history of research as well as currently active research areas.
Collocation segmentation can also improve text categorization results. The study shows that collocation segmentation, without any other language resources, achieves better results than the widely used n-gram techniques combined with POS (part-of-speech) processing tools.
In addition, preprocessing the data with collocation segmentation and then integrating the resulting segments into a statistical machine translation system slightly improves translation quality. Different word combinability measures affect the resulting collocation segmentation, and thus the translation results, in different ways.
The new collocation segmentation method is simple, efficient, and applicable to diverse language processing tasks and languages.
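The abstract does not spell out the segmentation procedure. The minimal sketch below shows the general idea under stated assumptions:

* The Dice coefficient, 2·f(xy) / (f(x) + f(y)), serves as the word combinability measure.
* A boundary is placed between two adjacent words when their score falls below a fixed threshold.
* A boundary is also placed when their score dips below the average of the neighbouring pairs' scores.

The threshold value and the function names are hypothetical, and this is not presented as the dissertation's exact method.

```python
from collections import Counter
from typing import List


def dice_scores(tokens: List[str]) -> List[float]:
    """Dice coefficient for every adjacent pair (tokens[i], tokens[i + 1])."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return [2 * bigrams[(a, b)] / (unigrams[a] + unigrams[b])
            for a, b in zip(tokens, tokens[1:])]


def collocation_segments(tokens: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Split tokens at weak links: below `threshold` or below the neighbours' average."""
    if not tokens:
        return []
    scores = dice_scores(tokens)
    segments: List[List[str]] = []
    current = [tokens[0]]
    for i, score in enumerate(scores):
        # At the edges, the missing neighbour is replaced by the score itself.
        left = scores[i - 1] if i > 0 else score
        right = scores[i + 1] if i + 1 < len(scores) else score
        local_dip = score < (left + right) / 2
        if score < threshold or local_dip:
            segments.append(current)  # boundary between tokens[i] and tokens[i + 1]
            current = []
        current.append(tokens[i + 1])
    segments.append(current)
    return segments
```

Applied to "the new york office . a new york hotel . the new york museum", the strongly associated pair "new york" stays together. Because the statistics come from the input itself, results on such a small text are only illustrative; in practice the scores would be estimated from a large corpus.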
7. Teksto skaidymas pastoviųjų junginių segmentais / Collocation segmentation for text chunking
Daudaravičius, Vidas (04 February 2013)
8. 'Enxergando' as colocações: para ajudar a vencer o medo de um texto autêntico / Learning collocations: to help read a text
Louro, Inês da Conceição dos Anjos (27 August 2001)
This study deals with multi-word lexical units used with a referential function, i.e., each unit serves as a name. In an English-language classroom for Brazilian students, it was observed that helping students 'see' (notice) such lexical units can help them read a text.
9. 'Enxergando' as colocações: para ajudar a vencer o medo de um texto autêntico / Learning collocations: to help read a text
Louro, Inês da Conceição dos Anjos (27 August 2001)
10. Περιγραφή και ανάλυση των χαλαρών πολυλεκτικών συνθέτων της νέας ελληνικής / Description and analysis of loose multi-word compounds in Modern Greek
Κολιοπούλου, Μαρία (01 September 2008)
This thesis lies at the boundary between morphology and syntax, and its ultimate aim is to determine the place of morphology within the grammar. The structures studied are loose multi-word compounds (see Ralli, 2005, forthcoming). They take two forms:
* [A N], e.g. «μαύρη λίστα» (black list);
* [N N-genitive], e.g. «μηχανικός αυτοκινήτων» (lit. 'engineer of cars').

They belong to the category of compounds and are defined as morphological formations, since they share many characteristics with classical one-word compounds, such as non-compositional meaning. Unlike classical compounds, however, they display syntactic features: like a syntactic construction, they consist of two words, and they show agreement and inflection. Within a structural continuum (Ralli 2005), these formations can therefore be placed inside morphology, but at its boundary with syntax.
The thesis analyses the distinctive properties of loose multi-word compounds and applies a set of tests to distinguish them both from noun phrases and from intermediate structures (compound-like phrases). Distinguishing the three types does not, however, rule out representing them in a common way: as word-formation schemas in a hierarchical constructicon, as proposed by the framework of Construction Grammar (Booij 2005). The same constructicon also includes formations of the form [N N-nominative], such as «άνθρωπος αράχνη» (lit. 'man spider'). These are argued to be undergoing language change, specifically morphologization.