Lietuvių kalbos samplaikos / Multi-word lexemes in the Lithuanian language
Kovalevskaitė, Jolanta, 12 April 2012
The object of the study is multi-word lexemes (samplaikos in Lithuanian), defined as fixed combinations of two or more inflective or non-inflective words that are grammatically and semantically perceived as one lexical unit with a unified meaning, and that most often function as a non-autonomous (function-word) part of speech. The goal of the dissertation is to investigate the autonomy of Lithuanian multi-word lexemes as lexical units characterized by fixedness of both form and content. Two monolingual corpora (the non-annotated Corpus of the Contemporary Lithuanian Language and the morphologically annotated Lithuanian language corpus) and the parallel German-Lithuanian corpus have been used for data extraction and analysis. Several research methods have been applied: descriptive, corpus-based, statistical, and contrastive.
The statements to be defended are as follows:
1. According to the broad conception of phraseology, multi-word lexemes are a subtype of multi-word units. They are considered an object of phraseology since corpus analysis has confirmed their frequency and fixedness of use.
2. Multi-word lexemes differ in their degree of fixedness. An analysis of the collocation strength between their components and of deviations in their morphological paradigms indicates that their degree of fixedness is largely determined by their composition.
3. The context of multi-word lexemes is characterized by either stability or variability. More stable multi-word lexemes occur in variable contexts, which makes them more autonomous. Less stable multi-word lexemes, whose context is more restricted, are less autonomous.
4. More autonomous multi-word lexemes are more likely to act as translation units than less autonomous ones. The more autonomous a multi-word lexeme is, the... [see full text]
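Statement 2 rests on measuring the collocation strength between the components of a multi-word lexeme. The abstract does not name the association measure used, so the sketch below assumes two common ones, pointwise mutual information (PMI) and the Dice coefficient, computed over adjacent token pairs. The function name `association_scores` and the toy Lithuanian tokens (built around the two-word connective *vis dėlto*, "nevertheless") are illustrative assumptions, not corpus data from the thesis.

```python
import math
from collections import Counter


def association_scores(tokens, w1, w2):
    """Return (pmi, dice) for the adjacent pair (w1, w2) in a token list.

    Hypothetical helper: both unigram and bigram probabilities are
    approximated with the token count n, a common simplification.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    f_xy = bigrams[(w1, w2)]
    f_x, f_y = unigrams[w1], unigrams[w2]
    if f_xy == 0:
        # The pair never occurs adjacently: no association at all.
        return float("-inf"), 0.0
    # PMI: how much more often the pair co-occurs than chance predicts.
    pmi = math.log2(f_xy * n / (f_x * f_y))
    # Dice: share of the components' occurrences that fall inside the pair.
    dice = 2 * f_xy / (f_x + f_y)
    return pmi, dice


# Toy example: two short sentences containing "vis dėlto".
tokens = "vis dėlto jis atėjo jis vis dėlto liko".split()
print(association_scores(tokens, "vis", "dėlto"))
```

Here both components occur only inside the pair, so Dice reaches its maximum of 1.0. On a real corpus, PMI tends to overrate rare pairs, which is one reason to compare it with a frequency-sensitive measure.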
Relating Dependent Terms in Information Retrieval
Shi, Lixin, 11 1900
Search engines have become an integral part of our lives. More than one third of the world's population uses the Internet, and most users turn to a search engine as the quickest way to find the information or products they want. Information retrieval (IR) is the foundation of modern search engines. Traditional IR approaches assume that indexing terms are independent. However, terms occurring in the same context are often dependent, and failing to account for these dependencies introduces noise (irrelevant documents) into the results. Some studies have proposed integrating different types of term dependency, such as proximity, co-occurrence, adjacency and grammatical dependency. In most cases, the dependency models are built separately and then combined with the traditional word-based (unigram) model using a fixed weight. Consequently, they cannot properly capture variable term dependency and its strength. For example, the dependency between the adjacent words “black Friday” is more important to consider than that between “road constructions”. In this thesis, we study different approaches to capturing term relationships and their dependency strengths. We propose the following methods for monolingual IR and cross-language IR (CLIR):
1. We re-examine the combination approach by using different indexing units for Chinese monolingual IR, and then propose a similar method for English-Chinese CLIR. In addition to the traditional word-based method, we investigate the possibility of using Chinese bigrams and unigrams as translation units. Several translation models from English words to Chinese unigrams, bigrams and words are built from a parallel corpus. An English query is then translated in several ways, each translation producing a ranking score, and the final ranking score combines all these types of translation.
2. We incorporate dependencies between terms using the Dempster-Shafer theory of evidence. Each occurrence of a multi-word text fragment in a document is represented as a set containing all its constituent terms, and probability is assigned to this set of terms rather than to individual terms. During query evaluation, this probability is redistributed to the query terms when they differ from the terms of the set. This allows us to integrate dependency relations between terms into IR.
3. We propose a discriminative language model that integrates different term dependencies according to their strength and usefulness for IR. Specifically, we consider adjacency and co-occurrence dependencies at different distances, i.e. bigrams and pairs of terms within text windows of size 2, 4, 8 and 16. The weight of a bigram or of a pair of dependent terms in the final model is learned from a set of features using SVM regression.
All the proposed methods are evaluated on several English and/or Chinese collections, and the experimental results show that they achieve substantial improvements over state-of-the-art baselines.
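The first method translates one English query into several Chinese indexing units and merges the resulting ranking scores. The sketch below assumes a simple weighted linear combination and overlapping character n-grams as the bigram and unigram units; the helper names, weights, and scores are hypothetical, and the thesis's actual combination function may differ.

```python
def char_unigrams(text):
    """Split a Chinese string into single characters."""
    return list(text)


def char_bigrams(text):
    """Overlapping character bigrams, a common Chinese indexing unit."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def combine_scores(scores_by_unit, weights):
    """Weighted sum of per-unit ranking scores for every document seen.

    scores_by_unit maps an indexing unit ("word", "bigram", ...) to
    {doc_id: score}; a document missing from a unit's results scores 0.
    weights maps each unit to its (assumed) interpolation weight.
    """
    docs = set()
    for unit_scores in scores_by_unit.values():
        docs.update(unit_scores)
    return {
        doc: sum(w * scores_by_unit[unit].get(doc, 0.0) for unit, w in weights.items())
        for doc in docs
    }


print(char_bigrams("信息检索"))
scores = {
    "word": {"d1": 0.8, "d2": 0.2},
    "bigram": {"d1": 0.5, "d2": 0.7},
    "unigram": {"d1": 0.4, "d2": 0.6},
}
print(combine_scores(scores, {"word": 0.5, "bigram": 0.3, "unigram": 0.2}))
```

A fixed-weight combination like this is exactly the kind of constant-importance merging the abstract criticizes for dependency models; it is shown only to make the multi-unit scoring step concrete.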
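The Dempster-Shafer method assigns probability mass to a set of terms rather than to each term. The sketch below shows the standard Dempster-Shafer quantities for that setting: a mass function built from fragment counts in one document, the belief and plausibility of a term set, and the pignistic transformation, one standard way to spread each set's mass evenly over its members. Using the pignistic transformation as the redistribution step is an assumption; the abstract does not give the thesis's exact rule, and the fragments are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def mass_function(fragments):
    """Mass of each distinct term set, proportional to how often a
    fragment with exactly that set of terms occurs in the document."""
    counts = Counter(frozenset(f) for f in fragments)
    total = sum(counts.values())
    return {s: Fraction(c, total) for s, c in counts.items()}


def belief(m, a):
    """Bel(A): total mass of the sets fully contained in A."""
    a = frozenset(a)
    return sum((v for s, v in m.items() if s <= a), Fraction(0))


def plausibility(m, a):
    """Pl(A): total mass of the sets that share at least one term with A."""
    a = frozenset(a)
    return sum((v for s, v in m.items() if s & a), Fraction(0))


def pignistic(m):
    """Split each set's mass equally among its member terms."""
    p = Counter()
    for s, v in m.items():
        for term in s:
            p[term] += v / len(s)
    return dict(p)


# Hypothetical fragment occurrences in one document.
m = mass_function([("black", "friday"), ("black", "friday"), ("black",), ("road",)])
print(belief(m, {"black"}), plausibility(m, {"black"}))
print(pignistic(m))
```

The gap between belief and plausibility for {"black"} reflects mass that sits on the whole fragment {"black", "friday"} and is not committed to "black" alone.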
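The discriminative model describes each dependent term pair by its bigram count and its co-occurrence counts within windows of size 2, 4, 8 and 16. The sketch below assumes that a window of size w covers w consecutive tokens (so size 2 means adjacent words) and scores a pair with a fixed weight vector. In the thesis the weights are learned from features (by SVM regression, per the abstract); the constants and helper names here are hypothetical stand-ins.

```python
from collections import Counter

WINDOWS = (2, 4, 8, 16)


def window_pairs(tokens, w):
    """Count unordered term pairs whose positions fit in w consecutive tokens."""
    pairs = Counter()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + w, len(tokens))):
            pairs[tuple(sorted((tokens[i], tokens[j])))] += 1
    return pairs


def bigram_count(tokens, a, b):
    """Count ordered, adjacent occurrences of 'a b'."""
    return sum(1 for x, y in zip(tokens, tokens[1:]) if (x, y) == (a, b))


def pair_features(tokens, a, b):
    """Feature vector: ordered bigram count, then one count per window size."""
    key = tuple(sorted((a, b)))
    feats = [bigram_count(tokens, a, b)]
    for w in WINDOWS:
        feats.append(window_pairs(tokens, w)[key])
    return feats


def dependency_score(tokens, a, b, weights):
    """Linear score of the pair; weights would normally be learned."""
    return sum(wt * f for wt, f in zip(weights, pair_features(tokens, a, b)))


doc = "black friday sales black friday deals".split()
print(pair_features(doc, "black", "friday"))
print(dependency_score(doc, "black", "friday", [1.0, 0.5, 0.25, 0.125, 0.0625]))
```

Larger windows can only add pairs, so the counts are non-decreasing from left to right; a learned model can then decide how much each looser notion of dependency is worth.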
Specifika počítačem podporovaného překladu z němčiny do češtiny / CAT Tools in German - Czech Translation
Handšuhová, Jana, January 2013
This thesis deals with specialized translation software, mastery of which is becoming one of the basic requirements for successful translation work. The theoretical part describes the historical development, classification and main functions of translation memory systems. It further attempts to determine the criteria for the effective use of CAT tools and to identify the text types and genres for which translation memory systems are most commonly used in the translation process. It also covers the functional view of language-based text typology and the principles on which translation memory systems work. The practical part compares the results of the translation process (translation as a product) with and without CAT tools. A corpus of parallel texts (originals and their translations) is subjected to translation analysis, which identifies the levels affected by differences between translations made with and without CAT tools. Differences in the translation process itself, which cannot be verified empirically, are analysed on the basis of a survey conducted among translators. The empirical findings are then summarized and systematized. The last chapter deals with the expected development of the translation market, the...
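Translation memory systems typically retrieve previously translated segments whose similarity to the new source segment exceeds a "fuzzy match" threshold. Commercial CAT tools use their own scoring, which is largely undocumented, so the sketch below assumes a simple word-level edit-distance score and a 75% threshold; the function names, threshold, and German-Czech segments are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (characters or word lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_score(source, candidate):
    """Match percentage: 100 minus word edits relative to the longer segment."""
    s, c = source.split(), candidate.split()
    longest = max(len(s), len(c)) or 1
    return 100 * (1 - levenshtein(s, c) / longest)


def lookup(tm, source, threshold=75):
    """Return (score, tm_source, tm_target) hits at or above the threshold, best first."""
    hits = [(fuzzy_score(source, src), src, tgt) for src, tgt in tm]
    return sorted((h for h in hits if h[0] >= threshold), reverse=True)


tm = [
    ("Klicken Sie auf die Schaltfläche Speichern.", "Klikněte na tlačítko Uložit."),
    ("Das Dokument wurde gespeichert.", "Dokument byl uložen."),
]
print(lookup(tm, "Klicken Sie auf die Schaltfläche Öffnen."))
```

Scoring on words rather than characters is a choice of this sketch: one changed label costs one edit regardless of its length, and the translator still has to adapt the retrieved Czech target by hand.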