Global ETD Search

1	Las funciones del pasado en los sistemas verbales del español y del ruso Westerholm, David January 2010 (has links) This monograph is a comparative study of past time reference in Spanish and Russian.The ambition is to present a functional perspective of how both languages systemically express temporal and aspectual information. The verb, naturally, attracts the main attention of the thesis and the focus is almost exclusively on verbs in the indicative mood.The definition of the parameters of Time and Aspect plays an important part in the present dissertation. A particular emphasis concerns the elaboration and testing of the ‘ABC’ model, which represents a graphic definition of verbal aspect as a grammatical category.Another important issue is the distinction of aspect and Aktionsart; these concepts are closely related but operate on different functional levels. The analysis is essentially based on linguistic material from parallel corpora, constructed for this purpose. At first the material is treated statistically in order to create astarting point for the qualitative part of the analysis. The three main areas of investigation dealt with are:1) The relation between the simple past tenses, pretérito and imperfecto, in Spanish and their imperfective and perfective counterparts in Russian.2) The relation between the compound tenses in Spanish and the Russian verbal system.The analysis of this relation also comprises a critique of the traditional interpretationof the aspectual contents of the compound tenses.3) The usage of alternative strategies in both languages. In this part of the analysis the focus is widened to include verbal periphrasis, infinite verb forms and subordination. The results of the analysis demonstrate that verbal aspect, according to the definition represented by ‘the ABC model’, works as a grammatical category in both Spanish and Russian. It is also shown that there are systemic differences in the manifestation of this functional category in both languages. Another important result is that the neither the compound tenses nor the progressive express verbal aspect, at least not in a narrow sense of the word but represent different verbal functions related to aspect. verbal functions tense aspect systemic perspective parallel corpora
2	Recycling Translations : Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing Tiedemann, Jörg January 2003 (has links) <p>The focus of this thesis is on re-using translations in natural language processing. It involves the collection of documents and their translations in an appropriate format, the automatic extraction of translation data, and the application of the extracted data to different tasks in natural language processing.</p><p>Five parallel corpora containing more than 35 million words in 60 languages have been collected within co-operative projects. All corpora are sentence aligned and parts of them have been analyzed automatically and annotated with linguistic markup.</p><p>Lexical data are extracted from the corpora by means of word alignment. Two automatic word alignment systems have been developed, the Uppsala Word Aligner (UWA) and the Clue Aligner. UWA implements an iterative "knowledge-poor" word alignment approach using association measures and alignment heuristics. The Clue Aligner provides an innovative framework for the combination of statistical and linguistic resources in aligning single words and multi-word units. Both aligners have been applied to several corpora. Detailed evaluations of the alignment results have been carried out for three of them using fine-grained evaluation techniques.</p><p>A corpus processing toolbox, Uplug, has been developed. It includes the implementation of UWA and is freely available for research purposes. A new version, Uplug II, includes the Clue Aligner. It can be used via an experimental web interface (UplugWeb).</p><p>Lexical data extracted by the word aligners have been applied to different tasks in computational lexicography and machine translation. The use of word alignment in monolingual lexicography has been investigated in two studies. In a third study, the feasibility of using the extracted data in interactive machine translation has been demonstrated. Finally, extracted lexical data have been used for enhancing the lexical components of two machine translation systems.</p> Computational linguistics word alignment parallel corpora translation corpora computational lexicography machine translation Datorlingvistik Computational linguistics Datorlingvistik
3	Recycling Translations : Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing Tiedemann, Jörg January 2003 (has links) The focus of this thesis is on re-using translations in natural language processing. It involves the collection of documents and their translations in an appropriate format, the automatic extraction of translation data, and the application of the extracted data to different tasks in natural language processing. Five parallel corpora containing more than 35 million words in 60 languages have been collected within co-operative projects. All corpora are sentence aligned and parts of them have been analyzed automatically and annotated with linguistic markup. Lexical data are extracted from the corpora by means of word alignment. Two automatic word alignment systems have been developed, the Uppsala Word Aligner (UWA) and the Clue Aligner. UWA implements an iterative "knowledge-poor" word alignment approach using association measures and alignment heuristics. The Clue Aligner provides an innovative framework for the combination of statistical and linguistic resources in aligning single words and multi-word units. Both aligners have been applied to several corpora. Detailed evaluations of the alignment results have been carried out for three of them using fine-grained evaluation techniques. A corpus processing toolbox, Uplug, has been developed. It includes the implementation of UWA and is freely available for research purposes. A new version, Uplug II, includes the Clue Aligner. It can be used via an experimental web interface (UplugWeb). Lexical data extracted by the word aligners have been applied to different tasks in computational lexicography and machine translation. The use of word alignment in monolingual lexicography has been investigated in two studies. In a third study, the feasibility of using the extracted data in interactive machine translation has been demonstrated. Finally, extracted lexical data have been used for enhancing the lexical components of two machine translation systems. Computational linguistics word alignment parallel corpora translation corpora computational lexicography machine translation Datorlingvistik Computational linguistics Datorlingvistik
4	Constitution d'une ressource sémantique arabe à partir d'un corpus multilingue aligné / Constitution of a semantic resource for the Arabic language from multilingual aligned corpora Abdulhay, Authoul 23 November 2012 (has links) Cette thèse vise à la mise en œuvre et à l'évaluation de techniques d'extraction de relations sémantiques à partir d'un corpus multilingue aligné. Ces relations seront extraites par transitivité de l'équivalence traductionnelle, deux lexèmes possédant les mêmes équivalents dans une langue cible étant susceptibles de partager un même sens. D'abord, nos observations porteront sur la comparaison sémantique d'équivalents traductionnels dans des corpus multilingues alignés. A partir des équivalences, nous tâcherons d'extraire des "cliques", ou sous-graphes maximaux complets connexes, dont toutes les unités sont en interrelation, du fait d'une probable intersection sémantique. Ces cliques présentent l'intérêt de renseigner à la fois sur la synonymie et la polysémie des unités, et d'apporter une forme de désambiguïsation sémantique. Elles seront créées à partir de l'extraction automatique de correspondances lexicales, basée sur l'observation des occurrences et cooccurrences en corpus. Le recours à des techniques de lemmatisation sera envisagé. Ensuite nous tâcherons de relier ces cliques avec un lexique sémantique (de type Wordnet) afin d'évaluer la possibilité de récupérer pour les unités arabes des relations sémantiques définies pour des unités en anglais ou en français. Ces relations permettraient de construire automatiquement un réseau utile pour certaines applications de traitement de la langue arabe, comme les moteurs de question-réponse, la traduction automatique, les systèmes d'alignement, la recherche d'information, etc. / This study aims at the implementation and evaluation of techniques for extracting semantic relations from a multilingual aligned corpus. Firstly, our observations will focus on the semantic comparison of translational equivalents in multilingual aligned corpus. From these equivalences, we will try to extract "cliques", which ara maximum complete related sub-graphs, where all units are interrelated because of a probable semantic intersection. These cliques have the advantage of giving information on both the synonymy and polysemy of units, and providing a form of semantic disambiguation. Secondly, we attempt to link these cliques with a semantic lexicon (like WordNet) in order to assess the possibility of recovering, for the Arabic units, a semantic relationships already defined for English, French or Spanish units. These relations would automatically build a semantic resource which would be useful for different applications of NLP, such as Question Answering systems, machine translation, alignment systems, Information Retrieval…etc. Réseaux sémantiques Cliques Alignement Corpus parallèle Wordnet Semantic network Cliques Alignment Parallel corpora Wordnet
5	Slovesný vid ve španělštině a češtině / Verbal Aspect in Spanish and Czech Bělehrádková, Kateřina January 2020 (has links) The aim of the thesis is to compare the verbal aspect in Spanish and Czech. The thesis has two parts: theoretical and practical. The theoretical part focuses on defining the category of verbal aspect in both languages and on comparison of this category in these two aspectual systems. Furthemore, we pay attention to the relation of verbal aspect to the category aspecto and to the lexical aspect (Aktionsart). The practical part consists of a corpus case study carried out using the newest version of the Czech parallel corpus InterCorp. In the case study, we analyse aspectual differences between original texts in Czech and their Spanish translations. Key words: verbal aspect - verb - Spanish - Czech - parallel corpora
6	Evaluation of two word alignment systems Wang, Xiaoyang January 2004 (has links) <p>This project evaluates two different systems that generate wordalignments on English-Swedish data. The systems to be used are the Giza++ system, that may generate a variety of statistical translation models, and ITrix system developed at IDA/NLPLab that generates word pairs with frequencies. </p><p>The file formats of these two systems, the way of running them and the differences of the two systems are addressed in this paper. Evaluation in this project considers a variety of parameters such as corpus size, characteristics of the corpus, the effect of linguistic knowledge, etc. At the end of this paper, the conclusions of the two systems evaluation are presented. In general, Giza++ is better applying on big corpora while ITrix is better for small corpora. Especially for corpora with high statistical ratio or special resource, ITrix has a better performance.</p> Datalogi Word alignment Giza++ ITrix Parallel corpora Statistical ratio Evaluation I*Eval Gold standard. Datalogi Computer science Datalogi
7	Alinhamento léxico utilizando técnicas híbridas discriminativas e de pós-processamento / Text alignment Schreiner, Paulo January 2010 (has links) O alinhamento léxico automático é uma tarefa essencial para as técnicas de tradução de máquina empíricas modernas. A abordagem gerativa não-supervisionado têm sido substituída recentemente por uma abordagem discriminativa supervisionada que facilite inclusão de conhecimento linguístico de uma diversidade de fontes. Dentro deste contexto, este trabalho descreve uma série alinhadores léxicos discriminativos que incorporam heurísticas de pós-processamento com o objetivo de melhorar o desempenho dos mesmos para expressões multi-palavra, que constituem um dos desafios da área de processamento de linguagens naturais atualmente. A avaliação é realizada utilizando um gold-standard obtido a partir da anotação de um corpus paralelo de legendas de filmes. Os alinhadores propostos apresentam um desempenho superior tanto ao obtido por uma baseline quanto ao obtido por um alinhador gerativo do estado-da-arte (Giza++), tanto no caso geral quanto para as expressões foco do trabalho. / Lexical alignment is an essential task for modern empirical machine translation techniques. The unsupervised generative approach is being replaced by a supervised, discriminative one that considerably facilitates the inclusion of linguistic knowledge from several sources. Given this context, the present work describes a series of discriminative lexical aligners that incorporate post-processing heuristics with the goal of improving the quality of the alignments of multiword expressions, which is one of the major challanges in natural language processing today. The evaluation is conducted using a gold-standard obtained from a movie subtitle parallel corpus. The aligners proposed show an alignment quality that is superior both to our baseline and to a state-of-the-art generative aligner (Giza++), for the general case as well as for the expressions that are the focus of this work. Linguística computacional Processamento : Linguagem natural Natural language processing Lexical alignment Machine learning Parallel corpora Multiword expressions UFRGS
8	Alinhamento léxico utilizando técnicas híbridas discriminativas e de pós-processamento / Text alignment Schreiner, Paulo January 2010 (has links) O alinhamento léxico automático é uma tarefa essencial para as técnicas de tradução de máquina empíricas modernas. A abordagem gerativa não-supervisionado têm sido substituída recentemente por uma abordagem discriminativa supervisionada que facilite inclusão de conhecimento linguístico de uma diversidade de fontes. Dentro deste contexto, este trabalho descreve uma série alinhadores léxicos discriminativos que incorporam heurísticas de pós-processamento com o objetivo de melhorar o desempenho dos mesmos para expressões multi-palavra, que constituem um dos desafios da área de processamento de linguagens naturais atualmente. A avaliação é realizada utilizando um gold-standard obtido a partir da anotação de um corpus paralelo de legendas de filmes. Os alinhadores propostos apresentam um desempenho superior tanto ao obtido por uma baseline quanto ao obtido por um alinhador gerativo do estado-da-arte (Giza++), tanto no caso geral quanto para as expressões foco do trabalho. / Lexical alignment is an essential task for modern empirical machine translation techniques. The unsupervised generative approach is being replaced by a supervised, discriminative one that considerably facilitates the inclusion of linguistic knowledge from several sources. Given this context, the present work describes a series of discriminative lexical aligners that incorporate post-processing heuristics with the goal of improving the quality of the alignments of multiword expressions, which is one of the major challanges in natural language processing today. The evaluation is conducted using a gold-standard obtained from a movie subtitle parallel corpus. The aligners proposed show an alignment quality that is superior both to our baseline and to a state-of-the-art generative aligner (Giza++), for the general case as well as for the expressions that are the focus of this work. Linguística computacional Processamento : Linguagem natural Natural language processing Lexical alignment Machine learning Parallel corpora Multiword expressions UFRGS
9	Alinhamento léxico utilizando técnicas híbridas discriminativas e de pós-processamento / Text alignment Schreiner, Paulo January 2010 (has links) O alinhamento léxico automático é uma tarefa essencial para as técnicas de tradução de máquina empíricas modernas. A abordagem gerativa não-supervisionado têm sido substituída recentemente por uma abordagem discriminativa supervisionada que facilite inclusão de conhecimento linguístico de uma diversidade de fontes. Dentro deste contexto, este trabalho descreve uma série alinhadores léxicos discriminativos que incorporam heurísticas de pós-processamento com o objetivo de melhorar o desempenho dos mesmos para expressões multi-palavra, que constituem um dos desafios da área de processamento de linguagens naturais atualmente. A avaliação é realizada utilizando um gold-standard obtido a partir da anotação de um corpus paralelo de legendas de filmes. Os alinhadores propostos apresentam um desempenho superior tanto ao obtido por uma baseline quanto ao obtido por um alinhador gerativo do estado-da-arte (Giza++), tanto no caso geral quanto para as expressões foco do trabalho. / Lexical alignment is an essential task for modern empirical machine translation techniques. The unsupervised generative approach is being replaced by a supervised, discriminative one that considerably facilitates the inclusion of linguistic knowledge from several sources. Given this context, the present work describes a series of discriminative lexical aligners that incorporate post-processing heuristics with the goal of improving the quality of the alignments of multiword expressions, which is one of the major challanges in natural language processing today. The evaluation is conducted using a gold-standard obtained from a movie subtitle parallel corpus. The aligners proposed show an alignment quality that is superior both to our baseline and to a state-of-the-art generative aligner (Giza++), for the general case as well as for the expressions that are the focus of this work. Linguística computacional Processamento : Linguagem natural Natural language processing Lexical alignment Machine learning Parallel corpora Multiword expressions UFRGS
10	Evaluation of two word alignment systems Wang, Xiaoyang January 2004 (has links) This project evaluates two different systems that generate wordalignments on English-Swedish data. The systems to be used are the Giza++ system, that may generate a variety of statistical translation models, and ITrix system developed at IDA/NLPLab that generates word pairs with frequencies. The file formats of these two systems, the way of running them and the differences of the two systems are addressed in this paper. Evaluation in this project considers a variety of parameters such as corpus size, characteristics of the corpus, the effect of linguistic knowledge, etc. At the end of this paper, the conclusions of the two systems evaluation are presented. In general, Giza++ is better applying on big corpora while ITrix is better for small corpora. Especially for corpora with high statistical ratio or special resource, ITrix has a better performance. Datalogi Word alignment Giza++ ITrix Parallel corpora Statistical ratio Evaluation I*Eval Gold standard. Datalogi Computer Sciences Datavetenskap (datalogi)

Search results