81 |
Recycling Translations: Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing. Tiedemann, Jörg, January 2003.
The focus of this thesis is on re-using translations in natural language processing. It involves the collection of documents and their translations in an appropriate format, the automatic extraction of translation data, and the application of the extracted data to different tasks in natural language processing. Five parallel corpora containing more than 35 million words in 60 languages have been collected within co-operative projects. All corpora are sentence aligned and parts of them have been analyzed automatically and annotated with linguistic markup. Lexical data are extracted from the corpora by means of word alignment. Two automatic word alignment systems have been developed, the Uppsala Word Aligner (UWA) and the Clue Aligner. UWA implements an iterative "knowledge-poor" word alignment approach using association measures and alignment heuristics. The Clue Aligner provides an innovative framework for the combination of statistical and linguistic resources in aligning single words and multi-word units. Both aligners have been applied to several corpora. Detailed evaluations of the alignment results have been carried out for three of them using fine-grained evaluation techniques. A corpus processing toolbox, Uplug, has been developed. It includes the implementation of UWA and is freely available for research purposes. A new version, Uplug II, includes the Clue Aligner. It can be used via an experimental web interface (UplugWeb). Lexical data extracted by the word aligners have been applied to different tasks in computational lexicography and machine translation. The use of word alignment in monolingual lexicography has been investigated in two studies. In a third study, the feasibility of using the extracted data in interactive machine translation has been demonstrated. Finally, extracted lexical data have been used for enhancing the lexical components of two machine translation systems.
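As context for the "knowledge-poor" approach mentioned above, the following is a minimal sketch of word alignment driven purely by a co-occurrence association measure (the Dice coefficient) and a greedy one-to-one linking heuristic. It is an illustrative assumption, not the actual UWA implementation, and the toy bitext is invented.

from collections import Counter
from itertools import product

def dice_alignments(bitext, threshold=0.3):
    # Count sentence-level co-occurrence frequencies over the aligned bitext.
    src_freq, trg_freq, pair_freq = Counter(), Counter(), Counter()
    for src, trg in bitext:
        for s in set(src):
            src_freq[s] += 1
        for t in set(trg):
            trg_freq[t] += 1
        for s, t in product(set(src), set(trg)):
            pair_freq[(s, t)] += 1

    def dice(s, t):
        # Association strength between a source word and a target word.
        return 2 * pair_freq[(s, t)] / (src_freq[s] + trg_freq[t])

    links = []
    for src, trg in bitext:
        # Greedy one-to-one linking heuristic: best-scoring pairs first.
        candidates = sorted(product(set(src), set(trg)),
                            key=lambda p: dice(*p), reverse=True)
        used_src, used_trg = set(), set()
        for s, t in candidates:
            if dice(s, t) < threshold or s in used_src or t in used_trg:
                continue
            links.append((s, t, round(dice(s, t), 2)))
            used_src.add(s)
            used_trg.add(t)
    return links

# Invented Swedish-English toy bitext, purely for illustration.
bitext = [
    (["huset", "är", "rött"], ["the", "house", "is", "red"]),
    (["huset", "är", "stort"], ["the", "house", "is", "big"]),
    (["bilen", "är", "röd"], ["the", "car", "is", "red"]),
]
print(dice_alignments(bitext))

In a real setting the association scores would be estimated from the full corpus and combined with alignment heuristics and, in the Clue Aligner, with linguistic resources; the sketch only shows the basic statistical ingredient.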
|
82 |
Interactive Visualizations of Natural Language. Collins, Christopher, 06 August 2010.
While linguistic skill is a hallmark of humanity, the increasing volume of linguistic data each of us faces is causing individual and societal problems; ‘information overload’ is a commonly discussed condition. Tasks such as finding the most appropriate information online, understanding the contents of a personal email repository, and translating documents from another language are now commonplace. These tasks need not cause stress and feelings of overload: the human intellectual capacity is not the problem. Rather, the computational interfaces to linguistic data are problematic: there exists a Linguistic Visualization Divide in the current state of the art. Through five design studies, this dissertation combines sophisticated natural language processing algorithms with information visualization techniques grounded in evidence of human visuospatial capabilities.

The first design study, Uncertainty Lattices, augments real-time computer-mediated communication, such as cross-language instant messaging chat and automatic speech recognition. By providing explicit indications of algorithmic confidence, the visualization enables informed decisions about the quality of computational outputs.

Two design studies explore the space of content analysis. DocuBurst is an interactive visualization of document content, which spatially organizes words using an expert-created ontology. Broadening from single documents to document collections, Parallel Tag Clouds combine keyword extraction and coordinated visualizations to provide comparative overviews across subsets of a faceted text corpus.

Finally, two studies address visualization for natural language processing research. The Bubble Sets visualization draws secondary set relations around arbitrary collections of items, such as a linguistic parse tree. From this design study we propose a theory of spatial rights to consider when assigning visual encodings to data. Expanding considerations of spatial rights, we present a formalism to organize the variety of approaches to coordinated and linked visualization, and introduce VisLink, a new method to relate and explore multiple 2D visualizations in 3D space. Inter-visualization connections allow for cross-visualization queries and support high-level comparison between visualizations.

From the design studies we distill challenges common to visualizing language data, including maintaining legibility, supporting detailed reading, addressing data scale challenges, and managing problems arising from semantic ambiguity.
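The Parallel Tag Clouds study mentioned above rests on keyword extraction that identifies words unusually frequent in one subset of a corpus relative to the rest. The following is a minimal sketch of that general idea, not the dissertation's implementation; it assumes a log-likelihood (G2) keyness score, a common choice for this kind of corpus comparison, and the token lists are invented.

import math
from collections import Counter

def g2_keywords(subset_tokens, rest_tokens, top_n=5):
    # Log-likelihood (G2) keyness: how much more frequent a word is in one
    # corpus subset than in the remainder of the corpus.
    a_counts, b_counts = Counter(subset_tokens), Counter(rest_tokens)
    a_total, b_total = len(subset_tokens), len(rest_tokens)
    scores = {}
    for word, a in a_counts.items():
        b = b_counts.get(word, 0)
        # Expected frequencies under the null hypothesis of equal rates.
        e_a = a_total * (a + b) / (a_total + b_total)
        e_b = b_total * (a + b) / (a_total + b_total)
        g2 = 2 * (a * math.log(a / e_a) + (b * math.log(b / e_b) if b else 0))
        if a / a_total > b / b_total:  # keep only words overused in the subset
            scores[word] = g2
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Invented token lists standing in for two facets of a text corpus.
facet_a = "appeal appeal court ruling ruling ruling verdict".split()
facet_b = "court filing filing motion motion hearing hearing".split()
print(g2_keywords(facet_a, facet_b))

Words scoring highest for a facet would be the candidates displayed in that facet's column of a parallel tag cloud.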
|
84 |
Productivity and quality in the post-editing of outputs from translation memories and machine translation. Guerberof Arenas, Ana, 24 September 2012.
This study presents empirical research on no-match, machine-translated and translation-memory segments, analyzed in terms of translators’ productivity, final quality and prior professional experience. The findings suggest that translators achieve higher productivity and quality when post-editing machine-translated output than when translating on their own, and that the productivity and quality gained with machine translation are not significantly different from the values obtained when processing fuzzy matches from a translation memory in the 85-94 percent range. The translators’ prior experience affects the quality they deliver but not their productivity. These quantitative findings are triangulated with qualitative data from an online questionnaire and from one-to-one debriefings with the translators.
[Abstract in Spanish, translated:] This study presents empirical research on the translation of new segments and of segments processed with machine translation and translation memories, analyzed in relation to the productivity, final quality and professional experience of a group of translators. The results suggest that translators achieve higher productivity and quality when processing machine-translated segments than when translating without any aid, and that this productivity and quality are not significantly different from those obtained when processing fuzzy matches from a translation memory (in the 85 to 94 percent range). The translators’ prior professional experience influences the quality obtained, but not the productivity. The quantitative results are also triangulated with qualitative data obtained through an online questionnaire and individual debriefing interviews with the translators.
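For readers unfamiliar with the terminology, the "85-94 percent" band refers to fuzzy match scores that a translation memory assigns to stored segments resembling the new source segment. A minimal sketch follows, assuming a word-level Levenshtein similarity; commercial TM tools use their own, often character-based and weighted, scoring schemes, so this is only an illustration of the idea.

def fuzzy_match(new_segment, tm_segment):
    # Word-level Levenshtein similarity, expressed as a percentage, between
    # a new source segment and a stored translation-memory segment.
    a, b = new_segment.split(), tm_segment.split()
    dist = list(range(len(b) + 1))
    for i, word_a in enumerate(a, 1):
        prev, dist[0] = dist[0], i
        for j, word_b in enumerate(b, 1):
            cur = min(dist[j] + 1,                 # deletion
                      dist[j - 1] + 1,             # insertion
                      prev + (word_a != word_b))   # substitution / match
            prev, dist[j] = dist[j], cur
    return round(100 * (1 - dist[len(b)] / max(len(a), len(b))))

# Invented example segments, purely for illustration.
print(fuzzy_match("Click the Save button to store the file",
                  "Click the Save button to store your file"))   # about 88

A segment scoring in the 85-94 range, as here, would be offered to the translator as a close but not exact match to post-edit.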
|
85 |
“Översätt den här sidan”: The advancement of Google Translate and how it performs in the online translation of compound and proper nouns from Swedish into English. Stefansson, Ida, January 2011.
The English translation of the Swedish compound fönsterbräda into windowsill, or of the proper noun Danmark into Denmark, makes perfect sense. But how about the compound fossilbränslefri rendered simply as fossil fuel, or the name Mälaren as Lake? All four of these translations have been produced by automatic machine translation. The aim of this paper is to present the expanding field of application of machine translation and some issues related to this type of translation. More specifically, the study looks at Google Translate, one of the most commonly used machine translation systems online, and how it handles the two linguistic categories selected for this small study: compound nouns and proper nouns. Besides these categories, two different text types were chosen: general information articles from a local authority website (Stockholm City) and patent texts, both of which belong to the expanding field of application of Google Translate.

The results of the study show that, in terms of compound nouns, neither text type proved significantly better suited to machine translation than the other, and neither had an error rate below 20%. Most errors involved words being omitted from the English output or words translated incorrectly for their context. As for proper nouns, the patent texts contained none, so no error analysis could be made, whereas the general information articles included 76 proper nouns (out of a total word count of 810). The most prominent error was the Swedish form not being retained in the English output where it should have been, e.g. Abrahamsberg translated as Abraham rock. The errors in both linguistic categories had varying impact on the meaning of the texts: some distorted the meaning of the word completely, while others were of minor importance. This, together with the fact that readers’ own language and subject knowledge shapes how comprehensible they find the text, makes it difficult to evaluate the full impact of the various errors. It can, however, be said that patent texts may be a better option for machine translation than general information articles with respect to proper nouns, as this text type is likely to contain few or no proper nouns.
|
86 |
Coupling Speech Recognition and Rule-based Machine Translation. Kopru, Selcuk, 01 September 2008.
The objective of this thesis was to study the coupling of automatic speech recognition (ASR) systems with rule-based machine translation (MT) systems. The thesis proposes a unique approach to integrating ASR with MT for speech translation (ST) tasks. The approach is unique for two reasons. First, it includes the first rule-based MT system that can process speech data in word graph format: compared to other rule-based MT systems, our system processes both a word graph and a stream of words, so the suggested integration of the ASR and rule-based MT components goes beyond a simple software engineering exercise. Second, this coupling approach performed better than the first-best and N-best list techniques, which are the only other methods used to integrate an ASR system with a rule-based MT system. The improved performance of the coupling approach was verified experimentally.

The use of rule-based MT systems for ST tasks is important; however, there are some unresolved issues. Most of the literature on coupled systems has focused on integrating ASR with statistical MT rather than rule-based MT, because statistical MT systems can process word graphs as input and the resolution of ambiguities can therefore be moved to the MT component. With the new approach proposed in this thesis, the same advantage becomes available to rule-based MT systems. The success of such an approach could facilitate the efficient use of rule-based systems for ST tasks.
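The difference between passing the recognizer's single best hypothesis, an N-best list, or the full word graph to the MT component can be made concrete with a small lattice. The sketch below uses invented words and scores and illustrates only the data structures involved; it does not reflect the actual system described in the thesis.

import heapq
from collections import defaultdict

# A word graph (lattice) as edges: node -> [(next_node, word, cost)].
# Lower cost = higher recognizer confidence. Words and scores are invented.
graph = defaultdict(list)
for src, dst, word, cost in [
    (0, 1, "i", 0.1), (1, 2, "want", 0.3), (1, 2, "won't", 0.9),
    (2, 3, "to", 0.4), (2, 3, "two", 0.6), (3, 4, "tickets", 0.2),
]:
    graph[src].append((dst, word, cost))

def n_best(graph, start, goal, n):
    # Enumerate the n cheapest paths through the lattice (best-first search).
    heap = [(0.0, start, [])]
    results = []
    while heap and len(results) < n:
        cost, node, words = heapq.heappop(heap)
        if node == goal:
            results.append((round(cost, 2), " ".join(words)))
            continue
        for dst, word, edge_cost in graph[node]:
            heapq.heappush(heap, (cost + edge_cost, dst, words + [word]))
    return results

print(n_best(graph, 0, 4, 1))  # first-best hypothesis only
print(n_best(graph, 0, 4, 3))  # N-best list of alternative hypotheses

A first-best or N-best coupling flattens the lattice into one or a few strings before translation, whereas a lattice-aware MT system, rule-based or statistical, traverses the word graph itself and can postpone the disambiguation (for example "to" versus "two") until translation-time knowledge is available.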
|
87 |
'Consider' and its Swedish equivalents in relation to machine translation. Andersson, Karin, January 2007.
This study describes the English verb ’consider’ and the characteristics of some of its senses. An investigation of this kind may be useful, since a machine translation program, SYSTRAN, has invariably translated ’consider’ with the Swedish verbs ’betrakta’ (Eng: ’view’, ’regard’) and ’anse’ (Eng: ’regard’). This handling of ’consider’ is not satisfactory in all contexts.

Since ’consider’ is a cogitative verb, it is interesting to observe that both the theory of semantic primes and universals and conceptual semantics are concerned with cogitation in various ways. Anna Wierzbicka, one of the advocates of semantic primes and universals, argues that THINK should be considered a semantic prime. Moreover, one of the central concerns of conceptual semantics is to describe how thoughts are constructed from, for example, linguistic components, perception and experience. In order to define and clarify the distinctions between the different senses, we have drawn on the theory of mental spaces.

The thesis is structured according to the meanings of ’consider’ indicated in WordNet. The senses of ’consider’ have accordingly been organized into the following groups: ’Observation’; ’Opinion’, together with its sub-group ’Likelihood’; and ’Cogitation’, followed by its sub-group ’Attention/Consideration’. A concordance tool, http://www.nla.se/culler, provided 90 literary quotations, which were collected in a corpus. These citations were then distributed among the groups mentioned above and translated into Swedish by SYSTRAN. The meanings of ’consider’ have also been related to the senses recorded by the FrameNet scholars, where ’consider’ is regarded as a verb of ’Cogitation’ and ’Categorization’.

The completed study shows that certain senses are tied to specific syntactic constructions, whereas in other cases the distinctions between meanings can only be explained in semantic terms. It therefore appears likely that implementation is easier when a specific syntactic construction can be tied to a particular sense, as may be the case for some meanings of ’consider’; machine translation is presumably a much more laborious task if one is guided solely by semantic conditions.
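The concluding point, that implementation is easier where a sense correlates with a syntactic construction, can be illustrated with a small rule table mapping constructions around ’consider’ to Swedish verb choices. The patterns and verb choices below are hypothetical examples for illustration only, not rules extracted from the study.

import re

# Hypothetical construction-to-sense rules for 'consider'; the patterns and
# the Swedish verb choices are illustrative assumptions, not the study's rules.
RULES = [
    (r"consider \w+ing\b", "Cogitation", "överväga"),            # e.g. "consider buying"
    (r"consider \w+ (as|to be) ", "Opinion", "anse"),            # e.g. "consider it to be"
    (r"consider", "Attention/Consideration", "ta hänsyn till"),  # fallback
]

def choose_swedish_verb(clause):
    # Pick a sense label and a Swedish equivalent from the syntactic context.
    for pattern, sense, verb in RULES:
        if re.search(pattern, clause, flags=re.IGNORECASE):
            return sense, verb
    return None

print(choose_swedish_verb("We will consider buying a new house"))
print(choose_swedish_verb("I consider him to be a reliable colleague"))
print(choose_swedish_verb("Please consider the environment"))

Where no surface pattern distinguishes the senses, a rule-based MT system would need semantic information instead, which is the harder case the abstract points to.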
|
88 |
Machine Translation (MT) - History, Theory, Problems and Usage. Riedel, Marion; Schwarze, Tino, 11 May 2001.
The presentation outlines the historical development of machine translation. Standard MT problems are listed and some of them are discussed.
|
89 |
Machine Translation: A Theoretical and Practical Introduction / Maschinelle Übersetzung: Eine theoretische und praktische Einführung. Riedel, Marion, 08 May 2002.
The paper presents the basics and the development of Machine Translation and explains different methods for evaluating translation machines on the basis of a detailed example.

[German abstract, translated:] Written for the seminar "Language and Computers" in English linguistics, the paper covers the foundations and development of machine translation and, using a detailed example, gives insight into methods for evaluating translation machines.
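As one concrete illustration of the kind of evaluation methods such an overview covers (an assumption about the topic area, not about which methods the paper actually discusses), the sketch below computes clipped unigram precision and recall of an MT output against a human reference translation, the simplest building block of n-gram overlap metrics.

from collections import Counter

def unigram_precision_recall(candidate, reference):
    # Clipped unigram overlap between an MT output and a reference translation.
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return overlap / len(cand), overlap / len(ref)

# Invented example sentences, purely for illustration.
mt_output = "the house is very red"
reference = "the house is bright red"
p, r = unigram_precision_recall(mt_output, reference)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.80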
|
90 |
Translation memory-systemer som værktøjer til juridisk oversættelse: kritisk vurdering af anvendeligheden af translation memory-systemer til oversættelse af selskabsretlig dokumentation [Translation memory systems as tools for legal translation: a critical assessment of the applicability of translation memory systems for the translation of company-law documentation]. Christensen, Tina Paulsen, 2003.
Ph.D. dissertation.
|