211 |
Generierung von natürlichsprachlichen Texten aus semantischen Strukturen im Prozeß der maschinellen Übersetzung - Allgemeine Strukturen und Abbildungen
Rosenpflanzer, Lutz, Karl, Hans-Ulrich 14 December 2012
0 PREFACE
Several problems dominate the machine translation of natural language. One always has to deal with very large amounts of data. Even if only a short text is to be translated, the task is embedded in an extensive context, i.e., all knowledge about the source and target languages must be available, in as formalized a form as possible. If spoken language is involved, speech recognition and speech output tasks as well as hard real-time requirements are added. Even with modern software development concepts, the complexity of the problem is a challenge not to be underestimated for anyone who attempts an implementation.
Approaches that consistently apply the working principles and methods of computer science usually present their results only as prototypes for a very small part of the language (for example a phrase, a sentence, or several example sentences) and conclude, more or less inductively, that the solution developed can also be applied successfully to the whole language, if only there are enough "lemmings" who, swarming out in all directions, could carry out the "remaining routine work" quickly and diligently.
|
212 |
Intégration du contexte en traduction statistique à l’aide d’un perceptron à plusieurs couches
Patry, Alexandre 04 1900
Les systèmes de traduction statistique à base de segments traduisent les phrases un segment à la fois, en plusieurs étapes. À chaque étape, ces systèmes ne considèrent que très peu d’informations pour choisir la traduction d’un segment. Les scores du dictionnaire de segments bilingues sont calculés sans égard aux contextes dans lesquels ils sont utilisés et les modèles de langue ne considèrent que les quelques mots entourant le segment traduit.
Dans cette thèse, nous proposons un nouveau modèle considérant la phrase en entier lors de la sélection de chaque mot cible. Notre modèle d’intégration du contexte se différencie des précédents par l’utilisation d’un ppc (perceptron à plusieurs couches). Une propriété intéressante des ppc est leur couche cachée, qui propose une représentation alternative à celle offerte par les mots pour encoder les phrases à traduire. Une évaluation superficielle de cette représentation alternative nous a montré qu’elle est capable de regrouper certaines phrases sources similaires même si elles étaient formulées différemment.
Nous avons d’abord comparé avantageusement les prédictions de nos ppc à celles d’IBM1, un modèle couramment utilisé en traduction. Nous avons ensuite intégré
nos ppc à notre système de traduction statistique de l’anglais vers le français. Nos ppc ont amélioré les traductions de notre système de base et d’un deuxième système de référence auquel était intégré IBM1. / Phrase-based statistical machine translation systems translate source sentences one phrase at a time, conditioning the choice of each phrase on very little information. Bilingual phrase table scores are computed regardless of the context in which the phrases are used, and language models look only at the few words surrounding the target phrases.
In this thesis, we propose a novel model that predicts the words that should appear in a translation given the source sentence as a whole. Our model differs from previous work in its use of multilayer perceptrons (MLPs). Our interest in MLPs lies in their hidden layer, which encodes source sentences in a representation that is only loosely tied to words. We observed that this hidden layer was able to cluster some sentences having similar translations even if they were formulated differently.
In a first set of experiments, our MLPs compared favorably with IBM1, a well-known model in statistical machine translation. In a second set of experiments, we embedded our MLPs in our English-to-French statistical machine translation system. Our MLPs improved translation quality over our baseline system and over a second system embedding an IBM1 model.
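The idea of predicting target words from the whole source sentence through a hidden layer can be illustrated with a toy forward pass. This is only a sketch under stated assumptions: the bag-of-words input encoding, the vocabularies, the layer sizes, and the hand-picked weights are all hypothetical and untrained, and the abstract does not say that the thesis uses this exact architecture.

```python
import math

# Hypothetical toy vocabularies (not from the thesis).
SOURCE_VOCAB = ["the", "cat", "sleeps"]
TARGET_VOCAB = ["le", "chat", "dort"]

# Arbitrary, untrained weights chosen only to make the example runnable.
W_HIDDEN = [[0.5, -0.2, 0.1],   # 2 hidden units x 3 source words
            [0.3, 0.8, -0.5]]
B_HIDDEN = [0.0, 0.1]
W_OUT = [[1.0, -1.0],           # 3 target words x 2 hidden units
         [0.2, 0.7],
         [-0.3, 0.4]]
B_OUT = [0.0, 0.0, 0.0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bag_of_words(sentence, vocab):
    """Encode the whole sentence as one vector: 1.0 if the word occurs, else 0.0."""
    words = sentence.lower().split()
    return [1.0 if w in words else 0.0 for w in vocab]


def layer(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def predict_target_words(sentence):
    """Score each target word given the whole source sentence."""
    x = bag_of_words(sentence, SOURCE_VOCAB)
    hidden = layer(x, W_HIDDEN, B_HIDDEN, math.tanh)  # sentence representation
    scores = layer(hidden, W_OUT, B_OUT, sigmoid)     # one score per target word
    return dict(zip(TARGET_VOCAB, scores))


scores = predict_target_words("the cat sleeps")
```

In this sketch the `hidden` vector plays the role of the representation that is only loosely tied to individual words: two differently worded sentences can map to nearby hidden vectors.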
|
213 |
Déploiement automatique d’une application de routage téléphonique d’une langue source vers une langue cible
Tremblay, Jérôme 08 1900
Les modèles de compréhension statistiques appliqués à des applications vocales nécessitent beaucoup de données pour être entraînés. Souvent, une même application doit pouvoir supporter plusieurs langues, c’est le cas avec les pays ayant plusieurs langues officielles. Il s’agit donc de gérer les mêmes requêtes des utilisateurs, lesquelles présentent une sémantique similaire, mais dans plusieurs langues différentes. Ce projet présente des techniques pour déployer automatiquement un modèle de compréhension statistique d’une langue source vers une langue cible. Ceci afin de réduire le nombre de données nécessaires ainsi que le temps relié au déploiement d’une application dans une nouvelle langue.
Premièrement, une approche basée sur les techniques de traduction automatique est présentée. Ensuite une approche utilisant un espace sémantique commun pour comparer plusieurs langues a été développée. Ces deux méthodes sont comparées pour vérifier leurs limites et leurs faisabilités. L’apport de ce projet se situe dans l’amélioration d’un modèle de traduction grâce à l’ajout de données très proches de l’application ainsi que d’une nouvelle façon d’inférer un espace sémantique multilingue. / Statistical understanding models applied to dialog applications need a lot of training data. Often, an application needs to support more than one language, as in countries that have more than one official language. In those applications, users' queries convey the same meanings but in different languages. This project presents techniques to automatically deploy statistical understanding models from a source language to a target language. The goal is to reduce the training data needed and the time required to deploy an application in a new language. First, an approach using machine translation techniques is presented. Then, an approach that uses a common semantic space to compare several languages is developed. The two methods are compared to assess their limits and feasibility. This work contributes an improvement of the translation model using in-domain data and a novel technique for inferring a multilingual semantic space.
|
214 |
Kelių automatinio vertimo sistemų integracija / The integration of several automatic translation systems
Marin, Igor 23 July 2012
Baigiamajame magistro darbe nagrinėjamos automatinės vertimo sistemos, pagrindiniai jų veikimo principai ir šių sistemų integracijos būdai. Detaliai aprašoma populiarių šiuo metu statistinių vertimo sistemų struktūra, pateikiami šių ir tradicinių (taisyklėmis paremtų) sistemų privalumai ir trūkumai. Pristatomos visos šiuo metu egzistuojančios automatinio vertimo sistemos lietuvių ir anglų kalbų porai, išaiškinami jų privalumai ir trūkumai. Lingvistiniu požiūriu nagrinėjamos lietuvių ir anglų kalbos, išvardinami šių kalbų panašumai, skirtumai ir sunkumai, kylantys verčiant iš vienos kalbos į kitą. Taip pat pateikiami įvairūs automatinio vertimo įvertinimo būdai, įskaitant populiarų BLEU įvertinimo metodą. Išvardinamos ir analizuojamos užsienio autorių siūlomos automatinio vertimo sistemų integracijos architektūros. Apžvelgiami sumaišymo tinklai, kurie naudojami kuriant integruotą vertimo sistemą. Pateikiama originali mišriosios vertimo sistemos įgyvendinimo metodika. Integruota sistema yra praktiškai įgyvendinama. Šios sistemos ir kitų vertimo sistemų anglų ir lietuvių kalbų porai rezultatai yra įvertinami ir palyginami. Atlikus teorinę automatinio vertimo sistemų apžvalgą ir praktiškai įgyvendinus mišriąją vertimo sistemą, pateikiamos baigiamojo darbo išvados ir siūlymai.
Darbą sudaro 6 dalys: įvadas, automatinio vertimo sistemų analizė, mišriųjų automatinio vertimo sistemų analizė, mišriosios automatinio vertimo sistemos sukūrimas, išvados, literatūros sąrašas.
Darbo apimtis... / The Master’s thesis analyses machine translation systems, their principles of operation and the methods used in integrating these systems. The structure of popular statistical machine translation systems, as well as the advantages and disadvantages of such systems, are described in detail. The existing machine translation systems for the Lithuanian-English language pair, along with their abilities and shortcomings, are presented. The Lithuanian and English languages are analysed from a linguistic perspective. The similarities and differences between these languages, as well as the difficulties arising when translating text from one language to the other, are discussed. Moreover, different machine translation evaluation methods, including the popular BLEU metric, are reviewed. Various architectures for integrating multiple machine translation systems, proposed by foreign authors, are presented and analysed. Confusion networks, which are used in integrating machine translation systems, are discussed. An original method of implementing a hybrid machine translation system is proposed. The hybrid system is implemented in practice. The translation results obtained from the created system and from the existing systems for the Lithuanian-English language pair are assessed and compared. Finally, after the theoretical review of machine translation systems and the implementation of the hybrid system, conclusions and recommendations are provided.
The thesis consists of 6 parts: introduction, the...
|
215 |
Algebraic decoder specification: coupling formal-language theory and statistical machine translation
Büchse, Matthias 28 January 2015
The specification of a decoder, i.e., a program that translates sentences from one natural language into another, is an intricate process, driven by the application and lacking a canonical methodology. The practical nature of decoder development inhibits the transfer of knowledge between theory and application, which is unfortunate because many contemporary decoders are in fact related to formal-language theory. This thesis proposes an algebraic framework where a decoder is specified by an expression built from a fixed set of operations. As yet, this framework accommodates contemporary syntax-based decoders, it spans two levels of abstraction, and, primarily, it encourages mutual stimulation between the theory of weighted tree automata and the application.
|
216 |
Combining machine learning and rule-based approaches in Spanish syntactic generation
Melero Nogués, Maria Teresa 02 June 2006
Aquesta tesi descriu una gramàtica de Generació que combina regles escrites a mà i tècniques d'aprenentatge automàtic. Aquesta gramàtica pertany a un sistema de Traducció Automàtica de qualitat comercial desenvolupat a Microsoft Research. La primera part presenta la gramàtica i les principals estratègies lingüístiques que aquesta gramàtica implementa. Els requeriments de robustesa que reclama l'ús real del sistema de TA, exigeix del Generador un esforç suplementari que es resol afegint un nivell de pre-generació, capaç de garantir la integritat de l'entrada, sense incorporar elements ad-hoc en les regles de la gramàtica. A la segona part, explorem l'ús dels classificadors d'arbres de decisió (DT) per tal d'aprendre automàticament una de les operacions que tenen lloc al mòdul de pre-generació, en concret la selecció lèxica del verb copulatiu en espanyol (ser o estar). Mostrem que és possible inferir a partir d'exemples els contextos per aquest fenòmen lingüístic no trivial, amb gran precisió. / This thesis describes a Spanish Generation grammar which combines hand-written rules and Machine Learning techniques. This grammar belongs to a full-scale commercial quality Machine Translation system developed at Microsoft Research. The first part presents the grammar and the linguistic strategies it embodies. The need for robustness in real-world situations in the everyday use of the MT system requires from the Generator an extra effort which is resolved by adding a Pre-Generation layer which is able to fix the input to Generation, without contaminating the grammar rules. In the second part we explore the use of Decision Tree classifiers (DT) for automatically learning one of the operations that take place in the Pre-Generation component, namely lexical selection of the Spanish copula (i.e. ser and estar). We show that it is possible to infer from examples the contexts for this non-trivial linguistic phenomenon with high accuracy.
|
217 |
Attitydanalys av svenska produktomdömen – behövs språkspecifika verktyg? / Sentiment Analysis of Swedish Product Reviews – Are Language-specific Tools Necessary?
Glant, Oliver January 2018
Sentiment analysis of Swedish data is often performed using English tools and machine translation. This thesis compares using a neural network trained on Swedish data with a corresponding one trained on English data. Two datasets were used: approximately 200,000 non-neutral Swedish reviews from the company Prisjakt Sverige AB, one of the largest annotated datasets used for Swedish sentiment analysis, and 1,000,000 non-neutral English reviews from Amazon.com. Both networks were evaluated on 11,638 randomly selected reviews, in Swedish and in English machine translation. The test set had the same overrepresentation of positive reviews as the Swedish dataset (84% were positive). The results suggest that English tools can be used with machine translation for sentiment analysis of Swedish reviews, without loss of classification ability. However, the English tool required 33% more training data to achieve maximum performance. Evaluation on the unbalanced test set required extra consideration regarding statistical measures. The F1-measure turned out to be reliable only when calculated for the underrepresented class. It then showed a strong correlation with the Matthews correlation coefficient, which has been found to be more reliable. This warrants further investigation into whether the correlation holds for all class balances, which would simplify comparison between studies. / Attitydanalys av svensk data sker i många fall genom maskinöversättning till engelska för att använda tillgängliga analysverktyg. I den här uppsatsen undersöktes skillnaden mellan användning av ett neuronnät tränat på svensk data och av motsvarande neuronnät tränat på engelsk data. Två datamängder användes: cirka 200 000 icke-neutrala svenska produktomdömen från Prisjakt Sverige AB, en av de största annoterade datamängder som använts för svensk attitydanalys, och 1 000 000 icke-neutrala engelska produktomdömen från Amazon.com.
Båda versionerna av neuronnätet utvärderades på 11 638 slumpmässigt utvalda svenska produktomdömen, i original och maskinöversatta till engelska. Testmängden hade samma överrepresentation av positiva omdömen som den svenska datamängden (84% positiva omdömen). Resultaten tyder på att engelska verktyg med hjälp av maskinöversättning kan användas för attitydanalys av svenska produktomdömen med bibehållen klassificeringsförmåga, dock krävdes cirka 33% större träningsdata för att det engelska verktyget skulle uppnå maximal klassificeringsförmåga. Utvärdering på den obalanserade datamängden visade sig ställa särskilda krav på de statistiska mått som användes. F1-värde fungerade tillfredsställande endast när det beräknades för den underrepresenterade klassen. Det korrelerade då starkt med Matthews korrelationskoefficient, som tidigare funnits vara ett pålitligare mått. Om korrelationen gäller vid alla olika balanser skulle jämförelser mellan olika studiers resultat underlättas, något som bör undersökas.
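The point about F1 on an unbalanced test set can be made concrete with a small calculation. The sketch below is an illustration only: the 100 toy labels (84 positive, 16 negative, mirroring the reported 84% share) and the degenerate "always positive" classifier are invented for this example and are not results from the thesis.

```python
import math


def confusion_counts(gold, predicted, positive):
    """Count TP, FP, FN, TN, treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for g, p in zip(gold, predicted):
        if p == positive and g == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif g == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; defined as 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented toy data: 84% positive, and a classifier that always says "pos".
gold = ["pos"] * 84 + ["neg"] * 16
predicted = ["pos"] * 100

tp, fp, fn, tn = confusion_counts(gold, predicted, positive="pos")
f1_majority = f1_score(tp, fp, fn)   # F1 for the overrepresented class
mcc_value = mcc(tp, fp, fn, tn)

tp_n, fp_n, fn_n, tn_n = confusion_counts(gold, predicted, positive="neg")
f1_minority = f1_score(tp_n, fp_n, fn_n)  # F1 for the underrepresented class

print(f1_majority, f1_minority, mcc_value)
```

On this toy data the F1 for the majority class is about 0.91 even though the classifier has learned nothing, while both the minority-class F1 and the MCC are 0.0, which matches the behaviour the abstract describes.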
|
218 |
The mat sat on the cat: investigating structure in the evaluation of order in machine translation
McCaffery, Martin January 2017
We present a multifaceted investigation into the relevance of word order in machine translation. We introduce two tools, DTED and DERP, each using dependency structure to detect differences between the structures of machine-produced translations and human-produced references. DTED applies the principle of Tree Edit Distance to calculate edit operations required to convert one structure into another. Four variants of DTED have been produced, differing in the importance they place on words which match between the two sentences. DERP represents a more detailed procedure, making use of the dependency relations between words when evaluating the disparities between paths connecting matching nodes. In order to empirically evaluate DTED and DERP, and as a standalone contribution, we have produced WOJ-DB, a database of human judgments. Containing scores relating to translation adequacy and more specifically to word order quality, this is intended to support investigations into a wide range of translation phenomena. We report an internal evaluation of the information in WOJ-DB, then use it to evaluate variants of DTED and DERP, both to determine their relative merit and their strength relative to third-party baselines. We present our conclusions about the importance of structure to the tools and their relevance to word order specifically, then propose further related avenues of research suggested or enabled by our work.
|
219 |
Traductologie et traduction outillée : du traducteur spécialisé professionnel à l’expert métier en entreprise / Translation Technologies for English, French or German: From Individual Specialized Translators To Company Domain Experts
Lemaire, Claire 23 June 2017
Comment adapter des technologies de la traduction, initialement conçues pour des traducteurs spécialisés professionnels, à des experts métier devant traduire pour leur entreprise ? Pour répondre à cette question, nous avons comparé les pratiques de ces deux types d'utilisateurs, à l’aide de questionnaires. Ensuite, nous avons constitué un corpus à partir de traductions d’experts métier et nous l’avons passé en revue pour renforcer l’analyse des différences. La différence la plus flagrante est l'utilisation de la traduction automatique (TA) ainsi que le contexte de production des traductions. La réalité du terrain montre en effet des textes source qui ne sont souvent pas exploitables par des machines ; nous proposons de travailler sur l'exploitabilité informatique des textes. En étudiant les technologies de TA actuelles, nous constatons qu'elles permettent soit une post-édition en langue cible après le processus de traduction, soit une pré-édition en langue source avant le processus de traduction. Nous suggérons de tirer profit de la situation inédite de rédacteur traduisant, pour utiliser l’expertise du rédacteur pendant le processus de traduction et de développer une fonctionnalité de TA permettant une édition en cours de processus. / How can translation technologies, initially designed for professional specialized translators, be adapted to domain experts who have to translate for their company? We address this question by first comparing the practices of these two groups of users, professional and non-professional, using two surveys. We then built a corpus of translations produced by domain experts and studied it to reinforce the analysis of the differences. The most obvious differences are the use of machine translation (MT) and the context in which translations are produced. In practice, source texts in companies often cannot be processed by machines; we therefore propose to focus on the machine processability of texts.
Looking at current MT technologies, we find that they allow either post-editing of the target-language text after the translation process or pre-editing of the source-language text before it. We propose to take advantage of the unprecedented situation in which the writer is also the translator, using the writer's expertise during the translation process by creating a new MT feature that allows editing during the process.
|
220 |
A criação de um sistema híbrido de tradução automática para a conversão de expressões nominais da língua inglesa / The creation of a hybrid machine translation for the conversion of nominal expressions from English
Cunha, Tiago Martins da January 2013
CUNHA, Tiago Martins da. A criação de um sistema híbrido de tradução automática para a conversão de expressões nominais da língua inglesa. 2013. 165f. – Tese (Doutorado) – Universidade Federal do Ceará, Departamento de Letras Vernáculas, Programa de Pós-graduação em Linguística, Fortaleza (CE), 2013.
Machine translation (MT) had much of its credibility questioned by professional translators for many years. However, the use of MT systems has become a necessity in order to organize and accelerate the translation process. Most users, professional or not, have no knowledge of the design of the tools that make up the system they use. An MT system consists of a pipeline of tools that form the system’s engine. Thus, we propose the description and creation of a translation tool that would be able to handle nominal expressions from English to Portuguese. Nominal expressions in English may be composed of elements such as genitives and gerunds, which lack Portuguese correspondents. These elements therefore cause difficulties for MT systems. Our goal is to create an MT system that is able to deal satisfactorily with this problem. The system developed and described in this thesis was trained with nominal expressions from the Europarl corpus and tested with nominal expressions discussed in the literature on noun phrase syntax. Our system showed what we consider satisfactory results, according to the scores in the manual and automatic evaluations, when compared with the results of other freely available MT systems. / A tradução automática (TA) teve grande parte de sua credibilidade questionada por tradutores profissionais por muitos anos. No entanto, o uso de sistemas de TA tornou-se uma necessidade, a fim de organizar e acelerar o processo de tradução. A maioria dos usuários, profissionais ou não, não tem conhecimento sobre o design das ferramentas que integram o sistema que eles usam. A concepção de um sistema de TA consiste de uma cadeia de ferramentas que formam o motor de um sistema de TA. Assim, propõe-se a descrição e a criação de uma ferramenta de tradução que seja capaz de lidar com expressões nominais da língua Inglesa para portuguesa.
As expressões nominais em Inglês podem ser compostas de elementos como genitivo e gerúndios, que não apresentam correspondentes para o português. Assim, estes elementos causam dificuldades para os sistemas de TA. O nosso objetivo é o de criar um sistema de TA que seja capaz de lidar com este problema de maneira satisfatória. O sistema desenvolvido e descrito nesta tese foi treinado com expressões nominais do corpus Europarl e testado com expressões nominais tratadas na literatura sobre a sintaxe dos sintagmas nominais. Nosso sistema apresentou resultados que consideramos satisfatórios de acordo com escores obtidos nas avaliações manual e automática ao compararmos com os resultados obtidos por outros sistemas de TA disponíveis gratuitamente para utilização.
|