221 |
Enfrentamento do problema das divergências de tradução por um sistema de tradução automática: um exercício exploratório
Oliveira, Mirna Fernanda de [UNESP], 25 April 2006
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / This dissertation develops an exploratory linguistic and computational study of a specific problem that machine translation systems must face: translation divergences, whether syntactic or lexical-semantic, that occur between pairs of sentences in different natural languages. To that end, the work draws on the interdisciplinary research methodology for NLP (Natural Language Processing) developed by Dias-da-Silva (1996, 1998 and 2003) and on the computational-linguistic theory behind UNITRAN, a machine translation system developed by Dorr (1993), which is in turn based on Chomsky's syntactic theory of Principles and Parameters (Government and Binding, 1981) and Jackendoff's semantic theory of Conceptual Structures (1990). As a contribution to the field of NLP, the dissertation describes the composition and operation of UNITRAN, which was designed to handle part of the problem posed by translation divergences, and illustrates the possibility of including Brazilian Portuguese in the system by examining certain kinds of divergences found between English and Brazilian Portuguese sentences.
|
222 |
Un environnement générique et ouvert pour le traitement des expressions polylexicales : de l'acquisition aux applications / A generic and open framework for multiword expressions treatment: from acquisition to applications
Ramisch, Carlos Eduardo, 11 September 2012
This thesis presents an open and flexible methodological framework for the automatic acquisition of multiword expressions (MWEs) from monolingual textual corpora. The research is motivated by the importance of MWEs for NLP applications. After briefly presenting the modules of the framework, the thesis reports extrinsic evaluation results for two applications: computer-aided lexicography and statistical machine translation. Both applications can benefit from automatic MWE acquisition: expressions acquired automatically from corpora can both speed them up and improve their quality. The promising results of our experiments encourage further investigation into the optimal way to integrate MWE treatment into these and many other applications.
|
223 |
Word Confidence Estimation and Its Applications in Statistical Machine Translation / Les mesures de confiance au niveau des mots et leurs applications pour la traduction automatique statistique
Luong, Ngoc Quang, 12 November 2014
Machine Translation (MT) systems, which automatically generate a target-language translation for each source sentence, have achieved impressive gains in recent decades and are becoming effective language aids for the whole community in a globalized world. Nonetheless, for various reasons MT quality is still far from perfect in general, and end users therefore want to know how much they should trust a specific translation. A method capable of pointing out the correct parts, detecting the translation errors and assessing the overall quality of each MT hypothesis is beneficial not only to end users but also to translators, post-editors and MT systems themselves. Such a method is widely known as Confidence Estimation (CE) or Quality Estimation (QE). The motivation for building automatic estimation methods stems from the drawbacks of assessing MT quality manually: the task is time-consuming, costly in effort, and sometimes impossible when readers have little or no knowledge of the source language.
This thesis focuses mainly on CE methods at the word level (WCE). A WCE classifier tags each word in the MT output with a quality label. The working mechanism is straightforward: a classifier, trained beforehand on a number of features using machine learning (ML) methods, computes the confidence score of each label for each MT output word and then tags the word with the highest-scoring label. WCE is of increasing importance in many aspects of MT. Firstly, it helps post-editors identify translation errors quickly and thus improves their productivity. Secondly, it informs readers of the portions of a sentence that are not reliable, so that they do not misunderstand its content. Thirdly, it selects the best translation among the outputs of multiple MT systems. Last but not least, WCE scores can help improve MT quality in several scenarios: N-best list re-ranking, search graph re-decoding, etc.
In this thesis, we aim to build and optimize our baseline WCE system and then to exploit it to improve MT and Sentence Confidence Estimation (SCE). Compared to previous approaches, our novel contributions cover the following main points. Firstly, we integrate various types of prediction indicators: system-based features extracted from the MT system, together with lexical, syntactic and semantic features, to build the baseline WCE system. We also apply multiple ML models to the entire feature set and compare their performance to select the one best suited to our data (Conditional Random Fields). Secondly, the usefulness of all features is investigated in more depth using a greedy feature selection algorithm. Thirdly, we propose a solution that uses the Boosting algorithm as a learning method to strengthen the contribution of dominant feature subsets to the system and thus improve its prediction capability. Lastly, we explore the contributions of WCE to improving MT quality in several scenarios. In N-best list re-ranking, we synthesize scores from the WCE outputs and integrate them with the decoder scores to recompute the objective function value, then re-order the N-best list to choose a better candidate. In search graph re-decoding, we apply WCE scores directly to the nodes containing each word to update their costs according to word quality; once the update is complete, searching for the best path in the new graph yields the new MT hypothesis. Furthermore, WCE scores are used to build features that enhance the performance of the SCE system.
In total, our work gives an insightful and multidimensional picture of word quality prediction and its positive impact on various areas of MT. The promising results open up a broad avenue where WCE can play a role, such as WCE for Automatic Speech Recognition (ASR) systems, WCE for selection among multiple MT systems, and WCE for re-trainable and self-learning MT systems.
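The N-best list re-ranking scenario described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the WCE system returns a probability of the label "good" for each word and that the sentence-level WCE feature is the mean of those probabilities; the names `Hypothesis`, `wce_sentence_score`, `rerank` and `wce_weight` are hypothetical and are not taken from the thesis, which may synthesize and weight its scores differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    words: List[str]
    decoder_score: float      # decoder model score (higher is better)
    good_probs: List[float]   # assumed WCE output: P("good") for each word


def wce_sentence_score(hyp: Hypothesis) -> float:
    """Mean word-level confidence: one simple way to synthesize a sentence feature."""
    if not hyp.good_probs:
        return 0.0
    return sum(hyp.good_probs) / len(hyp.good_probs)


def rerank(nbest: List[Hypothesis], wce_weight: float = 1.0) -> List[Hypothesis]:
    """Re-order the N-best list by decoder score plus a weighted WCE feature."""
    return sorted(
        nbest,
        key=lambda h: h.decoder_score + wce_weight * wce_sentence_score(h),
        reverse=True,
    )


if __name__ == "__main__":
    nbest = [
        Hypothesis(["a", "b"], decoder_score=-1.0, good_probs=[0.2, 0.3]),
        Hypothesis(["c", "d"], decoder_score=-1.2, good_probs=[0.9, 0.9]),
    ]
    best = rerank(nbest)[0]
    print(" ".join(best.words))
```

In practice the weight on the WCE feature would be tuned together with the other decoder feature weights rather than fixed by hand.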
|
224 |
Sistemas de memórias de tradução e tecnologias de tradução automática: possíveis efeitos na produção de tradutores em formação / Translation memory systems and machine translation: possible effects on the production of translation trainees
Talhaferro, Lara Cristina Santos, 26 February 2018
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) / Globalization has promoted a growing flow of multilingual information worldwide, causing significant changes in the translation market. In this scenario, to stay competitive and meet a demand marked by frequent content updates and short deadlines, translators have adopted computer-assisted translation tools (CAT tools) in their work routine. Translation memory systems and machine translation are two of these tools, used especially for technical, scientific and commercial texts. Their use may have unpredictable influences on translated texts. Nonetheless, translators seldom have the opportunity to ponder how their production may be affected by these tools, especially if they are new to the profession or lack experience with the tools used.
Seeking to examine how the work of translators in training may be influenced by the translation memory systems and machine translation technologies they employ, this work investigates how a translation memory system, Wordfast Anywhere, and one of its machine translation tools, Google Cloud Translate API, may affect the choices of Translation undergraduates. To this end, we analyse English/Portuguese translations of four abstracts assigned to ten fourth-year students of the undergraduate Program in Languages with Major in Translation at São Paulo State University (Unesp), São José do Rio Preto, divided into three groups: one that used Wordfast Anywhere, one that used this tool to post-edit the translation produced by Google Cloud Translate API, and one that used neither tool. The study consists of a numerical comparison of the translations, assisted by the software Turnitin, and a contrastive analysis of the students' production that considers the time spent on the translation, use of specific terminology, textual cohesion and coherence, use of standard Portuguese, and suitability of the translations for their purpose. In addition, a group of four experts in the fields covered by the abstracts assessed the translations from the point of view of users of the translated material. Finally, the students answered a questionnaire on their habits and their perceptions of CAT tools. The analysis suggests that automation did not significantly influence the production of the translations, confirming our hypothesis that the human translator plays the central role in terminological choices and in the suitability of the translated text for its purpose. / 2016/07907-0
|
225 |
Tradução automática com adequação sintático-semântica para LIBRAS
Lima, Manuella Aschoff Cavalcanti Brandão, 26 August 2015
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Deaf people communicate naturally using visual-spatial languages, called sign languages (SL). Although sign languages are recognized as official languages in many countries, the problems deaf people face in accessing information remain. As a result, they have difficulty exercising their citizenship and accessing information in SL, which usually leads to delays in language and knowledge acquisition. To minimize this problem, some work has been done on machine translation from spoken languages to sign languages. However, existing solutions have many limitations, since they must ensure that content reaches deaf people with the same quality as it reaches hearing people. This work therefore aims to develop a machine translation solution for Brazilian Sign Language (LIBRAS) that addresses syntactic-semantic issues. The solution includes a LIBRAS machine translation component; a rule description language, modeled to describe morphosyntactic-semantic machine translation rules; the definition of a grammar exploring these aspects; and the integration of these elements with VLibras, a machine translation service from digital content in Brazilian Portuguese to LIBRAS. To evaluate the solution, computational tests were performed using the WER and BLEU metrics, along with tests with Brazilian deaf users and LIBRAS specialists. The results show that the proposed approach improved on the results of the current version of VLibras.
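For readers unfamiliar with the WER metric mentioned in this abstract, the following is a minimal sketch of word error rate in its usual definition: the token-level edit distance (substitutions, insertions and deletions) between a system output and a reference, divided by the reference length. The whitespace tokenization and the function name `wer` are assumptions made for illustration; the thesis may tokenize and normalize its sign-language output differently.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: token-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over four reference tokens.
    print(wer("a b c d", "a x c"))
```

Lower values are better, and because insertions count as errors the score can exceed 1.0 when the hypothesis is much longer than the reference.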
|
226 |
Swedish-English Verb Frame Divergences in a Bilingual Head-driven Phrase Structure Grammar for Machine Translation / Skillnader i verbramar mellan svenska och engelska i en tvåspråkig HPSG-grammatik för maskinöversättning
Stymne, Sara, January 2006
In this thesis I investigate verb frame divergences in a bilingual Head-driven Phrase Structure Grammar for machine translation. The purpose was threefold: (1) to describe and classify verb frame divergences (VFDs) between Swedish and English, (2) to implement a bilingual grammar that covers many of the identified VFDs, and (3) to find out which cases of VFDs can be solved and implemented using a common semantic representation, or interlingua, for Swedish and English. The implemented grammar, BiTSE, is a Head-driven Phrase Structure Grammar based on the LinGO Grammar Matrix, a language-independent grammar base. BiTSE is a bilingual grammar containing both Swedish and English. The semantic representation used is Minimal Recursion Semantics (MRS). It is language independent, so generating from it gives all equivalent sentences in both Swedish and English. Both the core of the languages and a subset of the identified VFDs are successfully implemented in BiTSE. For other VFDs, tentative solutions are discussed. MRS has previously been proposed as suitable for semantic transfer machine translation. I show that in many cases VFDs can be handled naturally by an interlingual design, minimizing the need for transfer. The main contributions of this thesis are an inventory of English and Swedish verb frames and verb frame divergences, the bilingual grammar BiTSE, and a demonstration that in many cases MRS can be used as an interlingua in machine translation.
|
227 |
Étude syntaxique des Wh-questions en vue de leur traduction automatique de l’anglais vers l’arabe / Syntactic study of the "Wh-Questions" for machine translation from English into Arabic
Lasfer-Kedad, Sandra, 17 March 2014
Firstly, this research outlines a syntactic study of wh-questions and analyses aspects of wh-question formation in two typologically different languages, Arabic and English, within the framework of Generative Grammar and the Minimalist Approach. It is shown and argued that in both languages the wh-phrase in sentence-initial position is moved to [Spec, CP] and that wh-movement applies overtly.
Secondly, the thesis discusses and analyses the translation of English wh-questions into Arabic by three machine translation systems that use different translation methods, assessed with three evaluation methods. We describe a set of important problems related to linguistic differences between the two languages. These problems strongly influence not only the quality of the output but also its acceptability. Evaluating the output allows us to provide diagnostic information about where a given system succeeds or needs improvement, relative to its intended users and use; to provide, based on the syntactic study of wh-questions, comparative information that identifies the best system in terms of translation quality and performance; to specify, through analysis of the evaluation results, the sources of the problems responsible for ill-formed translations and inadequate system performance; and finally to outline recommendations that system designers and developers may find useful for overcoming the linguistic and operational problems that might impede the translation process.
|
228 |
Strojový překlad pro vietnamštinu s pivotním jazykem / Pivoting Machine Translation for Vietnamese
Hoang, Duc Tam, January 2015
Czech and Vietnamese are the national languages of the Czech Republic and Vietnam, respectively. Their distinctive features and the shortage of resources make Czech-Vietnamese machine translation a difficult task, and as a result no effort has been put into developing a translation tool specifically for this language pair. In this thesis, we develop phrase-based statistical machine translation systems for the language pair and investigate the potential to improve translation quality with pivoting. Pivoting refers to a set of machine translation approaches in which a natural language, called the pivot language, is introduced to address data scarcity between the source and target languages, one of the most challenging problems of statistical machine translation. Selecting English as the sole pivot language for Czech-Vietnamese translation, we prepare training and testing corpora for the three language pairs. All possible corpus sources are explored for each language pair. The next step is to improve the quality of the training corpora through normalization and filtering. Various experiments are then carried out to analyse the performance of pivoting methods under realistic working conditions.
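The abstract does not say which pivoting methods were compared. As one illustration, the sketch below shows triangulation, a commonly used pivoting technique in phrase-based systems, in which a source-to-target translation table is derived by summing over pivot phrases: p(t|s) = Σ_p p(t|p) · p(p|s). The function name `triangulate` and the toy Czech, English and Vietnamese entries with their probabilities are made up for this example and do not come from the thesis.

```python
from collections import defaultdict
from typing import Dict

# A translation table maps a phrase to {translation: probability}.
Table = Dict[str, Dict[str, float]]


def triangulate(src_to_pivot: Table, pivot_to_tgt: Table) -> Table:
    """Derive p(target | source) by marginalizing over shared pivot phrases."""
    src_to_tgt = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_to_pivot.items():
        for pivot, p_pivot_given_src in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt_given_pivot in pivot_to_tgt.get(pivot, {}).items():
                src_to_tgt[src][tgt] += p_pivot_given_src * p_tgt_given_pivot
    return {src: dict(tgts) for src, tgts in src_to_tgt.items()}


if __name__ == "__main__":
    # Toy tables with invented probabilities (Czech -> English -> Vietnamese).
    cs_en = {"pes": {"dog": 0.8, "hound": 0.2}}
    en_vi = {"dog": {"chó": 1.0}, "hound": {"chó": 0.5, "chó săn": 0.5}}
    print(triangulate(cs_en, en_vi))
```

Other pivoting strategies, such as cascading two translation systems or synthesizing a source-target corpus through the pivot language, trade off differently between error propagation and training cost.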
|
229 |
Automatická korektura chyb ve výstupu strojového překladu / Automatic Error Correction of Machine Translation Output
Variš, Dušan, January 2016
We present MLFix, an automatic statistical post-editing system and a spiritual successor to the rule-based system Depfix. The aim of this thesis was to investigate possible approaches to automatically identifying the most common morphological errors produced by state-of-the-art machine translation systems and to train adequate statistical models built on the acquired knowledge. We performed both automatic and manual evaluation of the system and compared the results with Depfix. The system was developed mainly on English-to-Czech machine translation output; however, the aim was to generalize the post-editing process so that it can be applied to other language pairs. We modified the original pipeline to post-edit English-German machine translation output and performed an additional evaluation of this modification.
|
230 |
Modèles de traduction évolutifs / Evolutive translation models
Blain, Frédéric, 23 September 2013
Although research has advanced machine translation for several years, the output of an automated system generally cannot be published without prior human revision and, where needed, correction. Based on this observation, we set out to exploit the user feedback produced by the revision process to adapt our statistical system incrementally over time.
In this Cifre-Défense thesis, we therefore focus on post-editing, one of the most active fields of research at present and one that is widely used in the translation and localization industry.
Integrating user feedback is, however, not as straightforward as it may seem. On the one hand, we must be able to identify, among all the changes made by the user, the information that will be useful to the system. To address this problem, we introduced a new concept (“Post-Editing Actions”) and proposed an analysis methodology for automatically identifying this information in post-edited data. On the other hand, for the continuous integration of user feedback, we developed an incremental adaptation algorithm for a statistical machine translation system, which achieves higher performance than the standard procedure. This is all the more interesting because developing and optimizing such a translation system is very costly in computational resources, sometimes requiring several days of computation.
Conducted jointly at the company SYSTRAN and at LIUM, the research in this thesis is part of the COSMAT project of the ANR, the French national research agency. In partnership with INRIA, this project aimed to provide the scientific community with a collaborative machine translation service for scientific content. Beyond the issues specific to this type of content (domain adaptation, recognition of scientific entities, etc.), it is the collaborative aspect of the service, with the possibility for users to revise the translations, that gives our research an application framework.
|