1

Refinements in hierarchical phrase-based translation systems

Pino, Juan Miguel January 2015 (has links)
The relatively recently proposed hierarchical phrase-based translation model for statistical machine translation (SMT) has achieved state-of-the-art performance in numerous recent translation evaluations. Hierarchical phrase-based systems comprise a pipeline of modules with complex interactions. In this thesis, we propose refinements to the hierarchical phrase-based model as well as improvements and analyses in various modules of hierarchical phrase-based systems. We take advantage of the increasing amounts of training data available for machine translation, together with existing frameworks for distributed computing, to build better infrastructure for the extraction, estimation and retrieval of hierarchical phrase-based grammars. We design and implement grammar extraction as a series of Hadoop MapReduce jobs. We store the resulting grammar using the HFile format, which offers competitive trade-offs in terms of efficiency and simplicity, and we demonstrate improvements over two alternative solutions used in machine translation. The modular nature of the SMT pipeline, while allowing individual improvements, has the disadvantage that errors committed by one module are propagated to the next. This thesis alleviates this issue between the word alignment module and the grammar extraction and estimation module by considering richer statistics from word alignment models during extraction. We use alignment link and alignment phrase pair posterior probabilities for grammar extraction and estimation and demonstrate translation improvements in Chinese-to-English translation. This thesis also proposes refinements in grammar and language modelling, both in the context of domain adaptation and in the context of the interaction between first-pass decoding and lattice rescoring. We analyse alternative strategies for cross-domain adaptation of grammars and language models. We also study interactions between the first-pass and second-pass language models in terms of size and n-gram order. Finally, we analyse two smoothing methods for rescoring with large 5-gram language models. The last two chapters are devoted to the application of phrase-based grammars to the string regeneration task, which we treat as a means of studying the fluency of machine translation output. We design and implement a monolingual phrase-based decoder for string regeneration and achieve state-of-the-art performance on this task. By applying our decoder to the output of a hierarchical phrase-based translation system, we are able to recover the same level of translation quality as the translation system.
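To make the distributed extraction-and-estimation step concrete, here is a minimal in-process sketch of the MapReduce pattern the thesis builds on: a map stage emits rule counts per aligned sentence pair and a reduce stage turns summed counts into relative frequencies. The extraction itself is deliberately a toy (contiguous phrase pairs consistent with a given word alignment, not full hierarchical rules), and everything runs locally rather than on Hadoop.

```python
# Local simulation of the MapReduce pattern: mappers emit
# (rule, count) pairs per sentence pair, a reducer sums counts and
# converts them to relative frequencies. Toy extraction only:
# contiguous phrase pairs consistent with the word alignment.
from collections import defaultdict

def map_extract(src, tgt, alignment, max_len=3):
    """Emit ((source_phrase, target_phrase), 1) for consistent pairs."""
    for i in range(len(src)):
        for j in range(i, min(i + max_len, len(src))):
            links = [(s, t) for s, t in alignment if i <= s <= j]
            if not links:
                continue
            ts = [t for _, t in links]
            lo, hi = min(ts), max(ts)
            # consistency check: no alignment link may leave the box
            if all(i <= s <= j for s, t in alignment if lo <= t <= hi):
                yield (" ".join(src[i:j + 1]), " ".join(tgt[lo:hi + 1])), 1

def reduce_estimate(pairs):
    """Sum rule counts, then estimate p(target|source) by relative frequency."""
    counts = defaultdict(int)
    for rule, c in pairs:
        counts[rule] += c
    src_totals = defaultdict(int)
    for (s, _), c in counts.items():
        src_totals[s] += c
    return {rule: c / src_totals[rule[0]] for rule, c in counts.items()}

corpus = [(["le", "chat"], ["the", "cat"], [(0, 0), (1, 1)])]
emitted = [kv for s, t, a in corpus for kv in map_extract(s, t, a)]
print(reduce_estimate(emitted))
```

The appeal of the pattern is that both stages shard trivially across machines, and the sorted key-value output maps naturally onto an HFile-style store.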
2

Analýza chyb a možností zlepšení frázového strojového překladu z angličtiny do urdštiny / Error Analysis and Possible Improvements of Phrase-Based Machine Translation from English to Urdu

Ata, Naila January 2010 (has links)
No description available.
3

Analýza chyb a možností zlepšení frázového strojového překladu z angličtiny do urdštiny / Error Analysis and Possible Improvements of Phrase-Based Machine Translation from English to Urdu

Ata, Naila January 2011 (has links)
This thesis evaluates the translation quality of a phrase-based machine translation system. It presents the translation error annotation scheme used to manually annotate errors of an English-to-Urdu translation system. The primary goal of the thesis is to experiment with different heuristics in order to improve translation quality, based on a thorough manual analysis of 200 test sentences. Three heuristics are applied and their impact on translation quality is evaluated: (1) pre-processing of the English input, such as word reordering; (2) pre-processing of the training corpus to improve word alignment; and (3) using additional factors (in Moses factored translation) to better model target-side morphological coherence.
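As an illustration of heuristic (1), the sketch below reorders a POS-tagged English sentence from SVO towards Urdu-like SOV order before translation. The tag set and the single verb-object swap rule are illustrative assumptions, not the thesis's actual preprocessing rules.

```python
# Toy source-side reordering: move the first verb after the noun
# group that follows it, approximating SOV order. Assumes the input
# is already POS-tagged; the rule and tag set are illustrative.
def reorder_svo_to_sov(tagged):
    """tagged: list of (word, tag) pairs; returns reordered word list."""
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]
    try:
        v = tags.index("VERB")
    except ValueError:
        return words  # no verb: leave the sentence untouched
    # collect the object noun phrase to the right of the verb (very naive)
    obj = [i for i in range(v + 1, len(tags)) if tags[i] in ("DET", "NOUN", "ADJ")]
    if not obj:
        return words
    reordered = words[:v] + [words[i] for i in obj] + [words[v]]
    reordered += [words[i] for i in range(v + 1, len(words)) if i not in obj]
    return reordered

sent = [("she", "PRON"), ("reads", "VERB"), ("a", "DET"), ("book", "NOUN")]
print(" ".join(reorder_svo_to_sov(sent)))  # she a book reads
```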
4

Discriminative Alignment Models For Statistical Machine Translation

Tomeh, Nadi 27 June 2012 (has links) (PDF)
Bitext alignment is the task of aligning a text in a source language with its translation in the target language. Aligning amounts to finding the translational correspondences between textual units at different levels of granularity. Many practical natural language processing applications rely on bitext alignments to access the rich linguistic knowledge present in a bitext. While the most predominant application for bitexts is statistical machine translation, they are also used in multilingual (and monolingual) lexicography, word sense disambiguation, terminology extraction, computer-aided language learning and translation studies, to name a few. Bitext alignment is an arduous task because meaning is not expressed identically across languages: it varies with the linguistic properties and cultural backgrounds of the languages involved, and also depends on the translation strategy that was used to produce the bitext. Current practices in bitext alignment model the alignment as a hidden variable in the translation process. In order to reduce the complexity of the task, such approaches suppose that a word in the source sentence is aligned to at most one word in the target sentence. However, this over-simplistic assumption results in asymmetric, one-to-many alignments, whereas alignments are typically symmetric and many-to-many. To achieve symmetry, two one-to-many alignments in opposite translation directions are built and combined using a heuristic. In order to use these word alignments in phrase-based translation systems, which use phrases instead of words, a further heuristic is used to extract phrase pairs that are consistent with the word alignment. In this dissertation we address both word alignment and phrase-pair extraction, and we improve on the state of the art in several ways using discriminative learning techniques. We present a maximum entropy (MaxEnt) framework for word alignment in which links are predicted independently from one another by a MaxEnt classifier. The interaction between alignment decisions is approximated using stacking techniques, which allows us to account for part of the structural dependencies without increasing the complexity. This formulation can be seen as an alignment combination method, in which the union of several input alignments is used to guide the output alignment; additionally, the input alignments are used to compute a rich set of feature functions. Our MaxEnt aligner obtains state-of-the-art results in terms of both alignment quality, as measured by the alignment error rate, and translation quality, as measured by BLEU, on large-scale Arabic-English NIST'09 systems. We also present a translation-quality-informed procedure for both the extraction and the evaluation of phrase pairs. We reformulate the problem in a supervised framework in which we decide, for each phrase pair, whether to keep it in the translation model. This offers a principled way to combine several features and makes the procedure more robust to alignment difficulties. We use a simple and effective method, based on oracle decoding, to annotate phrase pairs that are useful for translation; using machine learning techniques based on positive examples only, these annotations can be used to learn phrase alignment decisions. With this approach we obtain improvements in BLEU scores for recall-oriented translation models, which are suitable for small training corpora.
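The link-prediction idea can be sketched with scikit-learn's logistic regression, which is a maximum entropy classifier: each candidate link (i, j) is classified independently, with features derived from several input alignments whose union guides the output. The three features and the toy data below are assumptions for illustration; the thesis uses a much richer feature set plus stacking.

```python
# Each candidate link (i, j) is classified independently by a
# logistic-regression (MaxEnt) model; features come from two input
# alignments. Data and features are illustrative, not the thesis's.
from sklearn.linear_model import LogisticRegression

def link_features(i, j, src_len, tgt_len, input_aligns):
    return [
        sum((i, j) in a for a in input_aligns),         # aligner votes
        float(all((i, j) in a for a in input_aligns)),  # in the intersection
        abs(i / src_len - j / tgt_len),                 # relative-position gap
    ]

inputs = [{(0, 0), (1, 1), (2, 2)}, {(0, 0), (1, 1), (2, 3)}]  # input alignments
gold = {(0, 0), (1, 1), (2, 2)}                                # toy gold links
cells = [(i, j) for i in range(3) for j in range(4)]

X = [link_features(i, j, 3, 4, inputs) for i, j in cells]
y = [int(c in gold) for c in cells]

clf = LogisticRegression().fit(X, y)
kept = {c for c, p in zip(cells, clf.predict(X)) if p == 1}
print(kept)  # links the classifier keeps
```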
5

On improving natural language processing through phrase-based and one-to-one syntactic algorithms

Meyer, Christopher Henry January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / William H. Hsu / Machine Translation (MT) is the practice of using computational methods to convert words from one natural language to another. Several approaches have been created since MT's inception in the 1950s and, with the vast increase in computational resources since then, have continued to evolve and improve. In this thesis I summarize several branches of MT theory and introduce several newly developed software applications, several parsing techniques to improve Japanese-to-English text translation, and a new key algorithm to correct translation errors when converting from Japanese kanji to English. The overall translation improvement is measured using the BLEU metric (an objective, numerical standard for machine translation quality analysis). The baseline translation system was built by combining Giza++, the Thot phrase-based SMT toolkit, the SRILM toolkit, and the Pharaoh decoder. The input and output parsing applications were created as intermediaries to improve the baseline MT system and to eliminate artificially high improvement metrics. This baseline was measured with and without the additional parsing provided by the thesis software applications, and also with and without the thesis kanji correction utility. The new algorithm corrected many contextual definition mistakes that are common when converting Japanese to English text. By training the new kanji correction utility on an existing dictionary, identifying source text in Japanese with a high number of possible translations, and checking the baseline translation against other translation possibilities, I was able to increase the translation performance of the baseline system from a minimum normalized BLEU score of .0273 to a maximum normalized score of .081. The preliminary phase of making improvements to Japanese-to-English translation focused on correcting segmentation mistakes that occur when attempting to parse Japanese text into meaningful tokens. The initial increase is not indicative of future potential and is artificially high because the baseline score was so low to begin with, but it was needed to establish a reasonable baseline. The final results of the tests confirmed that a significant, measurable improvement had been achieved by improving the initial segmentation of the Japanese text through parsing the input corpora and by correcting kanji translations after the Pharaoh decoding process had completed.
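Since the evaluation rests on BLEU, a minimal single-reference sketch of the metric may help: modified n-gram precisions for n = 1..4 combined by a geometric mean and scaled by a brevity penalty. Real evaluations are corpus-level and smoothed; the crude floor below exists only to keep the logarithm finite.

```python
# Minimal single-reference BLEU: modified n-gram precision for
# n = 1..4, geometric mean, brevity penalty. Simplified relative to
# the real corpus-level, smoothed metric.
import math
from collections import Counter

def bleu(hyp, ref, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum(min(c, r[g]) for g, c in h.items())  # clipped matches
        total = max(sum(h.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)      # crude floor
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))       # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(round(bleu(hyp, ref), 4))
```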
6

Stream-based statistical machine translation

Levenberg, Abby D. January 2011 (has links)
We investigate a new approach to SMT system training within the streaming model of computation. We develop and test incrementally retrainable models which, given an incoming stream of new data, can efficiently incorporate the stream data online. A naive approach using a stream would use an unbounded amount of space. Instead, our online SMT system can incorporate information from unbounded incoming streams while maintaining constant space and time. Crucially, we are able to match (or even exceed) the translation performance of comparable systems which are batch retrained and use unbounded space. Our approach is particularly suited to situations in which there are arbitrarily large amounts of new training material and we wish to incorporate it efficiently and in small space. The novel contributions of this thesis are: 1. An online, randomised language model that can model unbounded input streams in constant space and time. 2. An incrementally retrainable translation model for both phrase-based and grammar-based systems. The model presented is efficient enough to incorporate novel parallel text at the single-sentence level. 3. Strategies for updating our stream-based language model and translation model which demonstrate how such components can be successfully used in a streaming translation setting. This operates both within a single streaming environment and also in the novel situation of having to translate multiple streams. 4. Demonstration that recent data from the stream is beneficial to translation performance. Our stream-based SMT system is efficient for tackling massive volumes of new training data and offers up new ways of thinking about translating web data and dealing with other natural language streams.
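One standard way to keep stream counts in constant space is a count-min sketch, shown below for n-gram counts. It is related in spirit to, but not the same as, the randomised language model developed in the thesis; the width, depth and hashing scheme are illustrative choices.

```python
# Constant-space online counting with a count-min sketch: counts from
# an unbounded stream fit in fixed memory, at the cost of possible
# (bounded) overestimation. Illustrative, not the thesis's exact LM.
import hashlib

class CountMinSketch:
    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _hashes(self, key):
        for d in range(self.depth):  # one salted hash per row
            h = hashlib.blake2b(key.encode(), salt=bytes([d]) * 8).digest()
            yield d, int.from_bytes(h[:8], "big") % self.width

    def add(self, key, count=1):
        for d, idx in self._hashes(key):
            self.table[d][idx] += count

    def query(self, key):  # never underestimates the true count
        return min(self.table[d][idx] for d, idx in self._hashes(key))

cms = CountMinSketch()
for ngram in ["the cat", "the cat", "a dog"]:  # incoming bigram stream
    cms.add(ngram)
print(cms.query("the cat"), cms.query("a dog"), cms.query("unseen"))
```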
7

Discriminative Alignment Models For Statistical Machine Translation / Modèles Discriminants d'Alignement Pour La Traduction Automatique Statistique

Tomeh, Nadi 27 June 2012 (has links)
Bitext alignment is the task of aligning a text in a source language with its translation in the target language. Aligning amounts to finding the translational correspondences between textual units at different levels of granularity. Many practical natural language processing applications rely on bitext alignments to access the rich linguistic knowledge present in a bitext. While the most predominant application for bitexts is statistical machine translation, they are also used in multilingual (and monolingual) lexicography, word sense disambiguation, terminology extraction, computer-aided language learning and translation studies, to name a few. Bitext alignment is an arduous task because meaning is not expressed identically across languages: it varies with the linguistic properties and cultural backgrounds of the languages involved, and also depends on the translation strategy that was used to produce the bitext. Current practices in bitext alignment model the alignment as a hidden variable in the translation process. In order to reduce the complexity of the task, such approaches suppose that a word in the source sentence is aligned to at most one word in the target sentence. However, this over-simplistic assumption results in asymmetric, one-to-many alignments, whereas alignments are typically symmetric and many-to-many. To achieve symmetry, two one-to-many alignments in opposite translation directions are built and combined using a heuristic. In order to use these word alignments in phrase-based translation systems, which use phrases instead of words, a further heuristic is used to extract phrase pairs that are consistent with the word alignment. In this dissertation we address both word alignment and phrase-pair extraction, and we improve on the state of the art in several ways using discriminative learning techniques. We present a maximum entropy (MaxEnt) framework for word alignment in which links are predicted independently from one another by a MaxEnt classifier. The interaction between alignment decisions is approximated using stacking techniques, which allows us to account for part of the structural dependencies without increasing the complexity. This formulation can be seen as an alignment combination method, in which the union of several input alignments is used to guide the output alignment; additionally, the input alignments are used to compute a rich set of feature functions. Our MaxEnt aligner obtains state-of-the-art results in terms of both alignment quality, as measured by the alignment error rate, and translation quality, as measured by BLEU, on large-scale Arabic-English NIST'09 systems. We also present a translation-quality-informed procedure for both the extraction and the evaluation of phrase pairs. We reformulate the problem in a supervised framework in which we decide, for each phrase pair, whether to keep it in the translation model. This offers a principled way to combine several features and makes the procedure more robust to alignment difficulties. We use a simple and effective method, based on oracle decoding, to annotate phrase pairs that are useful for translation; using machine learning techniques based on positive examples only (a one-class SVM), these annotations can be used to learn phrase alignment decisions. With this approach we obtain significant improvements in BLEU scores for recall-oriented translation models, which are suitable for small training corpora.
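The positive-examples-only selection step can be sketched with a one-class SVM, the learning method this record names: phrase pairs annotated as useful by oracle decoding form the only labelled data, and the fitted model then filters new candidates. The three features and the toy numbers are assumptions for illustration.

```python
# Phrase pairs marked useful by oracle decoding are the only labelled
# examples; a one-class SVM learns their region in feature space and
# filters new candidate pairs. Features here are an illustrative
# choice: [src phrase length, tgt phrase length, log corpus count].
from sklearn.svm import OneClassSVM

useful_pairs = [[1, 1, 3.2], [2, 2, 2.5], [2, 3, 2.1], [1, 2, 2.8]]
candidates = [[1, 1, 3.0], [7, 1, 0.1]]  # a plausible pair and an outlier

model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(useful_pairs)
print(model.predict(candidates))  # +1 = keep the pair, -1 = discard
```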
8

Machine Translation Of Fictional And Non-fictional Texts: An examination of Google Translate's accuracy on translation of fictional versus non-fictional texts.

Salimi, Jonni January 2014 (has links)
This study tries to identify areas where machine translation can be useful by examining translated fictional and non-fictional texts, and the extent to which these different text types are better or worse suited for machine translation. It additionally evaluates the performance of the free online translation tool Google Translate (GT). The BLEU automatic evaluation metric for machine translation was used for this study, giving a score of 27.75 for the fictional texts and 32.16 for the non-fictional texts. The non-fictional texts are samples of law documents, (commercial) company reports, social science texts (religion, welfare, astronomy) and medicine. These texts were selected because of their degree of difficulty. The non-fictional sentences are longer than those of the fictional texts, and MT systems have historically struggled with long sentences. In spite of their longer sentences, the non-fictional texts obtained a higher BLEU score than the fictional ones. It is speculated that one reason for the higher score of the non-fictional texts might be their more specific terminology, which leaves less room for subjective interpretation than in the fictional texts. There are other levels of meaning at work in the fictional texts that the human translator needs to capture.
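For reference, a corpus-level BLEU score of the kind reported here (27.75 vs 32.16) can be computed with the sacrebleu library. The sketch below assumes sacrebleu is installed (pip install sacrebleu) and uses invented placeholder sentences, not the study's data.

```python
# Computing corpus-level BLEU with sacrebleu (assumed installed).
# Sentences are invented placeholders for illustration.
import sacrebleu

hypotheses = ["the contract is valid for two years"]     # MT output
references = [["the agreement is valid for two years"]]  # one reference stream

result = sacrebleu.corpus_bleu(hypotheses, references)
print(round(result.score, 2))  # corpus BLEU on a 0-100 scale
```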
9

Traduction statistique par recherche locale / Statistical Machine Translation by Local Search

Monty, Pierre Paul 08 1900 (has links)
Statistical machine translation is a concerted effort towards the automation of the translation process. In the work presented here, we explore one of the major challenges of statistical machine translation: the search step (Brown et al., 1993). State-of-the-art systems such as Moses (Koehn et al., 2007) generally search by exploring the prefix space with dynamic programming, a computationally costly solution to this potentially NP-complete problem (Knight, 1999). We propose that a local search approach (Langlais et al., 2007) can yield solutions which are qualitatively just as interesting, while keeping memory use and execution time at much lower levels (Russell and Norvig, 2010). Furthermore, this type of search facilitates the use of global models, for which a complete translation is needed, and allows for non-continuous modifications, two tasks made difficult by exploring the prefix search space. The experiments we have conducted reveal that the use of local search during the search step in statistical machine translation is a viable, state-of-the-art approach.
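A toy version of the local-search idea: start from a complete translation, repeatedly try small edits (swapping two words, substituting an alternative from a phrase table), and keep a neighbour only when the model score improves. The scoring function, the phrase alternatives and the bigram "language model" below are stand-ins, not the thesis's models.

```python
# Greedy hill climbing over complete translations: enumerate a small
# neighbourhood (swaps, phrase-table substitutions) and move whenever
# a neighbour scores higher. Score, phrase table and bigram set are
# illustrative stand-ins for real translation/language models.
import itertools

phrase_alternatives = {"house": ["home"], "big": ["large", "great"]}

def neighbours(trans):
    for i, j in itertools.combinations(range(len(trans)), 2):  # swaps
        t = trans[:]
        t[i], t[j] = t[j], t[i]
        yield t
    for i, w in enumerate(trans):                              # substitutions
        for alt in phrase_alternatives.get(w, []):
            yield trans[:i] + [alt] + trans[i + 1:]

def score(trans, lm_bigrams):
    """Stand-in model score: number of 'fluent' bigrams."""
    return sum((a, b) in lm_bigrams for a, b in zip(trans, trans[1:]))

def hill_climb(start, lm_bigrams):
    current, improved = start, True
    while improved:
        improved = False
        for cand in neighbours(current):
            if score(cand, lm_bigrams) > score(current, lm_bigrams):
                current, improved = cand, True
                break  # greedy: restart from the better neighbour
    return current

lm = {("the", "large"), ("large", "home")}
print(hill_climb(["the", "big", "house"], lm))  # ['the', 'large', 'home']
```

Unlike prefix-space decoding, every intermediate state here is a full translation, which is what makes global models and non-contiguous edits easy to plug in.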
