• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 3
  • 3
  • 3
  • 3
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Improving statistical machine translation with linguistic information

Hoang, Hieu January 2011 (has links)
Statistical machine translation (SMT) should benefit from linguistic information to improve performance but current state-of-the-art models rely purely on data-driven models. There are several reasons why prior efforts to build linguistically annotated models have failed or not even been attempted. Firstly, the practical implementation often requires too much work to be cost effective. Where ad-hoc implementations have been created, they impose too strict constraints to be of general use. Lastly, many linguistically-motivated approaches are language dependent, tackling peculiarities in certain languages that do not apply to other languages. This thesis successfully integrates linguistic information about part-of-speech tags, lemmas and phrase structure to improve MT quality. The major contributions of this thesis are: 1. We enhance the phrase-based model to incorporate linguistic information as additional factors in the word representation. The factored phrase-based model allows us to make use of different types of linguistic information in a systematic way within the predefined framework. We show how this model improves translation by as much as 0.9 BLEU for small German-English training corpora, and 0.2 BLEU for larger corpora. 2. We extend the factored model to the factored template model to focus on improving reordering. We show that by generalising translation with part-of-speech tags, we can improve performance by as much as 1.1 BLEU on a small French- English system. 3. Finally, we switch from the phrase-based model to a syntax-based model with the mixed syntax model. This allows us to transition from the word-level approaches using factors to multiword linguistic information such as syntactic labels and shallow tags. The mixed syntax model uses source language syntactic information to inform translation. We show that the model is able to explain translation better, leading to a 0.8 BLEU improvement over the baseline hierarchical phrase-based model for a small German-English task. Also, the model requires only labels on continuous source spans, it is not dependent on a tree structure, therefore, other types of syntactic information can be integrated into the model. We experimented with a shallow parser and see a gain of 0.5 BLEU for the same dataset. Training with more training data, we improve translation by 0.6 BLEU (1.3 BLEU out-of-domain) over the hierarchical baseline. During the development of these three models, we discover that attempting to rigidly model translation as linguistic transfer process results in degraded performance. However, by combining the advantages of standard SMT models with linguistically-motivated models, we are able to achieve better translation performance. Our work shows the importance of balancing the specificity of linguistic information with the robustness of simpler models.
2

The mat sat on the cat : investigating structure in the evaluation of order in machine translation

McCaffery, Martin January 2017 (has links)
We present a multifaceted investigation into the relevance of word order in machine translation. We introduce two tools, DTED and DERP, each using dependency structure to detect differences between the structures of machine-produced translations and human-produced references. DTED applies the principle of Tree Edit Distance to calculate edit operations required to convert one structure into another. Four variants of DTED have been produced, differing in the importance they place on words which match between the two sentences. DERP represents a more detailed procedure, making use of the dependency relations between words when evaluating the disparities between paths connecting matching nodes. In order to empirically evaluate DTED and DERP, and as a standalone contribution, we have produced WOJ-DB, a database of human judgments. Containing scores relating to translation adequacy and more specifically to word order quality, this is intended to support investigations into a wide range of translation phenomena. We report an internal evaluation of the information in WOJ-DB, then use it to evaluate variants of DTED and DERP, both to determine their relative merit and their strength relative to third-party baselines. We present our conclusions about the importance of structure to the tools and their relevance to word order specifically, then propose further related avenues of research suggested or enabled by our work.
3

Generierung von natürlichsprachlichen Texten aus semantischen Strukturen im Prozeß der maschinellen Übersetzung - Allgemeine Strukturen und Abbildungen

Rosenpflanzer, Lutz, Karl, Hans-Ulrich 14 December 2012 (has links)
0 VORWORT Bei der maschinellen Übersetzung natürlicher Sprache dominieren mehrere Probleme. Man hat es immer mit sehr großen Datenmengen zu tun. Auch wenn man nur einen kleinen Text übersetzen will, ist diese Aufgabe in umfänglichen Kontext eingebettet, d.h. alles Wissen über Quell- und Zielsprache muß - in möglichst formalisierter Form - zur Verfügung stehen. Handelt es sich um gesprochenes Wort treten Spracherkennungs- und Sprachausgabeaufgaben sowie harte Echtzeitforderungen hinzu. Die Komplexität des Problems ist - auch unter Benutzung moderner Softwareentwicklungskonzepte - für jeden, der eine Implementation versucht, eine nicht zu unterschätzende Herausforderung. Ansätze, die die Arbeitsprinzipien und Methoden der Informatik konsequent nutzen, stellen ihre Ergebnisse meist nur prototyisch für einen sehr kleinen Teil der Sprache -etwa eine Phrase, einen Satz bzw. mehrere Beispielsätze- heraus und folgern mehr oder weniger induktiv, daß die entwickelte Lösung auch auf die ganze Sprache erfolgreich angewendet werden kann, wenn man nur genügend „Lemminge“ hat, die nach allen Seiten ausschwärmend, die „noch notwendigen Routinearbeiten“ schnell und bienenfleißig ausführen könnten.:0 Vorwort S. 2 1 Allgemeiner Ablauf der Generierung S. 3 1.1 AUFGABE DER GENERIERUNG S. 3 1.2 EINORDNUNG DER GENERIERUNG IN DIE MASCHINELLE ÜBERSETZUNG S.4 1.3 REALISIERUNG S. 4 1.4 MORPHOLOGISCHE GENERIERUNG S.6 2 Strukturen und Abbildungen S. 8 2.1 UNIVERSELLE STRUKTUR: DEFINITION VON GRAPHEN S.8 2.2 FORMALISIERUNG SPEZIELLER SEMANTISCHER STRUKTUREN ALS GRAPHEN S.9 2.3 ABBILDUNG VON STRUKTUREN S.11 2.3.1 Strukturtyperhaltende Funktionen S. 12 2.3.2 Strukturtypverändernde Funktionen S. 19 2.3.3 Komplexe Funktionen S. 20 2.3.4 Abbildung eines gesamten Generierungsprozesses S. 21 4 Beispiel: Generierung von Texten aus prädikatenlogischen Ausdrücken (inkrementeller Algorithmus) S. 23 4.1 ABLAUF S.23 4.2 BEISPIELE VON REGELSTRUKTUREN S.27 5 Zusammenfassung S. 28 6 Quellenverzeichnis S. 30

Page generated in 0.2283 seconds