1 |
Automatic summarising of English texts
Tait, J. I. January 1982
No description available.
|
2 |
Graph grammars: an approach to transfer-based M.T. exemplified by a Turkish-English system
Carroll, Jeremy J. January 1989
No description available.
|
3 |
Combining Outputs from On-Line Translation Systems on Mobile Devices
Chen, Yi-Chang 08 September 2009
In this research, we propose two frameworks for combining the outputs of multiple on-line machine translation systems. We train the language model and the translation model on the IWSLT07 training data. The first framework consists of several modules, including selection, substitution, insertion, and deletion. In the second framework, after selection, we use a maximum entropy classifier to classify each word in the selected hypothesis according to Damerau-Levenshtein distance. According to these classification results, each word in the selected hypothesis is given a different kind of post-processing. We evaluate both combination frameworks on the IWSLT07 task, which contains tourism-related sentences. The translation direction in our test set is Chinese to English. Three on-line machine translation systems, Google, Yahoo, and TransWhiz, are used in the investigation. The experimental results show that the first combination framework improves the BLEU score from 19.15% to 20.55%, and the second improves it from 19.15% to 20.47%. These frameworks achieve absolute improvements of 1.4% and 1.32% in BLEU score, respectively.
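The abstract above does not say which variant of Damerau-Levenshtein distance was used or how it fed the classifier. As a minimal sketch only, assuming the restricted "optimal string alignment" variant applied to word sequences, the distance can be computed as follows. The function name `osa_distance` and the example sentences are illustrative assumptions, not taken from the thesis.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and transpositions of two
    adjacent items. Works on strings or on lists of word tokens.
    This is an illustrative sketch, not the thesis implementation.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i items of a
    for j in range(cols):
        d[0][j] = j  # insert all j items of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent items.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


# Hypothetical hypotheses from two systems, compared at the word level.
hyp_a = "where is the station".split()
hyp_b = "where the is station".split()
distance = osa_distance(hyp_a, hyp_b)  # one adjacent swap
```

Working on token lists rather than characters matches the word-level processing the abstract describes; the unrestricted Damerau-Levenshtein distance can give smaller values than this variant when a transposed pair is edited again.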
|
4 |
Improving statistical machine translation with linguistic information
Hoang, Hieu January 2011
Statistical machine translation (SMT) should benefit from linguistic information to improve performance, but current state-of-the-art systems rely purely on data-driven models. There are several reasons why prior efforts to build linguistically annotated models have failed or not even been attempted. Firstly, the practical implementation often requires too much work to be cost effective. Secondly, where ad-hoc implementations have been created, they impose constraints that are too strict to be of general use. Lastly, many linguistically-motivated approaches are language dependent, tackling peculiarities in certain languages that do not apply to other languages. This thesis successfully integrates linguistic information about part-of-speech tags, lemmas and phrase structure to improve MT quality. The major contributions of this thesis are:
1. We enhance the phrase-based model to incorporate linguistic information as additional factors in the word representation. The factored phrase-based model allows us to make use of different types of linguistic information in a systematic way within the predefined framework. We show how this model improves translation by as much as 0.9 BLEU for small German-English training corpora, and 0.2 BLEU for larger corpora.
2. We extend the factored model to the factored template model to focus on improving reordering. We show that by generalising translation with part-of-speech tags, we can improve performance by as much as 1.1 BLEU on a small French-English system.
3. Finally, we switch from the phrase-based model to a syntax-based model with the mixed syntax model. This allows us to transition from the word-level approaches using factors to multiword linguistic information such as syntactic labels and shallow tags. The mixed syntax model uses source language syntactic information to inform translation. We show that the model is able to explain translation better, leading to a 0.8 BLEU improvement over the baseline hierarchical phrase-based model for a small German-English task. The model requires only labels on continuous source spans and does not depend on a tree structure, so other types of syntactic information can be integrated into it. We experimented with a shallow parser and saw a gain of 0.5 BLEU for the same dataset. Training with more training data, we improve translation by 0.6 BLEU (1.3 BLEU out-of-domain) over the hierarchical baseline.
During the development of these three models, we discovered that attempting to rigidly model translation as a linguistic transfer process results in degraded performance. However, by combining the advantages of standard SMT models with linguistically-motivated models, we are able to achieve better translation performance. Our work shows the importance of balancing the specificity of linguistic information with the robustness of simpler models.
|
5 |
Formulaic expressions in computer-assisted translation: a specialised translation approach
Fernández Parra, Maria Asunción January 2011
No description available.
|
6 |
A Comparative Analysis of Web-based Machine Translation Quality: English to French and French to English
Barnhart, Zachary 12 1900
This study offers a partial replication of a 2006 study by Williams, which focused primarily on analysing the quality of translations produced by online software, namely Yahoo!® Babelfish, Freetranslation.com, and Google Translate. Since the data for the study by Williams were collected in 2004 and the data for the present study in 2012, there is a gap of eight years over which to conduct a diachronic analysis of the differences in quality of the translations provided by these online services. At the time of the 2006 study by Williams, all three services used a rule-based translation system; in October 2007, however, Google Translate switched to a system that is entirely statistical in nature. Thus, the present study is also able to examine the differences in quality between contemporary statistical and rule-based approaches to machine translation.
|
7 |
Stone Soup Translation: The Linked Automata Model
Davis, Paul C. 02 July 2002
No description available.
|
8 |
Comparing Encoder-Decoder Architectures for Neural Machine Translation: A Challenge Set Approach
Doan, Coraline 19 November 2021
Machine translation (MT) as a field of research has seen significant advances in recent years, with increased interest in neural machine translation (NMT). By combining deep learning with translation, researchers have been able to deliver systems that perform better than most, if not all, of their predecessors. While the general consensus regarding NMT is that it renders higher-quality translations that are overall more idiomatic, researchers recognize that NMT systems still struggle to deal with certain classic difficulties, and that their performance may vary depending on their architecture. In this project, we implement a challenge-set based approach to the evaluation of examples of three main NMT architectures: convolutional neural network-based (CNN) systems, recurrent neural network-based (RNN) systems, and attention-based systems, all trained on the same data set for English to French translation. The challenge set focuses on a selection of lexical and syntactic difficulties (e.g., ambiguities) drawn from literature on human translation, machine translation, and writing for translation, and also includes variations in sentence lengths and structures that are recognized as sources of difficulty even for NMT systems. This set allows us to evaluate performance in multiple areas of difficulty for the systems overall, as well as to evaluate any differences between the architectures' performance. Through our challenge set, we found that our CNN-based system tends to reword sentences, sometimes shifting their meaning, while our RNN-based system seems to perform better when provided with a larger context, and our attention-based system seems to struggle more as sentences become longer.
|
9 |
Unification-based constraints for statistical machine translation
Williams, Philip James January 2014
Morphology and syntax have both received attention in statistical machine translation research, but they are usually treated independently and the historical emphasis on translation into English has meant that many morphosyntactic issues remain under-researched. Languages with richer morphologies pose additional problems and conventional approaches tend to perform poorly when either source or target language has rich morphology. In both computational and theoretical linguistics, feature structures together with the associated operation of unification have proven a powerful tool for modelling many morphosyntactic aspects of natural language. In this thesis, we propose a framework that extends a state-of-the-art syntax-based model with a feature structure lexicon and unification-based constraints on the target-side of the synchronous grammar. Whilst our framework is language-independent, we focus on problems in the translation of English to German, a language pair that has a high degree of syntactic reordering and rich target-side morphology. We first apply our approach to modelling agreement and case government phenomena. We use the lexicon to link surface form words with grammatical feature values, such as case, gender, and number, and we use constraints to enforce feature value identity for the words in agreement and government relations. We demonstrate improvements in translation quality of up to 0.5 BLEU over a strong baseline model. We then examine verbal complex production, another aspect of translation that requires the coordination of linguistic features over multiple words, often with long-range discontinuities. We develop a feature structure representation of verbal complex types, using constraint failure as an indicator of translation error and use this to automatically identify and quantify errors that occur in our baseline system. 
A manual analysis and classification of errors informs an extended version of the model that incorporates information derived from a parse of the source. We identify clause spans and use model features to encourage the generation of complete verbal complex types. We are able to improve accuracy as measured using precision and recall against values extracted from the reference test sets. Our framework allows for the incorporation of rich linguistic information and we present sketches of further applications that could be explored in future work.
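The agreement constraints described above rest on unifying feature structures. As a rough sketch only, assuming feature structures are plain nested dictionaries without shared (reentrant) values, unification and an agreement check might look like the following. The `unify` function and the lexicon entries are hypothetical illustrations, not the thesis's lexicon or constraint formalism.

```python
def unify(fs1, fs2):
    """Unify two feature structures given as nested dicts.

    Returns the merged structure, or None if any feature has conflicting
    values. Simplified sketch: no reentrancy or variables are supported.
    """
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature not in result:
            result[feature] = value
        elif isinstance(result[feature], dict) and isinstance(value, dict):
            merged = unify(result[feature], value)
            if merged is None:
                return None  # conflict inside a nested structure
            result[feature] = merged
        elif result[feature] != value:
            return None      # atomic values disagree
    return result


# Hypothetical German lexicon entries (one reading per word, for illustration).
der = {"agr": {"case": "nom", "gender": "masc", "number": "sg"}}
die = {"agr": {"case": "nom", "gender": "fem", "number": "sg"}}
hund = {"agr": {"gender": "masc", "number": "sg"}}

agree_ok = unify(der, hund)   # features are compatible: merged structure
agree_bad = unify(die, hund)  # gender clash: None signals constraint failure
```

A `None` result plays the role of a constraint failure here, which is the signal the abstract describes using to detect errors; a real lexicon would list several readings per surface form and try each.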
|
10 |
Introducing corpus-based rules and algorithms in a rule-based machine translation system
Dugast, Loic January 2013
Machine translation poses the challenge of automatically translating a text from one natural language into another. Statistical methods, originating from the field of information theory, have proven to be a major breakthrough in the field of machine translation. Prior to this paradigm, many systems had been developed following a rule-based approach. This denotes a system based on a linguistic description of the languages involved and of how translation occurs in the mind of the (human) translator. Statistical models, on the contrary, use empirical means and may work with very few linguistic hypotheses about language and about translation as performed by humans. This had implications for rule-based translation systems, in terms of software architecture and the nature of the rules, which were manually input and lacked any statistical feature. In view of such diverging paradigms, we can imagine trying to combine both in a hybrid system. In the present work, we start by examining the state of the art of both rule-based and statistical systems. We restrict the rule-based approach to transfer-based systems. We compare rule-based and statistical paradigms in terms of global translation quality and give a qualitative analysis of their respective specific errors. We also introduce initial black-box hybrid models that confirm there is an expected gain in combining the two approaches. Motivated by the qualitative analysis, we focus our study and experiments on lexical phrasal rules. We propose a setup that allows such resources to be extracted from corpora. Going one step further in the integration of rule-based and statistical approaches, we then examine how to combine the extracted rules with decoding modules that allow for a corpus-based handling of ambiguity. This leads to the final deliverable of this work: a rule-based system for which we can learn non-deterministic rules from corpora, and whose decoder can be optimised on a tuning set in the same domain.
|