1 |
Automatic post-editing of phrase-based machine translation outputs. Rosa, Rudolf. January 2013.
We present Depfix, a system for automatic post-editing of phrase-based English-to-Czech machine translation outputs, based on linguistic knowledge. First, we analyzed the types of errors that a typical machine translation system makes. Then, we created a set of rules and a statistical component that correct errors which are common or serious and which our approach has the potential to correct. We use a range of natural language processing tools to provide analyses of the input sentences. Moreover, we reimplemented the dependency parser and adapted it in several ways for parsing statistical machine translation outputs. We performed both automatic and manual evaluations, which confirmed that our system improves the quality of the translations.
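As a rough illustration of the kind of rule-based correction a system like Depfix performs (this sketch is not Depfix's actual implementation; the token representation, the toy inflection table, and the English example are all invented for illustration), one rule might project the grammatical number of a subject noun onto its governing verb using a dependency parse:

```python
# Minimal sketch of one agreement-fixing rule over a dependency-parsed
# sentence. All data structures here are illustrative, not Depfix's own.
from dataclasses import dataclass

@dataclass
class Token:
    form: str     # surface form
    lemma: str
    pos: str      # e.g. "NOUN", "VERB"
    number: str   # "Sing" or "Plur"
    head: int     # index of the governing token, -1 for root
    deprel: str   # dependency relation, e.g. "nsubj"

# toy inflection table standing in for a real morphological generator
VERB_FORMS = {("be", "Sing"): "is", ("be", "Plur"): "are"}

def fix_agreement(tokens):
    """Copy the subject's number onto its governing verb when they disagree."""
    for tok in tokens:
        if tok.deprel == "nsubj" and tok.pos == "NOUN":
            verb = tokens[tok.head]
            if verb.pos == "VERB" and verb.number != tok.number:
                new_form = VERB_FORMS.get((verb.lemma, tok.number))
                if new_form:  # only correct when we can regenerate the form
                    verb.form, verb.number = new_form, tok.number
    return tokens

sent = [Token("results", "result", "NOUN", "Plur", 1, "nsubj"),
        Token("is", "be", "VERB", "Sing", -1, "root"),
        Token("good", "good", "ADJ", "Sing", 1, "xcomp")]
fixed = fix_agreement(sent)
print(" ".join(t.form for t in fixed))  # results are good
```

A real system would obtain the parse from an NLP pipeline and regenerate word forms with a morphological generator rather than a lookup table.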
|
2 |
Automatic Post-editing and Quality Estimation in Machine Translation of Product Descriptions. Kukk, Kätriin. January 2022.
As a result of drastically improved machine translation quality in recent years, machine translation followed by manual post-editing is currently a trend in the language industry that is slowly but surely replacing manual translation from scratch. In this thesis, the applicability of machine translation to product descriptions of clothing items is studied. The focus lies on determining whether automatic post-editing is a viable approach for improving baseline translations when new training data becomes available, and on finding out whether an existing quality estimation system could reliably assign quality scores to machine-translated texts. Machine translation is shown to be a promising approach for the target domain: according to the human evaluation carried out, the majority of the systems experimented with generate translations that are, on average, of almost publishable quality, meaning that only light post-editing is needed before the translations can be published. Automatic post-editing is shown to improve the worst baseline translations but struggles to improve overall translation quality because of its tendency to overcorrect good translations. Nevertheless, one of the trained post-editing systems is still rated higher than the baseline by human evaluators. A new finding is that training a post-editing model on more data with worse translations leads to better performance than training on less but higher-quality data. None of the quality estimation systems experimented with shows a strong correlation with human evaluation results, which is why it is suggested not to provide the confidence scores of the baseline model to the human evaluators responsible for correcting and approving translations.
The main contributions of this work are showing that the target domain of product descriptions is suitable for integrating machine translation into the translation workflow, proposing a translation workflow that is more automated than the current one, and the finding that, when training an automatic post-editing system, it is better to use more data with poorer translations than less data with higher-quality translations.
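The correlation check described above, between automatic quality-estimation scores and human ratings, can be sketched with the standard Pearson coefficient. The scores and ratings below are invented for illustration; the thesis's actual data and QE systems are not reproduced here:

```python
# Stdlib-only sketch: correlate per-segment QE scores with human adequacy
# ratings to decide whether the QE scores are trustworthy enough to show
# to post-editors. All numbers are illustrative.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

qe_scores     = [0.91, 0.85, 0.40, 0.77, 0.62]  # model confidence per segment
human_ratings = [4, 2, 3, 4, 1]                 # 1-5 adequacy judgments

r = pearson(qe_scores, human_ratings)
# a weak correlation (|r| well below ~0.5) would argue against surfacing
# the confidence scores to the human evaluators, as the thesis concludes
print(round(r, 2))
```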
|
3 |
A Study on Manual and Automatic Evaluation Procedures and Production of Automatic Post-editing Rules for Persian Machine Translation. Mostofian, Nasrin. January 2017.
Evaluation of machine translation is an important step towards improving MT. One way to evaluate MT output is to focus on the different types of errors occurring in the translation hypotheses and to think of possible solutions to fix those errors. An error categorization is a beneficial tool that makes it easy to analyze translation errors and can also be used to manually generate post-editing rules to be applied automatically to the product of machine translation. In this work, we define a categorization for the errors occurring in Swedish-Persian machine translation by analyzing the errors that occur in three data-sets from two websites: 1177.se and Linköping municipality. We define three types of monolingual reference-free evaluation (MRF) and use two automatic metrics, BLEU and TER, to conduct a bilingual evaluation for Swedish-Persian translation. Later on, based on the experience of working with the errors that occur in the corpora, we manually generate automatic post-editing (APE) rules and apply them to the product of machine translation. Three different sets of results are obtained: (1) The analysis of MT errors shows that the three most common types of errors in the translation hypotheses are mistranslated words, wrong word order, and extra prepositions; these error types belong to the semantic and syntactic categories respectively. (2) Comparing the automatic and manual evaluations shows a low correlation between the two. (3) Lastly, applying the APE rules to the machine translation output increases the BLEU score on the largest data-set while leaving the scores on the other two data-sets almost unchanged. The TER results show a better score on one data-set, while the scores on the two other data-sets remain unchanged.
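Manually written APE rules of the kind described above are often expressible as pattern-based string rewrites. The two rules below are invented examples (e.g. dropping a duplicated preposition), not the thesis's actual Swedish-Persian rules:

```python
# Hedged sketch of applying manually authored post-editing rules as
# ordered regex rewrites over an MT hypothesis. The rules are illustrative.
import re

# each rule: (compiled pattern, replacement), applied in order
RULES = [
    (re.compile(r"\bto to\b"), "to"),    # drop a duplicated preposition
    (re.compile(r"\s+([.,;])"), r"\1"),  # remove space before punctuation
]

def apply_rules(hypothesis: str) -> str:
    """Run every rule over the hypothesis and return the edited string."""
    for pattern, repl in RULES:
        hypothesis = pattern.sub(repl, hypothesis)
    return hypothesis

print(apply_rules("go to to the store ."))  # go to the store.
```

In practice such rules are derived from the error categorization, and their effect is then measured with metrics like BLEU and TER, as the abstract describes.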
|
4 |
Automatic Post-Editing for Machine Translation. Chatterjee, Rajen. 16 October 2019.
Automatic Post-Editing (APE) aims to correct systematic errors in a machine-translated text. This is primarily useful when the machine translation (MT) system is not accessible for improvement, leaving APE as a viable option for improving translation quality as a downstream task, which is the focus of this thesis. This field has received less attention than MT for several reasons, including the limited availability of data to perform sound research, contrasting views reported by different researchers about the effectiveness of APE, and limited attention from industry to using APE in current production pipelines.
In this thesis, we perform a thorough investigation of APE as a downstream task in order to: i) understand its potential to improve translation quality; ii) advance the core technology, from classical methods to recent deep-learning-based solutions; iii) cope with limited and sparse data; iv) better leverage multiple input sources; v) mitigate the task-specific problem of over-correction; vi) enhance neural decoding to leverage external knowledge; and vii) establish an online learning framework to handle data diversity in real time.
All the above contributions are discussed across several chapters, and most of them are evaluated in the APE shared task organized each year at the Conference on Machine Translation. Our efforts in improving the technology resulted in the best system at the 2017 APE shared task, and our work on online learning received a distinguished paper award at the Italian Conference on Computational Linguistics. Overall, the outcomes and findings of our work have boosted interest among researchers and attracted industry to examine this technology for solving real-world problems.
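One common safeguard against the over-correction problem mentioned in the abstract is to accept an APE edit only when a quality model prefers it over the original MT output by a margin. The sketch below illustrates that idea only; the `score` function is a deliberately toy stand-in (longer = better), not a real quality-estimation model, and this is not the thesis's actual method:

```python
# Hedged sketch of an over-correction guard: keep the MT output unless
# the post-edited hypothesis scores clearly higher. `score` is a toy proxy.
def score(sentence: str) -> float:
    """Toy quality proxy, invented for illustration: more words = better."""
    return len(sentence.split())

def guarded_ape(mt_output: str, ape_output: str, margin: float = 1.0) -> str:
    """Fall back to the MT output unless the APE edit clearly helps."""
    if score(ape_output) >= score(mt_output) + margin:
        return ape_output
    return mt_output

print(guarded_ape("the cat sat", "the cat sat on the mat"))  # accepts the edit
print(guarded_ape("the cat sat on the mat", "cat sat"))      # keeps MT output
```

With a real quality-estimation model in place of `score`, the margin trades off how aggressively the system edits against how often it degrades already-good translations.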
|
5 |
Automatická korektura chyb ve výstupu strojového překladu (Automatic Error Correction of Machine Translation Output). Variš, Dušan. January 2016.
We present MLFix, an automatic statistical post-editing system, which is a spiritual successor to the rule-based system Depfix. The aim of this thesis was to investigate possible approaches to automatic identification of the most common morphological errors produced by state-of-the-art machine translation systems and to train statistical models built on the acquired knowledge. We performed both automatic and manual evaluation of the system and compared the results with Depfix. The system was mainly developed on English-to-Czech machine translation output; however, the aim was to generalize the post-editing process so it can be applied to other language pairs. We modified the original pipeline to post-edit English-German machine translation output and performed additional evaluation of this modification.
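The statistical core of such a system can be caricatured as learning, from corrected examples, which morphological value a token should take given its context. The sketch below uses a trivial count-based model with invented features and data; MLFix's actual models and features are not reproduced here:

```python
# Stdlib sketch of a count-based predictor for a morphological feature
# (here, verb number) from a simple context feature. Data is illustrative.
from collections import Counter, defaultdict

def train(examples):
    """examples: iterable of (context_feature, correct_value) pairs."""
    counts = defaultdict(Counter)
    for feat, value in examples:
        counts[feat][value] += 1
    return counts

def predict(model, feat, default="Sing"):
    """Return the most frequent value seen for this feature, else a default."""
    if model[feat]:
        return model[feat].most_common(1)[0][0]
    return default

# toy training data: subject number -> verb number observed in corrections
data = [("subj=Plur", "Plur"), ("subj=Plur", "Plur"),
        ("subj=Sing", "Sing"), ("subj=Plur", "Sing")]
model = train(data)
print(predict(model, "subj=Plur"))  # Plur
```

A real system would use richer features from the analysis pipeline and a proper classifier, but the training/prediction split is the same.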
|