131 |
Studies on Subword-based Low-Resource Neural Machine Translation: Segmentation, Encoding, and Decoding / サブワードに基づく低資源ニューラル機械翻訳に関する研究:分割、符号化、及び復号化
Haiyue, Song (25 March 2024)
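Kyoto University / Doctorate by coursework (new system) / Doctor of Informatics / Degree No. Kō 25423 / Jōhaku No. 861 / 新制||情||144 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Program-Specific Professor Sadao Kurohashi, Professor Tatsuya Kawahara, Professor Ko Nishino / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM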
|
132 |
User-side Realization / 利用者自主実現
Sato, Ryoma (25 March 2024)
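Kyoto University / Doctorate by coursework (new system) / Doctor of Informatics / Degree No. Kō 25419 / Jōhaku No. 857 / 新制||情||143 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Hisashi Kashima, Professor Akihiro Yamamoto, Professor Hidetoshi Shimodaira / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM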
|
133 |
Translation 4.0 – Evolution, Revolution, Innovation or Disruption?
Schmitt, Peter A. (11 December 2024)
The international translation industry is undergoing fundamental
changes with the potential to disrupt the market and the business basis of
many freelance translators. This paper outlines the present scenario in the field of
non-literary translations, its development, reasons, symptoms and current trends.
Analogous to the established concept of “Industry 4.0”, the philosophy of an
emerging new translation industry can be called “Translation 4.0”. Challenged by
a continuously growing demand for translations, increasingly volatile markets,
fierce global competition and aggressive pricing, the translation industry is
responding with fully digitized data handling, real-time project management,
strictly organized processes, quality control, short response times and comprehensive
added-value services for clients. Key variables in this new work environment
are fragmentation of projects to accelerate turn-around times, outsourcing,
crowdsourcing, teamworking, connectivity, cloud-based translation platforms,
integrated – and often compulsory – translation tools and, last but not least,
machine translation (MT) and post-editing of MT (PEMT). Due to the closing gap
between the quality of human translations (HT) and neural MT (in particular of
DeepL), MT will cover a growing share of the low-end translation market volume,
with the consequence that translators who cannot offer a substantially better
value (i.e. quality/price ratio) than MT will become obsolete. The future belongs to
translators who have the competencies defined by the EMT (European Master's in
Translation) and who adapt to the changing translation ecosystem.
|
134 |
Neural Speech Translation: From Neural Machine Translation to Direct Speech Translation
Di Gangi, Mattia Antonino (27 April 2020)
Sequence-to-sequence learning led to significant improvements in machine translation (MT) and automatic speech recognition (ASR) systems. These advancements were first reflected in spoken language translation (SLT) when using a cascade of (at least) ASR and MT with the new "neural" models, then by using sequence-to-sequence learning to directly translate the input audio speech into text in the target language. In this thesis we cover both approaches to the SLT task. First, we show the limits of NMT in terms of robustness to input errors when compared to the previous phrase-based state of the art. We then focus on the NMT component to achieve better translation quality with higher computational efficiency by using a network based on weakly-recurrent units. Our last work involving a cascade explores the effects on NMT robustness of adding automatic transcripts to the training data. In order to move to the direct speech-to-text approach, we introduce MuST-C, the largest multilingual SLT corpus for training direct translation systems. MuST-C significantly increases the size of publicly available data for this task as well as its language coverage. With such availability of data, we adapted the Transformer architecture to the SLT task for its computational efficiency. Our adaptation, which we call S-Transformer, is meant to better model the audio input, and with it we set a new state of the art for MuST-C. Building on these positive results, we finally use S-Transformer with different data applications: i) one-to-many multilingual translation by training it on MuST-C; ii) participation in the IWSLT 19 shared task with data augmentation; and iii) instance-based adaptation for using the training data at test time. The results in this thesis show a steady quality improvement in direct SLT.
Our hope is that the presented resources and technological solutions will increase its adoption in the near future, so as to make multilingual information access easier in a globalized world.
|
135 |
Strojový překlad a strojové tlumočení / Machine Translation and Machine Interpreting
Skadchenko, Yulia (January 2019)
This thesis aims to provide an in-depth overview of machine translation and machine interpreting, describing their history, development, and current state, as well as their place on the market and their potential use. The thesis describes machine translation and machine interpreting on the theoretical, practical, and technological levels, including their basic principles, evaluation criteria, obstacles and challenges, and scope of use. The thesis also includes a practical test of one of the currently available machine interpreting programs, Skype Translator by Microsoft. The aim of the test is to determine whether the program can facilitate successful communication between two people who don't speak the same language, and to describe the user's experience. Keywords: machine translation, machine interpreting, machine translation limitations, neural translation, Skype Translator, STACL, speech recognition, speech production
|
136 |
Translation as Linear Transduction: Models and Algorithms for Efficient Learning in Statistical Machine Translation
Saers, Markus (January 2011)
Automatic translation has seen tremendous progress in recent years, mainly thanks to statistical methods applied to large parallel corpora. Transductions represent a principled approach to modeling translation, but existing transduction classes are either not expressive enough to capture structural regularities between natural languages or too complex to support efficient statistical induction on a large scale. A common approach is to severely prune search over a relatively unrestricted space of transduction grammars. These restrictions are often applied at different stages in a pipeline, with the obvious drawback of committing to irrevocable decisions that should not have been made. In this thesis we will instead restrict the space of transduction grammars to a space that is less expressive, but can be efficiently searched. First, the class of linear transductions is defined and characterized. They are generated by linear transduction grammars, which represent the natural bilingual case of linear grammars, as well as the natural linear case of inversion transduction grammars (and higher order syntax-directed transduction grammars). They are recognized by zipper finite-state transducers, which are equivalent to finite-state automata with four tapes. By allowing this extra dimensionality, linear transductions can represent alignments that finite-state transductions cannot, and by keeping the mechanism free of auxiliary storage, they become much more efficient than inversion transductions. Secondly, we present an algorithm for parsing with linear transduction grammars that allows pruning. The pruning scheme imposes no restrictions a priori, but guides the search to potentially interesting parts of the search space in an informed and dynamic way. Being able to parse efficiently allows learning of stochastic linear transduction grammars through expectation maximization. 
All the above work would be for naught if linear transductions were too poor a reflection of the actual transduction between natural languages. We test this empirically by building systems based on the alignments imposed by the learned grammars. The conclusion is that stochastic linear inversion transduction grammars learned from observed data stand up well to the state of the art.
|
137 |
Spelling Normalization of English Student Writings
HONG, Yuchan (January 2018)
Spelling normalization is the task of converting non-standard words in a text into standard words. It reduces the number of out-of-vocabulary (OOV) words in texts for natural language processing (NLP) tasks such as information retrieval, machine translation, and opinion mining, and thereby improves the performance of various NLP applications on normalized texts. In this thesis, we explore different methods for spelling normalization of English student writings, including traditional Levenshtein edit distance comparison, phonetic similarity comparison, character-based Statistical Machine Translation (SMT), and character-based Neural Machine Translation (NMT). An important improvement in our implementation is an approach that combines the Levenshtein edit distance and phonetic similarity methods with added frequency-count and compound-splitting components; it is evaluated as the best approach, with a 0.329% accuracy improvement and a 63.63% error reduction on the original unnormalized test set.
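To make the edit-distance component of this abstract concrete, here is a minimal sketch of Levenshtein-based normalization with a frequency tie-break. It is an illustration under stated assumptions, not the thesis's implementation: the function names, the `max_distance` threshold, and the toy vocabulary are all hypothetical, and the phonetic-similarity and compound-splitting components mentioned in the abstract are left out.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalize(word: str, vocab_counts: Counter, max_distance: int = 2) -> str:
    """Return the closest in-vocabulary word (hypothetical helper).

    Candidates are ranked by edit distance first and corpus frequency
    second. The word is returned unchanged if it is already in the
    vocabulary or if no candidate lies within max_distance (an assumed
    threshold).
    """
    if word in vocab_counts:
        return word
    best = None
    for candidate, freq in vocab_counts.items():
        distance = levenshtein(word, candidate)
        if distance > max_distance:
            continue
        key = (distance, -freq, candidate)  # smaller key is better
        if best is None or key < best:
            best = key
    return best[2] if best is not None else word


# Toy vocabulary with made-up frequency counts.
vocab = Counter({"definitely": 40, "definite": 25, "defiantly": 3})
print(normalize("definately", vocab))
```

Ranking by distance before frequency means a rare word one edit away beats a common word two edits away. Plain Levenshtein also counts a transposition such as "teh" for "the" as two edits, which is one reason a real system would need the additional signals the abstract lists.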
|
138 |
Comparaison de systèmes de traduction automatique pour la post édition des alertes météorologique d'Environnement Canada
van Beurden, Louis (08 1900)
Ce mémoire a pour but de déterminer la stratégie de traduction automatique des alertes
météorologiques produites par Environnement Canada, qui nécessite le moins d’efforts de
postédition de la part des correcteurs du bureau de la traduction. Nous commencerons
par constituer un corpus bilingue d’alertes météorologiques représentatives de la tâche de
traduction. Ensuite, ces données nous serviront à comparer les performances de différentes
approches de traduction automatique, de configurations de mémoires de traduction et de
systèmes hybrides. Nous comparerons les résultats de ces différents modèles avec le système
WATT, développé par le RALI pour Environnement Canada, ainsi qu’avec les systèmes de
l’industrie GoogleTranslate et DeepL. Nous étudierons enfin une approche de postédition
automatique. / The purpose of this thesis is to determine which strategy for the machine translation of
weather warnings produced by Environment Canada requires the least post-editing
effort from the proofreaders of the Translation Bureau. We will begin by building a bilingual
corpus of weather warnings representative of this task. Then, this data will be used to
compare the performance of different approaches to machine translation, translation memory
configurations and hybrid systems. We will compare the results of these models with the
WATT system, the latest system provided by RALI for Environment Canada, as well as
with the industry systems GoogleTranslate and DeepL. Finally, we will study an automatic
post-editing approach.
|
139 |
Strojový překlad pomocí umělých neuronových sítí / Machine Translation Using Artificial Neural Networks
Holcner, Jonáš (January 2018)
The goal of this thesis is to describe and build a system for neural machine translation. The system is built with recurrent neural networks, in particular an encoder-decoder architecture. The result is an NMT library used to conduct experiments with different model parameters. The results of the experiments are compared with a system built with the statistical tool Moses.
|
140 |
Translation of keywords between English and Swedish / Översättning av nyckelord mellan engelska och svenska
Ahmady, Tobias, Klein Rosmar, Sander (January 2014)
In this project, we have investigated how to perform rule-based machine translation of sets of keywords between two languages. The goal was to translate an input set, which contains one or more keywords in a source language, to a corresponding set of keywords, with the same number of elements, in the target language. However, some words in the source language may have several senses and may be translated to several, or no, words in the target language. If ambiguous translations occur, the best translation of the keyword should be chosen with respect to the context. In traditional machine translation, a word's context is determined by the phrase or sentence in which the word occurs. In this project, the set of keywords represents the context. By investigating traditional approaches to machine translation (MT), we designed and described models for the specific purpose of keyword translation. We have proposed a solution, based on direct translation, for translating keywords between English and Swedish. In the proposed solution, we also introduced a simple graph-based model for resolving ambiguous translations. / I detta projekt har vi undersökt hur man utför regelbaserad maskinöversättning av nyckelord mellan två språk. Målet var att översätta en given mängd med ett eller flera nyckelord på ett källspråk till en motsvarande, lika stor mängd nyckelord på målspråket. Vissa ord i källspråket kan dock ha flera betydelser och kan översättas till flera, eller inga, ord på målspråket. Om tvetydiga översättningar uppstår ska nyckelordets bästa översättning väljas med hänsyn till sammanhanget. I traditionell maskinöversättning bestäms ett ords sammanhang av frasen eller meningen som det befinner sig i. I det här projektet representerar den givna mängden nyckelord sammanhanget. Genom att undersöka traditionella tillvägagångssätt för maskinöversättning har vi designat och beskrivit modeller specifikt för översättning av nyckelord.
Vi har presenterat en direkt maskinöversättningslösning av nyckelord mellan engelska och svenska där vi introducerat en enkel grafbaserad modell för tvetydiga översättningar.
|