Neural Speech Translation: From Neural Machine Translation to Direct Speech Translation

Sequence-to-sequence learning led to significant improvements in machine translation (MT) and automatic speech recognition (ASR) systems. These advancements were first reflected in spoken language translation (SLT) through cascades of (at least) ASR and MT built with the new "neural" models, and then through sequence-to-sequence learning that directly translates the input audio speech into text in the target language. In this thesis we cover both approaches to the SLT task. First, we show the limits of NMT in terms of robustness to input errors when compared to the previous phrase-based state of the art. We then focus on the NMT component to achieve better translation quality with higher computational efficiency by using a network based on weakly-recurrent units. Our last work involving a cascade explores how adding automatic transcripts to the training data affects NMT robustness. In order to move to the direct speech-to-text approach, we introduce MuST-C, the largest multilingual SLT corpus for training direct translation systems. MuST-C significantly increases both the size of the publicly available data for this task and its language coverage. With such availability of data, we adapted the Transformer architecture to the SLT task for its computational efficiency. Our adaptation, which we call S-Transformer, is meant to better model the audio input, and with it we set a new state of the art for MuST-C. Building on these positive results, we finally use S-Transformer with different data applications: i) one-to-many multilingual translation by training it on MuST-C; ii) participation in the IWSLT 19 shared task with data augmentation; and iii) instance-based adaptation for using the training data at test time. The results in this thesis show a steady quality improvement in direct SLT.
Our hope is that the presented resources and technological solutions will increase its adoption in the near future, so as to make multilingual information access easier in a globalized world.

Settore INF/01 - Informatica

Identifier	oai:union.ndltd.org:unitn.it/oai:iris.unitn.it:11572/259137
Date	27 April 2020
Creators	Di Gangi, Mattia Antonino
Contributors	Di Gangi, Mattia Antonino
Publisher	Università degli studi di Trento, place:Trento
Source Sets	Università di Trento
Language	English
Detected Language	English
Type	info:eu-repo/semantics/doctoralThesis
Rights	info:eu-repo/semantics/embargoedAccess
Relation	firstpage:1, lastpage:236, numberofpages:236
