Global ETD Search

Towards a unified model for speech and language processing

The present work explores deep learning methods for a range of speech and language
processing tasks, including speech recognition, grapheme-to-phoneme conversion, speech
synthesis, and generative models for speech, among others. It builds toward a unified
approach that reframes these individual tasks as a more general problem: finding a
universal representation of the information encoded in each modality, transferring a
signal from one modality to another by converting it to this universal representation,
and generating samples in multiple modalities. It consists of two main research
projects: 1) SoundChoice, a context-aware, sentence-level grapheme-to-phoneme model that
achieves solid performance on the task and a significant improvement in phoneme
disambiguation over baseline models, and 2) MAdmixture, a novel approach to learning a
variety of speech representations in a common latent space.

http://hdl.handle.net/1866/32716

Speech

Representation learning

Identifier	oai:union.ndltd.org:umontreal.ca/oai:papyrus.bib.umontreal.ca:1866/32716
Date	12 1900
Creators	Ploujnikov, Artem
Contributors	Ravanelli, Mirco
Source Sets	Université de Montréal
Language	English
Detected Language	French
Type	thesis
Format	application/pdf
