Global ETD Search

Return to search

Exploring variabilities through factor analysis in automatic acoustic language recognition

Language Recognition is the problem of discovering the language of a spoken definitionutterance. This thesis achieves this goal by using short term acoustic information within a GMM-UBM approach.The main problem of many pattern recognition applications is the variability of problemthe observed data. In the context of Language Recognition (LR), this troublesomevariability is due to the speaker characteristics, speech evolution, acquisition and transmission channels.In the context of Speaker Recognition, the variability problem is solved by solutionthe Joint Factor Analysis (JFA) technique. Here, we introduce this paradigm toLanguage Recognition. The success of JFA relies on several assumptions: The globalJFA assumption is that the observed information can be decomposed into a universalglobal part, a language-dependent part and the language-independent variabilitypart. The second, more technical assumption consists in the unwanted variability part to be thought to live in a low-dimensional, globally defined subspace. In this work, we analyze how JFA behaves in the context of a GMM-UBM LR framework. We also introduce and analyze its combination with Support Vector Machines(SVMs).The first JFA publications put all unwanted information (hence the variability) improvemen tinto one and the same component, which is thought to follow a Gaussian distribution.This handles diverse kinds of variability in a unique manner. But in practice,we observe that this hypothesis is not always verified. We have for example thecase, where the data can be divided into two clearly separate subsets, namely datafrom telephony and from broadcast sources. In this case, our detailed investigations show that there is some benefit of handling the two kinds of data with two separatesystems and then to elect the output score of the system, which corresponds to the source of the testing utterance.For selecting the score of one or the other system, we need a channel source related analyses detector. We propose here different novel designs for such automatic detectors.In this framework, we show that JFA's variability factors (of the subspace) can beused with success for detecting the source. This opens the interesting perspectiveof partitioning the data into automatically determined channel source categories,avoiding the need of source-labeled training data, which is not always available.The JFA approach results in up to 72% relative cost reduction, compared to the overall resultsGMM-UBM baseline system. Using source specific systems followed by a scoreselector, we achieve 81% relative improvement.

[INFO:INFO_OH] Computer Science/Other

[INFO:INFO_OH] Informatique/Autre

Pattern recognition

Automatic speech processing

Automatic learning

Language recognition

Acoustic modeling

(joint) factor analysis

Variability compensation

System robustness

Short-term acoustic information

Gaussian Mixture Models (GMM)

Universal Background Model (UBM)

Information decomposition

Support Vector Machines (SVM)

Audio source channel

Channel detector

Identifer	oai:union.ndltd.org:CCSD/oai:tel.archives-ouvertes.fr:tel-00954255
Date	05 September 2011
Creators	Verdet, Florian
Publisher	Université d'Avignon
Source Sets	CCSD theses-EN-ligne, France
Language	French
Detected Language	English
Type	PhD thesis

Page generated in 0.0019 seconds

Exploring variabilities through factor analysis in automatic acoustic language recognition

Description

Links & Downloads

Tags

Additional Fields