Text-independent speaker recognition using discriminative subspace analysis. / CUHK electronic theses & dissertations collection

Speaker recognition (SR), which uses the voice to determine a speaker’s identity, is an important and challenging research topic in biometric authentication. Speaker recognition can be divided into text-dependent and text-independent methods according to the verbal content of the speech signal. It has two major applications. The first is speaker verification, also referred to as speaker authentication, which validates a speaker’s claimed identity from the voice and involves a binary decision. The second is speaker identification, which determines an unknown speaker’s identity from a set of candidate speakers.

In a state-of-the-art speaker recognition system, each speaker model is usually trained by generative methods, which estimate the feature distribution of each speaker from the given data. These methods need a frame-by-frame metric calculation (e.g. probabilities or likelihoods) to reach the final decision, which consumes substantial computing resources and slows real-time response. Moreover, many redundant data frames are selected blindly for training when no efficient subspace dimension reduction is applied. To overcome these disadvantages of generative methods and obtain boundary information between individual speakers, we propose to apply discriminative subspace techniques for model training and to employ simple but efficient distance metrics for decision score calculation.

In this thesis, we present an overview of both conventional and state-of-the-art generative speaker recognition methods (e.g. Gaussian Mixture Model and Joint Factor Analysis) and analyze their advantages and disadvantages. We also investigate the application of subspace analysis techniques to reduce feature dimensions and computation time. We then propose a novel speaker recognition framework based on nonparametric Fisher’s discriminant analysis, which we name Fishervoice. The objective of the Fishervoice algorithm is to model intrinsic vocal characteristics in a discriminant subspace, de-emphasizing unwanted noise variations and emphasizing classification boundary information. With this framework, speaker recognition is realized by mapping a test utterance to the Fishervoice subspace and then calculating a score, a simple Euclidean distance, between the test utterance and its reference. We further explore several extensions of the Fishervoice framework for additional dimensionality reduction and performance improvement, and we investigate various subspace analysis techniques in a total variability-based low-dimensional space for fast computation. Extensive experiments on two large speaker recognition corpora (XM2VTS and NIST) demonstrate significant improvements of Fishervoice over standard, state-of-the-art approaches for both speaker identification and verification systems.

Detailed summary in vernacular field only.
Jiang, Weiwu.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2012.
Includes bibliographical references (leaves 127-135).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.

Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 --- Introduction
Chapter 1.1 --- Overview of Speaker Recognition Systems
Chapter 1.2 --- Motivation
Chapter 1.3 --- Outline of Thesis
Chapter 2 --- Background Study
Chapter 2.1 --- Generative Gaussian Mixture Model (GMM)
Chapter 2.1.1 --- Basic GMM
Chapter 2.1.2 --- The Gaussian Mixture Model-Universal Background Model (GMM-UBM) System
Chapter 2.2 --- Discriminative Subspace Analysis
Chapter 2.2.1 --- Principal Component Analysis
Chapter 2.2.2 --- Linear Discriminant Analysis
Chapter 2.2.3 --- Heteroscedastic Linear Discriminant Analysis
Chapter 2.2.4 --- Locality Preserving Projections
Chapter 2.3 --- Noise Compensation
Chapter 2.3.1 --- Eigenvoice
Chapter 2.3.2 --- Joint Factor Analysis
Chapter 2.3.3 --- Probabilistic Linear Discriminant Analysis
Chapter 2.3.4 --- Nuisance Attribute Projection
Chapter 2.3.5 --- Within-class Covariance Normalization
Chapter 2.4 --- Support Vector Machine
Chapter 2.5 --- Score Normalization
Chapter 2.6 --- Summary
Chapter 3 --- Corpora for Speaker Recognition Experiments
Chapter 3.1 --- Corpora for Speaker Identification Experiments
Chapter 3.1.1 --- XM2VTS Corpus
Chapter 3.1.2 --- NIST Corpora
Chapter 3.2 --- Corpora for Speaker Verification Experiments
Chapter 3.3 --- Summary
Chapter 4 --- Performance Measures for Speaker Recognition
Chapter 4.1 --- Performance Measures for Identification
Chapter 4.2 --- Performance Measures for Verification
Chapter 4.2.1 --- Equal Error Rate
Chapter 4.2.2 --- Detection Error Tradeoff Curves
Chapter 4.2.3 --- Detection Cost Function
Chapter 4.3 --- Summary
Chapter 5 --- The Discriminant Fishervoice Framework
Chapter 5.1 --- The Proposed Fishervoice Framework
Chapter 5.1.1 --- Feature Representation
Chapter 5.1.2 --- Nonparametric Fisher’s Discriminant Analysis
Chapter 5.2 --- Speaker Identification Experiments
Chapter 5.2.1 --- Experiments on the XM2VTS Corpus
Chapter 5.2.2 --- Experiments on the NIST Corpus
Chapter 5.3 --- Summary
Chapter 6 --- Extension of the Fishervoice Framework
Chapter 6.1 --- Two-level Fishervoice Framework
Chapter 6.1.1 --- Proposed Algorithm
Chapter 6.2 --- Performance Evaluation on the Two-level Fishervoice Framework
Chapter 6.2.1 --- Experimental Setup
Chapter 6.2.2 --- Performance Comparison of Different Types of Input Supervectors
Chapter 6.2.3 --- Performance Comparison of Different Numbers of Slices
Chapter 6.2.4 --- Performance Comparison of Different Dimensions of Fishervoice Projection Matrices
Chapter 6.2.5 --- Performance Comparison with Other Systems
Chapter 6.2.6 --- Fusion with Other Systems
Chapter 6.2.7 --- Extension of the Two-level Subspace Analysis Framework
Chapter 6.3 --- Random Subspace Sampling Framework
Chapter 6.3.1 --- Supervector Extraction
Chapter 6.3.2 --- Training Stage
Chapter 6.3.3 --- Testing Procedures
Chapter 6.3.4 --- Discussion
Chapter 6.4 --- Performance Evaluation of the Random Subspace Sampling Framework
Chapter 6.4.1 --- Experimental Setup
Chapter 6.4.2 --- Random Subspace Sampling Analysis
Chapter 6.4.3 --- Comparison with Other Systems
Chapter 6.4.4 --- Fusion with the Other Systems
Chapter 6.5 --- Summary
Chapter 7 --- Discriminative Modeling in Low-dimensional Space
Chapter 7.1 --- Discriminative Subspace Analysis in Low-dimensional Space
Chapter 7.1.1 --- Experimental Setup
Chapter 7.1.2 --- Performance Evaluation on Individual Subspace Analysis Techniques
Chapter 7.1.3 --- Performance Evaluation on Multi-type of Subspace Analysis Techniques
Chapter 7.2 --- Discriminative Subspace Analysis with Support Vector Machine
Chapter 7.2.1 --- Experimental Setup
Chapter 7.2.2 --- Performance Evaluation on LDA+WCCN+SVM
Chapter 7.2.3 --- Performance Evaluation on Fishervoice+SVM
Chapter 7.3 --- Summary
Chapter 8 --- Conclusions and Future Work
Chapter 8.1 --- Contributions
Chapter 8.2 --- Future Directions
Chapter A --- EM Training GMM
Bibliography
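
The abstract describes recognition as mapping a test utterance into a discriminant subspace and scoring it by a simple distance to a reference. The sketch below illustrates only that general idea. It uses the classical two-class Fisher discriminant on 2-D toy vectors, which is an assumption made for brevity and is not the thesis's nonparametric Fishervoice algorithm; the data, speaker labels, and all function names are hypothetical.

```python
# Illustrative sketch only: classical two-class Fisher discriminant on toy
# 2-D data. This is NOT the thesis's nonparametric Fishervoice method; every
# name and value here is a hypothetical stand-in.

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def within_class_scatter(classes):
    """Sum of (x - m)(x - m)^T over all samples of all classes (2x2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for samples in classes:
        m = mean(samples)
        for x in samples:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """w = Sw^{-1} (m_a - m_b), using the closed-form inverse of a 2x2 matrix."""
    s = within_class_scatter([class_a, class_b])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("within-class scatter is singular")
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    ma, mb = mean(class_a), mean(class_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(w, x):
    """Map a 2-D vector onto the 1-D discriminant subspace."""
    return w[0] * x[0] + w[1] * x[1]

def identify(w, references, test_vector):
    """Pick the speaker whose projected reference is nearest to the projected
    test vector (in 1-D, Euclidean distance is the absolute difference)."""
    t = project(w, test_vector)
    return min(references, key=lambda name: abs(project(w, references[name]) - t))

# Toy enrollment vectors for two hypothetical speakers.
speaker_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 2.0]]
speaker_b = [[4.0, 1.0], [5.0, 2.0], [6.0, 2.0], [5.0, 1.0]]

w = fisher_direction(speaker_a, speaker_b)
references = {"A": mean(speaker_a), "B": mean(speaker_b)}

print(identify(w, references, [2.5, 2.5]))
print(identify(w, references, [5.5, 1.5]))
```

Scoring in the projected space reduces each decision to one dot product and one distance per reference, which is the kind of saving the abstract contrasts with frame-by-frame likelihood computation.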

Automatic speech recognition

Identifier	oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_328016
Date	January 2012
Contributors	Jiang, Weiwu., Chinese University of Hong Kong Graduate School. Division of Systems Engineering and Engineering Management.
Source Sets	The Chinese University of Hong Kong
Language	English, Chinese
Detected Language	English
Type	Text, bibliography
Format	electronic resource, remote, 1 online resource (2, xxiii, 135 leaves) : ill. (chiefly col.)
Rights	Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
