Query-by-example spoken term detection for low-resource languages / CUHK electronic theses & dissertations collection

In this thesis, we consider the problem of query-by-example (QbyE) spoken term detection (STD) for low-resource languages: automatically detecting and locating the occurrences of a query term in a large audio database, where the query term is given in the form of one or more audio examples. This research is motivated by the demand for information retrieval technologies that can handle speech data of low-resource languages. The major technical difficulty is that manual transcriptions and linguistic knowledge are not available for these languages.

The framework of acoustic segment modeling (ASM) is adopted for unsupervised training of a speech tokenizer. Three novel algorithms are developed for segment labeling in the ASM framework. The proposed algorithms are based on different class-by-segment posterior representations and spectral clustering techniques. The posterior representations are shown to be more robust than conventional spectral representations. Spectral clustering has achieved significant success in many applications; the algorithms are reformulated here to make them computationally feasible for clustering a large number of speech segments. Experiments on a multilingual speech database demonstrate the advantage of the proposed algorithms over existing approaches.

The speech tokenizer obtained with ASM is applied to QbyE STD within a frame-based template matching framework. The ASM tokenizer serves as the front-end to generate posterior features, which are matched against the query templates by dynamic time warping (DTW). Experiments show that the ASM tokenizer outperforms a GMM tokenizer and language-mismatched phoneme recognizers. Moreover, a two-step approach is proposed for efficient search.

The frame-based template matching framework for QbyE STD is enhanced in three ways: a novel DTW matrix combination approach is proposed for the fusion of multiple systems with different posterior features, pseudo-relevance feedback is used for query expansion, and score normalization is applied to calibrate the score distributions of different query terms. Experimental results show that the performance of the QbyE STD system is significantly improved by these three approaches.

Chinese abstract (translated):

Spoken term detection is a technique for locating the occurrences of a query term in a large speech database, and it is of great value in both academic research and practical applications. Traditional spoken term detection research has mainly targeted resource-rich languages, whereas this thesis studies spoken term detection for low-resource languages. Under the conditions assumed here, the target language lacks sufficient resources to train a speech recognition system, and query terms are given in the form of audio examples.

This thesis adopts the acoustic segment modeling (ASM) framework to train a speech tokenizer without supervision. Three new methods are proposed for clustering speech segments within the ASM framework; they are based on a new, robust segment representation and employ spectral clustering techniques. Experiments show that they outperform three commonly used baseline methods and achieve better modeling results.

The ASM tokenizer is applied in a template-matching-based spoken term detection system, where it serves as a front-end feature transformation module for extracting posterior probability features. To improve detection efficiency, a two-step detection method is also proposed. Experiments confirm that the approach achieves high detection accuracy.

To further improve detection accuracy, the template-matching-based detection system is optimized from three angles. First, system fusion is performed on the distance matrices of dynamic time warping. Second, pseudo-relevance feedback is used to obtain additional query examples. Finally, the detection scores are normalized so that a unified decision threshold can be set. Experimental results show that all three methods effectively improve system performance.

Wang, Haipeng.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2014.
Includes bibliographical references (leaves 110-127).
Abstracts also in Chinese.
Title from PDF title page (viewed on 5 December 2016).
Detailed summary in vernacular field only.
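The abstract describes frame-based template matching in which posterior features (posteriorgrams) produced by the unsupervised tokenizer are compared by dynamic time warping. The sketch below illustrates that general idea only; the frame distance, step pattern, and normalization are assumptions rather than the thesis's exact formulation, and all names are hypothetical.

```python
# Minimal sketch of DTW-based template matching over posteriorgrams.
# Assumptions: -log inner-product frame distance and unconstrained
# (1,0)/(0,1)/(1,1) DTW steps; the thesis may use different choices.
import numpy as np

def posterior_distance(q, d, eps=1e-12):
    """Frame-to-frame distance between posteriorgrams.
    q: (Tq, K) query posteriors; d: (Td, K) document posteriors."""
    sim = q @ d.T                        # (Tq, Td) inner products
    return -np.log(np.maximum(sim, eps))

def dtw_cost(dist):
    """DTW alignment cost normalized by path length (lower = better match)."""
    Tq, Td = dist.shape
    acc = np.full((Tq + 1, Td + 1), np.inf)   # accumulated cost
    length = np.zeros((Tq + 1, Td + 1))       # path length for normalization
    acc[0, 0] = 0.0
    for i in range(1, Tq + 1):
        for j in range(1, Td + 1):
            prev = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
            k = int(np.argmin([acc[p] for p in prev]))
            acc[i, j] = dist[i - 1, j - 1] + acc[prev[k]]
            length[i, j] = length[prev[k]] + 1
    return acc[Tq, Td] / length[Tq, Td]

# Usage: score a candidate segment against a spoken query, e.g.
#   cost = dtw_cost(posterior_distance(query_posteriors, segment_posteriors))
# Lower cost indicates a more likely occurrence of the query term.
```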

Identifier oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_1290682
Date January 2014
Contributors Wang, Haipeng (author), Lee, Tan (thesis advisor), Chinese University of Hong Kong Graduate School, Division of Electronic Engineering (degree granting institution)
Source Sets The Chinese University of Hong Kong
Language English, Chinese
Detected Language English
Type Text, bibliography
Format electronic resource, remote, 1 online resource (xiii, 127 leaves) : illustrations (some color), computer, online resource
Rights Use of this resource is governed by the terms and conditions of the Creative Commons "Attribution-NonCommercial-NoDerivatives 4.0 International" License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
