1 |
Robust Techniques Of Language Modeling For Spoken Language Identification
Basavaraja, S V, January 2007
Language Identification (LID) is the task of automatically identifying the language of a speech signal uttered by an unknown speaker. An N-language LID task is to classify an input speech utterance, spoken by an unknown speaker and of unknown text, as belonging to one of the N languages L1, L2, ..., LN.
We present a new approach to spoken language modeling for language identification using the Lempel-Ziv-Welch (LZW) algorithm. The approach aims to overcome the limitations of n-gram stochastic models by automatically identifying the valid set of variable-length patterns from the training data. However, because several patterns in one language's pattern table also appear in the pattern tables of other languages, the languages remain confusable in the LID task. To overcome this, three pruning techniques are proposed to make the pattern tables more language specific. For LID with limited training data, we present another language modeling technique, which compensates for language-specific patterns missing from the language-specific LZW pattern table. We develop two new discriminative measures for LID based on the LZW algorithm, viz., (i) the Compression Ratio Score (LZW-CRS) and (ii) the Weighted Discriminant Score (LZW-WDS). It is shown that on a 6-language LID task from the OGI-TS database, the new model (LZW-WDS) significantly outperforms the conventional bigram approach.
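The pattern-table idea can be illustrated with a minimal sketch, which is not the thesis's implementation. It builds one LZW-style table of variable-length patterns per language from training token sequences, then scores a test sequence by how well each fixed table compresses it. The function names, the toy token sequences, and the scoring rule (input symbols per emitted code under a greedy longest-match parse) are assumptions made for illustration; the pruning techniques and the exact LZW-CRS and LZW-WDS definitions are not reproduced here.

```python
from typing import Dict, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def build_pattern_table(tokens: Sequence[str]) -> Set[Pattern]:
    """Collect variable-length patterns with an LZW-style parse of the training tokens."""
    table: Set[Pattern] = {(t,) for t in tokens}  # seed with single symbols
    current: Pattern = ()
    for tok in tokens:
        candidate = current + (tok,)
        if candidate in table:
            current = candidate          # keep extending a known pattern
        else:
            table.add(candidate)         # learn the new, longer pattern
            current = (tok,)             # restart from the current symbol
    return table


def compression_ratio(tokens: Sequence[str], table: Set[Pattern]) -> float:
    """Symbols per emitted code when parsing with a fixed table (assumed score)."""
    if not tokens:
        return 0.0
    max_len = max((len(p) for p in table), default=1)
    i, codes = 0, 0
    while i < len(tokens):
        step = 1  # a symbol unknown to the table still costs one code
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in table:
                step = n                 # greedy longest match
                break
        i += step
        codes += 1
    return len(tokens) / codes


def identify(tokens: Sequence[str], tables: Dict[str, Set[Pattern]]) -> str:
    """Pick the language whose table compresses the utterance best."""
    return max(tables, key=lambda lang: compression_ratio(tokens, tables[lang]))


# Toy data: hypothetical token strings standing in for front-end output.
tables = {
    "A": build_pattern_table("a b a b a b a b".split()),
    "B": build_pattern_table("c d c d c d c d".split()),
}
best = identify("a b a b".split(), tables)
```

A table that shares many patterns with another language's table would give both languages similar ratios on the same utterance, which is the confusability that pruning is meant to reduce.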
With regard to the front end of the LID system, we develop a modified technique to model Acoustic Sub-Word Units (ASWU) and explore its effectiveness. The speech signal is segmented using an acoustic criterion (ML-segmentation). However, we believe that consistency and discriminability among speech units are the key issues for the success of ASWU-based speech processing. We develop a new procedure for clustering and modeling the segments using sub-word GMMs. Because the labels for the sub-word units can be chosen freely, we iteratively re-cluster and re-model the segments. Using a measure of the consistency with which the acoustic segments are labeled, the convergence of the iterations is demonstrated. We show that the new ASWU-based front end, combined with the new LZW-based back end, outperforms the earlier reported PSWR-based LID.
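The iterative re-clustering loop can be sketched under strong simplifying assumptions. In the sketch below, each segment is reduced to a single scalar feature, each sub-word unit is modeled by one 1-D Gaussian (a stand-in for the sub-word GMMs), and consistency is taken to be the fraction of segments whose label does not change between iterations. None of these choices, nor the function names, come from the thesis; they only illustrate the relabel and re-estimate cycle.

```python
import math
from typing import List, Optional, Sequence, Tuple

Gaussian = Tuple[float, float]  # (mean, variance)


def fit_units(values: Sequence[float], labels: Sequence[int], k: int,
              var_floor: float = 1e-3) -> List[Optional[Gaussian]]:
    """Re-estimate one Gaussian per unit from the segments currently assigned to it."""
    models: List[Optional[Gaussian]] = []
    for unit in range(k):
        members = [v for v, lab in zip(values, labels) if lab == unit]
        if not members:
            models.append(None)  # unit lost all its segments
            continue
        mean = sum(members) / len(members)
        var = max(sum((v - mean) ** 2 for v in members) / len(members), var_floor)
        models.append((mean, var))
    return models


def log_likelihood(value: float, model: Gaussian) -> float:
    mean, var = model
    return -0.5 * (math.log(2 * math.pi * var) + (value - mean) ** 2 / var)


def recluster(values: Sequence[float], labels: Sequence[int], k: int,
              max_iters: int = 20) -> Tuple[List[int], List[float]]:
    """Alternate modeling and relabeling; record label consistency per iteration."""
    labels = list(labels)
    history: List[float] = []
    for _ in range(max_iters):
        models = fit_units(values, labels, k)
        live = [u for u in range(k) if models[u] is not None]
        new_labels = [max(live, key=lambda u: log_likelihood(v, models[u]))
                      for v in values]
        # Assumed consistency measure: share of segments that kept their label.
        consistency = sum(a == b for a, b in zip(labels, new_labels)) / len(values)
        history.append(consistency)
        labels = new_labels
        if consistency == 1.0:
            break
    return labels, history


# Toy segments with two obvious groups and a deliberately poor initial labeling.
segment_features = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
final_labels, consistency_history = recluster(segment_features, [0, 1, 0, 1, 0, 1], k=2)
```

In this toy run the consistency rises to 1.0 once the labels stop changing, which is the sense in which the iterations are said to converge.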
|
2 |
Spoken language identification in resource-scarce environments
Peche, Marius, 24 August 2010
South Africa has eleven official languages, ten of which are considered "resource-scarce". For these languages, even the basic linguistic resources required to develop speech technology systems can be difficult or impossible to obtain. In this thesis, the process of developing Spoken Language Identification (S-LID) systems in resource-scarce environments is investigated. A Parallel Phoneme Recognition followed by Language Modeling (PPR-LM) architecture is utilized and three specific scenarios are investigated: (1) incomplete resources, including the lack of audio transcriptions and/or pronunciation dictionaries; (2) inconsistent resources, including the use of speech corpora that are unmatched with regard to domain or channel characteristics; and (3) poor quality resources, such as wrongly labeled or poorly transcribed data. Each situation is analysed, techniques are defined to mitigate the effect of limited or poor quality resources, and the effectiveness of these techniques is evaluated experimentally. Techniques evaluated include the development of orthographic tokenizers, bootstrapping of transcriptions, filtering of low quality audio, diarization and channel normalization techniques, and the human verification of misclassified utterances. The knowledge gained from this research is used to develop the first S-LID system able to distinguish between all South African languages. The system performs well: it differentiates among the eleven languages with an accuracy above 67%, and among the six primary South African language families with an accuracy above 80%, on speech segments between 2 s and 10 s in length.
Dissertation (MEng)--University of Pretoria, 2010. / Electrical, Electronic and Computer Engineering / unrestricted
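The PPR-LM architecture named in this abstract can be sketched as follows, as an illustration rather than the system built in the thesis. Each language has its own phoneme recognizer and its own n-gram model over that recognizer's output; an utterance is passed through every recognizer, each resulting phoneme string is scored by the matching language model, and the best-scoring language wins. In this sketch the recognizers are hypothetical stubs that simply split a pre-tokenized string, the language models are add-one smoothed bigrams, and all names and toy data are assumptions.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable, List, Sequence


class BigramLM:
    """Add-one smoothed bigram model over phoneme tokens."""

    def __init__(self, sequences: Iterable[Sequence[str]]) -> None:
        self.bigrams: Counter = Counter()
        self.context: Counter = Counter()
        self.vocab = set()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            self.vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.context[prev] += 1

    def avg_log_prob(self, seq: Sequence[str]) -> float:
        """Length-normalized log-probability of a phoneme sequence."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen symbols
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            total += math.log((self.bigrams[(prev, cur)] + 1) / (self.context[prev] + v))
        return total / (len(padded) - 1)


Recognizer = Callable[[str], List[str]]


def ppr_lm_identify(utterance: str,
                    recognizers: Dict[str, Recognizer],
                    models: Dict[str, BigramLM]) -> str:
    """Run every recognizer, score each output with its own LM, return the best language."""
    scores = {lang: models[lang].avg_log_prob(recognizers[lang](utterance))
              for lang in recognizers}
    return max(scores, key=scores.get)


# Hypothetical stand-ins: real recognizers would decode audio, not split text.
recognizers = {"lang_a": str.split, "lang_b": str.split}
models = {
    "lang_a": BigramLM([["k", "a", "t"], ["k", "a", "p"]]),
    "lang_b": BigramLM([["t", "a", "k"], ["p", "a", "k"]]),
}
guess = ppr_lm_identify("k a t", recognizers, models)
```

Length normalization is used here because the abstract reports results on segments of varying duration (2 s to 10 s); whether the thesis normalizes scores this way is not stated.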
|