This thesis focuses on iVector subspace modeling, a fairly new approach to phonotactic language recognition, i.e. identifying a language from the sounds in a spoken utterance. The goal of the iVector is to compactly represent the discriminative information in an utterance so that further processing of the utterance is less computationally intensive. This might enable the system to be trained with more data, and thereby reach a higher performance. We present both the theory behind iVectors and experiments to better fit the iVector space to our development data. The final system achieved results comparable to those of our baseline PRLM system on the NIST LRE03 30-second evaluation set.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ntnu-19079 |
Date | January 2012 |
Creators | Tokheim, Åsmund Einar Haugland |
Publisher | Norges teknisk-naturvitenskapelige universitet, Institutt for elektronikk og telekommunikasjon |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |