Abstract
This dissertation investigates person identification systems based on the Karhunen-Loeve transform (KLT), covering both speaker and face recognition. Among the many system design issues, three are addressed in this thesis: how to improve the correct classification rate, how to reduce the computational cost, and how to increase the robustness of the system.
Both the correct classification rate and the computational cost of a person identification system can be improved through an appropriate feature design methodology. The KLT and the hard-limited KLT (HLKLT) are proposed here to extract class-related features. Theoretically, the KLT is the optimal transform in the minimum mean square error and maximal energy packing senses. The transformed data are uncorrelated, and most of the classification information is contained in the first few coordinates. Therefore, a satisfactory correct classification rate can be achieved by using only the first few KLT-derived eigenfeatures.
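The eigenfeature extraction described above can be illustrated with a minimal sketch. It is not the thesis implementation: the toy data, the function names, and the use of power iteration with deflation to obtain the leading eigenvectors are all assumptions chosen to keep the example dependency-free.

```python
import math
import random

def mean_center(samples):
    """Subtract the per-dimension mean from every sample."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    return [[s[j] - mean[j] for j in range(d)] for s in samples], mean

def covariance(centered):
    """Sample covariance matrix of mean-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(s[i] * s[j] for s in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpairs(cov, k, iters=500):
    """Leading k (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    d = len(cov)
    mat = [row[:] for row in cov]
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for unit-length v.
        lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((lam, v))
        # Deflate: remove the component just found before searching for the next.
        for i in range(d):
            for j in range(d):
                mat[i][j] -= lam * v[i] * v[j]
    return pairs

def klt_features(sample, mean, eigvecs):
    """Project one sample onto the selected eigenvectors (floating-point inner products)."""
    centered = [x - m for x, m in zip(sample, mean)]
    return [sum(c * e for c, e in zip(centered, vec)) for vec in eigvecs]

# Hypothetical toy data: 3-D points lying close to a single direction.
samples = [[t, 2 * t + 0.1 * ((i % 3) - 1), -t + 0.1 * ((i % 2) - 0.5)]
           for i, t in enumerate(range(10))]
centered, mean = mean_center(samples)
cov = covariance(centered)
pairs = top_eigenpairs(cov, k=2)
eigvecs = [vec for _, vec in pairs]
features = [klt_features(s, mean, eigvecs) for s in samples]
# Share of total variance captured by the first coordinate (energy packing).
energy_fraction = pairs[0][0] / sum(cov[i][i] for i in range(3))
```

On data like this, nearly all of the variance falls in the first transformed coordinate, which is the energy-packing property that motivates keeping only the first few eigenfeatures.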
In the above transformation, each transformed coefficient is the inner product of an original sample and a selected eigenvector, which requires floating-point arithmetic. If this linear transformation can be reduced to integer arithmetic, the time needed for both person feature training and person classification is greatly reduced. The hard-limiting process (HLKLT) extracts the zero-crossing information in the eigenvectors, which is hypothesized to carry information important for classification. Such features greatly simplify the linear transformation, since the computation involves only integer arithmetic.
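A minimal sketch of the hard-limiting idea follows. It assumes that hard-limiting maps each eigenvector entry to +1 or -1 according to its sign (with zero mapped to +1) and that the input samples are integer-valued; neither detail is stated in the abstract, and the eigenvector and sample values are hypothetical.

```python
def hard_limit(eigvec):
    """Keep only the sign pattern of an eigenvector: +1 or -1 per entry.

    Assumption: an entry that is exactly zero is mapped to +1.
    """
    return [1 if x >= 0 else -1 for x in eigvec]

def hl_project(sample, signs):
    """Inner product with a +/-1 vector, using only additions and subtractions."""
    total = 0
    for x, s in zip(sample, signs):
        total = total + x if s > 0 else total - x
    return total

# Hypothetical eigenvector and integer-quantized sample (e.g. pixel or PCM values).
eigvec = [0.62, 0.15, -0.31, -0.70]
signs = hard_limit(eigvec)
sample = [12, 7, 3, 9]
feature = hl_project(sample, signs)  # 12 + 7 - 3 - 9
```

Because the hard-limited vector contains only +1 and -1, the projection needs no multiplications, and with integer inputs the result stays an integer.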
This thesis demonstrates that the hard-limited KL transform has a much simpler structure than the KL transform and achieves approximately the same performance in both the speaker identification system and the face recognition system.
Moreover, a hybrid KLT/GMM speaker identification system is proposed in this thesis to improve the classification rate and to save computation time. The increase in the correct rate comes from applying two different sets of speech features in the hybrid system: the KLT features and the MFCC features used by the Gaussian mixture speaker model (GMM).
Furthermore, this hybrid system performs classification sequentially. In the first stage, the relatively fast KLT features serve as an initial candidate selection tool, discarding the speakers that are most clearly separable from the test utterance. In the second stage, the GMM serves as the final recognizer and makes the ultimate decision. Consequently, only a small portion of the speakers needs to be discriminated in the time-consuming GMM stage. Our results show that the combination benefits both classification accuracy and computational cost.
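The two-stage decision can be sketched as follows. This is an illustration under stated assumptions, not the thesis design: the first stage is assumed to rank speakers by squared Euclidean distance to a per-speaker KLT template, the shortlist size `keep` is arbitrary, the GMMs are assumed to have diagonal covariances, and all names and numbers are hypothetical.

```python
import math

def shortlist(klt_feature, templates, keep):
    """Stage 1: keep the speakers whose KLT template is closest to the input."""
    def dist(name):
        return sum((a - b) ** 2 for a, b in zip(klt_feature, templates[name]))
    return sorted(templates, key=dist)[:keep]

def gmm_loglik(frames, gmm):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples, one per component.
    """
    total = 0.0
    for x in frames:
        comps = []
        for w, mu, var in gmm:
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comps.append(ll)
        m = max(comps)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(c - m) for c in comps))
    return total / len(frames)

def identify(klt_feature, frames, templates, gmms, keep=2):
    """Stage 2: score only the shortlisted speakers with their GMMs."""
    candidates = shortlist(klt_feature, templates, keep)
    winner = max(candidates, key=lambda s: gmm_loglik(frames, gmms[s]))
    return winner, candidates

# Hypothetical enrolled speakers: KLT templates and one-dimensional GMMs.
templates = {"A": [0.0, 0.0], "B": [1.0, 1.0], "C": [10.0, 10.0]}
gmms = {
    "A": [(1.0, [0.0], [1.0])],
    "B": [(0.5, [2.0], [1.0]), (0.5, [3.0], [1.0])],
    "C": [(1.0, [-5.0], [1.0])],
}
klt_feature = [0.6, 0.6]           # KLT feature of the test utterance
frames = [[2.1], [2.6], [2.9]]     # stand-in for MFCC frames
winner, candidates = identify(klt_feature, frames, templates, gmms)
```

In this toy run, speaker C is discarded by the inexpensive first stage, so only two GMMs are evaluated. The saving grows with the number of enrolled speakers, at the risk of discarding the true speaker when the shortlist is too small.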
The hybrid KLT/GMM design is also applied to a robust speaker identification system. Under both additive white Gaussian noise (AWGN) and car noise environments, it is demonstrated to improve accuracy and reduce computation compared with the conventional GMM.
A genetic algorithm (GA) is proposed in this thesis to improve the speaker identification performance of the vector quantizer (VQ) by avoiding the local minima typically encountered in the LBG process. The results indicate that this scheme is useful in practice for our recognition application.
Identifier | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0716104-171952 |
Date | 16 July 2004 |
Creators | Chen, Chin-Ta |
Contributors | Chih-Chien Thomas Chen |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | Cholon |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0716104-171952 |
Rights | not_available, Copyright information available at source archive |