31

Person Identification Based on Karhunen-Loeve Transform

Chen, Chin-Ta 16 July 2004 (has links)
In this dissertation, person identification systems based on the Karhunen-Loeve transform (KLT) are investigated. Both speaker and face recognition are considered in our design. Among the many aspects of system design, three important problems are addressed in this thesis: how to improve the correct classification rate, how to reduce the computational cost, and how to increase the robustness of the system.

Improving the correct classification rate and reducing the computational cost of a person identification system can be accomplished through appropriate feature design. The KLT and the hard-limited KLT (HLKLT) are proposed here to extract class-related features. Theoretically, the KLT is the optimal transform in the minimum-mean-square-error and maximal-energy-packing sense: the transformed data is totally uncorrelated, and most of the classification information is contained in the first few coordinates. Therefore, a satisfactory correct classification rate can be achieved using only the first few KLT-derived eigenfeatures. In this transformation, the transformed data is computed from inner products of the original samples with the selected eigenvectors, which is inherently floating-point arithmetic. If the linear transformation can be reduced to integer arithmetic, the time required for both person feature training and person classification drops greatly. The hard-limiting process in the HLKLT extracts the zero-crossing information of the eigenvectors, which is hypothesized to carry important classification information. This kind of feature tremendously simplifies the linear transformation, since the computation becomes merely integer arithmetic. This thesis demonstrates that the hard-limited KL transform has a much simpler structure than the KL transform, yet possesses approximately the same excellent performance for both speaker identification and face recognition.

Moreover, a hybrid KLT/GMM speaker identification system is proposed in this thesis to improve the classification rate and save computational time. The improvement in the correct rate comes from the fact that two different sets of speech features are applied in the hybrid system: the KLT features and the MFCC features of the Gaussian mixture speaker model (GMM). Furthermore, the hybrid system performs classification sequentially: in the first stage, the relatively fast KLT features serve as an initial candidate selection tool to discard those speakers with larger separability; in the second stage, the GMM makes the ultimate recognition decision. Therefore, only a small portion of the speakers needs to be discriminated in the time-consuming GMM stage. Our results show that the combination benefits both classification accuracy and computational cost. The same hybrid KLT/GMM design is also applied to a robust speaker identification system: under both additive white Gaussian noise (AWGN) and car-noise environments, accuracy improvements and computational savings over the conventional GMM model are demonstrated.

Finally, a genetic algorithm (GA) is proposed in this thesis to improve the speaker identification performance of the vector quantizer (VQ) by avoiding the local minima typically incurred in the LBG process. The results indicate that this scheme is useful for our recognition application in practice.
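As a rough illustration of the two feature types, the following sketch contrasts KLT projections with their hard-limited counterpart. The data, dimensions, and function name are synthetic stand-ins, not taken from the thesis.

```python
import numpy as np

def klt_basis(X, k):
    """Eigenvectors of the sample covariance, sorted by decreasing eigenvalue."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:k]]                 # d x k basis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))                   # 200 frames, 32-dim samples

V = klt_basis(X, k=8)
klt_features = X @ V                             # floating-point projections

# Hard-limiting keeps only the zero-crossing (sign) pattern of each
# eigenvector, so the projection reduces to signed sums, i.e. integer
# arithmetic whenever the input samples are integer-valued.
V_hard = np.sign(V)
hlklt_features = X @ V_hard
```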
32

A Design of Multi-Session, Text Independent, TV-Recorded Audio-Video Database for Speaker Recognition

Wang, Long-Cheng 07 September 2006 (has links)
A four-session, text-independent, TV-recorded audio-video database for speaker recognition is collected in this thesis. The speaker data is used to verify the applicability of a design methodology based on Mel-frequency cepstrum coefficients and the Gaussian mixture model. Both single-session and multi-session problems are discussed. Experimental results indicate that a 90% correct rate can be achieved for a single-session 3000-speaker corpus, while only a 67% correct rate can be obtained for a two-session 800-speaker dataset. The performance of a multi-session speaker recognition system is greatly reduced by the variability of the recording environment, the speakers' recording mood, and other unknown factors. How to increase system performance under multi-session conditions becomes a challenging future task, and the establishment of such a multi-session, large-scale speaker database plays an indispensable role in that task.
33

A design of text-independent medium-size speaker recognition system

Zheng, Shun-De 13 September 2002 (has links)
This paper presents text-independent speaker identification results for medium-size speaker populations of up to 400 speakers, using TV speech and the TIMIT database. A system based on Gaussian mixture speaker models is used for speaker identification, and experiments are conducted on both the TV and TIMIT databases. The TV-database results show medium-size population performance under TV conditions. These are believed to be the first speaker identification experiments on the complete 400-speaker TV database and the largest text-independent speaker identification task reported to date. Identification accuracies of 94.5% on the TV database and 98.5% on the TIMIT database are obtained.
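A minimal sketch of the GMM-based identification scheme described above, with synthetic vectors standing in for MFCC frames; the mixture size and dimensions are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
n_speakers, dim = 5, 13                    # e.g. 13 MFCCs per frame

# Enrollment: fit one GMM per speaker on that speaker's training frames.
models = []
for s in range(n_speakers):
    frames = rng.normal(loc=s, size=(300, dim))   # stand-in training data
    gmm = GaussianMixture(n_components=8, covariance_type='diag',
                          random_state=0).fit(frames)
    models.append(gmm)

# Identification: pick the speaker whose model assigns the test utterance
# the highest average frame log-likelihood.
test = rng.normal(loc=2, size=(100, dim))         # frames from "speaker 2"
scores = [gmm.score(test) for gmm in models]      # mean log-likelihood
print("identified speaker:", int(np.argmax(scores)))
```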
34

A Feature Design of Multi-Language Identification System

Lin, Jun-Ching 17 July 2003 (has links)
A multi-language identification system covering 10 languages (Mandarin, Japanese, Korean, Tamil, Vietnamese, English, French, German, Spanish, and Farsi) is built in this thesis. The system uses cepstrum coefficients, delta cepstrum coefficients, and linear predictive coding coefficients to extract language features, and it incorporates a Gaussian mixture model and an N-gram model to perform the language classification. The feasibility of the system is demonstrated in this thesis.
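One plausible way to combine the acoustic and phonotactic evidence is a weighted sum of log-scores, sketched below. The toy bigram model, the fusion weight, and all counts are assumptions; the thesis does not specify this exact fusion rule.

```python
import numpy as np

def bigram_logprob(tokens, counts, vocab, alpha=1.0):
    """Add-alpha smoothed bigram log-probability of a token sequence."""
    lp = 0.0
    for a, b in zip(tokens[:-1], tokens[1:]):
        num = counts.get((a, b), 0) + alpha
        den = sum(counts.get((a, v), 0) for v in vocab) + alpha * len(vocab)
        lp += np.log(num / den)
    return lp

# Suppose per-language acoustic log-likelihoods came from fitted GMMs
# (as in the speaker-ID sketch above) and phone tokens from a recognizer.
acoustic_ll = {"mandarin": -412.3, "english": -405.9}   # assumed values
counts_en = {("t", "h"): 40, ("h", "e"): 35}            # toy bigram counts
counts_zh = {("n", "i"): 30, ("i", "h"): 25}
vocab = ["t", "h", "e", "n", "i"]
tokens = ["t", "h", "e"]                                 # recognized phones

w = 0.7   # fusion weight between acoustic and phonotactic evidence
fused = {
    "english": w * acoustic_ll["english"]
               + (1 - w) * bigram_logprob(tokens, counts_en, vocab),
    "mandarin": w * acoustic_ll["mandarin"]
                + (1 - w) * bigram_logprob(tokens, counts_zh, vocab),
}
print(max(fused, key=fused.get))
```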
35

Statistical analysis of high dimensional data

Ruan, Lingyan 05 November 2010 (has links)
This century is surely the century of data (Donoho, 2000). Data analysis has been an emerging activity over the last few decades, and high dimensional data in particular has become more and more pervasive with the advance of massive data collection systems such as microarrays, satellite imagery, and financial data. However, the analysis of high dimensional data is challenging because of the so-called curse of dimensionality (Bellman, 1961). This dissertation presents several methodologies for high dimensional data analysis.

The first part discusses a joint analysis of multiple microarray gene expression datasets. Microarray analysis dates back to Golub et al. (1999) and has drawn much attention since. One common goal of microarray analysis is to determine which genes are differentially expressed, i.e., behave significantly differently between groups of individuals. However, in microarray analysis there are thousands of genes but few arrays (samples, individuals), and thus reproducibility remains relatively low. It is natural to consider joint analyses that combine microarrays from different experiments effectively in order to achieve improved accuracy. In particular, we present a model-based approach for better identification of differentially expressed genes by incorporating data from different studies. The model can accommodate, in a seamless fashion, a wide range of studies, including those performed on different platforms and/or under different but overlapping biological conditions. Model-based inferences can be done in an empirical Bayes fashion. Because of the information sharing among studies, the joint analysis dramatically improves on inferences based on individual analysis. Simulation studies and real data examples are presented to demonstrate the effectiveness of the proposed approach under a variety of complications that often arise in practice.

The second part concerns covariance matrix estimation for high dimensional data. First, we propose a penalized likelihood estimator for the high dimensional t-distribution. The Student t-distribution is of increasing interest in mathematical finance, education, and many other applications, but its use is limited by the difficulty of estimating the covariance matrix in high dimensions. We show that by imposing a LASSO penalty on the Cholesky factors of the covariance matrix, the EM algorithm can compute the estimator efficiently, and it performs much better than other popular estimators. Secondly, we propose an estimator for high dimensional Gaussian mixture models. Finite Gaussian mixture models are widely used in statistics thanks to their great flexibility; however, parameter estimation for Gaussian mixture models with high dimensionality can be rather challenging because of the huge number of parameters to estimate. For this purpose, we propose a penalized likelihood estimator that specifically addresses such difficulties. The LASSO penalty we impose on the inverse covariance matrices encourages sparsity in their entries and therefore helps reduce the dimensionality of the problem. We show that the proposed estimator can be efficiently computed via an Expectation-Maximization (EM) algorithm. To illustrate the practical merits of the proposed method, we consider its application to model-based clustering and mixture discriminant analysis. Numerical experiments with both simulated and real data show that the new method is a valuable tool for handling high dimensional data.

Finally, we present structured estimators for high dimensional Gaussian mixture models. The graphical representations of the clusters in a Gaussian mixture model may share the same or similar structures, an important feature in many applications such as image processing, speech recognition, and gene network analysis; failing to account for the shared structure deteriorates estimation accuracy. To address this, we propose two structured estimators: a hierarchical Lasso estimator and a group Lasso estimator. An EM algorithm can be applied to conveniently solve the estimation problem. We show that when clusters share similar structures, the proposed estimators perform much better than the separate Lasso estimator.
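A minimal sketch of the penalized-GMM idea: one EM pass whose M-step swaps the usual covariance update for an L1-penalized graphical-lasso estimate per component, encouraging sparse inverse covariances. This illustrates the approach using scikit-learn's `graphical_lasso`; it is not the dissertation's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import graphical_lasso

def em_step(X, weights, means, covs, alpha=0.1):
    """One EM iteration with an L1-penalized covariance update per component."""
    n, _ = X.shape
    # E-step: responsibilities of each component for each point.
    resp = np.column_stack([w * multivariate_normal.pdf(X, m, c)
                            for w, m, c in zip(weights, means, covs)])
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: weighted moments, then a graphical-lasso covariance estimate
    # whose inverse (precision matrix) is encouraged to be sparse.
    new_w, new_m, new_c = [], [], []
    for j in range(len(weights)):
        r = resp[:, j]
        nj = r.sum()
        mu = (r[:, None] * X).sum(axis=0) / nj
        diff = X - mu
        emp_cov = (r[:, None] * diff).T @ diff / nj
        cov, _precision = graphical_lasso(emp_cov, alpha=alpha)
        new_w.append(nj / n); new_m.append(mu); new_c.append(cov)
    return np.array(new_w), new_m, new_c

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 1, (150, 4)), rng.normal(2, 1, (150, 4))])
w, m = np.array([0.5, 0.5]), [X[:150].mean(0), X[150:].mean(0)]
c = [np.eye(4), np.eye(4)]
for _ in range(10):
    w, m, c = em_step(X, w, m, c)
```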
36

Voice recognition system based on intra-modal fusion and accent classification

Mangayyagari, Srikanth 01 June 2007 (has links)
Speaker or voice recognition is the task of automatically recognizing people from their speech signals. This technique makes it possible to use uttered speech to verify a speaker's identity and control access to secured services. Surveillance, counter-terrorism, and homeland security departments can collect voice data from telephone conversations without having access to any other biometric dataset; in this type of scenario it would be beneficial if the confidence level of authentication were high. Other applicable areas include online transactions, database access services, information services, security control for confidential information areas, and remote access to computers. Speaker recognition systems, even though they have been around for four decades, have not been widely considered as standalone systems for biometric security because of their unacceptably low performance, i.e., high false acceptance and false rejection rates. This thesis focuses on enhancing speaker recognition through a combination of intra-modal fusion and accent modeling. Initial enhancement of speaker recognition was achieved through intra-modal hybrid fusion (HF) of likelihood scores generated by the Arithmetic Harmonic Sphericity (AHS) and Hidden Markov Model (HMM) techniques. Due to the contrastive nature of AHS and HMM, we observed significant performance improvements of 22%, 6%, and 23% in true acceptance rate (TAR) at 5% false acceptance rate (FAR) when this fusion technique was evaluated on three different datasets: YOHO, USF multi-modal biometric, and Speech Accent Archive (SAA), respectively. Performance enhancement was achieved on all datasets; however, performance on YOHO was comparatively higher than on the USF dataset, owing to the fact that USF is a noisy outdoor dataset whereas YOHO is an indoor dataset. To further increase the speaker recognition rate at lower FARs, we combined accent information from an accent classification (AC) system with our earlier HF system. In homeland security applications, speaker accent will play a critical role in the evaluation of biometric systems, since users will be international in nature, so incorporating accent information into the speaker recognition/verification system is a key component of our study. The proposed system achieved further performance improvements of 17% and 15% TAR at an FAR of 3% when evaluated on the SAA and USF multi-modal biometric datasets. The accent incorporation method and the hybrid fusion techniques discussed in this work can also be applied to other speaker recognition systems.
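The hybrid fusion step can be illustrated in a few lines: normalize the score sets from the two recognizers, fuse with a weighted sum, and read off acceptance rates at a threshold. The scores, labels, and fusion weight below are assumed values for illustration.

```python
import numpy as np

def minmax(scores):
    """Map a score set onto [0, 1] so the two systems are comparable."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min())

# Per-trial scores from two systems (e.g. AHS-style and HMM-style models),
# one entry per verification trial; higher means "same speaker".
scores_a = np.array([0.2, 0.9, 0.4, 0.8, 0.1, 0.7])
scores_b = np.array([3.1, 9.5, 2.7, 8.8, 1.9, 7.2])
labels   = np.array([0, 1, 0, 1, 0, 1])      # 1 = genuine trial

w = 0.5                                      # assumed fusion weight
fused = w * minmax(scores_a) + (1 - w) * minmax(scores_b)

# Pick an operating threshold and report acceptance rates.
thr = 0.5
tar = (fused[labels == 1] >= thr).mean()     # true acceptance rate
far = (fused[labels == 0] >= thr).mean()     # false acceptance rate
print(f"TAR={tar:.2f}  FAR={far:.2f}")
```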
37

HIGH QUALITY HUMAN 3D BODY MODELING, TRACKING AND APPLICATION

Zhang, Qing 01 January 2015 (has links)
Geometric reconstruction of dynamic objects is a fundamental task in computer vision and graphics, and high-fidelity modeling of the human body is considered a core part of this problem. Traditional human shape and motion capture techniques require an array of surrounding cameras or require subjects to wear reflective markers, which limits working space and portability. In this dissertation, a complete pipeline is designed: from geometric modeling of a detailed 3D human full body, to capturing shape dynamics over time using a flexible setup, to guiding clothes/person re-targeting with such data-driven models. The mechanical movement of the human body can be considered an articulated motion, which readily drives skin animation but is difficult to invert, i.e., to recover parameters from images without manual intervention. We therefore present a novel parametric model, GMM-BlendSCAPE, which jointly takes the linear skinning model and the prior art of BlendSCAPE (Blend Shape Completion and Animation for PEople) into consideration, and we develop a Gaussian mixture model (GMM) to infer both body shape and pose from incomplete observations. We show increased accuracy of joint and skin-surface estimation using our model compared to skeleton-based motion tracking. To model the detailed body, we start by capturing high-quality partial 3D scans using a single-view commercial depth camera. Based on GMM-BlendSCAPE, we can then reconstruct multiple complete static models covering large pose differences via our novel non-rigid registration algorithm. With vertex correspondences established, these models can be further converted into a personalized drivable template and used for robust pose tracking in a similar GMM framework. Moreover, we design a general-purpose real-time non-rigid deformation algorithm to accelerate this registration. Last but not least, we demonstrate a novel virtual clothes try-on application based on our personalized model, utilizing both image and depth cues to synthesize and re-target clothes for single-view videos of different people.
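One ingredient above, scoring incomplete observations against a Gaussian mixture fitted to 3D points, can be sketched as follows. The synthetic "scan" and component count are assumptions, and the full GMM-BlendSCAPE pipeline is far richer than this.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Stand-in for a partial depth-camera scan: noisy points on half a cylinder.
theta = rng.uniform(0, np.pi, 2000)            # only half the surface seen
z = rng.uniform(0, 1, 2000)
scan = np.column_stack([np.cos(theta), np.sin(theta), z])
scan += rng.normal(scale=0.01, size=scan.shape)

# The mixture gives a smooth density over the observed surface; new
# (possibly incomplete) observations can then be scored against it.
gmm = GaussianMixture(n_components=20, covariance_type='full',
                      random_state=0).fit(scan)
partial_view = scan[:200] + rng.normal(scale=0.02, size=(200, 3))
print("mean log-likelihood of partial view:", gmm.score(partial_view))
```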
38

Nonlinear orbit uncertainty prediction and rectification for space situational awareness

DeMars, Kyle Jordan 07 February 2011 (has links)
A new method for predicting the uncertainty in a nonlinear dynamical system is developed and analyzed in the context of uncertainty evolution for resident space objects (RSOs) in the near-geosynchronous orbit regime under the influence of central body gravitational acceleration, third body perturbations, and attitude-dependent solar radiation pressure (SRP) accelerations and torques. The new method, termed the splitting Gaussian mixture unscented Kalman filter (SGMUKF), exploits properties of the differential entropy or Renyi entropy for a linearized dynamical system to determine when a higher-order prediction of uncertainty reaches a level of disagreement with a first-order prediction, and then applies a multivariate Gaussian splitting algorithm to reduce the impact of induced nonlinearity. In order to address the relative accuracy of the new method with respect to the more traditional approaches of the extended Kalman filter (EKF) and unscented Kalman filter (UKF), several concepts regarding the comparison of probability density functions (pdfs) are introduced and utilized in the analysis. The research also describes high-fidelity modeling of the nonlinear dynamical system which drives the motion of an RSO, and includes models for evaluation of the central body gravitational acceleration, the gravitational acceleration due to other celestial bodies, and attitude-dependent SRP accelerations and torques when employing a macro plate model of an RSO. Furthermore, a high-fidelity model of the measurement of the line-of-sight of a spacecraft from a ground station is presented, which applies light-time and stellar aberration corrections, and accounts for observer and target lighting conditions, as well as for the sensor field of view. The developed algorithms are applied to the problem of forward predicting the time evolution of the region of uncertainty for RSO tracking, and uncertainty rectification via the fusion of incoming measurement data with prior knowledge. It is demonstrated that the SGMUKF method is significantly better able to forward predict the region of uncertainty and is subsequently better able to utilize new measurement data.
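The splitting ingredient can be illustrated in isolation: replace one multivariate Gaussian with a two-component mixture displaced along its dominant eigenvector, moment-matched to the original. The entropy-based trigger and the full SGMUKF propagation are not reproduced in this sketch.

```python
import numpy as np

def split_gaussian(mean, cov, spread=0.5):
    """Split N(mean, cov) into two equally weighted components displaced
    along the principal axis, shrinking the variance along that axis so
    the mixture mean and covariance match the original exactly."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    lam, v = eigvals[-1], eigvecs[:, -1]        # dominant mode
    d = spread * np.sqrt(lam)                   # displacement (d**2 < lam)
    cov_split = cov - d**2 * np.outer(v, v)     # keeps mixture cov == cov
    return ([0.5, 0.5],
            [mean + d * v, mean - d * v],
            [cov_split, cov_split])

mu = np.zeros(2)
P = np.array([[4.0, 1.0], [1.0, 1.0]])
w, means, covs = split_gaussian(mu, P)

# Check the moment match of the resulting mixture.
mix_mean = sum(wi * m for wi, m in zip(w, means))
mix_cov = sum(wi * (c + np.outer(m - mix_mean, m - mix_mean))
              for wi, m, c in zip(w, means, covs))
print(np.allclose(mix_cov, P))    # True
```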
39

Automatic speaker recognition using closed-set recognition methods

Κεραμεύς, Ηλίας 03 August 2009 (has links)
The goal of an automatic speaker recognition system is inextricably linked to the extraction, characterization, and recognition of information about a speaker's identity. Speaker recognition refers either to speaker identification or to speaker verification. Depending on the form of the decision it returns, an identification system can be characterized as open-set or closed-set. If, given an unknown voice sample, a system responds with a deterministic decision on whether the sample belongs to a specific speaker or to an unknown one, it is characterized as an open-set identification system. On the other hand, if the system returns the most likely speaker, among those already enrolled in the database, from whom the voice sample originates, it is characterized as a closed-set system. Closed-set identification can further be characterized as text-dependent or text-independent, depending on whether the system knows the uttered phrase or is able to recognize the speaker from any phrase that may be uttered. In this work, automatic speaker recognition algorithms based on closed-set, text-independent identification systems are examined and implemented. Specifically, the implemented algorithms are based on the idea of vector quantization, on stochastic models, and on neural networks.
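A minimal sketch of the vector-quantization approach mentioned above: one k-means (LBG-style) codebook per speaker, with identification by minimum average distortion over the test frames. Synthetic vectors stand in for real speech features.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

rng = np.random.default_rng(4)
n_speakers, dim = 4, 12

# Enrollment: train one codebook per speaker from that speaker's frames.
codebooks = []
for s in range(n_speakers):
    frames = rng.normal(loc=s, size=(400, dim))
    codebook, _ = kmeans(frames, k_or_guess=16)   # LBG-style codebook
    codebooks.append(codebook)

# Identification: the codebook with the lowest mean quantization error wins.
test = rng.normal(loc=1, size=(120, dim))          # frames from "speaker 1"
distortions = [vq(test, cb)[1].mean() for cb in codebooks]
print("identified speaker:", int(np.argmin(distortions)))
```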
40

INFORMATION THEORETIC CRITERIA FOR IMAGE QUALITY ASSESSMENT BASED ON NATURAL SCENE STATISTICS

Zhang, Di January 2006 (has links)
Measurement of visual quality is crucial for various image and video processing applications. The goal of objective image quality assessment is to introduce a computational quality metric that can predict image or video quality. Many methods have been proposed in past decades. Traditionally, measurements convert the spatial data into some other feature domain, such as the Fourier domain, and compute a similarity, such as mean square distance or Minkowski distance, between the test data and the reference or perfect data; however, only limited success has been achieved. None of the complicated metrics show any great advantage over other existing metrics.

The common idea shared among many proposed objective quality metrics is that human visual error sensitivities vary across spatial and temporal frequency and directional channels. In this thesis, image quality assessment is approached by proposing a novel framework that computes the information lost in each channel, rather than the similarities used in previous methods. Based on natural scene statistics and several image models, an information theoretic framework is designed to compute the perceptual information contained in images and to evaluate image quality in the form of entropy.

The thesis is organized as follows. Chapter I gives a general introduction to previous work in this research area and a brief description of the human visual system. Chapter II reviews statistical models for natural scenes. Chapter III proposes the core ideas for computing the perceptual information contained in images. Chapter IV defines information theoretic criteria for image quality assessment. Chapter V presents the simulation results in detail. The last chapter discusses future directions and improvements of this research.
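The "quality as lost information" framing can be illustrated with a crude histogram entropy of a single high-pass channel. This toy sketch is not the thesis's NSS-based metric; the images and filter are stand-ins.

```python
import numpy as np

def channel_entropy(img, bins=64):
    """Shannon entropy (bits) of horizontal high-pass coefficients."""
    coeffs = np.diff(img.astype(float), axis=1).ravel()
    hist, _ = np.histogram(coeffs, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(5)
reference = rng.normal(size=(128, 128)).cumsum(axis=1)     # stand-in image
blurred = (reference + np.roll(reference, 1, axis=1)) / 2  # mild distortion

# Distortion removes bandpass detail, so the channel entropy drops.
print("reference channel entropy:", channel_entropy(reference))
print("distorted channel entropy:", channel_entropy(blurred))
```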
