1

Person Identification Based on Karhunen-Loeve Transform

Chen, Chin-Ta 16 July 2004
In this dissertation, person identification systems based on the Karhunen-Loeve transform (KLT) are investigated. Both speaker and face recognition are considered in our design. Among the many system design issues, three important problems are addressed in this thesis: how to improve the correct classification rate, how to reduce the computational cost, and how to increase the robustness of the system. Improvement of the correct classification rate and reduction of the computational cost can be accomplished by an appropriate feature design methodology. KLT and hard-limited KLT (HLKLT) are proposed here to extract class-related features. Theoretically, KLT is the optimal transform in the minimum mean square error and maximal energy packing sense. The transformed data is totally uncorrelated and contains most of the classification information in the first few coordinates. Therefore, a satisfactory correct classification rate can be achieved by using only the first few KLT-derived eigenfeatures. In the above data transformation process, the transformed data is calculated from the inner products of the original samples and the selected eigenvectors. This computation requires floating-point arithmetic. If this linear transformation can be further reduced to integer arithmetic, the time used for both person feature training and person classification will be greatly reduced. The hard-limiting process (HLKLT) is used here to extract the zero-crossing information in the eigenvectors, which is hypothesized to contain important information that can be used for classification. This kind of feature tremendously simplifies the linear transformation process, since the computation is merely integer arithmetic.
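As a rough illustration of the contrast drawn above, the sketch below projects a sample onto eigenvectors twice: once with the floating-point eigenvectors and once with a hard-limited copy that keeps only the sign of each component. Treating hard-limiting as a plain sign function is an assumption made for this example, and the eigenvector values, the sample, and the function names are hypothetical, not taken from the dissertation.

```python
def hard_limit(vector):
    """Keep only the sign of each component (+1 or -1); assumed form of hard-limiting."""
    return [1 if x >= 0 else -1 for x in vector]


def project(sample, basis):
    """Return the inner product of the sample with each basis vector."""
    return [sum(s * b for s, b in zip(sample, vec)) for vec in basis]


# Hypothetical eigenvectors and an integer-valued sample (illustrative values only).
eigvecs = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
hl_basis = [hard_limit(v) for v in eigvecs]
sample = [3, 1, 4, 2]

klt_features = project(sample, eigvecs)   # floating-point multiply-accumulate
hl_features = project(sample, hl_basis)   # integer additions and subtractions only
```

With an integer sample, every hard-limited feature is an integer, which is the source of the computational saving the abstract describes.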
In this thesis, it is demonstrated that the hard-limited KL transform has a much simpler structure than the KL transform and possesses approximately the same excellent performance for both the speaker identification system and the face recognition system. Moreover, a hybrid KLT/GMM speaker identification system is proposed in this thesis to improve the classification rate and to save computational time. The increase in the correct rate comes from the fact that two different sets of speech features, one from the KLT features and the other from the MFCC features of the Gaussian mixture speaker model (GMM), are applied in the hybrid system. Furthermore, this hybrid system performs classification in a sequential manner. In the first stage, the relatively faster KLT features are used as the initial candidate selection tool to discard those speakers with larger separability. Then, in the second stage, the GMM is used as the final speaker recognition means to make the ultimate decision. Therefore, only a small portion of the speakers need to be discriminated in the time-consuming GMM stage. Our results show that the combination is beneficial to both classification accuracy and computational cost. The above hybrid KLT/GMM design is also applied to a robust speaker identification system. Under both additive white Gaussian noise (AWGN) and car noise environments, it is demonstrated that accuracy improvement and computational saving compared to the conventional GMM model can be achieved. A genetic algorithm (GA) is proposed in this thesis to improve the speaker identification performance of the vector quantizer (VQ) by avoiding the typical local minima incurred in the LBG process. The results indicate that this scheme is useful for our application in recognition and practice.
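The sequential design described above can be sketched as a generic two-stage classifier: a cheap score shortlists candidates and an expensive score decides among the survivors. The scoring functions here are simple distances standing in for the KLT and GMM stages, and the speaker templates, the `keep` parameter, and all names are hypothetical.

```python
def two_stage_identify(sample, speakers, fast_score, slow_score, keep=2):
    """Shortlist with a cheap score, then decide with a costly one (lower is better)."""
    shortlist = sorted(speakers, key=lambda name: fast_score(sample, speakers[name]))[:keep]
    best = min(shortlist, key=lambda name: slow_score(sample, speakers[name]))
    return best, shortlist


slow_calls = []  # records how often the expensive stage runs


def fast_score(sample, template):
    # Stand-in for the fast stage: compare the first coordinate only.
    return abs(sample[0] - template[0])


def slow_score(sample, template):
    # Stand-in for the slow stage: full squared Euclidean distance.
    slow_calls.append(1)
    return sum((a - b) ** 2 for a, b in zip(sample, template))


# Hypothetical speaker templates (illustrative values only).
speakers = {"A": [1.5, 5.0], "B": [1.25, 1.0], "C": [4.0, 1.0], "D": [6.0, 1.0]}
best, shortlist = two_stage_identify([1.0, 1.0], speakers, fast_score, slow_score)
```

In this toy run the expensive score is evaluated for two of the four speakers, which mirrors the claim that only a small portion of speakers reaches the time-consuming stage.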
2

Mitigating discontinuities in segmented Karhunen-Loeve Transforms

Stadnicka, Monika, Blanes, Ian, Serra-Sagrista, Joan, Marcellin, Michael W. 09 1900
The Karhunen-Loeve Transform (KLT) is a popular transform used in multiple image processing scenarios. Sometimes, the KLT is not applied as a single transform over an entire image. Rather, the image is divided into smaller spatial regions (segments), each of which is transformed by a smaller-dimensional KLT. Such a situation may penalize the transform efficiency. An improvement for the segmented KLT, aiming at mitigating discontinuities arising on the edge of adjacent regions, is proposed in this paper. In the case of moderately varying image regions, discontinuities occur as a consequence of disregarded similarity between transform domains, as the order and sign of eigenvectors in the transform matrices are mismatched. In the proposed method, the KLT is adjusted to guarantee the best achievable similarity via the optimal assignment and sign correspondence for eigenvectors. Experimental results indicate that the proposed transform improves the similarity between transform domains and reduces RMSE on the edge of adjacent regions. As a consequence, images processed by the adjusted KLT present better cohesion and continuity between independently transformed regions.
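One way to picture the order and sign correspondence is to reorder the eigenvectors of one segment so they best match those of a neighbouring segment, then flip any vector that points the opposite way. The sketch below does this with a brute-force search over permutations, which stands in for a proper optimal-assignment solver and is only practical for very small matrices. The similarity measure (sum of absolute inner products), the function names, and the example vectors are assumptions; the paper's exact criterion is not stated in the abstract.

```python
from itertools import permutations


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def align_eigenvectors(reference, candidate):
    """Reorder and sign-flip candidate vectors to best match the reference vectors."""
    n = len(reference)
    # Brute-force assignment: maximize total absolute similarity (assumed criterion).
    best_perm = max(
        permutations(range(n)),
        key=lambda p: sum(abs(dot(reference[i], candidate[p[i]])) for i in range(n)),
    )
    aligned = []
    for i, j in enumerate(best_perm):
        vec = candidate[j]
        if dot(reference[i], vec) < 0:  # opposite direction: flip the sign
            vec = [-x for x in vec]
        aligned.append(vec)
    return aligned


# Hypothetical eigenvectors of two adjacent segments: same axes, swapped and sign-flipped.
reference = [[0.6, 0.8], [-0.8, 0.6]]
candidate = [[0.8, -0.6], [0.6, 0.8]]
aligned = align_eigenvectors(reference, candidate)
```

Here the second segment's eigenvectors describe the same directions as the first segment's, so after alignment the two sets coincide.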
3

A Design of Speech Recognition System for Two-Word Mandarin Phrases

Jheng, He-de 06 September 2007
The objective of this thesis is to increase the correct recognition rate of two-word Mandarin phrases. Inaccuracy arises from the ambiguities of the syllables and the intonations. For the syllable ambiguity, a balanced speech training dataset is designed and the weights of the state observation probabilities on vowels and consonants are adjusted. For the tone ambiguity, both the pitch contour and the spectrum evolution property derived from the Karhunen-Loève transform are applied. The experimental results indicate that an 85% correct rate can be achieved, a 6% increase over the performance of the system without the above improvements.
4

A design of face recognition system

Jiang, Ming-Hong 11 August 2003
The design of a face recognition system (FRS) can be separated into two major modules: face detection and face recognition. In the face detection part, we combine image pre-processing techniques with maximum-likelihood estimation to detect the nearest frontal face in a single image. Under limited restrictions, our detection method overcomes some of the challenging tasks, such as variability in scale, location, orientation, facial expression, occlusion (glasses), and lighting change. In the face recognition part, we use both the Karhunen-Loeve transform and linear discriminant analysis (LDA) to perform feature extraction. In this feature extraction process, the features are calculated from the inner products of the original samples and the selected eigenvectors. In general, as the size of the face database increases, the recognition time increases proportionally. To solve this problem, the hard-limited Karhunen-Loeve transform (HLKLT) is applied to reduce the computation time in our FRS.
5

A Hybrid Design of Speech Recognition System for Chinese Names

Hsu, Po-Min 06 September 2004
A speech recognition system for Chinese names based on the Karhunen-Loeve transform (KLT), MFCC, the hidden Markov model (HMM), and the Viterbi algorithm is proposed in this thesis. KLT is the optimal transform in the minimum mean square error and maximal energy packing sense and is used to reduce data. HMM is a stochastic approach that characterizes much of the variability in the speech signal by recording the state transitions. For the speaker-dependent case, a correct identification rate of 93.97% can be achieved within 3 seconds in the laboratory environment.
6

Compression of Hyperspectral Images

Cheng, Kai-Jen January 2013
No description available.
7

Detection of Human Emotion from Noise Speech

Nallamilli, Sai Chandra Sekhar Reddy, Kandi, Nihanth January 2020
Detecting human emotion from speech is a challenging task. Factors such as intonation, pitch, and loudness of the signal vary between human voices, so the exact pitch, intonation, and loudness of a speech signal must be known before detection is possible. Some voices exhibit high background noise, which affects the amplitude or pitch of the signal. Knowing the detailed properties of the speech is therefore mandatory for detecting emotion. Detection of emotion in humans from speech signals is a recent research field. One of the scenarios where this field has been applied is in situations where human integrity and security are at risk. In this project we propose a set of features based on the decomposition signals from the discrete wavelet transform to characterize different types of negative emotions such as anger, happy, sad, and desperation. The features are measured in three different conditions: (1) the original speech signals, (2) the signals that are contaminated with noise or are affected by the presence of a phone channel, and (3) the signals that are obtained after processing using an algorithm for Speech Enhancement Transform. According to the results, when speech enhancement is applied, the detection of emotion in speech improves compared to the results obtained when the speech signal is highly contaminated with noise. Our objective is to use an artificial neural network, because the brain is the most efficient and best machine for recognizing speech and is itself built from a neural network. At the same time, artificial neural networks are clearly advanced with respect to several features, such as their nonlinearity and high classification capability. Artificial neural networks can therefore be used to build a machine or computer that detects emotion. Here we use a feedforward neural network, which is suitable for classification, with the sigmoid function as the activation function.
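For readers unfamiliar with the classifier named above, the following is a minimal sketch of the forward pass of a feedforward network with sigmoid activations. The layer sizes, weights, input features, and function names are hypothetical and untrained; they only show how features flow through the layers, not the authors' actual network.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer followed by the sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(features, layers):
    """Propagate a feature vector through each (weights, biases) layer in turn."""
    activations = features
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


# Hypothetical, untrained parameters: 3 input features, 2 hidden units, 2 output classes.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
]
scores = forward([0.2, 0.7, 0.1], layers)
predicted = max(range(len(scores)), key=scores.__getitem__)  # index of the highest score
```

In practice the weights would be learned from labelled speech features; this sketch covers inference only.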
The detection of human emotion from speech is achieved by training the neural network with features extracted from the speech. To achieve this, we need proper features from the speech, so the background noise in the speech must be removed. Background noise can be removed by using filters. The wavelet transform is the filtering technique used to remove the background noise and enhance the required features in the speech.
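A minimal sketch of wavelet-based noise filtering follows, assuming a single-level Haar transform with hard thresholding of the detail coefficients. The abstract does not name the wavelet, the number of levels, or the thresholding rule, so all three are assumptions, as are the function names and the sample values; the input must have even length.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)


def haar_forward(signal):
    """Single-level Haar transform: pairwise averages (approx) and differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the signal from approximation and detail coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SCALE, (a - d) * SCALE])
    return out


def denoise(signal, threshold):
    """Zero out small detail coefficients (hard thresholding), then invert."""
    approx, detail = haar_forward(signal)
    kept = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_inverse(approx, kept)


# Hypothetical samples: two flat segments with small fluctuations treated as noise.
noisy = [1.0, 1.2, 5.0, 4.8]
smoothed = denoise(noisy, threshold=0.5)
unchanged = denoise(noisy, threshold=0.0)
```

With the larger threshold the small pairwise fluctuations are removed while the jump between the two segments is preserved; with a zero threshold the signal is reconstructed unchanged.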
