  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Driver Modeling Based on Driving Behavior and Its Evaluation in Driver Identification

Miyajima, Chiyomi, Nishiwaki, Yoshihiro, Ozawa, Koji, Wakita, Toshihiro, Itou, Katsunobu, Takeda, Kazuya, Itakura, Fumitada January 2007
No description available.
2

Balance-guaranteed optimized tree with reject option for live fish recognition

Huang, Xuan January 2014
This thesis investigates the computer vision application of live fish recognition, which is needed where manual annotation is too expensive because there are too many underwater videos. Such a system can assist ecological surveillance research, e.g. computing fish population statistics in the open sea. Some pre-processing procedures are employed to improve the recognition accuracy, and then 69 types of features are extracted. These features are a combination of colour, shape and texture properties in different parts of the fish such as tail/head/top/bottom, as well as the whole fish. Then, we present a novel Balance-Guaranteed Optimized Tree with Reject option (BGOTR) for live fish recognition. It improves the normal hierarchical method by arranging more accurate classifications at a higher level and keeping the hierarchical tree balanced. BGOTR is automatically constructed based on inter-class similarities. We apply a Gaussian Mixture Model (GMM) and Bayes rule as a reject option after the hierarchical classification to evaluate the posterior probability of being a certain species and to filter out less confident decisions. This novel classification-rejection method cleans up decisions and rejects unknown classes. After constructing the tree architecture, a novel trajectory voting method is used to eliminate accumulated errors during hierarchical classification and, therefore, achieves better performance. The proposed BGOTR-based hierarchical classification method is applied to recognize the 15 major species in 24150 manually labelled fish images and to detect new species in an unrestricted natural environment recorded by underwater cameras in the sea off southern Taiwan. It achieves significant improvements compared to the state-of-the-art techniques. Furthermore, the order in which feature selection and construction of a multi-class SVM are performed is investigated.
We propose that an Individual Feature Selection (IFS) procedure can be applied directly to the binary One-versus-One SVMs before assembling the full multiclass SVM. The IFS method selects a different subset of features for each One-versus-One SVM inside the multiclass classifier so that each vote is optimized to discriminate the two specific classes. The proposed IFS method is tested on four different datasets, comparing performance and time cost. Experimental results demonstrate significant improvements compared to the normal Multiclass Feature Selection (MFS) method on all datasets.
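The reject option described in this abstract (a GMM per class plus Bayes rule, applied after classification) can be illustrated with a minimal sketch. This is not the thesis's implementation: the scalar feature, the species names, the mixture parameters, the priors, and the 0.9 threshold are all assumptions chosen for illustration. Because posteriors are normalized over the known classes, this simplified form mainly filters ambiguous decisions.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_likelihood(x, components):
    """Likelihood of x under a mixture given as (weight, mean, variance) tuples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def classify_with_reject(x, predicted, class_gmms, priors, threshold=0.9):
    """Return `predicted` if its Bayes posterior reaches `threshold`, else None."""
    evidence = sum(priors[c] * gmm_likelihood(x, g) for c, g in class_gmms.items())
    if evidence == 0.0:
        return None  # x is far from every class model: reject
    posterior = priors[predicted] * gmm_likelihood(x, class_gmms[predicted]) / evidence
    return predicted if posterior >= threshold else None


# Hypothetical two-species models on a single scalar feature.
class_gmms = {
    "species_a": [(1.0, 0.0, 1.0)],
    "species_b": [(0.5, 4.0, 1.0), (0.5, 6.0, 1.0)],
}
priors = {"species_a": 0.5, "species_b": 0.5}
```

A sample near the centre of one class model is accepted, while a sample lying between the two models has a posterior below the threshold and is rejected (the function returns `None`).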
3

SPEAKER AND GENDER IDENTIFICATION USING BIOACOUSTIC DATA SETS

Jose, Neenu 01 January 2018
Acoustic analysis of animal vocalizations has been widely used to identify the presence of individual species, classify vocalizations, identify individuals, and determine gender. In this work, automatic identification of the speaker and gender of mice from ultrasonic vocalizations and speaker identification of meerkats from their Close calls are investigated. Feature extraction was implemented using Greenwood Function Cepstral Coefficients (GFCC), designed exclusively for extracting features from animal vocalizations. Mice ultrasonic vocalizations were analyzed using Gaussian Mixture Models (GMM), which yielded an accuracy of 78.3% for speaker identification and 93.2% for gender identification. Meerkat speaker identification with Close calls was implemented using Gaussian Mixture Models (GMM) and Hidden Markov Models (HMM), with accuracies of 90.8% and 94.4% respectively. The results obtained with these methods indicate the presence of gender and identity information in vocalizations and support the possibility of robust gender identification and individual identification using bioacoustic data sets.
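GMM-based speaker identification of the kind mentioned in this abstract is commonly done by scoring the feature frames of a recording against one mixture model per individual and picking the best-scoring model. The sketch below assumes scalar features in place of GFCC vectors, and the speaker names and mixture parameters are hypothetical. It is not taken from the thesis.

```python
import math


def log_gmm(x, components):
    """Log-likelihood of scalar x under (weight, mean, variance) components."""
    total = sum(
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in components
    )
    return math.log(total) if total > 0.0 else float("-inf")


def identify(frames, speaker_gmms):
    """Return the speaker whose GMM gives the highest summed frame log-likelihood."""
    scores = {
        name: sum(log_gmm(x, gmm) for x in frames)
        for name, gmm in speaker_gmms.items()
    }
    return max(scores, key=scores.get)


# Hypothetical per-individual models; real systems would fit these with EM.
speaker_gmms = {
    "mouse_1": [(0.6, 1.0, 0.5), (0.4, 3.0, 0.5)],
    "mouse_2": [(0.5, 6.0, 0.5), (0.5, 8.0, 0.5)],
}
```

Summing per-frame log-likelihoods treats the frames as independent, which is the usual simplification for GMM scoring. An HMM would add temporal structure on top of this.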
4

Bitrate Reduction Techniques for Low-Complexity Surveillance Video Coding

Gorur, Pushkar January 2016
High resolution surveillance video cameras are invaluable resources for effective crime prevention and forensic investigations. However, the increasing communication bandwidth requirements of high definition surveillance videos severely limit the number of cameras that can be deployed. A higher bitrate also increases operating expenses due to higher data communication and storage costs. Hence, it is essential to develop low complexity algorithms which reduce the data rate of the compressed video stream without affecting the image fidelity. In this thesis, a computer vision aided H.264 surveillance video encoder and four associated algorithms are proposed to reduce the bitrate. The proposed techniques are (I) Speeded up foreground segmentation, (II) Skip decision, (III) Reference frame selection and (IV) Face Region-of-Interest (ROI) coding. In the first part of the thesis, a modification to the adaptive Gaussian Mixture Model (GMM) based foreground segmentation algorithm is proposed to reduce computational complexity. This is achieved by replacing expensive floating point computations with low cost integer operations. To maintain accuracy, we compute periodic floating point updates for the GMM weight parameter using the value of an integer counter. Experiments show speedups in the range of 1.33 to 1.44 on standard video datasets where a large fraction of pixels are multimodal. In the second part, we propose a skip decision technique that uses a spatial sampler to sample pixels. The sampled pixels are segmented using the speeded up GMM algorithm. The storage pattern of the GMM parameters in memory is also modified to improve cache performance. Skip selection is performed using the segmentation results of the sampled pixels. In the third part, a reference frame selection algorithm is proposed to maximize the number of background Macroblocks (MBs), i.e. MBs that contain background image content, in the Decoded Picture Buffer.
This reduces the cost of coding uncovered background regions. Distortion over foreground pixels is measured to quantify the performance of the skip decision and reference frame selection techniques. Experimental results show bit rate savings of up to 94.5% over methods proposed in the literature on video surveillance data sets. The proposed techniques also provide up to 74.5% reduction in compression complexity without increasing the distortion over the foreground regions in the video sequence. In the final part of the thesis, face and shadow region detection is combined with the skip decision algorithm to perform ROI coding for pedestrian surveillance videos. Since person identification requires high quality face images, MBs containing face image content are encoded with a low Quantization Parameter (QP) setting (i.e. high quality). Other regions of the body in the image are considered regions of reduced interest (RORI) and are encoded at low quality. The shadow regions are marked as Skip. Techniques that use only facial features to detect faces (e.g. the Viola-Jones face detector) are not robust in real world scenarios. Hence, we propose to initially detect pedestrians using deformable part models. The face region is determined using the deformed part locations. Detected pedestrians are tracked using an optical flow based tracker combined with a Kalman filter. The tracker improves the accuracy and also avoids the need to run the object detector on already detected pedestrians. Shadow and skin detector scores are computed over superpixels. Bilattice-based logic inference is used to combine multiple likelihood scores and classify the superpixels as ROI, RORI or RONI. The coding mode and QP values of the MBs are determined using the superpixel labels. The proposed techniques provide a further reduction in bitrate of up to 50.2%.
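The skip decision based on spatially sampled pixels can be pictured with a rough sketch. The abstract does not give the sampling pattern or the decision rule, so the regular 4-pixel grid, the 16x16 macroblock size, and the rule "skip when no sampled pixel is foreground" are all assumptions. A precomputed boolean mask also stands in for the segmentation, which in the described encoder would be run only at the sampled positions.

```python
def sample_positions(mb_size=16, step=4):
    """Regular spatial sampling grid inside one macroblock (assumed pattern)."""
    return [(r, c) for r in range(0, mb_size, step) for c in range(0, mb_size, step)]


def skip_decisions(foreground, mb_size=16, step=4):
    """Mark a macroblock as skippable when no sampled pixel is foreground.

    `foreground` is a 2-D list of booleans whose height and width are
    multiples of `mb_size`. Only the sampled positions are ever read.
    """
    rows, cols = len(foreground), len(foreground[0])
    grid = sample_positions(mb_size, step)
    decisions = []
    for top in range(0, rows, mb_size):
        row = []
        for left in range(0, cols, mb_size):
            hit = any(foreground[top + r][left + c] for r, c in grid)
            row.append(not hit)  # True means "code this macroblock as Skip"
        decisions.append(row)
    return decisions


def empty_mask(rows, cols):
    """Helper for the example: an all-background mask."""
    return [[False] * cols for _ in range(rows)]
```

With this grid only 16 of the 256 pixels in a macroblock are examined, which is where the complexity saving comes from. The cost is that a foreground object falling entirely between sample points would be missed.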
5

Speaker adaptation of deep neural network acoustic models using Gaussian mixture model framework in automatic speech recognition systems / Utilisation de modèles gaussiens pour l'adaptation au locuteur de réseaux de neurones profonds dans un contexte de modélisation acoustique pour la reconnaissance de la parole

Tomashenko, Natalia 01 December 2017
Differences between training and testing conditions may significantly degrade recognition accuracy in automatic speech recognition (ASR) systems. Adaptation is an efficient way to reduce the mismatch between models and data from a particular speaker or channel.
There are two dominant types of acoustic models (AMs) used in ASR: Gaussian mixture models (GMMs) and deep neural networks (DNNs). The GMM hidden Markov model (GMM-HMM) approach has been one of the most common techniques in ASR systems for many decades. Speaker adaptation is very effective for these AMs and various adaptation techniques have been developed for them. On the other hand, DNN-HMM AMs have recently achieved big advances and outperformed GMM-HMM models for various ASR tasks. However, speaker adaptation is still very challenging for these AMs. Many adaptation algorithms that work well for GMM systems cannot be easily applied to DNNs because of the different nature of these models. The main purpose of this thesis is to develop a method for efficient transfer of adaptation algorithms from the GMM framework to DNN models. A novel approach for speaker adaptation of DNN AMs is proposed and investigated. The idea of this approach is based on using so-called GMM-derived features as input to a DNN. The proposed technique provides a general framework for transferring adaptation algorithms, developed for GMMs, to DNN adaptation. It is explored for various state-of-the-art ASR systems and is shown to be effective in comparison with other speaker adaptation techniques and complementary to them.
6

Rate-Distortion Performance And Complexity Optimized Structured Vector Quantization

Chatterjee, Saikat 07 1900
Although vector quantization (VQ) is an established topic in communication, its practical utility has been limited due to (i) prohibitive complexity for higher quality and bit-rate, (ii) structured VQ methods which are not analyzed for optimum performance, (iii) difficulty of mapping theoretical performance of mean square error (MSE) to perceptual measures. However, the ever-increasing demand for compression of various source signals points to VQ as the inevitable choice for high efficiency. This thesis addresses all three of the above issues, utilizing the power of parametric stochastic modeling of the signal source, viz., the Gaussian mixture model (GMM), and proposes new solutions. Addressing some of the new requirements of source coding in network applications, the thesis also presents solutions for scalable bit-rate, rate-independent complexity and decoder scalability. While structured VQ is a necessity to reduce the complexity, we have developed, analyzed and compared three different schemes of compensation for the loss due to structured VQ. Focusing on the widely used methods of split VQ (SVQ) and KLT based transform domain scalar quantization (TrSQ), we develop expressions for their optimum performance using high rate quantization theory. We propose the use of conditional PDF based SVQ (CSVQ) to compensate for the split loss in SVQ and analytically show that it achieves coding gain over SVQ. Using the analytical expressions of complexity, an algorithm to choose the optimum splits is proposed. We analyze these techniques for their complexity as well as perceptual distortion measure, considering the specific case of quantizing the wide band speech line spectrum frequency (LSF) parameters. Using natural speech data, it is shown that the new conditional PDF based methods provide better perceptual distortion performance than the traditional methods.
Exploring the use of GMMs for the source, we take the approach of separately estimating the GMM parameters and then use the high rate quantization theory in a simplified manner to derive closed form expressions for optimum MSE performance. This has led to the development of non-linear prediction for compensating the split loss (in contrast to the linear prediction using a Gaussian model). We show that the GMM approach can improve the recently proposed adaptive VQ scheme of switched SVQ (SSVQ). We derive the optimum performance expressions for SSVQ, in both variable bit rate and fixed bit rate formats, using the simplified approach of GMM in high rate theory. As a third scheme for recovering the split loss in SVQ and reducing the complexity, we propose a two stage SVQ (TsSVQ), which is analyzed for minimum complexity as well as perceptual distortion. Utilizing the low complexity of transform domain SVQ (TrSVQ) as well as the two stage approach in a universal coding framework, it is shown that we can achieve low complexity as well as better performance than SSVQ. Further, the combination of GMM and universal coding led to the development of a highly scalable coder which can provide bit-rate scalability, decoder scalability and rate-independent low complexity. Also, the perceptual distortion performance is comparable to that of SSVQ. Since GMM is a generic source model, we develop a new method of predicting the performance bound for perceptual distortion using VQ. Applying this method to LSF quantization, the minimum bit rates for quantizing telephone band LSF (TB-LSF) and wideband LSF (WB-LSF) are derived.
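Plain split VQ, the baseline that the compensation schemes in this abstract build on, can be sketched in a few lines: the input vector is cut into sub-vectors and each is quantized independently against its own small codebook. The codebooks, split sizes, and function names below are hypothetical toy values, not anything taken from the thesis.

```python
def nearest(vector, codebook):
    """Index of the codeword with the smallest squared error to `vector`."""
    def sq_err(code):
        return sum((a - b) ** 2 for a, b in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))


def split_vq_encode(vector, codebooks, split_sizes):
    """Quantize each sub-vector independently; return one index per split."""
    indices, start = [], 0
    for size, codebook in zip(split_sizes, codebooks):
        indices.append(nearest(vector[start:start + size], codebook))
        start += size
    return indices


def split_vq_decode(indices, codebooks):
    """Concatenate the selected codewords to rebuild the full vector."""
    out = []
    for i, codebook in zip(indices, codebooks):
        out.extend(codebook[i])
    return out


# Toy codebooks: a 4-dimensional vector split into two 2-dimensional parts.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)],
]
```

In this toy setup the encoder searches 2 + 4 = 6 short codewords, whereas an unstructured quantizer with the same 3-bit budget would search 8 full-length codewords. The gap widens quickly at higher rates. Quantizing the parts independently ignores any dependence between them, which is the split loss the abstract refers to.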
