111

Improved GMM-Based Classification Of Music Instrument Sounds

Krishna, A G 05 1900 (has links)
This thesis concerns the recognition of musical instruments from isolated notes. Music instrument recognition is a relatively nascent problem that is fast gaining importance, not only for its academic value but also for its potential in applications such as music content analysis and music transcription. Line spectral frequencies are proposed as features for music instrument recognition and are shown to perform better than Mel-filtered cepstral coefficients and linear prediction cepstral coefficients. Assuming a linear model of sound production, features based on the prediction residual, which represents the excitation signal, are also proposed. Four improvements are proposed for classification using Gaussian mixture model (GMM) based classifiers; one of them involves characterizing the regions of overlap between classes in the feature space to improve classification. Applications to music instrument recognition and speaker recognition are shown. An experiment is proposed for discovering the hierarchy of music instruments in a data-driven manner. The hierarchy thus discovered closely corresponds to the hierarchy defined by musicians and experts, showing that the feature space has successfully captured the characteristics required for music instrument characterization.
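A minimal sketch of the classification scheme this abstract describes (one GMM per instrument class, maximum-likelihood decision) is given below. The feature extraction step (e.g. computing line spectral frequencies per frame) is assumed to have been done already, and the class names, component count and data shapes are illustrative assumptions rather than the thesis's actual configuration.

```python
# Sketch: GMM-based instrument classification, assuming per-note feature
# vectors (e.g. line spectral frequencies) have already been extracted.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_class_gmms(features_by_class, n_components=8, seed=0):
    """Fit one GMM per instrument class on its training feature vectors."""
    gmms = {}
    for label, feats in features_by_class.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type='diag', random_state=seed)
        gmm.fit(feats)                      # feats: (n_frames, n_features)
        gmms[label] = gmm
    return gmms

def classify_note(gmms, note_feats):
    """Assign the note to the class whose GMM gives the highest
    average per-frame log-likelihood."""
    scores = {label: gmm.score(note_feats) for label, gmm in gmms.items()}
    return max(scores, key=scores.get)

# Toy usage with random stand-in features (replace with real LSF vectors).
rng = np.random.default_rng(0)
train = {'violin': rng.normal(0.0, 1.0, (500, 12)),
         'flute':  rng.normal(0.5, 1.0, (500, 12))}
gmms = train_class_gmms(train)
print(classify_note(gmms, rng.normal(0.5, 1.0, (40, 12))))
```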
112

Design of robust blind detector with application to watermarking

Anamalu, Ernest Sopuru 14 February 2014 (has links)
One of the difficult issues in detection theory is designing a robust detector that takes into account the actual distribution of the original data. The most commonly used statistical model for blind detection is the Gaussian distribution; specifically, linear correlation is an optimal detection method in the presence of Gaussian-distributed features. It has, however, been found to be a sub-optimal detection metric when the density deviates markedly from a Gaussian distribution. Hence, we formulate a detection algorithm that enhances the detection probability by exploiting the true characteristics of the original data. To understand the underlying distribution of the data, we employ estimation techniques such as a parametric model, the approximated density-ratio logistic regression model, and semiparametric estimation. The semiparametric model has the advantage of yielding the density ratio as well as the individual densities. Both methods are applicable to signals such as watermarks embedded in the spatial domain and outperform conventional linear correlation on non-Gaussian distributed data.
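To make the contrast concrete, here is a hedged sketch of a plain linear-correlation detector alongside a logistic-regression approximation of the log density ratio, in the spirit of the approach described above; the data, watermark strength and thresholds are toy assumptions, not the thesis's experimental setup.

```python
# Sketch: linear-correlation detector vs. an approximate log-likelihood-ratio
# detector obtained from logistic regression (density-ratio estimation).
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_correlation_detector(x, w, threshold):
    """Classical blind detector: normalized correlation against the watermark."""
    return float(np.dot(x, w)) / len(x) > threshold

def fit_density_ratio(host_samples, marked_samples):
    """Logistic regression trained to separate host-only from watermarked data;
    its decision function approximates the log density ratio."""
    X = np.vstack([host_samples, marked_samples])
    y = np.concatenate([np.zeros(len(host_samples)), np.ones(len(marked_samples))])
    return LogisticRegression(max_iter=1000).fit(X, y)

def llr_detector(model, x, threshold=0.0):
    """Detect the watermark by thresholding the approximate log-likelihood ratio."""
    return model.decision_function(x.reshape(1, -1))[0] > threshold

# Toy usage with heavy-tailed (non-Gaussian) host data.
rng = np.random.default_rng(1)
w = rng.choice([-1.0, 1.0], size=64)
host = rng.laplace(size=(200, 64))
marked = host[:100] + 0.5 * w
model = fit_density_ratio(host[100:], marked)
print(linear_correlation_detector(marked[0], w, 0.2), llr_detector(model, marked[0]))
```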
114

Human Action Recognition In Video Data For Surveillance Applications

Gurrapu, Chaitanya January 2004 (has links)
Detecting human actions using a camera has many possible applications in the security industry. When a human performs an action, his/her body goes through a signature sequence of poses. To detect these pose changes and hence the activities performed, a pattern recogniser needs to be built into the video system. Due to the temporal nature of the patterns, Hidden Markov Models (HMM), used extensively in speech recognition, were investigated. Initially a gesture recognition system was built using novel features. These features were obtained by approximating the contour of the foreground object with a polygon and extracting the polygon's vertices. A Gaussian Mixture Model (GMM) was fit to the vertices obtained from a few frames and the parameters of the GMM itself were used as features for the HMM. A more practical activity detection system using a more sophisticated foreground segmentation algorithm immune to varying lighting conditions and permanent changes to the foreground was then built. The foreground segmentation algorithm models each of the pixel values using clusters and continually uses incoming pixels to update the cluster parameters. Cast shadows were identified and removed by assuming that shadow regions were less likely to produce strong edges in the image than real objects and that this likelihood further decreases after colour segmentation. Colour segmentation itself was performed by clustering together pixel values in the feature space using a gradient ascent algorithm called mean shift. More robust features in the form of mesh features were also obtained by dividing the bounding box of the binarised object into grid elements and calculating the ratio of foreground to background pixels in each of the grid elements. These features were vector quantized to reduce their dimensionality and the resulting symbols presented as features to the HMM to achieve a recognition rate of 62% for an event involving a person writing on a white board. The recognition rate increased to 80% for the "seen" person sequences, i.e. the sequences of the person used to train the models. With a fixed lighting position, the lack of a shadow removal subsystem improved the detection rate. This is because of the consistent profile of the shadows in both the training and testing sequences due to the fixed lighting positions. Even with a lower recognition rate, the shadow removal subsystem was considered an indispensable part of a practical, generic surveillance system.
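As one concrete illustration, the mesh-feature idea (bounding box divided into grid cells, per-cell foreground content) might be sketched as follows. The grid size and toy silhouette are assumptions for illustration, and the per-cell foreground proportion is used as a bounded stand-in for the foreground-to-background ratio described in the abstract.

```python
# Sketch: "mesh features" from a binarised silhouette inside its bounding box.
import numpy as np

def mesh_features(mask, grid=(4, 4)):
    """mask: 2-D 0/1 array of the binarised object inside its bounding box.
    Returns a flat vector with the proportion of foreground pixels per grid cell."""
    feats = []
    for row_block in np.array_split(mask, grid[0], axis=0):
        for cell in np.array_split(row_block, grid[1], axis=1):
            feats.append(cell.mean() if cell.size else 0.0)
    return np.asarray(feats)

# Toy usage: a crude "standing person" silhouette.
mask = np.zeros((40, 20), dtype=np.uint8)
mask[5:38, 7:13] = 1
print(mesh_features(mask).round(2))
```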
115

Genomic sequence processing: gene finding in eukaryotes

Akhtar, Mahmood, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW January 2008 (has links)
Of the many existing eukaryotic gene finding software programs, none are able to guarantee accurate identification of genomic protein coding regions and other biological signals central to the pathway from DNA to protein. Eukaryotic gene finding is difficult mainly due to the non-contiguous and non-continuous nature of genes. Existing approaches are heavily dependent on the compositional statistics of the sequences they learn from and are not equally suitable for all types of sequences. This thesis firstly develops efficient digital signal processing-based methods for the identification of genomic protein coding regions, and then combines the optimum signal processing-based non-data-driven technique with an existing data-driven statistical method in a novel system demonstrating improved identification of acceptor splice sites. Most existing well-known DNA symbolic-to-numeric representations map the DNA information into three or four numerical sequences, potentially increasing the computational requirement of the sequence analyzer. The proposed mapping schemes, to be used for signal processing-based gene and exon prediction, incorporate DNA structural properties in the representation, in addition to reducing complexity in subsequent processing. A detailed comparison of all DNA representations, in terms of computational complexity and relative accuracy for the gene and exon prediction problem, reveals the newly proposed "paired numeric" to be the best DNA representation. Existing signal processing-based techniques rely mostly on the period-3 behaviour of exons to obtain one-dimensional gene and exon prediction features, and are not well equipped to capture the complementary properties of exonic/intronic regions and deal with the background noise in detection of exons at the nucleotide level. These issues have been addressed in this thesis by proposing six one-dimensional and three multi-dimensional signal processing-based gene and exon prediction features. All one-dimensional and multi-dimensional features have been evaluated using standard datasets such as Burset/Guigo1996, HMR195, and the GENSCAN test set. This is the first time that different gene and exon prediction features have been compared using substantial databases and using nucleotide-level metrics. Furthermore, the first investigation of the suitability of different window sizes for period-3 exon detection is performed. Finally, the optimum signal processing-based gene and exon prediction scheme from our evaluations is combined with a data-driven statistical technique for the recognition of acceptor splice sites. The proposed DSP-statistical hybrid is shown to achieve a 43% reduction in false positives over WWAM, as used in GENSCAN.
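For readers unfamiliar with the period-3 property these predictors exploit, a small illustrative sketch follows: in a sliding window, the squared DFT magnitude of the four base-indicator sequences at the period-3 frequency tends to peak inside coding regions. The window length and the random toy sequence are arbitrary choices, and this is the generic textbook measure rather than the specific representations and features proposed in the thesis.

```python
# Sketch: period-3 spectral content of a DNA sequence via base-indicator DFTs.
import numpy as np

def period3_measure(seq, window=351):
    """Return the period-3 spectral content for each window centre position."""
    seq = seq.upper()
    n = len(seq)
    half = window // 2
    k = window // 3                       # DFT bin corresponding to period 3
    scores = np.zeros(n)
    for centre in range(half, n - half):
        win = seq[centre - half:centre + half + 1]
        total = 0.0
        for base in 'ACGT':
            x = np.array([1.0 if b == base else 0.0 for b in win])
            total += abs(np.fft.fft(x)[k]) ** 2
        scores[centre] = total
    return scores

# Toy usage on a short random sequence (real use: genomic DNA strings).
rng = np.random.default_rng(2)
seq = ''.join(rng.choice(list('ACGT'), size=2000))
scores = period3_measure(seq, window=351)
print(round(float(scores[1000]), 1))
```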
116

Foreground Segmentation of Moving Objects

Molin, Joel January 2010 (has links)
Foreground segmentation is a common first step in tracking and surveillance applications. Its purpose is to provide later stages of image processing with an indication of where interesting data can be found. This thesis is an investigation of how foreground segmentation can be performed in two contexts: as a pre-step to trajectory tracking and as a pre-step in indoor surveillance applications. Three methods are selected and detailed: a single Gaussian method, a Gaussian mixture model method, and a codebook method. Experiments are then performed on typical input video using these methods. It is concluded that the Gaussian mixture model produces the output that yields the best trajectories when used as input to the trajectory tracker. An extension to the Gaussian mixture model is proposed which reduces shadows, improving the performance of foreground segmentation in the surveillance context.
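A minimal sketch of GMM-based foreground segmentation of the kind compared here uses OpenCV's MOG2 background subtractor (a per-pixel Gaussian mixture) with its built-in shadow handling; the video path is a placeholder, and this is not the thesis's own implementation or its proposed shadow-reducing extension.

```python
# Sketch: per-pixel Gaussian-mixture background subtraction with shadow removal.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                varThreshold=16,
                                                detectShadows=True)

cap = cv2.VideoCapture('input_video.avi')   # placeholder path
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)          # 255 = foreground, 127 = shadow, 0 = background
    foreground = (mask == 255).astype('uint8') * 255   # drop shadow pixels
    cv2.imshow('foreground', foreground)
    if cv2.waitKey(30) & 0xFF == 27:        # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```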
117

Statistical Models for Characterizing and Reducing Uncertainty in Seasonal Rainfall Pattern Forecasts to Inform Decision Making

AlMutairi, Bandar Saud 01 July 2017 (has links)
Uncertainty in rainfall forecasts affects the quality and assurance of decisions made to manage water resource-based systems. Because uncertainty cannot be eliminated completely, decision-makers are challenged to make decisions in the light of it. This study provides statistical models as an approach to cope with uncertainty, including: a) a statistical method relying on a Gaussian mixture (GM) model to better characterize uncertainty in climate model projections and evaluate their performance in matching observations; b) a stochastic model that incorporates the El Niño–Southern Oscillation (ENSO) cycle to narrow uncertainty in seasonal rainfall forecasts; and c) a statistical approach to determine to what extent drought events forecasted using ENSO information could be utilized in the water resources decision-making process. This study also investigates the relationship between calibration and lead time on the ability to narrow the interannual uncertainty of forecasts and the associated usefulness for decision making. These objectives are demonstrated for the northwest region of Costa Rica as a case study of a developing country in Central America. This region of Costa Rica is under an increasing risk of future water shortages due to climate change, increased demand, and high variability in the bimodal cycle of seasonal rainfall. First, the GM model is shown to be a suitable approach to compare and characterize long-term projections of climate models. The GM representation of seasonal cycles is then employed to construct detailed comparison tests for climate models with respect to observed rainfall data. Three verification metrics demonstrate that an acceptable degree of predictability can be obtained by incorporating ENSO information to reduce error and interannual variability in the forecast of seasonal rainfall. The predictability of multicategory rainfall forecasts in the late portion of the wet season surpasses that in the early portion of the wet season. Finally, the value of drought forecast information for coping with uncertainty in water-management decisions is determined by quantifying the reduction in expected losses relative to a perfect forecast. Both the discrimination ability and the relative economic value of drought-event forecasts are improved by the proposed forecast method, especially after calibration. Positive relative economic value is found only for a range of cost-loss ratio scenarios, which indicates that the proposed forecast could be used for specific cases; otherwise, taking action (or no action) is preferred as the cost-loss ratio approaches zero (or one). Overall, incorporating ENSO information into seasonal rainfall forecasts would provide useful value to the decision-making process, in particular at lead times of one year ahead.
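As a rough illustration of the Gaussian-mixture characterization and the cost-loss style decision rule mentioned above, the sketch below fits a two-component GM to synthetic seasonal rainfall totals and takes protective action when the probability of the dry component exceeds the cost-loss ratio. All numbers are made up, and the model is far simpler than the study's actual forecasting framework.

```python
# Sketch: two-component Gaussian mixture over seasonal rainfall totals,
# plus a simple cost-loss decision rule on the dry-component probability.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Stand-in for observed seasonal rainfall totals (mm): a mix of dry and wet years.
rainfall = np.concatenate([rng.normal(180, 40, 60), rng.normal(420, 70, 40)])

gm = GaussianMixture(n_components=2, random_state=0).fit(rainfall.reshape(-1, 1))
for w, mu, var in zip(gm.weights_, gm.means_.ravel(), gm.covariances_.ravel()):
    print(f"weight={w:.2f}  mean={mu:.0f} mm  std={np.sqrt(var):.0f} mm")

# Protect against drought when P(dry component | forecast total) > C/L.
cost_loss_ratio = 0.3
dry_idx = int(np.argmin(gm.means_.ravel()))
p_dry = gm.predict_proba(np.array([[200.0]]))[0, dry_idx]
print("take protective action" if p_dry > cost_loss_ratio else "no action")
```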
118

Phoneme duration modelling for speaker verification

Van Heerden, Charl Johannes 26 June 2009 (has links)
Higher-level features are considered to be a potential remedy against transmission line and cross-channel degradations, currently some of the biggest problems associated with speaker verification. Phoneme durations in particular are not altered by these factors; thus a robust duration model would be a particularly useful addition to traditional cepstral-based speaker verification systems. In this dissertation we investigate the feasibility of phoneme durations as a feature for speaker verification. Simple speaker-specific triphone duration models are created to represent the phoneme durations statistically. Durations are obtained from a hidden Markov model (HMM) based automatic speech recognition system and are modeled using single-mixture Gaussian distributions. These models are applied in a speaker verification system (trained and tested on the YOHO corpus) and found to be a useful feature, even when used in isolation. When fused with acoustic features, verification performance increases significantly. A novel speech rate normalization technique is developed in order to remove some of the inherent intra-speaker variability (due to differing speech rates). Speech rate variability has a negative impact on both speaker verification and automatic speech recognition. Although the duration modelling seems to benefit only slightly from this procedure, the fused system performance improvement is substantial. Other factors known to influence the duration of phonemes are incorporated into the duration model. Utterance-final lengthening is known to be a consistent effect and thus “position in sentence” is modeled. “Position in word” is also modeled since triphones do not provide enough contextual information. This is found to improve performance since some vowels' durations are particularly sensitive to their position in the word. Data scarcity becomes a problem when building speaker-specific duration models. By using information from available data, unknown durations can be predicted in an attempt to overcome the data scarcity problem. To this end we develop a novel approach to predict unknown phoneme durations from the values of known phoneme durations for a particular speaker, based on the maximum likelihood criterion. This model is based on the observation that phonemes from the same broad phonetic class tend to co-vary strongly, but that there are also significant cross-class correlations. This approach is tested on the TIMIT corpus and found to be more accurate than using back-off techniques. / Dissertation (MEng)--University of Pretoria, 2009. / Electrical, Electronic and Computer Engineering / unrestricted
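A toy sketch of the core duration-modelling idea (a single Gaussian per speaker-specific triphone duration, utterances scored by average log-likelihood) follows. The triphone labels and duration values are invented, and the alignment, speech-rate normalization and back-off/prediction machinery described above are omitted.

```python
# Sketch: single-Gaussian triphone duration models and a verification score.
import math

def train_duration_models(durations_by_phone):
    """durations_by_phone: {triphone_label: [durations in frames]} for one speaker."""
    models = {}
    for phone, durs in durations_by_phone.items():
        n = len(durs)
        mean = sum(durs) / n
        var = sum((d - mean) ** 2 for d in durs) / max(n - 1, 1) + 1e-6
        models[phone] = (mean, var)
    return models

def score_utterance(models, observed):
    """observed: list of (triphone_label, duration). Returns mean log-likelihood."""
    ll, count = 0.0, 0
    for phone, dur in observed:
        if phone not in models:
            continue                       # back-off / duration prediction would go here
        mean, var = models[phone]
        ll += -0.5 * (math.log(2 * math.pi * var) + (dur - mean) ** 2 / var)
        count += 1
    return ll / max(count, 1)

# Toy usage: claimed speaker's models vs. a test utterance's aligned durations.
models = train_duration_models({'s-ih+t': [9, 11, 10, 12], 'ih-t+s': [6, 7, 6, 8]})
print(score_utterance(models, [('s-ih+t', 10), ('ih-t+s', 7)]))
```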
119

Fusion techniques for iris recognition in degraded sequences

Othman, Nadia 11 March 2016 (has links)
Among the many biometric modalities, the iris is considered highly reliable, with a remarkably low error rate. The excellent performance of iris recognition systems is obtained by controlling the quality of the captured images and by imposing certain constraints on users, such as standing at a close, fixed distance from the camera. However, in many real-world applications such as access control and airport boarding, these constraints are no longer suitable. In such non-ideal conditions, the resulting iris images suffer from diverse degradations which have a negative impact on the recognition rate. One way to circumvent this is to exploit the redundancy arising from the availability of several images of the same eye in the recorded sequence. This thesis therefore focuses on how to fuse the information available in the sequence in order to improve performance. Diverse fusion schemes have been proposed in the literature, but they agree on the fact that the quality of the images used in the fusion process is a crucial factor for its success in increasing the recognition rate. Researchers have therefore concentrated their efforts on estimating image quality so as to weight each image in the fusion process according to its quality. There are various iris quality factors to consider, and diverse methods have been proposed for quantifying these criteria. The resulting quality measures are generally combined into one unique value: a global quality. However, there is no universal combination scheme for doing so, and some a priori knowledge has to be inserted, which is not a trivial task. To deal with these drawbacks, in this thesis we propose a novel way of measuring and integrating quality measures in a super-resolution approach, aiming at improving performance. This strategy can handle two issues in iris recognition: the lack of resolution and the presence of various artifacts in the captured iris images. The first part of the doctoral work consists of elaborating a relevant quality metric able to quantify the quality of iris images locally. Our measure relies on a Gaussian mixture model estimation of the clean iris texture distribution. The interest of our quality measure lies in 1) its simplicity, 2) the fact that its computation does not require identifying in advance the types of degradation that can occur in the iris image, 3) its uniqueness, which avoids computing several quality metrics and an associated combination rule, and 4) its ability to measure intrinsic quality and, in particular, to detect segmentation errors. In the second part of the thesis, we propose two novel quality-based fusion schemes. First, we use our quality metric as a global measure in the fusion process in two ways: as a selection tool for detecting the best images of the sequence, and as a weighting factor at the pixel level in the super-resolution scheme. In the latter case, the contribution of each image of the sequence to the final fused image depends only on its overall quality.
Second, taking advantage of the local character of our quality measure, we propose an original fusion scheme based on local weighting at the pixel level, which takes into account the fact that degradations can differ across parts of the iris image. Regions free from occlusions thus contribute more to the image reconstruction than regions with artefacts, so the quality of the fused image is optimized in order to improve performance. The effectiveness of the proposed approaches is shown on several commonly used databases: MBGC, Casia-Iris-Thousand and QFIRE at three different distances (5, 7 and 11 feet). We separately investigate the improvement brought by the super-resolution, the global quality and the local quality in the fusion process. In particular, the results show the important improvement brought by the use of the global quality, an improvement that is further increased by using the local quality.
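To illustrate the local quality-weighted fusion described above, here is a hedged sketch in which a stack of registered images is fused by a per-pixel quality-weighted average. The quality maps are synthetic stand-ins for the GMM-based local quality measure, and the ramp image is purely illustrative.

```python
# Sketch: per-pixel quality-weighted fusion of a stack of registered images.
import numpy as np

def fuse_local_quality(images, quality_maps, eps=1e-8):
    """images, quality_maps: arrays of shape (n_images, H, W).
    Pixels with higher local quality contribute more to the fused image."""
    images = np.asarray(images, dtype=float)
    weights = np.asarray(quality_maps, dtype=float)
    return (weights * images).sum(axis=0) / (weights.sum(axis=0) + eps)

# Toy usage: three noisy versions of a ramp image with synthetic quality maps.
rng = np.random.default_rng(4)
clean = np.tile(np.linspace(0, 1, 64), (64, 1))
stack = np.stack([clean + rng.normal(0, s, clean.shape) for s in (0.05, 0.2, 0.4)])
quality = np.stack([np.full(clean.shape, q) for q in (0.9, 0.4, 0.1)])
fused = fuse_local_quality(stack, quality)
print(float(np.abs(fused - clean).mean()))   # mean reconstruction error
```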
120

Comparing unsupervised clustering algorithms to locate uncommon user behavior in public travel data : A comparison between the K-Means and Gaussian Mixture Model algorithms

Andrésen, Anton, Håkansson, Adam January 2020 (has links)
Clustering machine learning algorithms have existed for a long time, and a multitude of variations are available to implement. Each has its advantages and disadvantages, which makes it challenging to select one for a particular problem and application. This study focuses on comparing two algorithms, K-Means and the Gaussian Mixture Model, for outlier detection within public travel data from the travel planning mobile application MobiTime [1]. The purpose of this study was to compare the two algorithms against each other to identify differences in their outlier detection results. The comparisons were mainly done by comparing the differences in the number of outliers located by each model, with respect to the outlier threshold and the number of clusters. The study found that the algorithms differ considerably in their ability to detect outliers. These differences depend heavily on the type of data that is used, but one major difference found was that K-Means was more restrictive than the Gaussian Mixture Model when it comes to classifying data points as outliers. The results of this study could help practitioners determine which algorithm to implement for their specific application and use case.
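The comparison described above can be illustrated with a short sketch: K-Means flags points far from their nearest centroid, while the GMM flags points with low log-likelihood. The synthetic data, cluster counts and percentile thresholds are arbitrary assumptions, not the study's setup.

```python
# Sketch: outlier detection with K-Means (distance to centroid) vs. a
# Gaussian Mixture Model (per-sample log-likelihood).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (200, 2)),
               rng.normal(6, 1, (200, 2)),
               rng.uniform(-10, 15, (10, 2))])   # a few injected outliers

k = 2
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
km_outliers = dist > np.percentile(dist, 97.5)    # distance-based threshold

gm = GaussianMixture(n_components=k, random_state=0).fit(X)
loglik = gm.score_samples(X)
gm_outliers = loglik < np.percentile(loglik, 2.5)  # likelihood-based threshold

print("K-Means outliers:", int(km_outliers.sum()),
      "GMM outliers:", int(gm_outliers.sum()))
```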
