Global ETD Search

41	SPEAKER AND GENDER IDENTIFICATION USING BIOACOUSTIC DATA SETS Jose, Neenu 01 January 2018 Acoustic analysis of animal vocalizations has been widely used to identify the presence of individual species, classify vocalizations, identify individuals, and determine gender. This work investigates automatic identification of speaker and gender of mice from ultrasonic vocalizations, and speaker identification of meerkats from their Close calls. Feature extraction was implemented using Greenwood Function Cepstral Coefficients (GFCC), designed exclusively for extracting features from animal vocalizations. Mice ultrasonic vocalizations were analyzed using Gaussian Mixture Models (GMM), which yielded an accuracy of 78.3% for speaker identification and 93.2% for gender identification. Meerkat speaker identification with Close calls was implemented using Gaussian Mixture Models (GMM) and Hidden Markov Models (HMM), with accuracies of 90.8% and 94.4% respectively. The results indicate the presence of gender and identity information in vocalizations and support the possibility of robust gender identification and individual identification using bioacoustic data sets. Speaker identification gender identification Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Mice Meerkat Electrical and Computer Engineering Signal Processing
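The GMM-based identification described in this abstract can be illustrated with a minimal sketch. It assumes one mixture per individual and assigns a recording to the individual whose mixture gives the highest total log-likelihood. The one-dimensional features, the hand-set parameters in `MODELS`, and the labels `animal_a` and `animal_b` are hypothetical; the thesis itself uses GFCC feature vectors and trained models, which are not reproduced here.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of a scalar x under a 1-D Gaussian mixture."""
    terms = []
    for w, mu, var in zip(weights, means, variances):
        log_norm = -0.5 * math.log(2.0 * math.pi * var)
        terms.append(math.log(w) + log_norm - 0.5 * (x - mu) ** 2 / var)
    # Log-sum-exp over components for numerical stability.
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def identify(features, models):
    """Return the label whose mixture gives the highest total log-likelihood."""
    scores = {
        label: sum(gmm_log_likelihood(x, *params) for x in features)
        for label, params in models.items()
    }
    return max(scores, key=scores.get)

# Hypothetical, hand-set mixtures: (weights, means, variances) per individual.
MODELS = {
    "animal_a": ([0.6, 0.4], [0.0, 2.0], [1.0, 0.5]),
    "animal_b": ([0.5, 0.5], [5.0, 7.0], [1.0, 1.0]),
}

print(identify([0.1, 1.8, 2.2, -0.3], MODELS))
```

In practice the mixture parameters would be fitted per individual (for example with expectation maximization) rather than set by hand.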
42	A Model Fusion Based Framework For Imbalanced Classification Problem with Noisy Dataset January 2014 Data imbalance and data noise often coexist in real-world datasets. Data imbalance affects the learning classifier by degrading its recognition power on the minority class, while data noise affects the classifier by providing inaccurate information and thus misleading it. Because of these differences, data imbalance and data noise have been treated separately in the data mining field. Yet such an approach ignores their mutual effects and, as a result, may lead to new problems. A desirable solution is to tackle these two issues jointly. Noting the complementary nature of generative and discriminative models, this research proposes a unified model fusion based framework to handle imbalanced classification with noisy datasets. The phase I study focuses on the imbalanced classification problem. A generative classifier, the Gaussian Mixture Model (GMM), is studied, which can learn the distribution of the imbalanced data to improve the discrimination power on imbalanced classes. By fusing this knowledge into cost SVM (cSVM), a CSG method is proposed. Experimental results show the effectiveness of CSG in dealing with imbalanced classification problems. The phase II study expands the research scope to include noisy datasets in the imbalanced classification problem. A model fusion based framework, K Nearest Gaussian (KNG), is proposed. KNG employs a generative modeling method, GMM, to model the training data as Gaussian mixtures and form adjustable confidence regions that are less sensitive to data imbalance and noise. Motivated by the K-nearest neighbor algorithm, the neighboring Gaussians are used to classify the testing instances. Experimental results show that the KNG method greatly outperforms traditional classification methods in dealing with imbalanced classification problems with noisy datasets.
The phase III study addresses the issues of feature selection and parameter tuning of the KNG algorithm. To further improve the performance of the KNG algorithm, a Particle Swarm Optimization based method (PSO-KNG) is proposed. PSO-KNG formulates model parameters and data features into the same particle vector and thus can search for the best feature and parameter combination jointly. The experimental results show that PSO can greatly improve the performance of KNG, with better accuracy and much lower computational cost. / Dissertation/Thesis / Doctoral Dissertation Industrial Engineering 2014 Industrial engineering Information science Gaussian mixture model Imbalanced classification K nearest Gaussian Particle swarm optimization Support vector machine
43	Classificação de fluxos de dados não estacionários com algoritmos incrementais baseados no modelo de misturas gaussianas / Non-stationary data streams classification with incremental algorithms based on Gaussian mixture models Luan Soares Oliveira 18 August 2015 Learning concepts from data streams differs significantly from traditional batch learning. In batch learning there is an implicit assumption that the concepts to be learned are static and do not evolve significantly over time. In data stream learning, on the other hand, the concepts to be learned may evolve over time. This evolution is called concept drift, and it makes the creation of a fixed training set inapplicable in this setting.
The incremental learning paradigm is a promising approach for learning in a data stream setting. However, in the presence of concept drift, outdated concepts can cause misclassifications. Several incremental methods based on Gaussian mixture models have been proposed in the literature, but these algorithms lack an explicit policy for discarding outdated concepts. In this work, a new incremental algorithm for data streams with concept drift, based on Gaussian mixture models, is proposed. The proposed method is compared to various algorithms widely used in the literature, and the results show that it is competitive with them in various scenarios, outperforming them in some cases. Concept drift Data stream Gaussian mixture model Incremental learning
44	Mera sličnosti između modela Gausovih smeša zasnovana na transformaciji prostora parametara / A similarity measure between Gaussian mixture models based on a transformation of the parameter space Krstanović Lidija 25 September 2017 This thesis studies the possibility that the parameters of the Gaussian components of a particular Gaussian Mixture Model (GMM) lie approximately on a lower-dimensional surface embedded in the cone of positive definite matrices. For that case, we present a novel, more efficient similarity measure between GMMs, obtained by an LPP-like projection of the components of a particular GMM from the high-dimensional original parameter space to a much lower-dimensional space. Thus, finding the distance between two GMMs in the original space reduces to finding the distance between two sets of lower-dimensional Euclidean vectors, weighted by the corresponding weights. The proposed measure is suitable for applications that use high-dimensional feature spaces and/or a large overall number of Gaussian components. We confirm our results on artificial as well as real experimental data.
45	Finding Anomalous Energy Consumers Using Time Series Clustering in the Swedish Energy Market Tonneman, Lukas January 2023 Improving the energy efficiency of buildings is important for many reasons. There is a large body of data detailing the hourly energy consumption of buildings. This work studies a large data set from the Swedish energy market. This thesis proposes a data analysis methodology for identifying abnormal consumption patterns using two steps of clustering. First, typical weekly energy usage profiles are extracted from each building by clustering week-long segments of the building’s lifetime consumption and extracting the medoids of the clusters. Second, all the typical weekly energy usage profiles are clustered using agglomerative hierarchical clustering. Large clusters are assumed to contain normal consumption patterns, and small clusters are assumed to contain abnormal patterns. Buildings with a large presence in small clusters are said to be abnormal, and vice versa. The method uses Dynamic Time Warping distance as the dissimilarity measure. Using a set of 160 buildings, manually classified by domain experts, this thesis shows that the mean abnormality score is higher for abnormal buildings than for normal buildings, with p ≈ 0.0036. Computer Sciences Datavetenskap (datalogi)
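Two ingredients of the method described in this abstract can be sketched briefly: the Dynamic Time Warping distance between two profiles, and a score that measures how much of a building's presence lies in small clusters. This is only an illustration under assumptions. The function names, the absolute-difference local cost, and the `small_cluster_size` threshold are hypothetical, and the thesis may define its abnormality score differently.

```python
from collections import Counter

def dtw_distance(a, b):
    """Dynamic Time Warping distance with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def abnormality_scores(profile_owner, profile_cluster, small_cluster_size):
    """Share of each building's profiles that fall in small clusters."""
    sizes = Counter(profile_cluster)
    total, in_small = Counter(), Counter()
    for owner, cluster in zip(profile_owner, profile_cluster):
        total[owner] += 1
        if sizes[cluster] <= small_cluster_size:
            in_small[owner] += 1
    return {owner: in_small[owner] / total[owner] for owner in total}

# Two profiles with the same shape but shifted timing align at zero cost.
print(dtw_distance([0, 0, 1], [0, 1]))
# Hypothetical cluster labels for four profiles from buildings A and B.
print(abnormality_scores(["A", "A", "B", "B"], [0, 0, 0, 1], small_cluster_size=1))
```

A pairwise matrix of such distances is the kind of input that agglomerative hierarchical clustering takes in place of raw feature vectors.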
46	A Bayesian approach to initial model inference in cryo-electron microscopy Joubert, Paul 04 March 2016 A main application of single-particle analysis in cryo-electron microscopy is the characterization of the three-dimensional structure of macromolecular complexes. This uses tens of thousands of images showing noisy two-dimensional projections of the particle. In the first step, a low-resolution initial model is reconstructed and the unknown image orientations are estimated. This is a difficult inverse problem with many unknowns, including an unknown orientation for each projection image. A good initial model is crucial to the success of the subsequent refinement step. My dissertation presents two new algorithms for reconstructing an initial model in cryo-electron microscopy, both based on a coarse representation of the electron density. The two main contributions of my work are the model that represents the electron density and the new reconstruction algorithms. The first main contribution is the use of Gaussian mixture models to represent electron densities in the reconstruction step. I use spherical mixture components with unknown positions, sizes, and weights. This representation has many advantages over the grid-based electron density that other reconstruction algorithms typically use. For example, it requires considerably fewer parameters, which leads to faster and more robust algorithms. The second main contribution is the development of Markov chain Monte Carlo methods within a Bayesian framework for estimating the model parameters. The first algorithm can be derived from the Gibbs sampling that fits Gaussian mixture models to point clouds.
This algorithm is extended here so that it also works with images, projections, and unknown rotations and translations. The second algorithm takes a different approach. The forward model now assumes Gaussian errors. Sampling algorithms such as Hamiltonian Monte Carlo (HMC) make it possible to estimate the positions of the mixture components and the image orientations. My dissertation presents extensive numerical experiments with simulated and real data that test the presented algorithms in practice and compare them with other reconstruction methods. 510 cryo-electron microscopy cryo-EM Gibbs sampling Gaussian mixture model Bayesian initial model inference Markov chain Monte Carlo Hamiltonian Monte Carlo Informatik (PPN619939052)
47	Essays in the economics of subjective well-being Goldsmith, Glenn Fraser January 2011 This thesis explores three major issues in the burgeoning empirical literature on the determinants of subjective well-being (SWB). While economic theory assumes that it is current consumption that matters to SWB, empirical work has focused almost exclusively on the effect of income. In Part 1, we use household panel data from Russia and Britain to show that neither the standard theoretical account, nor the standard empirical practice may be adequate. Consumption, income, and wealth each contribute separately to SWB, in particular via perceptions of status and anticipation of the future; and omitting consumption from SWB equations significantly understates the importance of money to SWB. Distinguishing between consumption and income is also important to identifying reference effects. In Part 2, we confirm earlier findings that others' income has a positive (informational) effect on SWB in Russia, but show that others' consumption has an offsetting negative (comparison) effect. The net effect depends on how we define individuals' reference groups. We develop a novel econometric model that lets us estimate these reference groups from the data. Contrary to previous results, we conclude that comparison dominates information. Most SWB analyses focus on the average effects of money, relationships, and other outcomes across a given population; yet there may be significant differences in what is important to different people. In Part 3, we employ parametric and semi-parametric random coefficient models to show that there are large differences in the determinants of individual SWB in Britain, and (in contrast to previous work) that such differences cannot simply be attributed to differences in individuals' reporting functions.
While individual differences correlate with (some) observable demographic variables, they do not generally correlate with individuals' perceptions about what is important to them. The results of SWB research may therefore be a useful source of information. 330.015195
48	Design of robust blind detector with application to watermarking Anamalu, Ernest Sopuru 14 February 2014 One of the difficult issues in detection theory is designing a robust detector that takes into account the actual distribution of the original data. The most commonly used statistical model for blind detection is the Gaussian distribution. Specifically, linear correlation is an optimal detection method in the presence of Gaussian-distributed features. However, it has been found to be a sub-optimal detection metric when the density deviates completely from a Gaussian distribution. Hence, we formulate a detection algorithm that enhances detection probability by exploiting the true characteristics of the original data. To understand the underlying distribution function of the data, we employ estimation techniques including a parametric model, called the approximated density ratio logistic regression model, and semiparametric estimation. The semiparametric model has the advantage of yielding density ratios as well as individual densities. Both methods are applicable to signals such as watermarks embedded in the spatial domain, and both outperform conventional linear correlation when the data are non-Gaussian distributed. Signal detection Parametric and nonparametric estimations K-means expectation maximization maximum likelihood estimations density ratio estimation Gaussian mixture model Logistic regression model
50	Statistical Models for Characterizing and Reducing Uncertainty in Seasonal Rainfall Pattern Forecasts to Inform Decision Making AlMutairi, Bandar Saud 01 July 2017 Uncertainty in rainfall forecasts affects the level of quality and assurance of decisions made to manage water resource-based systems. However, eliminating uncertainty completely can be difficult, so decision-makers are challenged to make decisions in the light of uncertainty. This study provides statistical models as an approach to cope with uncertainty, including: a) a statistical method relying on a Gaussian mixture (GM) model to help better characterize uncertainty in climate model projections and evaluate their performance in matching observations; b) a stochastic model that incorporates the El Niño–Southern Oscillation (ENSO) cycle to narrow uncertainty in seasonal rainfall forecasts; and c) a statistical approach to determine to what extent drought events forecasted using ENSO information could be utilized in the water resources decision-making process. This study also investigates the effect of calibration and lead time on the ability to narrow the interannual uncertainty of forecasts and on the associated usefulness for decision making. These objectives are demonstrated for the northwest region of Costa Rica as a case study of a developing country in Central America. This region of Costa Rica is under an increasing risk of future water shortages due to climate change, increased demand, and high variability in the bimodal cycle of seasonal rainfall. First, the GM model is shown to be a suitable approach to compare and characterize long-term projections of climate models. The GM representation of seasonal cycles is then employed to construct detailed comparison tests for climate models with respect to observed rainfall data.
Three verification metrics demonstrate that an acceptable degree of predictability can be obtained by incorporating ENSO information in reducing error and interannual variability in the forecast of seasonal rainfall. The predictability of multicategory rainfall forecasts in the late portion of the wet season surpasses that in the early portion of the wet season. Later, the value of drought forecast information for coping with uncertainty in making decisions on water management is determined by quantifying the reduction in expected losses relative to a perfect forecast. Both the discrimination ability and the relative economic value of drought-event forecasts are improved by the proposed forecast method, especially after calibration. Positive relative economic value is found only for a range of scenarios of the cost-loss ratio, which indicates that the proposed forecast could be used for specific cases. Otherwise, taking actions (no-actions) is preferred as the cost-loss ratio approaches zero (one). Overall, the approach of incorporating ENSO information into seasonal rainfall forecasts would provide useful value to the decision-making process - in particular at lead times of one year ahead. Central America Climate El Niño–Southern Oscillation (ENSO) Gaussian mixture model Multicategory Seasonal Forecast Relative Value
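The relative economic value mentioned in this abstract can be illustrated with the standard static cost-loss model, in which a user pays a cost C to protect against an event that would otherwise cause a loss L. The sketch below assumes that formulation (the abstract does not spell out its equations) and expresses all expenses in units of L. The function and parameter names are hypothetical.

```python
def expected_expense(hit_rate, false_alarm_rate, base_rate, cost_loss_ratio):
    """Mean expense per forecast, in units of the loss L, when acting on warnings."""
    s, a = base_rate, cost_loss_ratio
    false_alarms = false_alarm_rate * (1 - s) * a  # protected, no event
    hits = hit_rate * s * a                        # protected, event occurred
    misses = (1 - hit_rate) * s                    # unprotected, event occurred
    return false_alarms + hits + misses

def relative_value(hit_rate, false_alarm_rate, base_rate, cost_loss_ratio):
    """Expense saved relative to climatology, scaled by the saving of a perfect forecast."""
    s, a = base_rate, cost_loss_ratio
    climate = min(a, s)  # cheaper of "always act" and "never act"
    perfect = s * a      # act only when the event occurs
    forecast = expected_expense(hit_rate, false_alarm_rate, s, a)
    return (climate - forecast) / (climate - perfect)

# Hypothetical forecast quality, evaluated at three cost-loss ratios.
for ratio in (0.02, 0.3, 0.9):
    print(ratio, relative_value(0.8, 0.2, base_rate=0.3, cost_loss_ratio=ratio))
```

With these illustrative numbers the value is positive only for intermediate cost-loss ratios, which is consistent with the abstract's remark that always acting or never acting is preferred as the ratio approaches zero or one.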
