About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Birthweight-specific neonatal health : With application on data from a tertiary hospital in Tanzania

Dahlqwist, Elisabeth January 2014 (has links)
The following study analyzes birthweight-specific neonatal health using a combination of a mixture model and logistic regression: the extended Parametric Mixture of Logistic Regression. The data are collected from the Obstetric database at Muhimbili National Hospital in Dar es Salaam, Tanzania, and cover the years 2009–2013. Because the birthweight data are rounded, a novel method to adjust for rounding when estimating a mixture model is applied, and the influence of rounding on the estimates is then investigated. A three-component model is selected. The variables used in the analysis of neonatal health are early neonatal mortality and whether the mother has HIV, has anaemia, is a private patient, and whether the neonate is born after 36 completed weeks of gestation. It can be concluded that the mortality rates are high, especially for low birthweights (2,000 g or less) in the estimated first and second components. However, due to wide confidence bounds it is hard to draw firm conclusions from the data.
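A minimal sketch of the general approach (not the thesis's exact extended Parametric Mixture of Logistic Regression; the data, component count, and risk function here are illustrative): fit a three-component Gaussian mixture to birthweight, then regress early neonatal mortality on the posterior component memberships.

```python
# Sketch: 3-component mixture of birthweights + logistic regression on
# the posterior component memberships. All data are simulated.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Simulated birthweights (grams): preterm, intermediate, and term components.
bw = np.concatenate([
    rng.normal(1500, 350, 200),
    rng.normal(2600, 300, 500),
    rng.normal(3300, 400, 2300),
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=3, random_state=0).fit(bw)
resp = gmm.predict_proba(bw)          # posterior component memberships

# Simulated mortality indicator with higher risk at low birthweight.
p = 1 / (1 + np.exp((bw.ravel() - 1800) / 300))
died = rng.binomial(1, 0.5 * p)

# Regress mortality on memberships (drop one column: rows of resp sum to 1,
# so keeping all three would be perfectly collinear).
clf = LogisticRegression().fit(resp[:, :2], died)
print("coefficients:", clf.coef_, "intercept:", clf.intercept_)
```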
2

The wild bootstrap resampling in regression imputation algorithm with a Gaussian Mixture Model

Mat Jasin, A., Neagu, Daniel, Csenki, Attila 08 July 2018 (has links)
Unsupervised learning of a finite Gaussian mixture model (FGMM) is used to learn the distribution of the population data. This paper proposes using wild bootstrapping to create variability in the imputed data for single missing-data imputation. We compare the performance and accuracy of the proposed single-imputation method against multiple imputation from the R package Amelia II using RMSE, R-squared, MAE, and MAPE. The proposed method performs better than multiple imputation (MI), which is widely regarded as the gold standard among missing-data imputation techniques.
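A hedged sketch of the central idea, wild-bootstrap perturbation of regression imputations. The FGMM step, which would fit the regression within mixture components, is simplified to a single ordinary regression, and the data are simulated:

```python
# Regression imputation with wild-bootstrap residuals: impute y from x,
# then perturb each imputed value by a donor residual times a random sign.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=0.8, size=n)
miss = rng.random(n) < 0.2            # 20% of y missing at random
obs = ~miss

# OLS fit on the observed cases.
X = np.column_stack([np.ones(obs.sum()), x[obs]])
beta, *_ = np.linalg.lstsq(X, y[obs], rcond=None)
resid = y[obs] - X @ beta

# Wild bootstrap: prediction plus a resampled residual flipped by a
# Rademacher weight, so imputations retain the residual scale.
y_imp = y.copy()
pred = beta[0] + beta[1] * x[miss]
donor = rng.choice(resid, size=miss.sum(), replace=True)
signs = rng.choice([-1.0, 1.0], size=miss.sum())
y_imp[miss] = pred + donor * signs
print("mean of imputed values:", y_imp[miss].mean())
```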
3

Security in Voice Authentication

Yang, Chenguang 27 March 2014 (has links)
We evaluate the security of human voice password databases from an information-theoretic point of view. More specifically, we provide a theoretical estimate of the amount of entropy in human voice when processed using the conventional GMM-UBM technologies with MFCCs as the acoustic features. The theoretical estimate gives rise to a methodology for analyzing the security level in a corpus of human voice: given a database containing speech signals, we provide a method for estimating the relative entropy (Kullback-Leibler divergence) of the database, thereby establishing the security level of the speaker verification system. To demonstrate this, we analyze the YOHO database, a corpus of voice samples collected from 138 speakers, and show that the amount of entropy extracted is less than 14 bits. We also present a practical attack that succeeds in impersonating the voice of any speaker within the corpus with 98% success probability in as few as 9 trials, and still succeeds at a rate of 62.50% if only 4 attempts are permitted. Further, based on the same attack rationale, we mount an attack on the ALIZE speaker verification system. We show through experimentation that the attacker can impersonate any user in a database of 69 people with about 25% success rate in only 5 trials, and the success rate exceeds 50% when the allowed authentication attempts are increased to 20. Finally, when the practical attack is cast in terms of an entropy metric, we find that the theoretical entropy estimate almost perfectly predicts the success rate of the practical attack, giving further credence to the theoretical model and the associated entropy estimation technique.
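The relative-entropy measurement described here can be illustrated compactly: the KL divergence between two fitted GMMs has no closed form, so it is estimated by Monte Carlo. In this sketch, real MFCC features are replaced by synthetic 2-D data and the GMM-UBM adaptation step is omitted:

```python
# Monte Carlo estimate of D_KL between two fitted Gaussian mixtures.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
speaker_a = rng.normal([0.0, 0.0], 1.0, size=(1000, 2))
speaker_b = rng.normal([1.5, 0.5], 1.2, size=(1000, 2))

gmm_a = GaussianMixture(n_components=4, random_state=0).fit(speaker_a)
gmm_b = GaussianMixture(n_components=4, random_state=0).fit(speaker_b)

# D_KL(A || B) ~= mean over x ~ A of [log a(x) - log b(x)], in nats.
x, _ = gmm_a.sample(50_000)
kl_nats = np.mean(gmm_a.score_samples(x) - gmm_b.score_samples(x))
print("KL(A||B) ~ %.3f bits" % (kl_nats / np.log(2)))
```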
4

Robust feature extractions from geometric data using geometric algebra

Minh Tuan, Pham, Yoshikawa, Tomohiro, Furuhashi, Takeshi, Tachibana, Kaita 11 October 2009 (has links)
No description available.
5

Speaker and Emotion Recognition System of Gaussian Mixture Model

Wang, Jhong-yi 01 August 2006 (has links)
In this thesis, a speaker and emotion recognition system is built on a PC and a digital signal processor (DSP). Most speaker recognition and emotion recognition systems are implemented separately rather than combined; this thesis shows how the two can be combined in a single system. The voice is picked up by a microphone and passed to the DSP, which extracts its characteristic features; when the sample is matched correctly, the system produces a recognition result. The recognition system is divided into four sub-systems: pronunciation pre-processing, speaker model training, speaker and emotion recognition, and speaker verification. The pre-processing stage captures the voice through the microphone, transfers it via the DSP board to SRAM, and then applies the pre-processing operations. The speaker training stage uses a Gaussian mixture model to estimate the means, coefficients of variation, and weight values of each enrolled speaker, and this information serves as the reference data for the whole recognition system. Speaker recognition mainly uses the probability density to identify the speaker, while emotion recognition uses the coefficients of variation to recognize the emotion. Speaker verification confirms whether the user is the same speaker registered in the system database. The DSP-based recognition system comprises two parts: the hardware setup and the implementation of the speaker-recognition algorithm. A fixed-point DSP chip is used, with the Gaussian mixture model as the recognition algorithm. In addition, a fixed-point DSP costs much less than a floating-point one, which makes the system more affordable for users.
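A sketch of the GMM speaker-identification step described above, using MFCC features. Real recordings are replaced with synthetic tones, and the DSP-board implementation, emotion branch, and verification threshold are omitted:

```python
# Train one GMM per enrolled speaker on MFCC frames; identify a test
# utterance by the model with the highest average log-likelihood.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

SR = 16000

def mfcc_frames(y):
    # Frames x coefficients matrix of 13 MFCCs.
    return librosa.feature.mfcc(y=y, sr=SR, n_mfcc=13).T

rng = np.random.default_rng(7)
t = np.arange(SR * 2) / SR
# Two crude "speakers": different fundamental frequencies plus noise.
voices = {
    "speaker_a": np.sin(2 * np.pi * 120 * t) + 0.1 * rng.normal(size=t.size),
    "speaker_b": np.sin(2 * np.pi * 220 * t) + 0.1 * rng.normal(size=t.size),
}

models = {name: GaussianMixture(n_components=8, covariance_type="diag",
                                random_state=0).fit(mfcc_frames(y))
          for name, y in voices.items()}

test = mfcc_frames(np.sin(2 * np.pi * 120 * t) + 0.1 * rng.normal(size=t.size))
scores = {name: m.score(test) for name, m in models.items()}
print(max(scores, key=scores.get), scores)
```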
6

GMMEDA : A demonstration of probabilistic modeling in continuous metaheuristic optimization using mixture models

Naveen Kumar Unknown Date (has links)
Optimization problems are common throughout science, engineering, and commerce. The desire to continually improve solutions and to solve larger, more complex problems has kept this field of research prominent for several decades and has led to the development of a range of optimization algorithms for different classes of problems. Estimation of Distribution Algorithms (EDAs) are a relatively recent class of metaheuristic optimization algorithms that use probabilistic modeling techniques to control the search process. Within the general EDA framework, a number of different probabilistic models have previously been proposed for both discrete and continuous optimization problems. This thesis focuses on GMMEDAs: continuous EDAs based on Gaussian Mixture Models (GMMs) with parameter estimation performed using the Expectation Maximization (EM) algorithm. To date, this type of model has received only limited attention in the literature: there are few previous experimental studies of the algorithms, and a number of implementation details of continuous iterated density estimation algorithms based on Gaussian mixture models have not previously been documented. This thesis provides a clear description of the GMMEDAs, discusses the implementation decisions and details, and presents an experimental study evaluating the performance of the algorithms. The effectiveness of the GMMEDAs with varying model complexity (structure of the covariance matrices and number of components) was tested against five benchmark functions (Sphere, Rastrigin, Griewank, Ackley, and Rosenbrock) with varying dimensionality (2-, 10-, and 30-D). The effect of the selection pressure parameter is also studied. The results of the 2-D experiments show that a variant of the GMMEDA with moderate complexity (Diagonal GMMEDA) was able to optimize both unimodal and multimodal functions. Further analysis of the 10- and 30-D results indicates that the simplest variant (Spherical GMMEDA) was the most effective of the three, although greater consistency across these functions is achieved with the most complex variant (Full GMMEDA). A comparison of the results for four artificial test functions (Sphere, Griewank, Ackley, and Rosenbrock) showed that the GMMEDA variants optimized most of the complex functions better than existing continuous EDAs, owing to the ability of the GMM components to model the functions effectively. The analysis also showed that both the number of components and the selection pressure affect the optimum value found for each test function. Convergence of the GMMEDA variants to a function's best local optimum was driven mainly by the complexity of the GMM: complexity grows both with the number of components and with the structure of the covariance matrices, but when optimizing complex functions the complexity due to the covariance structure dominates that due to the number of components. Additionally, the effect of the number of components on convergence diminishes for most functions as the selection pressure increases; these effects appear in the results as greater stability across functions.
Other factors that affect convergence to local optima are the initialization of the GMM parameters, the number of EM iterations, and the reset condition. Although not graphically visible in the 10-D optimization, different initializations of the GMM parameters in 2-D affected the optimum value of the functions. Initialization of the population in evolutionary algorithms is known to affect convergence to the global optimum, so the similar effects of GMM-parameter initialization observed on the 2-D functions suggest that convergence in 10-D, and hence the optimum values obtained, could be affected as well. The estimated covariance and mean values over the EM iterations in 2-D indicated that some functions needed more EM iterations to reach their optimum value; too few EM iterations could therefore impair the fitting of the components to the selected population in 10-D and thereby the effective modeling of functions of varying complexity. Finally, the reset condition was observed to reset the covariance and the best fitness value of individuals in each generation in 2-D; this is certain to affect the convergence of the GMMEDA variants to a function's best local optimum. The rate at which the reset condition was invoked could have caused the covariance values of the GMM components to revert to their initial values, affecting how well the model fit the selected fraction of the population. Taking all of these effects into account, the results indicate that a smaller number of components and a smaller selected fraction of the population, with the simpler S-GMMEDA, modeled most functions of varying complexity best.
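The core GMMEDA loop is compact enough to sketch (an illustration of the algorithm family, not the thesis implementation): select the best fraction of the population under a selection pressure tau, fit a Gaussian mixture to the selected points, and sample the next population from it, shown here on the 2-D Sphere benchmark:

```python
# Minimal GMM-EDA loop on the 2-D Sphere function.
import numpy as np
from sklearn.mixture import GaussianMixture

def sphere(x):
    return np.sum(x ** 2, axis=1)

rng = np.random.default_rng(3)
pop = rng.uniform(-5, 5, size=(200, 2))
tau = 0.3                                # selection pressure: keep top 30%

for gen in range(50):
    fit = sphere(pop)
    sel = pop[np.argsort(fit)[: int(tau * len(pop))]]
    # "full" covariance variant; "spherical" and "diag" are the simpler
    # model structures compared in the thesis.
    model = GaussianMixture(n_components=2, covariance_type="full",
                            reg_covar=1e-6, random_state=0).fit(sel)
    pop, _ = model.sample(len(pop))      # next generation

print("best found:", sphere(pop).min())
```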
7

Some non-standard statistical dependence problems

Bere, Alphonce January 2016 (has links)
The major result of this thesis is the development of a framework for applying pair-mixtures of copulas to model asymmetric dependencies in bivariate data. The main motivation is the inadequacy of the mixtures of bivariate Gaussian models that are commonly fitted to data. Mixtures of rotated single-parameter Archimedean and Gaussian copulas are fitted to real data sets, with parameters estimated by maximum likelihood. Goodness-of-fit tests performed on the models giving the highest log-likelihood values show that the models fit the data well. We use mixtures of univariate Gaussian models and mixtures of regression models to investigate the existence of bimodality in the distribution of the widths of autocorrelation functions in a sample of 119 gamma-ray bursts. Contrary to previous findings, our results do not reveal any evidence of bimodality. We extend a study by Genest et al. (2012) of the power and significance levels of tests of copula symmetry to two copula models which have not been considered previously. Our results confirm that for small sample sizes these tests fail to maintain their 5% significance level, and that the Cramér-von Mises-type statistics are the most powerful.
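A hedged sketch of the pair-mixture construction: a two-component mixture of a Clayton copula density and its 180-degree (survival) rotation, capturing asymmetric lower- and upper-tail dependence. The thesis's maximum-likelihood estimation would be layered on top of a negative log-likelihood like the one below; the parameter values and toy data are illustrative:

```python
# Pair-mixture of a Clayton copula and its 180-degree rotation.
import numpy as np

def clayton_pdf(u, v, theta):
    # c(u,v) = (1+t) (uv)^(-1-t) (u^-t + v^-t - 1)^(-2-1/t)
    s = u ** -theta + v ** -theta - 1.0
    return (1 + theta) * (u * v) ** (-1 - theta) * s ** (-2 - 1 / theta)

def pair_mixture_pdf(u, v, theta, pi):
    # pi * Clayton (lower-tail dependent) + (1-pi) * its survival rotation
    # (upper-tail dependent).
    return pi * clayton_pdf(u, v, theta) + (1 - pi) * clayton_pdf(1 - u, 1 - v, theta)

def neg_loglik(params, u, v):
    theta, pi = params
    return -np.sum(np.log(pair_mixture_pdf(u, v, theta, pi)))

# Evaluate on toy pseudo-observations in (0, 1).
rng = np.random.default_rng(4)
u, v = rng.random(500), rng.random(500)
print(neg_loglik((2.0, 0.6), u, v))
```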
8

Embedded Feature Selection for Model-based Clustering

January 2020 (has links)
Model-based clustering is a sub-field of statistical modeling and machine learning. Mixture models use probabilities to describe the degree to which a data point belongs to each cluster, and these probabilities are updated iteratively during clustering. While mixture models have demonstrated superior performance in handling noisy data in many fields, challenges remain for high-dimensional datasets: among a large number of features, some may not actually contribute to delineating the cluster profiles, and including these "noisy" features confuses the model's identification of the real cluster structure and adds computational cost. Recognizing this issue, this dissertation proposes a new feature selection algorithm, first for continuous data and then extended to mixed data types; uncertainty quantification of the feature selection results is the third topic. The first topic is an embedded feature selection algorithm termed the Expectation-Selection-Maximization (ESM) model, which automatically selects features while optimizing the parameters of a Gaussian mixture model. A relevancy index (RI) revealing each feature's contribution to the clustering process is introduced to assist feature selection. The efficacy of the ESM is demonstrated on two synthetic datasets, four benchmark datasets, and an Alzheimer's disease dataset. The second topic extends the ESM algorithm to handle mixed data types: the Gaussian mixture model is generalized to the Generalized Model of Mixture (GMoM), which can handle not only continuous features but also binary and nominal ones. The last topic is uncertainty quantification (UQ) of the feature selection. A new algorithm termed ESOM is proposed, which takes variance information into consideration when selecting features. In addition, a set of outliers is generated during the feature selection process to infer the uncertainty in the input data. Finally, the selected features and detected outlier instances are evaluated by visual comparison.
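The abstract does not define the relevancy index, so the sketch below uses a hypothetical stand-in: each feature is scored by the spread of the fitted component means relative to the pooled within-component standard deviation, so "noisy" features score near zero:

```python
# Hypothetical relevancy-style score per feature from a fitted GMM.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
n = 600
informative = np.concatenate([rng.normal(-2, 1, n // 2), rng.normal(2, 1, n // 2)])
noisy = rng.normal(0, 1, n)
X = np.column_stack([informative, noisy])

gmm = GaussianMixture(n_components=2, covariance_type="diag",
                      random_state=0).fit(X)
within_sd = np.sqrt(gmm.covariances_)       # (components, features) for "diag"
ri = gmm.means_.std(axis=0) / within_sd.mean(axis=0)
print("relevancy-style scores per feature:", ri)  # feature 0 should dominate
```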
9

BER Modeling for Interference Canceling Adaptive NLMS Equalizer

Roy, Tamoghna 13 January 2015 (has links)
Adaptive LMS equalizers are widely used in digital communication systems for their simplicity of implementation. Conventional adaptive filtering theory suggests that the upper bound on the performance of such an equalizer is set by the performance of a Wiener filter of the same structure. In the presence of a narrowband interferer, however, the LMS equalizer performs better than its Wiener counterpart. This phenomenon, termed a non-Wiener effect, has been observed before, and substantial work has been done to explain the underlying reasons. In this work, we focus on the Bit Error Rate (BER) performance of LMS equalizers. First, a model, the Gaussian Mixture (GM) model, is presented to estimate the BER performance of a Wiener filter operating in an environment dominated by a narrowband interferer. Simulation results show that the model predicts BER accurately for a wide range of SNR, ISR, and equalizer lengths. Next, a similar model, the Gaussian Mixture using Steady State Weights (GMSSW) model, is proposed to describe the BER behavior of the adaptive NLMS equalizer. Simulation results show unsatisfactory performance of this model; a detailed discussion points out the limitations of the GMSSW model, thereby providing some insight into the non-Wiener behavior of (N)LMS equalizers. An improved model, the Gaussian with Mean Square Error (GMSE) model, is then proposed. Simulation results show that the GMSE model captures the non-Wiener characteristics of the NLMS equalizer when the normalized step size is between 0 and 0.4, and a brief discussion explains why the model is inaccurate for larger step sizes.
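A hedged sketch of the GM modeling idea: conditioned on the phase of the residual narrowband interferer, the decision statistic is treated as Gaussian, so the BER is an average of Q-functions over a phase mixture. The amplitudes and noise level below are illustrative, not values from the thesis:

```python
# BER of BPSK with a residual narrowband interferer, via a Gaussian
# mixture over the interferer phase.
import numpy as np
from scipy.stats import norm

def ber_gaussian_mixture(a_resid, sigma, n_phases=256):
    """BER for unit-amplitude BPSK with residual interferer amplitude
    a_resid and Gaussian noise std sigma, averaged over the phase."""
    phi = np.linspace(0, 2 * np.pi, n_phases, endpoint=False)
    # Q((1 + a cos(phi)) / sigma), averaged over the phase mixture.
    return norm.sf((1.0 + a_resid * np.cos(phi)) / sigma).mean()

print(ber_gaussian_mixture(a_resid=0.3, sigma=0.4))  # with interference
print(ber_gaussian_mixture(a_resid=0.0, sigma=0.4))  # reduces to Q(1/sigma)
```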
10

Defining and predicting fast-selling clothing options

Jesperson, Sara January 2019 (has links)
This thesis aims to find a definition of fast-selling clothing options and a way to predict them using only a few weeks of sales data as input. The data used for this project contain daily sales and intake quantities for seasonal options with sales starting in 2016–2018, provided by the department store chain Åhléns. A definition is found that describes fast-selling clothing options as those having sold a certain percentage of their intake after a fixed number of days; an alternative definition based on cluster affiliation proves less effective. Two predictive models are tested: a probabilistic classifier and a k-nearest-neighbor classifier using the Euclidean distance. The probabilistic model has three steps: transformation, clustering, and classification. The time series are transformed with B-splines to reduce dimensionality, each series being represented by a vector containing its length and B-spline coefficients. To improve the quality of the predictions, the B-spline vectors are clustered with a Gaussian mixture model, and every cluster is assigned one of the two labels, fast-selling or ordinary, thus dividing the clusters into two disjoint sets. Lastly, the time series to be predicted are assumed to be Laplace distributed around a B-spline, and using the probability distributions provided by the clustering, the posterior probability of each class is used to classify new observations. In the transformation step, the number of knots for the B-splines is chosen by cross-validation, and the Gaussian mixture models from the clustering step are evaluated with the Bayesian information criterion (BIC). The predictive performance of both classifiers is evaluated with accuracy, precision, and recall. The probabilistic model outperforms the k-nearest-neighbor model, with considerably higher accuracy, precision, and recall. The performance of each model improves when more data are used to make the predictions, most prominently for the probabilistic model.
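A sketch of the probabilistic pipeline described above: represent each daily-sales series by its B-spline coefficients, cluster the coefficient vectors with a Gaussian mixture, and read off cluster labels. The Laplace classification step is replaced by the mixture's own posterior assignment, and the Åhléns data by simulated series:

```python
# B-spline coefficients as features, clustered with a Gaussian mixture.
import numpy as np
from scipy.interpolate import splrep
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
days = np.arange(60, dtype=float)

def simulate_series(fast):
    # Fast sellers decay quickly from a high initial sales rate.
    rate, decay = (3.0, 15.0) if fast else (1.0, 40.0)
    return rate * np.exp(-days / decay) + 0.1 * rng.normal(size=days.size)

series = [simulate_series(fast=i < 40) for i in range(120)]

# Fixed interior knots so every series yields coefficients of the same
# length; splrep pads its coefficient array, so keep the 7 meaningful
# values (3 interior knots + degree 3 + 1).
knots = [15.0, 30.0, 45.0]
coefs = np.array([splrep(days, y, t=knots, k=3)[1][:7] for y in series])

gmm = GaussianMixture(n_components=2, random_state=0).fit(coefs)
labels = gmm.predict(coefs)
truth = (np.arange(120) < 40).astype(int)
for comp in range(2):
    print(f"component {comp}: fraction fast-selling = "
          f"{truth[labels == comp].mean():.2f}")
```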
