1

Hypothesis Testing in Finite Mixture Models

Li, Pengfei 11 December 2007
Mixture models provide a natural framework for unobserved heterogeneity in a population. They are widely applied in astronomy, biology, engineering, finance, genetics, medicine, social sciences, and other areas. An important first step in using mixture models is the test of homogeneity. Before one tries to fit a mixture model, it is useful to know whether the data arise from a homogeneous or a heterogeneous population. If the data are homogeneous, mixture modeling is not needed at all. The rejection of the homogeneous model may also have scientific implications. For example, in classical statistical genetics, it is often suspected that only a subgroup of patients have a disease gene which is linked to the marker. Detecting the existence of this subgroup amounts to rejecting a homogeneous null model in favour of a two-component mixture model. This problem has attracted intensive research recently, and this thesis makes substantial contributions to this area of research. Due to partial loss of identifiability, classic inference methods such as the likelihood ratio test (LRT) lose their usual elegant statistical properties. The limiting distribution of the LRT often involves complex Gaussian processes, which can be hard to implement in data analysis. The modified likelihood ratio test (MLRT) is found to be a good alternative to the LRT. It restores identifiability by adding a penalty to the log-likelihood function. Under some mild conditions, the limiting distribution of the MLRT is 0.5\chi^2_0+0.5\chi^2_1, where \chi^2_0 is a point mass at 0. This limiting distribution is convenient to use in real data analysis. The choice of the penalty function in the MLRT is very flexible, and a good choice of penalty enhances the power of the MLRT. In this thesis, we first introduce a new class of penalty functions, with which the MLRT enjoys significantly improved power for testing homogeneity.
The main contribution of this thesis is a new class of methods for testing homogeneity. Most existing methods in the literature for testing homogeneity are derived, explicitly or implicitly, under the condition of finite Fisher information and a compactness assumption on the space of the mixing parameters. The finite Fisher information condition can prevent their application to many important mixture models, such as the mixture of geometric distributions, the mixture of exponential distributions and, more generally, mixture models in scale distribution families. The compactness assumption often forces practitioners to set artificial bounds for the parameters of interest and makes the resulting limiting distribution dependent on these bounds. Consequently, many researchers have sought a method without such restrictions. As will be seen, the EM-test proposed in this thesis is free of these shortcomings. The EM-test combines the merits of the classic LRT and the score test. The properties of the EM-test are particularly easy to investigate under single-parameter mixture models. In that setting it has the simple limiting distribution 0.5\chi^2_0+0.5\chi^2_1, the same as the MLRT. This result is applicable to mixture models without requiring the restrictive regularity conditions described earlier. The normal mixture model is very popular in applications. However, it does not satisfy the strong identifiability condition, which poses substantial technical difficulties in the study of its asymptotic properties. Most existing methods do not directly apply to normal mixture models, so the asymptotic properties have to be developed separately. We investigate the use of the EM-test for normal mixture models and derive its limiting distributions. For the homogeneity test in the presence of the structural parameter, the limiting distribution is a simple function of the 0.5\chi^2_0+0.5\chi^2_1 and \chi^2_1 distributions.
The test with this limiting distribution is still very convenient to implement. For normal mixtures in both mean and variance parameters, the limiting distribution of the EM-test is found to be \chi^2_2. Mixture models are also widely used in the analysis of directional data. The von Mises distribution is often regarded as the circular normal model. Interestingly, it satisfies the strong identifiability condition and the parameter space of the mean direction is compact. However, the theoretical results for single-parameter mixture models cannot be applied directly to von Mises mixture models. Because of this, we also study the application of the EM-test to von Mises mixture models in the presence of the structural parameter. The limiting distribution of the EM-test is again found to be 0.5\chi^2_0+0.5\chi^2_1. Extensive simulations are carried out to examine how precisely the limiting distributions approximate the finite-sample distributions of the EM-test. The type I errors with the critical values determined by the limiting distributions are found to be close to the nominal values. We also propose several precision-enhancing methods, which are found to work well. Real data examples are used to illustrate the use of the EM-test.
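As an illustration of why the 0.5\chi^2_0+0.5\chi^2_1 limiting distribution is described as convenient, the following minimal Python sketch computes a p-value and a critical value under that distribution. It is not taken from the thesis: the function names are hypothetical, and it only assumes that a test statistic has already been computed by some other means. It uses the standard identity P(\chi^2_1 > x) = erfc(sqrt(x/2)).

```python
import math


def chi2_1_sf(x: float) -> float:
    """Survival function P(chi^2_1 > x), via the complementary error function."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def mixture_p_value(stat: float) -> float:
    """P-value of an observed statistic under 0.5*chi^2_0 + 0.5*chi^2_1."""
    if stat <= 0:
        # The statistic is non-negative, so a value of 0 is never surprising.
        return 1.0
    # Only the chi^2_1 half of the mixture can exceed a positive value.
    return 0.5 * chi2_1_sf(stat)


def mixture_critical_value(alpha: float) -> float:
    """Critical value c with P(T > c) = alpha, found by bisection (0 < alpha < 0.5)."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie strictly between 0 and 0.5")
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mixture_p_value(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Under this distribution the 5% critical value equals the 90% quantile of \chi^2_1 (about 2.71), which is smaller than the usual \chi^2_1 cutoff of about 3.84 at the same level.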
3

The Generalized Linear Mixed Model for Finite Normal Mixtures with Application to Tendon Fibrilogenesis Data

Zhan, Tingting January 2012
We propose the generalized linear mixed model for finite normal mixtures (GLMFM), together with estimation procedures for it. The model is widely applicable to hierarchical datasets with a small number of individual units and multi-modal distributions at the lowest level of clustering. The modeling task is two-fold: (a) to model the lowest-level cluster as a finite mixture of normal distributions; and (b) to model the properly transformed mixture proportions, means and standard deviations of the lowest-level cluster as a linear hierarchical structure. We propose robust generalized weighted likelihood estimators and a new cubic-inverse weight for the estimation of the finite mixture model (Zhan et al., 2011). We propose two robust methods for estimating the GLMFM, which accommodate contamination at all clustering levels: the standard two-stage approach (Chervoneva et al., 2011, co-authored) and a robust joint estimation. Our research was motivated by the data obtained from the tendon fibril experiment reported in Zhang et al. (2006). Our statistical methodology is quite general and has potential application in a variety of relatively complex statistical modeling situations. / Statistics
4

Optimal Subsampling of Finite Mixture Distribution

Neupane, Binod Prasad 05 1900
A mixture distribution is a compounding of statistical distributions, which arises when sampling from heterogeneous populations with a different probability density function in each component. A finite mixture has a finite number of components. In the past decade the extent and the potential of the applications of finite mixture models have widened considerably.
The objective of this project is to add some functionality to the package 'mixdist', developed by Du and Macdonald (Du 2002) and Gao (2004) in the R environment (R Development Core Team 2004), for estimating the parameters of a finite mixture distribution from data grouped in bins and conditional data. Mixed data together with conditional data provide better parameter estimates than mixed data alone. Our main objective is to find the optimal sample size to draw from each bin of the mixed data in order to obtain conditional data, given approximate values of the parameters and the distributional form of the mixture for the given data. We have also replaced the dependence of the function mix on the optimizer nlm with the optimizer optim, so that limits can be placed on the parameters.
Our purpose is to provide easily available tools for modeling fish growth using mixture distributions. However, the approach has a number of applications in other areas as well. / Thesis / Master of Science (MSc)
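To make the idea of a finite mixture observed as binned data concrete, here is a minimal Python sketch that computes the probability a normal mixture assigns to each bin. It is an illustration only: it does not use or reproduce the 'mixdist' package, the function names are hypothetical, and normal components are an assumption made for the example.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """CDF of N(mean, sd^2) evaluated at x."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))


def mixture_cdf(x, weights, means, sds):
    """CDF of a finite normal mixture: the weighted sum of component CDFs."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, sds))


def bin_probabilities(edges, weights, means, sds):
    """Probability mass of the mixture in each bin.

    `edges` are the interior bin boundaries in increasing order, so there are
    len(edges) + 1 bins; the first and last bins are open-ended.
    """
    cdf_vals = [0.0] + [mixture_cdf(e, weights, means, sds) for e in edges] + [1.0]
    return [upper - lower for lower, upper in zip(cdf_vals, cdf_vals[1:])]


# Hypothetical example: two age groups of fish with different mean lengths (cm).
example_probs = bin_probabilities(
    edges=[15.0, 20.0, 25.0, 30.0],
    weights=[0.6, 0.4],
    means=[18.0, 27.0],
    sds=[2.0, 2.5],
)
```

Multiplying these probabilities by the total sample size gives the expected count in each bin, which is the quantity a grouped-data likelihood compares with the observed counts.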
5

Effect fusion using model-based clustering

Malsiner-Walli, Gertraud, Pauger, Daniela, Wagner, Helga 01 April 2018
In social and economic studies many of the collected variables are measured on a nominal scale, often with a large number of categories. The definition of categories can be ambiguous, and different classification schemes using either a finer or a coarser grid are possible. Categorization has an impact when such a variable is included as a covariate in a regression model: too fine a grid will result in imprecise estimates of the corresponding effects, whereas with too coarse a grid important effects will be missed, resulting in biased effect estimates and poor predictive performance. To achieve an automatic grouping of the levels of a categorical covariate with essentially the same effect, we adopt a Bayesian approach and specify the prior on the level effects as a location mixture of spiky Normal components. Model-based clustering of the effects during MCMC sampling makes it possible to simultaneously detect categories which have essentially the same effect size and to identify variables with no effect at all. Fusion of level effects is induced by a prior on the mixture weights which encourages empty components. The properties of this approach are investigated in simulation studies. Finally, the method is applied to analyse effects of high-dimensional categorical predictors on income in Austria.
6

Customer segmentation using unobserved heterogeneity in the perceived value-loyalty-intentions link

Floh, Arne, Zauner, Alexander, Koller, Monika, Rusch, Thomas January 2014
Multiple facets of perceived value perceptions drive loyalty intentions. However, this value-loyalty link is not uniform for all customers. In fact, the present study identifies three different segments that are internally consistent and stable across different service industries, using two data sets: the wireless telecommunication industry (sample size 1,122) and the financial services industry (sample size 982). Comparing the results of a single-class solution with finite mixture results confirms the existence of unobserved customer segments. The three segments found are characterized as "rationalists", "functionalists" and "value maximizers". These results point the way for value-based segmentation in loyalty initiatives and reflect the importance of a multidimensional conceptualization of perceived value, comprising cognitive and affective components. The present results substantiate the fact that assuming a homogeneous value-loyalty link provides a misleading view of the market. The paper derives implications for marketing research and practice in terms of segmentation, positioning, loyalty programs and strategic alliances. (authors' abstract)
7

Topics in One-Way Supervised Biclustering Using Gaussian Mixture Models

Wong, Monica January 2017
Cluster analysis identifies homogeneous groups that are relevant within a population. In model-based clustering, group membership is estimated using a parametric finite mixture model, commonly the mathematically tractable Gaussian mixture model. One-way clustering methods can be restrictive in cases where there are suspected relationships between the variables in each component, leading to the idea of biclustering, which refers to clustering both observations and variables simultaneously. When the relationships between the variables are known, biclustering becomes one-way supervised. To this end, this thesis focuses on a novel one-way supervised biclustering family based on the Gaussian mixture model. In cases where biclustering may be overestimating the number of components in the data, a model averaging technique utilizing Occam's window is applied to produce better clustering results. Automatic outlier detection is introduced into the biclustering family using mixtures of contaminated Gaussian mixture models. Algorithms for model-fitting and parameter estimation are presented for the techniques described in this thesis, and simulation and real data studies are used to assess their performance. / Thesis / Doctor of Philosophy (PhD)
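The model-based clustering idea described in this abstract, estimating group membership from a fitted Gaussian mixture, can be sketched with a plain EM algorithm. The Python code below is a minimal univariate, two-component illustration with hypothetical function names and simulated data; it is ordinary one-way clustering and does not implement the thesis's supervised biclustering family, model averaging, or outlier detection.

```python
import math
import random


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(data, n_iter=100):
    """EM for a univariate two-component Gaussian mixture.

    Returns (weights, means, sds, labels), where labels[i] is the component
    with the larger posterior probability for data[i].
    """
    n = len(data)
    # Crude initialisation (an assumption of this sketch): means at the extremes.
    means = [min(data), max(data)]
    centre = sum(data) / n
    spread = math.sqrt(sum((x - centre) ** 2 for x in data) / n)
    sds = [spread, spread]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of the parameters.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)  # guard against a collapsing component
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, sds, labels


# Simulated example: two well-separated groups of 200 points each.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]
weights, means, sds, labels = fit_two_component_gmm(sample)
```

With groups this far apart the estimated means should land close to 0 and 10; with overlapping groups the result depends more heavily on the initialisation, which is one reason practical implementations use several starting values.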
8

Modelling human immunodeficiency virus ribonucleic acid levels with finite mixtures for censored longitudinal data

Grün, Bettina, Hornik, Kurt 01 1900
The measurement of human immunodeficiency virus ribonucleic acid levels over time leads to censored longitudinal data. Suitable models for dynamic modelling of these levels need to take this data characteristic into account. If groups of patients with different developments of the levels over time are suspected, the model class of finite mixtures of mixed effects models with censored data is required. We describe the model specification and derive the estimation with a suitable expectation-maximization algorithm. We propose a convenient implementation using closed form formulae for the expected mean and variance of the truncated multivariate distribution. Only efficient evaluation of the cumulative multivariate normal distribution function is required. Model selection as well as methods for inference are discussed. The application is demonstrated on the clinical trial ACTG 315 data.
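To give a flavour of the closed form formulae mentioned in this abstract, the following Python sketch evaluates the mean and variance of a univariate normal variable conditional on lying below a limit, as happens when a measurement falls under a detection threshold. This is a simplified one-dimensional illustration with a hypothetical function name; the paper itself works with the truncated multivariate distribution, which this sketch does not cover.

```python
import math


def std_normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def upper_truncated_normal_moments(mean: float, sd: float, limit: float):
    """Mean and variance of X ~ N(mean, sd^2) conditional on X < limit.

    With beta = (limit - mean) / sd and r = pdf(beta) / cdf(beta):
        E[X | X < limit]   = mean - sd * r
        Var[X | X < limit] = sd^2 * (1 - beta * r - r^2)
    """
    beta = (limit - mean) / sd
    ratio = std_normal_pdf(beta) / std_normal_cdf(beta)
    cond_mean = mean - sd * ratio
    cond_var = sd * sd * (1.0 - beta * ratio - ratio * ratio)
    return cond_mean, cond_var
```

In an expectation-maximization algorithm for censored data, moments of this kind take the place of the unobserved values in the E-step, so no numerical integration is needed in the univariate case.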
9

The Heterogeneity Model and its Special Cases. An Illustrative Comparison.

Tüchler, Regina, Frühwirth-Schnatter, Sylvia, Otter, Thomas January 2002
In this paper we carry out fully Bayesian analysis of the general heterogeneity model, which is a mixture of random effects model, and its special cases, the random coefficient model and the latent class model. Our application comes from Conjoint analysis and we are especially interested in what is gained by the general heterogeneity model in comparison to the other two when modeling consumers' heterogeneous preferences. (author's abstract) / Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
10

Efficient Tools For Reliability Analysis Using Finite Mixture Distributions

Cross, Richard J. (Richard John) 02 December 2004
The complexity of many failure mechanisms and variations in component manufacture often make standard probability distributions inadequate for reliability modeling. Finite mixture distributions provide the necessary flexibility for modeling such complex phenomena but add considerable difficulty to the inference. This difficulty is overcome by drawing an analogy to neural networks. With appropriate modifications, a neural network can represent a finite mixture CDF or PDF exactly. Training with Bayesian Regularization gives an efficient empirical Bayesian inference of the failure time distribution. Training also yields an effective number of parameters from which the number of components in the mixture can be estimated. Credible sets for functions of the model parameters can be estimated using a simple closed-form expression. Complete, censored, and inspection samples can be considered by appropriate choice of the likelihood function. In this work, architectures for Exponential, Weibull, Normal, and Log-Normal mixture networks have been derived. The capabilities of mixture networks have been demonstrated for complete, censored, and inspection samples from Weibull and Log-Normal mixtures. Furthermore, mixture networks' ability to model arbitrary failure distributions has been demonstrated. A sensitivity analysis has been performed to determine how mixture network estimator errors are affected by mixture component spacing and sample size. It is shown that mixture network estimators are asymptotically unbiased and that errors decay with sample size at least as well as with MLE.
