About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Predicting dementia status from Mini-Mental State Exam scores using group-based trajectory modelling

Brown, Cassandra Lynn 24 August 2012 (has links)
Background: Longitudinal studies enable the study of within-person change over time in addition to between-person differences. In longitudinal studies of older adult populations, identifying participants with dementia is desirable, and often necessary, even when it is not the question of interest. Yet in practice, the time available to collect information from each participant may be limited, so some studies include only a brief general cognitive measure, of which the Mini-Mental State Examination (MMSE) is the most commonly used (Raina et al., 2009). The current study explores whether group-based trajectory modeling of MMSE scores with a selection of covariates can identify individuals who have or will develop dementia in an 8-year longitudinal study. Methods: The sample included 651 individuals from the Origins of Variance in the Oldest Old study of Swedish twins aged 80 years or older (OCTO-Twin). Participants completed the MMSE every two years, and cases of dementia were diagnosed according to DSM-III criteria. The accuracy of using the classes formed by growth mixture modeling and latent class growth modeling as indicators of dementia status was compared with that of more standard methods: the typical 24/30 cut score and a logistic regression. Results: A three-class quadratic model with covariate effects on class membership was found to best characterize the data. The classes were labeled High Performing Late Decline, Rapidly Declining, and Decreasing Low Performance. Compared with the simpler methods, the final model had lower sensitivity but equal or better specificity, positive predictive value, and negative predictive value, and it allowed a more fine-grained analysis of participant risk. Conclusions: Group-based trajectory models may be helpful for grouping longitudinal study participants, particularly if sensitivity is not the primary concern. / Graduate
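To make the comparison concrete, the "standard method" baseline and the accuracy metrics it is judged by are simple enough to show in code. The sketch below is a minimal illustration with made-up MMSE scores and diagnoses (not the OCTO-Twin data), assuming Python with NumPy:

```python
# Minimal sketch of the 24/30 cut-score baseline and the diagnostic-accuracy
# metrics mentioned in the abstract. The arrays are hypothetical, for illustration.
import numpy as np

mmse = np.array([29, 27, 22, 30, 18, 25, 21, 28, 23, 26])   # hypothetical MMSE scores
dementia = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])          # hypothetical diagnoses (1 = dementia)

predicted = (mmse < 24).astype(int)   # typical 24/30 cut: scores below 24 flag possible dementia

tp = np.sum((predicted == 1) & (dementia == 1))
tn = np.sum((predicted == 0) & (dementia == 0))
fp = np.sum((predicted == 1) & (dementia == 0))
fn = np.sum((predicted == 0) & (dementia == 1))

print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("PPV:", tp / (tp + fp))
print("NPV:", tn / (tn + fn))
```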
2

Latent Class Model in Transportation Study

Zhang, Dengfeng 20 January 2015 (has links)
Statistics, as a critical component in transportation research, has been widely used to analyze driver safety, travel time, traffic flow and numerous other problems. Many of these popular topics can be interpreted as building statistical models for the latent structure of the data. Over the past several years, interest in latent class models has continuously increased due to their great potential in solving practical problems. In this dissertation, I developed several latent class models to quantitatively analyze the hidden structure of transportation data and addressed related application issues. The first model focuses on the uncertainty of travel time, which is critical for assessing the reliability of transportation systems. Travel time is random in nature and contains substantial variability, especially under congested traffic conditions. A Bayesian mixture model, with the ability to incorporate the influence of covariates such as traffic volume, has been proposed. This model advances the previous multi-state travel time reliability model, in which the relationship between response and predictors was lacking. The Bayesian mixture travel time model, however, lacks the power to accurately predict future travel time. The analysis indicates that the independence assumption, which is difficult to justify in real data, could be a potential issue. Therefore, I proposed a Hidden Markov model to accommodate the dependency structure, and the modeling results were significantly improved. The second and third parts of the dissertation focus on driver safety identification. Given demographic information and crash history, the number of crashes, as a type of count data, is commonly modeled by Poisson regression. However, the over-dispersion in the data implies that a single Poisson distribution is insufficient to depict the substantial variability. A Poisson mixture model is proposed and applied to identify risky and safe drivers. The lower bound of the estimated misclassification rate is evaluated using the concept of overlap probability, and several theoretical results regarding the overlap probability are discussed. I also introduce quantile regression for discrete data to specifically model high-risk drivers. In summary, the major objective of my research is to develop latent class methods and explore the hidden structure within transportation data, and the approaches I employed can also be applied to similar research questions in other areas. / Ph. D.
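The driver-safety part of the dissertation lends itself to a compact illustration: a two-component Poisson mixture fit by EM to crash counts, with the overlap probability between the fitted components as a rough measure of how separable risky and safe drivers can be. The sketch below uses made-up counts and a deliberately simplified EM loop; it is not the dissertation's implementation.

```python
# Simplified two-component Poisson mixture fit by EM, used to split drivers into
# a low-rate ("safe") and a high-rate ("risky") group, plus the overlap probability
# of the two fitted components. Crash counts below are invented for illustration.
import numpy as np
from scipy.stats import poisson

counts = np.array([0, 0, 1, 0, 2, 0, 1, 5, 4, 0, 6, 0, 1, 3, 7, 0])  # hypothetical yearly crash counts

w, lam1, lam2 = 0.5, 0.5, 4.0          # initial guesses: weight and component rates
for _ in range(200):
    # E-step: posterior probability that each driver belongs to the low-rate component
    p1 = w * poisson.pmf(counts, lam1)
    p2 = (1 - w) * poisson.pmf(counts, lam2)
    r = p1 / (p1 + p2)
    # M-step: update the mixing weight and the component rates
    w = r.mean()
    lam1 = np.sum(r * counts) / np.sum(r)
    lam2 = np.sum((1 - r) * counts) / np.sum(1 - r)

safe = r > 0.5                          # classify drivers by posterior probability
print("safe-component rate:", lam1, "risky-component rate:", lam2)
print("drivers classified safe:", int(safe.sum()), "of", len(counts))

# Overlap probability: total mass where the two component pmfs overlap,
# a rough indication of how much misclassification is unavoidable.
k = np.arange(0, 50)
overlap = np.minimum(poisson.pmf(k, lam1), poisson.pmf(k, lam2)).sum()
print("overlap probability:", overlap)
```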
3

Hypothesis Testing in Finite Mixture Models

Li, Pengfei 11 December 2007 (has links)
Mixture models provide a natural framework for unobserved heterogeneity in a population. They are widely applied in astronomy, biology, engineering, finance, genetics, medicine, social sciences, and other areas. An important first step in using mixture models is the test of homogeneity. Before one tries to fit a mixture model, it is of value to know whether the data arise from a homogeneous or heterogeneous population. If the data are homogeneous, it is not even necessary to go into mixture modeling. The rejection of the homogeneous model may also have scientific implications. For example, in classical statistical genetics, it is often suspected that only a subgroup of patients have a disease gene which is linked to the marker. Detecting the existence of this subgroup amounts to rejecting a homogeneous null model in favour of a two-component mixture model. This problem has attracted intensive research recently, and this thesis makes substantial contributions in this area. Due to partial loss of identifiability, classic inference methods such as the likelihood ratio test (LRT) lose their usual elegant statistical properties. The limiting distribution of the LRT often involves complex Gaussian processes, which can be hard to use in data analysis. The modified likelihood ratio test (MLRT) is a nice alternative to the LRT. It restores identifiability by introducing a penalty to the log-likelihood function. Under some mild conditions, the limiting distribution of the MLRT is 0.5\chi^2_0 + 0.5\chi^2_1, where \chi^2_0 is a point mass at 0. This limiting distribution is convenient to use in real data analysis. The choice of the penalty function in the MLRT is very flexible, and a good choice enhances the power of the MLRT. In this thesis, we first introduce a new class of penalty functions, with which the MLRT enjoys significantly improved power for testing homogeneity. The main contribution of this thesis is to propose a new class of methods for testing homogeneity. Most existing methods in the literature for testing homogeneity are, explicitly or implicitly, derived under the condition of finite Fisher information and a compactness assumption on the space of the mixing parameters. The finite Fisher information condition prevents their use with many important mixture models, such as the mixture of geometric distributions, the mixture of exponential distributions and, more generally, mixture models in scale distribution families. The compactness assumption often forces users to set artificial bounds for the parameters of interest and makes the resulting limiting distribution depend on these bounds. Consequently, developing a method without such restrictions has long been a goal of researchers in this area. As will be seen, the EM-test proposed in this thesis is free of these shortcomings. The EM-test combines the merits of the classic LRT and the score test. The properties of the EM-test are particularly easy to investigate under single-parameter mixture models. It has a simple limiting distribution, 0.5\chi^2_0 + 0.5\chi^2_1, the same as the MLRT. This result is applicable to mixture models without requiring the restrictive regularity conditions described earlier. The normal mixture model is a very popular model in applications. However, it does not satisfy the strong identifiability condition, which imposes substantial technical difficulties on the study of its asymptotic properties. Most existing methods do not directly apply to normal mixture models, so the asymptotic properties have to be developed separately. We investigate the use of the EM-test for normal mixture models and derive its limiting distributions. For the homogeneity test in the presence of a structural parameter, the limiting distribution is a simple function of the 0.5\chi^2_0 + 0.5\chi^2_1 and \chi^2_1 distributions, and a test with this limiting distribution is still very convenient to implement. For normal mixtures in both the mean and variance parameters, the limiting distribution of the EM-test is found to be \chi^2_2. Mixture models are also widely used in the analysis of directional data. The von Mises distribution is often regarded as the circular normal model. Interestingly, it satisfies the strong identifiability condition and the parameter space of the mean direction is compact. However, the theoretical results for single-parameter mixture models cannot be directly applied to von Mises mixture models. Because of this, we also study the application of the EM-test to von Mises mixture models in the presence of a structural parameter. The limiting distribution of the EM-test is again found to be 0.5\chi^2_0 + 0.5\chi^2_1. Extensive simulation results are obtained to examine how well the limiting distributions approximate the finite-sample distributions of the EM-test. The type I errors obtained with critical values determined by the limiting distributions are found to be close to the nominal values. In particular, we also propose several precision-enhancing methods, which are found to work well. Real data examples are used to illustrate the use of the EM-test.
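Because the 0.5\chi^2_0 + 0.5\chi^2_1 limiting distribution recurs throughout the thesis, a short sketch of how such a mixture null distribution is used to compute p-values and critical values may help; the test statistic below is hypothetical, not taken from the thesis.

```python
# Using the 0.5*chi^2_0 + 0.5*chi^2_1 limiting distribution: the chi^2_0 part is a
# point mass at zero, so for an observed statistic T > 0 the p-value is half the
# chi^2_1 tail area. The value of T here is made up for illustration.
from scipy.stats import chi2

T = 3.2                                           # hypothetical EM-test (or MLRT) statistic
p_value = 0.5 * chi2.sf(T, df=1) if T > 0 else 1.0
print("p-value:", p_value)

# The level-alpha critical value c solves 0.5 * P(chi^2_1 > c) = alpha,
# i.e. c = chi2.ppf(1 - 2*alpha, 1) for alpha < 0.5.
alpha = 0.05
print("5% critical value:", chi2.ppf(1 - 2 * alpha, df=1))
```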
4

Model-based Pre-processing in Protein Mass Spectrometry

Wagaman, John C. December 2009 (has links)
The discovery of proteomic information through the use of mass spectrometry (MS) has been an active area of research in the diagnosis and prognosis of many types of cancer. This process involves feature selection through peak detection but is often complicated by many forms of non-biological bias. The need to extract biologically relevant peak information from MS data has resulted in the development of statistical techniques to aid in spectra pre-processing. Baseline estimation and normalization are important pre-processing steps because the subsequent quantification of peak heights depends on the baseline estimate. This dissertation introduces a mixture model to estimate the baseline and peak heights simultaneously through the expectation-maximization (EM) algorithm and a penalized likelihood approach. Our model-based pre-processing performs well on raw, unnormalized data, with few subjective inputs. We also propose a model-based normalization solution for use in subsequent classification procedures, where misclassification results compare favorably with existing methods of normalization. The performance of our pre-processing method is evaluated using popular matrix-assisted laser desorption/ionization (MALDI) and surface-enhanced laser desorption/ionization (SELDI) datasets as well as through simulation.
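As a crude stand-in for the mixture idea (not the dissertation's joint baseline/peak model with its penalized likelihood), the sketch below separates simulated low-intensity "baseline" readings from high-intensity "peak" readings with a two-component Gaussian mixture fit by EM via scikit-learn:

```python
# Two-component Gaussian mixture on simulated intensities: the lower-mean component
# plays the role of baseline noise and the higher-mean component the role of peaks.
# This only illustrates the mixture idea; it is not the model described above.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
baseline = rng.normal(loc=100, scale=10, size=900)   # simulated baseline noise
peaks = rng.normal(loc=400, scale=50, size=100)      # simulated peak intensities
intensity = np.concatenate([baseline, peaks]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(intensity)
labels = gm.predict(intensity)
peak_component = int(np.argmax(gm.means_.ravel()))   # higher-mean component = peaks
print("estimated peak fraction:", np.mean(labels == peak_component))
```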
5

Birthweight-specific neonatal health: With application on data from a tertiary hospital in Tanzania

Dahlqwist, Elisabeth January 2014 (has links)
The following study analyzes birthweight-specific neonatal health using a combination of a mixture model and logistic regression: the extended Parametric Mixture of Logistic Regression. The data are collected from the Obstetric database at Muhimbili National Hospital in Dar es Salaam, Tanzania, and the years 2009–2013 are used in the analysis. Because the birthweight data are rounded, a novel method to adjust for rounding when estimating a mixture model is applied, and the influence of rounding on the estimates is investigated. A three-component model is selected. The variables used in the analysis of neonatal health are early neonatal mortality, whether the mother has HIV or anaemia, whether she is a private patient, and whether the neonate is born after 36 completed weeks of gestation. It can be concluded that the mortality rates are high, especially for low birthweights (2,000 g or less), in the estimated first and second components. However, due to wide confidence bounds it is hard to draw firm conclusions from the data.
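A simplified sketch of the modelling idea, assuming simulated data and an off-the-shelf Gaussian mixture rather than the extended Parametric Mixture of Logistic Regression with its rounding adjustment: fit a three-component mixture to birthweights, assign each neonate to its most probable component, and tabulate early neonatal mortality per component.

```python
# Three-component mixture on simulated birthweights (grams), with mortality rates
# tabulated per estimated component. Data are simulated, not the Muhimbili records.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
bw = np.concatenate([rng.normal(1500, 300, 200),      # simulated low-weight group
                     rng.normal(2600, 300, 300),      # simulated intermediate group
                     rng.normal(3300, 400, 1500)])    # simulated normal-weight group
death = rng.binomial(1, np.clip(0.25 - bw / 20000, 0.01, 0.9))   # mortality more likely at low weight

gm = GaussianMixture(n_components=3, random_state=0).fit(bw.reshape(-1, 1))
component = gm.predict(bw.reshape(-1, 1))

for k in np.argsort(gm.means_.ravel()):               # report from lowest to highest mean weight
    rate = death[component == k].mean()
    print(f"component mean {gm.means_[k, 0]:.0f} g: mortality {rate:.3f}")
```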
6

Complete Bayesian analysis of some mixture time series models

Hossain, Shahadat January 2012 (has links)
In this thesis we consider some finite mixture time series models in which each component follows a well-known process, e.g. an AR, ARMA or ARMA-GARCH process, with either normal-type or Student-t-type errors. We develop MCMC methods and use them in the Bayesian analysis of these mixture models. We introduce some new models, such as a mixture of Student-t ARMA components and a mixture of Student-t ARMA-GARCH components, with complete Bayesian treatments. Moreover, we use the component precision (instead of the variance) with an additional hierarchical level, which makes our model more consistent with the MCMC moves. We have implemented the proposed methods in R and give examples with real and simulated data.
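A stripped-down sketch of the MCMC machinery may help: a Gibbs sampler for a plain two-component normal mixture with the precision (inverse-variance) parameterisation and conjugate updates. The thesis builds this kind of sampler around AR/ARMA/ARMA-GARCH components with normal or Student-t errors and implements it in R; the illustration below uses Python with simulated data and omits numerical safeguards such as log-space likelihoods and label-switching handling.

```python
# Gibbs sampler for a two-component normal mixture parameterised by precisions.
# Priors: mu_k ~ N(m0, 1/t0), tau_k ~ Gamma(a0, b0), mixing weight w ~ Beta(1, 1).
import numpy as np

rng = np.random.default_rng(2)
y = np.concatenate([rng.normal(-2, 1, 150), rng.normal(3, 0.5, 150)])   # simulated data
n = len(y)

m0, t0, a0, b0 = 0.0, 0.01, 1.0, 1.0
mu, tau, w = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), 0.5

for it in range(2000):
    # sample component indicators z_i from their full conditional
    like0 = w * np.sqrt(tau[0]) * np.exp(-0.5 * tau[0] * (y - mu[0]) ** 2)
    like1 = (1 - w) * np.sqrt(tau[1]) * np.exp(-0.5 * tau[1] * (y - mu[1]) ** 2)
    z = rng.random(n) < like1 / (like0 + like1)        # True -> component 1
    for k, idx in enumerate([~z, z]):
        yk, nk = y[idx], idx.sum()
        # conjugate normal update for mu_k
        prec = t0 + nk * tau[k]
        mean = (t0 * m0 + tau[k] * yk.sum()) / prec
        mu[k] = rng.normal(mean, 1 / np.sqrt(prec))
        # conjugate gamma update for the precision tau_k
        tau[k] = rng.gamma(a0 + nk / 2, 1 / (b0 + 0.5 * np.sum((yk - mu[k]) ** 2)))
    # conjugate beta update for the mixing weight
    w = rng.beta(1 + (~z).sum(), 1 + z.sum())

print("final posterior draw -- means:", mu, "precisions:", tau, "weight:", w)
```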
7

Longitudinal Data Clustering Via Kernel Mixture Models

Zhang, Xi January 2021 (has links)
Kernel mixture models are proposed to cluster univariate, independent multivariate and dependent bivariate longitudinal data. The Gaussian distribution in finite mixture models is replaced by Gaussian and gamma kernel functions, and the expectation-maximization algorithm is used to estimate bandwidths and compute log-likelihood scores. For dependent bivariate longitudinal data, the bivariate Gaussian copula is used to capture the correlation between the two attributes. We then use AIC, BIC and ICL to select the best model. In addition, we introduce a kernel distance-based clustering method to compare with the kernel mixture models. A simulation study illustrates the performance of the mixture model, and the results show that the gamma kernel mixture model outperforms the kernel distance-based clustering method in terms of misclassification rates. Finally, both models are applied to COVID-19 data, and sixty countries are classified into ten clusters based on growth rates and death rates. / Thesis / Master of Science (MSc)
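The model-selection step is easy to illustrate on its own. The sketch below compares candidate numbers of clusters by AIC and BIC using ordinary Gaussian mixtures on simulated growth and death rates; the thesis does this (together with ICL) for kernel mixture models with Gaussian and gamma kernels, so the code is only an analogue of that step.

```python
# Comparing candidate numbers of clusters by AIC and BIC with plain Gaussian
# mixtures. The (growth rate, death rate) data are simulated, standing in for the
# COVID-19 application; the kernel mixture itself is not implemented here.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0.02, 0.001], 0.005, (30, 2)),
               rng.normal([0.08, 0.010], 0.010, (30, 2))])

for k in range(1, 6):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    print(f"k={k}  AIC={gm.aic(X):.1f}  BIC={gm.bic(X):.1f}")
```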
8

Estimating the Proportion of True Null Hypotheses in Multiple Testing Problems

Oyeniran, Oluyemi 18 July 2016 (has links)
No description available.
9

The wild bootstrap resampling in regression imputation algorithm with a Gaussian Mixture Model

Mat Jasin, A., Neagu, Daniel, Csenki, Attila 08 July 2018 (has links)
Unsupervised learning of a finite Gaussian mixture model (FGMM) is used to learn the distribution of the population data. This paper proposes the use of wild bootstrapping to create variability in the imputed data in single missing-data imputation. We compare the performance and accuracy of the proposed method in single imputation with multiple imputation from the R package Amelia II, using RMSE, R-squared, MAE and MAPE. The proposed method shows better performance when compared with multiple imputation (MI), which is widely regarded as the gold standard among missing-data imputation techniques.
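A hedged sketch of the core idea, not the paper's FGMM implementation: impute a missing response from a regression prediction plus a resampled residual multiplied by a random "wild" weight, so that the imputations carry the variability of the observed residuals. The two-point (Mammen) weight distribution used here is one common choice, and the data are simulated.

```python
# Wild-bootstrap regression imputation on simulated data: fit a regression on the
# complete cases, then impute each missing y as prediction + residual * wild weight.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=200)
missing = rng.random(200) < 0.2                        # 20% of the responses set missing
y_obs, x_obs = y[~missing], x[~missing]

beta1, beta0 = np.polyfit(x_obs, y_obs, 1)             # slope, intercept on complete cases
resid = y_obs - (beta0 + beta1 * x_obs)

# Mammen two-point wild-bootstrap weights
golden = (1 + np.sqrt(5)) / 2
w = np.where(rng.random(missing.sum()) < golden / np.sqrt(5), 1 - golden, golden)

drawn = rng.choice(resid, size=missing.sum(), replace=True)
y_imputed = beta0 + beta1 * x[missing] + drawn * w     # prediction + perturbed residual

rmse = np.sqrt(np.mean((y_imputed - y[missing]) ** 2))
print("RMSE of imputations against the simulated truth:", rmse)
```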
