61 |
Proportional likelihood ratio mixed model for longitudinal discrete data
Wu, Hongqian 01 December 2016
Luo and Tsai (2012) proposed a semiparametric proportional likelihood ratio model that is suitable for modeling a nonlinear monotonic relationship between the response variable and a covariate. Extending the generalized linear model, this model leaves the probability distribution unspecified but estimates it from the data. In this thesis, we propose to extend this model to the analysis of longitudinal data by incorporating random effects into the linear predictor. By using this model as the conditional density of the response variable given the random effects, we present a maximum likelihood approach for model estimation and inference. Two numerical estimation procedures were developed for response variables with finite support, one based on the Newton-Raphson algorithm and the other on the generalized expectation maximization (GEM) algorithm. In both estimation procedures, Gauss-Hermite quadrature is employed to approximate the integrals.
Upon convergence, the observed information matrix is estimated through the second-order numerical differentiation of the log likelihood function. Asymptotic properties of the maximum likelihood estimator are established under certain regularity conditions and simulation studies are conducted to assess its finite sample properties and compare the proposed model to the generalized linear mixed model. The proposed method is illustrated in an analysis of data from a multi-site observational study of prodromal Huntington's disease.
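The role of Gauss-Hermite quadrature in such a likelihood can be sketched as follows. This is only an illustration under stated assumptions: the conditional model is a random-intercept logistic model standing in for the proportional likelihood ratio model, and the names `gh_expectation`, `marginal_likelihood`, `eta_fixed` and `sigma` are hypothetical, not taken from the thesis.

```python
import numpy as np

def gh_expectation(g, sigma, n_nodes=20):
    """Approximate E[g(b)] for b ~ N(0, sigma^2) by Gauss-Hermite quadrature."""
    # hermgauss gives nodes/weights for integrals of exp(-x^2) * f(x)
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables b = sqrt(2) * sigma * x turns the weight into a normal density
    b = np.sqrt(2.0) * sigma * nodes
    return np.sum(weights * g(b)) / np.sqrt(np.pi)

def marginal_likelihood(y, eta_fixed, sigma, n_nodes=20):
    """Marginal likelihood of one subject's binary responses.

    Assumption: a random-intercept logistic model is used as a stand-in
    for the conditional density given the random effect.
    """
    y = np.asarray(y)

    def conditional(b):
        # b has shape (n_nodes,); broadcast it against the subject's observations
        p = 1.0 / (1.0 + np.exp(-(eta_fixed + b[:, None])))
        return np.prod(np.where(y == 1, p, 1.0 - p), axis=1)

    return gh_expectation(conditional, sigma, n_nodes)
```

Summing the log of `marginal_likelihood` over subjects would give a log likelihood that a Newton-Raphson or EM-type routine could then maximize; the number of quadrature nodes trades accuracy against cost.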
|
62 |
Likelihood-based inference for antedependence (Markov) models for categorical longitudinal data
Xie, Yunlong 01 July 2011
Antedependence (AD) of order p, also known as the Markov property of order p, is a property of index-ordered random variables in which each variable, given at least p immediately preceding variables, is independent of all further preceding variables. Zimmerman and Nunez-Anton (2010) present statistical methodology for fitting and performing inference for AD models for continuous (primarily normal) longitudinal data. However, analogous AD-model methodology for categorical longitudinal data has not yet been well developed. In this thesis, we derive maximum likelihood estimators of transition probabilities under antedependence of any order, and we use these estimators to develop likelihood-based methods for determining the order of antedependence of categorical longitudinal data. Specifically, we develop a penalized likelihood method for determining variable-order antedependence structure, and we derive the likelihood ratio test, score test, Wald test and an adaptation of Fisher's exact test for pth-order antedependence against the unstructured (saturated) multinomial model. Simulation studies show that the score (Pearson's Chi-square) test performs better than all the other methods for complete and monotone missing data, while the likelihood ratio test is applicable for data with an arbitrary missing pattern. Because the likelihood ratio test is oversensitive under the null hypothesis, we modify it by equating the expectation of the test statistic to its degrees of freedom so that its actual size is closer to the nominal size. Additionally, we modify the likelihood ratio tests for use in testing for pth-order antedependence against qth-order antedependence, where q > p, and for testing nested variable-order antedependence models. We extend the methods to deal with data having a monotone or arbitrary missing pattern.
For antedependence models of constant order p, we develop methods for testing transition probability stationarity and strict stationarity and for maximum likelihood estimation of parametric generalized linear models that are transition probability stationary AD(p) models. The methods are illustrated using three data sets.
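For intuition, the maximum likelihood estimate of a transition probability in an order-p chain is a ratio of counts. The sketch below assumes complete data and transition probabilities that do not change over time (the stationary special case), which is narrower than the general AD(p) setting of the thesis; the function name `transition_mle` is hypothetical.

```python
from collections import Counter, defaultdict

def transition_mle(sequences, order=1):
    """Count-ratio MLE of time-homogeneous transition probabilities.

    Assumption: stationary transitions and no missing values.
    Returns {history tuple: {next state: estimated probability}}.
    """
    pair_counts = Counter()
    history_counts = Counter()
    for seq in sequences:
        for t in range(order, len(seq)):
            history = tuple(seq[t - order:t])  # the p immediately preceding values
            pair_counts[(history, seq[t])] += 1
            history_counts[history] += 1
    probs = defaultdict(dict)
    for (history, state), n in pair_counts.items():
        probs[history][state] = n / history_counts[history]
    return dict(probs)
```

For example, `transition_mle([[0, 0, 1], [0, 1, 1]])` estimates the probability of moving from state 0 to state 1 as 2/3, because two of the three observed transitions out of state 0 go to state 1.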
|
63 |
Statistical Modeling and Prediction of HIV/AIDS Prognosis: Bayesian Analyses of Nonlinear Dynamic Mixtures
Lu, Xiaosun 10 July 2014
Statistical analyses and modeling have contributed greatly to our understanding of the pathogenesis of HIV-1 infection; they also provide guidance for the treatment of AIDS patients and evaluation of antiretroviral (ARV) therapies. Various statistical methods, nonlinear mixed-effects models in particular, have been applied to model the CD4 and viral load trajectories. A common assumption in these methods is that all patients come from a homogeneous population following one mean trajectory. This assumption unfortunately obscures important characteristic differences between subgroups of patients whose response to treatment and whose disease trajectories are biologically different. It also may lack robustness against population heterogeneity, resulting in misleading or biased inference.
Finite mixture models, also known as latent class models, are commonly used to model nonpredetermined heterogeneity in a population; they provide an empirical representation of heterogeneity by grouping the population into a finite number of latent classes and modeling the population through a mixture distribution. A finite mixture model allows individuals in each latent class to vary around that class's own mean trajectory, instead of a common one shared by all classes. Furthermore, a mixture model has the ability to cluster and estimate class membership probabilities at both population and individual levels. This important feature may help physicians to better understand a particular patient's disease progression and refine the therapeutic strategy in advance.
In this research, we developed a mixture dynamic model and related Bayesian inference via Markov chain Monte Carlo (MCMC). One real data set from HIV/AIDS clinical management and another from a clinical trial were used to illustrate the proposed models and methods.
This dissertation explored three topics. First, we modeled the CD4 trajectories using a finite mixture model with four distinct components whose mean functions are designed based on the Michaelis-Menten function. Relevant covariates, both baseline and time-varying, were considered, and model comparison and selection were based on criteria such as the Deviance Information Criterion (DIC). The class membership model was allowed to depend on covariates for prediction. Second, we explored disease status prediction in HIV/AIDS using the latent class membership model. Third, we modeled viral load trajectories using a finite mixture model with three components whose mean functions are designed based on published HIV dynamic systems. Although this research is motivated by HIV/AIDS studies, the basic concepts and methods developed here have much broader applications in the management of other chronic diseases; they can also be applied to dynamic systems in other fields. Implementation of our methods using the publicly available WinBUGS package suggests that our approach can be made quite accessible to practicing statisticians and data analysts.
|
64 |
Model Specification Searches in Latent Growth Modeling: A Monte Carlo Study
Kim, Min Jung 2012 May 1900
This dissertation investigated the optimal strategy for the model specification search in latent growth modeling. Although developing an initial model based on theory from prior research is favored, researchers may sometimes need to specify the starting model in the absence of theory. In this simulation study, the effectiveness of the start models in searching for the true population model was examined. The four possible start models adopted in this study were: the simplest mean and covariance structure model, the simplest mean and the most complex covariance structure model, the most complex mean and the simplest covariance structure model, and the most complex mean and covariance structure model. Six model selection criteria were used to determine the recovery of the true model: Likelihood ratio test (LRT), DeltaCFI, DeltaRMSEA, DeltaSRMR, DeltaAIC, and DeltaBIC.
The results showed that specifying the most complex covariance structure (UN) with the most complex mean structure recovered the true mean trajectory most successfully, with an average hit rate above 90% using the DeltaCFI, DeltaBIC, DeltaAIC, and DeltaSRMR. In searching for the true covariance structure, LRT, DeltaCFI, DeltaAIC, and DeltaBIC performed successfully regardless of the searching method with different start models.
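Three of the criteria named above can be computed directly from the log-likelihoods of two nested models. The sketch below uses the textbook definitions AIC = -2 logL + 2k and BIC = -2 logL + k ln(n); whether the dissertation used exactly these forms is an assumption, and the helper names are hypothetical. The fit-index differences (DeltaCFI, DeltaRMSEA, DeltaSRMR) need additional model output and are not covered.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion (textbook form)."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (textbook form)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def compare(simple, complex_, n_obs):
    """Compare nested models given as (log-likelihood, free parameters) pairs."""
    (ll_s, k_s), (ll_c, k_c) = simple, complex_
    return {
        "lrt_stat": 2.0 * (ll_c - ll_s),  # referred to a chi-square with lrt_df df
        "lrt_df": k_c - k_s,
        "delta_aic": aic(ll_c, k_c) - aic(ll_s, k_s),  # negative favors the complex model
        "delta_bic": bic(ll_c, k_c, n_obs) - bic(ll_s, k_s, n_obs),
    }
```

A negative difference favors the more complex model under that criterion; BIC penalizes extra parameters more heavily than AIC once n exceeds about 7.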
|
65 |
Multivariate Longitudinal Data Analysis with Mixed Effects Hidden Markov Models
Raffa, Jesse Daniel January 2012
Longitudinal studies, where data on study subjects are collected over time, increasingly involve multivariate longitudinal responses. Frequently, the heterogeneity observed in a multivariate longitudinal response can be attributed to underlying unobserved disease states in addition to any between-subject differences. We propose modeling such disease states using a hidden Markov model (HMM) approach and extend previous work, which incorporated random effects into HMMs for the analysis of univariate longitudinal data, to the setting of a multivariate longitudinal response. Multivariate longitudinal data are modeled jointly using separate but correlated random effects between longitudinal responses of mixed data types in addition to a shared underlying hidden process. We use a computationally efficient Bayesian approach via Markov chain Monte Carlo (MCMC) to fit such models. We apply this methodology to bivariate longitudinal response data from a smoking cessation clinical trial. Under these models, we examine how to incorporate a treatment effect on the disease states, as well as develop methods to classify observations by disease state and to attempt to understand patient dropout. Simulation studies were performed to evaluate the properties of such models and their applications under a variety of realistic situations.
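The basic building block behind such models, the likelihood of one observed sequence under a hidden Markov model, can be sketched with the scaled forward algorithm. This is a plain HMM with no random effects and no MCMC, so it illustrates only the hidden-state part of the approach; the function and argument names are hypothetical.

```python
import numpy as np

def hmm_loglik(init, trans, emis_probs):
    """Log-likelihood of one sequence via the scaled forward recursion.

    init:       (K,) initial state probabilities
    trans:      (K, K) transition matrix, rows sum to 1
    emis_probs: (T, K) density of the observation at time t under each state
    """
    init = np.asarray(init, dtype=float)
    trans = np.asarray(trans, dtype=float)
    emis_probs = np.asarray(emis_probs, dtype=float)

    alpha = init * emis_probs[0]
    loglik = 0.0
    for t in range(len(emis_probs)):
        if t > 0:
            # Propagate one step through the hidden chain, then weight by the emission
            alpha = (alpha @ trans) * emis_probs[t]
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = alpha / scale  # rescale to avoid underflow on long sequences
    return loglik
```

For a multivariate response, one simple option is to form each row of `emis_probs` as the product of the per-response densities, which assumes the responses are conditionally independent given the state; that assumption is an illustration, not a description of the thesis model.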
|
66 |
Applying Localized Realized Volatility Modeling to Futures Indices
Fu, Luella 01 January 2011
This thesis extends the application of the localized realized volatility model created by Ying Chen, Wolfgang Karl Härdle, and Uta Pigorsch to other futures markets, particularly the CAC 40 and the NI 225. The research attempted to replicate results, though ultimately those results were invalidated by procedural difficulties.
|
67 |
Analysis of Correlated Data with Measurement Error in Responses or Covariates
Chen, Zhijian January 2010
Correlated data frequently arise from epidemiological studies, especially familial and longitudinal studies. Longitudinal design has been used by researchers to investigate the changes of certain characteristics over time at the individual level as well as how potential factors influence the changes. Familial studies are often designed to investigate the dependence of health conditions among family members. Various models have been developed for this type of multivariate data, and a wide variety of estimation techniques have been proposed. However, data collected from observational studies are often far from perfect, as measurement error may arise from different sources such as defective measuring systems, diagnostic tests without gold references, and self-reports. Under such scenarios only rough surrogate variables are measured. Measurement error in covariates in various regression models has been discussed extensively in the literature. It is well known that naive approaches ignoring covariate error often lead to inconsistent estimators for model parameters.
In this thesis, we develop inferential procedures for analyzing correlated data with response measurement error. We consider three scenarios: (i) likelihood-based inferences for generalized linear mixed models when the continuous response is subject to nonlinear measurement errors; (ii) estimating equations methods for binary responses with misclassifications; and (iii) estimating equations methods for ordinal responses when the response variable and categorical/ordinal covariates are subject to misclassifications.
The first problem arises when the continuous response variable is difficult to measure. When the true response is defined as the long-term average of measurements, a single measurement is considered as an error-contaminated surrogate. We focus on generalized linear mixed models with nonlinear response error and study the induced bias in naive estimates. We propose likelihood-based methods that can yield consistent and efficient estimators for both fixed-effects and variance parameters. Results of simulation studies and analysis of a data set from the Framingham Heart Study are presented.
Marginal models have been widely used for correlated binary, categorical, and ordinal data. The regression parameters characterize the marginal mean of a single outcome, without conditioning on other outcomes or unobserved random effects. The generalized estimating equations (GEE) approach, introduced by Liang and Zeger (1986), only models the first two moments of the responses, with associations being treated as nuisance characteristics. For some clustered studies, especially familial studies, however, the association structure may be of scientific interest. With binary data, Prentice (1988) proposed additional estimating equations that allow one to model pairwise correlations. We consider marginal models for correlated binary data with misclassified responses. We develop “corrected” estimating equations approaches that can yield consistent estimators for both mean and association parameters. The idea is related to that of Nakamura (1990), which was originally developed for correcting bias induced by additive covariate measurement error under generalized linear models. Our approaches can also handle correlated misclassifications rather than a simple misclassification process as considered by Neuhaus (2002) for clustered binary data under generalized linear mixed models. We extend our methods and further develop marginal approaches for analysis of longitudinal ordinal data with misclassification in both responses and categorical covariates. Simulation studies show that our proposed methods perform very well under a variety of scenarios. Results from application of the proposed methods to real data are presented.
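The flavor of a "corrected" approach for a misclassified binary response can be shown with a single observation. The sketch assumes known misclassification rates that do not depend on covariates or on other cluster members, which is much simpler than the correlated misclassification handled in the thesis; `corrected_response` is a hypothetical helper, not the thesis's estimating equations.

```python
def corrected_response(y_star, sensitivity, specificity):
    """Unbiased surrogate for a misclassified binary response.

    Assumption: P(Y*=1 | Y=1) = sensitivity and P(Y*=0 | Y=0) = specificity
    are known. Then E[corrected | Y] = Y, so replacing Y* by the corrected
    value keeps a mean-type estimating function unbiased.
    """
    denom = sensitivity + specificity - 1.0
    if denom == 0:
        raise ValueError("sensitivity + specificity must differ from 1")
    return (y_star - (1.0 - specificity)) / denom
```

The corrected values fall outside [0, 1] for individual observations; only their expectation matches the true response, which is all an estimating equation for the mean requires.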
Measurement error can be coupled with many other features in the data, e.g., complex survey designs, that can complicate inferential procedures. We explore combining survey weights and misclassification in ordinal covariates in logistic regression analyses. We propose an approach that incorporates survey weights into estimating equations to yield design-based unbiased estimators.
In the final part of the thesis we outline some directions for future work, such as transition models and semiparametric models for longitudinal data with both incomplete observations and measurement error. Missing data is another common feature in applications. Developing novel statistical techniques for dealing with both missing data and measurement error can be beneficial.
|
68 |
Bayesian Modeling Using Latent Structures
Wang, Xiaojing January 2012
This dissertation is devoted to modeling complex data from the Bayesian perspective via constructing priors with latent structures. There are three major contexts in which this is done: strategies for the analysis of dynamic longitudinal data, estimating shape-constrained functions, and identifying subgroups. The methodology is illustrated in three different interdisciplinary contexts: (1) adaptive measurement testing in education; (2) emulation of computer models for vehicle crashworthiness; and (3) subgroup analyses based on biomarkers.
Chapter 1 presents an overview of the utilized latent structured priors and an overview of the remainder of the thesis. Chapter 2 is motivated by the problem of analyzing dichotomous longitudinal data observed at variable and irregular time points for adaptive measurement testing in education. One of its main contributions lies in developing a new class of Dynamic Item Response (DIR) models via specifying a novel dynamic structure on the prior of the latent trait. The Bayesian inference for DIR models is undertaken, which permits borrowing strength from different individuals, allows the retrospective analysis of an individual's changing ability, and allows for online prediction of one's ability changes. Proof of posterior propriety is presented, ensuring that the objective Bayesian analysis is rigorous.
Chapter 3 deals with nonparametric function estimation under shape constraints, such as monotonicity, convexity or concavity. A motivating illustration is to generate an emulator to approximate a computer model for vehicle crashworthiness. Although Gaussian processes are very flexible and widely used in function estimation, they are not naturally amenable to incorporation of such constraints. Gaussian processes with the squared exponential correlation function have the interesting property that their derivative processes are also Gaussian processes and are jointly Gaussian processes with the original Gaussian process. This allows one to impose shape constraints through the derivative process. Two alternative ways of incorporating derivative information into Gaussian process priors are proposed, with one focusing on scenarios (important in emulation of computer models) in which the function may have flat regions.
Chapter 4 introduces a Bayesian method to control for multiplicity in subgroup analyses through tree-based models that limit the subgroups under consideration to those that are a priori plausible. Once the prior modeling of the tree is accomplished, each tree will yield a statistical model; Bayesian model selection analyses then complete the statistical computation for any quantity of interest, resulting in multiplicity-controlled inferences. This research is motivated by a problem of biomarker and subgroup identification to develop tailored therapeutics. Chapter 5 presents conclusions and some directions for future research.
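The joint Gaussianity of a process and its derivative can be made concrete for a one-dimensional input. The formulas below assume the common parametrization of the squared exponential covariance with variance \sigma^2 and length scale \ell, which may differ from the parametrization used in the dissertation; the cross-covariances follow by differentiating the kernel.

```latex
\begin{aligned}
k(x, x') &= \operatorname{Cov}\bigl(f(x), f(x')\bigr)
          = \sigma^2 \exp\!\left(-\frac{(x - x')^2}{2\ell^2}\right), \\
\operatorname{Cov}\bigl(f'(x), f(x')\bigr)
         &= \frac{\partial k(x, x')}{\partial x}
          = -\frac{x - x'}{\ell^2}\, k(x, x'), \\
\operatorname{Cov}\bigl(f'(x), f'(x')\bigr)
         &= \frac{\partial^2 k(x, x')}{\partial x\, \partial x'}
          = \frac{1}{\ell^2}\left(1 - \frac{(x - x')^2}{\ell^2}\right) k(x, x').
\end{aligned}
```

Because these covariances are available in closed form, a constraint such as monotonicity can be expressed as a sign condition on f' at chosen input points and combined with the prior on f.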
|
70 |
Selecting the Working Correlation Structure by a New Generalized AIC Index for Longitudinal Data
Lin, Wei-Lun 28 November 2007
The analysis of longitudinal data has been a popular subject in recent years. The introduction of the Generalized Estimating Equation (GEE) approach (Liang & Zeger, 1986) is one of the most influential recent developments in statistical practice for this purpose. GEE methods are attractive both from a theoretical and a practical standpoint. We are interested in the influence of different "working" correlation structures for modeling longitudinal data. Furthermore, we propose a new AIC-like method for model assessment which generalizes AIC from the point of view of data generation. By comparing the difference of the log-likelihood functions between different correlation models, we define the exact value to create an interval for our model selection. In this thesis, we combine the GEE method and a new generalized AIC index for longitudinal data with different correlation structures.
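The "working" correlation structures that are usually compared in a GEE analysis are simple parametric matrices. The sketch below builds three standard choices for a subject observed at equally spaced times; the function name and the single parameter `alpha` are illustrative assumptions, and the generalized AIC index itself is not reproduced here.

```python
import numpy as np

def working_correlation(structure, n_times, alpha=0.0):
    """Return an (n_times, n_times) working correlation matrix."""
    idx = np.arange(n_times)
    if structure == "independence":
        return np.eye(n_times)
    if structure == "exchangeable":
        # Every pair of distinct time points shares the same correlation alpha
        return np.where(idx[:, None] == idx[None, :], 1.0, alpha)
    if structure == "ar1":
        # Correlation decays geometrically with the lag between time points
        return float(alpha) ** np.abs(idx[:, None] - idx[None, :])
    raise ValueError(f"unknown structure: {structure}")
```

For instance, `working_correlation("ar1", 3, 0.5)` has 0.5 on the first off-diagonal and 0.25 in the corners, whereas the exchangeable matrix with the same `alpha` has 0.5 in every off-diagonal entry.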
|