1 |
Modeling covariance structure in unbalanced longitudinal data
Chen, Min, 15 May 2009
Modeling covariance structure is important for efficient estimation in longitudinal
data models. Modified Cholesky decomposition (Pourahmadi, 1999) is used as an
unconstrained reparameterization of the covariance matrix. The resulting new parameters
have transparent statistical interpretations and are easily modeled using
covariates. However, this approach is not directly applicable when the longitudinal
data are unbalanced, because a Cholesky factorization for observed data that is
coherent across all subjects usually does not exist. We overcome this difficulty by
treating the problem as a missing data problem and employing a generalized EM
algorithm to compute the ML estimators. We study the covariance matrices in both
fixed-effects models and mixed-effects models for unbalanced longitudinal data. We
illustrate our method by reanalyzing Kenward's (1987) cattle data and by conducting
simulation studies.
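For concreteness, a minimal numpy sketch of the decomposition the abstract builds on (not the author's code; `sigma` here is an illustrative AR(1) covariance):

```python
import numpy as np

def modified_cholesky(sigma):
    # Modified Cholesky decomposition (Pourahmadi, 1999): T @ sigma @ T.T = D,
    # with T unit lower triangular. Row t of T holds the negatives of the
    # coefficients from the regression of y_t on y_1, ..., y_{t-1}; the
    # diagonal of D holds the corresponding innovation variances. Both are
    # unconstrained, which is what makes them easy to model with covariates.
    p = sigma.shape[0]
    T = np.eye(p)
    d = np.empty(p)
    d[0] = sigma[0, 0]
    for t in range(1, p):
        phi = np.linalg.solve(sigma[:t, :t], sigma[:t, t])  # autoregressive parameters
        T[t, :t] = -phi
        d[t] = sigma[t, t] - sigma[:t, t] @ phi             # innovation variance
    return T, np.diag(d)

# Quick check on an AR(1)-type covariance:
p, rho = 4, 0.6
sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
T, D = modified_cholesky(sigma)
assert np.allclose(T @ sigma @ T.T, D)
```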
|
2 |
Topics in analyzing longitudinal data
Ju, Hyunsu, 17 February 2005
We propose methods for analyzing longitudinal data, obtained in clinical trials
and other applications with repeated measures of responses taken over time. Common
characteristics of longitudinal studies are correlated responses and observations taken
at unequal points in time. The first part of this dissertation examines the justification
of a block bootstrap procedure for the repeated measurement designs, which takes
into account the dependence structure of the data by resampling blocks of adjacent
observations rather than individual data points. In the case of dependent stationary
data, under regularity conditions, the approximately studentized or standardized block
bootstrap possesses a higher order of accuracy. With longitudinal data, the second
part of this dissertation shows that the diagonal optimal weights for unbalanced
designs can be made to improve the efficiency of the estimators in terms of the mean squared error criterion. A simulation study is conducted for each of the longitudinal designs. We also analyze a repeated-measurement data set concerning nursing home residents with multiple sclerosis, obtained from a large database termed the Minimum Data Set (MDS).
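A sketch of the block resampling idea referenced above (illustrative only; the block length, the simulated AR(1) series and the use of the mean as the statistic are assumptions, and the dissertation's studentized version additionally computes a variance estimate within each resample):

```python
import numpy as np
from scipy.signal import lfilter

def moving_block_bootstrap(x, block_len, n_boot, stat, rng):
    # Resample overlapping blocks of adjacent observations, so each bootstrap
    # series preserves the short-range dependence of the data, then recompute
    # the statistic on every resampled series.
    n = len(x)
    n_blocks = -(-n // block_len)  # ceiling division
    out = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        xb = np.concatenate([x[s:s + block_len] for s in starts])[:n]
        out[b] = stat(xb)
    return out

rng = np.random.default_rng(0)
x = lfilter([1.0], [1.0, -0.5], rng.normal(size=200))  # stationary AR(1) data
boot = moving_block_bootstrap(x, block_len=10, n_boot=2000, stat=np.mean, rng=rng)
se_hat = boot.std(ddof=1)  # block-bootstrap standard error of the sample mean
```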
|
3 |
Accounting for Correlation in the Analysis of Randomized Controlled Trials with Multiple Layers of Clustering
Baumgardner, Adam, 17 May 2016
A common goal in medical research is to determine the effect that a treatment has on subjects over time. Unfortunately, the analysis of data from such clinical trials often omits several aspects of the study design, leading to incorrect or misleading conclusions. In this paper, a major objective is to show via case studies that randomized controlled trials with longitudinal designs must account for correlation and clustering among observations in order to make proper statistical inference. Further, the effects of outliers in a multi-center, randomized controlled trial with multiple layers of clustering are examined, and strategies for detecting and dealing with outlying observations and clusters are discussed.
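A small simulation (not from the thesis; all parameter values are illustrative) shows why ignoring clustering misleads: the naive standard error, which treats all observations as independent, is far smaller than the one based on cluster means:

```python
import numpy as np

rng = np.random.default_rng(42)
n_clusters, m = 30, 20            # e.g. centers and patients per center
sd_cluster, sd_resid = 1.0, 1.0   # cluster-effect and residual SDs

u = rng.normal(0.0, sd_cluster, n_clusters)                  # shared cluster effects
y = u[:, None] + rng.normal(0.0, sd_resid, (n_clusters, m))  # outcomes

naive_se = y.std(ddof=1) / np.sqrt(y.size)   # pretends all 600 points are independent
cluster_means = y.mean(axis=1)
cluster_se = cluster_means.std(ddof=1) / np.sqrt(n_clusters)

print(f"naive SE: {naive_se:.3f}  cluster-based SE: {cluster_se:.3f}")
# With positive intraclass correlation the naive SE is far too small,
# so tests that use it are anti-conservative.
```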
|
4 |
Nonparametric tests for longitudinal data
Dong, Lei, January 1900
The purpose of this report is to numerically compare several tests that are applicable to longitudinal data when the experiment contains a large number of treatments or experimental conditions. Such data are increasingly common as technology advances. Of interest is to evaluate whether there is any significant main effect of treatment or time, or any interaction between them. Traditional methods such as linear mixed-effects models (LME), generalized estimating equations (GEE), and the Wilks' lambda, Hotelling-Lawley and Pillai multivariate tests were developed under either parametric distributional assumptions or the assumption of a large number of replications. A few recent tests, such as those of Zhang (2008) and Bathke & Harrar (2008), were specially developed for the setting of a large number of treatments with possibly small replication. In this report, I present numerical studies of these tests, evaluating their performance on data generated from several distributions.
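As a point of reference for the multivariate criteria named above, a minimal numpy sketch computes Wilks' lambda for a one-way multivariate layout from the hypothesis and error SSCP matrices (a textbook construction, not the report's code):

```python
import numpy as np

def wilks_lambda(y, groups):
    # Wilks' lambda: |E| / |E + H|, where H and E are the between-group
    # (hypothesis) and within-group (error) sums-of-squares-and-cross-products
    # matrices. The related criteria are the Hotelling-Lawley trace
    # tr(H @ inv(E)) and the Pillai trace tr(H @ inv(H + E)).
    y = np.asarray(y, dtype=float)   # shape (n, p)
    groups = np.asarray(groups)
    grand = y.mean(axis=0)
    p = y.shape[1]
    H = np.zeros((p, p))
    E = np.zeros((p, p))
    for g in np.unique(groups):
        yg = y[groups == g]
        d = (yg.mean(axis=0) - grand)[:, None]
        H += len(yg) * (d @ d.T)
        r = yg - yg.mean(axis=0)
        E += r.T @ r
    return np.linalg.det(E) / np.linalg.det(E + H)
```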
|
5 |
Statistical methodology for modelling immunological progression in HIV disease
Parpia, Tamiza, January 1999
No description available.
|
6 |
Longitudinal Data Analysis with Composite Likelihood Methods
Li, Haocheng, January 2012
Longitudinal data arise commonly in many fields, including public health studies and survey sampling. Valid inference methods for longitudinal data are of great importance in scientific research. In longitudinal studies, data collection is typically designed to record all information of interest on individuals at scheduled times. The analysis usually focuses on how the data change over time and how they are associated with certain risk factors or covariates. Various statistical models and methods have been developed over the past few decades. However, these methods can become invalid when the data possess additional features.
First of all, incompleteness presents considerable complications to standard modeling and inference methods. Although we hope each individual completes all of the scheduled measurements, missing observations occur commonly in longitudinal studies. It has been documented that biased results can arise if this feature is not properly accounted for in the analysis. There is a large body of literature on handling missingness arising either from response components or from covariate variables, but relatively little attention has been directed to addressing missingness in both response and covariate variables simultaneously. The sparsity of research on this topic may be attributed to the substantially increased complexity of modeling and the attendant computational difficulties.
In Chapter 2 and Chapter 3 of the thesis, I develop methods to handle incomplete longitudinal data using the pairwise likelihood formulation. The proposed methods can handle longitudinal data with missing observations in both response and covariate variables. A unified framework is invoked to accommodate various types of missing data patterns. The performance of the proposed methods is carefully assessed under a variety of circumstances; in particular, issues of efficiency and robustness are investigated. Longitudinal survey data from the National Population Health Survey are analyzed with the proposed methods.
The other difficulty with longitudinal data is model selection. Incorporating a large number of irrelevant covariates into the model may cause difficulties in computation, interpretation and prediction, so selecting a parsimonious model is typically desirable. The penalized likelihood method is commonly employed for this purpose. However, applying the penalized likelihood approach in longitudinal studies may involve high-dimensional integrals that are computationally expensive.
We propose an alternative method using the composite likelihood formulation. Composite likelihood requires only a partial structure of the correlated data, such as marginal or pairwise distributions. This strategy offers modeling tractability and computational savings in model selection. Therefore, in Chapter 4 of this thesis, I propose a penalized composite likelihood approach to handle the model selection issue. In practice, the model selection problem arises not only in choosing appropriate covariates for the regression predictor but also in choosing the random-effects components. Furthermore, the specification of the random-effects distribution can be crucial to maintaining the validity of statistical inference. Thus, Chapter 4 also discusses the selection of both covariates and random effects, as well as misspecification of the random effects.
Chapter 5 of this thesis addresses the joint features of missingness and model selection. I propose a specific composite likelihood method to handle this issue. A key advantage of the approach is that the inference procedure involves neither explicit assumptions on the missingness process nor estimation of nuisance parameters.
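A minimal sketch of the pairwise likelihood idea underlying these chapters (not the thesis code; the exchangeable-correlation model, the parameterisation and the simulated data are assumptions, and the naive treatment of missing pairs below is only valid under completely random missingness, whereas the thesis handles more general mechanisms):

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def pairwise_nll(params, y):
    # Negative pairwise log-likelihood for a common mean mu, common variance
    # s2 and exchangeable correlation rho. Each pair of occasions (j, k)
    # contributes the bivariate normal log-density of the subjects observed
    # at both occasions, so incomplete rows still contribute through the
    # pairs they do have.
    mu, log_s2, rho = params
    s2 = np.exp(log_s2)
    cov = s2 * np.array([[1.0, rho], [rho, 1.0]])
    nll = 0.0
    for j, k in combinations(range(y.shape[1]), 2):
        pair = y[:, [j, k]]
        pair = pair[~np.isnan(pair).any(axis=1)]  # subjects seen at both times
        nll -= multivariate_normal.logpdf(pair, mean=[mu, mu], cov=cov).sum()
    return nll

# Illustrative fit on simulated data with values missing completely at random:
rng = np.random.default_rng(3)
n, p, rho_true = 200, 5, 0.4
L = np.linalg.cholesky(rho_true + (1 - rho_true) * np.eye(p))
y = 1.0 + rng.normal(size=(n, p)) @ L.T
y[rng.random((n, p)) < 0.2] = np.nan
fit = minimize(pairwise_nll, x0=[0.0, 0.0, 0.1], args=(y,),
               bounds=[(None, None), (None, None), (-0.99, 0.99)])
mu_hat, s2_hat, rho_hat = fit.x[0], np.exp(fit.x[1]), fit.x[2]
```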
|
7 |
On the use of the bispectrum to detect and model non-linearity
Barnett, Adrian Gerard, Unknown Date
Informally, a discrete time series is a set of repeated and, normally, equally spaced observations from the same process over time. The statistical analysis of time series has two functions: to better understand the generating process underlying the series, and to forecast future values. The first analytical methods developed were based upon linear series. A linear series can be represented as a linear function of its own past and current values and the past and current values of some noise process, which can be interpreted as the innovations to the system. A non-linear series has a generally more complex structure that depends upon non-linear interactions between its past and current values and the sequence of innovations. Existing linear statistical methods can only approximate non-linear series. As there is evidence that non-linear series are common in real life, two important problems are to detect and then to classify non-linearity. In moving from a linear to a non-linear structure, the choice of possible models has moved from a countably infinite to an uncountably infinite set. Hence methods are needed that not only detect non-linearity but also classify the non-linear relationship between the past and current values and the innovations.

The third order moment is the expectation of the product of three series values lagged in time. The bispectrum is the double Fourier transform of the third order moment. Both statistics are useful tools for eliciting information on non-linear time series. There are concerns with the assumption of asymptotic independence between the values of the bispectrum estimate used by an existing test of non-linearity. Using a model-based bootstrap, we develop a test with greater power than this existing method to detect non-linear series. Further, we show how patterns in the bispectrum are useful for classifying the locations of the non-linear interactions. To better understand tests of non-linearity and related inference, we investigate the variance of two estimates of the bispectrum. The two estimates are shown to have different inferential properties: one is generally better able than the other to detect non-linearity and to give information on the location of the non-linear interactions.

The third order moment is statistically equivalent to the bispectrum. A particular estimate of the bispectrum is the double Fourier transform of all the estimated third order moment values in a specified region. When using the third order moment to test for non-linearity, we can examine any subset of these values in the specified region. Hence an advantage of using the third order moment, instead of the bispectrum, when testing for non-linearity is greater flexibility in the range of values selected. We show an improved test for non-linearity over the bispectrum-based test, using a reduced set of third order moment values and a phase scrambling-based bootstrap.

Time series can often be observed in a multiple or repeated form, such as the exchange rates between a set of currencies. There is then interest in summarising the common features of the grouped series. An existing linear method based on the spectrum assumes that an observed series (within a population) can be described as a common population spectrum perturbed by an individual effect. The observational noise in the spectrum is modelled using the known asymptotic properties of the spectral estimate. By modelling (and then removing) the individual effects and noise, the method summarises the population's linear characteristics through the spectrum. We modify and then extend this method to summarise the common features of the underlying non-linear generating process of a set of repeated time series, using the bispectrum normalised by the spectrum.
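A short numpy sketch of the two ingredients named in the abstract, the sample third order moment and phase-scrambled surrogates (illustrative code, not the author's; a test would compare the data's third order moment values with their distribution over surrogates):

```python
import numpy as np

def third_order_moment(x, max_lag):
    # Sample third order moment c3(s, t) = mean over n of x[n] * x[n+s] * x[n+t]
    # for lags 0 <= s, t <= max_lag, computed on the mean-centred series; the
    # bispectrum is its double Fourier transform.
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    c3 = np.zeros((max_lag + 1, max_lag + 1))
    for s in range(max_lag + 1):
        for t in range(s, max_lag + 1):
            m = n - t
            c3[s, t] = c3[t, s] = np.mean(x[:m] * x[s:s + m] * x[t:t + m])
    return c3

def phase_scramble(x, rng):
    # Surrogate series with the same spectrum (hence the same linear structure)
    # as x but with randomised phases; non-linear structure, as seen by the
    # third order moment, is destroyed.
    x = np.asarray(x, dtype=float)
    n = len(x)
    f = np.fft.rfft(x - x.mean())
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(f))
    phases[0] = 0.0       # keep the mean term real
    if n % 2 == 0:
        phases[-1] = 0.0  # keep the Nyquist term real
    return np.fft.irfft(np.abs(f) * np.exp(1j * phases), n=n)
```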
|
8 |
Rigorous methods for the analysis, reporting and evaluation of ESM style data
Carter, Lesley-Anne, January 2016
Experience sampling methodology (ESM) is a real-time data capture method that can be used to monitor symptoms and behaviours as they occur during everyday life. With measures completed multiple times a day over several days, this intensive longitudinal data collection method results in multilevel data, with observations nested within days, nested within subjects. The aim of this thesis was to investigate the optimal use of multilevel models in the design, reporting and analysis of ESM data, and to apply these models to a study in people with psychosis. A methodological systematic review was conducted to identify design, analysis and statistical reporting practices in current ESM studies. Seventy-four studies from 2012 were reviewed and, together with the analysis of a motivating example, four significant areas of interest were identified: power and sample size, missing data, momentary variation, and predicting momentary change. Appropriate multilevel methods were sought for each of these areas and were evaluated in the three-level context of ESM.

Missing data were found to be both underreported and rarely considered when choosing analysis methods in practice. This work has introduced a more detailed understanding of nonresponse in ESM studies and has discussed appropriate statistical methods in the presence of missing data. This thesis has extended two-level statistical methodology for data analysis to accommodate the three-level structure of ESM. Novel applications of time trends have been developed, where time can be measured at two separate levels. The suitability of predicting momentary change in ESM data has been questioned; it is argued that the first-difference and joint modelling methods claimed in the literature to remove bias may in this context induce more. Finally, Monte Carlo simulations were shown to be a flexible option for estimating empirical power under varying sample sizes at levels 3, 2 and 1, with recommendations made for conservative power estimates when a priori parameter estimates are unknown. In summary, this work demonstrates how multilevel models can be used to examine the rich data structure of ESM and fully utilize the variation in measures captured at all levels.
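A minimal sketch of Monte Carlo power estimation for a three-level ESM design (illustrative; the variance components and the crude analysis step, a t-test on subject means rather than a full multilevel model, are assumptions):

```python
import numpy as np
from scipy import stats

def esm_power(n_subj, n_days, n_beeps, effect, sd_subj, sd_day, sd_obs,
              n_sim=1000, alpha=0.05, seed=1):
    # Empirical power for a between-subject effect in a three-level ESM design
    # (beeps within days within subjects), estimated by simulation. n_subj is
    # assumed even so the two arms are balanced.
    rng = np.random.default_rng(seed)
    group = np.repeat([0, 1], n_subj // 2)
    hits = 0
    for _ in range(n_sim):
        means = np.empty(n_subj)
        for i in range(n_subj):
            u = rng.normal(0.0, sd_subj)                      # subject effect
            v = rng.normal(0.0, sd_day, n_days)               # day effects
            e = rng.normal(0.0, sd_obs, (n_days, n_beeps))    # beep-level noise
            means[i] = (effect * group[i] + u + v[:, None] + e).mean()
        _, p = stats.ttest_ind(means[group == 1], means[group == 0])
        hits += p < alpha
    return hits / n_sim

# e.g. esm_power(n_subj=40, n_days=6, n_beeps=10, effect=0.5,
#                sd_subj=1.0, sd_day=0.5, sd_obs=1.0)
```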
|
9 |
Multilevel models in human growth and development research
Pan, Huiqi, January 1995
The analysis of change is an important issue in human growth and development. In longitudinal studies, growth patterns are often summarized by growth 'models' so that a small number of parameters, or functions of them, can be used to make group comparisons or to be related to other measurements. For complete and balanced data, growth curves can be modelled using multivariate analysis of variance with an unstructured variance-covariance matrix; for incomplete and unbalanced data, models such as the two-stage model of Laird and Ware (1982) or the multilevel models of Goldstein (1987) are necessary. The use of multilevel models for describing growth is recognized as an important technique: it is an efficient procedure for incorporating growth models, whether linear or nonlinear, into a population study. To date there is little literature on multilevel growth models over wide age ranges. The purpose of this study is to explore suitable multilevel models of growth over a wide age range. Extended splines are proposed, which extend conventional splines, built with the '+' function, by including logarithmic or negative-power terms. The work focuses on modelling human growth in length, particularly height and head circumference, as these are interesting and important measures of growth. The investigation of polynomials, conventional splines and extended splines on data from the Edinburgh Longitudinal Study shows that the extended splines are better than polynomials and conventional splines for this purpose. It also shows that extended splines are, in fact, piecewise fractional polynomials and describe data better than a single segment of a fractional polynomial. The extended splines are useful, flexible, and easily incorporated in multilevel models for studying populations and for the estimation and comparison of parameters.
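A minimal sketch of an extended-spline design matrix (illustrative; the choice of knots and of the particular logarithmic and negative-power terms is an assumption, and in the thesis these columns enter a multilevel model rather than a single-subject least-squares fit):

```python
import numpy as np

def extended_spline_design(age, knots):
    # Design matrix for an "extended spline": conventional linear-spline terms
    # built from the truncated power ('+') function, augmented with logarithmic
    # and negative-power terms, giving a piecewise fractional polynomial.
    # Ages must be positive for the log and 1/age columns.
    age = np.asarray(age, dtype=float)
    cols = [np.ones_like(age), age, np.log(age), 1.0 / age]
    cols += [np.maximum(age - k, 0.0) for k in knots]  # (age - k)_+ terms
    return np.column_stack(cols)

# Illustrative least-squares fit of one child's height curve:
age = np.linspace(0.25, 18.0, 60)
X = extended_spline_design(age, knots=[2.0, 10.0])
# beta, *_ = np.linalg.lstsq(X, height, rcond=None)  # given observed heights
```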
|