Global ETD Search

1	Statistical Methods for Multi-State Analysis of Incomplete Longitudinal Data Chen, Baojiang January 2008
Analyses of longitudinal categorical data are typically based on semiparametric models in which covariate effects are expressed on marginal probabilities and estimation is carried out using generalized estimating equations (GEE). Methods based on GEE are motivated in part by the lack of tractable models for clustered categorical data. However, such marginal methods may not yield fully efficient estimates, nor consistent estimates when data are missing. In the first part of the thesis I develop a Markov model for the analysis of longitudinal categorical data which facilitates modeling of marginal and conditional structures. A likelihood formulation is employed for inference, so the resulting estimators are efficient and consistent, and remain consistent when data are missing at random. Simulation studies demonstrate that the proposed method performs well in a variety of situations. An application to data from a smoking prevention study illustrates the utility of the model and the interpretation of covariate effects.
Incomplete data arise in many areas of research and are common in longitudinal data on the disease history of subjects. Progressive models provide a convenient framework for characterizing disease processes which arise, for example, when the state represents the degree of irreversible damage incurred by the subject. Problems arise if the mechanism leading to the missing data is related to the response process; a naive analysis may then lead to biased results and invalid inferences. The second part of this thesis begins with an investigation of progressive multi-state models for longitudinal studies with incomplete observations. Maximum likelihood estimation is carried out with an EM algorithm, and variance estimation is provided using Louis' method. In general, maximum likelihood estimates are valid when the data are missing completely at random or missing at random. Here we provide a likelihood-based method in which the parameters are identifiable regardless of the missing data mechanism. Simulation studies demonstrate that the proposed method works well in a variety of situations.
In practice, we often face data with missing values in both the response and the covariates, and sometimes the missingness of the response is associated with the missingness of the covariates. The proper analysis of this type of data requires taking this association into consideration. The impact of attrition in longitudinal studies depends on the correlation between the missing response and the missing covariate, and ignoring it can bias the statistical inference. We study a method that incorporates the association between the missingness of the response and the missingness of the covariate through inverse probability weighted generalized estimating equations. The simulation illustrates that the proposed method yields a consistent estimator, while the method that ignores the association yields an inconsistent estimator.
Many analyses of incomplete longitudinal data focus on the impact of covariates on the mean responses. However, little attention has been directed to the impact of missing covariates on the association parameters in clustered longitudinal studies. The last part of this thesis addresses this problem. Weighted first and second order estimating equations are constructed to obtain consistent estimates of the mean and association parameters.
Statistics (Biostatistics)
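The inverse probability weighting idea mentioned in the abstract above can be illustrated in its simplest form. The sketch below is not the thesis's method: it assumes an intercept-only estimating equation, sum of (R_i / pi_i)(Y_i - mu) = 0, with known observation probabilities pi_i, and the function name `ipw_mean` and its arguments are hypothetical.

```python
def ipw_mean(y, observed, prob_observed):
    """Solve sum_i (R_i / pi_i) * (Y_i - mu) = 0 for mu.

    y             : responses (entries for unobserved subjects are ignored)
    observed      : R_i, True if the response was observed
    prob_observed : pi_i, assumed-known probability that subject i is observed
    """
    num = 0.0
    den = 0.0
    for yi, ri, pi in zip(y, observed, prob_observed):
        if not 0.0 < pi <= 1.0:
            raise ValueError("observation probabilities must lie in (0, 1]")
        if ri:
            # Each observed subject stands in for 1 / pi_i subjects like it.
            num += yi / pi
            den += 1.0 / pi
    if den == 0.0:
        raise ValueError("no observed responses")
    return num / den


# Illustrative data (made up): subjects less likely to be observed get larger weights.
y = [1.0, None, 3.0, None]
observed = [True, False, True, False]
prob_observed = [0.5, 0.5, 0.25, 0.25]
print(ipw_mean(y, observed, prob_observed))
```

In practice the probabilities are estimated from a model for the missingness, and the abstract's point is that this model should account for the association between missing responses and missing covariates.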
3	An Investigation of Methods for Missing Data in Hierarchical Models for Discrete Data Ahmed, Muhamad Rashid January 2011
Hierarchical models are applicable to modeling data from complex surveys or longitudinal data when a clustered or multistage sample design is employed. The focus of this thesis is inference for discrete hierarchical models in the presence of missing data. The thesis is divided into two parts. In the first part, methods are developed to analyze discrete and ordinal response data from hierarchical longitudinal studies. Several approximation methods have been developed to estimate the parameters for the fixed and random effects in the context of generalized linear models. The thesis focuses on two likelihood-based estimation procedures, the pseudo likelihood (PL) method and the adaptive Gaussian quadrature (AGQ) method. The simulation results suggest that AGQ is preferable to PL when the goal is to estimate the variance of the random intercept in a complex hierarchical model: AGQ gives smaller biases for this variance estimate and permits greater flexibility in accommodating user-defined likelihood functions.
In the second part, simulated data are used to develop a method for modeling longitudinal binary data when non-response depends on unobserved responses. The simulation study modeled three-level discrete hierarchical data with 30% and 40% missing data under a missing not at random (MNAR) mechanism, and focused on a monotone missing-data pattern. The methods used in this thesis to handle missing data are: complete case analysis (CCA), last observation carried forward (LOCF), available case missing value (ACMVPM) restriction, complete case missing value (CCMVPM) restriction, neighboring case missing value (NCMVPM) restriction, selection model with predictive mean matching method (SMPM), and Bayesian pattern mixture model. All three restriction methods and the selection model used predictive mean matching to impute missing data. Multiple imputation is used to impute the missing values: the m imputed values for each missing observation produce m complete datasets. Each dataset is analyzed and the parameters are estimated. The results from the m analyses are then combined using the method of Rubin (1987), and inferences are made from the combined results.
Our results suggest that the restriction methods are superior to the other methods. The selection model gives smaller biases than LOCF, but this advantage disappears as the proportion of missing data increases. Among the three restriction methods, ACMVPM performs best. The proposed method provides an alternative to standard selection and pattern-mixture modeling frameworks when data are not missing at random. The method is applied to data from the third Waterloo Smoking Project, a seven-year smoking prevention study with substantial non-response due to loss to follow-up.
Missing Data; Multilevel Model; Statistics (Biostatistics)
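The combining step named in the abstract above, pooling the m analyses by the method of Rubin (1987), can be sketched as follows. This is a minimal illustration for a single scalar parameter; the function name `pool_rubin` and the example numbers are hypothetical and not taken from the thesis.

```python
def pool_rubin(estimates, variances):
    """Combine m completed-data analyses of one scalar parameter.

    estimates : point estimates, one per imputed dataset
    variances : squared standard errors, one per imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two analyses with matching variances")
    q_bar = sum(estimates) / m                    # pooled point estimate
    within = sum(variances) / m                   # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between    # total variance of the pooled estimate
    return q_bar, total


# Made-up results from m = 3 imputed datasets.
estimate, variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, variance)
```

The between-imputation term is what carries the extra uncertainty due to the missing values; with no variation across imputations the total variance reduces to the average within-imputation variance.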
4	Longitudinal Data Analysis with Composite Likelihood Methods Li, Haocheng January 2012
Longitudinal data arise commonly in many fields, including public health studies and survey sampling. Valid inference methods for longitudinal data are of great importance in scientific research. In longitudinal studies, data collection is often designed to record all information of interest on individuals at scheduled times. The analysis usually focuses on how the data change over time and how they are associated with certain risk factors or covariates. Various statistical models and methods have been developed over the past few decades. However, these methods can become invalid when data possess additional features.
First, incompleteness of data presents considerable complications for standard modeling and inference methods. Although we hope each individual completes all of the scheduled measurements, missing observations occur commonly in longitudinal studies. It has been documented that biased results can arise if such a feature is not properly accounted for in the analysis. There is a large body of methods in the literature for handling missingness in either the response or the covariates, but relatively little attention has been directed to missingness in both simultaneously. The sparsity of research on this topic may be attributed to the substantially increased modeling complexity and computational difficulty. In Chapters 2 and 3 of the thesis, I develop methods to handle incomplete longitudinal data using the pairwise likelihood formulation. The proposed methods can handle longitudinal data with missing observations in both response and covariate variables. A unified framework is invoked to accommodate various types of missing data patterns. The performance of the proposed methods is carefully assessed under a variety of circumstances. In particular, efficiency and robustness are investigated. Longitudinal survey data from the National Population Health Study are analyzed with the proposed methods.
A second difficulty in longitudinal data analysis is model selection. Incorporating a large number of irrelevant covariates in the model may cause difficulties in computation, interpretation and prediction, so selecting a parsimonious model is typically desirable. The penalized likelihood method is commonly employed for this purpose. However, in longitudinal studies the penalized likelihood approach may involve high-dimensional integrals that are computationally expensive. We propose an alternative method using the composite likelihood formulation. Formulation of a composite likelihood requires only a partial structure of the correlated data, such as marginal or pairwise distributions. This strategy offers modeling tractability and low computational cost in model selection. Therefore, in Chapter 4 of this thesis, I propose a composite likelihood approach with a penalty function to handle model selection. In practice, model selection involves not only choosing the covariates for the regression predictor but also choosing the random effects components. Furthermore, the specification of the random effects distribution can be crucial to the validity of statistical inference. Thus, selection of both covariates and random effects, as well as misspecification of the random effects distribution, is also discussed in Chapter 4. Chapter 5 of this thesis addresses missingness and model selection jointly. I propose a specific composite likelihood method to handle this issue. An advantage of the approach is that the inference procedure does not involve explicit assumptions about the missing data process or estimation of nuisance parameters.
Longitudinal Data; Composite Likelihood; Statistics (Biostatistics)
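For readers unfamiliar with the pairwise likelihood referred to in the abstract above, a generic form is sketched below. The notation is an assumption, not the thesis's own: y_{ij} is the j-th measurement on subject i, m_i the number of measurements, f a bivariate density, and p_lambda a penalty function with tuning parameter lambda. The thesis may use weights, a different penalty, or a different scaling.

```latex
% Pairwise log-likelihood: only bivariate distributions are specified.
\ell_{\mathrm{pair}}(\theta)
  = \sum_{i=1}^{n} \sum_{1 \le j < k \le m_i} \log f\!\left(y_{ij}, y_{ik}; \theta\right)

% A penalized version for model selection (generic form).
\hat{\theta}_{\lambda}
  = \arg\max_{\theta} \left\{ \ell_{\mathrm{pair}}(\theta)
      - n \sum_{r=1}^{p} p_{\lambda}\!\left(\lvert \theta_r \rvert\right) \right\}
```

Because each term involves at most two measurements, the high-dimensional integrals of the full likelihood are replaced by integrals of dimension two or lower.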
7	Copula Models for Multi-type Life History Processes Diao, Liqun January 2013
This thesis considers statistical issues in the analysis of data from studies of chronic diseases which involve modeling dependencies between life history processes using copula functions. Many disease processes feature recurrent events arising from an underlying chronic condition; these are often modeled as point processes. In addition, there often exists a random variable which is realized upon the occurrence of each event, called a mark of the point process. When considered together, such processes are called marked point processes. A novel copula model for the marked point process is described here which uses copula functions to govern the association between marks and event times. Specifically, a copula function is used to link each mark with the next event time following the realization of that mark, to reflect the pattern in the data wherein larger marks are often followed by a longer time to the next event.
The extent of organ damage in an individual can often be characterized by ordered states, and interest frequently lies in modeling the rates at which individuals progress through these states. Risk factors can be studied and the effect of therapeutic interventions can be assessed based on relevant multistate models. When chronic diseases affect multiple organ systems, joint modeling of progression in several organ systems is also important. In contrast to common intensity-based or frailty-based approaches to modeling, this thesis considers a copula-based framework for modeling and analysis. Through decomposition of the density and by use of conditional independence assumptions, an appealing joint model is obtained by assuming that the joint survival function of absorption transition times is governed by a multivariate copula function. Different approaches to estimation and inference are discussed and compared, including composite likelihood and two-stage estimation methods. Special attention is paid to the case of interval-censored data arising from intermittent assessment.
Attention is also directed to the use of copula models for more general scenarios, with a focus on semiparametric two-stage estimation procedures. In this approach, nonparametric or semiparametric estimates of the marginal survivor functions are obtained in the first stage and estimates of the association parameters are obtained in the second stage. Bivariate failure time models are considered for data under right-censoring and current status observation schemes, as are right-censored multistate models. A new expression for the asymptotic variance of the second-stage estimator of the association parameter, along with a way of estimating it in finite samples, is presented under these models and observation schemes.
Copula; Lifetime data; Composite likelihood; Multistage estimation procedure; Statistics (Biostatistics)
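To make the copula construction in the abstract above concrete, the sketch below shows how two marginal survival probabilities are combined into a joint survival probability. The Clayton family is an assumption chosen for illustration; the abstract does not say which copula families the thesis uses, and the function names are hypothetical.

```python
def clayton_joint_survival(s1, s2, theta):
    """Joint survival P(T1 > t1, T2 > t2) from marginal survival
    probabilities s1 = P(T1 > t1) and s2 = P(T2 > t2), linked by a
    Clayton copula with association parameter theta > 0."""
    if theta <= 0.0:
        raise ValueError("theta must be positive")
    for s in (s1, s2):
        if not 0.0 <= s <= 1.0:
            raise ValueError("survival probabilities must lie in [0, 1]")
    if s1 == 0.0 or s2 == 0.0:
        return 0.0
    return (s1 ** (-theta) + s2 ** (-theta) - 1.0) ** (-1.0 / theta)


def clayton_kendalls_tau(theta):
    """Kendall's tau implied by a Clayton copula: theta / (theta + 2)."""
    return theta / (theta + 2.0)


# Under independence the joint survival would be 0.5 * 0.5 = 0.25;
# positive association (theta = 2) raises it.
print(clayton_joint_survival(0.5, 0.5, theta=2.0))
print(clayton_kendalls_tau(2.0))
```

In a two-stage procedure of the kind the abstract describes, the marginal survival probabilities would come from first-stage estimates, and only the association parameter would be estimated in the second stage.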
8	Flexible Bent-Cable Models for Mixture Longitudinal Data Khan, Shahedul Ahsan January 2010
Data showing a trend that characterizes a change due to a shock to the system are a type of changepoint data, and may be referred to as shock-through data. As a result of the shock, this type of data may exhibit one of two types of transitions: gradual or abrupt. Although shock-through data are of particular interest in many areas of study, such as biological, medical, health and environmental applications, previous research has shown that statistical inference from modeling the trend is challenging in the presence of discontinuous derivatives. Further complications arise when we have (1) longitudinal data, and/or (2) samples which come from two potential populations: one with a gradual transition, and the other abrupt. Bent-cable regression is an appealing statistical tool for modeling shock-through data because the model is flexible yet parsimonious, with readily interpretable regression coefficients. It comprises two linear segments (incoming and outgoing) joined by a quadratic bend.
In this thesis, we develop extended bent-cable methodology for longitudinal data in a Bayesian framework to account for both types of transitions; inference for the transition type is driven by the data rather than by a presumption about the nature of the transition. We describe explicitly the computationally intensive Bayesian implementation of the methodology. Moreover, we describe modeling only one type of transition, which is a special case of this more general model. We demonstrate our methodology with a simulation study and with two applications: (1) assessing the transition to early hypothermia in a rat model, and (2) understanding CFC-11 trends monitored globally. Our methodology can be further extended at the cost of additional theoretical and computational complexity. For example, we assume that the two populations mentioned above share a common intercept and common slopes in the incoming and outgoing phases, an assumption that can be relaxed when the intercept and slope parameters could differ between populations. In addition, we discuss several other directions for future research arising from the methodology presented in this thesis.
Longitudinal Bent-Cable Regression; Longitudinal Changepoint Modeling; Statistics (Biostatistics)
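The bent-cable mean function described in the abstract above, two linear segments joined by a quadratic bend, can be written down directly. The sketch below uses a parameterization common in the bent-cable literature; the parameter names (b0, b1, b2, tau, gamma) are assumptions and the thesis may use different notation. Setting gamma to 0 gives the abrupt transition as a special case.

```python
def bent_cable(t, b0, b1, b2, tau, gamma):
    """Bent-cable mean at time t.

    b0, b1 : intercept and slope of the incoming linear phase
    b2     : change in slope, so the outgoing slope is b1 + b2
    tau    : centre of the bend
    gamma  : half-width of the bend (gamma = 0 gives an abrupt change at tau)
    """
    if gamma < 0.0:
        raise ValueError("gamma must be non-negative")
    if t <= tau - gamma:
        q = 0.0                                         # incoming phase
    elif t >= tau + gamma:
        q = t - tau                                     # outgoing phase
    else:
        q = (t - tau + gamma) ** 2 / (4.0 * gamma)      # quadratic bend
    return b0 + b1 * t + b2 * q


# Gradual transition centred at tau = 5 with half-width gamma = 1 (made-up values).
for t in (0.0, 5.0, 10.0):
    print(t, bent_cable(t, b0=1.0, b1=2.0, b2=3.0, tau=5.0, gamma=1.0))
```

The quadratic piece equals 0 at tau - gamma and equals gamma at tau + gamma, so the curve and its first derivative are continuous whenever gamma is positive.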
