1 |
Efficient Estimation in a Regression Model with Missing Responses
Crawford, Scott 2012 August 1900
This article examines methods to efficiently estimate the mean response in a linear model with an unknown error distribution under the assumption that the responses are missing at random. We show how the asymptotic variance is affected by the estimator of the regression parameter and by the imputation method. For estimating the regression parameter, the Ordinary Least Squares method is efficient only if the error distribution happens to be normal. If the errors are not normal, we propose a One Step Improvement estimator or a Maximum Empirical Likelihood estimator to estimate the parameter efficiently.
To investigate the impact that imputation has on estimation of the mean response, we compare the Listwise Deletion method and the Propensity Score method (neither of which uses imputation) with two imputation methods. We show that Listwise Deletion and the Propensity Score method are inefficient. Partial Imputation, where only the missing responses are imputed, is compared to Full Imputation, where both missing and non-missing responses are imputed. Our results show that in general Full Imputation is better than Partial Imputation. However, when the regression parameter is estimated very poorly, Partial Imputation will outperform Full Imputation. The efficient estimator for the mean response is the Full Imputation estimator that uses an efficient estimator of the parameter.
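The comparison described in this abstract can be illustrated with a minimal sketch. It assumes a simple linear model with one covariate, responses missing at random given that covariate, and an OLS fit on the complete cases; the function name, the simulated data, and the t-distributed errors are illustrative assumptions and are not taken from the article.

```python
import numpy as np

def mean_response_estimates(x, y, observed):
    """Estimate E[Y] three ways; y is only used where observed is True."""
    x_obs, y_obs = x[observed], y[observed]
    # OLS fit on the complete cases (consistent when Y is MAR given x).
    design = np.column_stack([np.ones_like(x_obs), x_obs])
    beta, *_ = np.linalg.lstsq(design, y_obs, rcond=None)
    fitted = beta[0] + beta[1] * x
    return {
        # Listwise deletion: average the observed responses only.
        "listwise": y_obs.mean(),
        # Partial imputation: keep observed responses, impute the missing ones.
        "partial": np.where(observed, y, fitted).mean(),
        # Full imputation: replace every response by its fitted value.
        "full": fitted.mean(),
    }

# Hypothetical simulated data: true mean response is 1.0, errors are not normal.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.standard_t(df=5, size=n)
# Responses are more likely to be observed when x is large (MAR given x).
observed = rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + x)))
estimates = mean_response_estimates(x, y, observed)
print(estimates)
```

In this setup the listwise estimate is pulled away from the true mean because the observed cases over-represent large covariate values, while both imputation estimates stay close to it. The sketch uses plain OLS throughout and does not implement the One Step Improvement or Maximum Empirical Likelihood estimators.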
|
2 |
Bayesian Methodology for Missing Data, Model Selection and Hierarchical Spatial Models with Application to Ecological Data
Boone, Edward L. 14 February 2003
Ecological data are often fraught with problems such as missing data and spatial correlation. In this dissertation we use a data set collected by the Ohio EPA as motivation for studying techniques to address these problems. The data set is concerned with the benthic health of Ohio's waterways. A new method for incorporating covariate structure and missing data mechanisms into missing data analysis is considered. This method allows us to detect relationships that other popular methods do not. We then further extend this method to model selection. In the special case where the unobserved covariates are assumed normally distributed, we use the Bayesian Model Averaging method to average the models, select the highest probability model and do variable assessment. The accuracy of calculating the posterior model probabilities using the Laplace approximation and an approximation based on the Bayesian Information Criterion (BIC) is explored. It is shown by simulation that the Laplace approximation is superior to the BIC-based approximation. Finally, Hierarchical Spatial Linear Models are considered for the data and we show how to combine analyses that have spatial correlation within and between clusters. / Ph. D.
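For orientation, the two approximations compared in this abstract are commonly written as follows. These are the textbook forms under equal prior model probabilities; the symbols ($M_k$, $p_k$, $\hat\theta_k$, $\tilde\theta_k$, $\tilde H_k$) are notation assumed here and may differ from the dissertation's.

```latex
% BIC-based approximation to posterior model probabilities (equal model priors):
\mathrm{BIC}_k = -2 \log p(y \mid \hat\theta_k, M_k) + p_k \log n,
\qquad
P(M_k \mid y) \approx
\frac{\exp(-\mathrm{BIC}_k / 2)}{\sum_j \exp(-\mathrm{BIC}_j / 2)} .

% Laplace approximation to the marginal likelihood of model M_k, where
% \tilde\theta_k is the posterior mode and \tilde H_k is the negative Hessian
% of the log posterior kernel evaluated at \tilde\theta_k:
\log p(y \mid M_k) \approx
\log p(y \mid \tilde\theta_k, M_k) + \log p(\tilde\theta_k \mid M_k)
+ \frac{p_k}{2} \log(2\pi) - \frac{1}{2} \log \lvert \tilde H_k \rvert .
```

The BIC form drops the prior and Hessian terms of the Laplace expression, keeping only the part that grows with $n$, which is one reason the two can differ in finite samples.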
|
3 |
SENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE NORMALITY
Zheng, Xiyu 01 January 2016
Abstract
In a two-level hierarchical linear model (HLM2), the outcome as well as covariates may have missing values at any of the levels. One way to analyze all available data in the model is to estimate, by maximum likelihood (ML), a multivariate normal joint distribution of the variables subject to missingness (including the outcome) conditional on the completely observed covariates; draw multiple imputations (MI) of the missing values given the estimated joint model; and analyze the hierarchical model given the MI [1,2]. The assumption is that data are missing at random (MAR). While this method yields efficient estimation of the hierarchical model, it often estimates the model given discrete missing data that are handled under multivariate normality. In this thesis, we evaluate how robust the method is when estimating a hierarchical linear model given discrete missing data. We simulate incompletely observed data from a series of hierarchical linear models given discrete covariates MAR, estimate the models by the method, and assess the sensitivity of handling discrete missing data under the multivariate normal joint distribution by computing bias, root mean squared error, standard error, and coverage probability in the estimated hierarchical linear models via a series of simulation studies. Our aim is to evaluate the performance of the method in handling binary covariates MAR. We let the missing patterns of level-1 and level-2 binary covariates depend on completely observed variables and assess how the method handles binary missing data given different values of success probabilities and missing rates.
Based on the simulation results, the missing data analysis is robust under certain parameter settings. The efficient analysis performs very well for estimation of level-1 fixed and random effects across varying success probabilities and missing rates. Estimation for a level-2 binary covariate MAR performs poorly when the missing rate in that covariate is greater than 10%.
The rest of the thesis is organized as follows: Section 1 introduces the background information, including conventional methods for hierarchical missing data analysis, different missing data mechanisms, and the innovation and significance of this study. Section 2 explains the efficient missing data method. Section 3 presents the sensitivity analysis of the missing data method and explains how we carry out the simulation study using SAS, the software package HLM7, and R. Section 4 presents the results and recommendations for researchers who want to use the missing data method for binary covariates MAR in HLM2. Section 5 presents an illustrative analysis of the National Growth of Health Study (NGHS) by the missing data method. The thesis ends with a list of references that will guide future study and the simulation code we used.
|
4 |
Methods for handling missing data in cohort studies where outcomes are truncated by death
Wen, Lan January 2018
This dissertation addresses problems found in observational cohort studies where the repeated outcomes of interest are truncated by both death and by dropout. In particular, we consider methods that make inference for the population of survivors at each time point, otherwise known as 'partly conditional inference'. Partly conditional inference distinguishes between the reasons for missingness; failure to make this distinction will cause inference to be based not only on pre-death outcomes which exist but also on post-death outcomes which fundamentally do not exist. Such inference is called 'immortal cohort inference'. Investigations of health and cognitive outcomes in two studies - the 'Origins of Variance in the Old Old' and the 'Health and Retirement Study' - are conducted. Analysis of these studies is complicated by outcomes of interest being missing because of death and dropout. We show, first, that linear mixed models and joint models (that model both the outcome and survival processes) produce immortal cohort inference. This makes the parameters in the longitudinal (sub-)model difficult to interpret. Second, a thorough comparison of well-known methods used to handle missing outcomes - inverse probability weighting, multiple imputation and linear increments - is made, focusing particularly on the setting where outcomes are missing due to both dropout and death. We show that when the dropout models are correctly specified for inverse probability weighting, and the imputation models are correctly specified for multiple imputation or linear increments, then the assumptions of multiple imputation and linear increments are the same as those of inverse probability weighting only if the time of death is included in the dropout and imputation models. Otherwise they may not be. 
Simulation studies show that each of these methods gives negligibly biased estimates of the partly conditional mean when its assumptions are met, but potentially biased estimates if its assumptions are not met. In addition, we develop new augmented inverse probability weighted estimating equations for making partly conditional inference, which offer double protection against model misspecification. That is, as long as one of the dropout and imputation models is correctly specified, the partly conditional inference is valid. Third, we describe methods that can be used to make partly conditional inference for non-ignorable missing data. Both monotone and non-monotone missing data are considered. We propose three methods that use a tilt function to relate the distribution of an outcome at visit j among those who were last observed at some time before j to those who were observed at visit j. Sensitivity analyses to departures from ignorable missingness assumptions are conducted on simulations and on real datasets. The three methods are: i) an inverse probability weighted method that up-weights observed subjects to represent subjects who are still alive but are not observed; ii) an imputation method that replaces missing outcomes of subjects who are alive with their conditional mean outcomes given past observed data; and iii) a new augmented inverse probability method that combines the previous two methods and is doubly-robust against model misspecification.
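The double protection described above can be sketched for a single visit. This is a simplified illustration, not the dissertation's estimating equations: it assumes one baseline covariate, a logistic dropout model, and a linear outcome model, both fitted among survivors only; all function names and the simulated data are hypothetical.

```python
import numpy as np

def fit_logistic(design, r, n_iter=25):
    """Logistic regression by Newton-Raphson; returns the coefficient vector."""
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ coef))
        grad = design.T @ (r - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        coef += np.linalg.solve(hess, grad)
    return coef

def aipw_survivor_mean(x, y, alive, observed):
    """AIPW estimate of E[Y | alive]; y is only used where observed is True."""
    xa, ya, ra = x[alive], y[alive], observed[alive]
    design = np.column_stack([np.ones_like(xa), xa])
    # Dropout model: probability of being observed, among survivors.
    pi = 1.0 / (1.0 + np.exp(-design @ fit_logistic(design, ra.astype(float))))
    # Imputation (outcome) model: fitted on observed survivors.
    beta, *_ = np.linalg.lstsq(design[ra], ya[ra], rcond=None)
    m = design @ beta
    # Augmentation term: weighted residual for observed subjects, zero otherwise.
    resid = np.where(ra, ya - m, 0.0)
    return np.mean(m + resid / pi)

# Hypothetical simulated cohort at one visit.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
alive = rng.random(n) < 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * x)))
y = 2.0 + x + rng.normal(size=n)
y[~alive] = np.nan  # post-death outcomes do not exist
observed = alive & (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + x))))

oracle = y[alive].mean()      # unavailable in practice; used here as a check
naive = y[observed].mean()    # observed survivors only
aipw = aipw_survivor_mean(x, y, alive, observed)
print(oracle, naive, aipw)
```

Restricting both working models to survivors is what makes the target the partly conditional mean rather than an immortal-cohort quantity. The sketch ignores the longitudinal structure, monotone dropout over several visits, and the tilt-function sensitivity analysis.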
|
5 |
Generalized score tests for missing covariate data
Jin, Lei 15 May 2009
In this dissertation, the generalized score tests based on weighted estimating equations
are proposed for missing covariate data. Their properties, including the effects
of nuisance functions on the forms of the test statistics and efficiency of the tests,
are investigated. Different versions of the test statistic are properly defined for various
parametric and semiparametric settings. Their asymptotic distributions are also
derived. It is shown that when models for the nuisance functions are correct, appropriate
test statistics can be obtained via plugging the estimates of the nuisance
functions into the appropriate test statistic for the case that the nuisance functions
are known. Furthermore, the optimal test is obtained using the relative efficiency
measure. As an application of the proposed tests, a formal model validation procedure
is developed for generalized linear models in the presence of missing covariates.
The asymptotic distribution of the data driven methods is provided. A simulation
study in both linear and logistic regressions illustrates the applicability and the finite
sample performance of the methodology. Our methods are also employed to analyze
a coronary artery disease diagnostic dataset.
|
6 |
Analysis of Longitudinal Data with Missing Responses Adjusted by Inverse Probability Weights
Jankovic, Dina 11 July 2018
We propose a new method for analyzing longitudinal data which contain responses
that are missing at random. This method consists in solving the generalized estimating
equation (GEE) of [7] in which the incomplete responses are replaced by values
adjusted using the inverse probability weights proposed in [14]. We show that the
root estimator is consistent and asymptotically normal, essentially under the same conditions on the marginal distribution and the surrogate correlation matrix as those
presented in [12] in the case of complete data, and under minimal assumptions on
the missingness probabilities. This method is applied to a real-life dataset taken from
[10], which examines the incidence of respiratory disease in a sample of 250 pre-school age Indonesian children who were examined every 3 months for 18 months, using as covariates the age, gender, and vitamin A deficiency.
|
7 |
Sufficient Dimension Reduction with Missing Data
XIA, QI January 2017
Existing sufficient dimension reduction (SDR) methods typically consider cases with no missing data. The dissertation aims to propose methods to facilitate the SDR methods when the response can be missing. The first part of the dissertation focuses on the seminal sliced inverse regression (SIR) approach proposed by Li (1991). We show that missing responses generally affect the validity of the inverse regressions under the mechanism of missing at random. We then propose a simple and effective adjustment with inverse probability weighting that guarantees the validity of SIR. Furthermore, a marginal coordinate test is introduced for this adjusted estimator. The proposed method shares the simplicity of SIR and requires the linear conditional mean assumption. The second part of the dissertation proposes two new estimating equation procedures: the complete case estimating equation approach and the inverse probability weighted estimating equation approach. The two approaches are applied to a family of dimension reduction methods, which includes ordinary least squares, principal Hessian directions, and SIR. By solving the estimating equations, the two approaches are able to avoid two common assumptions in the SDR literature: the linear conditional mean assumption and the constant conditional variance assumption. For all the aforementioned methods, the asymptotic properties are established, and their finite sample performance is demonstrated through extensive numerical studies as well as a real data analysis. In addition, existing estimators of the central mean space have uneven performance across different types of link functions. To address this limitation, a new hybrid SDR estimator is proposed that successfully recovers the central mean space for a wide range of link functions. Based on the new hybrid estimator, we further study the order determination procedure and the marginal coordinate test.
The superior performance of the hybrid estimator over existing methods is demonstrated in simulation studies. Note that the proposed procedures dealing with the missing response at random can be simply adapted to this hybrid method. / Statistics
|
8 |
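Multiply Robust Weighted Generalized Estimating Equations for Incomplete Longitudinal Binary Data Using Empirical Likelihood
Komazaki, Hiroshi 25 March 2024
Kyoto University / New system, doctorate by dissertation / Doctor of Public Health (Social Health Medicine) / Otsu No. 13612 / Ron-Sha-I-Haku No. 18 / 新制||社医||13 (University Library) / School of Public Health, Graduate School of Medicine, Kyoto University / (Chief examiner) Professor Satoshi Morita, Professor Toshiaki Furukawa, Professor Yuichi Imanaka / Pursuant to Article 4, Paragraph 2 of the Degree Regulations / Doctor of Public Health / Kyoto University / DFAM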
|
9 |
Performance of Imputation Algorithms on Artificially Produced Missing at Random Data
Oketch, Tobias O 01 May 2017
Missing data is one of the challenges we face today in building valid statistical models. It reduces the representativeness of the data samples. Hence, population estimates and model parameters estimated from such data are likely to be biased.
However, the missing data problem is an area under active study, and better alternative statistical procedures have been presented to mitigate its shortcomings. In this paper, we review causes of missing data and various methods of handling missing data. Our main focus is evaluating various multiple imputation (MI) methods from the Multivariate Imputation by Chained Equations (MICE) package in the statistical software R. We assess how these MI methods perform with different percentages of missing data. A multiple regression model was fit on the imputed data sets and the complete data set. Statistical comparisons of the regression coefficients are made between the models using the imputed data and the complete data.
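The evaluation design described above (produce MAR missingness at a chosen percentage, impute several times, refit the regression, compare coefficients with the complete-data fit) can be sketched as follows. The thesis uses the MICE package in R; this sketch substitutes a simple stochastic regression imputation in NumPy as a stand-in, and every function name and simulated variable here is a hypothetical illustration.

```python
import numpy as np

def make_mar(x1, x2, rate, rng):
    """Copy x2 with about `rate` of its values set to NaN, more often when x1 is large."""
    lo, hi = -20.0, 20.0
    # Bisect the logistic intercept so the mean missingness probability equals `rate`.
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if np.mean(1.0 / (1.0 + np.exp(-(mid + x1)))) < rate:
            lo = mid
        else:
            hi = mid
    prob = 1.0 / (1.0 + np.exp(-(mid + x1)))
    out = x2.copy()
    out[rng.random(x2.size) < prob] = np.nan
    return out

def ols_coef(x1, x2, y):
    """Coefficients of the analysis model y ~ 1 + x1 + x2."""
    design = np.column_stack([np.ones(y.size), x1, x2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def impute_and_pool(x1, x2_miss, y, m, rng):
    """Impute x2 m times from a regression on (x1, y) and average the m fits."""
    miss = np.isnan(x2_miss)
    design = np.column_stack([np.ones(y.size), x1, y])
    gamma, *_ = np.linalg.lstsq(design[~miss], x2_miss[~miss], rcond=None)
    sigma = np.std(x2_miss[~miss] - design[~miss] @ gamma)
    fits = []
    for _ in range(m):
        x2_imp = x2_miss.copy()
        # Simplification: imputation parameters are held fixed across draws.
        x2_imp[miss] = design[miss] @ gamma + rng.normal(scale=sigma, size=miss.sum())
        fits.append(ols_coef(x1, x2_imp, y))
    return np.mean(fits, axis=0)

rng = np.random.default_rng(2)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + x1 + 2.0 * x2 + rng.normal(size=n)
complete_coef = ols_coef(x1, x2, y)

results = {}
for rate in (0.1, 0.3, 0.5):
    x2_miss = make_mar(x1, x2, rate, rng)
    pooled = impute_and_pool(x1, x2_miss, y, m=5, rng=rng)
    results[rate] = (np.isnan(x2_miss).mean(), pooled)
    print(rate, results[rate][0], pooled - complete_coef)
```

Because missingness in `x2` depends only on the fully observed `x1`, the data are MAR by construction. A proper multiple imputation procedure would also draw the imputation model's parameters for each imputed data set and pool variances with Rubin's rules; both are omitted here to keep the sketch short.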
|
10 |
Multiple Imputation for Two-Level Hierarchical Models with Categorical Variables and Missing at Random Data
January 2016
Accurate data analysis and interpretation of results may be influenced by many potential factors. The factors of interest in the current work are the chosen analysis model(s), the presence of missing data, and the type(s) of data collected. If analysis models are used which a) do not accurately capture the structure of relationships in the data such as clustered/hierarchical data, b) do not allow or control for missing values present in the data, or c) do not accurately compensate for different data types such as categorical data, then the assumptions associated with the model have not been met and the results of the analysis may be inaccurate. In the presence of clustered/nested data, hierarchical linear modeling or multilevel modeling (MLM; Raudenbush & Bryk, 2002) has the ability to predict outcomes for each level of analysis and across multiple levels (accounting for relationships between levels), providing a significant advantage over single-level analyses. When multilevel data contain missingness, multilevel multiple imputation (MLMI) techniques may be used to model both the missingness and the clustered nature of the data. With categorical multilevel data with missingness, categorical MLMI must be used. Two such routines for MLMI with continuous and categorical data were explored with missing at random (MAR) data: a formal Bayesian imputation and analysis routine in JAGS (R/JAGS) and a common MLM procedure of imputation via Bayesian estimation in BLImP with frequentist analysis of the multilevel model in Mplus (BLImP/Mplus). Manipulated variables included intraclass correlations, number of clusters, and the rate of missingness. Results showed that with continuous data, R/JAGS returned more accurate parameter estimates than BLImP/Mplus for almost all parameters of interest across levels of the manipulated variables.
Both R/JAGS and BLImP/Mplus encountered convergence issues and returned inaccurate parameter estimates when imputing and analyzing dichotomous data. Follow-up studies showed that JAGS and BLImP returned similar imputed datasets but the choice of analysis software for MLM impacted the recovery of accurate parameter estimates. Implications of these findings and recommendations for further research will be discussed. / Dissertation/Thesis / Doctoral Dissertation Educational Psychology 2016
|