11 |
Comparison of Imputation Methods on Estimating Regression Equation in MNAR Mechanism
Pan, Wensi January 2012
In this article, we give an overview of the missing data problem, introduce three missing data mechanisms, and study general solutions to them when estimating a linear regression equation. When data are partly missing, there are two common ways to handle the problem. One is to ignore the records with missing values. The other is to impute the missing observations. Imputation methods are preferred since they provide full datasets. We observe that there is no general imputation solution under the missing not at random (MNAR) mechanism. To check the performance of existing imputation methods in a regression model, a simulation study is set up. Listwise deletion, simple imputation and multiple imputation are compared, with a focus on their effect on parameter estimates and standard errors. The simulation results illustrate that listwise deletion provides reliable parameter estimates. Simple imputation performs better than multiple imputation in a model with a high coefficient of determination. Multiple imputation, which offers a suitable solution for missing at random (MAR), is not valid for MNAR.
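The kind of comparison described above can be illustrated with a small simulation. The following is only a sketch under assumptions that are not taken from the abstract: a single covariate `x`, a true model `y = 1 + 2x + noise`, and an MNAR rule in which `x` is more likely to be missing when `x` itself is large. The function names `fit_line` and `simulate_mnar` are hypothetical, and multiple imputation is left out for brevity.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope


def simulate_mnar(n=5000, seed=0):
    """Compare listwise deletion and mean imputation under an assumed MNAR rule."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    # Assumed data-generating model: intercept 1, slope 2, unit noise.
    y = 1.0 + 2.0 * x + rng.normal(size=n)

    # Assumed MNAR mechanism: x is dropped with probability 0.8 when x > 0.5,
    # so missingness depends on the unobserved value itself.
    missing = (x > 0.5) & (rng.random(n) < 0.8)
    observed = ~missing

    # Listwise deletion: fit only on records with x observed.
    listwise = fit_line(x[observed], y[observed])

    # Simple (mean) imputation: replace missing x with the observed mean.
    x_filled = np.where(missing, x[observed].mean(), x)
    mean_imputed = fit_line(x_filled, y)

    return {"listwise": listwise, "mean_imputed": mean_imputed}


if __name__ == "__main__":
    for name, (intercept, slope) in simulate_mnar().items():
        print(f"{name}: intercept={intercept:.3f}, slope={slope:.3f}")
```

In this particular setup, listwise deletion stays close to the assumed coefficients, while mean imputation leaves the slope unchanged but shifts the intercept. Other MNAR rules, for example missingness in `y`, can behave quite differently.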
|
12 |
On the use of multiple imputation in handling missing values in longitudinal studies
Chan, Pui-shan, 陳佩珊 January 2004
published_or_final_version / Medical Sciences / Master / Master of Medical Sciences
|
13 |
Contributions to imputation for missing survey data
Haziza, David, January 1900
Thesis (Ph.D.) - Carleton University, 2005. / Includes bibliographical references (p. 252-258). Also available in electronic format on the Internet.
|
14 |
Estimating market values for non-publicly-traded U.S. life insurers
Zhao, Liyan, January 1900
Thesis (Ph. D.)--University of Texas at Austin, 2005. / Vita. Includes bibliographical references.
|
15 |
Multilevel multiple imputation: An examination of competing methods
January 2015
abstract: Missing data are common in psychology research and can lead to bias and reduced power if not properly handled. Multiple imputation is a state-of-the-art missing data method recommended by methodologists. Multiple imputation methods can generally be divided into two broad categories: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution (e.g., multivariate normal). FCS, on the other hand, imputes variables one at a time, drawing missing values from a series of univariate distributions. In the single-level context, these two approaches have been shown to be equivalent with multivariate normal data. However, less is known about the similarities and differences of these two approaches with multilevel data, and the methodological literature provides no insight into the situations under which the approaches would produce identical results. This document examined five multilevel multiple imputation approaches (three JM methods and two FCS methods) that have been proposed in the literature. An analytic section shows that only two of the methods (one JM method and one FCS method) used imputation models equivalent to a two-level joint population model that contained random intercepts and different associations across levels. The other three methods employed imputation models that differed from the population model primarily in their ability to preserve distinct level-1 and level-2 covariances. I verified the analytic work with computer simulations, and the simulation results also showed that imputation models that failed to preserve level-specific covariances produced biased estimates. The studies also highlighted conditions that exacerbated the amount of bias produced (e.g., bias was greater for conditions with small cluster sizes). The analytic work and simulations lead to a number of practical recommendations for researchers. 
/ Dissertation/Thesis / Doctoral Dissertation Psychology 2015
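The idea of fully conditional specification described in the abstract above, imputing one variable at a time from a univariate model given the others, can be sketched for the single-level case as follows. This is a simplified illustration, not one of the multilevel methods examined in the dissertation: it assumes continuous variables and linear regressions, and it adds residual noise without drawing the regression parameters from a posterior, so it is not a proper multiple imputation procedure. The function name `fcs_impute` is hypothetical.

```python
import numpy as np


def fcs_impute(data, n_iter=10, seed=0):
    """Return one completed copy of a 2-D float array that contains NaNs.

    Each incomplete column is regressed in turn on all other columns,
    and its missing entries are replaced by prediction plus noise.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    filled = data.copy()

    # Initialise missing entries with the observed column means.
    col_means = np.nanmean(data, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])

    for _ in range(n_iter):
        for j in range(data.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            # Design matrix: intercept plus the current values of the other columns.
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(filled)), others])

            # Fit the univariate model on rows where column j was observed.
            beta, *_ = np.linalg.lstsq(design[~miss_j], filled[~miss_j, j], rcond=None)
            resid = filled[~miss_j, j] - design[~miss_j] @ beta
            sigma = resid.std(ddof=design.shape[1])

            # Draw imputations from the fitted conditional distribution.
            noise = rng.normal(0.0, sigma, miss_j.sum())
            filled[miss_j, j] = design[miss_j] @ beta + noise
    return filled
```

Running the function several times with different seeds would give several completed data sets. A joint-model approach would instead draw all missing values in a row at once from a single multivariate distribution.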
|
16 |
Checking the adequacy of regression models with complex data structure
Guo, Xu 29 July 2014
In this thesis, we investigate the model checking problem for parametric regression models with responses missing at random and with nonignorable missing responses. We also propose a hypothesis-adaptive procedure based on dimension reduction theory. Finally, to extend our methods to the missing response situation, we consider the dimension reduction problem with responses missing at random. The first part of the thesis introduces model checking for parametric models with responses missing at random, which is a more general missing mechanism than missing completely at random. Different from existing approaches, two tests have normal distributions as the limiting null distributions regardless of whether the inverse probability weight is estimated parametrically or nonparametrically. Thus, p-values can be easily determined. This observation shows that the slow convergence rate of nonparametric estimation does not have a significant effect on the asymptotic behaviour of the tests, although it may have an impact in finite sample scenarios. The tests can detect alternatives distinct from the null hypothesis at a nonparametric rate, which is an optimal rate for locally smoothing-based methods in this area. A simulation study is carried out to examine the performance of the tests. The tests are also applied to a data set on monozygotic twins for illustration. In the second part of the thesis, we consider model checking for the general linear regression model with non-ignorable missing response. Based on an exponential tilting model, we first propose three estimators for the unknown parameter in the general linear regression model. Three empirical process-based tests are constructed. We discuss the asymptotic properties of the proposed tests under the null and local alternative hypotheses in different scenarios. We find that these three tests perform the same in the asymptotic sense. Simulation studies are also carried out to assess the performance of our proposed test procedures.
In the third part, we revisit traditional local smoothing model checking procedures. Noticing that the general nonparametric regression model can be considered as a special multi-index model, we propose an adaptive testing procedure based on dimension reduction theory. To our surprise, our method can detect local alternatives at a faster rate than the traditional optimal rate. The theory indicates that in the model checking problem, dimensionality may not have a strong impact. Simulations are carried out to examine the performance of our methodology. A real data analysis is conducted for illustration. In the last part, we study the dimension reduction problem with responses missing at random. Based on the work in this part, we can extend the adaptive testing procedure introduced in the third part to the missing response situation. When there are many predictors, how to efficiently impute responses missing at random is an important problem for regression analysis, because this missing mechanism, unlike missing completely at random, is highly related to the high-dimensional predictor vector. In the sufficient dimension reduction framework, the fusion-refinement (FR) method in the literature is a promising approach. To make estimation more accurate and efficient, two methods are suggested in this thesis. One uses the observed data to help with missing data generation, and the other is an ad hoc approach that mainly reduces the dimension in the nonparametric smoothing in data generation. A data-adaptive synthesization of these two methods is also developed. Simulations are conducted to examine their performance, and an HIV clinical trial dataset is analysed for illustration.
Keywords: Model checking; Inverse probability weight; Non-ignorable missing response; Adaptive; Central subspace; Dimension reduction; Data-adaptive Synthesization; Missing recovery; Missing response at random; Multiple imputation.
|
17 |
Avoiding the redundant effect on regression analyses of including an outcome in the imputation model
Tamegnon, Monelle 01 January 2018
Imputation is one well-recognized method for handling missing data. Multiple imputation provides a framework for imputing missing data that incorporates uncertainty about the imputations at the analysis stage. An important factor to consider when performing multiple imputation is the imputation model. In particular, a careful choice of the covariates to include in the model is crucial. The current recommendation by several authors in the literature (Van Buuren, 2012; Moons et al., 2006; Little and Rubin, 2002) is to include all variables that will appear in the analytical model, including the outcome, as covariates in the imputation model. When the goal of the analysis is to explore the relationship between the outcome and the variable with missing data (the target variable), this recommendation seems questionable. Should we make use of the outcome to fill in the missing observations of the target variable and then use these filled-in observations, along with the observed data on the target variable, to explore the relationship of the target variable with the outcome? We believe that this approach is circular. Instead, we have designed multiple imputation approaches rooted in machine learning techniques that avoid the use of the outcome at the imputation stage and maintain reasonable inferential properties. We also compare the performance of our approaches to that of currently available methods.
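The way multiple imputation carries imputation uncertainty into the analysis stage is usually expressed through Rubin's combining rules. With $m$ completed data sets giving estimates $\hat{Q}_i$ and variances $U_i$, the pooled estimate is $\bar{Q} = \frac{1}{m}\sum_i \hat{Q}_i$ and the total variance is $T = \bar{U} + (1 + \frac{1}{m})B$, where $\bar{U}$ is the mean of the $U_i$ and $B$ is the sample variance of the $\hat{Q}_i$. The sketch below implements these rules for a scalar parameter. The function name `pool_rubin` is hypothetical, and the code is not the approach proposed in the dissertation.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool m scalar estimates and their variances with Rubin's rules.

    Returns the pooled estimate and its total variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size

    q_bar = q.mean()          # pooled point estimate
    within = u.mean()         # average within-imputation variance
    between = q.var(ddof=1)   # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total
```

The between-imputation term is what distinguishes this from treating a single imputed data set as if it were fully observed.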
|
18 |
Investigation of Multiple Imputation Methods for Categorical Variables
Miranda, Samantha 01 May 2020
We compare different multiple imputation methods for categorical variables using the MICE package in R. We take a complete data set, introduce different levels of missingness, and evaluate the imputation methods at each level. Logistic regression imputation and linear discriminant analysis (LDA) are used for binary variables. Multinomial logit imputation and LDA are used for nominal variables, while ordered logit imputation and LDA are used for ordinal variables. After imputation, the regression coefficients, percent deviation index (PDI) values, and relative frequency tables were computed for each imputed data set at each level of missingness and compared to those of the corresponding complete data set. It was found that logistic regression outperformed LDA for binary variables, and LDA outperformed both multinomial logit imputation and ordered logit imputation for nominal and ordinal variables. Simulations were run to confirm the validity of the results.
|
19 |
Multiple Imputation for Handling Missing Data of Covariates in Meta-Regression
Diaz Yanez, Karina Gabriela January 2021
The term meta-analysis refers to the quantitative process of statistically combining results of studies in order to draw overall trends found in a research literature. This technique has become the preferred form of systematic review in fields such as social science and education. As the method has become more standard, the number of large meta-analyses has expanded in these fields as well. Accordingly, the purpose of meta-analysis has expanded to explaining the variation of effect sizes across studies using meta-regression. Unfortunately, missing data is a common problem in meta-analysis. Particularly in meta-regression, missing data problems are frequently related to missing covariates.
When not handled properly, missing covariates in meta-regression can impact the precision of statistical inferences and thus the precision of systematic reviews. Ad hoc methods such as complete-case analysis and shifting units of analysis are the most common approaches to address missing data in meta-analysis. These techniques, to some extent, ignore missing values, which in turn can lead to biased estimates. The use of model-based methods for missing data is more justifiable than ad hoc approaches. However, their application in meta-analysis is very limited. Multiple imputation is one of these approaches. Its precision relies mainly on how missing values are imputed. Standard multiple imputation approaches do not consider imputations that are compatible with meta-regression and thus can still yield biased estimates.
This dissertation addresses these issues by first assessing the performance of standard multiple imputation methods in the meta-regression context through a simulation study, and then developing compatible multiple imputations that accommodate features of meta-regression assuming dependent effect sizes.
Results show that even though multiple imputation methods can accurately estimate missing data in meta-regression, their accuracy decreases with larger missingness rates and when missingness is strongly related to effect sizes. This study also revealed that, in general, the developed compatible multiple imputation method outperforms standard multiple imputations. These findings also hold for cases in which missingness in a covariate is highly related to the effect size estimates. Finally, an algorithm that allows practitioners to apply compatible imputations in meta-regression was implemented using the R software language.
|
20 |
Performance Comparison of Multiple Imputation Methods for Quantitative Variables for Small and Large Data with Differing Variability
Onyame, Vincent 01 May 2021
Missing data continues to be one of the main problems in data analysis, as it reduces sample representativeness and consequently causes biased estimates. Multiple imputation methods have been established as an effective way of handling missing data. In this study, we examined multiple imputation methods for quantitative variables on twelve data sets of varied size and variability that were pseudo-generated from an original data set. The multiple imputation methods examined are predictive mean matching, Bayesian linear regression, and non-Bayesian linear regression in the MICE (Multivariate Imputation by Chained Equations) package in the statistical software R. The parameter estimates generated from the linear regression on the imputed data were compared to the closest parameter estimates from the complete data across all twelve data sets.
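Of the methods named above, predictive mean matching is perhaps the least self-explanatory, so a minimal sketch may help. It assumes one fully observed predictor `x` and one incomplete variable `y`, and it is not the MICE implementation: fuller implementations typically also draw the regression parameters before matching, which this version omits. The function name `pmm_impute` and the default of `k=5` donors are assumptions made for illustration.

```python
import numpy as np


def pmm_impute(x, y, k=5, seed=0):
    """Fill NaNs in y by simplified predictive mean matching on x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)

    # Fit a linear regression of y on x using the observed cases.
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design[~miss], y[~miss], rcond=None)
    pred = design @ beta

    donor_pred = pred[~miss]
    donor_val = y[~miss]
    out = y.copy()
    for i in np.flatnonzero(miss):
        # Find the k observed cases whose predicted means are closest.
        nearest = np.argsort(np.abs(donor_pred - pred[i]))[:k]
        # Borrow the observed value of one randomly chosen donor.
        out[i] = donor_val[rng.choice(nearest)]
    return out
```

Because every imputed value is copied from an observed case, the imputations always fall within the range of the observed data, which is one reason the method is often used for skewed or bounded variables.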
|