Global ETD Search

1	Comparison of Imputation Methods on Estimating Regression Equation in MNAR Mechanism Pan, Wensi January 2012 In this article, we give an overview of the missing data problem, introduce three missing data mechanisms and study general solutions to them when estimating a linear regression equation. When data are partly missing, there are two common ways to handle the problem. One is to ignore the records with missing values. The other is to impute the missing observations. Imputation methods are preferred since they provide full datasets. We observed that there is no general imputation solution under the missing not at random (MNAR) mechanism. In order to check the performance of existing imputation methods in a regression model, a simulation study is set up. Listwise deletion, simple imputation and multiple imputation are compared, with a focus on their effect on parameter estimates and standard errors. The simulation results illustrate that listwise deletion provides reliable parameter estimates. Simple imputation performs better than multiple imputation in a model with a high coefficient of determination. Multiple imputation, which offers a suitable solution for missing at random (MAR), is not valid for MNAR.
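A minimal sketch of the kind of comparison this abstract describes, not the author's simulation design. It assumes a hypothetical model y = 1 + 2x + e in which the covariate x is more likely to be missing when x itself is large (one possible MNAR mechanism), and compares listwise deletion with simple mean imputation. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def ols(xs, ys):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def compare_methods(n=20000, seed=0):
    """Simulate MNAR missingness in x and fit the regression two ways."""
    rng = random.Random(seed)
    xs, ys, missing = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)  # assumed true model
        # Assumed MNAR mechanism: x is missing more often when x is large.
        p_missing = 1.0 / (1.0 + math.exp(-x))
        xs.append(x)
        ys.append(y)
        missing.append(rng.random() < p_missing)

    # Listwise deletion: keep only rows where x was observed.
    cc_x = [x for x, m in zip(xs, missing) if not m]
    cc_y = [y for y, m in zip(ys, missing) if not m]

    # Simple (mean) imputation: replace each missing x by the observed mean.
    observed_mean = sum(cc_x) / len(cc_x)
    imputed_x = [observed_mean if m else x for x, m in zip(xs, missing)]

    return {
        "listwise": ols(cc_x, cc_y),
        "mean_imputation": ols(imputed_x, ys),
    }


if __name__ == "__main__":
    for method, (intercept, slope) in compare_methods().items():
        print(f"{method:16s} intercept={intercept:.3f} slope={slope:.3f}")
```

In this particular setup, missingness depends only on the covariate, so listwise deletion recovers both coefficients. Mean imputation reproduces the same slope but shifts the intercept, because the imputed rows keep their (larger) y values. Other MNAR mechanisms, for example missingness in the outcome, can behave differently.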
2	A Sensitivity Analysis of a Nonignorable Nonresponse Model Via EM Algorithm and Bootstrap Zong, Yujie 15 April 2011 The Slovenian Public Opinion Survey (SPOS), which was carried out in 1990, was used by the government of Slovenia as a benchmark to prepare for an upcoming plebiscite, which asked the respondents whether they supported independence from Yugoslavia. However, the sample size was large and it is quite likely that the respondents and nonrespondents had divergent viewpoints. We first develop an ignorable nonresponse model which is an extension of a bivariate binomial model. In order to accommodate the nonrespondents, we then develop a nonignorable nonresponse model which is an extension of the ignorable model. Our methodology uses an EM algorithm to fit both the ignorable and nonignorable nonresponse models, and estimation is carried out using the bootstrap mechanism. We also perform a sensitivity analysis to study different degrees of departure of the nonignorable nonresponse model from the ignorable nonresponse model. We found that the nonignorable nonresponse model is mildly sensitive to departures from the ignorable nonresponse model. In fact, our finding based on the nonignorable model is better than an earlier conclusion about another nonignorable nonresponse model fitted to these data. Bivariate binomial distribution Bootstrap EM algorithm Missing not at random Multinomial model 2X2 categorical tables
3	Sensitivity Analyses in Empirical Studies Plagued with Missing Data Liublinska, Viktoriia 07 June 2014 Analyses of data with missing values often require assumptions about missingness mechanisms that cannot be assessed empirically, highlighting the need for sensitivity analyses. However, universal recommendations for reporting missing data and conducting sensitivity analyses in empirical studies are scarce. Both steps are often neglected by practitioners due to the lack of clear guidelines for summarizing missing data and systematic explorations of alternative assumptions, as well as the typical attendant complexity of missing not at random (MNAR) models. We propose graphical displays that help visualize and systematize the results of sensitivity analyses, building upon the idea of "tipping-point" analysis for experiments with dichotomous treatment. The resulting "enhanced tipping-point displays" (ETP) are convenient summaries of conclusions drawn from using different modeling assumptions about the missingness mechanisms, applicable to a broad range of outcome distributions. We also describe a systematic way of exploring MNAR models using ETP displays, based on a pattern-mixture factorization of the outcome distribution, and present a set of sensitivity parameters that arises naturally from such a factorization. The primary goal of the displays is to make formal sensitivity analyses more comprehensible to practitioners, thereby helping them assess the robustness of experiments' conclusions. We also present an example of a recent use of ETP displays in a medical device clinical trial, which helped lead to FDA approval. The last part of the dissertation demonstrates another method of sensitivity analysis in the same clinical trial. The trial is complicated by missingness in outcomes "due to death", and we address this issue by employing the Rubin Causal Model and principal stratification.
We propose an improved method to estimate the joint posterior distribution of estimands of interest using a Hamiltonian Monte Carlo algorithm and demonstrate its superiority for this problem to the standard Metropolis-Hastings algorithm. The proposed methods of sensitivity analyses provide new collections of useful tools for the analysis of data sets plagued with missing values. / Statistics Statistics clinical trial graphical sensitivity analysis missing not at random multiple imputation principal stratification tipping-point analysis
4	Uncertainty intervals and sensitivity analysis for missing data Genbäck, Minna January 2016 In this thesis we develop methods for dealing with missing data in a univariate response variable when estimating regression parameters. Missing outcome data is a problem in a number of applications, one of which is follow-up studies. In follow-up studies data is collected on two (or more) occasions, and it is common that only some of the initial participants return on the second occasion. This is the case in Paper II, where we investigate predictors of decline in self-reported health in older populations in Sweden, the Netherlands and Italy. In that study, around 50% of the study participants drop out. Researchers commonly rely on the assumption that the missingness is independent of the outcome given some observed covariates. This assumption is called data missing at random (MAR) or an ignorable missingness mechanism. However, MAR cannot be tested from the data, and if it does not hold, the estimators based on this assumption are biased. In the study of Paper II, we suspect that some of the individuals drop out due to bad health. If this is the case the data is not MAR. One alternative to MAR, which we pursue, is to incorporate the uncertainty due to missing data into interval estimates instead of point estimates and uncertainty intervals instead of confidence intervals. An uncertainty interval is the analog of a confidence interval but wider due to a relaxation of assumptions on the missing data. These intervals can be used to visualize the consequences that deviations from MAR have on the conclusions of the study. That is, they can be used to perform a sensitivity analysis of MAR. The thesis covers different types of linear regression. In Papers I and III we have a continuous outcome, in Paper II a binary outcome, and in Paper IV we allow for mixed effects with a continuous outcome.
In Paper III we estimate the effect of a treatment, which can be seen as an example of missing outcome data. missing data missing not at random non-ignorable set identification uncertainty intervals sensitivity analysis self reported health average causal effect average causal effect on the treated mixed-effects models
5	Performance of Imputation Algorithms on Artificially Produced Missing at Random Data Oketch, Tobias O 01 May 2017 Missing data is one of the challenges we face today in building valid statistical models. It reduces the representativeness of the data samples. Hence, population estimates and model parameters estimated from such data are likely to be biased. However, the missing data problem is an area under study, and better alternative statistical procedures have been presented to mitigate its shortcomings. In this paper, we review causes of missing data and various methods of handling missing data. Our main focus is evaluating various multiple imputation (MI) methods from the multiple imputation by chained equations (MICE) package in the statistical software R. We assess how these MI methods perform with different percentages of missing data. A multiple regression model was fit on the imputed data sets and the complete data set. Statistical comparisons of the regression coefficients are made between the models using the imputed data and the complete data. Missing not at random Missing completely at random Missing at random Multiple imputation Multiple imputation by chained equation Relative efficiency. Applied Statistics Multivariate Analysis Statistical Models
6	Causal Inference in the Face of Assumption Violations Yuki Ohnishi (18423810) 26 April 2024 This dissertation advances the field of causal inference by developing methodologies in the face of assumption violations. Traditional causal inference methodologies hinge on a core set of assumptions, which are often violated in the complex landscape of modern experiments and observational studies. This dissertation proposes novel methodologies designed to address the challenges posed by single or multiple assumption violations. By applying these innovative approaches to real-world datasets, this research uncovers valuable insights that were previously inaccessible with existing methods. First, three significant sources of complications in causal inference that are increasingly of interest are interference among individuals, nonadherence of individuals to their assigned treatments, and unintended missing outcomes. Interference exists if the outcome of an individual depends not only on its assigned treatment, but also on the assigned treatments for other units. It commonly arises when limited controls are placed on the interactions of individuals with one another during the course of an experiment. Treatment nonadherence frequently occurs in human subject experiments, as it can be unethical to force an individual to take their assigned treatment. Clinical trials, in particular, typically have subjects that do not adhere to their assigned treatments due to adverse side effects or intercurrent events. Missing values also commonly occur in clinical studies. For example, some patients may drop out of the study due to the side effects of the treatment. Failing to account for these considerations will generally yield unstable and biased inferences on treatment effects even in randomized experiments, but existing methodologies lack the ability to address all these challenges simultaneously.
We propose a novel Bayesian methodology to fill this gap. My subsequent research further addresses one of the limitations of the first project: a set of assumptions about interference structures that may be too restrictive in some practical settings. We introduce the concept of the "degree of interference" (DoI), a latent variable capturing the interference structure. This concept allows for handling arbitrary, unknown interference structures to facilitate inference on causal estimands. While randomized experiments offer a solid foundation for valid causal analysis, people are also interested in conducting causal inference using observational data due to the cost and difficulty of randomized experiments and the wide availability of observational data. Nonetheless, using observational data to infer causality requires us to rely on additional assumptions. A central assumption is that of ignorability, which posits that the treatment is randomly assigned conditional on the variables (covariates) included in the dataset. While crucial, this assumption is often debatable, especially when treatments are assigned sequentially to optimize future outcomes. For instance, marketers typically adjust subsequent promotions based on responses to earlier ones and speculate on how customers might have reacted to alternative past promotions. This speculative behavior introduces latent confounders, which must be carefully addressed to prevent biased conclusions. In the third project, we investigate these issues by studying sequences of promotional emails sent by a US retailer. We develop a novel Bayesian approach for causal inference from longitudinal observational data that accommodates noncompliance and latent sequential confounding. Finally, we formulate the causal inference problem for privatized data.
In the era of digital expansion, the secure handling of sensitive data poses an intricate challenge that significantly influences research, policy-making, and technological innovation. As the collection of sensitive data becomes more widespread across academic, governmental, and corporate sectors, addressing the complex balance between making data accessible and safeguarding private information requires the development of sophisticated methods for analysis and reporting, which must include stringent privacy protections. Currently, the gold standard for maintaining this balance is differential privacy. Local differential privacy is a differential privacy paradigm in which individuals first apply a privacy mechanism to their data (often by adding noise) before transmitting the result to a curator. The privacy noise introduces additional bias and variance into the analysis. Thus, it is of great importance for analysts to account for the privacy noise in order to obtain valid inference. In this final project, we develop methodologies to infer causal effects from locally privatized data under randomized experiments. We present frequentist and Bayesian approaches and discuss the statistical properties of the estimators, such as consistency and optimality under various privacy scenarios. Econometric and statistical methods Applied statistics Computational statistics Statistical data science Statistical theory Causal Inference Bayesian statistics Interference Noncompliance Missing not at random (MNAR) Bayesian Nonparametrics Differential privacy
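To illustrate how local privacy noise can bias a treatment-effect estimate and how the bias can be undone, here is a hedged sketch that uses binary randomized response as the privacy mechanism. This is a swapped-in textbook mechanism chosen for simplicity, not the dissertation's method; the outcome probabilities (0.3 under control, 0.5 under treatment), the privacy budget `epsilon`, and all function names are illustrative assumptions.

```python
import math
import random


def truth_probability(epsilon):
    """Probability of reporting the true bit under randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit, epsilon, rng):
    """Each individual flips their own bit with probability 1 - p_truth."""
    return bit if rng.random() < truth_probability(epsilon) else 1 - bit


def estimate_effect(epsilon=1.0, n=100000, seed=1):
    """Return (naive, debiased) difference in means from privatized reports."""
    rng = random.Random(seed)
    totals = {0: 0, 1: 0}  # sum of privatized reports per arm
    counts = {0: 0, 1: 0}  # number of units per arm
    for _ in range(n):
        z = rng.randrange(2)                    # randomized treatment
        p_outcome = 0.5 if z == 1 else 0.3      # assumed true effect = 0.2
        y = 1 if rng.random() < p_outcome else 0
        totals[z] += randomized_response(y, epsilon, rng)
        counts[z] += 1

    naive = totals[1] / counts[1] - totals[0] / counts[0]
    # E[report] = (1 - p) + (2p - 1) * E[y], so the arm difference is
    # shrunk by the factor (2p - 1); dividing by it removes the bias.
    debiased = naive / (2.0 * truth_probability(epsilon) - 1.0)
    return naive, debiased


if __name__ == "__main__":
    naive, debiased = estimate_effect()
    print(f"naive={naive:.3f} debiased={debiased:.3f} (assumed truth: 0.200)")
```

The correction restores the expected value but inflates the variance by the same factor, which is the bias and variance trade-off the abstract refers to.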
7	Analysis of survey data in the presence of non-ignorable missing-data and selection mechanisms Hammon, Angelina 04 July 2023 This thesis deals with methods for the appropriate handling of non-ignorable missing data and sample selection, which are two common challenges of survey data analysis. Both issues can dramatically affect the quality of analysis results and lead to misleading inferences about the population. Therefore, in three different research articles, I examine methods for conducting so-called sensitivity analyses with regard to missing-data and selection mechanisms that can be applied to typical survey data. In the first and second articles, I provide novel procedures for the multiple imputation of binary and ordinal multilevel data that can account for a potential Missing Not at Random (MNAR) mechanism. Various simulation studies confirmed that the new imputation methods produce unbiased and efficient estimates in all scenarios considered. I also demonstrate their applicability to empirical data. In the third article, I investigate a measure to quantify and adjust for non-ignorable selection bias in proportions estimated from non-probabilistic data. In doing so, I provide the first application of the suggested index to a real non-probability sample outside its original research group. In addition, I derive general guidelines for its usage in practice, and validate the measure's performance in properly detecting selection bias.
The three articles highlight the necessity of assessing the sensitivity of estimates to different assumptions about the missing-data and selection mechanisms when it seems realistic that the ignorability assumption might be violated, and provide first solutions to enable such robustness checks for specific data situations. Missing Not at Random Multiple imputation Fully conditional specification Multilevel data Selection model Selection Not at Random Selection bias Non-probability sample Pattern-mixture model Sensitivity analysis 300 Social sciences ddc:300 ddc:519
8	Methodology for Handling Missing Data in Nonlinear Mixed Effects Modelling Johansson, Åsa M. January 2014 To obtain a better understanding of the pharmacokinetic and/or pharmacodynamic characteristics of an investigated treatment, clinical data is often analysed with nonlinear mixed effects modelling. The developed models can be used to design future clinical trials or to guide individualised drug treatment. Missing data is a frequently encountered problem in analyses of clinical data, and so as not to jeopardise the predictability of the developed model, it is of great importance that the method chosen to handle the missing data is adequate for its purpose. The overall aim of this thesis was to develop methods for handling missing data in the context of nonlinear mixed effects models and to compare strategies for handling missing data, in order to provide guidance on efficient handling, and on the consequences of inappropriate handling, of missing data. In accordance with missing data theory, all missing data can be divided into three categories: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). When data are MCAR, the underlying missing data mechanism does not depend on any observed or unobserved data; when data are MAR, the underlying missing data mechanism depends on observed data but not on unobserved data; when data are MNAR, the underlying missing data mechanism depends on the unobserved data itself. Strategies and methods for handling missing observation data and missing covariate data were evaluated. These evaluations showed that the most frequently used estimation algorithm in nonlinear mixed effects modelling (first-order conditional estimation) resulted in biased parameter estimates regardless of the missing data mechanism. However, expectation maximization (EM) algorithms (e.g. importance sampling) resulted in unbiased and precise parameter estimates as long as data were MCAR or MAR.
When the observation data are MNAR, a proper method for handling the missing data has to be applied to obtain unbiased and precise parameter estimates, regardless of the estimation algorithm. The evaluation of different methods for handling missing covariate data showed that a correctly implemented multiple imputation method and full maximum likelihood modelling methods resulted in unbiased and precise parameter estimates when covariate data were MCAR or MAR. When the covariate data were MNAR, the only method resulting in unbiased and precise parameter estimates was a full maximum likelihood modelling method where an extra parameter was estimated, correcting for the unknown missing data mechanism's dependence on the missing data. This thesis presents new insight into the dynamics of missing data in nonlinear mixed effects modelling. Strategies for handling different types of missing data have been developed and compared in order to provide guidance on efficient handling, and on the consequences of inappropriate handling, of missing data. Pharmacometrics population models censored observations missing covariates missing dependent variable missing data mechanism missing completely at random (MCAR) missing at random (MAR) missing not at random (MNAR) estimation algorithms
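The three mechanisms defined in this abstract can be made concrete with a small simulation. The sketch below is illustrative only and is unrelated to the nonlinear mixed effects models studied in the thesis: it assumes a simple linear setting with an always-observed covariate x and an outcome y = x + e that may go missing, and the missingness probabilities and function names are assumptions chosen for the example.

```python
import math
import random


def fit_line(xs, ys):
    """Return (intercept, slope) of a least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(mechanism, n=20000, seed=0):
    """Return (true mean of y, complete-case mean, regression-adjusted mean)."""
    rng = random.Random(seed)
    xs, ys, observed = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)        # covariate, always observed
        y = x + rng.gauss(0.0, 1.0)    # outcome, possibly missing
        if mechanism == "MCAR":
            p_missing = 0.5                          # depends on nothing
        elif mechanism == "MAR":
            p_missing = 1.0 / (1.0 + math.exp(-x))   # depends on observed x
        elif mechanism == "MNAR":
            p_missing = 1.0 / (1.0 + math.exp(-y))   # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        xs.append(x)
        ys.append(y)
        observed.append(rng.random() >= p_missing)

    true_mean = sum(ys) / n
    obs_x = [x for x, seen in zip(xs, observed) if seen]
    obs_y = [y for y, seen in zip(ys, observed) if seen]
    complete_case_mean = sum(obs_y) / len(obs_y)

    # Use the observed covariate: fit y ~ x on complete cases and predict
    # the outcome mean at the covariate mean of the full sample.
    intercept, slope = fit_line(obs_x, obs_y)
    regression_mean = intercept + slope * (sum(xs) / n)
    return true_mean, complete_case_mean, regression_mean


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        true_mean, cc_mean, reg_mean = simulate(mechanism)
        print(f"{mechanism}: true={true_mean:.3f} "
              f"complete-case={cc_mean:.3f} adjusted={reg_mean:.3f}")
```

Under MCAR the complete-case mean is already close to the truth. Under MAR it is biased, but adjusting for the observed covariate removes the bias. Under MNAR neither estimate recovers the true mean, which is why a model of the missingness mechanism itself is needed in that case.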
