11 |
Multiple Imputation for Handling Missing Data of Covariates in Meta-Regression. Diaz Yanez, Karina Gabriela. January 2021 (has links)
The term meta-analysis refers to the quantitative process of statistically combining the results of studies in order to identify overall trends in a research literature. This technique has become the preferred form of systematic review in fields such as social science and education. As the method has become more standard, the number of large meta-analyses in these fields has expanded as well. Accordingly, the purpose of meta-analysis has broadened to include explaining the variation of effect sizes across studies using meta-regression. Unfortunately, missing data are a common problem in meta-analysis; in meta-regression in particular, missing data problems frequently involve missing covariates.
When not handled properly, missing covariates in meta-regression can impair the precision of statistical inferences and thus the precision of systematic reviews. Ad hoc methods such as complete-case analysis and shifting the unit of analysis are the most common approaches to missing data in meta-analysis. These techniques, to some extent, ignore missing values, which in turn can lead to biased estimates. Model-based methods for missing data are more justifiable than ad hoc approaches, but their application in meta-analysis is very limited. Multiple imputation is one such approach, and its precision relies mainly on how missing values are imputed. Standard multiple imputation approaches do not produce imputations that are compatible with the meta-regression model and thus can still yield biased estimates.
This dissertation addresses these issues by first assessing the performance of standard multiple imputation methods in the meta-regression context through a simulation study, and then by developing compatible multiple imputations that accommodate features of meta-regression while allowing for dependent effect sizes.
Results show that although multiple imputation methods can accurately handle missing data in meta-regression, their accuracy decreases with larger missingness rates and when missingness is strongly related to effect sizes. The study also reveals that, in general, the newly developed compatible multiple imputation method outperforms standard multiple imputation, and that this advantage holds even when missingness in a covariate is highly related to the effect size estimates. Finally, an algorithm that allows practitioners to apply compatible imputations in meta-regression was implemented in the R language.
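The basic pipeline the abstract describes — impute the missing moderator several times, fit an inverse-variance weighted meta-regression to each completed data set, and pool with Rubin's rules — can be sketched in a few lines. The following toy simulation is not the dissertation's algorithm; all names, sample sizes, and the stochastic-regression imputation model (imputing the covariate conditional on the effect sizes, in the spirit of "compatible" imputation) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated meta-analytic data: k studies, effect sizes with known variances
k = 40
x = rng.normal(size=k)                  # moderator (covariate)
v = rng.uniform(0.05, 0.2, size=k)      # sampling variances of effect sizes
y = 0.3 + 0.5 * x + rng.normal(scale=np.sqrt(v))  # effect size estimates

# Make roughly 30% of the covariate missing completely at random
miss = rng.random(k) < 0.3
x_obs = np.where(miss, np.nan, x)

def wls(y, X, w):
    """Weighted least squares: returns coefficients and their variances."""
    W = np.diag(w)
    XtWX_inv = np.linalg.inv(X.T @ W @ X)
    beta = XtWX_inv @ X.T @ W @ y
    return beta, np.diag(XtWX_inv)

def impute_once(y, x_obs, miss, rng):
    """Stochastic regression imputation of x given y: the imputation
    model conditions on the analysis-model outcome."""
    n_obs = (~miss).sum()
    Xc = np.column_stack([np.ones(n_obs), y[~miss]])
    b, _ = wls(x_obs[~miss], Xc, np.ones(n_obs))
    resid_sd = np.std(x_obs[~miss] - Xc @ b)
    x_imp = x_obs.copy()
    x_imp[miss] = b[0] + b[1] * y[miss] + rng.normal(scale=resid_sd, size=miss.sum())
    return x_imp

# Multiple imputation with m completed data sets, pooled by Rubin's rules
m = 20
betas, vars_ = [], []
for _ in range(m):
    x_imp = impute_once(y, x_obs, miss, rng)
    X = np.column_stack([np.ones(k), x_imp])
    b, vb = wls(y, X, 1.0 / v)          # inverse-variance weighted meta-regression
    betas.append(b); vars_.append(vb)

betas, vars_ = np.array(betas), np.array(vars_)
pooled = betas.mean(axis=0)                     # Rubin: average the estimates
within = vars_.mean(axis=0)                     # average within-imputation variance
between = betas.var(axis=0, ddof=1)             # between-imputation variance
total_var = within + (1 + 1 / m) * between
print("pooled slope:", pooled[1], "SE:", np.sqrt(total_var[1]))
```

The between-imputation term `(1 + 1/m) * between` is what standard single imputation omits, and is why multiple imputation gives honest standard errors.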
|
12 |
Statistical Learning Methods for Personalized Medical Decision Making. Liu, Ying. January 2016 (has links)
The theme of my dissertation is merging statistical modeling with medical domain knowledge and machine learning algorithms to assist in making personalized medical decisions. In its simplest form, making personalized decisions about treatment choices and disease diagnosis modalities can be transformed into classification or prediction problems in machine learning, where the optimal decision for an individual is a decision rule that yields the best future clinical outcome or maximizes diagnostic accuracy. Challenges emerge, however, when analyzing complex medical data. On one hand, statistical modeling is needed to deal with inherent practical complications such as missing data, patients lost to follow-up, and ethical and resource constraints in randomized controlled clinical trials. On the other hand, new data types and larger scales of data call for innovations combining statistical modeling, domain knowledge, and information technology. This dissertation contains three parts, addressing the estimation of optimal personalized rules for choosing treatment, the estimation of optimal individualized rules for choosing a disease diagnosis modality, and methods for variable selection in the presence of missing data.
In the first part of this dissertation, we propose a method for finding optimal dynamic treatment regimens (DTRs) from Sequential Multiple Assignment Randomized Trial (SMART) data. DTRs are sequential decision rules, tailored at each stage of treatment to potentially time-varying patient features and to intermediate outcomes observed in previous stages. The complexity, patient heterogeneity, and chronicity of many diseases and disorders call for learning optimal DTRs that dynamically tailor treatment to each individual's response over time. We propose a robust and efficient approach, Augmented Multistage Outcome-Weighted Learning (AMOL), to identify optimal DTRs from SMARTs. We extend outcome-weighted learning (Zhao et al. 2012) to allow for negative outcomes; we propose methods to reduce the variability of the weights, achieving numerical stability and higher efficiency; and, for multiple-stage trials, we introduce robust augmentation that improves efficiency by drawing information from Q-function regression models at each stage. AMOL remains valid even if the regression models are misspecified. We formally show that a proper choice of augmentation guarantees smaller stochastic errors in value-function estimation, and we establish convergence rates for AMOL. Its comparative advantage over existing methods is demonstrated in extensive simulation studies and in applications to two SMART data sets: a two-stage trial for attention deficit hyperactivity disorder and the STAR*D trial for major depressive disorder.
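The core idea behind outcome-weighted learning, which AMOL builds on, can be caricatured in a single-stage setting: learn a treatment rule by solving a classification problem in which each patient's observed treatment is a label weighted by their outcome. The sketch below is a deliberately simplified stand-in, not AMOL itself: a randomized binary treatment, a linear rule fit by weighted logistic loss, and a crude outcome shift standing in for the abstract's negative-outcome handling. All names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Single-stage simulated trial: treatment A in {-1, +1} assigned at random
n = 2000
X = rng.normal(size=(n, 2))
A = rng.choice([-1, 1], size=n)
# Outcome is larger when the treatment matches the sign of the first feature,
# so the optimal rule is d(x) = sign(x1)
R = 1.0 + A * np.sign(X[:, 0]) + rng.normal(scale=0.5, size=n)

# Outcome-weighted learning: classify A with weights R_i / pi(A_i | X_i).
# Weights must be nonnegative, so shift R (a simple stand-in for the
# negative-outcome handling the abstract refers to); pi = 0.5 here.
w = (R - R.min()) / 0.5

def fit_owl(X, A, w, lr=0.05, epochs=300):
    """Minimize the weighted logistic surrogate loss for a linear rule."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(epochs):
        margin = A * (Z @ beta)
        # gradient of mean_i w_i * log(1 + exp(-A_i * f(x_i)))
        grad = -(Z * (w * A / (1 + np.exp(margin)))[:, None]).mean(axis=0)
        beta -= lr * grad
    return beta

beta = fit_owl(X, A, w)
rule = np.sign(np.column_stack([np.ones(n), X]) @ beta)
# The learned rule should mostly agree with the true optimal rule sign(x1)
agreement = (rule == np.sign(X[:, 0])).mean()
print("agreement with optimal rule:", agreement)
```

AMOL's augmentation and multi-stage backward induction add efficiency and robustness on top of this weighted-classification core, which the sketch does not attempt.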
The second part of the dissertation introduces a machine learning algorithm that estimates personalized decision rules for medical diagnosis/screening so as to maximize a weighted combination of sensitivity and specificity. Using subject-specific risk factors and feature variables, such rules administer screening tests with balanced sensitivity and specificity, protecting low-risk subjects from the unnecessary pain and stress caused by false positive tests while achieving high sensitivity for subjects at high risk. In a simulation study mimicking a real breast cancer study, we found significant improvements in sensitivity and specificity when comparing our personalized screening strategy (assigning mammography plus MRI to high-risk patients and mammography alone to low-risk subjects, based on a composite score of their risk factors) to a one-size-fits-all strategy (assigning mammography plus MRI, or mammography alone, to all subjects). When applied to Parkinson's disease (PD) FDG-PET and fMRI data, the method provided individualized modality selection that improved AUC and yielded interpretable decision rules for choosing a brain imaging modality for early detection of PD. To the best of our knowledge, this is the first automatic, data-driven learning algorithm proposed in the literature for personalized diagnosis/screening strategies.
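A minimal version of the objective described above — choosing a cutoff on a composite risk score to maximize a weighted combination of sensitivity and specificity — can be written directly. This sketch is an illustration of the criterion only, not the dissertation's algorithm; the risk-score model, the weight of 0.7 on sensitivity, and the grid search over quantiles are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated screening data: a composite risk score and true disease status
n = 5000
risk = rng.normal(size=n)
disease = rng.random(n) < 1 / (1 + np.exp(-(risk - 1.0)))  # higher risk -> more disease

def best_threshold(score, y, w=0.7):
    """Pick the cutoff maximizing w * sensitivity + (1 - w) * specificity."""
    cands = np.quantile(score, np.linspace(0.01, 0.99, 99))
    best_t, best_val = cands[0], -np.inf
    for t in cands:
        pred = score >= t
        sens = (pred & y).sum() / y.sum()
        spec = (~pred & ~y).sum() / (~y).sum()
        val = w * sens + (1 - w) * spec
        if val > best_val:
            best_t, best_val = t, val
    return best_t, best_val

t, val = best_threshold(risk, disease)
# Subjects above the cutoff would receive the intensive modality
# (e.g. mammography + MRI); subjects below it, the standard one.
print("cutoff:", t, "weighted objective:", val)
```

Putting more weight on sensitivity (larger `w`) lowers the cutoff, trading false positives among low-risk subjects for fewer missed cases among high-risk ones, which is exactly the balance the abstract describes.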
In the last part of the dissertation, we propose Multiple Imputation Random Lasso (MIRL), a method for selecting important variables and predicting the outcome in the presence of missing data, motivated by an epidemiological study of Eating and Activity in Teens in which 80% of individuals have at least one missing variable. Applying variable selection methods developed for complete data after list-wise deletion therefore substantially reduces prediction power, and recent work on prediction models with incomplete data cannot adequately handle large numbers of variables with arbitrary missingness patterns. MIRL combines penalized regression with multiple imputation and stability selection. Extensive simulation studies comparing MIRL with several alternatives show that it outperforms them in high-dimensional scenarios, in terms of both reduced prediction error and improved variable selection, with the greatest advantage when correlation among variables is high and the missing proportion is high. MIRL also shows improved performance relative to other applicable methods when applied to the Eating and Activity in Teens study for boys and girls separately, and to a subgroup of low socioeconomic status (SES) Asian boys who are at high risk of developing obesity.
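The MIRL recipe — impute several times, fit a penalized regression on resampled halves of each completed data set, and keep variables that are selected in a high fraction of fits — can be sketched with simple stand-ins. In this illustration the chained-equations imputation is replaced by stochastic mean imputation and the random lasso by a plain coordinate-descent lasso on bootstrap halves; the selection-frequency threshold of 0.8 and all constants are assumptions, not the dissertation's settings.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data: 10 predictors, only the first two matter, with missing values
n, p = 300, 10
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=n)
X_miss = X.copy()
X_miss[rng.random((n, p)) < 0.2] = np.nan   # ~20% missing per variable

def lasso_cd(X, y, lam, iters=100):
    """Plain coordinate-descent lasso (roughly standardized X assumed)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0) / (X[:, j] @ X[:, j] / n)
    return beta

def impute(Xm, rng):
    """Stochastic mean imputation: a deliberately simple stand-in for the
    chained-equations imputation a real MIRL analysis would use."""
    Xi = Xm.copy()
    for j in range(Xi.shape[1]):
        col = Xi[:, j]; miss = np.isnan(col)
        Xi[miss, j] = rng.normal(np.nanmean(col), np.nanstd(col), size=miss.sum())
    return Xi

# Stability selection across imputations: count how often each variable
# survives the lasso on a random half of each completed data set.
m, reps, lam = 5, 20, 0.15
counts = np.zeros(p)
for _ in range(m):
    Xi = impute(X_miss, rng)
    for _ in range(reps):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts += lasso_cd(Xi[idx], y[idx], lam) != 0
freq = counts / (m * reps)
selected = np.where(freq >= 0.8)[0]
print("selection frequencies:", freq.round(2))
print("selected variables:", selected)
```

Averaging selection over imputations and resamples is what makes the procedure stable: a variable must matter in most completed, resampled data sets to survive, which guards against artifacts of any single imputation.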
|
13 |
Estimating market values for non-publicly-traded U.S. life insurers. Zhao, Liyan. 28 August 2008 (has links)
Not available / text
|
14 |
Comparative approaches to handling missing data, with particular focus on multiple imputation for both cross-sectional and longitudinal models. Hassan, Ali Satty Ali. January 2012 (has links)
Much data-based research is characterized by the unavoidable problem of incompleteness arising from missing or erroneous values. This thesis discusses strategies and basic issues in statistical data analysis for addressing the missing data problem, dealing with both missing covariates and missing outcomes. We restrict our attention to methodologies that address a specific missing data pattern, namely monotone missingness.

The thesis is divided into two parts. The first part places particular emphasis on the so-called missing at random (MAR) assumption, but focuses the bulk of its attention on multiple imputation techniques. Its main aim is to investigate various modelling techniques through application studies, to identify the most appropriate techniques, and to gain insight into their appropriateness for incomplete data analysis. The thesis first deals with missing covariate values when estimating regression parameters under a monotone missing covariate pattern, comparing four imputation techniques: Markov chain Monte Carlo (MCMC), regression, propensity score (PS), and last observation carried forward (LOCF). The application study indicated that some methods are consistently preferable for missing covariates when the missing data pattern is monotone: of the methods explored, MCMC and regression imputation were preferable to PS and LOCF for estimating regression parameters. The study is also concerned with a comparative analysis of techniques applied to incomplete Gaussian longitudinal outcome data subject to random dropout. Three methods are assessed and investigated: multiple imputation (MI), inverse probability weighting (IPW), and direct likelihood analysis. The findings generally favoured MI over IPW for continuous outcomes, even when the MAR mechanism holds, and further suggest that MI and direct likelihood lead to accurate and equivalent results, with both techniques arriving at the same substantive conclusions. Finally, this part compares and contrasts several statistical methods for analyzing incomplete non-Gaussian longitudinal outcomes under ignorable dropout: weighted generalized estimating equations (WGEE), multiple imputation after generalized estimating equations (MI-GEE), and the generalized linear mixed model (GLMM). MI-GEE was found to be considerably robust, outperforming the other methods for both small and large sample sizes, regardless of the dropout rate.
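A small simulation makes concrete why the comparisons above tend to disfavour LOCF under MAR dropout: when later measurements trend upward and dropout depends on an earlier observed value, carrying the last value forward biases the final-visit mean, while regression-based multiple imputation from the completers recovers it. The setup below (three visits, dropout driven by the visit-2 value) is an illustrative toy, not any of the thesis's application studies.

```python
import numpy as np

rng = np.random.default_rng(4)

# Longitudinal data with monotone dropout: 3 visits, rising mean over time
n, true_means = 1000, np.array([0.0, 1.0, 2.0])
Y = true_means + rng.multivariate_normal(np.zeros(3), 0.5 * np.eye(3) + 0.5, size=n)

# Dropout before visit 3 depends on the observed visit-2 value (MAR)
drop = rng.random(n) < 1 / (1 + np.exp(-(Y[:, 1] - 1.0)))
Y_obs = Y.copy()
Y_obs[drop, 2] = np.nan

# LOCF: carry the visit-2 value forward for dropouts
locf_mean = np.where(drop, Y[:, 1], Y[:, 2]).mean()

# Regression-based MI: model visit 3 from visit 2 among completers,
# then impute with noise and average over m completed data sets
comp = ~drop
b1, b0 = np.polyfit(Y_obs[comp, 1], Y_obs[comp, 2], 1)
resid_sd = np.std(Y_obs[comp, 2] - (b0 + b1 * Y_obs[comp, 1]))
m, mi_means = 10, []
for _ in range(m):
    y3 = Y_obs[:, 2].copy()
    y3[drop] = b0 + b1 * Y_obs[drop, 1] + rng.normal(scale=resid_sd, size=drop.sum())
    mi_means.append(y3.mean())
mi_mean = np.mean(mi_means)

print("true mean at visit 3:", true_means[2])
print("LOCF estimate:", round(locf_mean, 3), "| MI estimate:", round(mi_mean, 3))
```

Because dropout here depends only on the observed visit-2 value, the completers' conditional distribution of visit 3 given visit 2 is the right imputation model under MAR, which is exactly the assumption direct likelihood and MI exploit.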
The primary interest of the second part of the thesis falls under the non-ignorable dropout (MNAR) modelling frameworks that rely on sensitivity analysis for modelling incomplete Gaussian longitudinal data. The aim of this part is to handle non-random dropout by explicitly modelling the assumed dropout process, incorporating this additional sub-model into the model for the measurement data, and assessing the sensitivity of the modelling assumptions. The study analyses repeated Gaussian measures subject to potentially non-random dropout in order to examine the influence the dropout process might exert on inference. We consider the construction of a particular type of selection model, the Diggle-Kenward model, as a tool for assessing the sensitivity of a selection model to its modelling assumptions. The major conclusion drawn was that, in the context of the assumed model, there was evidence in favour of an MAR rather than an MCAR process; in addition, further insight into the data was obtained by comparing various sensitivity analysis frameworks. Lastly, two families of models, based on the selection and pattern-mixture frameworks used for sensitivity analysis, were compared and contrasted to investigate the potential influence of dropout on inference about the dependent measurement data and to deal with incomplete sequences; both jointly model the distribution of the dropout process and the longitudinal measurement process. The results of the sensitivity analyses were in agreement and hence led to similar parameter estimates, and additional confidence in the findings was gained as both models gave similar results for significant effects such as marginal treatment effects. / Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2012.
|
15 |
Multiple imputation for marginal and mixed models in longitudinal data with informative missingness. Deng, Wei. January 2005 (has links)
Thesis (Ph. D.)--Ohio State University, 2005. / Title from first page of PDF file. Document formatted into pages; contains xiii, 108 p.; also includes graphics. Includes bibliographical references (p. 104-108). Available online via OhioLINK's ETD Center
|
16 |
A Monte Carlo study of the impact of missing data in cross-classification random effects models. Alemdar, Meltem. January 2008 (has links)
Thesis (Ph. D.)--Georgia State University, 2008. / Title from title page (Digital Archive@GSU, viewed July 20, 2010) Carolyn F. Furlow, committee chair; Philo A. Hutcheson, Phillip E. Gagne, Sheryl A. Gowen, committee members. Includes bibliographical references (p. 96-100).
|
17 |
Bayesian estimation of factor analysis models with incomplete data. Merkle, Edgar C. January 2005 (has links)
Thesis (Ph. D.)--Ohio State University, 2005. / Title from first page of PDF file. Document formatted into pages; contains xi, 106 p.; also includes graphics. Includes bibliographical references (p. 103-106). Available online via OhioLINK's ETD Center
|
18 |
A Cox proportional hazards model for mid-point imputed interval-censored data. Gwaze, Arnold Rumosa. January 2011 (has links)
There has been increasing interest in survival analysis with interval-censored data, where the event of interest (such as infection with a disease) is not observed exactly but is only known to have happened between two examination times. Because research has concentrated on right-censored data, many statistical tests and techniques exist for right-censoring, whereas methods for interval-censoring are far less abundant. In this study, right-censoring methods are used to fit a proportional hazards model to interval-censored data. The interval-censored observations are transformed by mid-point imputation, which assumes that each event occurs at the midpoint of its recorded interval. The results gave conservative regression estimates, but a comparison with the conventional methods showed that the estimates were not significantly different. Nevertheless, the censoring mechanism and the interval lengths should be given serious consideration before deciding to use mid-point imputation on interval-censored data.
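The transformation the abstract describes is mechanically simple: replace each censoring interval with its midpoint, then apply standard right-censored machinery. The sketch below is an illustrative toy, not the thesis's analysis: events fall on a regular visit grid, nobody is additionally right-censored, and the Cox fit is a bare-bones single-covariate Newton-Raphson on the partial likelihood (with ties broken by sort order rather than a formal correction), so the estimate is only approximate.

```python
import numpy as np

rng = np.random.default_rng(5)

# Interval-censored data: true event times are exponential, but each event
# is only known to lie between two examination visits 0.5 apart.
n = 500
z = rng.choice([0, 1], size=n)                 # binary covariate, true log-HR = 0.7
t_true = rng.exponential(scale=1 / np.exp(0.7 * z))
left = np.floor(t_true / 0.5) * 0.5            # last visit before the event
right = left + 0.5                             # first visit after the event

# Mid-point imputation: pretend the event happened at the interval midpoint,
# then use standard right-censored machinery on the imputed times.
t_mid = (left + right) / 2

def cox_one_cov(t, z, iters=25):
    """Newton-Raphson on the Cox partial likelihood, single covariate;
    ties are broken by sort order (a crude approximation)."""
    order = np.argsort(t)
    t, z = t[order], z[order]
    b = 0.0
    for _ in range(iters):
        eta = np.exp(b * z)
        # reverse cumulative sums give risk-set totals at each event time
        s0 = np.cumsum(eta[::-1])[::-1]
        s1 = np.cumsum((z * eta)[::-1])[::-1]
        s2 = np.cumsum((z * z * eta)[::-1])[::-1]
        grad = np.sum(z - s1 / s0)
        hess = -np.sum(s2 / s0 - (s1 / s0) ** 2)
        b -= grad / hess
    return b

b_hat = cox_one_cov(t_mid, z)
print("estimated log hazard ratio:", round(b_hat, 3), "(true 0.7)")
```

Consistent with the abstract's finding, the recovered log hazard ratio tends to be conservative: the coarse visit grid and the midpoint assumption both pull the estimate toward zero, and the distortion grows as intervals widen relative to the event-time scale.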
|