11 |
Multiple Imputation for Handling Missing Data of Covariates in Meta-Regression. Diaz Yanez, Karina Gabriela. January 2021 (has links)
The term meta-analysis refers to the quantitative process of statistically combining the results of studies in order to identify overall trends in a research literature. This technique has become the preferred form of systematic review in fields such as social science and education. As the method has become more standard, the number of large meta-analyses in these fields has expanded as well. Accordingly, the purpose of meta-analysis has broadened to include explaining the variation of effect sizes across studies using meta-regression. Unfortunately, missing data are a common problem in meta-analysis; in meta-regression in particular, missing data problems frequently involve missing covariates.
When not handled properly, missing covariates in meta-regression can impair the precision of statistical inferences and thus the precision of systematic reviews. Ad hoc methods such as complete-case analysis and shifting the unit of analysis are the most common approaches to missing data in meta-analysis. These techniques, to some extent, ignore missing values, which in turn can lead to biased estimates. Model-based methods for missing data are more justifiable than ad hoc approaches, but their application in meta-analysis is very limited. Multiple imputation is one such approach, and its precision relies mainly on how missing values are imputed. Standard multiple imputation approaches do not produce imputations that are compatible with the meta-regression model and thus can still yield biased estimates.
This dissertation addresses these issues by first assessing the performance of standard multiple imputation methods in the meta-regression context through a simulation study, and then by developing compatible multiple imputations that accommodate features of meta-regression while allowing for dependent effect sizes.
Results show that although multiple imputation methods can accurately handle missing data in meta-regression, their accuracy decreases with larger missingness rates and when missingness is strongly related to effect sizes. The study also reveals that, in general, the newly developed compatible multiple imputation method outperforms standard multiple imputation, and that this advantage holds even when missingness in a covariate is highly related to the effect size estimates. Finally, an algorithm that allows practitioners to apply compatible imputations in meta-regression was implemented in the R language.
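The basic pipeline the abstract describes — impute the missing moderator several times, fit an inverse-variance weighted meta-regression to each completed data set, and pool with Rubin's rules — can be sketched in a few lines. The following toy simulation is not the dissertation's algorithm; all names, sample sizes, and the stochastic-regression imputation model (imputing the covariate conditional on the effect sizes, in the spirit of "compatible" imputation) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated meta-analytic data: k studies, effect sizes with known variances
k = 40
x = rng.normal(size=k)                  # moderator (covariate)
v = rng.uniform(0.05, 0.2, size=k)      # sampling variances of effect sizes
y = 0.3 + 0.5 * x + rng.normal(scale=np.sqrt(v))  # effect size estimates

# Make roughly 30% of the covariate missing completely at random
miss = rng.random(k) < 0.3
x_obs = np.where(miss, np.nan, x)

def wls(y, X, w):
    """Weighted least squares: returns coefficients and their variances."""
    W = np.diag(w)
    XtWX_inv = np.linalg.inv(X.T @ W @ X)
    beta = XtWX_inv @ X.T @ W @ y
    return beta, np.diag(XtWX_inv)

def impute_once(y, x_obs, miss, rng):
    """Stochastic regression imputation of x given y: the imputation
    model conditions on the analysis-model outcome."""
    n_obs = (~miss).sum()
    Xc = np.column_stack([np.ones(n_obs), y[~miss]])
    b, _ = wls(x_obs[~miss], Xc, np.ones(n_obs))
    resid_sd = np.std(x_obs[~miss] - Xc @ b)
    x_imp = x_obs.copy()
    x_imp[miss] = b[0] + b[1] * y[miss] + rng.normal(scale=resid_sd, size=miss.sum())
    return x_imp

# Multiple imputation with m completed data sets, pooled by Rubin's rules
m = 20
betas, vars_ = [], []
for _ in range(m):
    x_imp = impute_once(y, x_obs, miss, rng)
    X = np.column_stack([np.ones(k), x_imp])
    b, vb = wls(y, X, 1.0 / v)          # inverse-variance weighted meta-regression
    betas.append(b); vars_.append(vb)

betas, vars_ = np.array(betas), np.array(vars_)
pooled = betas.mean(axis=0)                     # Rubin: average the estimates
within = vars_.mean(axis=0)                     # average within-imputation variance
between = betas.var(axis=0, ddof=1)             # between-imputation variance
total_var = within + (1 + 1 / m) * between
print("pooled slope:", pooled[1], "SE:", np.sqrt(total_var[1]))
```

The between-imputation term `(1 + 1/m) * between` is what standard single imputation omits, and is why multiple imputation gives honest standard errors.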
|
12 |
Statistical Learning Methods for Personalized Medical Decision Making. Liu, Ying. January 2016 (has links)
The theme of my dissertation is merging statistical modeling with medical domain knowledge and machine learning algorithms to assist in making personalized medical decisions. In its simplest form, making personalized decisions about treatment choices and disease diagnosis modalities can be transformed into classification or prediction problems in machine learning, where the optimal decision for an individual is a decision rule that yields the best future clinical outcome or maximizes diagnostic accuracy. Challenges emerge, however, when analyzing complex medical data. On one hand, statistical modeling is needed to deal with inherent practical complications such as missing data, patients lost to follow-up, and ethical and resource constraints in randomized controlled clinical trials. On the other hand, new data types and larger scales of data call for innovations combining statistical modeling, domain knowledge, and information technology. This dissertation contains three parts, addressing the estimation of optimal personalized rules for choosing treatment, the estimation of optimal individualized rules for choosing a disease diagnosis modality, and methods for variable selection in the presence of missing data.
In the first part of this dissertation, we propose a method for finding optimal dynamic treatment regimens (DTRs) from Sequential Multiple Assignment Randomized Trial (SMART) data. DTRs are sequential decision rules, tailored at each stage of treatment to potentially time-varying patient features and to intermediate outcomes observed in previous stages. The complexity, patient heterogeneity, and chronicity of many diseases and disorders call for learning optimal DTRs that dynamically tailor treatment to each individual's response over time. We propose a robust and efficient approach, Augmented Multistage Outcome-Weighted Learning (AMOL), to identify optimal DTRs from SMARTs. We extend outcome-weighted learning (Zhao et al. 2012) to allow for negative outcomes; we propose methods to reduce the variability of the weights, achieving numerical stability and higher efficiency; and, for multiple-stage trials, we introduce robust augmentation that improves efficiency by drawing information from Q-function regression models at each stage. AMOL remains valid even if the regression models are misspecified. We formally show that a proper choice of augmentation guarantees smaller stochastic errors in value-function estimation, and we establish convergence rates for AMOL. Its comparative advantage over existing methods is demonstrated in extensive simulation studies and in applications to two SMART data sets: a two-stage trial for attention deficit hyperactivity disorder and the STAR*D trial for major depressive disorder.
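The core idea behind outcome-weighted learning, which AMOL builds on, can be caricatured in a single-stage setting: learn a treatment rule by solving a classification problem in which each patient's observed treatment is a label weighted by their outcome. The sketch below is a deliberately simplified stand-in, not AMOL itself: a randomized binary treatment, a linear rule fit by weighted logistic loss, and a crude outcome shift standing in for the abstract's negative-outcome handling. All names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Single-stage simulated trial: treatment A in {-1, +1} assigned at random
n = 2000
X = rng.normal(size=(n, 2))
A = rng.choice([-1, 1], size=n)
# Outcome is larger when the treatment matches the sign of the first feature,
# so the optimal rule is d(x) = sign(x1)
R = 1.0 + A * np.sign(X[:, 0]) + rng.normal(scale=0.5, size=n)

# Outcome-weighted learning: classify A with weights R_i / pi(A_i | X_i).
# Weights must be nonnegative, so shift R (a simple stand-in for the
# negative-outcome handling the abstract refers to); pi = 0.5 here.
w = (R - R.min()) / 0.5

def fit_owl(X, A, w, lr=0.05, epochs=300):
    """Minimize the weighted logistic surrogate loss for a linear rule."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(epochs):
        margin = A * (Z @ beta)
        # gradient of mean_i w_i * log(1 + exp(-A_i * f(x_i)))
        grad = -(Z * (w * A / (1 + np.exp(margin)))[:, None]).mean(axis=0)
        beta -= lr * grad
    return beta

beta = fit_owl(X, A, w)
rule = np.sign(np.column_stack([np.ones(n), X]) @ beta)
# The learned rule should mostly agree with the true optimal rule sign(x1)
agreement = (rule == np.sign(X[:, 0])).mean()
print("agreement with optimal rule:", agreement)
```

AMOL's augmentation and multi-stage backward induction add efficiency and robustness on top of this weighted-classification core, which the sketch does not attempt.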
The second part of the dissertation introduces a machine learning algorithm that estimates personalized decision rules for medical diagnosis/screening so as to maximize a weighted combination of sensitivity and specificity. Using subject-specific risk factors and feature variables, such rules administer screening tests with balanced sensitivity and specificity, protecting low-risk subjects from the unnecessary pain and stress caused by false positive tests while achieving high sensitivity for subjects at high risk. In a simulation study mimicking a real breast cancer study, we found significant improvements in sensitivity and specificity when comparing our personalized screening strategy (assigning mammography plus MRI to high-risk patients and mammography alone to low-risk subjects, based on a composite score of their risk factors) to a one-size-fits-all strategy (assigning mammography plus MRI, or mammography alone, to all subjects). When applied to Parkinson's disease (PD) FDG-PET and fMRI data, the method provided individualized modality selection that improved AUC and yielded interpretable decision rules for choosing a brain imaging modality for early detection of PD. To the best of our knowledge, this is the first automatic, data-driven learning algorithm proposed in the literature for personalized diagnosis/screening strategies.
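A minimal version of the objective described above — choosing a cutoff on a composite risk score to maximize a weighted combination of sensitivity and specificity — can be written directly. This sketch is an illustration of the criterion only, not the dissertation's algorithm; the risk-score model, the weight of 0.7 on sensitivity, and the grid search over quantiles are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated screening data: a composite risk score and true disease status
n = 5000
risk = rng.normal(size=n)
disease = rng.random(n) < 1 / (1 + np.exp(-(risk - 1.0)))  # higher risk -> more disease

def best_threshold(score, y, w=0.7):
    """Pick the cutoff maximizing w * sensitivity + (1 - w) * specificity."""
    cands = np.quantile(score, np.linspace(0.01, 0.99, 99))
    best_t, best_val = cands[0], -np.inf
    for t in cands:
        pred = score >= t
        sens = (pred & y).sum() / y.sum()
        spec = (~pred & ~y).sum() / (~y).sum()
        val = w * sens + (1 - w) * spec
        if val > best_val:
            best_t, best_val = t, val
    return best_t, best_val

t, val = best_threshold(risk, disease)
# Subjects above the cutoff would receive the intensive modality
# (e.g. mammography + MRI); subjects below it, the standard one.
print("cutoff:", t, "weighted objective:", val)
```

Putting more weight on sensitivity (larger `w`) lowers the cutoff, trading false positives among low-risk subjects for fewer missed cases among high-risk ones, which is exactly the balance the abstract describes.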
In the last part of the dissertation, we propose Multiple Imputation Random Lasso (MIRL), a method for selecting important variables and predicting the outcome in the presence of missing data, motivated by an epidemiological study of Eating and Activity in Teens in which 80% of individuals have at least one missing variable. Applying variable selection methods developed for complete data after list-wise deletion therefore substantially reduces prediction power, and recent work on prediction models with incomplete data cannot adequately handle large numbers of variables with arbitrary missingness patterns. MIRL combines penalized regression with multiple imputation and stability selection. Extensive simulation studies comparing MIRL with several alternatives show that it outperforms them in high-dimensional scenarios, in terms of both reduced prediction error and improved variable selection, with the greatest advantage when correlation among variables is high and the missing proportion is high. MIRL also shows improved performance relative to other applicable methods when applied to the Eating and Activity in Teens study for boys and girls separately, and to a subgroup of low socioeconomic status (SES) Asian boys who are at high risk of developing obesity.
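The MIRL recipe — impute several times, fit a penalized regression on resampled halves of each completed data set, and keep variables that are selected in a high fraction of fits — can be sketched with simple stand-ins. In this illustration the chained-equations imputation is replaced by stochastic mean imputation and the random lasso by a plain coordinate-descent lasso on bootstrap halves; the selection-frequency threshold of 0.8 and all constants are assumptions, not the dissertation's settings.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data: 10 predictors, only the first two matter, with missing values
n, p = 300, 10
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=1.0, size=n)
X_miss = X.copy()
X_miss[rng.random((n, p)) < 0.2] = np.nan   # ~20% missing per variable

def lasso_cd(X, y, lam, iters=100):
    """Plain coordinate-descent lasso (roughly standardized X assumed)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0) / (X[:, j] @ X[:, j] / n)
    return beta

def impute(Xm, rng):
    """Stochastic mean imputation: a deliberately simple stand-in for the
    chained-equations imputation a real MIRL analysis would use."""
    Xi = Xm.copy()
    for j in range(Xi.shape[1]):
        col = Xi[:, j]; miss = np.isnan(col)
        Xi[miss, j] = rng.normal(np.nanmean(col), np.nanstd(col), size=miss.sum())
    return Xi

# Stability selection across imputations: count how often each variable
# survives the lasso on a random half of each completed data set.
m, reps, lam = 5, 20, 0.15
counts = np.zeros(p)
for _ in range(m):
    Xi = impute(X_miss, rng)
    for _ in range(reps):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts += lasso_cd(Xi[idx], y[idx], lam) != 0
freq = counts / (m * reps)
selected = np.where(freq >= 0.8)[0]
print("selection frequencies:", freq.round(2))
print("selected variables:", selected)
```

Averaging selection over imputations and resamples is what makes the procedure stable: a variable must matter in most completed, resampled data sets to survive, which guards against artifacts of any single imputation.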
|
13 |
Estimating market values for non-publicly-traded U.S. life insurers. Zhao, Liyan. 28 August 2008 (has links)
Not available / text
|
14 |
Comparative approaches to handling missing data, with particular focus on multiple imputation for both cross-sectional and longitudinal models. Hassan, Ali Satty Ali. January 2012 (has links)
Much data-based research is characterized by the unavoidable problem of incompleteness arising from missing or erroneous values. This thesis discusses strategies and basic issues in statistical data analysis for addressing the missing data problem, dealing with both missing covariates and missing outcomes. We restrict our attention to methodologies that address a specific missing data pattern, namely monotone missingness.

The thesis is divided into two parts. The first part places particular emphasis on the so-called missing at random (MAR) assumption, but focuses the bulk of its attention on multiple imputation techniques. Its main aim is to investigate various modelling techniques through application studies, to identify the most appropriate techniques, and to gain insight into their appropriateness for incomplete data analysis. The thesis first deals with missing covariate values when estimating regression parameters under a monotone missing covariate pattern, comparing four imputation techniques: Markov chain Monte Carlo (MCMC), regression, propensity score (PS), and last observation carried forward (LOCF). The application study indicated that some methods are consistently preferable for missing covariates when the missing data pattern is monotone: of the methods explored, MCMC and regression imputation were preferable to PS and LOCF for estimating regression parameters. The study is also concerned with a comparative analysis of techniques applied to incomplete Gaussian longitudinal outcome data subject to random dropout. Three methods are assessed and investigated: multiple imputation (MI), inverse probability weighting (IPW), and direct likelihood analysis. The findings generally favoured MI over IPW for continuous outcomes, even when the MAR mechanism holds, and further suggest that MI and direct likelihood lead to accurate and equivalent results, with both techniques arriving at the same substantive conclusions. Finally, this part compares and contrasts several statistical methods for analyzing incomplete non-Gaussian longitudinal outcomes under ignorable dropout: weighted generalized estimating equations (WGEE), multiple imputation after generalized estimating equations (MI-GEE), and the generalized linear mixed model (GLMM). MI-GEE was found to be considerably robust, outperforming the other methods for both small and large sample sizes, regardless of the dropout rate.
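A small simulation makes concrete why the comparisons above tend to disfavour LOCF under MAR dropout: when later measurements trend upward and dropout depends on an earlier observed value, carrying the last value forward biases the final-visit mean, while regression-based multiple imputation from the completers recovers it. The setup below (three visits, dropout driven by the visit-2 value) is an illustrative toy, not any of the thesis's application studies.

```python
import numpy as np

rng = np.random.default_rng(4)

# Longitudinal data with monotone dropout: 3 visits, rising mean over time
n, true_means = 1000, np.array([0.0, 1.0, 2.0])
Y = true_means + rng.multivariate_normal(np.zeros(3), 0.5 * np.eye(3) + 0.5, size=n)

# Dropout before visit 3 depends on the observed visit-2 value (MAR)
drop = rng.random(n) < 1 / (1 + np.exp(-(Y[:, 1] - 1.0)))
Y_obs = Y.copy()
Y_obs[drop, 2] = np.nan

# LOCF: carry the visit-2 value forward for dropouts
locf_mean = np.where(drop, Y[:, 1], Y[:, 2]).mean()

# Regression-based MI: model visit 3 from visit 2 among completers,
# then impute with noise and average over m completed data sets
comp = ~drop
b1, b0 = np.polyfit(Y_obs[comp, 1], Y_obs[comp, 2], 1)
resid_sd = np.std(Y_obs[comp, 2] - (b0 + b1 * Y_obs[comp, 1]))
m, mi_means = 10, []
for _ in range(m):
    y3 = Y_obs[:, 2].copy()
    y3[drop] = b0 + b1 * Y_obs[drop, 1] + rng.normal(scale=resid_sd, size=drop.sum())
    mi_means.append(y3.mean())
mi_mean = np.mean(mi_means)

print("true mean at visit 3:", true_means[2])
print("LOCF estimate:", round(locf_mean, 3), "| MI estimate:", round(mi_mean, 3))
```

Because dropout here depends only on the observed visit-2 value, the completers' conditional distribution of visit 3 given visit 2 is the right imputation model under MAR, which is exactly the assumption direct likelihood and MI exploit.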
The primary interest of the second part of the thesis falls under the non-ignorable dropout (MNAR) modelling frameworks that rely on sensitivity analysis for modelling incomplete Gaussian longitudinal data. The aim of this part is to handle non-random dropout by explicitly modelling the assumed dropout process, incorporating this additional sub-model into the model for the measurement data, and assessing the sensitivity of the modelling assumptions. The study analyses repeated Gaussian measures subject to potentially non-random dropout in order to examine the influence the dropout process might exert on inference. We consider the construction of a particular type of selection model, the Diggle-Kenward model, as a tool for assessing the sensitivity of a selection model to its modelling assumptions. The major conclusion drawn was that, in the context of the assumed model, there was evidence in favour of an MAR rather than an MCAR process; in addition, further insight into the data was obtained by comparing various sensitivity analysis frameworks. Lastly, two families of models, based on the selection and pattern-mixture frameworks used for sensitivity analysis, were compared and contrasted to investigate the potential influence of dropout on inference about the dependent measurement data and to deal with incomplete sequences; both jointly model the distribution of the dropout process and the longitudinal measurement process. The results of the sensitivity analyses were in agreement and hence led to similar parameter estimates, and additional confidence in the findings was gained as both models gave similar results for significant effects such as marginal treatment effects. / Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2012.
|
15 |
Multiple imputation for marginal and mixed models in longitudinal data with informative missingness. Deng, Wei. January 2005 (has links)
Thesis (Ph. D.)--Ohio State University, 2005. / Title from first page of PDF file. Document formatted into pages; contains xiii, 108 p.; also includes graphics. Includes bibliographical references (p. 104-108). Available online via OhioLINK's ETD Center
|
16 |
A Monte Carlo study of the impact of missing data in cross-classification random effects models. Alemdar, Meltem. January 2008 (has links)
Thesis (Ph. D.)--Georgia State University, 2008. / Title from title page (Digital Archive@GSU, viewed July 20, 2010) Carolyn F. Furlow, committee chair; Philo A. Hutcheson, Phillip E. Gagne, Sheryl A. Gowen, committee members. Includes bibliographical references (p. 96-100).
|
17 |
Bayesian estimation of factor analysis models with incomplete data. Merkle, Edgar C. January 2005 (has links)
Thesis (Ph. D.)--Ohio State University, 2005. / Title from first page of PDF file. Document formatted into pages; contains xi, 106 p.; also includes graphics. Includes bibliographical references (p. 103-106). Available online via OhioLINK's ETD Center
|
18 |
A Cox proportional hazards model for mid-point imputed interval-censored data. Gwaze, Arnold Rumosa. January 2011 (has links)
There has been increasing interest in survival analysis with interval-censored data, where the event of interest (such as infection with a disease) is not observed exactly but is only known to have happened between two examination times. Because research has concentrated on right-censored data, many statistical tests and techniques exist for right-censoring, whereas methods for interval-censoring are far less abundant. In this study, right-censoring methods are used to fit a proportional hazards model to interval-censored data. The interval-censored observations are transformed by mid-point imputation, which assumes that each event occurs at the midpoint of its recorded interval. The results gave conservative regression estimates, but a comparison with the conventional methods showed that the estimates were not significantly different. Nevertheless, the censoring mechanism and the interval lengths should be given serious consideration before deciding to use mid-point imputation on interval-censored data.
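The transformation the abstract describes is mechanically simple: replace each censoring interval with its midpoint, then apply standard right-censored machinery. The sketch below is an illustrative toy, not the thesis's analysis: events fall on a regular visit grid, nobody is additionally right-censored, and the Cox fit is a bare-bones single-covariate Newton-Raphson on the partial likelihood (with ties broken by sort order rather than a formal correction), so the estimate is only approximate.

```python
import numpy as np

rng = np.random.default_rng(5)

# Interval-censored data: true event times are exponential, but each event
# is only known to lie between two examination visits 0.5 apart.
n = 500
z = rng.choice([0, 1], size=n)                 # binary covariate, true log-HR = 0.7
t_true = rng.exponential(scale=1 / np.exp(0.7 * z))
left = np.floor(t_true / 0.5) * 0.5            # last visit before the event
right = left + 0.5                             # first visit after the event

# Mid-point imputation: pretend the event happened at the interval midpoint,
# then use standard right-censored machinery on the imputed times.
t_mid = (left + right) / 2

def cox_one_cov(t, z, iters=25):
    """Newton-Raphson on the Cox partial likelihood, single covariate;
    ties are broken by sort order (a crude approximation)."""
    order = np.argsort(t)
    t, z = t[order], z[order]
    b = 0.0
    for _ in range(iters):
        eta = np.exp(b * z)
        # reverse cumulative sums give risk-set totals at each event time
        s0 = np.cumsum(eta[::-1])[::-1]
        s1 = np.cumsum((z * eta)[::-1])[::-1]
        s2 = np.cumsum((z * z * eta)[::-1])[::-1]
        grad = np.sum(z - s1 / s0)
        hess = -np.sum(s2 / s0 - (s1 / s0) ** 2)
        b -= grad / hess
    return b

b_hat = cox_one_cov(t_mid, z)
print("estimated log hazard ratio:", round(b_hat, 3), "(true 0.7)")
```

Consistent with the abstract's finding, the recovered log hazard ratio tends to be conservative: the coarse visit grid and the midpoint assumption both pull the estimate toward zero, and the distortion grows as intervals widen relative to the event-time scale.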
|