111

Performance Comparison of Imputation Methods for Mixed Data Missing at Random with Small and Large Sample Data Set with Different Variability

Afari, Kyei 01 August 2021 (has links)
One of the concerns in the field of statistics is the presence of missing data, which leads to bias in parameter estimation and inaccurate results. The multiple imputation procedure is one remedy for handling missing data. This study examined which multiple imputation methods best handle mixed-variable datasets with different sample sizes and variability, along with different levels of missingness. The study employed the predictive mean matching, classification and regression trees, and random forest imputation methods. For each dataset, the multiple regression parameter estimates from the complete data were compared with those obtained from the imputed data. The results showed that the random forest imputation method generally performed best for samples of 150 and 500, irrespective of the variability, while the classification and regression tree imputation method generally worked best for samples of 30, irrespective of the variability.
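To make the comparison protocol concrete (impute, refit the regression, compare coefficients against the complete-data fit), here is a minimal Python sketch. It uses scikit-learn's IterativeImputer with tree-based estimators as rough stand-ins for the CART and random forest methods described above; the simulated data, sample size, and missingness rate are placeholders, and the thesis itself presumably used R-based multiple imputation rather than this single-imputation approximation.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 150                                        # one of the sample sizes studied
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=1.0, size=n)
data = np.column_stack([X, y])

# Complete-data benchmark coefficients
beta_full = LinearRegression().fit(X, y).coef_

# Impose ~20% missingness at random on the predictors
miss = data.copy()
mask = rng.random(miss[:, :3].shape) < 0.20
miss[:, :3][mask] = np.nan

# Two tree-based imputation strategies, loosely analogous to CART and random forest
imputers = {
    "cart-like": IterativeImputer(estimator=DecisionTreeRegressor(), random_state=0),
    "rf-like":   IterativeImputer(estimator=RandomForestRegressor(n_estimators=50), random_state=0),
}
for name, imp in imputers.items():
    completed = imp.fit_transform(miss)
    beta_imp = LinearRegression().fit(completed[:, :3], completed[:, 3]).coef_
    print(name, "bias in coefficients:", beta_imp - beta_full)
```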
112

Placebo response characteristic in sequential parallel comparison design studies

Rybin, Denis V. 13 February 2016 (has links)
The placebo response can affect inference in analysis of data from clinical trials. It can bias the estimate of the treatment effect, jeopardize the effort of all involved in a clinical trial, and ultimately deprive patients of potentially efficacious treatment. The Sequential Parallel Comparison Design (SPCD) is one of the novel approaches addressing placebo response in clinical trials. The analysis of SPCD clinical trial data typically involves classification of subjects as ‘placebo responders’ or ‘placebo non-responders’. This classification is done using a specific criterion, and the placebo response is treated as a measurable characteristic. However, the use of a criterion may lead to subject misclassification due to measurement error or incorrect criterion selection, and misclassification can directly affect the SPCD treatment effect estimate. We propose to view the placebo response as an unknown random characteristic that can be estimated based on information collected during the trial. Two strategies are presented here. The first strategy is to model the placebo response using the criterion classification as a starting point, or the observed data, and to include the placebo response estimate in the treatment effect estimation. The second strategy is to jointly model the latent placebo response and the observed data, and to estimate the treatment effect from the joint model. We evaluate both strategies on a wide range of simulated data scenarios in terms of type I error control, mean squared error, and power. We then evaluate the strategies in the presence of missing data and propose a method for missing data imputation under the non-informative missingness assumption. Data from a recent SPCD clinical trial are used to compare the results of the proposed methods with the reported results of the trial.
113

Impact of sampling on structure inference in networks: application to seed exchange networks and to ecology

Tabouy, Timothée 30 September 2019 (has links)
In this thesis we are interested in studying the stochastic block model (SBM) in the presence of missing data. We propose a classification of missing data into two categories, Missing At Random and Not Missing At Random, for latent variable models, following the framework described by D. Rubin. In addition, we describe several network sampling strategies and their distributions. Inference for SBMs with missing data is carried out through an adaptation of the EM algorithm: the EM algorithm with variational approximation. The identifiability of several SBM variants with missing data is demonstrated, as well as the consistency and asymptotic normality of the maximum likelihood estimators and the variational approximation estimators in the case where each dyad (pair of nodes) is sampled independently and with equal probability. We also consider SBMs with covariates, their inference in the presence of missing data, and how to proceed when covariates are not available to conduct the inference. Finally, all our methods are implemented in an R package available on CRAN, together with complete documentation on its use.
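As a small illustration of the dyad-sampling setting described above (each dyad observed independently and with equal probability), the numpy sketch below simulates an SBM adjacency matrix and then masks the unsampled dyads. It is only a hedged illustration of the data-generating and sampling step, not the interface of the authors' R package, and the block proportions, connection probabilities, and sampling rate are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 60, 2
pi = np.array([0.6, 0.4])                      # block proportions (hypothetical)
P = np.array([[0.25, 0.03],
              [0.03, 0.20]])                   # connection probabilities (hypothetical)

# Draw latent block memberships and the full adjacency matrix
z = rng.choice(K, size=n, p=pi)
A = rng.random((n, n)) < P[z][:, z]
A = np.triu(A, 1)
A = A | A.T                                    # undirected, no self-loops

# MAR dyad sampling: each dyad is observed independently with probability rho
rho = 0.7
observed = np.triu(rng.random((n, n)) < rho, 1)
observed = observed | observed.T
A_obs = np.where(observed, A, np.nan)          # NaN marks unsampled dyads

# Variational EM inference for the SBM would then be run on A_obs;
# here we just report the achieved sampling rate.
print("fraction of dyads observed:", observed[np.triu_indices(n, 1)].mean())
```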
114

Assessing Internationalization of Higher Education Research: Mixed Methods Research Quality and Missing Data Reporting Practices

McKinley, Keanen January 2021 (has links)
No description available.
115

An Approach to Estimation and Selection in Linear Mixed Models with Missing Data

Lee, Yi-Ching 07 August 2019 (has links)
No description available.
116

The impact of missing data imputation on HCC survival prediction: Exploring the combination of missing data imputation with data-level methods such as clustering and oversampling

Abdul Jalil, Walid, Dalla Torre, Kvin January 2018 (has links)
The area of data imputation, the process of replacing missing data with substituted values, has been covered quite extensively in recent years. The literature on the practical impact of data imputation, however, remains scarce. This thesis explores the impact of some state-of-the-art data imputation methods on HCC survival prediction and classification in combination with data-level methods such as oversampling. More specifically, it explores imputation methods for mixed-type datasets and their impact on a particular HCC dataset. Previous research has shown that the newer, more sophisticated imputation methods outperform simpler ones when evaluated with the normalized root mean square error (NRMSE). Contrary to intuition, however, the results of this study show that when combined with other data-level methods such as clustering and oversampling, the differences in imputation performance do not always impact classification in any meaningful way. This might be explained by the noise that is introduced when generating synthetic data points in the oversampling process. The results also show that one of the more sophisticated imputation methods, namely MICE, is highly dependent on prior assumptions about the underlying distributions of the dataset. When those assumptions are incorrect, the imputation method performs poorly and has a considerable negative impact on classification.
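A rough sketch of the kind of pipeline evaluated here (impute a dataset with missing values, oversample the minority class on the training folds, then classify) is given below. It assumes scikit-learn and imbalanced-learn; the synthetic data stands in for the HCC dataset, and the imputer and classifier choices are placeholders rather than the specific methods compared in the thesis.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 5))
y = (rng.random(n) < 0.2).astype(int)          # imbalanced binary outcome (placeholder)
X[rng.random(X.shape) < 0.15] = np.nan         # ~15% of values missing

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# 1) impute, 2) oversample the training data only, 3) classify
imputer = IterativeImputer(random_state=0)
X_tr_imp = imputer.fit_transform(X_tr)
X_te_imp = imputer.transform(X_te)

X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr_imp, y_tr)
clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
print("F1 on held-out data:", f1_score(y_te, clf.predict(X_te_imp)))
```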
117

The Single Imputation Technique in the Gaussian Mixture Model Framework

Aisyah, Binti M.J. January 2018 (has links)
Missing data is a common issue in data analysis, and numerous techniques have been proposed to deal with it. Imputation, the process of replacing missing values with plausible values, is the most popular strategy for handling missing data. The two imputation techniques most frequently cited in the literature are single imputation and multiple imputation. Multiple imputation, often regarded as the gold-standard technique, was proposed by Rubin in 1987 to address missing data; however, inconsistency is its major problem. Single imputation is less popular in missing data research because of its bias and reduced variability. One way to improve the single imputation technique in the basic regression model is to add a residual term, the motivation being to reduce the bias and improve the variability; the residual is drawn under a normality assumption with mean 0 and variance equal to the residual variance. Although newer single imputation methods, such as the stochastic regression model and hot deck imputation, may improve the variability and bias issues, single imputation techniques still suffer from uncertainty that can lead to underestimation of the R-square or standard error in the analysis results. The research reported in this thesis provides two imputation solutions for the single imputation technique. In the first procedure, the wild bootstrap is proposed to improve the handling of uncertainty in the residual variance of the regression model. In the second, predictive mean matching (PMM) is enhanced: the regression model generates the recipient values, while the donor values are taken from the observed data, and each missing value is imputed by randomly drawing one of the observations in the donor pool. The size of the donor pool is significant in determining the quality of the imputed values. A fixed donor pool size has been employed in many existing works with the PMM technique, but it may not be appropriate in certain circumstances, such as when the data distribution has high-density regions; instead of a fixed size, the proposed method uses a radius-based rule to determine the size of the donor pool. Both proposed imputation procedures are combined with the Gaussian mixture model framework to preserve the original data distribution. Results from experiments on benchmark and artificial data sets confirm improvements for further data analysis, and the proposed approaches are therefore worthwhile candidates for further investigation and experiments.
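To make the residual-augmentation and donor-pool ideas above concrete, the following numpy-only sketch shows stochastic regression imputation (a regression prediction plus a normally drawn residual) and a radius-based variant of PMM in which the donor pool contains every observed case whose predicted value lies within a distance r of the recipient's predicted value. The thesis's wild-bootstrap and Gaussian mixture model components are not reproduced here, and the radius is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=n)
y_miss = y.copy()
y_miss[rng.random(n) < 0.3] = np.nan            # 30% of y missing at random

obs = ~np.isnan(y_miss)
# Fit the regression on complete cases
b1, b0 = np.polyfit(x[obs], y_miss[obs], 1)
pred = b0 + b1 * x
resid_sd = np.std(y_miss[obs] - pred[obs], ddof=2)

# Stochastic regression imputation: prediction plus a drawn residual
y_sri = y_miss.copy()
y_sri[~obs] = pred[~obs] + rng.normal(scale=resid_sd, size=(~obs).sum())

# Radius-based PMM: donors are observed cases whose prediction is within r
r = 0.5 * resid_sd                              # arbitrary radius for illustration
y_pmm = y_miss.copy()
for i in np.where(~obs)[0]:
    donors = np.where(obs & (np.abs(pred - pred[i]) <= r))[0]
    if donors.size == 0:                        # fall back to the nearest donor
        nearest = np.argmin(np.abs(pred[obs] - pred[i]))
        donors = np.where(obs)[0][[nearest]]
    y_pmm[i] = y_miss[rng.choice(donors)]       # draw one donor's observed value
```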
118

Complications In Clinical Trials: Bayesian Models For Repeated Measures And Simulators For Nonadherence

Ahmad Hakeem Abdul Wahab (11186256) 28 July 2021 (has links)
Clinical trials are the gold standard for inferring the causal effects of treatments or interventions. This thesis is concerned with the development of methodologies for two problems in modern clinical trials. The first is analyzing binary repeated measures in clinical trials using models that reflect the complicated autocorrelation patterns in the data, so as to obtain high power when inferring treatment effects. The second is simulating realistic outcomes and subject nonadherence mechanisms in Phase III pharmaceutical clinical trials under the Tripartite Framework.

Bayesian Models for Binary Repeated Data: The Bayesian General Logistic Autoregressive Model and the Polya-Gamma Logistic Autoregressive Model

Autoregressive processes in generalized linear mixed effects regression models are convenient for the analysis of clinical trials that have a moderate to large number of binary repeated measurements, collected across a fixed set of structured time points, for each subject. However, much of the existing literature and methods for autoregressive processes on repeated binary measurements permit only one order and only one autoregressive process in the model. This limits the flexibility of the resulting generalized linear mixed effects regression model to fully capture the dynamics in the data, which can result in decreased power for testing treatment effects. Nested autoregressive structures enable more holistic modeling of clinical trials that can lead to increased power for testing effects.

We introduce the Bayesian General Logistic Autoregressive Model (BGLAM) for the analysis of repeated binary measures in clinical trials. The BGLAM extends previous Bayesian models for binary repeated measures by accommodating flexible and nested autoregressive processes with non-informative priors. We describe methods for selecting the order of the autoregressive process in the BGLAM based on the Deviance Information Criterion (DIC) and the marginal log-likelihood, and develop an importance sampling-weighted posterior predictive p-value to test for treatment effects in the BGLAM. The frequentist properties of the BGLAM compared to existing likelihood- and non-likelihood-based statistical models are evaluated by means of extensive simulation studies involving different data generation mechanisms.

Two features of the BGLAM that can limit its application in practice are the computational effort involved in executing it and its inability to integrate added heterogeneity across time in its autoregressive processes. We develop the Polya-Gamma Logistic Autoregressive Model (PGLAM) to address these limitations. This new model enables the integration of additional layers of variability through random effects and heterogeneity across time in nested autoregressive processes. Furthermore, the PGLAM is computationally more efficient than the BGLAM because it eliminates the need for the complex samplers for truncated latent variables involved in the Markov chain Monte Carlo algorithm for the BGLAM.

Data Generating Model for Phase III Clinical Trials With Intercurrent Events

Although clinical trials are designed with strict controls, complications inevitably arise during the course of the trials. One significant type of complication is missing subject outcomes due to subject drop-out or nonadherence during the trial, referred to in general as intercurrent events. This complication can arise from, among other causes, adverse reactions, lack of efficacy of the assigned treatment, administrative reasons, and excess efficacy from the assigned treatment. Intercurrent events typically confound causal inferences on the effects of the treatments under investigation because the resulting missingness corresponds to a Missing Not At Random missing data mechanism. The pharmaceutical industry is therefore increasingly focused on developing methods for obtaining valid causal inferences on the receipt of treatment in clinical trials with intercurrent events. However, it is extremely difficult to compare the frequentist properties and performance of these competing methods, as real-life clinical trial data cannot be easily accessed or shared, and as the different methods consider distinct assumptions for the underlying data generating mechanism in the clinical trial. We develop a novel simulation model for clinical trials with intercurrent events. Our simulator operates under the Rubin Causal Model. We implement the simulator by means of an R Shiny application. This app enables users to control patient compliance through different sources of discontinuity with varying functional trends, and to understand the frequentist properties of treatment effect estimators obtained by different models for various estimands.
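The kind of data these models address (binary measurements repeated over a fixed set of visits with serial dependence, a treatment effect, and monotone nonadherence) can be simulated in a few lines. The Python sketch below uses a latent AR(1) process on the logit scale and a simple dropout rule; it is only a hedged illustration of the data structure, not the BGLAM/PGLAM likelihood or the thesis's R Shiny simulator, and every parameter value is invented.

```python
import numpy as np

rng = np.random.default_rng(4)
n_subjects, n_visits = 100, 6
treatment = rng.integers(0, 2, size=n_subjects)    # 0 = control, 1 = treated
beta_trt, phi, sigma = 0.8, 0.6, 1.0                # effect, AR(1) coefficient, innovation sd

Y = np.zeros((n_subjects, n_visits), dtype=int)
for i in range(n_subjects):
    eta = 0.0                                        # latent AR(1) state on the logit scale
    for t in range(n_visits):
        eta = phi * eta + rng.normal(scale=sigma)
        p = 1.0 / (1.0 + np.exp(-(eta + beta_trt * treatment[i] - 0.5)))
        Y[i, t] = rng.random() < p                   # binary outcome at visit t

# Simple monotone nonadherence: once a subject drops out, all later visits are missing
dropout = rng.geometric(p=0.15, size=n_subjects)     # visit index at which dropout occurs
Y_obs = Y.astype(float)
for i in range(n_subjects):
    if dropout[i] < n_visits:
        Y_obs[i, dropout[i]:] = np.nan
```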
119

New Technique for Imputing Missing Item Responses for an Ordinal Variable: Using Tennessee Youth Risk Behavior Survey as an Example.

Ahmed, Andaleeb Abrar 15 December 2007 (has links) (PDF)
Surveys ordinarily ask questions on an ordinal scale and often result in missing data. We suggest a regression-based technique for imputing missing ordinal data. A multilevel cumulative logit model was used, with the assumption that observed responses of certain key variables can serve as covariates in predicting missing item responses of an ordinal variable. Individual predicted probabilities at each response level were obtained. Average individual predicted probabilities for each response level were then used to randomly impute the missing responses using a uniform distribution. Finally, the likelihood ratio chi-square statistic was used to compare the imputed and observed distributions. Two other multiple imputation algorithms were performed for comparison, and the performance of our imputation technique was comparable to these two established algorithms. Our method is simpler, does not involve any complex algorithms, and with further research can potentially be used as an imputation technique for missing ordinal variables.
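A hedged sketch of the imputation step described above (fit a cumulative logit model on the observed responses, obtain predicted probabilities at each response level, and impute by a random draw from those probabilities) is given below. It assumes statsmodels' OrderedModel for the cumulative logit fit and draws per respondent rather than from the averaged probabilities used in the thesis; the covariates and missingness rate are simulated placeholders, and the multilevel structure is omitted.

```python
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(5)
n = 400
x = rng.normal(size=(n, 2))
latent = x @ np.array([1.0, -0.8]) + rng.logistic(size=n)
y = np.digitize(latent, bins=[-1.0, 0.5, 2.0]).astype(float)   # 4 ordered levels 0..3
y[rng.random(n) < 0.2] = np.nan                                  # 20% item nonresponse

obs = ~np.isnan(y)
model = OrderedModel(y[obs].astype(int), x[obs], distr="logit")
fit = model.fit(method="bfgs", disp=False)

# Predicted probability of each response level for the nonrespondents,
# then impute by a random draw from that distribution
probs = fit.predict(x[~obs])                     # shape: (n_missing, n_levels)
levels = np.arange(probs.shape[1])
y_imp = y.copy()
y_imp[~obs] = [rng.choice(levels, p=pr / pr.sum()) for pr in probs]
```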
120

Autologous Stem Cell Transplant: Factors Predicting the Yield of CD34+ Cells

Lawson, Elizabeth Anne 02 December 2005 (has links) (PDF)
Stem cell transplant is often considered the last hope for the survival of many cancer patients. The CD34+ cell content of a collection of stem cells has emerged as the most reliable indicator of the quantity of desired cells in a peripheral blood stem cell harvest and is used as a surrogate measure of sample quality. Factors predicting the yield of CD34+ cells in a collection are not yet fully understood. Throughout the literature, there has been conflicting evidence with regard to age, gender, disease status, and prior radiation. In addition to the factors that have already been explored, we are interested in finding a cancer-chemotherapy interaction and in developing a predictive model to better identify which patients will be good candidates for this procedure. Because the amount of CD34+ cells is highly skewed, most traditional statistical methods are inappropriate without some transformation. A Bayesian generalized regression model was used to explain the variation in CD34+ cells collected from the sample by the cancer-chemotherapy interaction. Missing data were modeled as unknown parameters to include the entire data set in the analysis. Posterior estimates were obtained using Markov chain Monte Carlo methods. Posterior distributions identified weight and gender, as well as some cancer-chemotherapy interactions, as significant factors. Predictive posterior distributions can be used to identify which patients are good candidates for this procedure.
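As a sketch of the "missing data as unknown parameters" idea, the example below fits a simplified Gaussian regression on the log scale with PyMC, passing a masked array so that the masked outcomes are treated as additional parameters sampled alongside the coefficients. This assumes PyMC's automatic imputation of masked observed values; the variables, priors, and Gaussian likelihood are placeholders and not the thesis's generalized model or its cancer-chemotherapy interaction terms.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(6)
n = 120
weight = rng.normal(75, 12, size=n)              # hypothetical predictor
male = rng.integers(0, 2, size=n).astype(float)  # hypothetical predictor
log_cd34 = 1.0 + 0.02 * weight + 0.3 * male + rng.normal(scale=0.5, size=n)

# Suppose some outcomes were never recorded: mask them
y = np.ma.masked_array(log_cd34, mask=rng.random(n) < 0.15)

with pm.Model():
    b0 = pm.Normal("b0", 0.0, 10.0)
    b_weight = pm.Normal("b_weight", 0.0, 1.0)
    b_male = pm.Normal("b_male", 0.0, 1.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    mu = b0 + b_weight * weight + b_male * male
    # Masked entries in the observed array are treated as unknown parameters
    # and sampled along with the regression coefficients.
    pm.Normal("log_cd34", mu, sigma, observed=y)
    idata = pm.sample(500, tune=500, chains=2, progressbar=False)
```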
