  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Handling missing data problems in criminology : an introduction

Wang, Xue January 2016
University of Macau / Faculty of Social Sciences / Department of Sociology
2

On two topics with no bridge : bridge sampling with dependent draws and bias of the multiple imputation variance estimator

Romero, Martin. January 2003
Thesis (Ph. D.)--University of Chicago, Dept. of Statistics, December 2003. / Includes bibliographical references. Also available on the Internet.
3

Practical importance sampling methods for finite mixture models and multiple imputation

Steele, Russell John, January 2002
Thesis (Ph. D.)--University of Washington, 2002. / Vita. Includes bibliographical references (p. 109-119).
4

Model Selection and Multivariate Inference Using Data Multiply Imputed for Disclosure Limitation and Nonresponse

Kinney, Satkartar K. January 2007
Thesis (Ph. D.)--Duke University, 2007.
5

The handling, analysis and reporting of missing data in patient reported outcome measures for randomised controlled trials

Rombach, Ines January 2016
Missing data is a potential source of bias in the results of randomised controlled trials (RCTs), which can have a negative impact on guidance derived from them, and ultimately patient care. This thesis aims to improve the understanding, handling, analysis and reporting of missing data in patient reported outcome measures (PROMs) for RCTs. A review of the literature provided evidence of discrepancies between recommended methodology and current practice in the handling and reporting of missing data. In particular, missed opportunities to minimise missing data, the use of inappropriate analytical methods and a lack of sensitivity analyses were noted. Missing data patterns were examined and found to vary between PROMs as well as across RCTs. Separate analyses illustrated difficulties in predicting missing data, resulting in uncertainty about assumed underlying missing data mechanisms. Simulation work was used to assess the comparative performance of statistical approaches for handling missing data that are available in standard statistical software. Multiple imputation (MI) at either the item, subscale or composite score level was considered for missing PROMs data at a single follow-up time point. The choice of an MI approach depended on a multitude of factors, with MI at the item level being more beneficial than its alternatives for high proportions of item missingness. The approaches performed similarly for high proportions of unit-nonresponse; however, convergence issues were observed for MI at the item level. Maximum likelihood (ML), MI and inverse probability weighting (IPW) were evaluated for handling missing longitudinal PROMs data. MI was less biased than ML when additional post-randomisation data were available, while IPW introduced more bias compared to both ML and MI. A case study was used to explore approaches to sensitivity analyses to assess the impact of missing data.
It was found that trial results could be susceptible to varying assumptions about missing data, and the importance of interpreting the results in this context was reiterated. This thesis provides researchers with guidance for the handling and reporting of missing PROMs data in order to decrease bias arising from missing data in RCTs.
6

Multiple Imputation in der Praxis : ein sozialwissenschaftliches Anwendungsbeispiel / Multiple imputation in practice : a socio-scientific example of use

Böwing-Schmalenbrock, Melanie, Jurczok, Anne January 2011
Multiple imputation has established itself in recent years as an adequate method of handling missing observations, at least in theory. Annotations and explanations on how to apply multiple imputation in practice are scarce, and this seems to discourage many social scientists from conducting this necessary step of data preparation. Despite (or maybe because of) the continuous and progressive development of programs and features for conducting multiple imputation, the user is confronted with numerous challenges for which solutions are sometimes hard to find.
The difficulties range from the analysis and preparation of the target variable to deciding in favor of a software package, selecting predictors, formulating a suitable model and evaluating the results. This paper outlines the operation and practicability of multiple imputation and develops an approach that has proven useful in applying the method step by step, even for beginners. It addresses specific potential difficulties and discusses possible solutions; the particular nature of the missing values is the main focus. The process of imputation, with all its necessary steps, is illustrated by the multiple imputation of the total assets of wealthy households.
7

Imputation techniques for non-ordered categorical missing data

Karangwa, Innocent January 2016
Philosophiae Doctor - PhD / Missing data are common in survey data sets. Enrolled subjects often do not have data recorded for all variables of interest. The inappropriate handling of missing data may lead to bias in the estimates and incorrect inferences. Therefore, special attention is needed when analysing incomplete data. The multivariate normal imputation (MVNI) and the multiple imputation by chained equations (MICE) have emerged as the best techniques to impute, or fill in, missing data. The former assumes a normal distribution of the variables in the imputation model, but can also handle missing data whose distributions are not normal. The latter fills in missing values taking into account the distributional form of the variables to be imputed. The aim of this study was to determine the performance of these methods when data are missing at random (MAR) or completely at random (MCAR) on unordered or nominal categorical variables treated as predictors or response variables in the regression models. Both dichotomous and polytomous variables were considered in the analysis. The baseline data used was the 2007 Demographic and Health Survey (DHS) from the Democratic Republic of Congo. The analysis model of interest was the logistic regression model of the woman’s contraceptive method use status on her marital status, controlling or not for other covariates (continuous, nominal and ordinal). Based on the data set with missing values, data sets with missing at random and missing completely at random observations on either the covariates or response variables measured on a nominal scale were first simulated, and then used for imputation purposes. Under the MVNI method, unordered categorical variables were first dichotomised, and then K − 1 (where K is the number of levels of the categorical variable of interest) dichotomised variables were included in the imputation model, leaving the other category as a reference.
These variables were imputed as continuous variables using a linear regression model. Imputation with MICE considered the distributional form of each variable to be imputed. That is, imputations were drawn using binary and multinomial logistic regressions for dichotomous and polytomous variables respectively. The performance of these methods was evaluated in terms of bias and standard errors in regression coefficients that were estimated to determine the association between the woman’s contraceptive method use status and her marital status, controlling or not for other types of variables. The analysis was first done assuming that the sample was not weighted; the sample weight was then taken into account to assess whether the sample design would affect the performance of the multiple imputation methods of interest, namely MVNI and MICE. As expected, the results showed that for all the models, MVNI and MICE produced less biased estimates and smaller standard errors than the case deletion (CD) method, which discards items with missing values from the analysis. Moreover, it was found that when data were missing (MCAR or MAR) on the nominal variables that were treated as predictors in the regression model, MVNI reduced bias in the regression coefficients and standard errors compared to MICE, for both unweighted and weighted data sets. On the other hand, the results indicated that MICE outperformed MVNI when data were missing on the response variables, whether binary or polytomous. Furthermore, it was noted that the sample design (sample weights), the rates of missingness and the missing data mechanisms (MCAR or MAR) did not affect the behaviour of the multiple imputation methods that were considered in this study. Thus, based on these results, it can be concluded that when missing values are present on the outcome variables measured on a nominal scale in regression models, the distributional form of the variable with missing values should be taken into account.
When these variables are used as predictors (with missing observations), the parametric imputation approach (MVNI) would be a better option than MICE.
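The two preparation steps described in this abstract, deleting values completely at random and recoding a nominal variable into K − 1 indicators with one reference category, can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's code: the function names, the `marital` example values and the 30% missingness rate are all assumptions made for the example.

```python
import random


def make_mcar(values, rate, rng):
    """Delete each value independently with probability `rate` (MCAR)."""
    return [None if rng.random() < rate else v for v in values]


def dummy_code(values, levels):
    """Recode a nominal variable into K - 1 indicator columns.

    levels[0] is the reference category and gets no column.
    A missing value (None) stays missing in every indicator.
    """
    columns = {}
    for level in levels[1:]:
        columns[level] = [None if v is None else int(v == level) for v in values]
    return columns


# Hypothetical example data; not taken from the DHS data set.
rng = random.Random(0)
levels = ["single", "married", "widowed"]
marital = ["single", "married", "widowed", "married", "single", "married"]

observed = make_mcar(marital, rate=0.3, rng=rng)
indicators = dummy_code(observed, levels)
for level, column in indicators.items():
    print(level, column)
```

Under an MVNI-style approach the indicator columns would then be imputed as if they were continuous, which is why the missing entries are carried through unchanged here.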
8

Missing imputation methods explored in big data analytics

Brydon, Humphrey Charles January 2018
Philosophiae Doctor - PhD (Statistics and Population Studies) / The aim of this study is to look at the methods and processes involved in imputing missing data and, more specifically, complete missing blocks of data. A further aim of this study is to look at the effect that the imputed data has on the accuracy of various predictive models constructed on the imputed data and hence determine if the imputation method involved is suitable. The identification of the missingness mechanism present in the data should be the first process to follow in order to identify a possible imputation method. The identification of a suitable imputation method is easier if the mechanism can be identified as one of the following: missing completely at random (MCAR), missing at random (MAR) or not missing at random (NMAR). Predictive models constructed on the complete imputed data sets are shown to be less accurate for those models constructed on data sets which employed a hot-deck imputation method. The data sets which employed either a single or multiple Monte Carlo Markov Chain (MCMC) or the Fully Conditional Specification (FCS) imputation methods are shown to result in predictive models that are more accurate. The addition of an iterative bagging technique in the modelling procedure is shown to produce highly accurate prediction estimates. The bagging technique is applied to variants of the neural network, a decision tree and a multiple linear regression (MLR) modelling procedure. A stochastic gradient boosted decision tree (SGBT) is also constructed as a comparison to the bagged decision tree. Final models are constructed from 200 iterations of the various modelling procedures using a 60% sampling ratio in the bagging procedure. It is further shown that the addition of the bagging technique in the MLR modelling procedure can produce an MLR model that is more accurate than that of the other more advanced modelling procedures under certain conditions.
The evaluation of the predictive models constructed on imputed data is shown to vary based on the type of fit statistic used. It is shown that the average squared error reports little difference in the accuracy levels when compared to the results of the Mean Absolute Prediction Error (MAPE). The MAPE fit statistic is able to magnify the difference in the prediction errors reported. The Normalized Mean Bias Error (NMBE) results show that all predictive models constructed produced estimates that were an over-prediction, although these did vary depending on the data set and modelling procedure used. The Nash-Sutcliffe efficiency (NSE) was used as a comparison statistic to compare the accuracy of the predictive models in the context of imputed data. The NSE statistic showed that the estimates of the models constructed on the imputed data sets employing a multiple imputation method were highly accurate. The NSE statistic results reported that the estimates from the predictive models constructed on the hot-deck imputed data were inaccurate and that a mean substitution of the fully observed data would have been a better method of imputation. This study concludes that the choice of imputation method, as well as that of the predictive model, depends on the data used. Four unique combinations of imputation methods and modelling procedures were identified for the data considered in this study.
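As a rough illustration of how the fit statistics named in this abstract behave, the sketch below computes the average squared error, the Nash-Sutcliffe efficiency and a normalized mean bias error on made-up numbers. The abstract does not give the thesis's exact formulas, so the textbook definitions used here are an assumption, as is the sign convention in which a positive NMBE indicates over-prediction.

```python
def average_squared_error(observed, predicted):
    """Mean of the squared prediction errors."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def nash_sutcliffe(observed, predicted):
    """NSE = 1 - SS_residual / SS_total.

    1 is a perfect fit; 0 means the model does no better than
    predicting the observed mean; negative values are worse than that.
    """
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / total


def normalized_mean_bias(observed, predicted):
    """Mean of (predicted - observed), divided by the observed mean.

    Assumed convention: positive values indicate over-prediction.
    """
    n = len(observed)
    bias = sum(p - o for o, p in zip(observed, predicted)) / n
    return bias / (sum(observed) / n)


# Hypothetical values for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
model = [2.5, 4.5, 6.5, 8.5]        # slightly over-predicts everywhere
mean_only = [5.0, 5.0, 5.0, 5.0]    # predicts the observed mean

print(average_squared_error(observed, model))
print(nash_sutcliffe(observed, model))
print(nash_sutcliffe(observed, mean_only))
print(normalized_mean_bias(observed, model))
```

The mean-only predictor is the natural baseline for NSE: a model scoring below it is one for which substituting the observed mean would have been the better choice, which is the reading the abstract gives for the hot-deck results.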
9

Neural network imputation : a new fashion or a good tool

Amer, Safaa R. 07 June 2004
Most statistical surveys and data collection studies encounter missing data. A common solution to this problem is to discard observations with missing data while reporting the percentage of missing observations in different output tables. Imputation is a tool used to fill in the missing values. This dissertation introduces the missing data problem as well as traditional imputation methods (e.g. hot deck, mean imputation, regression, Markov Chain Monte Carlo, Expectation-Maximization, etc.). The use of artificial neural networks (ANN), a data mining technique, is proposed as an effective imputation procedure. During ANN imputation, computational effort is minimized while accounting for sample design and imputation uncertainty. The mechanism and use of ANN in imputation for complex survey designs are investigated. Imputation methods are not all equally good, and none are universally good. However, simulation results and applications in this dissertation show that regression, Markov chain Monte Carlo, and ANN yield comparable results. Artificial neural networks could be considered as implicit models that take into account the sample design without making strong parametric assumptions. Artificial neural networks make few assumptions about the data, are asymptotically good, and are robust to multicollinearity and outliers. Overall, ANN could be time- and resource-efficient for an experienced user compared to other conventional imputation techniques. / Graduation date: 2005
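Two of the traditional methods listed in this abstract, mean imputation and hot deck, are simple enough to sketch in a few lines. The version below is a minimal illustration under stated assumptions, not the dissertation's procedure: the hot deck shown is the simplest random variant, drawing each donor from all observed values with no matching on covariates or survey design, and the function names and sample values are hypothetical.

```python
import random


def mean_impute(values):
    """Replace every missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def random_hot_deck(values, rng):
    """Replace every missing value with a randomly chosen observed value (a donor)."""
    donors = [v for v in values if v is not None]
    return [rng.choice(donors) if v is None else v for v in values]


# Hypothetical survey responses with two missing entries.
income = [1.0, None, 3.0, 5.0, None]

print(mean_impute(income))
print(random_hot_deck(income, random.Random(0)))
```

Mean imputation gives every missing case the same value and so shrinks the variance of the completed variable, whereas the hot deck keeps imputed values within the set actually observed.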
10

Multiple comparisons using multiple imputation under a two-way mixed effects interaction model

Kosler, Joseph Stephen, January 2006
Thesis (Ph. D.)--Ohio State University, 2006. / Title from first page of PDF file. Includes bibliographical references (p. 233-237).
