1 |
"The Application of Multiple Imputation in Correcting for Unit Nonresponse Bias"
Arntsen, Stian Fagerli January 2010
No description available.
|
2 |
Praleistų duomenų įrašymo metodai baigtinių populiacijų statistikoje / Missing data imputation in finite population statistics
Utovkaitė, Jurgita 04 March 2009
Even in the most carefully planned survey, various kinds of errors arise that can make the results unreliable or insufficiently precise, so it is very important to reduce as far as possible the influence of these errors on the survey results, that is, on the estimates of totals, means and ratios. One possible type of statistical survey error is nonresponse error. It arises when a respondent does not answer one or more questions in the questionnaire. Nonresponse occurs in surveys for various reasons. It causes standard estimators that do not account for nonresponse to deviate from the true values of interest, and it also increases the variance of these estimators. In current practice, survey nonresponse is approached from two angles. First, one tries to avoid nonresponse or to reduce its level. A considerable body of literature and methodological material examines the causes of nonresponse and offers recommendations on how to reduce it; however, once nonresponse is present in a survey, the estimators of interest must be constructed so that the survey results are as accurate as possible. Various techniques are used to reduce the bias in survey results caused by nonresponse. One such method is the imputation of missing values. Imputation is a way of filling in missing data that is very useful when analysing incomplete data sets. It resolves the missing-data problem at the start of the data analysis. The methodology of missing-value imputation is currently developing rapidly; one can find... [see the full text for more] / Nonresponse has been a matter of concern for several decades in survey theory and practice. The problem can be viewed from two different angles: the prevention or avoidance of nonresponse before it occurs, and special estimation techniques for use once nonresponse has occurred. The objective of this work is to describe the main methods of estimation when nonresponse occurs. Special attention is given to one method of handling nonresponse: imputation.
Imputation is a procedure in which missing values for one or more study variables are “filled in” with substitutes, either constructed according to some rule or taken from the observed values of responding elements. This work considers imputation methods based on some of the more commonly used statistical rules. Some of them are tested on a data set that has the same distribution as the data from a real survey conducted by Statistics Lithuania. The imputation methods are compared with each other, and the best imputation method for this data set is selected. Special attention is paid to regression imputation.
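Regression imputation, as named in the abstract above, can be illustrated with a minimal sketch. This is not the thesis's own procedure or data: the function name `regression_impute`, the single auxiliary variable `x`, and the toy values are assumptions made only for illustration. Missing study values are replaced by predictions from a least-squares line fitted on the respondents.

```python
from statistics import mean

def regression_impute(x, y):
    """Fill missing y values (None) with predictions from a line fitted on respondents.

    x is an auxiliary variable known for every element; y is the study variable.
    """
    # Respondents are the elements whose study value was observed.
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    x_bar = mean(xi for xi, _ in pairs)
    y_bar = mean(yi for _, yi in pairs)
    sxx = sum((xi - x_bar) ** 2 for xi, _ in pairs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Observed values are kept; only the gaps are replaced by predictions.
    return [yi if yi is not None else intercept + slope * xi for xi, yi in zip(x, y)]

# Hypothetical toy data: two elements did not report y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, None, 8.0, None]
completed = regression_impute(x, y)
```

Because every nonrespondent receives a value that lies exactly on the fitted line, a deterministic rule of this kind tends to understate the variability of the completed data, which is one reason imputation methods are usually compared empirically on a given data set.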
|
3 |
Imputation en présence de données contenant des zéros / Imputation in the presence of data containing zeros
Nambeu, Christian O. 12 1900
Single imputation is often used in surveys to compensate for item nonresponse.
In some cases, the variable requiring imputation contains a large number
of zeros. This is especially frequent in business surveys that collect economic
variables. In this thesis, we study the properties of two imputation procedures
frequently used in practice and show that they lead to biased estimators, in general.
Motivated by a mixture regression model, we then propose three imputation
procedures and study their properties in terms of bias. For the proposed imputation
procedures, we consider a jackknife variance estimator that is consistent
for the true variance, provided the overall sampling fraction is negligible. Finally,
we perform a simulation study to evaluate the performance of point and variance
estimators in terms of relative bias and mean square error.
|
4 |
Missing imputation methods explored in big data analytics
Brydon, Humphrey Charles January 2018
Philosophiae Doctor - PhD (Statistics and Population Studies) / The aim of this study is to look at the methods and processes involved in imputing missing data and more specifically, complete missing blocks of data. A further aim of this study is to look at the effect that the imputed data has on the accuracy of various predictive models constructed on the imputed data and hence determine if the imputation method involved is suitable.
The identification of the missingness mechanism present in the data should be the first step, since it guides the choice of a possible imputation method. Choosing a suitable imputation method is easier if the mechanism can be identified as one of the following: missing completely at random (MCAR), missing at random (MAR) or not missing at random (NMAR).
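The three mechanisms can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the function `make_missing`, the threshold at zero, and the drop probabilities are hypothetical and are not taken from the study. The distinction it shows is the standard one: under MCAR the chance of a missing value is constant, under MAR it depends only on an observed variable, and under NMAR it depends on the unobserved value itself.

```python
import random

def make_missing(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of (x, y) rows with some y values replaced by None.

    MCAR: every y is dropped with the same probability.
    MAR:  the drop probability depends only on the always-observed x.
    NMAR: the drop probability depends on the value of y itself.
    """
    rng = random.Random(seed)
    out = []
    for x, y in rows:
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MAR":
            p = min(1.0, 2 * rate) if x > 0 else 0.0
        elif mechanism == "NMAR":
            p = min(1.0, 2 * rate) if y > 0 else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append((x, None if rng.random() < p else y))
    return out

# Hypothetical data: y is correlated with x.
data_rng = random.Random(1)
rows = []
for _ in range(200):
    x = data_rng.gauss(0, 1)
    rows.append((x, x + data_rng.gauss(0, 0.5)))

mcar_rows = make_missing(rows, "MCAR")
mar_rows = make_missing(rows, "MAR")
nmar_rows = make_missing(rows, "NMAR")
```

In practice only the incomplete rows are available, so MAR and NMAR cannot be told apart from the data alone; a simulation like this is useful mainly for testing how an imputation method behaves under each assumption.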
Predictive models constructed on data sets completed with a hot-deck imputation method are shown to be less accurate. Data sets completed with either single or multiple imputation using the Markov chain Monte Carlo (MCMC) or the Fully Conditional Specification (FCS) methods are shown to result in predictive models that are more accurate.
The addition of an iterative bagging technique in the modelling procedure is shown to produce highly accurate prediction estimates. The bagging technique is applied to variants of the neural network, a decision tree and a multiple linear regression (MLR) modelling procedure. A stochastic gradient boosted decision tree (SGBT) is also constructed as a comparison to the bagged decision tree.
Final models are constructed from 200 iterations of the various modelling procedures using a 60% sampling ratio in the bagging procedure. It is further shown that the addition of the bagging technique in the MLR modelling procedure can produce an MLR model that is more accurate than the other, more advanced modelling procedures under certain conditions.
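A bagging procedure with 200 iterations and a 60% sampling ratio might be sketched as follows. Several simplifications here are assumptions rather than details from the study: the regression has a single predictor instead of a full MLR, each iteration draws its 60% subsample without replacement, and the names `fit_line` and `bagged_predict` are hypothetical.

```python
import random
from statistics import mean

def fit_line(points):
    """Least-squares fit of y = intercept + slope * x on (x, y) points."""
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return y_bar - slope * x_bar, slope

def bagged_predict(train, x_new, n_iter=200, ratio=0.6, seed=0):
    """Average the predictions of n_iter models, each fitted on a random subsample."""
    rng = random.Random(seed)
    size = max(2, round(ratio * len(train)))
    predictions = []
    for _ in range(n_iter):
        # Assumption: subsample without replacement (training x values are distinct).
        subsample = rng.sample(train, size)
        intercept, slope = fit_line(subsample)
        predictions.append(intercept + slope * x_new)
    return mean(predictions)

# Hypothetical noisy training data around the line y = 3 + 2x.
noise = random.Random(42)
train = [(float(x), 3.0 + 2.0 * x + noise.gauss(0, 1)) for x in range(20)]
prediction = bagged_predict(train, x_new=10.0)
```

Averaging over many subsample fits reduces the variance of the final prediction, which is the usual motivation for adding bagging to an otherwise simple model.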
The evaluation of the predictive models constructed on imputed data is shown to vary based on the type of fit statistic used. It is shown that the average squared error reports little difference in the accuracy levels when compared to the results of the Mean Absolute Prediction Error (MAPE). The MAPE fit statistic is able to magnify the difference in the prediction errors reported. The Normalized Mean Bias Error (NMBE) results show that all predictive models constructed produced estimates that were an over-prediction, although these did vary depending on the data set and modelling procedure used.
The Nash-Sutcliffe efficiency (NSE) was used to compare the accuracy of the predictive models in the context of imputed data. The NSE statistic showed that the estimates of the models constructed on the imputed data sets employing a multiple imputation method were highly accurate. The NSE results also indicated that the estimates from the predictive models constructed on the hot-deck imputed data were inaccurate and that a mean substitution of the fully observed data would have been a better method of imputation.
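The comparison with mean substitution follows from how the NSE is defined: it is one minus the ratio of the model's squared error to the squared error of a predictor that always returns the observed mean. A minimal sketch, with hypothetical toy values that are not from the study:

```python
from statistics import mean

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE(model) / SSE(observed-mean predictor)."""
    obs_mean = mean(observed)
    sse_model = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sse_mean = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - sse_model / sse_mean

# Hypothetical values for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
good = [2.1, 3.9, 6.2, 7.8]              # close to the observations
baseline = [mean(observed)] * 4          # always predicts the observed mean
poor = [8.0, 2.0, 9.0, 1.0]              # worse than predicting the mean

scores = {name: nse(observed, pred)
          for name, pred in [("good", good), ("baseline", baseline), ("poor", poor)]}
```

A score of 1 is a perfect fit, 0 means the model does no better than the observed mean, and a negative score means the mean would have been the better predictor.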
The conclusion reached in this study is that the choice of imputation method, as well as that of the predictive model, depends on the data used. Four unique combinations of imputation methods and modelling procedures were identified for the data considered in this study.
|
5 |
The Impacts of Imputation Tax System on Corporate Dividend Policy: An Empirical Study and Survey
Wang, Zong-Siang 11 June 2003
none
|
7 |
Avoiding the redundant effect on regression analyses of including an outcome in the imputation model
Tamegnon, Monelle 01 January 2018
Imputation is one well-recognized method for handling missing data. Multiple imputation provides a framework for imputing missing data that incorporates uncertainty about the imputations at the analysis stage. An important factor to consider when performing multiple imputation is the imputation model. In particular, a careful choice of the covariates to include in the model is crucial. The current recommendation by several authors in the literature (van Buuren, 2012; Moons et al., 2006; Little and Rubin, 2002) is to include all variables that will appear in the analytical model, including the outcome, as covariates in the imputation model. When the goal of the analysis is to explore the relationship between the outcome and the variable with missing data (the target variable), this recommendation seems questionable. Should we use the outcome to fill in the missing observations of the target variable, and then use these filled-in observations along with the observed data on the target variable to explore the relationship of the target variable with the outcome? We believe that this approach is circular. Instead, we have designed multiple imputation approaches rooted in machine learning techniques that avoid the use of the outcome at the imputation stage and maintain reasonable inferential properties. We also compare the performance of our approaches to that of currently available methods.
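The way multiple imputation carries imputation uncertainty into the analysis stage is conventionally expressed through Rubin's combining rules: the analysis is run on each completed data set, and the results are pooled so that the spread between imputations inflates the final variance. The sketch below illustrates those standard rules only; it is not the machine-learning approach proposed in the dissertation, and the function name and numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)          # combined point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical regression coefficients from m = 3 completed data sets.
estimates = [1.0, 1.2, 0.8]
variances = [0.04, 0.05, 0.03]
pooled, total = pool_rubin(estimates, variances)
```

If the imputations all agreed, the between-imputation term would vanish and the pooled variance would equal the average within-imputation variance; disagreement between imputations is what widens the resulting intervals.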
|
8 |
Corporate criminal liability in the United Kingdom : determining the appropriate mechanism of imputation
Nana, Constantine Ntsanyu January 2009
The objectives of this thesis are twofold: first, to demonstrate that the string of contradictions stretching across substantive and procedural corporate criminal law may be avoided if courts refer to an appropriate mechanism of imputation; and second, to show how such an appropriate mechanism of imputation may be determined. This study adopts a three-step process to achieve these objectives. The first step involves elaborating on the lack of coherence and integrity in the imputation of acts and intents (or causal relationships) to corporations caused by a disjunction of rules invoked by courts. The second step involves establishing parameters by which mechanisms of imputation may be evaluated. The third step involves evaluating a number of samples by reference to the established parameters. Five mechanisms of imputation applicable in the United Kingdom and in some jurisdictions that trace their legal heritage to the United Kingdom are evaluated. In the conclusion, it is submitted that although none of the mechanisms evaluated may be deemed to be the appropriate mechanism, the aggregation doctrine is the least inappropriate. This is because, although it requires some modification, it can best be aligned with propositions of how the criminal liability of corporations may be established on a coherent and consistent basis. The propositions that are put forward include the use of the doctrine of innocent agency to establish a corporation’s guilt in instances where no guilty agent may be identified, and the use of the principle of accessorial liability to establish a corporation’s guilt in instances where a guilty agent may be identified.
The aggregation doctrine as modified in this study will enable the prosecutor to establish a corporation’s guilt as advised above if measurable values are given to the ‘innocent’ acts of agents and if emphasis is placed on how the corporation reacted to the discursive dilemma that arose in the decision-making process that preceded the performance of the relevant activity. This will provide evidence to the effect that the aggregated act represents the corporation’s subjective position.
|
9 |
Imputation techniques for non-ordered categorical missing data
Karangwa, Innocent January 2016
Philosophiae Doctor - PhD / Missing data are common in survey data sets. Enrolled subjects often do not have data recorded for all variables of interest. The inappropriate handling of missing data may lead to biased estimates and incorrect inferences. Therefore, special attention is needed when analysing incomplete data. Multivariate normal imputation (MVNI) and multiple imputation by chained equations (MICE) have emerged as the best techniques for imputing, or filling in, missing data. The former assumes a normal distribution of the variables in the imputation model, but can also handle missing data whose distributions are not normal. The latter fills in missing values taking into account the distributional form of the variables to be imputed. The aim of this study was to determine the performance of these methods when data are missing at random (MAR) or completely at random (MCAR) on unordered or nominal categorical variables treated as predictors or response variables in regression models. Both dichotomous and polytomous variables were considered in the analysis. The baseline data used were from the 2007 Demographic and Health Survey (DHS) of the Democratic Republic of Congo. The analysis model of interest was the logistic regression model of the woman’s contraceptive method use status on her marital status, controlling or not for other covariates (continuous, nominal and ordinal). Based on the data set with missing values, data sets with missing at random and missing completely at random observations on either the covariates or response variables measured on a nominal scale were first simulated, and then used for imputation purposes. Under the MVNI method, unordered categorical variables were first dichotomised, and then K − 1 (where K is the number of levels of the categorical variable of interest) dichotomised variables were included in the imputation model, leaving the other category as a reference.
These variables were imputed as continuous variables using a linear regression model. Imputation with MICE considered the distributional form of each variable to be imputed. That is, imputations were drawn using binary and multinomial logistic regressions for dichotomous and polytomous variables respectively. The performance of these methods was evaluated in terms of bias and standard errors in the regression coefficients that were estimated to determine the association between the woman’s contraceptive method use status and her marital status, controlling or not for other types of variables. The analysis was first done assuming that the sample was not weighted; then the sample weight was taken into account to assess whether the sample design would affect the performance of the multiple imputation methods of interest, namely MVNI and MICE. As expected, the results showed that for all the models, MVNI and MICE produced less biased estimates and smaller standard errors than the case deletion (CD) method, which discards items with missing values from the analysis. Moreover, it was found that when data were missing (MCAR or MAR) on the nominal variables that were treated as predictors in the regression model, MVNI reduced bias in the regression coefficients and standard errors compared to MICE, for both unweighted and weighted data sets. On the other hand, the results indicated that MICE outperformed MVNI when data were missing on the response variables, whether binary or polytomous. Furthermore, it was noted that the sample design (sample weights), the rates of missingness and the missing data mechanisms (MCAR or MAR) did not affect the behaviour of the multiple imputation methods that were considered in this study. Thus, based on these results, it can be concluded that when missing values are present on the outcome variables measured on a nominal scale in regression models, the distributional form of the variable with missing values should be taken into account.
When these variables are used as predictors (with missing observations), the parametric imputation approach (MVNI) would be a better option than MICE.
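The K − 1 dichotomisation described for MVNI can be sketched as follows. The function name `dummy_code`, the choice of reference category, and the example values are assumptions for illustration; the abstract does not say how the continuous imputed values were mapped back to categories, so that step is deliberately left out.

```python
def dummy_code(values, reference):
    """Encode a nominal variable with K levels as K - 1 indicator columns.

    The reference level is represented by all zeros; a missing value (None)
    stays missing in every indicator so that it can be imputed later.
    """
    levels = sorted({v for v in values if v is not None and v != reference})
    rows = []
    for v in values:
        if v is None:
            rows.append([None] * len(levels))
        else:
            rows.append([1 if v == level else 0 for level in levels])
    return levels, rows

# Hypothetical marital-status variable with K = 3 levels and one missing value.
marital = ["married", "single", None, "widowed", "married"]
levels, indicators = dummy_code(marital, reference="single")
```

Under MVNI the indicator columns would then enter the imputation model as if they were continuous, whereas MICE would impute the original nominal variable directly with a binary or multinomial logistic model.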
|
10 |
Context Similarity for Retrieval-Based Imputation
Ahmadov, Ahmad; Thiele, Maik; Lehner, Wolfgang; Wrembel, Robert 30 June 2022
Completeness, one of the four major dimensions of data quality, is a pervasive issue in modern databases. Although data imputation has been studied extensively in the literature, most of the research focuses on inference-based approaches. We propose to harness Web tables as an external data source to retrieve missing data effectively and efficiently, while taking into account the inherent uncertainty and lack of veracity that they contain. Existing approaches mostly rely on standard retrieval techniques and out-of-the-box matching methods, which result in very low precision, especially when dealing with numerical data. We therefore propose a novel data imputation approach that applies numerical context similarity measures. This significantly increases the precision of the imputation procedure by ensuring that the imputed values are of the same domain and magnitude as the local values, and thus yields an accurate imputation. We use the Dresden Web Table Corpus, which comprises more than 125 million web tables extracted from the Common Crawl, as our knowledge source. Comprehensive experimental results demonstrate that the proposed method clearly outperforms the default out-of-the-box retrieval approach.
|