Global ETD Search

1	Missing Data in Complex Sample Surveys: Impact of Deletion and Imputation Treatments on Point and Interval Parameter Estimates
Kellermann, Anh Pham. 15 January 2018.
The purpose of this simulation study was to evaluate the relative performance of five missing data treatments (MDTs) for handling missing data in complex sample surveys. The five MDTs were listwise deletion (LW), single hot-deck imputation (HS), single regression imputation (RS), hot-deck-based multiple imputation (HM), and regression-based multiple imputation (RM). These MDTs were assessed in the context of regression weight estimates in multiple regression analysis of complex sample data with two data levels. The multiple regression equation had six regressors without missing data and two regressors with missing data. The four performance measures were statistical bias, RMSE, CI width, and coverage probability of the 95% confidence interval. The five MDTs were evaluated separately for three types of missingness: MCAR, MAR, and MNAR. For each type of missingness, the MDTs were evaluated at four levels of missingness (10%, 30%, 50%, and 70%), with complete sample conditions as a reference point for interpreting the results. In addition, ICC level (.0, .25, .50) and population density (high and low) were manipulated as study factors.
The findings revealed that the performance of each individual MDT varied across missing data types, but their relative performance was quite similar for all missing data types except for LW’s performance in MNAR. RS produced the most inaccurate estimates in terms of bias, RMSE, and confidence interval coverage; RM and HM were the second poorest performers. The LW and HS procedures outperformed the rest on the measures of accuracy and precision in MCAR; however, LW’s precision decreased in MAR and MNAR, and LW’s CI width was the widest in MNAR data. In addition, in all three missing data types, the poor performers were less accurate and less precise on variables with missing data than on variables without missing data, and their degree of accuracy and precision depended mostly on the ICC level of the data. The proportion of missing data noticeably affected only the performance of HM, which yielded worse performance measures at higher missing data levels. The population density factor had negligible effects on most of the measures produced by the MDTs, except for the RMSE, CI width, and CI coverage produced by LW, which were modestly influenced by population density.
Keywords: MAR; MCAR; MNAR; two-level data
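As a rough illustration of the three missingness mechanisms and of listwise deletion named in the abstract above, here is a minimal Python sketch. It is not the study's simulation design: the single-level data model, the true slope of 0.6, the logistic missingness probabilities, and every function name are assumptions made only for this example.

```python
import math
import random


def make_data(n=20000, seed=0):
    """Draw (x, y) pairs with true slope 0.6; both variables have mean 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)            # regressor, always observed
        y = 0.6 * x + rng.gauss(0.0, 0.8)  # variable that will lose values
        data.append((x, y))
    return data


def delete_values(data, mechanism, seed=1):
    """Set y to None according to an assumed MCAR, MAR, or MNAR rule."""
    rng = random.Random(seed)
    out = []
    for x, y in data:
        if mechanism == "MCAR":
            p = 0.3                                    # unrelated to the data
        elif mechanism == "MAR":
            p = 1.0 / (1.0 + math.exp(1.0 - 1.5 * x))  # depends on observed x
        elif mechanism == "MNAR":
            p = 1.0 / (1.0 + math.exp(1.0 - 1.5 * y))  # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append((x, None if rng.random() < p else y))
    return out


def listwise(data):
    """Listwise deletion: keep only rows with no missing value."""
    return [(x, y) for x, y in data if y is not None]


def mean_y(rows):
    return sum(y for _, y in rows) / len(rows)


def ols_slope(rows):
    """Slope of the least-squares line of y on x."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    return sxy / sxx


if __name__ == "__main__":
    full = make_data()
    for mech in ("MCAR", "MAR", "MNAR"):
        kept = listwise(delete_values(full, mech))
        print(mech, len(kept), round(mean_y(kept), 3), round(ols_slope(kept), 3))
```

In this toy setup, listwise deletion should leave both the mean of y and the slope close to their true values under MCAR. Under the assumed MAR rule (missingness driven by the observed regressor) the mean of y shifts while the slope is largely preserved, and under the assumed MNAR rule both are distorted.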
2	Predicting the Unobserved : A statistical analysis of missing data techniques for binary classification
Säfström, Stella. January 2019.
The aim of the thesis is to investigate how the classification performance of random forest and logistic regression differs, given an imbalanced data set with MCAR missing data. Performance is measured in terms of accuracy and sensitivity. Two analyses are performed: one with a simulated data set and one application using data from the Swedish population registries. The simulation study is constructed to have the same class imbalance of 1:5. The missing values are handled using three different techniques: complete case analysis, predictive mean matching, and mean imputation. The thesis concludes that logistic regression and random forest are on average equally accurate, with some instances of random forest outperforming logistic regression. Logistic regression consistently outperforms random forest with regard to sensitivity. This implies that logistic regression may be the best option for studies where the goal is to accurately predict outcomes in the minority class. None of the missing data techniques stood out in terms of performance.
Keywords: Random forest; logistic regression; imputation; classification; MCAR; missing data; imbalanced data; Probability Theory and Statistics
3	Planned Missing Data in Mediation Analysis
January 2015.
This dissertation examines a planned missing data design in the context of mediational analysis. The study considered a scenario in which the high cost of an expensive mediator limited sample size, but in which less expensive mediators could be gathered on a larger sample. Simulated multivariate normal data were generated from a latent variable mediation model with three observed indicator variables, M1, M2, and M3. Planned missingness was implemented on M1 under the missing completely at random mechanism. Five analysis methods were employed: a latent variable mediation model with all three mediators as indicators of a latent construct (Method 1), an auxiliary variable model with M1 as the mediator and M2 and M3 as auxiliary variables (Method 2), an auxiliary variable model with M1 as the mediator and M2 as a single auxiliary variable (Method 3), maximum likelihood estimation including all available data but incorporating only mediator M1 (Method 4), and listwise deletion (Method 5). The main outcome of interest was empirical power to detect the mediated effect.
The main effects of mediation effect size, sample size, and missing data rate performed as expected, with power increasing for increasing mediation effect sizes, increasing sample sizes, and decreasing missing data rates. Consistent with expectations, power was greatest for analysis methods that included all three mediators, and power decreased with analysis methods that included less information. Across all design cells relative to the complete data condition, Method 1 with 20% missingness on M1 produced only a 2.06% loss in power for the mediated effect; with 50% missingness, a 6.02% loss; and with 80% missingness, only an 11.86% loss. Method 2 exhibited a 20.72% power loss at 80% missingness, even though the total amount of data utilized was the same as for Method 1. Methods 3-5 exhibited greater power loss. Compared to an average power loss of 11.55% across all levels of missingness for Method 1, average power losses for Methods 3, 4, and 5 were 23.87%, 29.35%, and 32.40%, respectively. In conclusion, planned missingness in a multiple mediator design may permit higher quality characterization of the mediator construct at feasible cost.
Doctoral Dissertation, Psychology, 2015.
Keywords: MCAR; missing completely at random; planned missing data; statistical mediation
4	Auxiliary variables a weight against nonresponse bias : A simulation study
Lindberg, Mattias; Guban, Peter. January 2014.
Today’s surveys face a growing problem of increasing nonresponse. The rising nonresponse rate creates a need for better and more effective ways to reduce nonresponse bias. Today’s research on nonresponse follows three major orientations: the first examines social factors, the second studies different data collection methods, and the third investigates the use of weights to adjust estimators for nonresponse. We contribute to the third orientation by using simulations to evaluate estimators that use and adjust weights based on auxiliary variables to balance the survey nonresponse. For the simulation we use an artificial population of 35455 participants from the Representativity Indicators for Survey Quality project. We model three nonresponse mechanisms (MCAR, MAR, and MNAR) with three different coefficients of determination between our study variable and the auxiliary variables and under three response rates, resulting in 63 simulation scenarios. The scenarios are replicated 1000 times to obtain the results. We outline our findings and results for each estimator in all scenarios with the help of bias measures.
Keywords: Nonresponse; MAR; MCAR; MNAR; nonresponse bias; estimator; indicator; auxiliary variable; Probability Theory and Statistics
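The general idea of adjusting weights with an auxiliary variable can be sketched as a simple weighting-class adjustment. This is an illustrative assumption and not necessarily one of the estimators evaluated in the thesis: the two-group population, the response rates, and all names below are hypothetical, and the whole population is treated as the sample for brevity.

```python
import random


def make_population(n=35000, seed=0):
    """Hypothetical population: auxiliary group g (0/1) and study variable y."""
    rng = random.Random(seed)
    pop = []
    for _ in range(n):
        g = 1 if rng.random() < 0.4 else 0   # auxiliary variable, known for everyone
        y = rng.gauss(10.0 + 5.0 * g, 2.0)   # study variable related to g
        pop.append((g, y))
    return pop


def respond(pop, seed=1):
    """MAR nonresponse: the response probability depends only on the group."""
    rng = random.Random(seed)
    rates = {0: 0.8, 1: 0.4}                 # assumed response rates per group
    return [(g, y) for g, y in pop if rng.random() < rates[g]]


def naive_mean(resp):
    """Unadjusted respondent mean."""
    return sum(y for _, y in resp) / len(resp)


def weighting_class_mean(pop, resp):
    """Weight each respondent by (population count / respondent count) in its group."""
    groups = sorted({g for g, _ in pop})
    n_pop = {g: sum(1 for h, _ in pop if h == g) for g in groups}
    n_resp = {g: sum(1 for h, _ in resp if h == g) for g in groups}
    total = sum(n_pop[g] / n_resp[g] * y for g, y in resp)
    return total / len(pop)


if __name__ == "__main__":
    pop = make_population()
    resp = respond(pop)
    true_mean = sum(y for _, y in pop) / len(pop)
    print(round(true_mean, 2), round(naive_mean(resp), 2),
          round(weighting_class_mean(pop, resp), 2))
```

Because the group with the higher values of y responds less often in this setup, the unadjusted respondent mean falls below the population mean, while the weighted mean should land close to it. The adjustment helps here only because the assumed nonresponse depends on the auxiliary variable alone.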
5	Om imputationsmetoder i statistisk analys : En simuleringsstudie om bortfallshantering och påverkan i en regressionsanalys / On imputation methods in statistical analysis: A simulation study of missing data handling and its impact on a regression analysis
Ericsson, Axel; Zetterberg, Johannes. January 2020.
Missing data are a common problem when carrying out statistical analyses. The study examines how the choice of method for handling missing data affects a regression analysis estimated from a data set with randomly missing data. Using simulated data and a Monte Carlo simulation, the results are compared across different amounts of missing data. The missing values are handled with five methods: imputation by mean, stochastic regression, random forest, and predictive mean matching (PMM), as well as estimation with complete case analysis (CCA). The handling of missing data is evaluated through the models’ goodness of fit and the bias of the beta coefficient. The study shows that CCA and imputation with stochastic regression, random forest, and PMM can estimate the beta coefficient without bias, but that the bias increases with a larger proportion of missing data; in addition, the risk of committing a type I error is affected.
Keywords: simulation study; missing data; MCAR; Monte Carlo; regression analysis; Probability Theory and Statistics
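Three of the methods named in this abstract (complete case analysis, mean imputation, and stochastic regression imputation) can be compared in a small Python sketch. It is not the thesis's simulation: placing the MCAR missingness in the outcome y, the true slope of 0.6, the 30% missing rate, and all function names are assumptions made for illustration.

```python
import math
import random


def make_data(n=20000, seed=0):
    """Simulate (x, y) with true slope 0.6."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        data.append((x, 0.6 * x + rng.gauss(0.0, 0.8)))
    return data


def make_mcar(data, rate=0.3, seed=1):
    """Delete y completely at random with the given probability."""
    rng = random.Random(seed)
    return [(x, None if rng.random() < rate else y) for x, y in data]


def complete_cases(rows):
    """Complete case analysis: drop rows with a missing y."""
    return [(x, y) for x, y in rows if y is not None]


def fit_line(rows):
    """Least squares of y on x: (intercept, slope, residual standard deviation)."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in rows)
    return intercept, slope, math.sqrt(rss / (n - 2))


def mean_impute(rows):
    """Replace each missing y with the mean of the observed y values."""
    observed = [y for _, y in rows if y is not None]
    m = sum(observed) / len(observed)
    return [(x, m if y is None else y) for x, y in rows]


def stochastic_regression_impute(rows, seed=2):
    """Replace each missing y with a regression prediction plus random noise."""
    rng = random.Random(seed)
    a, b, s = fit_line(complete_cases(rows))
    return [(x, (a + b * x + rng.gauss(0.0, s)) if y is None else y)
            for x, y in rows]


if __name__ == "__main__":
    rows = make_mcar(make_data())
    print("CCA slope:", round(fit_line(complete_cases(rows))[1], 3))
    print("mean imputation slope:", round(fit_line(mean_impute(rows))[1], 3))
    print("stochastic regression slope:",
          round(fit_line(stochastic_regression_impute(rows))[1], 3))
```

In this toy setup, mean imputation of the outcome shrinks the estimated slope roughly in proportion to the share of observed values, whereas complete case analysis and stochastic regression imputation should both stay close to the true slope.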
6	Missing Data Imputation Method Comparison in Ohio University Student Retention Database
Hening, Dyah A. January 2009.
No description available.
Keywords: Higher Education; Industrial Engineering; Data Imputation; Missing data; MNAR; MCAR; MAR; student retention
7	Análise de dados categorizados com omissão / Analysis of categorical data with missingness
Poleto, Frederico Zanqueta. 30 August 2006.
We consider theoretical, computational and applied aspects of classical categorical data analyses with missingness. We present a literature review while introducing the missingness mechanisms, highlighting their characteristics and implications for the inferences of interest by means of an example involving two binary responses and simulation studies. We extend the multinomial modeling scenario described in Paulino (1991, Brazilian Journal of Probability and Statistics 5, 1-42) to the product-multinomial setup to allow for the inclusion of explanatory variables. We develop the results in matrix formulation and implement the computational procedures via subroutines written for the R statistical environment. We illustrate the application of the theory by means of five examples with different characteristics, fitting structural linear (marginal homogeneity), log-linear (independence, constant adjacent odds ratio) and functional linear models (kappa, weighted kappa, sensitivity/specificity, positive/negative predictive value) for the marginal probabilities. The missingness patterns include missingness in one or two variables and confounded neighboring cells, with or without explanatory variables.
Keywords: categorical data; ignorable mechanism; incomplete data; MAR; MCAR; missing data; MNAR; non-ignorable mechanism; selection models
9	Análise de dados categorizados com omissão em variáveis explicativas e respostas / Categorical data analysis with missingness in explanatory and response variables
Poleto, Frederico Zanqueta. 08 April 2011.
We present methodological developments for analyzing data with missingness, as well as studies designed to understand the results of such analyses. We examine Bayesian and classical sensitivity analyses for data with missing categorical responses and show that the subjective components of each approach can influence results in non-trivial ways, irrespective of the sample size, concluding that they need to be carefully evaluated. Specifically, we show that prior distributions commonly regarded as slightly informative or non-informative may actually be too informative for non-identifiable parameters, and that the choice of over-parameterized models may drastically impact the results. When there is missingness in explanatory variables, we also need to consider a marginal model for the covariates even if the interest lies only in the conditional model. An incorrect specification of either the model for the covariates or the model for the missingness mechanism leads to biased inferences for the parameters of interest. Previously published works commonly fall into two streams: either they use flexible semi-/non-parametric distributions for the covariates and identify the model via a non-informative missingness mechanism, or they employ parametric distributions for the covariates and allow a more general informative missingness mechanism.
We consider the analysis of binary responses, combining an informative missingness model with a non-parametric model for the continuous covariates via a Dirichlet process mixture. When the interest lies only in moments of the response distribution, we consider a new classical sensitivity analysis for incomplete responses that avoids distributional assumptions and employs easily interpreted sensitivity parameters. The procedure is particularly useful for analyses of missing continuous data, an area that traditionally assumes normality and/or relies on hard-to-interpret sensitivity parameters. We illustrate all analyses with real data sets.
Keywords: Dirichlet process; identifiability; ignorance and uncertainty intervals; incomplete or missing data; MAR, MCAR and MNAR; overparameterization; selection and pattern-mixture models; sensitivity analysis
