31 |
Praleistų reikšmių įrašymo metodų efektyvumas turizmo tyrime / Efficiency of missing data imputation methods in the survey on tourism
Binkytė, Kristina 08 September 2009
In this work we examined several missing data imputation methods, applied to the first two items of question 2.6 of the statistical survey on outbound tourism: package tour expenses and transport expenses. We analysed the efficiency of the imputation methods on complete data sets, artificially creating missing values and filling them in with several imputation methods. Having both the true and the imputed values, we could compare the parameter estimates.
Missing data can arise randomly or non-randomly, so we applied the imputation methods in three cases: when values go missing randomly, when respondents with the highest travel expenses do not respond, and when respondents with the lowest travel expenses do not respond. We applied distribution-based, mean, random, ratio and multiple imputation, both with and without imputation classes. We propose carrying out the same efficiency study for the remaining items of question 2.6 in order to determine the most suitable imputation method and then apply it to the real missing data (the current analysis used artificially created missing data). The variance due to imputation should also be taken into account, because its contribution to the overall variance estimate is considerable. Once the missing data have been imputed, the usual parameter estimation procedures can be applied, and no other information is lost, as it would be if questionnaires with missing data were discarded.
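The evaluation design described in this abstract (delete values from complete data under a chosen mechanism, impute, compare with the truth) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: the data are simulated, the auxiliary variable `x` (nights away) and all function names are hypothetical, and only mean and ratio imputation without imputation classes are shown.

```python
import random
import statistics


def choose_missing(y, mechanism, rate, rng):
    """Pick indices to blank out: at random, or the highest/lowest spenders."""
    n = len(y)
    k = round(rate * n)
    if mechanism == "random":
        return set(rng.sample(range(n), k))
    order = sorted(range(n), key=lambda i: y[i])  # ascending by expense
    if mechanism == "highest":
        return set(order[n - k:])
    if mechanism == "lowest":
        return set(order[:k])
    raise ValueError(f"unknown mechanism: {mechanism}")


def mean_impute(y, missing):
    """Fill every missing value with the mean of the observed values."""
    fill = statistics.fmean(v for i, v in enumerate(y) if i not in missing)
    return [fill if i in missing else v for i, v in enumerate(y)]


def ratio_impute(y, x, missing):
    """Fill missing y_i with R * x_i, where R = sum(observed y) / sum(their x)."""
    observed = [i for i in range(len(y)) if i not in missing]
    r = sum(y[i] for i in observed) / sum(x[i] for i in observed)
    return [r * x[i] if i in missing else y[i] for i in range(len(y))]


def relative_bias(imputed, truth):
    """Relative bias of the estimated mean against the true mean."""
    true_mean = statistics.fmean(truth)
    return (statistics.fmean(imputed) - true_mean) / true_mean


def simulate(seed=0, n=400, rate=0.3):
    """Compare the two methods under the three missingness cases (assumed data)."""
    rng = random.Random(seed)
    x = [rng.randint(1, 14) for _ in range(n)]      # hypothetical: nights away
    y = [50.0 * xi + rng.gauss(0, 10) for xi in x]  # hypothetical: expenses
    results = {}
    for mechanism in ("random", "highest", "lowest"):
        missing = choose_missing(y, mechanism, rate, rng)
        results[mechanism] = {
            "mean": relative_bias(mean_impute(y, missing), y),
            "ratio": relative_bias(ratio_impute(y, x, missing), y),
        }
    return results


if __name__ == "__main__":
    for mechanism, biases in simulate().items():
        print(mechanism, biases)
```

When the highest spenders are the ones who fail to respond, mean imputation necessarily underestimates the mean, while an imputation that uses a well-correlated auxiliary variable can recover much of the lost information; that contrast is the kind of comparison the study reports.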
|
32 |
Praleistų reikšmių įrašymo metodų efektyvumas turizmo tyrime / Efficiency of missing data imputation methods in the survey on tourism
Šležaitė, Gintvilė 08 September 2009
In this work we examined several missing data imputation methods, applied to the first two items of question 2.6 of the statistical survey on outbound tourism: package tour expenses and transport expenses. We analysed the efficiency of the imputation methods on complete data sets, artificially creating missing values and filling them in with several imputation methods. Having both the true and the imputed values, we could compare the parameter estimates.
Missing data can arise randomly or non-randomly, so we applied the imputation methods in three cases: when values go missing randomly, when respondents with the highest travel expenses do not respond, and when respondents with the lowest travel expenses do not respond. We applied distribution-based, mean, random, ratio and multiple imputation, both with and without imputation classes. We propose carrying out the same efficiency study for the remaining items of question 2.6 in order to determine the most suitable imputation method and then apply it to the real missing data (the current analysis used artificially created missing data). The variance due to imputation should also be taken into account, because its contribution to the overall variance estimate is considerable. Once the missing data have been imputed, the usual parameter estimation procedures can be applied, and no other information is lost, as it would be if questionnaires with missing data were discarded.
|
33 |
Topics in Association Rules
Shaikh, Mateen 21 June 2013
Association rules are a useful concept in data mining with the goal of summarizing the strong patterns that exist in data. We have identified several issues in mining association rules and addressed them in three main areas. The first area we explore is standardized interestingness measures. Different interestingness measures exist on different ranges, and interpreting them can be subtly problematic. We standardize several interestingness measures and show how these are useful to consider in association rule mining in three examples. A second area we address is incomplete transactions. By applying statistical methods in new ways to association rules, we provide a more comprehensive means of analyzing incomplete transactions. We also describe how to find families of distributions for interestingness measure values when transactions are incomplete. Finally, we address the common result of mining: a plethora of association rules. Unlike methods which attempt to reduce the number of resulting rules, we harness this large quantity to find a higher-level set of patterns. / NSERC Discovery Grant and OMRI Early Researcher Award
|
34 |
Fehlende Daten in Additiven Modellen
Nittner, Thomas January 2003
University dissertation, Munich, 2003. / Summary in English.
|
35 |
Multilevel multiple imputation: An examination of competing methods
January 2015
abstract: Missing data are common in psychology research and can lead to bias and reduced power if not properly handled. Multiple imputation is a state-of-the-art missing data method recommended by methodologists. Multiple imputation methods can generally be divided into two broad categories: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution (e.g., multivariate normal). FCS, on the other hand, imputes variables one at a time, drawing missing values from a series of univariate distributions. In the single-level context, these two approaches have been shown to be equivalent with multivariate normal data. However, less is known about the similarities and differences of these two approaches with multilevel data, and the methodological literature provides no insight into the situations under which the approaches would produce identical results. This document examined five multilevel multiple imputation approaches (three JM methods and two FCS methods) that have been proposed in the literature. An analytic section shows that only two of the methods (one JM method and one FCS method) used imputation models equivalent to a two-level joint population model that contained random intercepts and different associations across levels. The other three methods employed imputation models that differed from the population model primarily in their ability to preserve distinct level-1 and level-2 covariances. I verified the analytic work with computer simulations, and the simulation results also showed that imputation models that failed to preserve level-specific covariances produced biased estimates. The studies also highlighted conditions that exacerbated the amount of bias produced (e.g., bias was greater for conditions with small cluster sizes). The analytic work and simulations lead to a number of practical recommendations for researchers. 
/ Dissertation/Thesis / Doctoral Dissertation Psychology 2015
|
36 |
DOES IT MATTER HOW WE GO WRONG? : The role of model misspecification and study design in assessing the performance of doubly robust estimators / Spelar det roll HUR vi gör fel? : Betydelsen av studiedesign och felspecificering av modeller när man utvärderar prestationen av dubbelt robusta estimatorer
Ecker, Kreske January 2017
This thesis concerns doubly robust (DR) estimation in missing data contexts. Previous research is not unanimous as to which estimators perform best and in which situations DR is to be preferred over other estimators. We observe that the conditions surrounding comparisons of DR and other estimators vary between different previous studies. We therefore focus on the effects of three distinct aspects of study design on the performance of one DR estimator in comparison to outcome regression (OR). These aspects are sample size, the way in which models are misspecified, and the degree of association between the covariates and propensities. We find that while there are no drastic effects of the type of model misspecification, all three aspects do affect how DR compares to OR. The results can be used to better understand the divergent conclusions of previous research.
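For orientation, one common form of the two estimators being compared can be written as below. The abstract does not say which DR estimator the thesis studies, so this is only the standard augmented inverse-probability-weighted form; the notation is an assumption: $R_i$ is the response indicator, $\hat{\pi}(X_i)$ the fitted propensity $P(R_i = 1 \mid X_i)$, and $\hat{m}(X_i)$ the fitted outcome regression.

```latex
\hat{\mu}_{\mathrm{OR}} = \frac{1}{n}\sum_{i=1}^{n} \hat{m}(X_i),
\qquad
\hat{\mu}_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}
  \left[ \hat{m}(X_i) + \frac{R_i}{\hat{\pi}(X_i)}\bigl(Y_i - \hat{m}(X_i)\bigr) \right]
```

The DR form stays consistent if either the propensity model or the outcome model is correctly specified, which is why the way in which the models are misspecified matters for the comparison with OR.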
|
37 |
Informative censoring with an imprecise anchor event: estimation of change over time and implications for longitudinal data analysis
Collins, Jamie Elizabeth 22 January 2016
A number of methods have been developed to analyze longitudinal data with dropout. However, there is no uniformly accepted approach. Model performance, in terms of the bias and accuracy of the estimator, depends on the underlying missing data mechanism and it is unclear how existing methods will perform when little is known about the missing data mechanism.
Here we evaluate methods for estimating change over time in longitudinal studies with informative dropout in three settings: using a linear mixed effect (LME) estimator in the presence of multiple types of dropout; proposing an update to the pattern mixture modeling (PMM) approach in the presence of imprecision in identifying informative dropouts; and utilizing this new approach in the presence of prognostic factor by dropout interaction.
We demonstrate that the amount of dropout, the proportion of dropout that is informative, and the variability in outcome all affect the performance of an LME estimator in data with a mixture of informative and non-informative dropout. When the amount of dropout is moderate to large (>20% overall), the potential for relative bias greater than 10% increases, especially with large variability in the outcome measure, even under scenarios where only a portion of the dropouts are informative.
Under conditions where LME models do not perform well, it is necessary to take the missing data mechanism into account. We develop a method that extends the PMM approach to account for uncertainty in identifying informative dropouts. In scenarios with this uncertainty, the proposed method outperformed the traditional method in terms of bias and coverage.
In the presence of interaction between dropout and a prognostic factor, the LME model performed poorly, in terms of bias and coverage, in estimating prognostic factor-specific slopes and the interaction between the prognostic factor and time. The update to the PMM approach, proposed here, outperformed both the LME and traditional PMM.
Our work suggests that investigators must be cautious with any analysis of data with informative dropout. We found that particular attention must be paid to the model assumptions when the missing data mechanism is not well understood.
|
38 |
On the Interpolation of Missing Dependent Variable Observations
Medvedeff, Alexander Mark 12 May 2008
No description available.
|
39 |
Using the EM Algorithm to Estimate the Difference in Dependent Proportions in a 2 x 2 Table with Missing Data.
Talla Souop, Alain Duclaux 18 August 2004
In this thesis, I am interested in estimating the difference between dependent proportions from a 2 × 2 contingency table when there are missing data. The Expectation-Maximization (EM) algorithm is used to obtain an estimate for the difference between correlated proportions. To obtain the standard error of this difference I employ a resampling technique known as bootstrapping. The performance of the bootstrap standard error is evaluated for different sample sizes and different fractions of missing information. Finally, a 100(1-α)% bootstrap confidence interval is proposed and its coverage is evaluated through simulation.
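The approach summarized in this abstract can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: paired binary responses are coded 0/1 with `None` for a missing response, missingness is treated as ignorable, and all function names are hypothetical. The EM step redistributes partially classified counts over the cells of the 2 × 2 table, and the standard error comes from resampling subjects.

```python
import random
import statistics


def tabulate(pairs):
    """Split (y1, y2) pairs into fully and partially classified counts."""
    full = [[0, 0], [0, 0]]   # full[i][j]: y1 == i and y2 == j both observed
    row_only = [0, 0]         # only y1 observed
    col_only = [0, 0]         # only y2 observed
    for y1, y2 in pairs:
        if y1 is not None and y2 is not None:
            full[y1][y2] += 1
        elif y1 is not None:
            row_only[y1] += 1
        elif y2 is not None:
            col_only[y2] += 1
    return full, row_only, col_only


def em_2x2(full, row_only, col_only, tol=1e-12, max_iter=10_000):
    """EM estimate of the four cell probabilities of the 2 x 2 table."""
    total = sum(map(sum, full)) + sum(row_only) + sum(col_only)
    if total == 0:
        raise ValueError("no observed responses")
    p = [[0.25, 0.25], [0.25, 0.25]]
    for _ in range(max_iter):
        # E-step: expected cell counts given the current probabilities.
        expected = [[float(full[i][j]) for j in range(2)] for i in range(2)]
        for i in range(2):
            row_p = p[i][0] + p[i][1]
            if row_only[i] and row_p > 0:
                for j in range(2):
                    expected[i][j] += row_only[i] * p[i][j] / row_p
        for j in range(2):
            col_p = p[0][j] + p[1][j]
            if col_only[j] and col_p > 0:
                for i in range(2):
                    expected[i][j] += col_only[j] * p[i][j] / col_p
        # M-step: cell probabilities are expected counts over the total.
        new_p = [[expected[i][j] / total for j in range(2)] for i in range(2)]
        change = max(abs(new_p[i][j] - p[i][j])
                     for i in range(2) for j in range(2))
        p = new_p
        if change < tol:
            break
    return p


def difference(p):
    """P(y1 = 1) - P(y2 = 1), which reduces to p[1][0] - p[0][1]."""
    return p[1][0] - p[0][1]


def bootstrap_se(pairs, replicates=200, seed=0):
    """Bootstrap standard error: resample subjects, re-run EM each time."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        sample = rng.choices(pairs, k=len(pairs))
        estimates.append(difference(em_2x2(*tabulate(sample))))
    return statistics.stdev(estimates)
```

A percentile confidence interval could be read off the same bootstrap replicates; the abstract does not specify which bootstrap interval the thesis proposes, so none is shown here.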
|
40 |
Causal discovery in the presence of missing data
Tu, Ruibo January 2018
Missing data are ubiquitous in many domains such as healthcare. Depending on how they are missing, the (conditional) independence relations in the observed data may differ from those in the complete data generated by the underlying causal process (which are not fully observable); as a consequence, simply applying existing causal discovery methods to the observed data may give wrong conclusions. It is therefore essential to extend existing causal discovery approaches to find the true underlying causal structure from such incomplete data. In this thesis, we aim to solve this problem for data that are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). With missingness mechanisms represented by the Missingness Graph, we present conditions under which additional correction is needed to derive conditional independence/dependence relations in the complete data. Combined with a correction method that gives closed-form, consistent tests of conditional independence, the proposed causal discovery method, an extension of the PC algorithm, is shown to give asymptotically correct results. Experimental results illustrate that, with further reasonable assumptions, the proposed algorithm can correct the conditional independence relations for values that are MCAR, MAR and, in rather general cases, MNAR.
|