About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Praleistų reikšmių įrašymo metodų efektyvumas turizmo tyrime / Efficiency of missing data imputation methods in the survey on tourism

Binkytė, Kristina 08 September 2009 (has links)
In this work we examined several missing data imputation methods, applied to the first two items of question 2.6 of the outbound tourism survey: package tour expenses and transport expenses. The efficiency of the imputation methods was analysed on complete data sets, in which missing values were created artificially and then filled in with the various imputation methods, so that parameter estimates from the true and the imputed values could be compared. Since missing values can arise both at random and not at random, the imputation methods were applied in three cases: when values are missing at random, and when non-response comes from the respondents with the highest, or with the lowest, travel expenses. For the imputation we used distribution-based, mean, random repetition, ratio and multiple imputation methods, both without and with imputation classes. We propose to carry out the same efficiency study for the remaining items of question 2.6, in order to identify the most suitable imputation method and then apply it to the real missing values (the present analysis was performed on artificially created missing data). The variance component introduced by the imputation should also be taken into account, since its contribution to the total variance estimate is considerable. Once the missing values have been imputed, the standard estimation procedures can be applied and no other information is lost, as would be the case if questionnaires with missing values were discarded.
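A minimal sketch of the evaluation loop described above, assuming made-up expense data, a single auxiliary variable (trip length) and only two of the methods (mean and ratio imputation); the variable names, the 20 % missingness rate and the data itself are illustrative assumptions, not taken from the survey.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical complete survey data: transport expenses and nights abroad.
data = pd.DataFrame({"nights": rng.integers(1, 15, size=500)})
data["transport"] = 40 * data["nights"] + rng.normal(20, 30, size=500)
true_mean = data["transport"].mean()

# Two ways of creating fictitious missingness: completely at random,
# and non-random (respondents with the largest expenses do not answer).
masks = {
    "random": pd.Series(rng.random(len(data)) < 0.2),
    "largest missing": data["transport"].rank(pct=True) > 0.8,
}

for case, mask in masks.items():
    obs = data[~mask]

    # Mean imputation: every missing value gets the observed mean.
    mean_imp = np.where(mask, obs["transport"].mean(), data["transport"])

    # Ratio imputation: use the auxiliary variable known for all respondents.
    ratio = obs["transport"].sum() / obs["nights"].sum()
    ratio_imp = np.where(mask, ratio * data["nights"], data["transport"])

    print(f"{case}: true mean {true_mean:.1f}, "
          f"mean-imputed {mean_imp.mean():.1f}, ratio-imputed {ratio_imp.mean():.1f}")
```

Comparing the imputed means against the true mean under the non-random scenario shows why the choice of method matters most when the largest expenses go unreported.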
42

Praleistų reikšmių įrašymo metodų efektyvumas turizmo tyrime / Efficiency of missing data imputation methods in the survey on tourism

Šležaitė, Gintvilė 08 September 2009 (has links)
In this work we examined several missing data imputation methods, applied to the first two items of question 2.6 of the outbound tourism survey: package tour expenses and transport expenses. The efficiency of the imputation methods was analysed on complete data sets, in which missing values were created artificially and then filled in with the various imputation methods, so that parameter estimates from the true and the imputed values could be compared. Since missing values can arise both at random and not at random, the imputation methods were applied in three cases: when values are missing at random, and when non-response comes from the respondents with the highest, or with the lowest, travel expenses. For the imputation we used distribution-based, mean, random repetition, ratio and multiple imputation methods, both without and with imputation classes. We propose to carry out the same efficiency study for the remaining items of question 2.6, in order to identify the most suitable imputation method and then apply it to the real missing values (the present analysis was performed on artificially created missing data). The variance component introduced by the imputation should also be taken into account, since its contribution to the total variance estimate is considerable. Once the missing values have been imputed, the standard estimation procedures can be applied and no other information is lost, as would be the case if questionnaires with missing values were discarded.
43

Topics in Association Rules

Shaikh, Mateen 21 June 2013 (has links)
Association rules are a useful concept in data mining with the goal of summarizing the strong patterns that exist in data. We have identified several issues in mining association rules and addressed them in three main areas. The first area we explore is standardized interestingness measures. Different interestingness measures exist on different ranges, and interpreting them can be subtly problematic. We standardize several interestingness measures and show how these are useful to consider in association rule mining in three examples. A second area we address is incomplete transactions. By applying statistical methods in new ways to association rules, we provide a more comprehensive means of analyzing incomplete transactions. We also describe how to find families of distributions for interestingness measure values when transactions are incomplete. Finally, we address the common result of mining: a plethora of association rules. Unlike methods which attempt to reduce the number of resulting rules, we harness this large quantity to find a higher-level set of patterns. / NSERC Discovery Grant and OMRI Early Researcher Award
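The interestingness measures this abstract refers to can be illustrated with the three most common ones; the transactions below are a made-up toy database, and the thesis's standardization scheme is not reproduced.

```python
from itertools import combinations

# Toy transaction database (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk"},
    {"bread", "milk", "jam"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

# Enumerate one-to-one rules A -> B and report three classic interestingness
# measures.  Note their different ranges: support and confidence lie in [0, 1],
# lift in [0, inf), which is exactly why comparing raw values is awkward.
items = sorted(set().union(*transactions))
for a, b in combinations(items, 2):
    s_ab = support({a, b})
    if s_ab == 0:
        continue
    confidence = s_ab / support({a})
    lift = confidence / support({b})
    print(f"{a} -> {b}: support={s_ab:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```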
44

Fehlende Daten in Additiven Modellen / Missing data in additive models

Nittner, Thomas. January 2003 (has links) (PDF)
University dissertation, München, 2003. / Summary in English.
45

Impact of data quality on photovoltaic (PV) performance assessment

Koubli, Eleni January 2017 (has links)
In this work, data quality control and mitigation tools have been developed for improving the accuracy of photovoltaic (PV) system performance assessment. These tools make it possible to demonstrate the impact of ignoring erroneous or lost data on performance evaluation and fault detection. The work mainly focuses on residential PV systems, where monitoring is limited to recording total generation and the lack of meteorological data makes quality control truly challenging. The main quality issues addressed in this work concern wrong system descriptions and missing electrical and/or meteorological data in monitoring. An automatic detection of wrong input information, such as system nominal capacity and azimuth, is developed, based on statistical distributions of annual figures of PV system performance ratio (PR) and final yield. This approach is specifically useful in carrying out PV fleet analyses where only monthly or annual energy outputs are available. The evaluation is carried out with synthetic weather data obtained by interpolating from a network of about 80 meteorological monitoring stations operated by the UK Meteorological Office. The procedures are used on a large domestic PV dataset, obtained from a social housing organisation, in which a significant number of cases with wrong input information are found. Data interruption is identified as another challenge in PV monitoring data, although its effect is particularly under-researched in the area of PV. Disregarding missing energy generation data leads to falsely estimated performance figures, which consequently may lead to false alarms on performance and/or failure to meet the requirements for the financial revenue of a domestic system through the feed-in-tariff scheme. In this work, the effect of missing data is mitigated by applying novel data inference methods based on empirical and artificial neural network approaches, training algorithms and remotely inferred weather data. Various cases of data loss are considered, and case studies from the CREST monitoring system and the domestic dataset are used as test cases. When using back-filled energy output, monthly PR estimation yields more accurate results than when including prolonged data gaps in the analysis. Finally, to further discriminate more obscure data issues from system faults when higher temporal resolution data are available, a remote modelling and failure detection framework is developed based on a physical electrical model, remote input weather data and a system description extracted from PV module and inverter manufacturer datasheets. The failure detection is based on the analysis of daily profiles and long-term PR comparison of neighbouring PV systems. By employing this tool on various case studies, it is seen that undetected wrong data may severely obscure fault detection, affecting a PV system's lifetime. Based on the results and conclusions of this work on the employed residential dataset, essential data requirements for domestic PV monitoring are introduced as a potential contribution to existing lessons learnt in PV monitoring.
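For reference, the performance ratio figure that this screening relies on can be sketched as below; the formula follows the common definition (final yield over reference yield), and all numbers are invented.

```python
def performance_ratio(energy_kwh, p_nominal_kwp, irradiation_kwh_m2,
                      g_stc_kw_m2=1.0):
    """Performance ratio (PR): final yield (kWh generated per kWp installed)
    divided by reference yield (in-plane irradiation normalised to the
    standard-test-condition irradiance of 1 kW/m2)."""
    final_yield = energy_kwh / p_nominal_kwp            # kWh/kWp
    reference_yield = irradiation_kwh_m2 / g_stc_kw_m2  # equivalent sun hours
    return final_yield / reference_yield

# Hypothetical monthly figures for a 3 kWp residential system.
pr = performance_ratio(energy_kwh=310, p_nominal_kwp=3.0, irradiation_kwh_m2=125)
print(f"PR = {pr:.2f}")

# If the nominal capacity in the system description were wrongly recorded as
# 2 kWp, the same energy reading would give an implausibly high PR -- the kind
# of statistical outlier used to flag wrong input information.
print(f"PR with wrong capacity = {performance_ratio(310, 2.0, 125):.2f}")
```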
46

Multilevel multiple imputation: An examination of competing methods

January 2015 (has links)
Missing data are common in psychology research and can lead to bias and reduced power if not properly handled. Multiple imputation is a state-of-the-art missing data method recommended by methodologists. Multiple imputation methods can generally be divided into two broad categories: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution (e.g., multivariate normal). FCS, on the other hand, imputes variables one at a time, drawing missing values from a series of univariate distributions. In the single-level context, these two approaches have been shown to be equivalent with multivariate normal data. However, less is known about the similarities and differences of these two approaches with multilevel data, and the methodological literature provides no insight into the situations under which the approaches would produce identical results. This document examined five multilevel multiple imputation approaches (three JM methods and two FCS methods) that have been proposed in the literature. An analytic section shows that only two of the methods (one JM method and one FCS method) used imputation models equivalent to a two-level joint population model that contained random intercepts and different associations across levels. The other three methods employed imputation models that differed from the population model primarily in their ability to preserve distinct level-1 and level-2 covariances. I verified the analytic work with computer simulations, and the simulation results also showed that imputation models that failed to preserve level-specific covariances produced biased estimates. The studies also highlighted conditions that exacerbated the amount of bias produced (e.g., bias was greater for conditions with small cluster sizes). The analytic work and simulations lead to a number of practical recommendations for researchers. / Dissertation/Thesis / Doctoral Dissertation Psychology 2015
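A single-level, numeric-data caricature of the FCS idea, to make the JM/FCS distinction concrete: FCS cycles through univariate conditional models, whereas JM would replace the whole loop with one draw from a multivariate (e.g. multivariate normal) distribution. The multilevel variants examined in the dissertation additionally include cluster-specific random effects in each conditional model, which this sketch omits.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fcs_impute(X, n_iter=10, seed=0):
    """Fill NaNs in a numeric array by fully conditional specification:
    each incomplete column is repeatedly imputed from a univariate
    regression on all other columns."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    miss = np.isnan(X)
    # Start from column means.
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            others = np.delete(X, j, axis=1)
            model = LinearRegression().fit(others[~miss[:, j]], X[~miss[:, j], j])
            resid_sd = np.std(X[~miss[:, j], j] - model.predict(others[~miss[:, j]]))
            # Draw imputations (prediction + noise) rather than plugging in
            # the conditional mean, so imputation uncertainty is not erased.
            X[miss[:, j], j] = (model.predict(others[miss[:, j]])
                                + rng.normal(0, resid_sd, size=miss[:, j].sum()))
    return X
```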
47

Investigation of Multiple Imputation Methods for Categorical Variables

Miranda, Samantha 01 May 2020 (has links)
We compare different multiple imputation methods for categorical variables using the MICE package in R. We take a complete data set, delete values at different levels of missingness, and evaluate the imputation methods at each level of missingness. Logistic regression imputation and linear discriminant analysis (LDA) are used for binary variables. Multinomial logit imputation and LDA are used for nominal variables, while ordered logit imputation and LDA are used for ordinal variables. After imputation, the regression coefficients, percent deviation index (PDI) values, and relative frequency tables were found for each imputed data set at each level of missingness and compared to the corresponding complete data set. It was found that logistic regression outperformed LDA for binary variables, and LDA outperformed both multinomial logit imputation and ordered logit imputation for nominal and ordinal variables. Simulations were run to confirm the validity of the results.
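A compressed, single-imputation version of the binary-variable comparison, on synthetic data; the thesis itself uses the mice package in R with multiple imputations and also inspects regression coefficients and PDI values, none of which are reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)

# Synthetic complete data: one binary variable, two numeric predictors.
n = 1000
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=n) > 0).astype(int)
true_freq = y.mean()

for rate in (0.1, 0.3, 0.5):                     # levels of missingness
    miss = rng.random(n) < rate                  # delete values at random
    for name, model in [("logistic regression", LogisticRegression()),
                        ("LDA", LinearDiscriminantAnalysis())]:
        model.fit(X[~miss], y[~miss])
        p = model.predict_proba(X[miss])[:, 1]
        y_imp = y.copy()
        # Impute by drawing from the predicted class probabilities.
        y_imp[miss] = (rng.random(miss.sum()) < p).astype(int)
        print(f"{rate:.0%} missing, {name}: relative frequency of 1s "
              f"{y_imp.mean():.3f} (complete data: {true_freq:.3f})")
```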
48

DOES IT MATTER HOW WE GO WRONG? : The role of model misspecification and study design in assessing the performance of doubly robust estimators / Spelar det roll HUR vi gör fel? : Betydelsen av studiedesign och felspecificering av modeller när man utvärderar prestationen av dubbelt robusta estimatorer

Ecker, Kreske January 2017 (has links)
This thesis concerns doubly robust (DR) estimation in missing data contexts. Previous research is not unanimous as to which estimators perform best and in which situations DR is to be preferred over other estimators. We observe that the conditions surrounding comparisons of DR and other estimators vary between different previous studies. We therefore focus on the effects of three distinct aspects of study design on the performance of one DR estimator in comparison to outcome regression (OR). These aspects are sample size, the way in which models are misspecified, and the degree of association between the covariates and propensities. We find that while there are no drastic effects of the type of model misspecification, all three aspects do affect how DR compares to OR. The results can be used to better understand the divergent conclusions of previous research.
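The general doubly robust idea can be written down compactly as an augmented inverse-probability-weighted (AIPW) mean: it is consistent if either the outcome regression or the response propensity model is correctly specified. This is a generic textbook sketch on made-up data, not the specific estimator or simulation design evaluated in the thesis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def dr_mean(X, y_obs, observed):
    """Doubly robust (AIPW) estimate of E[Y] when Y is missing for some units:
    outcome regression m(X) plus an inverse-propensity-weighted correction."""
    m = LinearRegression().fit(X[observed], y_obs[observed]).predict(X)
    pi = LogisticRegression().fit(X, observed.astype(int)).predict_proba(X)[:, 1]
    correction = np.zeros(len(m))
    correction[observed] = (y_obs[observed] - m[observed]) / pi[observed]
    return float(np.mean(m + correction))

# Illustrative data where the response probability depends on a covariate (MAR).
rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 2))
y = 1 + X @ np.array([2.0, -1.0]) + rng.normal(size=2000)
observed = rng.random(2000) < 1 / (1 + np.exp(-(0.5 + X[:, 0])))
print(f"DR estimate of E[Y]: {dr_mean(X, y, observed):.2f}  (truth: 1.00)")
```

Deliberately misspecifying either the outcome model or the propensity model in such a setup is the kind of experiment the thesis uses to study how the *way* of going wrong matters.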
49

Informative censoring with an imprecise anchor event: estimation of change over time and implications for longitudinal data analysis

Collins, Jamie Elizabeth 22 January 2016 (has links)
A number of methods have been developed to analyze longitudinal data with dropout; however, there is no uniformly accepted approach. Model performance, in terms of the bias and accuracy of the estimator, depends on the underlying missing data mechanism, and it is unclear how existing methods will perform when little is known about that mechanism. Here we evaluate methods for estimating change over time in longitudinal studies with informative dropout in three settings: using a linear mixed effects (LME) estimator in the presence of multiple types of dropout; proposing an update to the pattern mixture modeling (PMM) approach in the presence of imprecision in identifying informative dropouts; and utilizing this new approach in the presence of a prognostic factor-by-dropout interaction. We demonstrate that the amount of dropout, the proportion of dropout that is informative, and the variability in outcome all affect the performance of an LME estimator in data with a mixture of informative and non-informative dropout. When the amount of dropout is moderate to large (>20% overall), the potential for relative bias greater than 10% increases, especially with large variability in the outcome measure, even under scenarios where only a portion of the dropouts are informative. Under conditions where LME models do not perform well, it is necessary to take the missing data mechanism into account. We develop a method that extends the PMM approach to account for uncertainty in identifying informative dropouts. In scenarios with this uncertainty, the proposed method outperformed the traditional method in terms of bias and coverage. In the presence of an interaction between dropout and a prognostic factor, the LME model performed poorly, in terms of bias and coverage, in estimating prognostic factor-specific slopes and the interaction between the prognostic factor and time. The update to the PMM approach proposed here outperformed both the LME and the traditional PMM. Our work suggests that investigators must be cautious with any analysis of data with informative dropout. We found that particular attention must be paid to the model assumptions when the missing data mechanism is not well understood.
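As a rough illustration of the basic pattern-mixture step that the proposed update builds on: fit the growth model separately within dropout patterns and pool the pattern-specific slopes by their observed proportions. The long-format columns (id, time, y, informative_dropout) are assumed names, and the thesis's handling of imprecisely identified informative dropouts goes well beyond this sketch.

```python
import numpy as np
import statsmodels.formula.api as smf

def pattern_mixture_slope(df):
    """Crude pattern-mixture estimate of the mean slope: fit a linear mixed
    model (random intercept per subject) within each dropout pattern, then
    pool the pattern-specific time slopes weighted by pattern size."""
    slopes, weights = [], []
    for _, sub in df.groupby("informative_dropout"):
        fit = smf.mixedlm("y ~ time", sub, groups=sub["id"]).fit()
        slopes.append(fit.params["time"])
        weights.append(sub["id"].nunique())
    return np.average(slopes, weights=weights)
```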
50

Hydrological data interpolation using entropy

Ilunga, Masengo 17 November 2006 (has links)
Faculty of Engineering and the Built Environment, School of Civil and Environmental Engineering, 0105772w, imasengo@yahoo.com / The problem of missing data, insufficient length of hydrological data series and poor quality is common in developing countries, and much more prevalent there than in developed countries. This situation can severely affect the outcome of water system managers' decisions (e.g. reliability of the design, establishment of operating policies for water supply, etc.). Thus, numerous data interpolation (infilling) techniques have evolved in hydrology to deal with missing data. The current study presents a methodology that combines different approaches for coping with missing (limited) hydrological data using the theories of entropy, artificial neural networks (ANN) and expectation-maximization (EM) techniques. This methodology is formulated into a model named the ENANNEX model. The study does not use any physical characteristics of the catchment areas but deals only with the limited information (e.g. streamflow or rainfall) at the target gauge and its similar nearby base gauge(s). The entropy concept was confirmed to be a versatile tool. It was first used for quantifying the information content of hydrological variables (e.g. rainfall or streamflow). The same concept (through the directional information transfer index, DIT) was used in the selection of the base/subject gauge. Finally, the DIT notion was also extended to the evaluation of the performance of the hydrological data infilling techniques (i.e. the ANN and EM techniques). The methodology was applied to annual total rainfall, annual mean flow, annual maximum flow and 6-month mean flow series of selected catchments in drainage region D "Orange" of South Africa. These data regimes can be regarded as useful for design-oriented studies, flood studies, water balance studies, etc. The results from the case studies showed that the DIT is as good an index for infilling technique selection as other criteria (e.g. statistical and graphical), with the additional feature of being a non-dimensional informational index. The data interpolation techniques, viz. ANNs and EM (existing methods applied and not yet applied in hydrology), and their new features are also presented. This study showed that the standard techniques (e.g. backpropagation, BP, and EM) as well as their respective variants could be selected in the missing hydrological data estimation process. The capability of the different data interpolation techniques to maintain the statistical characteristics (e.g. mean, variance) of the target gauge was not neglected. The relationship between the accuracy of the estimated series (obtained by applying a data infilling technique) and the gap duration was then investigated through the DIT notion. It was shown that a decay (power or exponential) function could better describe that relationship; in other words, the amount of uncertainty removed from the target station in a station pair, via a given technique, can be known for a given gap duration. It was noticed that the performance of the different techniques depends on the gap duration at the target gauge, the station pair involved in the missing data estimation and the type of data regime.
This study also showed that it was possible, through the entropy approach, to make a preliminary assessment of model performance for simulating runoff data at a site where absolutely no record exists: a case study was conducted at the Bedford site (in South Africa). Two simulation models, viz. the RAFLER and WRSM2000 models, were assessed in this respect. Both models were found suitable for simulating flows at Bedford.
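The directional information transfer (DIT) index used throughout can be sketched as the transinformation between two gauges divided by the marginal entropy of the target series, estimated here from a simple two-dimensional histogram; the bin count and the synthetic series are assumptions for illustration only.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def dit(target, base, bins=10):
    """Directional information transfer from base to target:
    DIT = T(target, base) / H(target), i.e. the share of the target gauge's
    uncertainty removed by knowing the base gauge."""
    joint, _, _ = np.histogram2d(target, base, bins=bins)
    joint = joint / joint.sum()
    p_t, p_b = joint.sum(axis=1), joint.sum(axis=0)
    transinfo = entropy(p_t) + entropy(p_b) - entropy(joint.ravel())
    return transinfo / entropy(p_t)

# Two synthetic, correlated annual rainfall series standing in for a
# target gauge and a nearby base gauge.
rng = np.random.default_rng(3)
base = rng.gamma(shape=4.0, scale=150.0, size=60)
target = 0.8 * base + rng.normal(scale=80.0, size=60)
print(f"DIT(target | base) = {dit(target, base):.2f}")
```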
