171

變數遺漏值的多重插補應用於條件評估法 / Multiple imputation for missing covariates in contingent valuation survey

費詩元, Fei, Shih Yuan Unknown Date (has links)
In most studies of willingness to pay (WTP), missing data are treated as missing completely at random (MCAR) and simply deleted. However, when important variables in a study have a high proportion of missing values, this practice can bias the analysis. Income often plays an important role in contingent valuation surveys, and it is also one of the items respondents are most likely to leave unanswered. In this study we use simulation to evaluate the performance of multiple imputation (MI) for imputing missing income in WTP surveys. We consider three data situations: the complete cases remaining after deletion of missing data, singly imputed data, and multiply imputed data, and we compare them through analyses based on a three-component mixture model. The simulation results show that MI outperforms complete-case analysis, and its advantage grows as the missing rate increases. We also find MI to be more reliable and stable than single imputation. We therefore regard MI as a trustworthy and well-performing approach when the missing-data mechanism is not MCAR. In addition, we conduct an empirical analysis with data from the Cardio Vascular Disease risk FACtor Two-township Study (CVDFACTS), a long-term follow-up study of cardiovascular disease in the Chutung and Putzu areas. We demonstrate techniques for assessing the missing-data mechanism, including comparison of survival curves and logistic regression, and find that imputation does change the fitted models and the resulting estimates. / Most studies of willingness to pay (WTP) simply ignore the missing values and treat them as if they were missing completely at random. It is well known that such a practice might cause serious bias and lead to incorrect results. Income is one of the most influential variables in a contingent valuation (CV) study and is also the variable that respondents most often fail to report. In the present study, we evaluate the performance of multiple imputation (MI) on missing income in the analysis of WTP through a series of simulation experiments. Several approaches, including complete-case analysis, single imputation, and MI, are considered and compared. We show that MI always performs better than complete-case analysis, especially when the missing rate gets high. We also show that MI is more stable and reliable than single imputation. As an illustration, we use data from the Cardio Vascular Disease risk FACtor Two-township Study (CVDFACTS). We demonstrate how to assess the missing-data mechanism by comparing survival curves and by fitting a logistic regression model. Based on the empirical study, we find that discarding cases with missing income can lead to results different from those obtained with multiple imputation. If the discarded cases are not missing completely at random, the remaining sample will be biased, which can be a serious problem in CV research. To conclude, MI is a useful method for dealing with missing-value problems, and it should be worthwhile to give it a try in CV studies.
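As a rough illustration of the comparison being made (complete-case analysis versus multiple imputation when income is missing in a way that depends on WTP), the sketch below simulates a toy WTP survey. The variable names, the linear WTP model, and the use of scikit-learn's IterativeImputer are assumptions for illustration only; the thesis itself analyzes the data with a three-component mixture model, which is not reproduced here.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n, m = 2000, 10                       # respondents, number of imputations

# Toy survey: WTP depends on log income and age (true income coefficient = 2).
income = rng.lognormal(mean=10.0, sigma=0.5, size=n)
age = rng.uniform(20, 70, size=n)
wtp = 5.0 + 2.0 * np.log(income) + 0.05 * age + rng.normal(0, 1, n)

# Income is missing more often for high-WTP respondents, i.e. not MCAR.
p_miss = 0.4 / (1 + np.exp(-(wtp - wtp.mean()) / wtp.std()))
log_inc = np.where(rng.uniform(size=n) < p_miss, np.nan, np.log(income))
data = np.column_stack([log_inc, age, wtp])

def income_coef(d):
    """OLS coefficient of WTP on log income, adjusting for age."""
    A = np.column_stack([np.ones(len(d)), d[:, 0], d[:, 1]])
    return np.linalg.lstsq(A, d[:, 2], rcond=None)[0][1]

# Complete-case analysis: drop every respondent with missing income.
print("complete cases:", income_coef(data[~np.isnan(data[:, 0])]))

# Multiple imputation: m completed data sets, point estimates averaged.
estimates = [income_coef(IterativeImputer(sample_posterior=True,
                                          random_state=s).fit_transform(data))
             for s in range(m)]
print("multiple imputation:", np.mean(estimates), "(true value 2.0)")
```

Pooling here is only the point-estimate half of Rubin's rule; a full analysis would also combine the within- and between-imputation variances.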
172

Multiple Imputation Methods for Nonignorable Nonresponse, Adaptive Survey Design, and Dissemination of Synthetic Geographies

Paiva, Thais Viana January 2014 (has links)
This thesis presents methods for multiple imputation that can be applied to missing data and data with confidential variables. Imputation is useful for missing data because it results in a data set that can be analyzed with complete data statistical methods. The missing data are filled in by values generated from a model fit to the observed data. The model specification will depend on the observed data pattern and the missing data mechanism. For example, when the reason why the data is missing is related to the outcome of interest, that is, nonignorable missingness, we need to alter the model fit to the observed data to generate the imputed values from a different distribution. Imputation is also used for generating synthetic values for data sets with disclosure restrictions. Since the synthetic values are not actual observations, they can be released for statistical analysis. The interest is in fitting a model that approximates well the relationships in the original data, keeping the utility of the synthetic data, while preserving the confidentiality of the original data. We consider applications of these methods to data from social sciences and epidemiology.

The first method is for imputation of multivariate continuous data with nonignorable missingness. Regular imputation methods have been used to deal with nonresponse in several types of survey data. However, in some of these studies, the assumption of missing at random is not valid since the probability of missing depends on the response variable. We propose an imputation method for multivariate data sets when there is nonignorable missingness. We fit a truncated Dirichlet process mixture of multivariate normals to the observed data under a Bayesian framework to provide flexibility. With the posterior samples from the mixture model, an analyst can alter the estimated distribution to obtain imputed data under different scenarios. To facilitate that, I developed an R application that allows the user to alter the values of the mixture parameters and visualize the imputation results automatically. I demonstrate this process of sensitivity analysis with an application to the Colombian Annual Manufacturing Survey. I also include a simulation study to show that the correct complete data distribution can be recovered if the true missing data mechanism is known, thus validating that the method can be meaningfully interpreted to do sensitivity analysis.

The second method uses the imputation techniques for nonignorable missingness to implement a procedure for adaptive design in surveys. Specifically, I develop a procedure that agencies can use to evaluate whether or not it is effective to stop data collection. This decision is based on utility measures to compare the data collected so far with potential follow-up samples. The options are assessed by imputation of the nonrespondents under different missingness scenarios considered by the analyst. The variation in the utility measures is compared to the cost induced by the follow-up sample sizes. We apply the proposed method to the 2007 U.S. Census of Manufactures.

The third method is for imputation of confidential data sets with spatial locations using disease mapping models. We consider data that include fine geographic information, such as census tract or street block identifiers. This type of data can be difficult to release as public use files, since fine geography provides information that ill-intentioned data users can use to identify individuals. We propose to release data with simulated geographies, so as to enable spatial analyses while reducing disclosure risks. We fit disease mapping models that predict areal-level counts from attributes in the file, and sample new locations based on the estimated models. I illustrate this approach using data on causes of death in North Carolina, including evaluations of the disclosure risks and analytic validity that can result from releasing synthetic geographies. / Dissertation
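The core sensitivity-analysis idea, stripped of the Dirichlet process mixture machinery, can be sketched as follows: fit a working model to respondents, then generate imputations under a range of analyst-chosen departures from that model (here a simple location shift delta) and watch how the completed-data estimate moves. The normal working model, the shift values, and all variable names are illustrative assumptions, not the thesis's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# True values; higher values are more likely to be missing (nonignorable).
y = rng.normal(100.0, 15.0, n)
missing = rng.uniform(size=n) < 1 / (1 + np.exp(-(y - 110) / 5))
y_obs = y[~missing]

# Working model fit to respondents only (here just a normal distribution).
mu_hat, sigma_hat = y_obs.mean(), y_obs.std(ddof=1)

# Sensitivity analysis: impute nonrespondents under several assumed shifts.
for delta in [0.0, 5.0, 10.0, 15.0]:
    imputed = rng.normal(mu_hat + delta, sigma_hat, missing.sum())
    completed_mean = np.concatenate([y_obs, imputed]).mean()
    print(f"delta = {delta:5.1f}  completed-data mean = {completed_mean:7.2f}")

print("respondent-only mean =", round(y_obs.mean(), 2),
      "  true mean =", round(y.mean(), 2))
```

In the thesis this role is played by altering the posterior mixture parameters through the R application; the location shift above is only the simplest stand-in for such an adjustment.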
173

Estimation of Aerodynamic Parameters in Real-Time : Implementation and Comparison of a Sequential Frequency Domain Method and a Batch Method

Nyman, Lina January 2016 (has links)
The flight testing and evaluation of collected data must be efficient during intensive flight-test programs such as the ones conducted during development of new aircraft. The aim of this thesis has thus been to produce a first version of an aerodynamic derivative estimation program to be used during real-time flight tests. The program is to give a first estimate of the aerodynamic derivatives as well as check the quality of the data collected, and thus serve as decision support during tests. The work performed includes processing the data so that it can be used in computations, comparing a batch and a sequential estimation method on real-time data, and programming a user interface. All computations and programming have been done in Matlab. The estimation methods that have been compared are both built on transforming data to the frequency domain using a chirp z-transform and then estimating the aerodynamic derivatives using complex least squares with instrumental variables. The sequential frequency-domain method produces estimates at a given interval, while the batch method performs one estimation at the end of the maneuver. Both methods compared in this thesis produce equal results. The continuous updates of the sequential method were, however, found to be better suited for a real-time application than the single estimation of the batch method. The telemetric data received from the aircraft must be synchronized to a common frequency of 60 Hz, missing samples of the data stream must be linearly interpolated, and different units of measured parameters must be corrected in order to perform these estimations in the real-time test environment.
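The thesis implementation is in Matlab; the sketch below uses Python with NumPy/SciPy only to illustrate the pipeline described: interpolate missing telemetry samples, move the signals to the frequency domain with a chirp z-transform restricted to the band of interest, and estimate derivatives by complex least squares (the instrumental-variable refinement is omitted). The signal model, frequencies, and true derivative values are invented for the example.

```python
import numpy as np
from scipy.signal import czt

fs = 60.0                                   # common telemetry rate, Hz
t = np.arange(0.0, 20.0, 1.0 / fs)
rng = np.random.default_rng(2)

# Toy maneuver signals and a linear moment model
# Cm = -0.8*alpha - 4.0*q - 1.1*de  (values to be recovered).
alpha = 0.05*np.sin(2*np.pi*0.4*t) + 0.02*np.sin(2*np.pi*1.1*t) + 0.005*rng.standard_normal(t.size)
q     = 0.10*np.cos(2*np.pi*0.7*t) + 0.005*rng.standard_normal(t.size)
de    = 0.08*np.sin(2*np.pi*0.9*t) + 0.03*np.cos(2*np.pi*0.3*t)
cm    = -0.8*alpha - 4.0*q - 1.1*de + 0.01*rng.standard_normal(t.size)

# Missing samples in the telemetry stream are linearly interpolated.
cm[100:110] = np.nan
good = ~np.isnan(cm)
cm = np.interp(t, t[good], cm[good])

# Chirp z-transform: evaluate the spectrum only on the 0.1-2 Hz band.
f1, f2, n_bins = 0.1, 2.0, 60
a = np.exp(2j * np.pi * f1 / fs)
w = np.exp(-2j * np.pi * (f2 - f1) / (n_bins * fs))
to_freq = lambda x: czt(x - x.mean(), m=n_bins, w=w, a=a)

# Complex least squares over the selected frequency bins.
X = np.column_stack([to_freq(alpha), to_freq(q), to_freq(de)])
theta = np.linalg.lstsq(X, to_freq(cm), rcond=None)[0]
print("estimated derivatives:", np.round(theta.real, 2))   # ~[-0.8, -4.0, -1.1]
```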
174

Comparing the Powers of Several Proposed Tests for Testing the Equality of the Means of Two Populations When Some Data Are Missing

Dunu, Emeka Samuel 05 1900 (has links)
In comparing the means of two normally distributed populations with unknown variance, two tests very often used are the two-independent-sample and the paired-sample t tests. There is a possible gain in the power of the significance test by using the paired-sample design instead of the two-independent-sample design.
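A hedged Monte Carlo sketch of the kind of power comparison described, with an assumed correlation, effect size, and missingness pattern (the second member of some pairs is lost); the new tests proposed in the thesis are not reproduced, only the two classical baselines.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, rho, delta, n_sim, alpha = 30, 0.6, 0.5, 2000, 0.05

reject_paired, reject_indep = 0, 0
for _ in range(n_sim):
    # Correlated pairs with mean difference delta.
    x = rng.normal(0.0, 1.0, n)
    y = rho * x + np.sqrt(1 - rho**2) * rng.normal(0.0, 1.0, n) + delta
    # A few units are missing their second measurement.
    observed_pair = rng.uniform(size=n) > 0.2

    # Paired t test on complete pairs only.
    p_paired = stats.ttest_rel(x[observed_pair], y[observed_pair]).pvalue
    # Two-sample t test using all available data.
    p_indep = stats.ttest_ind(x, y[observed_pair]).pvalue

    reject_paired += p_paired < alpha
    reject_indep += p_indep < alpha

print("power, paired t (complete pairs):", reject_paired / n_sim)
print("power, two-sample t (all data):  ", reject_indep / n_sim)
```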
175

Diagnosing Learner Deficiencies in Algorithmic Reasoning

Hubbard, George U. 05 1900 (has links)
It is hypothesized that useful diagnostic information can reside in the wrong answers of multiple-choice tests, and that properly designed distractors can yield indications of misinformation and missing information in algorithmic reasoning on the part of the test taker. In addition to summarizing the literature regarding diagnostic research as opposed to scoring research, this study proposes a methodology for analyzing test results and compares the findings with those from the research of Birenbaum and Tatsuoka and others. The proposed method identifies the conditions of misinformation and missing information, and it contains a statistical compensation for careless errors. Strengths and weaknesses of the method are explored, and suggestions for further research are offered.
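A minimal, purely hypothetical sketch of the bookkeeping such a diagnosis requires: every distractor is tagged at design time as signalling either misinformation or missing information, and a student's wrong answers are tallied against those tags. The items, keys, and tags below are invented, and the statistical compensation for careless errors used in the study is not represented.

```python
from collections import Counter

# Hypothetical answer key and distractor design for a 5-item algorithms quiz.
key = {1: "A", 2: "C", 3: "B", 4: "D", 5: "A"}
# Each distractor is tagged as evidence of misinformation or missing information.
distractor_tag = {
    (1, "B"): "misinformation", (1, "C"): "missing", (1, "D"): "missing",
    (2, "A"): "missing",        (2, "B"): "misinformation", (2, "D"): "missing",
    (3, "A"): "misinformation", (3, "C"): "missing", (3, "D"): "misinformation",
    (4, "A"): "missing",        (4, "B"): "missing", (4, "C"): "misinformation",
    (5, "B"): "missing",        (5, "C"): "misinformation", (5, "D"): "missing",
}

def diagnose(responses):
    """Tally the diagnostic tags behind a student's wrong answers."""
    return Counter(distractor_tag[(item, ans)]
                   for item, ans in responses.items() if ans != key[item])

student = {1: "A", 2: "B", 3: "C", 4: "D", 5: "D"}   # three wrong answers
print(diagnose(student))   # Counter({'missing': 2, 'misinformation': 1})
```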
176

LATENT VARIABLE MODELS GIVEN INCOMPLETELY OBSERVED SURROGATE OUTCOMES AND COVARIATES

Ren, Chunfeng 01 January 2014 (has links)
Latent variable models (LVMs) are commonly used in scenarios where the outcome of main interest is an unobservable measure that is associated with multiple observed surrogate outcomes and affected by potential risk factors. This thesis develops an approach for efficiently handling missing surrogate outcomes and covariates in two- and three-level latent variable models, since statistical methodology and computational software for efficiently analyzing LVMs whose surrogate outcomes and covariates are subject to missingness have been lacking. We analyze two-level LVMs for longitudinal data from the National Growth of Health Study, where surrogate outcomes and covariates are subject to missingness at either level. A conventional method for efficient handling of missing data is to re-express the desired model as a joint distribution of the variables subject to missingness, including the surrogate outcomes, conditional on all of the completely observed covariates; estimate the joint model by maximum likelihood; and then transform it back to the desired model. The joint model, however, generally identifies more parameters than desired. The over-identified joint model produces biased estimates of the LVMs, so it is necessary to describe how to impose constraints on the joint model so that it has a one-to-one correspondence with the desired model for unbiased estimation. The constrained joint model handles missing data efficiently under the assumption of ignorable missing data and is estimated by a modified application of the expectation-maximization (EM) algorithm.
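A textbook sketch of EM for a multivariate normal with ignorably missing entries, which is the computational core that the constrained joint-model approach builds on; the constraints that map the joint model back to the latent variable model and the multilevel structure are not shown, and all data are simulated.

```python
import numpy as np

def em_mvnormal(Y, n_iter=100):
    """EM for the mean and covariance of a multivariate normal when entries
    of Y are missing (ignorable missingness assumed)."""
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    mu = np.nanmean(Y, axis=0)
    sigma = np.diag(np.nanvar(Y, axis=0))

    for _ in range(n_iter):
        S = np.zeros((p, p))          # accumulated conditional covariances
        Yhat = Y.copy()
        for i in range(n):
            mis = np.isnan(Y[i])
            if not mis.any():
                continue
            obs = ~mis
            if not obs.any():         # nothing observed: use the marginal
                Yhat[i] = mu
                S += sigma
                continue
            # E-step: conditional mean and covariance of missing given observed.
            S_oo = sigma[np.ix_(obs, obs)]
            S_mo = sigma[np.ix_(mis, obs)]
            Yhat[i, mis] = mu[mis] + S_mo @ np.linalg.solve(S_oo, Y[i, obs] - mu[obs])
            S[np.ix_(mis, mis)] += sigma[np.ix_(mis, mis)] - S_mo @ np.linalg.solve(S_oo, S_mo.T)
        # M-step: update parameters from the completed data.
        mu = Yhat.mean(axis=0)
        centered = Yhat - mu
        sigma = (centered.T @ centered + S) / n
    return mu, sigma

# Demonstration: 30% of the entries deleted at random from simulated data.
rng = np.random.default_rng(4)
true_mu = np.array([1.0, -2.0, 0.5])
A = rng.normal(size=(3, 3))
true_sigma = A @ A.T + np.eye(3)
Y = rng.multivariate_normal(true_mu, true_sigma, size=500)
Y[rng.uniform(size=Y.shape) < 0.3] = np.nan
mu_hat, sigma_hat = em_mvnormal(Y)
print("estimated mean:", np.round(mu_hat, 2), " true mean:", true_mu)
```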
177

A Comparison for Longitudinal Data Missing Due to Truncation

Liu, Rong 01 January 2006 (has links)
Many longitudinal clinical studies suffer from patient dropout. Often the dropout is nonignorable, and the missing-data mechanism needs to be incorporated in the analysis. The methods for handling missing data make various assumptions about the missing mechanism, and their utility in practice depends on whether these assumptions apply in a specific application. Ramakrishnan and Wang (2005) proposed a method (MDT) to handle nonignorable missing data, where values are missing because the observations exceed an unobserved threshold. Assuming that the observations arise from a truncated normal distribution, they suggested an EM algorithm to simplify the estimation. In this dissertation the EM algorithm is implemented for the MDT method when the data may also include missing at random (MAR) cases. A data set in which the missing data occur due to clinical deterioration and/or improvement is considered for illustration; there the missing data are observed at both ends of the truncated normal distribution. A simulation study is conducted to compare the performance of the MDT method with that of other relevant methods. The factors chosen for the simulation study include the missing data mechanisms, the forms of the response functions, missingness at one or two time points, dropout rates, sample sizes, and different correlations with an AR(1) structure. It was found that the choice of method for dealing with the missing data is important, especially when a large proportion is missing. The MDT method seems to perform best when there is reason to believe that the assumption of a truncated normal distribution is appropriate. A multiple imputation (MI) procedure under the MDT method, to accommodate the uncertainty introduced by imputation, is also proposed. The proposed method combines the MDT method with Rubin's (1987) MI method, and a procedure to implement it is described.
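The sketch below illustrates only two ingredients referred to above: drawing imputations from a truncated normal for values lost above a threshold, and pooling with Rubin's (1987) rules. The EM fit of the truncated-normal model that would supply the working parameters is replaced here by crude plug-in values, so all numbers are assumptions for illustration.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(5)
n, m = 400, 20
mu_true, sd_true, cutoff = 50.0, 10.0, 62.0

# Values above the (unobserved) threshold drop out of the study.
y = rng.normal(mu_true, sd_true, n)
observed = y[y <= cutoff]
n_mis = n - observed.size

# Working parameter values for the imputation model (in practice these would
# come from the EM fit of the truncated-normal model, not from crude plug-ins).
mu_w, sd_w = observed.mean() + 3.0, observed.std(ddof=1)

est, var = [], []
for _ in range(m):
    a = (cutoff - mu_w) / sd_w                      # draw only above the cutoff
    imputed = truncnorm.rvs(a, np.inf, loc=mu_w, scale=sd_w,
                            size=n_mis, random_state=rng)
    completed = np.concatenate([observed, imputed])
    est.append(completed.mean())
    var.append(completed.var(ddof=1) / n)

# Rubin's (1987) rules: pooled estimate and total variance.
qbar = np.mean(est)
ubar = np.mean(var)                                 # within-imputation variance
b = np.var(est, ddof=1)                             # between-imputation variance
total_var = ubar + (1 + 1 / m) * b
print(f"pooled mean = {qbar:.2f}  (truncated sample mean = {observed.mean():.2f})")
print(f"total variance = {total_var:.4f}")
```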
178

SENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE NORMALITY

Zheng, Xiyu 01 January 2016 (has links)
In a two-level hierarchical linear model (HLM2), the outcome as well as the covariates may have missing values at either level. One way to analyze all available data in the model is to estimate a multivariate normal joint distribution of the variables subject to missingness, including the outcome, conditional on completely observed covariates by maximum likelihood (ML); draw multiple imputations (MI) of the missing values given the estimated joint model; and analyze the hierarchical model given the MI [1,2]. The assumption is that data are missing at random (MAR). While this method yields efficient estimation of the hierarchical model, it often estimates the model with discrete missing data handled under multivariate normality. In this thesis, we evaluate how robust it is to estimate a hierarchical linear model given discrete missing data by this method. We simulate incompletely observed data from a series of hierarchical linear models with discrete covariates MAR, estimate the models by the method, and assess the sensitivity of handling discrete missing data under the multivariate normal joint distribution by computing bias, root mean squared error, standard error, and coverage probability in the estimated hierarchical linear models via a series of simulation studies. The aim is to evaluate the performance of the method in handling binary covariates MAR. We let the missing patterns of the level-1 and level-2 binary covariates depend on completely observed variables and assess how the method handles binary missing data for different values of the success probabilities and missing rates. Based on the simulation results, the missing data analysis is robust under certain parameter settings. The efficient analysis performs very well for estimation of level-1 fixed and random effects across varying success probabilities and missing rates. The level-2 binary covariate MAR is not well estimated when its missing rate is greater than 10%. The rest of the thesis is organized as follows: Section 1 introduces the background, including conventional methods for hierarchical missing data analysis, the different missing data mechanisms, and the innovation and significance of this study. Section 2 explains the efficient missing data method. Section 3 presents the sensitivity analysis of the missing data method and explains how we carry out the simulation study using SAS, the software package HLM7, and R. Section 4 presents the results and gives recommendations for researchers who want to use the missing data method for binary covariates MAR in HLM2. Section 5 presents an illustrative analysis of the National Growth of Health Study (NGHS) using the missing data method. The thesis ends with a list of references that will guide future study and the simulation code we used.
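A single-level, hedged sketch of the kind of simulation loop used: generate a binary covariate, make it MAR given fully observed variables, impute it under a normal (linear) working model, and summarize bias, RMSE, and coverage of the regression coefficient over replications. The HLM2 structure, the HLM7/SAS software, and the study's actual parameter grid are not reproduced, and the imputation here is "improper" (coefficient uncertainty is not redrawn), kept simple on purpose.

```python
import numpy as np

rng = np.random.default_rng(6)
n, n_sim, m = 500, 300, 10
b0, b1, p_success = 1.0, 0.7, 0.5

def ols_slope(y, x):
    """Slope of y on x with its standard error (simple linear regression)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = res[0] / (len(y) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], se

errors, covered = [], 0
for _ in range(n_sim):
    x = (rng.uniform(size=n) < p_success).astype(float)   # binary covariate
    z = rng.normal(size=n)                                 # always observed
    y = b0 + b1 * x + 0.5 * z + rng.normal(0, 1.0, n)
    # MAR: whether x is missing depends only on the fully observed y and z.
    x_mis = rng.uniform(size=n) < 1 / (1 + np.exp(-(0.8 * y + 0.5 * z - 1.5)))

    q, u = [], []
    for _ in range(m):
        # Impute the binary covariate under a normal (linear) working model.
        A = np.column_stack([np.ones(n), y, z])
        coef, r, *_ = np.linalg.lstsq(A[~x_mis], x[~x_mis], rcond=None)
        resid_sd = np.sqrt(r[0] / ((~x_mis).sum() - 3))
        x_imp = x.copy()
        x_imp[x_mis] = A[x_mis] @ coef + rng.normal(0, resid_sd, x_mis.sum())
        slope, se = ols_slope(y, x_imp)
        q.append(slope)
        u.append(se ** 2)

    # Rubin's rules for the pooled estimate and its total variance.
    qbar, ubar, b = np.mean(q), np.mean(u), np.var(q, ddof=1)
    total_se = np.sqrt(ubar + (1 + 1 / m) * b)
    errors.append(qbar - b1)
    covered += abs(qbar - b1) < 1.96 * total_se

print("bias:", np.mean(errors))
print("RMSE:", np.sqrt(np.mean(np.square(errors))))
print("95% coverage:", covered / n_sim)
```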
179

Data envelopment analysis with sparse data

Gullipalli, Deep Kumar January 1900 (has links)
Master of Science / Department of Industrial & Manufacturing Systems Engineering / David H. Ben-Arieh / The quest for continuous improvement among organizations and the issue of missing data in data analysis are both never ending. This thesis brings these two topics under one roof: evaluating the productivity of organizations with sparse data. The study uses Data Envelopment Analysis (DEA) to determine the efficiency of 41 member clinics of the Kansas Association of Medically Underserved (KAMU) in the presence of missing data. The primary focus of this thesis is to develop new, reliable methods to determine the missing values and to execute DEA. DEA is a linear programming methodology for evaluating the relative technical efficiency of homogeneous decision-making units using multiple inputs and outputs. The effectiveness of DEA depends on the quality and quantity of the data being used. DEA outcomes are susceptible to missing data, creating a need to supplement sparse data in a reliable manner; determining the missing values more precisely improves the robustness of the DEA methodology. Three methods to determine the missing values are proposed in this thesis, based on three different platforms. The first method, named the Average Ratio Method (ARM), uses the average value of all the ratios between two variables. The second method is based on a modified fuzzy c-means clustering algorithm that can handle missing data; the issues associated with this clustering algorithm are resolved to improve its effectiveness. The third method is based on an interval approach: missing values are replaced by interval ranges estimated by experts, and crisp efficiency scores are identified along the same lines as DEA determines efficiency scores using the best set of weights. There exists no unique way to evaluate the effectiveness of these methods, so their effectiveness is tested by choosing a complete data set and treating varying levels of data as missing. The best set of recovered missing values, based on the above methods, serves as the input to DEA. Results show that the DEA efficiency scores generated with the recovered values are in close proximity to the efficiency scores that would be generated with the complete data. In summary, this thesis provides an effective and practical approach for replacing missing values needed for DEA.
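A small sketch of the two pieces described: the Average Ratio Method as we read it from the summary above (fill a missing input using the average ratio between two inputs over units where both are observed; this reading is an assumption), followed by an input-oriented CCR DEA model solved with scipy.optimize.linprog. The clinic data are invented.

```python
import numpy as np
from scipy.optimize import linprog

# Inputs (rows) and outputs (rows) for 5 clinics; one input value is missing.
X = np.array([[ 20.,  35., np.nan,  28.,  40.],     # staff hours
              [100., 180.,  150., 120., 200.]])     # operating cost
Y = np.array([[ 60., 100.,   80.,  75., 110.]])     # patient visits

# Average Ratio Method (our reading of the abstract): fill a missing input with
# the other input scaled by the average ratio over clinics where both exist.
both = ~np.isnan(X[0]) & ~np.isnan(X[1])
avg_ratio = np.mean(X[0, both] / X[1, both])
X[0, np.isnan(X[0])] = avg_ratio * X[1, np.isnan(X[0])]

def ccr_efficiency(X, Y, j0):
    """Input-oriented CCR DEA efficiency of DMU j0 (envelopment form)."""
    n_in, n_dmu = X.shape
    n_out = Y.shape[0]
    c = np.r_[1.0, np.zeros(n_dmu)]                  # minimize theta
    A_in = np.hstack([-X[:, [j0]], X])               # X @ lam <= theta * x_j0
    A_out = np.hstack([np.zeros((n_out, 1)), -Y])    # Y @ lam >= y_j0
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.r_[np.zeros(n_in), -Y[:, j0]]
    bounds = [(None, None)] + [(0, None)] * n_dmu    # theta free, lambdas >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0]

for j in range(X.shape[1]):
    print(f"clinic {j}: efficiency = {ccr_efficiency(X, Y, j):.3f}")
```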
180

The performance of the ATLAS missing transverse momentum high-level trigger in 2015 pp collisions at 13 TeV

Chiu, Justin 09 September 2016 (has links)
The performance of the ATLAS missing transverse momentum (ETmiss) high-level trigger during 2015 operation is presented. In 2015, the Large Hadron Collider operated at a higher centre-of-mass energy and shorter bunch spacing (√s = 13 TeV and 25 ns, respectively) than in previous operation. In future operation, the Large Hadron Collider will run at even higher instantaneous luminosity (O(10^34 cm^−2 s^−1)) and produce a higher average number of interactions per bunch crossing, ⟨μ⟩. These operating conditions will pose significant challenges to the ETmiss trigger efficiency and rate. An overview is given of the existing algorithms and of the new algorithms implemented to address these challenges. An integrated luminosity of 1.4 fb^−1 with ⟨μ⟩ = 14 was collected from pp collisions at the Large Hadron Collider by the ATLAS detector during October and November 2015 and was used to study the efficiency, the correlation with offline reconstruction, and the rates of the trigger algorithms. The performance was found to be satisfactory, and from these studies recommendations for future operating specifications of the trigger were made. / Graduate / 0798, / jchiu@uvic.ca
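A toy sketch of how a trigger efficiency turn-on curve of this kind is measured: count, in bins of the offline ETmiss, the fraction of events in which the online (HLT) decision fired. The spectrum, resolution smearing, and 70 GeV threshold below are invented numbers, not ATLAS values.

```python
import numpy as np

rng = np.random.default_rng(7)
n_events = 100_000

# Toy model: offline ETmiss spectrum and an online (HLT) estimate that is a
# smeared version of it; the trigger fires when the online value exceeds 70 GeV.
offline_met = rng.exponential(scale=40.0, size=n_events)          # GeV
online_met = offline_met + rng.normal(0.0, 12.0, n_events)        # resolution smearing
passed = online_met > 70.0

# Efficiency turn-on curve: fraction of events passing the trigger, binned in
# the offline ETmiss that an analysis would actually use.
bins = np.arange(0.0, 200.0, 10.0)
total, _ = np.histogram(offline_met, bins=bins)
fired, _ = np.histogram(offline_met[passed], bins=bins)
with np.errstate(invalid="ignore"):
    eff = fired / total
    err = np.sqrt(eff * (1 - eff) / total)            # binomial uncertainty

for lo, hi, e, u in zip(bins[:-1], bins[1:], eff, err):
    if not np.isnan(e):
        print(f"{lo:5.0f}-{hi:3.0f} GeV : eff = {e:.3f} +/- {u:.3f}")
```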
