Global ETD Search

1	Multiple Imputation Methods for Nonignorable Nonresponse, Adaptive Survey Design, and Dissemination of Synthetic Geographies Paiva, Thais Viana January 2014 (has links) <p>This thesis presents methods for multiple imputation that can be applied to missing data and data with confidential variables. Imputation is useful for missing data because it results in a data set that can be analyzed with complete data statistical methods. The missing data are filled in by values generated from a model fit to the observed data. The model specification will depend on the observed data pattern and the missing data mechanism. For example, when the reason why the data is missing is related to the outcome of interest, that is nonignorable missingness, we need to alter the model fit to the observed data to generate the imputed values from a different distribution. Imputation is also used for generating synthetic values for data sets with disclosure restrictions. Since the synthetic values are not actual observations, they can be released for statistical analysis. The interest is in fitting a model that approximates well the relationships in the original data, keeping the utility of the synthetic data, while preserving the confidentiality of the original data. We consider applications of these methods to data from social sciences and epidemiology.</p><p>The first method is for imputation of multivariate continuous data with nonignorable missingness. Regular imputation methods have been used to deal with nonresponse in several types of survey data. However, in some of these studies, the assumption of missing at random is not valid since the probability of missing depends on the response variable. We propose an imputation method for multivariate data sets when there is nonignorable missingness. We fit a truncated Dirichlet process mixture of multivariate normals to the observed data under a Bayesian framework to provide flexibility. With the posterior samples from the mixture model, an analyst can alter the estimated distribution to obtain imputed data under different scenarios. To facilitate that, I developed an R application that allows the user to alter the values of the mixture parameters and visualize the imputation results automatically. I demonstrate this process of sensitivity analysis with an application to the Colombian Annual Manufacturing Survey. I also include a simulation study to show that the correct complete data distribution can be recovered if the true missing data mechanism is known, thus validating that the method can be meaningfully interpreted to do sensitivity analysis.</p><p>The second method uses the imputation techniques for nonignorable missingness to implement a procedure for adaptive design in surveys. Specifically, I develop a procedure that agencies can use to evaluate whether or not it is effective to stop data collection. This decision is based on utility measures to compare the data collected so far with potential follow-up samples. The options are assessed by imputation of the nonrespondents under different missingness scenarios considered by the analyst. The variation in the utility measures is compared to the cost induced by the follow-up sample sizes. We apply the proposed method to the 2007 U.S. Census of Manufactures.</p><p>The third method is for imputation of confidential data sets with spatial locations using disease mapping models. We consider data that include fine geographic information, such as census tract or street block identifiers. This type of data can be difficult to release as public use files, since fine geography provides information that ill-intentioned data users can use to identify individuals. We propose to release data with simulated geographies, so as to enable spatial analyses while reducing disclosure risks. We fit disease mapping models that predict areal-level counts from attributes in the file, and sample new locations based on the estimated models. I illustrate this approach using data on causes of death in North Carolina, including evaluations of the disclosure risks and analytic validity that can result from releasing synthetic geographies.</p> / Dissertation Statistics Confidential data Missing data Multiple Imputation Nonignorable missingness Spatial model Synthetic data
2	Towards Trustworthy Online Voting : Distributed Aggregation of Confidential Data / Confiance dans le vote en ligne : agrégation distribuée de données confidentielles Riemann, Robert 18 December 2017 (has links) L’agrégation des valeurs qui doivent être gardées confidentielles tout en garantissant la robustesse du processus et l’exactitude du résultat est nécessaire pour un nombre croissant d’applications. Divers types d’enquêtes, telles que les examens médicaux, les référendums, les élections, ainsi que les nouveaux services de Internet of Things, tels que la domotique, nécessitent l’agrégation de données confidentielles. En général,la confidentialité est assurée sur la base de tiers de confiance ou des promesses de cryptographie, dont les capacités ne peuvent être évaluées sans expertise.L’ambition de cette thèse est de réduire le besoin de confiance dans les autorités, de même que la technologie, et d’explorer les méthodes d’agrégations de données à grande échelle, qui garantissent un degré élevé de confidentialité et ne dépendent ni de tiers de confiance ni de cryptographie. Inspiré par BitTorrent et Bitcoin, les protocoles P2P sont considérés. La première contribution de cette thèse est l’extension du protocole d’agrégation distribuée BitBallot dans le but de couvrir les agrégations dans les réseaux P2P comprenant des pairs adversaires avec un comportement défaillant ou byzantin. Les changements introduits permettent éventuellement de maintenir un résultat précis en présence d’une minorité adversaire. Les limites de scalabilité rencontrées conduisent à la deuxième contribution dans le but de soutenir les agrégations à grande échelle. Inspiré par BitBallot et BitTorrent, un nouveau protocole distribué appelé ADVOKAT est proposé.Dans les deux protocoles, les pairs sont affectés aux noeuds feuilles d’un réseau de superposition d’une structure arborescente qui détermine le calcul des agrégats intermédiaires et restreint l’échange de données. La partition des données et du calcul entre un réseau de pairs équipotent limite le risque de violation de données et réduit le besoin de confiance dans les autorités. Les protocoles fournissent une couche middleware dont la flexibilité est démontrée par les applications de vote et de loterie. / Aggregation of values that need to be kept confidential while guaranteeing the robustness of the process and the correctness of the result is necessary for an increasing number of applications. Various kinds of surveys, such as medical ones, opinion polls, referendums, elections, as well as new services of the Internet of Things, such as home automation, require the aggregation of confidential data. In general, the confidentiality is ensured on the basis of trusted third parties or promises of cryptography, whose capacities cannot be assessed without expert knowledge.The ambition of this thesis is to reduce the need for trust in both authorities and technology and explore methods for large-scale data aggregations, that ensure a high degree of confidentiality and rely neither on trusted third parties nor solely on cryptography. Inspired by BitTorrent and Bitcoin, P2P protocols are considered.The first contribution of this thesis is the extension of the distributed aggregation protocol BitBallot with the objective to cover aggregations in P2P networks comprising adversarial peers with fail-stop or Byzantine behaviour. The introduced changes allow eventually to maintain an accurate result in presence of an adversarial minority.The encountered scalability limitations lead to the second contribution with the objective to support large-scale aggregations. Inspired by both BitBallot and BitTorrent, a novel distributed protocol called ADVOKAT is proposed.In both protocols, peers are assigned to leaf nodes of a tree overlay network which determines the computation of intermediate aggregates and restricts the exchange of data. The partition of data and computation among a network of equipotent peers limits the potential for data breaches and reduces the need for trust in authorities. The protocols provide a middleware layer whose flexibility is demonstrated by voting and lottery applications. Agrégation confidentielle Agrégation distribuée Vote en ligne Kademlia Systèmes distribués Aggregation of confidential data Distributed aggregation Online voting Kademlia Distributed systems

Search results

Multiple Imputation Methods for Nonignorable Nonresponse, Adaptive Survey Design, and Dissemination of Synthetic Geographies

Towards Trustworthy Online Voting : Distributed Aggregation of Confidential Data / Confiance dans le vote en ligne : agrégation distribuée de données confidentielles