In the last few decades, there has been a dramatic uptick in the issues related to protecting user privacy in released data, both in statistical databases and anonymized records. Privacy-preserving data publishing is a field established to handle these releases while avoiding the problems that plagued many earlier attempts. This issue is of particular importance for governmental data, where both the release and the privacy requirements are frequently governed by legislature (e.g., HIPAA, FERPA, Clery Act). This problem is doubly compounded by the complex survey methods employed to counter problems in data collection. The preeminent definition for privacy is that of differential privacy, which protects users by limiting the impact that any individual can have on the result of any query. <br><br>The thesis proposes models for differentially private versions of current survey methodologies and, discusses the evaluation of those models. We focus on the issues of missing data and weighting which are common techniques employed in complex surveys to counter problems with sampling and response rates. First we propose a model for answering queries on datasets with missing data while maintaining differential privacy. Our model uses k-Nearest Neighbor imputation to replicate donor values while protecting the privacy of the donor. Our model provides significantly better bias reduction in realistic experiments using existing data, as well as providing less noise than a naive solution. Our second model proposes a method of performing Iterative Proportional Fitting (IPF) in a differentially private manner, a common technique used to ensure that survey records are weighted consistently with known values. We also focus on the general philosophical need to incorporate privacy when creating new survey methodologies, rather than assuming that privacy can simply be added at a later step.
Identifer | oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/17139599 |
Date | 20 December 2021 |
Creators | Shawn A Merrill (11806802) |
Source Sets | Purdue University |
Detected Language | English |
Type | Text, Thesis |
Rights | CC BY 4.0 |
Relation | https://figshare.com/articles/thesis/Privacy_in_Complex_Sample_Based_Surveys/17139599 |
Page generated in 0.0019 seconds