Privacy preserving data publishing: an expected gain model with negative association immunity. / CUHK electronic theses & dissertations collection

Privacy preservation is an important issue in many applications, especially those that involve people. In privacy preserving data publishing (PPDP), we study how to publish a database containing records of individuals so that their privacy is preserved while the published database still contains useful information for research or data analysis. / This thesis focuses on privacy models and algorithms in PPDP. We first propose an expected gain model to determine whether publishing a database preserves privacy. The expected gain model satisfies the six axioms for quantifying private information proposed in this thesis; the sixth axiom considers human factors from the viewpoint of social psychology. In addition, the model considers the advantage an adversary gains by exploiting the private information deduced from a published database. Hence, the model reflects the reality that the adversary uses such an advantage to earn a profit, which other existing privacy models do not consider. Then, we propose an algorithm to generate published databases that satisfy the expected gain model. Experiments on real datasets show that the proposed algorithm is feasible for real applications.
After that, we propose a value suppression framework that makes published databases immune to negative association, a kind of background/foreground knowledge attack. Experiments show that negative association immunity can be achieved by suppressing only a few percent of sensitive values on average. Finally, we investigate PPDP in a non-centralized environment, in which two or more data holders generate their own different but related published databases. We propose a non-centralized distinct l-diversity requirement as the privacy model and an algorithm to generate published databases that meet this requirement. Experiments show that the proposed algorithm is feasible for real applications.

Detailed summary in vernacular field only. / Cheong, Chi Hong. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 186-193). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese.

Abstract --- p.i
Acknowledgement --- p.iv
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Background --- p.1
Chapter 1.2 --- Thesis Contributions and Organization --- p.2
Chapter 1.3 --- Other Related Areas --- p.5
Chapter 1.3.1 --- Privacy Preserving Data Mining --- p.5
Chapter 1.3.2 --- Partition-Based Approach vs. Differential Privacy Approach --- p.5
Chapter 2 --- Expected Gain Model --- p.7
Chapter 2.1 --- Introduction --- p.8
Chapter 2.1.1 --- Background and Motivation --- p.8
Chapter 2.1.2 --- Contributions --- p.11
Chapter 2.2 --- Table Models --- p.12
Chapter 2.2.1 --- Private Table --- p.12
Chapter 2.2.2 --- Published Table --- p.13
Chapter 2.3 --- Private Information Model --- p.14
Chapter 2.3.1 --- Proposition --- p.14
Chapter 2.3.2 --- Private Information and Private Probability --- p.15
Chapter 2.3.3 --- Public Information and Public Probability --- p.18
Chapter 2.3.4 --- Axioms in Quantifying Private Information --- p.20
Chapter 2.4 --- Quantifying Private Information --- p.34
Chapter 2.4.1 --- Expected Gain of a Fair Guessing Game --- p.34
Chapter 2.4.2 --- Analysis --- p.41
Chapter 2.5 --- Tuning the Importance of Opposite Information --- p.48
Chapter 2.6 --- Conclusions --- p.53
Chapter 3 --- Generalized Expected Gain Model --- p.56
Chapter 3.1 --- Introduction --- p.58
Chapter 3.2 --- Table Models --- p.60
Chapter 3.2.1 --- Private Table --- p.62
Chapter 3.2.2 --- Published Table --- p.62
Chapter 3.3 --- Expected Gain Model --- p.63
Chapter 3.3.1 --- Random Variable and Probability Distribution --- p.64
Chapter 3.3.2 --- Public Information --- p.64
Chapter 3.3.3 --- Private Information --- p.65
Chapter 3.3.4 --- Expected Gain Model --- p.66
Chapter 3.4 --- Generalization Algorithm --- p.75
Chapter 3.4.1 --- Generalization Property and Subset Property --- p.75
Chapter 3.4.2 --- Modified Version of Incognito --- p.78
Chapter 3.5 --- Related Work --- p.80
Chapter 3.5.1 --- k-Anonymity --- p.80
Chapter 3.5.2 --- l-Diversity --- p.81
Chapter 3.5.3 --- Confidence Bounding --- p.83
Chapter 3.5.4 --- t-Closeness --- p.84
Chapter 3.6 --- Experiments --- p.85
Chapter 3.6.1 --- Experiment Set 1: Average/Max/Min Expected Gain --- p.85
Chapter 3.6.2 --- Experiment Set 2: Expected Gain Distribution --- p.90
Chapter 3.6.3 --- Experiment Set 3: Modified Version of Incognito --- p.95
Chapter 3.7 --- Conclusions --- p.99
Chapter 4 --- Negative Association Immunity --- p.100
Chapter 4.1 --- Introduction --- p.100
Chapter 4.2 --- Related Work --- p.104
Chapter 4.3 --- Negative Association Immunity and Value Suppression --- p.107
Chapter 4.3.1 --- Negative Association --- p.108
Chapter 4.3.2 --- Negative Association Immunity --- p.111
Chapter 4.3.3 --- Achieving Negative Association Immunity by Value Suppression --- p.114
Chapter 4.4 --- Local Search Algorithm --- p.123
Chapter 4.5 --- Experiments --- p.125
Chapter 4.5.1 --- Settings --- p.125
Chapter 4.5.2 --- Results and Discussions --- p.128
Chapter 4.6 --- Conclusions --- p.129
Chapter 5 --- Non-Centralized Distinct l-Diversity --- p.130
Chapter 5.1 --- Introduction --- p.130
Chapter 5.2 --- Related Work --- p.138
Chapter 5.3 --- Table Models --- p.140
Chapter 5.3.1 --- Private Tables --- p.140
Chapter 5.3.2 --- Published Tables --- p.141
Chapter 5.4 --- Private Information Deduced from Multiple Published Tables --- p.143
Chapter 5.4.1 --- Private Information Deduced by Simple Counting on Each Published Tables --- p.143
Chapter 5.4.2 --- Private Information Deduced from Multiple Published Tables --- p.145
Chapter 5.4.3 --- Probabilistic Table --- p.156
Chapter 5.5 --- Non-Centralized Distinct l-Diversity and Algorithm --- p.158
Chapter 5.5.1 --- Non-centralized Distinct l-diversity --- p.159
Chapter 5.5.2 --- Algorithm --- p.165
Chapter 5.5.3 --- Theorems --- p.171
Chapter 5.6 --- Experiments --- p.174
Chapter 5.6.1 --- Settings --- p.174
Chapter 5.6.2 --- Metrics --- p.176
Chapter 5.6.3 --- Results and Discussions --- p.179
Chapter 5.7 --- Conclusions --- p.181
Chapter 6 --- Conclusions --- p.183
Bibliography --- p.186

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_328006
Date: January 2012
Contributors: Cheong, Chi Hong., Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering.
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography
Format: electronic resource, remote, 1 online resource (xii, 193 leaves) : ill.
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
