Research Doctorate - Doctor of Philosophy (PhD) / The rapid pace of growth in the field of human genetics has left researchers with many new challenges in the area of security and privacy. To encourage participation and foster trust towards research, it is important to ensure that genetic databases are adequately protected. This task is a particularly challenging one for statistical agencies due to the high prevalence of categorical data contained within statistical genetic databases. The absence of natural ordering makes the application of traditional Statistical Disclosure Control (SDC) methods less straightforward, which is why we have proposed a new noise addition technique for categorical values. The main contributions of the thesis are as follows. We provide a comprehensive analysis of the trust relationships that occur between the different stakeholders in a genetic data warehouse system. We also provide a quantifiable model of trust that allows the database manager to granulate the level of protection based on the amount of trust that exists between the stakeholders. To the best of our knowledge, this is the first time that trust has been applied in the SDC context. We propose a privacy protection framework for genetic databases which is designed to deal with the fact that genetic data warehouses typically contain a high proportion of categorical data. The framework includes the use of a clustering technique which allows for the easier application of traditional noise addition techniques for categorical values. Another important contribution of this thesis is a new similarity measure for categorical values, which aims to capture not only the direct similarity between values, but also some sense of transitive similarity. This novel measure also has possible applications in providing a way of ordering categorical values, so that more traditional SDC methods can be more easily applied to them. Our analysis of experimental results also points to a numerical attribute phenomenon, whereby we typically have high similarity between numerical values that are close together, and where the similarity decreases as the absolute value of the difference between numerical values increases. However, some numerical attributes appear to not behave in a strictly `numerical' way. That is, values which are close together numerically do not always appear very similar. We also provide a novel noise addition technique for categorical values, which employs our similarity measure to partition the values in the data set. Our method - VICUS - then perturbs the original microdata file so that each value is more likely to be changed to another value in the same partition than one from a different partition. The technique helps to ensure that the perturbed microdata file retains data quality while also preserving the privacy of individual records.
Identifer | oai:union.ndltd.org:ADTP/233012 |
Date | January 2009 |
Creators | Giggins, Helen |
Source Sets | Australiasian Digital Theses Program |
Language | English |
Detected Language | English |
Rights | Copyright 2009 Helen Giggins |
Page generated in 0.0014 seconds