Indiana University-Purdue University Indianapolis (IUPUI)

As medical data continues to transition to electronic formats, researchers gain new opportunities to use this microdata to discover patterns and build knowledge that can improve patient care. Now more than ever, it is critical to protect the identities of the patients contained in these databases. Even after obvious "identifier" attributes that clearly identify a specific person, such as social security numbers or first and last names, have been removed, it is still possible to join "quasi-identifier" attributes from two or more publicly available databases to identify individuals.
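
To make the linkage risk concrete, here is a minimal sketch. It joins a hypothetical "de-identified" medical table with a hypothetical public list that still carries names, matching on shared quasi-identifiers (ZIP code, birth date, sex). Every record, field name, and the `link` helper are invented for illustration. A patient counts as re-identified when their quasi-identifier values match exactly one named public record.

```python
from collections import defaultdict

# Hypothetical quasi-identifier attributes present in both tables.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# Hypothetical "de-identified" medical records: names and SSNs already removed.
medical = [
    {"zip": "46202", "birth_date": "1965-07-31", "sex": "F", "diagnosis": "asthma"},
    {"zip": "46202", "birth_date": "1971-02-14", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "46240", "birth_date": "1965-07-31", "sex": "F", "diagnosis": "hypertension"},
]

# Hypothetical publicly available list that still includes names.
voters = [
    {"name": "Alice Smith", "zip": "46202", "birth_date": "1965-07-31", "sex": "F"},
    {"name": "Bob Jones", "zip": "46202", "birth_date": "1971-02-14", "sex": "M"},
    {"name": "Carol White", "zip": "46240", "birth_date": "1980-01-01", "sex": "F"},
]


def qi_key(record):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[attr] for attr in QUASI_IDENTIFIERS)


def link(private_rows, public_rows):
    """Return (name, diagnosis) pairs for private rows matching exactly one public row."""
    index = defaultdict(list)
    for row in public_rows:
        index[qi_key(row)].append(row["name"])
    matches = []
    for row in private_rows:
        names = index.get(qi_key(row), [])
        if len(names) == 1:  # a unique match re-identifies the patient
            matches.append((names[0], row["diagnosis"]))
    return matches


print(link(medical, voters))
```

In this toy data, two of the three "de-identified" patients are re-identified with their diagnoses. The third has no matching public record.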

K-anonymity is an approach that has been used to ensure that no individual can be distinguished within a group of at least k individuals who share the same quasi-identifier values. However, most proposed approaches have focused on improving the efficiency of k-anonymity algorithms, and less emphasis has been placed on ensuring the "utility" of the anonymized data from a researcher's perspective. We propose a new data utility measurement, called the research value (RV), which extends existing utility measurements by employing data constraint rules designed to improve the effectiveness of queries against the anonymized data.
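
The k-anonymity condition can be sketched as a simple check. A table is k-anonymous with respect to a set of quasi-identifiers when every combination of their values occurs at least k times. This is a minimal illustration assuming rows are dictionaries. The function name and sample data are hypothetical, and the sketch does not compute the research value, whose definition is not reproduced here.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(row[attr] for attr in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


# Hypothetical table whose ZIP codes and ages were already generalized.
generalized = [
    {"zip": "462**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "462**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "462**", "age": "40-49", "diagnosis": "diabetes"},
    {"zip": "462**", "age": "40-49", "diagnosis": "asthma"},
]

print(is_k_anonymous(generalized, ("zip", "age"), 2))  # each group has 2 rows
print(is_k_anonymous(generalized, ("zip", "age"), 3))
```

The sensitive attribute (`diagnosis`) is left out of the quasi-identifier list on purpose. K-anonymity constrains only the attributes an attacker could join on.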

To anonymize a given raw dataset, two algorithms are proposed. They use predefined generalizations supplied by a data content expert, together with the corresponding research values, to assess each attribute's data utility while the data are generalized to ensure k-anonymity. In addition, an automated algorithm is presented that uses clustering and the RV to anonymize the dataset. All of the proposed algorithms scale efficiently when the number of attributes in a dataset is large.
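
Generalizing with expert-defined hierarchies while tracking utility can be sketched as a greedy full-domain search. Each quasi-identifier has an ordered list of generalization levels. At every step, the attribute whose next level keeps the highest utility score is generalized one step further, and the loop stops once the table is k-anonymous. This is not the thesis's algorithm. The hierarchies, the `HIERARCHIES` table, and especially the per-level utility scores are hypothetical placeholders standing in for a measure such as the RV.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(row[attr] for attr in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


# Hypothetical expert-provided hierarchies: for each quasi-identifier, an ordered list of
# (generalization function, utility score) pairs, from least to most general.
# The scores are placeholders, NOT the research value defined in the thesis.
HIERARCHIES = {
    "zip": [
        (lambda z: z, 1.0),
        (lambda z: z[:3] + "**", 0.7),
        (lambda z: "*****", 0.0),
    ],
    "age": [
        (lambda a: str(a), 1.0),
        (lambda a: f"{a // 10 * 10}-{a // 10 * 10 + 9}", 0.8),
        (lambda a: "*", 0.0),
    ],
}


def apply_levels(rows, levels):
    """Generalize every value of each attribute to its current level (full-domain)."""
    out = []
    for row in rows:
        generalized = dict(row)
        for attr, level in levels.items():
            func, _score = HIERARCHIES[attr][level]
            generalized[attr] = func(row[attr])
        out.append(generalized)
    return out


def greedy_generalize(rows, k):
    """Raise one attribute's level at a time, preferring the highest-utility next level."""
    quasi_identifiers = tuple(HIERARCHIES)
    levels = {attr: 0 for attr in quasi_identifiers}
    while True:
        table = apply_levels(rows, levels)
        if is_k_anonymous(table, quasi_identifiers, k):
            return table, levels
        candidates = [a for a in quasi_identifiers if levels[a] + 1 < len(HIERARCHIES[a])]
        if not candidates:
            raise ValueError("k-anonymity is not reachable with these hierarchies")
        best = max(candidates, key=lambda a: HIERARCHIES[a][levels[a] + 1][1])
        levels[best] += 1


raw = [
    {"zip": "46202", "age": 34, "diagnosis": "asthma"},
    {"zip": "46205", "age": 37, "diagnosis": "flu"},
    {"zip": "46240", "age": 45, "diagnosis": "diabetes"},
    {"zip": "46250", "age": 41, "diagnosis": "asthma"},
]
table, levels = greedy_generalize(raw, k=2)
print(levels)
```

On this toy input, age is generalized to decades first because its next level scores higher, and ZIP codes are then truncated to `462**`. A greedy search like this is cheap per step but is not guaranteed to find the generalization with the best overall utility.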

Identifier | oai:union.ndltd.org:IUPUI/oai:scholarworks.iupui.edu:1805/3427 |
Date | 14 August 2013 |
Creators | Morton, Stuart Michael |
Contributors | Mahoui, Malika, Palakal, Mathew J., Gibson, P. Joseph, Kharrazi, Hadi |
Source Sets | Indiana University-Purdue University Indianapolis |
Language | en_US |
Detected Language | English |
Type | Thesis |