CONTEXT AWARE PRIVACY PRESERVING CLUSTERING AND CLASSIFICATION

Data are valuable assets to any organization or individual. They are sources of useful information, which plays a major part in decision making. Every sector stands to benefit from such information; commerce, health, and research are among the fields that already have. On the other hand, the availability of data makes it easy for anyone to exploit them, and in many cases the data are private and confidential. It is therefore necessary to preserve the confidentiality of the data. We study two categories of privacy: Data Value Hiding and Data Pattern Hiding. Privacy is a major concern, but data utility is equally important: data should avoid privacy breaches yet remain usable. Although these two objectives conflict and achieving both at once is challenging, knowing the purpose for which the data will be used and the manner in which they will be utilized helps. In this research, we focus on particular situations in clustering and classification problems and strive to balance the utility and privacy of the data.
In the first part of this dissertation, we propose Nonnegative Matrix Factorization (NMF) based techniques that incorporate explicitly defined constraints into the update rules. These constraints determine how the factorization proceeds, leading to favorable results. The methods alter the matrices so that user-specified cluster properties are introduced, and they can be used to preserve data values as well as data patterns. Because NMF and K-means have been proven to be equivalent, NMF is an ideal choice for pattern hiding in clustering problems. In addition to the NMF based methods, we propose methods for classification problems that take into account the data structures and the attribute properties. We divide this work into two parts, linear classifiers and nonlinear classifiers, and propose a different solution for each type of classifier. We also study the effect of distortion on the utility of data.
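For readers unfamiliar with NMF update rules, the sketch below shows the standard unconstrained multiplicative updates for factoring a nonnegative matrix A into W and H. It is only a baseline illustration: the constrained update rules proposed in the dissertation are not given in this abstract and are not reproduced here. The function names, the toy matrix, and the convention of reading a cluster label from the largest entry in each row of W are assumptions made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(a, w, h):
    """Frobenius norm of A - WH, the quantity the updates try to reduce."""
    wh = matmul(w, h)
    return sum(
        (a[i][j] - wh[i][j]) ** 2 for i in range(len(a)) for j in range(len(a[0]))
    ) ** 0.5


def nmf(a, rank, iterations=200, seed=0, eps=1e-9):
    """Unconstrained NMF via multiplicative updates (baseline, not the thesis method)."""
    rng = random.Random(seed)
    n, m = len(a), len(a[0])
    # Strictly positive random start; multiplicative updates keep entries nonnegative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, a)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(a, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


# Toy data (hypothetical): rows are records, columns are attributes.
a = [
    [5.0, 5.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 0.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
]

w0, h0 = nmf(a, rank=2, iterations=0)   # the random starting point
w, h = nmf(a, rank=2, iterations=200)   # same start, after 200 updates

# Read a cluster label for each record from its dominant factor in W.
labels = [max(range(2), key=lambda k: row[k]) for row in w]

print("initial error:", frobenius_error(a, w0, h0))
print("final error:  ", frobenius_error(a, w, h))
print("labels:", labels)
```

A constrained variant would change the two update lines so that the factors are steered toward the desired cluster properties; how that is done is specific to the dissertation and is not inferred here.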
We propose three distortion measurement metrics that demonstrate better characteristics than the traditional metrics. The effectiveness of the measures is examined on several benchmark datasets. The results show that the measures have desirable properties such as invariance to translation, rotation, and scaling.
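To make the invariance properties concrete, the following sketch compares two illustrative distortion measures on a small 2-D dataset that has only been rotated, uniformly scaled, and translated. Neither measure is taken from the dissertation: `pairwise_distance_distortion` is a hypothetical stand-in for an invariant measure (one minus the Pearson correlation of pairwise distances), and `frobenius_distortion` is a simple relative-difference measure assumed here as an example of a traditional metric.

```python
import math


def pairwise_distances(points):
    """Euclidean distances between every unordered pair of points."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def pairwise_distance_distortion(original, perturbed):
    """Hypothetical invariant measure: 1 - correlation of pairwise distances.

    Pairwise distances ignore translation and rotation, and Pearson
    correlation ignores positive uniform scaling.
    """
    return 1.0 - pearson(pairwise_distances(original), pairwise_distances(perturbed))


def frobenius_distortion(original, perturbed):
    """Relative element-wise difference, ||A - A'|| / ||A|| (not invariant)."""
    diff = sum((a - b) ** 2 for p, q in zip(original, perturbed) for a, b in zip(p, q))
    norm = sum(a ** 2 for p in original for a in p)
    return math.sqrt(diff / norm)


def transform(points, angle, scale, shift):
    """Rotate by `angle`, scale uniformly by `scale`, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (scale * (c * x - s * y) + shift[0], scale * (s * x + c * y) + shift[1])
        for x, y in points
    ]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
moved = transform(points, angle=math.pi / 3, scale=2.5, shift=(4.0, -1.0))

print("pairwise-distance distortion:", pairwise_distance_distortion(points, moved))
print("relative Frobenius distortion:", frobenius_distortion(points, moved))
```

The first measure reports essentially no distortion because the geometry of the data is unchanged, while the element-wise measure reports a large one even though cluster structure is fully preserved.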

Privacy Preserving Data Mining

Nonnegative Matrix Factorization

Databases and Information Systems

Other Computer Sciences

Identifier	oai:union.ndltd.org:uky.edu/oai:uknowledge.uky.edu:cs_etds-1016
Date	01 January 2013
Creators	Thapa, Nirmal
Publisher	UKnowledge
Source Sets	University of Kentucky
Detected Language	English
Type	text
Format	application/pdf
Source	Theses and Dissertations--Computer Science
