Global ETD Search

Improving the Scalability of an Exact Approach for Frequent Item Set Hiding

Technological advances have led to the generation of large databases of organizational data, which are recognized as an information-rich, strategic asset for internal analysis and for sharing with trading partners. Data mining techniques can discover patterns in large databases, including relationships considered strategically relevant to the owner of the data. The frequent item set hiding problem is an active area of research that studies approaches for hiding sensitive knowledge patterns before the data is disclosed outside the organization. Several methods address hiding sensitive item sets, including an exact approach that generates an extension to the original database; when combined with the original database, this extension limits the discovery of sensitive association rules without affecting other, nonsensitive information. To generate the database extension, this method formulates a constraint optimization problem (COP). Solving the COP formulation is the dominant factor in the computational resource requirements of the exact approach. This dissertation developed heuristics that address the scalability of the exact hiding method. The heuristics aim to improve the performance of the COP solver by reducing the size of the COP formulation without significantly affecting the quality of the solutions generated. The first heuristic decomposes the COP formulation into multiple smaller problem instances that are processed separately by the COP solver to generate partial extensions of the database. These smaller database extensions are then combined to form a database extension that is close to the one generated with the original, larger COP formulation. The second heuristic evaluates the revised border used to formulate the COP and reduces the number of variables and constraints by selectively substituting multiple item sets with composite variables. Solving the COP with fewer variables and constraints reduces the computational cost of processing.
The results of heuristic processing were compared with those of an existing exact approach in terms of the size of the database extension, the ability to hide sensitive data, and the impact on nonsensitive data.
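
The goal described in the abstract, hiding a sensitive item set by appending an extension to the original database, can be illustrated with a minimal sketch. The toy transactions, the support threshold, and the function names `support` and `is_hidden` are assumptions made for this example only. The extension here is hand-picked rather than produced by a COP solver, so the sketch shows only the hiding criterion, not the dissertation's method.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], database: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not database:
        return 0.0
    hits = sum(1 for t in database if itemset <= t)
    return hits / len(database)


def is_hidden(itemset: FrozenSet[str],
              original: List[Transaction],
              extension: List[Transaction],
              min_support: float) -> bool:
    """True if `itemset` is infrequent in the original plus the extension."""
    return support(itemset, original + extension) < min_support


# Toy data (hypothetical): four original transactions over items a, b, c.
original = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
# Hand-picked extension (an assumption, not a solver output).
extension = [frozenset("ac"), frozenset("ac")]

sensitive = frozenset("ab")
threshold = 0.5

# Item sets that are frequent in the original and are not sensitive.
nonsensitive = [frozenset("a"), frozenset("b"), frozenset("c"), frozenset("ac")]

hidden = is_hidden(sensitive, original, extension, threshold)
preserved = all(not is_hidden(s, original, extension, threshold)
                for s in nonsensitive)
```

In this toy case the sensitive item set {a, b} is frequent in the original database but falls below the threshold once the extension is appended, while the nonsensitive frequent item sets stay at or above it.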

association rule mining

binary integer programming

knowledge hiding

privacy preserving data mining

Computer Sciences

Identifier	oai:union.ndltd.org:nova.edu/oai:nsuworks.nova.edu:gscis_etd-1204
Date	01 January 2013
Creators	LaMacchia, Carolyn
Publisher	NSUWorks
Source Sets	Nova Southeastern University
Detected Language	English
Type	text
Format	application/pdf
Source	CEC Theses and Dissertations
