Utility mining algorithms have recently been proposed to discover high utility itemsets from a quantitative database. Factors such as profits or prices are concerned in measuring the utility values of purchased items for revealing more useful knowledge to managers. Nearly all the existing algorithms are performed in a batch way to extract high utility itemsets. In real-world applications, transactions may, however, be inserted, deleted or modified in a database. The batch mining procedure requires more computational time for rescanning the whole updated database to maintain the up-to-date knowledge. In the first part of this thesis, two algorithms for data insertion and data deletion are respectively proposed for efficiently updating the discovered high utility itemsets based on pre-large concepts. The proposed algorithms firstly partition itemsets into three parts with nine cases according to whether they are large (high), pre-large or small transaction-weighted utilization in the original database. Each part is then performed by its own procedure to maintain and update the discovered high utility itemsets. Based on the pre-large concepts, the original database only need to be rescanned for much fewer itemsets in the maintenance process of high utility itemsets.
Besides, the risk of privacy threats usually exists in the process of data collection and data dissemination. Sensitive or personal information are required to be kept as private information before they are shared or published. Privacy-preserving utility mining (PPUM) has thus become an important issue in recent years. In the second part of this thesis, two evolutionary privacy-preserving utility mining algorithms to hide sensitive high utility itemsets in data sanitization for inserting dummy transactions and deleting transactions are respectively proposed. The two evolutionary privacy-preserving utility mining algorithms find appropriate transactions for insertion and deletion in the data-sanitization process. They adopt a flexible evaluation function with three factors. Different weights are assigned to the three factors depending on users¡¦ preference. The maintenance algorithms proposed in the first part of this thesis are also used in the GA-based approach to reduce the cost of rescanning databases, thus speeding up the evaluation process of chromosomes. Experiments are conducted as well to evaluate the performance of the proposed algorithms.
Identifer | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0911112-051949 |
Date | 11 September 2012 |
Creators | Wong, Jia-Wei |
Contributors | Wen-Yang Lin, Tzung-Pei Hong, Chung-Nan Lee, Chun-Hao Chen |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0911112-051949 |
Rights | user_define, Copyright information available at source archive |
Page generated in 0.0019 seconds