Classification, an important problem in data mining, is one of the most useful techniques for prediction. The goal of the classification problem is to construct a classifier from a given training database and to use it to predict the class of new data whose class is unknown. Classification has been widely applied in many areas, such as medical diagnosis and weather prediction. The decision tree is the most popular model among classifiers, since it generates understandable rules and performs classification with very little computation. However, a major drawback of the decision tree model is that it examines only a single attribute at a time. In the real world, attributes in some databases depend on each other. Thus, we may improve the accuracy of the decision tree by discovering the correlation between attributes. The CAM method applies association rule mining, such as the Apriori method, to discover attribute dependence. However, traditional methods for mining association rules are inefficient in classification applications and can suffer from five problems: (1) the combinatorial explosion problem, (2) invalid candidates, (3) unsuitable minimal support, (4) ignored meaningful class values, and (5) itemsets without class data. The FP-growth method avoids the first two problems. However, it still suffers from the remaining three. Moreover, one more problem occurs: nodes that are unnecessary for the classification problem make the FP-tree large and less compact. Furthermore, the CAM method is expensive because it scans the database many times, and its attribute combination problem causes some misclassification. Therefore, in this thesis, we present an efficient and accurate decision tree building method that resolves the above six problems and reduces the overhead of database scanning in the CAM method. We build a structure named the class-rooted FP-tree, which is similar to the FP-tree except that the root of the tree is always a class item.
Instead of using the static minimal support applied in the FP-growth method, we decide the minimal support dynamically, which avoids misjudging some of the large itemsets used for the classification problem. In the decision tree building phase, we provide a pruning strategy that reduces the number of database scans. We also solve the attribute combination problem of the CAM method and improve its accuracy. Our simulation shows that the proposed class-rooted FP-tree mining method performs better than other association rule mining methods in terms of storage usage. Our simulation also shows that our method improves on the CAM method in terms of the number of database scans and classification accuracy. Therefore, the mining strategy of our proposed method is applicable to any decision tree building method and provides high accuracy in real-world applications.
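To make the class-rooted idea concrete, the following is a minimal Python sketch of a prefix tree whose root is always a class item, with one tree per class value. It is an illustration under stated assumptions, not the thesis's algorithm: the names `Node` and `build_class_rooted_trees` are hypothetical, the ordering of items by descending global frequency is borrowed from standard FP-tree construction, and the dynamic minimal support and pruning steps described in the abstract are not modeled.

```python
from collections import Counter


class Node:
    """One tree node: an (attribute, value) item with an occurrence count."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_class_rooted_trees(records, class_attr):
    """Build one prefix tree per class value; each root is a class item.

    records: list of dicts mapping attribute name -> string value.
    Assumption: items are ordered by descending global frequency,
    as in ordinary FP-tree construction (ties broken alphabetically).
    """
    # Count every non-class (attribute, value) item across all records.
    freq = Counter(
        (a, v) for r in records for a, v in r.items() if a != class_attr
    )

    roots = {}
    for r in records:
        class_item = (class_attr, r[class_attr])
        if class_item not in roots:
            roots[class_item] = Node(class_item)
        node = roots[class_item]
        node.count += 1

        # Insert the record's items along a shared prefix path.
        items = sorted(
            ((a, v) for a, v in r.items() if a != class_attr),
            key=lambda it: (-freq[it], it),
        )
        for it in items:
            child = node.children.get(it)
            if child is None:
                child = Node(it, parent=node)
                node.children[it] = child
            child.count += 1
            node = child
    return roots


# Hypothetical toy data in the spirit of the weather prediction example.
records = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
]
roots = build_class_rooted_trees(records, "play")
```

Because every path starts at a class item, each itemset read from a path in this sketch already carries class data, which is the property the abstract's fifth problem (itemsets without class data) concerns.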
Identifier | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0629105-045705 |
Date | 29 June 2005 |
Creators | Chen, Chien-hung |
Contributors | San-yi Huang, Chien-i Lee, Ye-in Chang |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0629105-045705 |
Rights | not_available, Copyright information available at source archive |