Rule induction is a popular technique for knowledge acquisition and data mining. Many techniques, such as the tree induction methods ID3, C4.5, and CART, as well as artificial neural networks, have been developed and widely used. However, most of them rely on either a categorical or a numerical mechanism to assess the importance of input variables, which may not produce optimal rules when the data contain a mixture of both variable types.
In 1992, Liang proposed a composite approach called CRIS that uses different methods to analyze different types of data when inducing rules for binary classification. Yang conducted follow-up research that extended the original algorithm to multiple categories. However, neither method takes variable interaction into consideration.
The purpose of this research is to extend the previous approaches by including second-order interaction. For numerical variables, we also take the kurtosis and skewness of the data into consideration. For categorical data, we adopt the ID3 algorithm to handle classes with low representation in the sample. To evaluate this technique, we develop a prototype, CRIS 3.0, and compare it with existing techniques, using multi-category CRIS, CART, and C4.5 as benchmarks. The results show that, among the techniques compared, CRIS 3.0 has the highest probability of producing the highest prediction accuracy.
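As an illustration of the quantities mentioned above, the following minimal sketch computes the skewness and excess kurtosis of a numerical variable and the ID3-style information gain of a categorical variable. It is not the CRIS 3.0 algorithm: the abstract does not state how these measures are combined, and the function names (`skewness_kurtosis`, `entropy`, `information_gain`) and the use of population moments are assumptions made for this example only.

```python
import math
from collections import Counter


def skewness_kurtosis(xs):
    """Return (skewness, excess kurtosis) using population moments.

    Assumption: population (biased) moments are used; the thesis may
    use a different estimator.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # variance
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurt


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """ID3 information gain of a categorical attribute.

    `values` holds the attribute value of each sample and `labels`
    holds the corresponding class label.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy remaining after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

A symmetric sample such as `[1, 2, 3, 4, 5]` yields a skewness of zero, while an attribute that separates the classes perfectly yields an information gain equal to the entropy of the class labels.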
Identifier | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0728108-144440 |
Date | 28 July 2008 |
Creators | Yang, Chi-hsien |
Contributors | Chih-ping Wei, Ting-peng Liang, Deng-neng Chen |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | Cholon |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0728108-144440 |
Rights | campus_withheld, Copyright information available at source archive |