Abstract
Existing classification analysis techniques (e.g., decision tree induction, backpropagation neural networks, and k-nearest neighbor classification) generally achieve satisfactory classification effectiveness on data with a non-skewed class distribution. However, real-world applications (e.g., churn prediction and fraud detection) often involve data that are highly skewed in their decision outcomes (e.g., 2% churners and 98% non-churners). If not properly addressed, such a highly skewed class distribution impairs learning effectiveness and may produce a "null" prediction system that simply assigns every instance to the majority decision class of the training instances (e.g., predicting all customers to be non-churners). In this study, we extended the multi-classifier class-combiner approach and proposed a clustering-based multi-classifier class-combiner technique to address the highly skewed class distribution problem in classification analysis. In addition, we proposed four distance-based methods for selecting a subset of the majority-class instances to lower the degree of skewness in a data set. Using two real-world datasets (mortality prediction for burn patients and customer loyalty prediction), the empirical results suggested that the proposed clustering-based multi-classifier class-combiner technique generally outperformed both the traditional multi-classifier class-combiner approach and the four distance-based methods.
Keywords: Data Mining, Classification Analysis, Skewed Class Distribution Problem, Decision Tree Induction, Multi-classifier Class-combiner Approach, Clustering-based Multi-classifier Class-combiner Approach
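The abstract does not spell out the algorithm, so the following is only a minimal sketch, under stated assumptions, of the general idea it describes: cluster the majority-class instances, pair each cluster with all minority-class instances to form less skewed training sets, train one base classifier per set, and combine their predictions. Assumptions: plain k-means as the clustering step; a nearest-centroid learner standing in for decision tree induction to keep the example dependency-free; and simple majority voting as the combiner (the original class-combiner approach may instead learn a meta-level combiner). All function names (`kmeans`, `train_clustered_ensemble`, `predict_vote`, `select_near_majority`) are hypothetical. `select_near_majority` is one plausible distance-based rule for selecting majority instances, not necessarily one of the four methods proposed in the thesis.

```python
# Hypothetical sketch of clustering-based class rebalancing; not the thesis's implementation.
import random
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points: List[Point], k: int, iters: int = 20, seed: int = 0) -> List[List[Point]]:
    """Plain Lloyd's k-means (requires k <= len(points)); returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups: List[List[Point]] = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute centers; an empty cluster keeps its previous center.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return [g for g in groups if g]


class NearestCentroid:
    """Stand-in base learner (assumption): predicts the class whose training centroid is closest."""

    def __init__(self, X: List[Point], y: List[int]) -> None:
        self.centroids = {c: mean([x for x, t in zip(X, y) if t == c]) for c in set(y)}

    def predict(self, x: Point) -> int:
        return min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))


def train_clustered_ensemble(majority: List[Point], minority: List[Point], k: int,
                             seed: int = 0) -> List[NearestCentroid]:
    """Cluster the majority class (label 0) into k groups and pair each group with
    every minority instance (label 1), so each base learner sees a less skewed set."""
    models = []
    for group in kmeans(majority, k, seed=seed):
        X = group + minority
        y = [0] * len(group) + [1] * len(minority)
        models.append(NearestCentroid(X, y))
    return models


def predict_vote(models: List[NearestCentroid], x: Point) -> int:
    """Combine the base predictions by simple majority vote (assumed combiner)."""
    return Counter(m.predict(x) for m in models).most_common(1)[0][0]


def select_near_majority(majority: List[Point], minority: List[Point], m: int) -> List[Point]:
    """Hypothetical distance-based undersampling: keep the m majority instances
    whose nearest minority instance is closest."""
    return sorted(majority, key=lambda p: min(sq_dist(p, q) for q in minority))[:m]


if __name__ == "__main__":
    majority = [(0.0, 0.0), (0.3, 0.2), (0.1, 0.4), (10.0, 0.0), (10.2, 0.3),
                (9.9, 0.1), (0.0, 10.0), (0.2, 10.3), (0.4, 9.8)]
    minority = [(10.0, 10.0), (10.2, 9.8)]
    models = train_clustered_ensemble(majority, minority, k=3)
    print(predict_vote(models, (10.1, 9.9)))  # 1 (minority class)
    print(predict_vote(models, (0.1, 0.1)))   # 0 (majority class)
```

Clustering, rather than randomly partitioning the majority class, gives each base learner majority instances drawn from one region of the feature space; whether the thesis uses the same clustering algorithm, partition count, or combiner is not stated in the abstract.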
Identifier | oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0212103-235138 |
Date | 12 February 2003 |
Creators | Chyi, Yu-Meei |
Contributors | Chih-Ping Wei, Shin-Mu Tseng, Tungching Lin |
Publisher | NSYSU |
Source Sets | NSYSU Electronic Thesis and Dissertation Archive |
Language | Cholon |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0212103-235138 |
Rights | withheld, Copyright information available at source archive |