Spelling suggestions: "subject:"skewed class codistribution deproblem"" "subject:"skewed class codistribution 3dproblem""
1 |
Classification Analysis Techniques for Skewed ClassChyi, Yu-Meei 12 February 2003 (has links)
Abstract
Existing classification analysis techniques (e.g., decision tree induction, backpropagation neural network, k-nearest neighbor classification, etc.) generally exhibit satisfactory classification effectiveness when dealing with data with non-skewed class distribution. However, real-world applications (e.g., churn prediction and fraud detection) often involve highly skewed data in decision outcomes (e.g., 2% churners and 98% non-churners). Such a highly skewed class distribution problem, if not properly addressed, would imperil the resulting learning effectiveness and might result in a ¡§null¡¨ prediction system that simply predicts all instances as having the majority decision class as the training instances (e.g., predicting all customers as non-churners). In this study, we extended the multi-classifier class-combiner approach and proposed a clustering-based multi-classifier class-combiner technique to address the highly skewed class distribution problem in classification analysis. In addition, we proposed four distance-based methods for selecting a subset of instances having the majority decision class for lowering the degree of skewness in a data set. Using two real-world datasets (including mortality prediction for burn patients and customer loyalty prediction), empirical results suggested that the proposed clustering-based multi-classifier class-combiner technique generally outperformed the traditional multi-classifier class-combiner approach and the four distance-based methods.
Keywords: Data Mining, Classification Analysis, Skewed Class Distribution Problem, Decision Tree Induction, Multi-classifier Class-combiner Approach, Clustering-based Multi-classifier Class-combiner Approach
|
Page generated in 0.0782 seconds