Empirical Evaluations of Different Strategies for Classification with Skewed Class Distribution

Existing classification analysis techniques (e.g., decision tree induction) generally achieve satisfactory classification effectiveness on data whose class distribution is not skewed. However, real-world applications (e.g., churn prediction and fraud detection) often involve highly skewed distributions of decision outcomes. If not properly addressed, such a skewed class distribution can impair the resulting learning effectiveness.
In this study, we empirically evaluate three approaches to classification with a highly skewed class distribution: under-sampling, over-sampling, and a multi-classifier committee. Because of its popularity, C4.5 is selected as the underlying classification analysis technique. Based on 10 datasets with highly skewed class distributions, our empirical evaluations suggest that the multi-classifier committee approach generally outperforms the under-sampling and over-sampling approaches when the recall rate, precision rate, and F1-measure are used as the evaluation criteria. Furthermore, for applications aiming at a high recall rate, the over-sampling approach is suggested. If, on the other hand, the precision rate is the primary concern, the classification model induced directly from the original datasets is recommended.
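As a rough illustration of the terms used in the abstract, the sketch below shows one common way to under-sample and over-sample a two-class dataset and to compute the recall rate, precision rate, and F1-measure. It is not taken from the thesis. Random resampling to equal class sizes, the function names, and the toy labels are all assumptions made for illustration. The C4.5 learner and the multi-classifier committee are not shown.

```python
import random
from collections import Counter


def _split_indices(labels, minority):
    """Return (minority indices, majority indices) for a two-class label list."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    return minority_idx, majority_idx


def under_sample(rows, labels, minority, seed=0):
    """Assumed variant: randomly keep only as many majority rows as minority rows."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    n_keep = min(len(minority_idx), len(majority_idx))
    kept = minority_idx + rng.sample(majority_idx, n_keep)
    return [rows[i] for i in kept], [labels[i] for i in kept]


def over_sample(rows, labels, minority, seed=0):
    """Assumed variant: randomly duplicate minority rows until the classes match."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    n_extra = max(0, len(majority_idx) - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(n_extra)] if minority_idx else []
    kept = majority_idx + minority_idx + extra
    return [rows[i] for i in kept], [labels[i] for i in kept]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 with respect to the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data: 8 majority ("neg") rows and 2 minority ("pos") rows.
    rows = list(range(10))
    labels = ["neg"] * 8 + ["pos"] * 2
    print(Counter(under_sample(rows, labels, "pos")[1]))
    print(Counter(over_sample(rows, labels, "pos")[1]))
```

In a setup like this, resampling would be applied to the training data only, and the three measures would be computed on held-out data with the minority class treated as the positive class.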

http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0809104-235914

Classification Analysis

Decision Tree Induction

Multi-classifier Committee Approach

Under-sampling

Over-sampling

Skewed Class Distribution

Identifier	oai:union.ndltd.org:NSYSU/oai:NSYSU:etd-0809104-235914
Date	09 August 2004
Creators	Ling, Shih-Shiung
Contributors	Tsang-Hsiang Cheng, San-Yi Huang, Chih-Ping Wei, Te-Min Chang
Publisher	NSYSU
Source Sets	NSYSU Electronic Thesis and Dissertation Archive
Language	Cholon
Detected Language	English
Type	text
Format	application/pdf
Source	http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/view_etd?URN=etd-0809104-235914
Rights	withheld, Copyright information available at source archive
