1 |
Empirical Evaluations of Different Strategies for Classification with Skewed Class Distribution
Ling, Shih-Shiung 09 August 2004
Existing classification analysis techniques (e.g., decision tree induction) generally exhibit satisfactory classification effectiveness when dealing with data that has a non-skewed class distribution. However, real-world applications (e.g., churn prediction and fraud detection) often involve highly skewed decision outcomes. Such a highly skewed class distribution, if not properly addressed, would imperil the resulting learning effectiveness.
In this study, we empirically evaluate three approaches for addressing classification with a highly skewed class distribution: under-sampling, over-sampling, and the multi-classifier committee. Because of its popularity, C4.5 is selected as the underlying classification analysis technique. Based on 10 datasets with highly skewed class distributions, our empirical evaluations suggest that the multi-classifier committee generally outperformed the under-sampling and over-sampling approaches when recall rate, precision rate, and F1-measure are used as the evaluation criteria. Furthermore, for applications aiming at a high recall rate, we suggest the over-sampling approach. If the precision rate is the primary concern, we instead recommend the classification model induced directly from the original datasets.
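The two sampling strategies and the three evaluation criteria named above can be illustrated with a minimal sketch. It assumes a binary problem in which the minority class is strictly smaller than the majority class, and it balances the classes to a 1:1 ratio; the abstract does not state the sampling ratios actually used, and the function names here (`under_sample`, `over_sample`, `precision_recall_f1`) are hypothetical.

```python
import random
from collections import Counter


def _split_indices(labels, minority):
    """Return (minority indices, majority indices) for a binary label list."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    return minority_idx, majority_idx


def under_sample(rows, labels, minority, seed=0):
    """Randomly drop majority instances until both classes are the same size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    keep = minority_idx + rng.sample(majority_idx, len(minority_idx))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def over_sample(rows, labels, minority, seed=0):
    """Randomly duplicate minority instances until both classes are the same size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    n_extra = len(majority_idx) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    keep = majority_idx + minority_idx + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall and F1 with respect to the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (an assumption for illustration): 2 churners among 100 customers.
rows = list(range(100))
labels = ["churn"] * 2 + ["stay"] * 98
print(Counter(under_sample(rows, labels, "churn")[1]))
print(Counter(over_sample(rows, labels, "churn")[1]))
```

Under-sampling discards majority instances and so shrinks the training set, while over-sampling keeps every original instance and repeats minority ones. Either resampled set would then be passed to the learner (C4.5 in the study) in place of the original data.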
|
2 |
An Investigation of Standard and Ensemble Based Classification Techniques for the Prediction of Hospitalization Duration
Sheikh-Nia, Samaneh 04 September 2012
In any health-care system, early identification of the individuals who are most at risk of developing an illness is vital, not only to ensure that a patient is provided with the appropriate treatment, but also to avoid the considerable costs associated with unnecessary hospitalization. Achieving this goal requires a prediction method capable of dealing with real-world medical data, which is inherently complex.
In this study, we show how standard classification algorithms can be employed collectively to predict a patient's length of stay in hospital in the upcoming year, based on their medical history. Multiple classifiers are used for the prediction task because the complexity of real-world medical data makes classification very challenging. The data is voluminous, spans a wide range of class values, some of which have only a few instances, and is highly unbalanced, which makes the classification of minority classes very difficult. We propose two Sequential Ensemble Classification (SEC) schemes, one based on an ensemble of homogeneous classifiers and a second based on a heterogeneous ensemble of classifiers, at three hierarchical granularity levels. The goal of this system is to provide increased performance over the standard classifiers. The method is highly beneficial when dealing with complex data that is multi-class and highly unbalanced.
|
3 |
Classification Analysis Techniques for Skewed Class
Chyi, Yu-Meei 12 February 2003
Abstract
Existing classification analysis techniques (e.g., decision tree induction, backpropagation neural network, k-nearest neighbor classification, etc.) generally exhibit satisfactory classification effectiveness when dealing with data that has a non-skewed class distribution. However, real-world applications (e.g., churn prediction and fraud detection) often involve highly skewed decision outcomes (e.g., 2% churners and 98% non-churners). Such a highly skewed class distribution, if not properly addressed, would imperil the resulting learning effectiveness and might result in a "null" prediction system that simply assigns every instance to the majority decision class of the training instances (e.g., predicting all customers as non-churners). In this study, we extended the multi-classifier class-combiner approach and proposed a clustering-based multi-classifier class-combiner technique to address the highly skewed class distribution problem in classification analysis. In addition, we proposed four distance-based methods for selecting a subset of the instances that have the majority decision class, in order to lower the degree of skewness in a data set. Using two real-world datasets (mortality prediction for burn patients and customer loyalty prediction), empirical results suggested that the proposed clustering-based multi-classifier class-combiner technique generally outperformed the traditional multi-classifier class-combiner approach and the four distance-based methods.
Keywords: Data Mining, Classification Analysis, Skewed Class Distribution Problem, Decision Tree Induction, Multi-classifier Class-combiner Approach, Clustering-based Multi-classifier Class-combiner Approach
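The "null" prediction system described in the abstract can be made concrete with a short worked example. The sketch below uses the abstract's 2% / 98% split on 100 hypothetical customers; the function name `evaluate_null_predictor` is an assumption for illustration, not something from the thesis.

```python
def evaluate_null_predictor(n_minority, n_majority):
    """Score a model that labels every instance as the majority class."""
    y_true = [1] * n_minority + [0] * n_majority  # 1 = minority (e.g., churner)
    y_pred = [0] * len(y_true)                    # always predict the majority class
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    accuracy = correct / len(y_true)
    recall = true_pos / n_minority if n_minority else 0.0
    return accuracy, recall


accuracy, recall = evaluate_null_predictor(n_minority=2, n_majority=98)
print(f"accuracy={accuracy:.2f} recall={recall:.2f}")  # accuracy=0.98 recall=0.00
```

The model looks strong on accuracy yet never identifies a single churner, which is why recall, precision, and F1 on the minority class are more informative than accuracy for skewed data.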
|
4 |
Differential evolution technique on weighted voting stacking ensemble method for credit card fraud detection
Dolo, Kgaugelo Moses 12 1900
Differential Evolution is a population-based stochastic search technique for optimization that is powerful and efficient over a continuous space for solving differentiable and non-linear optimization problems. The weighted voting stacking ensemble method is an important technique that combines various classifier models. However, selecting appropriate weights for the classifier models so that transactions are classified correctly is a problem. This research study therefore explores whether the Differential Evolution optimization method is a good approach for defining the weighting function. Manual and random selection of weights for voting on credit card transactions has previously been carried out, but a large number of fraudulent transactions were not detected by the classifier models, which means that a technique to overcome the weaknesses of the classifier models is required. Thus, the problem of selecting appropriate weights was treated as a weight optimization problem in this study. The dataset was downloaded from the Kaggle competition data repository. Various machine learning algorithms were used to cast weighted votes on the class of each transaction, and the Differential Evolution optimization technique was used as the weighting function. In addition, the Synthetic Minority Oversampling Technique (SMOTE) and Safe Level Synthetic Minority Oversampling Technique (SL-SMOTE) oversampling algorithms were modified to preserve the definition of SMOTE while improving performance. The results of this research study showed that the Differential Evolution optimization method is a good weighting function, which can be adopted as a systematic weight function for the weighted voting stacking ensemble method across various classification methods.
School of Computing / M. Sc. (Computing)
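The idea of treating vote weights as an optimization problem can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the common DE/rand/1/bin variant, F1 on the fraud class as the fitness function, weights clipped to [0, 1], and a 0.5 decision threshold, none of which are stated in the abstract. All function names and the toy probabilities are hypothetical.

```python
import random


def weighted_vote(probas, weights, threshold=0.5):
    """Combine per-classifier fraud probabilities into 0/1 predictions."""
    total = sum(weights) or 1.0  # avoid division by zero if all weights are 0
    preds = []
    for row in probas:  # row[j] is the fraud probability from classifier j
        score = sum(w * p for w, p in zip(weights, row)) / total
        preds.append(1 if score >= threshold else 0)
    return preds


def f1_score(y_true, y_pred):
    """F1 for the positive (fraud = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def de_optimise_weights(probas, y_true, pop_size=12, generations=40,
                        f=0.8, cr=0.9, seed=0):
    """Search for vote weights with DE/rand/1/bin, maximising F1."""
    rng = random.Random(seed)
    dim = len(probas[0])

    def fitness(weights):
        return f1_score(y_true, weighted_vote(probas, weights))

    # Include equal weights in the initial population so that, with greedy
    # selection, the result is never worse than a plain unweighted average.
    pop = [[1.0] * dim]
    pop += [[rng.random() for _ in range(dim)] for _ in range(pop_size - 1)]
    scores = [fitness(w) for w in pop]

    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # force at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, value)))
                else:
                    trial.append(pop[i][j])
            trial_score = fitness(trial)
            if trial_score >= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, trial_score

    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Toy example: three classifiers scoring six transactions (1 = fraud).
toy_probas = [
    [0.9, 0.2, 0.3], [0.8, 0.1, 0.4], [0.7, 0.3, 0.2],
    [0.1, 0.6, 0.7], [0.2, 0.7, 0.6], [0.3, 0.4, 0.5],
]
toy_labels = [1, 1, 1, 0, 0, 0]
best_weights, best_f1 = de_optimise_weights(toy_probas, toy_labels)
print(best_weights, best_f1)
```

In practice the weights would be tuned on held-out validation predictions rather than on the data used to train the base classifiers, so that the ensemble does not simply reward models that overfit.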
|