
Multi-class Classification Methods Utilizing Mahalanobis Taguchi System And A Re-sampling Approach For Imbalanced Data Sets

Classification approaches are used in many areas to identify or estimate the classes to
which observations belong. In this thesis, the Mahalanobis Taguchi System (MTS), a
classification approach, is analyzed and further improved for multi-class classification
problems. MTS seeks to identify the significant variables and classifies a new observation
based on its Mahalanobis distance (MD). In the first part of the study, sample size
problems, which are encountered mostly in small data sets, and multicollinearity problems,
which are among the limitations of MTS, are analyzed, and a re-sampling approach is
explored as a solution. Our re-sampling approach, which works only for data sets with two
classes, combines over-sampling and under-sampling. Over-sampling is based on SMOTE,
which generates synthetic observations between minority-class observations and their
nearest neighbors. In addition, MTS models are used to test the performance of several
re-sampling parameters, and the most appropriate values are sought for each case.
In the second part, multi-class classification methods based on MTS are developed.
The first algorithm, Feature Weighted Multi-class MTS-I (FWMMTS-I), is inspired by the
descent feature weighted MD. It relaxes the requirement that the MDs for the variables be
added up with equal weights. As a result, noisy variables receive weights close to zero, so
they do not mask the other variables. As a second multi-class classification algorithm, the
original MTS method is extended to multi-class problems; this extension is called
Multi-class MTS (MMTS). In addition, an approach comparable to that of Su and Hsiao (2009),
which also considers weights of variables, is studied with a modification in the MD
calculation. It is named Feature Weighted Multi-class MTS-II (FWMMTS-II). The methods are
compared on eight different multi-class data sets using stratified 5-fold
cross-validation. The results show that FWMMTS-I is as accurate as MMTS, and that both are
more accurate than FWMMTS-II. Interestingly, the Mahalanobis Distance Classifier (MDC),
which uses all the variables directly in the classification model, performed equally well
on the studied data sets.
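
As an illustration of the two building blocks named in the abstract, the following is a minimal Python sketch, not the implementation used in the thesis. It assumes the scaled MD commonly used with MTS (the quadratic form divided by the number of variables), estimates one covariance matrix per class, and uses a pseudo-inverse as a simple guard against near-singular covariance matrices. The SMOTE-style function only shows the interpolation step; the combined over-sampling and under-sampling scheme of the thesis is not reproduced. All function names and parameters (`fit_class_stats`, `scaled_md`, `predict`, `smote_like`, `k`, `n_new`) are hypothetical.

```python
import numpy as np


def fit_class_stats(X, y):
    """Estimate a mean vector and an inverse covariance matrix per class.

    Assumption: the pseudo-inverse is used so that a near-singular
    covariance matrix (e.g. under multicollinearity) does not raise.
    """
    stats = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        cov = np.atleast_2d(np.cov(Xc, rowvar=False))
        stats[label] = (mean, np.linalg.pinv(cov))
    return stats


def scaled_md(x, mean, inv_cov):
    """Scaled Mahalanobis distance: (x - mean)' S^-1 (x - mean) / k."""
    d = x - mean
    return float(d @ inv_cov @ d) / x.shape[0]


def predict(x, stats):
    """Assign x to the class with the smallest scaled MD (an MDC-style rule)."""
    return min(stats, key=lambda label: scaled_md(x, *stats[label]))


def smote_like(X_min, n_new, k=3, rng=None):
    """Create n_new synthetic points by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority observation and one of its k nearest minority neighbours.
    Requires at least two minority observations.
    """
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dist)[1:k + 1]  # position 0 is the point itself
        j = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

In this sketch, `fit_class_stats(X, y)` is called once on the training data, `predict(x, stats)` classifies a single observation, and `smote_like(X_min, n_new)` would be applied to the minority-class rows before fitting.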

Identifier: oai:union.ndltd.org:METU/oai:etd.lib.metu.edu.tr:http://etd.lib.metu.edu.tr/upload/3/12610521/index.pdf
Date: 01 April 2009
Creators: Ayhan, Dilber
Contributors: Koksal, Gulser
Publisher: METU
Source Sets: Middle East Technical Univ.
Language: English
Detected Language: English
Type: M.S. Thesis
Format: text/pdf
Rights: To liberate the content for public access
