Binary classification trees : a comparison with popular classification methods in statistics using different software

Thesis (MComm) -- Stellenbosch University, 2002. / ENGLISH ABSTRACT: Consider a data set with a categorical response variable and a set of explanatory
variables. The response variable can have two or more categories and the explanatory
variables can be numerical or categorical. This is a typical setup for a classification
analysis, where we want to model the response based on the explanatory variables.
Traditional statistical methods have been developed under certain assumptions
such as: the explanatory variables are numeric only and/or the data follow a multivariate
normal distribution. In practice such assumptions are not always met. Different research
fields generate data that have a mixed structure (categorical and numeric) and researchers
are often interested in using all these data in the analysis. In recent years robust methods
such as classification trees have become the substitute for traditional statistical methods
when the above assumptions are violated. Classification trees are not only an effective
classification method, but offer many other advantages.
The aim of this thesis is to highlight the advantages of classification trees. In the
chapters that follow, the theory of and further developments on classification trees are
discussed. This forms the foundation for the CART software which is discussed in
Chapter 5, as well as other software in which classification tree modeling is possible. We
will compare classification trees to parametric-, kernel- and k-nearest-neighbour
discriminant analyses. A neural network is also compared to classification trees and
finally we draw some conclusions on classification trees and how they compare with other
methods. / AFRIKAANSE OPSOMMING: Beskou 'n datastel met 'n kategoriese respons veranderlike en 'n stel verklarende
veranderlikes. Die respons veranderlike kan twee of meer kategorieë hê en die
verklarende veranderlikes kan numeries of kategories wees. Hierdie is 'n tipiese opset vir
'n klassifikasie analise, waar ons die respons wil modelleer deur gebruik te maak van die
verklarende veranderlikes.
Tradisionele statistiese metodes is ontwikkel onder sekere aannames soos: die
verklarende veranderlikes is slegs numeries en/of dat die data 'n meerveranderlike
normaal verdeling het. In die praktyk word daar nie altyd voldoen aan hierdie aannames
nie. Verskillende navorsingsvelde genereer data wat 'n gemengde struktuur het
(kategories en numeries) en navorsers wil soms al hierdie data gebruik in die analise. In
die afgelope jare het robuuste metodes soos klassifikasie bome die alternatief geword vir
tradisionele statistiese metodes as daar nie aan bogenoemde aannames voldoen word nie.
Klassifikasie bome is nie net 'n effektiewe klassifikasie metode nie, maar bied baie meer
voordele.
Die doel van hierdie werkstuk is om die voordele van klassifikasie bome uit te
wys. In die hoofstukke wat volg word die teorie en verdere ontwikkelinge van
klassifikasie bome bespreek. Hierdie vorm die fondament vir die CART sagteware wat
bespreek word in Hoofstuk 5, asook ander sagteware waarin klassifikasie boom
modelering moontlik is. Ons sal klassifikasie bome vergelyk met parametriese-, "kernel"-
en "k-nearest-neighbour" diskriminant analise. 'n Neurale netwerk word ook vergelyk
met klassifikasie bome en ten slotte word daar gevolgtrekkings gemaak oor klassifikasie
bome en hoe dit vergelyk met ander metodes.

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:sun/oai:scholar.sun.ac.za:10019.1/52718
Date: 12 1900
Creators: Lamont, Morné Michael Connell
Contributors: Louw, N., Stellenbosch University. Faculty of Economic and Management Sciences. Department of Statistics and Actuarial Science.
Publisher: Stellenbosch : Stellenbosch University
Source Sets: South African National ETD Portal
Language: en_ZA
Detected Language: English
Type: Thesis
Format: ix, 92 pages : illustrations
Rights: Stellenbosch University
