Extensions of biplot methodology to discriminant analysis with applications of non-parametric principal components

Dissertation (PhD)--Stellenbosch University, 2001. / ENGLISH ABSTRACT: Gower and Hand offer a new perspective on the traditional biplot. This perspective
provides a unified approach to principal component analysis (PCA) biplots based on
Pythagorean distance; canonical variate analysis (CVA) biplots based on Mahalanobis
distance; non-linear biplots based on Euclidean embeddable distances as well as
generalised biplots for use with both continuous and categorical variables.
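For reference, the two distances named above can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the dissertation: x_i and x_j denote observation vectors and W denotes the pooled within-class covariance matrix.

```latex
\[
d_{ij}^{2} = (\mathbf{x}_i - \mathbf{x}_j)^{\top} (\mathbf{x}_i - \mathbf{x}_j)
\qquad \text{(Pythagorean distance)}
\]
\[
\delta_{ij}^{2} = (\mathbf{x}_i - \mathbf{x}_j)^{\top} \mathbf{W}^{-1} (\mathbf{x}_i - \mathbf{x}_j)
\qquad \text{(Mahalanobis distance)}
\]
```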
The biplot methodology of Gower and Hand is extended and applied in statistical
discrimination and classification. This leads to discriminant analysis by means of PCA
biplots, CVA biplots, non-linear biplots as well as generalised biplots. Properties of these
techniques are derived in detail. Classification regions defined for linear discriminant
analysis (LDA) are applied in the CVA biplot, leading to discriminant analysis using biplot
methodology. Situations where the assumptions of LDA are not met are considered, and
various existing alternative discriminant analysis procedures are formulated in terms of
biplots: apart from PCA biplots, QDA, FDA and DSM biplots are defined and constructed,
and their usage is illustrated.
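The classification rule behind LDA classification regions can be sketched as assigning an observation to the class whose mean is nearest in Mahalanobis distance (equal prior probabilities assumed). The sketch below is a minimal illustration for two variables in pure Python, not the dissertation's S-PLUS code; the function names `class_means`, `pooled_within_cov`, `mahalanobis_sq` and `classify` are hypothetical, and it does not construct the biplot display itself.

```python
from collections import defaultdict


def class_means(X, y):
    """Return {label: (mean1, mean2)} for two-variable observations."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (a, b), g in zip(X, y):
        sums[g][0] += a
        sums[g][1] += b
        counts[g] += 1
    return {g: (s[0] / counts[g], s[1] / counts[g]) for g, s in sums.items()}


def pooled_within_cov(X, y, means):
    """Pooled within-class covariance matrix (2x2), divisor n - K."""
    sxx = sxy = syy = 0.0
    for (a, b), g in zip(X, y):
        da, db = a - means[g][0], b - means[g][1]
        sxx += da * da
        sxy += da * db
        syy += db * db
    dof = len(X) - len(means)
    return ((sxx / dof, sxy / dof), (sxy / dof, syy / dof))


def mahalanobis_sq(x, m, W):
    """Squared Mahalanobis distance from x to m under covariance W (2x2)."""
    (a, b), (_, d) = W
    det = a * d - b * b
    dx, dy = x[0] - m[0], x[1] - m[1]
    # Closed-form inverse of a symmetric 2x2 matrix applied to (dx, dy).
    return (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det


def classify(x, means, W):
    """Assign x to the class with the nearest mean (equal priors assumed)."""
    return min(means, key=lambda g: mahalanobis_sq(x, means[g], W))


if __name__ == "__main__":
    # Illustrative toy data, not data from the dissertation.
    X = [(1, 2), (2, 1), (2, 3), (3, 2), (6, 7), (7, 6), (7, 8), (8, 7)]
    y = ["A"] * 4 + ["B"] * 4
    means = class_means(X, y)
    W = pooled_within_cov(X, y, means)
    print(classify((3, 3), means, W), classify((6, 5), means, W))
```

With unequal class covariance matrices this nearest-mean rule no longer applies as written, which is the kind of situation the alternative procedures mentioned above address.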
It is demonstrated that biplot methodology naturally provides for managing categorical and
continuous variables simultaneously. It is shown through a simulation study that the
techniques based on biplot methodology can be applied successfully to the reversal
problem with categorical variables in discriminant analysis.
Situations occurring in practice where existing discriminant analysis procedures based on
distances from means fail are considered. After discussing self-consistency and principal
curves (a form of non-parametric principal components), discriminant analysis based on
distances from principal curves (a form of conditional mean) is proposed. This biplot
classification procedure, based upon principal curves, yields much better results in such situations.
Bootstrapping is considered as a means of describing variability in biplots. Variability in
samples as well as of axes in biplot displays receives attention. Bootstrap α-regions are
defined and the ability of these regions to describe biplot variability and to detect outliers
is demonstrated. Robust PCA and CVA biplots restricting the role of influential
observations on biplot displays are also considered.
An extensive library of S-PLUS computer programmes is provided for implementing the
various discriminant analysis techniques that were developed using biplot methodology.
The application of the above theoretical developments and computer software is illustrated
by analysing real-life data sets. Biplots are used to investigate the degree of capital
intensity of companies and to serve as an aid in risk management of a financial institution.
A particular application of the PCA biplot is the TQI biplot used in industry to determine
the degree to which manufactured items comply with multidimensional specifications. A
further interesting application is to determine whether an Old-Cape furniture item is
manufactured of stinkwood or embuia. A data set provided by the Western Cape Nature
Conservation Board consisting of measurements of tortoises from the species Homopus
areolatus is analysed by means of biplot methodology to determine if morphological
differences exist among tortoises from different geographical regions. Allometric
considerations need to be taken into account and the resulting small sample sizes in some
subgroups severely limit the use of conventional statistical procedures.
Biplot methodology is also applied to classification in a diabetes data set illustrating the
combined advantage of using classification with principal curves in a robust biplot or
biplot classification where covariance matrices are unequal. A discriminant analysis
problem where foraging behaviour of deer might eventually result in a change in the
dominant plant species is used to illustrate biplot classification of data sets containing both
continuous and categorical variables. As an example of the use of biplots with large data
sets, a data set consisting of 16828 lemons is analysed using biplot methodology to
investigate differences in fruit from various areas of production, cultivars and rootstocks.
The proposed α-bags also provide a means of quantifying the graphical overlap among
classes. This method is successfully applied in a multidimensional socio-economic data
set to quantify the degree of overlap among different race groups.
The application of the proposed biplot methodology in practice has an important byproduct:
it provides the impetus for many a new idea, e.g. applying a PCA biplot in
industry led to the development of quality regions; α-bags were constructed to represent
thousands of observations in the lemons data set, in turn leading to means for quantifying
the degree of overlap. This illustrates the enormous flexibility of biplots - biplot
methodology provides an infrastructure for many novelties when applied in practice. / AFRIKAANS ABSTRACT (translated into English): Gower and Hand offer a new perspective on the traditional biplot. This
perspective provides a uniform approach to principal component analysis (PCA) biplots
based on Pythagorean distance; canonical variate analysis (CVA) biplots based on
Mahalanobis distance; non-linear biplots based on Euclidean embeddable distances; as
well as generalised biplots for use when both continuous and categorical variables
occur.
The biplot methodology of Gower and Hand is extended and applied in statistical
discrimination and classification. This leads to discriminant analysis by means of PCA
biplots, CVA biplots, non-linear biplots as well as generalised biplots. The properties
of these techniques are derived in detail. Applying the concept of a classification
region in the CVA biplot paves the way for linear discriminant analysis (LDA) by means
of biplot methodology. Situations where the assumptions of LDA are not met receive
attention, and various existing alternative discriminant analysis procedures are
formulated in terms of biplots; besides PCA biplots, QDA, FDA and DSM biplots are
defined and constructed, and their uses are demonstrated.
It is shown that biplot methodology naturally provides for handling categorical and
continuous variables simultaneously. A simulation study shows that techniques based on
the biplot methodology that was developed can be used successfully for the so-called
reversal problem in discriminant analysis with categorical variables.
It is further argued that many practical situations occur where existing discriminant
analysis procedures fail because they are based on distances from means. After a
discussion of self-consistency and principal curves (a form of non-parametric principal
components), it is proposed that discriminant analysis be based on distance from
principal curves (a form of conditional mean). In this way a biplot classification
procedure that is based on distance from principal curves, and that yields much better
results, was developed.
The variation in the positions of data points in the biplot, as well as of the biplot
axes, is studied by means of bootstrap methods. A bootstrap α-region is defined, and it
is demonstrated how such an α-region can be used to describe variation in biplots and
to identify outliers. Robust PCA and CVA biplots that restrict the role of influential
observations on the biplot are discussed.
An extensive library of S-PLUS computer programmes was written for implementing the
various discriminant analysis techniques developed by means of biplot methodology. The
application of the preceding theoretical developments and computer programmes is
illustrated using real data sets from practice. Thus biplots are used to investigate
the degree of capital intensity of companies and to serve as an aid in the risk
management of a financial institution. A particular application of the PCA biplot is
the TQI biplot, which is used in the industrial environment to determine the extent to
which manufactured items comply with prescribed multidimensional specifications. A
further interesting application is to determine whether an Old Cape furniture item is
made of stinkwood or embuia. A data set provided by Western Cape Nature Conservation
concerning the well-known padloper tortoise, Homopus areolatus, is analysed by means of
biplots to determine whether there are morphometric differences between padlopers
originating from particular geographical regions. Allometric principles also had to be
taken into account, and the few observations in some of the subgroups mean that
conventional statistical techniques cannot simply be used.
The biplot methodology is also applied to classification in a diabetes data set to
illustrate the combined use of principal curves in a robust biplot, and biplot
classification where unequal covariance matrices are involved. A discriminant analysis
problem where the grazing preferences of deer may result in a change in the dominant
vegetation is used to illustrate biplot classification with data containing both
continuous and categorical variables. As an example of the use of biplots with a large
data set, a data set consisting of observations of 16828 lemons is analysed by means of
biplot methodology in order to investigate differences in fruit originating from
different production regions, cultivars and rootstocks. The α-bags developed here lead
to quantification of the graphical overlap of groups. This principle is successfully
applied in a multidimensional socio-economic data set to quantify the degree of overlap
of different population groups.
The application of the proposed biplot methodology in practice leads to an important
by-product: it provides the stimulus for new ideas. For example, applying a PCA biplot
in an industrial environment gave rise to the development of the concept of a quality
region; α-bags were constructed to represent thousands of observations in the lemon
data set, which in turn led to a method for quantifying the degree of overlap. This
illustrates the tremendous versatility of biplots - biplot methodology provides the
infrastructure for many inventive applications in practice.
Date: January 2001
Creators: Gardner, Sugnet
Contributors: Le Roux, N. J., Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistical and Actuarial Science.
Publisher: Stellenbosch : Stellenbosch University
Source Sets: South African National ETD Portal
Detected Language: Unknown
Format: 535 p. : ill.
Rights: Stellenbosch University

