Aspects of model development using regression quantiles and elemental regressions

Dissertation (PhD)--University of Stellenbosch, 2007.

ENGLISH ABSTRACT: It is well known that ordinary least squares (OLS) procedures are sensitive to deviations from
the classical Gaussian assumptions (outliers) as well as data aberrations in the design space.
The two major data aberrations in the design space are collinearity and high leverage.
Leverage points can also induce or hide collinearity in the design space. Such leverage points
are referred to as collinearity influential points. Consequently, many diagnostic tools for
detecting these anomalies, as well as alternative procedures to counter them, have been
developed over the years. To counter deviations from the classical Gaussian assumptions many robust
procedures have been proposed. One such class of procedures is the Koenker and Bassett
(1978) Regression Quantiles (RQs), which are natural extensions of order statistics to the
linear model. RQs can be found as solutions to linear programming problems (LPs). The basic
optimal solutions to these LPs (which are RQs) correspond to elemental subset (ES)
regressions, which consist of subsets of minimum size to estimate the necessary parameters of
the model.
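
The link between RQs and ES regressions described above can be illustrated with a minimal sketch. It is not the dissertation's own procedure: it assumes a simple line with p = 2 parameters, made-up integer data, and hypothetical function names, and it replaces the LP solver with a brute-force search over all elemental subsets. The search recovers an RQ here because a basic optimal LP solution fits p observations exactly.

```python
from fractions import Fraction
from itertools import combinations

def check_loss(residual, tau):
    """Koenker-Bassett check function rho_tau(r)."""
    return residual * tau if residual >= 0 else residual * (tau - 1)

def elemental_fit(p1, p2):
    """Exact line through two observations; None if their x-values coincide."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        return None
    slope = Fraction(y2 - y1, x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope

def rq_by_elemental_subsets(data, tau):
    """Return (loss, subset indices, (intercept, slope)) minimising the check loss
    over all elemental subsets of size 2 (illustrative brute force, not an LP solver)."""
    best = None
    for i, j in combinations(range(len(data)), 2):
        fit = elemental_fit(data[i], data[j])
        if fit is None:
            continue
        b0, b1 = fit
        loss = sum(check_loss(y - (b0 + b1 * x), tau) for x, y in data)
        if best is None or loss < best[0]:
            best = (loss, (i, j), (b0, b1))
    return best

# Made-up data, for illustration only.
data = [(0, 0), (1, 2), (2, 1), (3, 5)]
loss, subset, coefficients = rq_by_elemental_subsets(data, Fraction(1, 2))
print(loss, subset, coefficients)
```

For the median (tau = 1/2) the selected line passes exactly through the first and last observations, so the fitted RQ coincides with one ES regression. Enumeration grows combinatorially with the sample size, so this only serves to make the correspondence concrete.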
On the one hand, some ESs correspond to RQs. On the other hand, in the literature it is shown
that many OLS statistics (estimators) are related to ES regression statistics (estimators).
Therefore there is an inherent relationship amongst the three sets of procedures. The
relationship between the ES procedure and the RQ one has been noted almost “casually” in
the literature while the latter has been fairly widely explored. Using these existing
relationships between the ES procedure and the OLS one as well as new ones, collinearity,
leverage and outlier problems in the RQ scenario were investigated. In addition, a lasso procedure
was proposed as a variable selection technique in the RQ scenario and some tentative results
were given for it. These results are promising.
Single case diagnostics were considered as well as their relationships to multiple case ones. In
particular, multiple cases of the minimum size needed to estimate the necessary parameters of the
model, corresponding to an RQ (ES), were considered. In this way regression diagnostics were
developed for both ESs and RQs. The main problems that affect RQs adversely are
collinearity and leverage, due to the nature of the computational procedures and the fact that
RQs’ influence functions are unbounded in the design space but bounded in the response
variable. As a consequence, RQs have a high affinity for leverage points and a high
exclusion rate of outliers. The influential picture exhibited in the presence of both leverage
points and outliers is the net result of these two antagonistic forces. Although RQs’ influence
functions are bounded in the response variable (and RQs are therefore fairly robust to outliers),
outlier diagnostics were also considered in order to have a more holistic picture.
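
The two opposing tendencies described above can be seen in a small sketch. It is an illustration under stated assumptions rather than the dissertation's diagnostics: the data are made up, the function name `median_line` is hypothetical, and the median regression line is found by brute force over elemental subsets of size 2 instead of by linear programming.

```python
from fractions import Fraction
from itertools import combinations

def median_line(data):
    """Brute-force median (tau = 1/2) regression line via elemental subsets of size 2.

    Returns (indices of the best pair, intercept, slope).
    """
    best = None
    for i, j in combinations(range(len(data)), 2):
        (x1, y1), (x2, y2) = data[i], data[j]
        if x1 == x2:  # no unique line through two points with the same x
            continue
        slope = Fraction(y2 - y1, x2 - x1)
        intercept = y1 - slope * x1
        # For tau = 1/2 the check loss is half the sum of absolute residuals.
        loss = sum(abs(y - (intercept + slope * x)) for x, y in data) / 2
        if best is None or loss < best[0]:
            best = (loss, (i, j), intercept, slope)
    return best[1:]

clean = [(x, 2 * x + 1) for x in range(5)]  # made-up points on y = 1 + 2x
with_outlier = clean + [(2, 100)]           # aberrant response at a central x
with_leverage = clean + [(50, 0)]           # aberrant point far out in the design space

outlier_fit = median_line(with_outlier)
leverage_fit = median_line(with_leverage)
print("outlier case :", outlier_fit)
print("leverage case:", leverage_fit)
```

In this toy setting the response outlier (index 5 in the first data set) is excluded from the fitted elemental subset and the line stays at y = 1 + 2x, whereas the high-leverage point (index 5 in the second data set) is included in the fitted subset and pulls the slope negative.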
The investigations comprised analytic means as well as simulation. Furthermore,
applications were made to artificial computer-generated data sets as well as standard data sets
from the literature. These revealed that the ES-based statistics can be used to address
problems arising in the RQ scenario with some degree of success. However, due to the
interdependence between the different aspects, viz. the one between leverage and collinearity
and the one between leverage and outliers, “solutions” are often dependent on the particular
situation. In spite of this complexity, the research did produce some fairly general guidelines
that can be fruitfully used in practice.

AFRIKAANS ABSTRACT (translated): It is known that ordinary least squares (OLS) procedures are sensitive to deviations
from the classical Gaussian assumptions (outliers) as well as to data aberrations in the
design space. Two types of aberration of importance in the latter case are collinearity and
points with high leverage. The latter points can also induce or hide collinearity in the
design. Such points are referred to as collinearity influential points. Over the
years many diagnostic tools have been developed to identify these aberrations, together with
alternative procedures to counter them. To counter deviations from the Gaussian assumption,
many robust procedures have been developed. One such class of procedures is
the Koenker and Bassett (1978) Regression Quantiles (RQs), which are natural extensions of
order statistics to the linear model. RQs can be determined as solutions of linear
programming problems (LPs). The basic optimal solutions of these LPs (which are RQs)
correspond to elemental subset (ES) regressions, which consist of
subsets of minimum size with which the parameters of the model can be
estimated.
On the one hand, certain ESs correspond to RQs. On the other hand, it is known from the literature
that many OLS statistics (estimators) are related to ES regression statistics (estimators). This
implies that there is an inherent relationship among the three classes of procedures. The
relationship between the ES and the corresponding RQ procedures has been mentioned fairly “casually”
in the literature, while the latter procedures have been investigated fairly
extensively. By using existing relationships between ES and OLS
procedures, as well as new ones that were developed, collinearity, high-leverage
points and outlier problems in the RQ setting were investigated. Furthermore, a lasso procedure
was proposed as a variable selection technique in the RQ situation and some tentative results
were given for it. These results appear promising, particularly for further research.
Single-case diagnostic techniques were considered, as well as their relationship to multiple-case
techniques. In particular, multiple cases of the minimum size needed
to estimate the parameters of the model, corresponding to an RQ (ES), were considered. With
this approach regression diagnostic techniques were developed for both ESs and RQs. The
most important problems that affect RQs negatively are collinearity and high-leverage
points, owing to the nature of the computational procedures and the fact that RQs’ influence
functions are bounded in the space of the dependent variable but unbounded
in the design space. Consequently RQs have a high affinity for high-leverage points
and usually tend to exclude outliers. The final output obtained when both
high-leverage points and outliers occur is then the net result of these
two opposing tendencies. Although RQs’ influence functions are bounded in the dependent variable (and
RQs are therefore fairly robust to outliers), outlier diagnostic techniques were also considered to obtain a
more holistic picture.
The investigation used analytic as well as simulation techniques. Use was also
made of artificial data sets and standard data sets from the literature. These investigations
showed that the ES-based statistics can be used with a reasonable degree of success
to address problems in the RQ setting. It is, however, important to note
that, owing to the interdependence between collinearity and high-leverage
points, as well as that between high-leverage points and outliers, “solutions”
often depend on the particular situation. Despite this complexity,
the research did yield fairly general guidelines that can be usefully applied in
practice.

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:sun/oai:scholar.sun.ac.za:10019.1/18668
Date: 03 1900
Creators: Ranganai, Edmore
Contributors: De Wet, T., Van Vuuren, J.O., Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
Publisher: Stellenbosch : Stellenbosch University
Source Sets: South African National ETD Portal
Language: en_ZA
Detected Language: Unknown
Type: Thesis
Format: xii, 196 leaves : ill.
Rights: Stellenbosch University
