1. Time series forecasting and model selection in singular spectrum analysis
De Klerk, Jacques (2002)
Dissertation (PhD)--University of Stellenbosch, 2002

ENGLISH ABSTRACT: Singular spectrum analysis (SSA) originated in the field of Physics. The technique is
non-parametric by nature and inter alia finds application in atmospheric sciences,
signal processing and recently in financial markets. The technique can handle a very
broad class of time series that can contain combinations of complex periodicities,
polynomial or exponential trends. Forecasting techniques are reviewed in this study, and a new coordinate-free joint-horizon k-period-ahead forecasting formulation is derived. The study also considers model selection in SSA, from which it becomes apparent that forward validation results in more stable model selection.
The roots of SSA are outlined and distributional assumptions of signal series are
considered ab initio. Pitfalls that arise in the multivariate statistical theory are
identified.
Different approaches of recurrent one-period-ahead forecasting are then reviewed.
The forecasting approaches are all supplied in algorithmic form to ensure effortless
adaptation to computer programs. Theoretical considerations underlying the forecasting algorithms are also discussed. A new coordinate-free joint-horizon k-period-ahead forecasting formulation is derived and adapted for the multichannel SSA case.
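The recurrent approach can be sketched in a few lines of NumPy. This is a minimal single-channel illustration of the usual embed, decompose, reconstruct and extend recipe, not the thesis's coordinate-free formulation or its Fortran implementation; the function name, window length and rank below are illustrative assumptions:

```python
import numpy as np

def ssa_forecast(series, window, rank, horizon):
    """Single-channel SSA with recurrent k-period-ahead forecasting (sketch)."""
    x = np.asarray(series, dtype=float)
    n, L = len(x), window
    K = n - L + 1
    # Step 1: embed the series into the L x K trajectory (Hankel) matrix.
    X = np.column_stack([x[i:i + L] for i in range(K)])
    # Step 2: truncate the SVD to the leading `rank` eigentriples.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Ur = U[:, :rank]
    Xr = Ur @ np.diag(s[:rank]) @ Vt[:rank]
    # Step 3: diagonal averaging (Hankelisation) recovers the signal.
    signal = np.array([Xr[::-1].diagonal(k).mean() for k in range(-L + 1, K)])
    # Step 4: the last coordinates of the leading eigenvectors yield the
    # coefficients of the linear recurrence used for recurrent forecasting.
    pi = Ur[-1, :]
    R = (Ur[:-1, :] @ pi) / (1.0 - pi @ pi)
    out = list(signal)
    for _ in range(horizon):
        out.append(R @ out[-(L - 1):])
    return np.array(out[n:])
```

The window length trades off resolution of complex periodicities against estimation stability, while the rank determines how many eigentriples of the trajectory matrix are retained as signal.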
Different model selection techniques are then considered. The use of scree diagrams, phase-space portraits, percentage variation explained by eigenvectors, and cross- and forward validation is considered in detail. The non-parametric nature of SSA essentially results in the use of non-parametric model selection techniques.
Finally, the study also considers a commercially available software package and compares it with Fortran code that was developed as part of the study.
AFRIKAANS SUMMARY: Singular spectrum analysis (SSA) has its origin in physics. The technique is non-parametric in nature and finds application in fields such as the atmospheric sciences, signal processing and, recently, financial markets. The technique can handle a wide variety of time series containing combinations of complex periodicities and polynomial and exponential trends. Forecasting techniques are also considered in this study, and a new coordinate-free joint-horizon k-period-ahead forecasting formulation is derived. The study also considers model selection in SSA, from which it is clear that forward validation results in more stable model selection.

The background of SSA is sketched ab initio and distributional assumptions of signal series are considered. Problem cases occurring in the multivariate statistical theory are clearly identified.

Various techniques of repeated one-period-ahead forecasting are then considered. The approaches to forecasting are supplied in algorithmic format, which eases their adaptation to computer programming. Theoretical questions underlying the forecasting algorithms are also considered. A new coordinate-free joint-horizon k-period-ahead forecasting formulation is derived and adapted for the multichannel SSA case.

Different model selection techniques are also considered. The use of "scree" diagrams, phase-space plots, percentage variation explained by eigenvectors, and cross- and forward validation is also addressed. The non-parametric nature of SSA compels the use of non-parametric model selection techniques.

Finally, the study compares a commercial software package with the Fortran source code that was developed as part of this study.
2. Estimating measurement error in blood pressure, using structural equations modelling
Kepe, Lulama Patrick (2004)
Thesis (MSc)--Stellenbosch University, 2004.

ENGLISH ABSTRACT: Any branch of science experiences measurement error to some extent. This may be due to
conditions under which measurements are taken, which may include the subject, the observer, the measurement instrument, and the data collection method. The inexactness
(error) can be reduced to some extent through the study design, but at some level further
reduction becomes difficult or impractical. It then becomes important to determine or
evaluate the magnitude of measurement error and perhaps evaluate its effect on the
investigated relationships. All this is particularly true for blood pressure measurement.
The gold standard for measuring blood pressure (BP) is a 24-hour ambulatory
measurement. However, this technology is not available in Primary Care Clinics in South
Africa and a set of three mercury-based BP measurements is the norm for a clinic visit.
The quality of the standard combination of the repeated measurements can be improved
by modelling the measurement error of each of the diastolic and systolic measurements
and determining optimal weights for the combination of measurements, which will give a
better estimate of the patient's true BP. The optimal weights can be determined through
the method of structural equations modelling (SEM), which allows a richer model than the standard repeated measures ANOVA. SEM models are less restrictive and give more detail than the traditional approaches.
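The role of the optimal weights can be illustrated with a classical true-score sketch: if each clinic reading equals the true BP plus an independent error with its own variance, the minimum-variance combination weights each reading inversely to its error variance. The numbers below are hypothetical, and the SEM fitted in the thesis estimates a richer error structure than this simple model:

```python
import numpy as np

# Hypothetical error variances (mmHg^2) for three successive clinic
# readings, e.g. as estimated from a fitted measurement-error model;
# the first reading is assumed noisiest (a "white-coat" effect).
error_var = np.array([64.0, 36.0, 25.0])

# Minimum-variance (inverse-variance) weights, normalised to sum to one.
w = (1.0 / error_var) / np.sum(1.0 / error_var)   # approx [0.19, 0.33, 0.48]

readings = np.array([142.0, 135.0, 133.0])        # systolic readings, mmHg
print("weighted estimate:", w @ readings)          # approx 135.4
print("unweighted mean  :", readings.mean())       # approx 136.7

# The weighted combination has variance 1 / sum(1/sigma_i^2), about 12.0,
# versus (64 + 36 + 25) / 9, about 13.9, for the simple mean.
```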
Structural equations modelling, which is a special case of covariance structure modelling, has proven useful in the social sciences over the years. Its appeal stems from the fact that it includes multiple regression and factor analysis as special cases. Multi-type
multi-time (MTMT) models are a specific type of structural equations model suited to the modelling of BP measurements. These designs (MTMT models) constitute a variant
of repeated measurement designs and are based on Campbell and Fiske's (1959)
suggestion that the quality of methods (time in our case) can be determined by comparing
them with other methods in order to reveal both the systematic and random errors. MTMT models also showed superiority over other data analysis methods because of their
accommodation of the theory of BP. In particular they proved to be a strong alternative to
be considered for the analysis of BP measurement whenever repeated measures are
available, even when such measures do not constitute equivalent replicates. This thesis focuses on SEM and its application to BP studies conducted in a community survey of Mamre and the Mitchells Plain hypertensive clinic population.
AFRIKAANS SUMMARY: Every branch of science is subject to measurement error to a greater or lesser extent. This is a consequence of the circumstances under which measurements are made, such as the unit being measured, the observer, the measurement instrument and the data collection method. Measurement error can be reduced through the study design, but at a certain point further improvement in precision becomes difficult and impractical. It is then important to determine the extent of the measurement error and to investigate its effect on relationships. These aspects are especially true for the measurement of blood pressure in humans.

The gold standard for measuring blood pressure is a 24-hour ambulatory measurement. This technology is, however, not available in primary health clinics in South Africa, and a set of three mercury-based blood pressure measurements is the norm at a clinic visit. The quality of the standard combination of the repeated measurements can be improved by modelling the measurement error of the diastolic and systolic blood pressure measurements. Determining optimal weights for the linear combination of the measurements leads to a better estimate of the patient's true blood pressure. The weights can be calculated with the method of structural equations modelling (SEM), which offers a richer class of models than the standard repeated measures analysis of variance models. This model has fewer restrictions and thus gives more information than the traditional approaches.

Structural equations modelling, which is a special case of covariance structure modelling, has over the years been applied usefully in the social sciences. Its appeal is a consequence of the fact that multiple linear regression and factor analysis are also special cases of the method. Multiple-type multiple-time (MTMT) models are a specific structural equations model that suits the modelling of blood pressure. This type of model is a variant of the repeated measurement design and is based on Campbell and Fiske's (1959) suggestion that the quality of different methods can be determined by comparing them with other methods, in order thereby to distinguish systematic and stochastic errors. The MTMT model also fits in well with the underlying physiological aspects of blood pressure and its measurement. It is thus a good alternative for studies where the repeated measurements are not equivalent replicates.

This thesis focuses on the structural equations model and its application in hypertension studies conducted in the Mamre community and a hypertension clinic population in Mitchells Plain.
3. Some statistical aspects of LULU smoothers
Jankowitz, Maria Dorothea (2007)
Thesis (PhD (Statistics and Actuarial Science))--University of Stellenbosch, 2007.

The smoothing of time series plays a very important role in various practical applications. Estimating
the signal and removing the noise is the main goal of smoothing. Traditionally linear smoothers were
used, but nonlinear smoothers became more popular through the years.
From the family of nonlinear smoothers, the class of median smoothers, based on order statistics, is the
most popular. A new class of nonlinear smoothers, called LULU smoothers, was developed by using
the minimum and maximum selectors. These smoothers have very attractive mathematical properties.
In this thesis their statistical properties are investigated and compared to those of the class of median smoothers.
Smoothing, together with related concepts, is discussed in general. Thereafter, the class of median smoothers from the literature is discussed. The class of LULU smoothers is defined, their properties
are explained and new contributions are made. The compound LULU smoother is introduced and its
property of variation decomposition is discussed. The probability distributions of some LULU smoothers
with independent data are derived. LULU smoothers and median smoothers are compared according
to the properties of monotonicity, idempotency, co-idempotency, stability, edge preservation, output
distributions and variation decomposition. A comparison is made of their respective abilities for signal
recovery by means of simulations. The success of the smoothers in recovering the signal is measured
by the integrated mean square error and the regression coefficient calculated from the least squares
regression of the smoothed sequence on the signal. Finally, LULU smoothers are practically applied.
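The minimum and maximum selectors underlying the LULU operators can be sketched directly. The code below gives the basic L_n and U_n smoothers (a maximum of running minima and a minimum of running maxima, respectively); boundary padding is omitted for clarity, so each pass shortens the sequence, and compound smoothers arise by composing such operators:

```python
import numpy as np

def running_min(x, n):
    """Minimum over each window of n + 1 consecutive points."""
    return np.array([x[i:i + n + 1].min() for i in range(len(x) - n)])

def running_max(x, n):
    """Maximum over each window of n + 1 consecutive points."""
    return np.array([x[i:i + n + 1].max() for i in range(len(x) - n)])

def L_n(x, n):
    """L_n: max of running minima; removes upward impulses of width <= n."""
    return running_max(running_min(x, n), n)

def U_n(x, n):
    """U_n: min of running maxima; removes downward impulses of width <= n."""
    return running_min(running_max(x, n), n)

x = np.array([0, 0, 9, 0, 0, 0, -7, 0, 0], dtype=float)
print(L_n(x, 1))          # upward spike removed, downward one kept
print(U_n(L_n(x, 1), 1))  # composition removes both impulses
```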
4. Variable selection for kernel methods with application to binary classification
Oosthuizen, Surette (2008)
Thesis (PhD (Statistics and Actuarial Science))--University of Stellenbosch, 2008.

The problem of variable selection in binary kernel classification is addressed in this thesis.
Kernel methods are fairly recent additions to the statistical toolbox, having originated
approximately two decades ago in machine learning and artificial intelligence. These
methods are growing in popularity and are already frequently applied in regression and
classification problems.
Variable selection is an important step in many statistical applications: it leads to a better understanding of the problem being investigated, and subsequent analyses of the data frequently yield more accurate results if irrelevant variables have been eliminated.
It is therefore obviously important to investigate aspects of variable selection for kernel
methods.
Chapter 2 of the thesis is an introduction to the main part presented in Chapters 3 to 6. In
Chapter 2 some general background material on kernel methods is firstly provided, along
with an introduction to variable selection. Empirical evidence is presented substantiating
the claim that variable selection is a worthwhile enterprise in kernel classification
problems. Several aspects which complicate variable selection in kernel methods are
discussed.
An important property of kernel methods is that the original data are effectively transformed before a classification algorithm is applied to them. The space in which the
original data reside is called input space, while the transformed data occupy part of a
feature space. In Chapter 3 we investigate whether variable selection should be performed
in input space or rather in feature space. A new approach to selection, so-called feature-to-input space selection, is also proposed. This approach has the attractive property of combining information generated in feature space with easy interpretation in input space. An empirical study reveals that effective variable selection requires utilisation of at least
some information from feature space.
Having confirmed in Chapter 3 that variable selection should preferably be done in feature
space, the focus in Chapter 4 is on two classes of selection criteria operating in feature
space: criteria which are independent of the specific kernel classification algorithm and
criteria which depend on this algorithm. In this regard we concentrate on two kernel
classifiers, viz. support vector machines and kernel Fisher discriminant analysis, both of
which are described in some detail in Chapter 4. The chapter closes with a simulation
study showing that two of the algorithm-independent criteria are very competitive with the
more sophisticated algorithm-dependent ones.
In Chapter 5 we incorporate a specific strategy for searching through the space of variable
subsets into our investigation. Evidence in the literature strongly suggests that backward
elimination is preferable to forward selection in this regard, and we therefore focus on
recursive feature elimination. Zero- and first-order forms of the new selection criteria
proposed earlier in the thesis are presented for use in recursive feature elimination and their
properties are investigated in a numerical study. It is found that some of the simpler zero-order criteria perform better than the more complicated first-order ones.
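Recursive feature elimination of this kind can be illustrated with a generic backward-elimination wrapper around a support vector machine. The criterion below (drop the variable whose removal costs the least cross-validated accuracy) is a simple stand-in for the zero- and first-order feature-space criteria of the thesis, and all names and settings are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def recursive_elimination(X, y, n_keep):
    """Backward elimination for a kernel classifier (sketch)."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        scores = []
        for j in active:
            subset = [k for k in active if k != j]
            # Score each candidate subset by cross-validated accuracy.
            acc = cross_val_score(SVC(kernel="rbf"), X[:, subset], y, cv=5).mean()
            scores.append((acc, j))
        best_acc, drop = max(scores)   # least harmful removal
        active.remove(drop)
    return active

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)
print(recursive_elimination(X, y, n_keep=3))
```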
Up to the end of Chapter 5 it is assumed that the number of variables to select is known.
We do away with this restriction in Chapter 6 and propose a simple criterion which uses the
data to identify this number when a support vector machine is used. The proposed criterion
is investigated in a simulation study and compared to cross-validation, which can also be
used for this purpose. We find that the proposed criterion performs well.
The thesis concludes in Chapter 7 with a summary and several suggestions for further research.
5. Statistical inference for inequality measures based on semi-parametric estimators
Kpanzou, Tchilabalo Abozou (2011)
Thesis (PhD)--Stellenbosch University, 2011.

ENGLISH ABSTRACT: Measures of inequality, also used as measures of concentration or diversity, are very popular in economics
and especially in measuring the inequality in income or wealth within a population and between
populations. However, they have applications in many other fields, e.g. in ecology, linguistics, sociology,
demography, epidemiology and information science.
A large number of measures have been proposed to measure inequality. Examples include the Gini
index, the generalized entropy, the Atkinson and the quintile share ratio measures. Inequality measures
are inherently dependent on the tails of the population (underlying distribution) and therefore their
estimators are typically sensitive to data from these tails (nonrobust). For example, income distributions
often exhibit a long tail to the right, leading to the frequent occurrence of large values in samples. Since
the usual estimators are based on the empirical distribution function, they are usually nonrobust to such
large values. Furthermore, heavy-tailed distributions often occur in real-life data sets, and remedial action therefore needs to be taken in such cases.
The remedial action can be either a trimming of the extreme data or a modification of the (traditional)
estimator to make it more robust to extreme observations. In this thesis we follow the second option,
modifying the traditional empirical distribution function as estimator to make it more robust. Using results
from extreme value theory, we develop more reliable distribution estimators in a semi-parametric
setting. These new estimators of the distribution then form the basis for more robust estimators of the
measures of inequality. These estimators are developed for the four most popular classes of measures,
viz. Gini, generalized entropy, Atkinson and quintile share ratio. Properties of such estimators
are studied, especially via simulation. Using limiting distribution theory and the bootstrap methodology, approximate confidence intervals are derived. Through the various simulation studies, the proposed
estimators are compared to the standard ones in terms of mean squared error, relative impact of contamination,
confidence interval length and coverage probability. In these studies the semi-parametric
methods show a clear improvement over the standard ones. The theoretical properties of the quintile
share ratio have not been studied much. Consequently, we also derive its influence function as well as
the limiting normal distribution of its nonparametric estimator. These results have not previously been
published.
In order to illustrate the methods developed, we apply them to a number of real life data sets. Using
such data sets, we show how the methods can be used in practice for inference. In order to choose
between the candidate parametric distributions, use is made of a measure of sample representativeness
from the literature. These illustrations show that the proposed methods can be used to reach
satisfactory conclusions in real-life problems.
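As a concrete instance of the inference problem, the sketch below computes the standard empirical Gini index for a simulated heavy-tailed sample together with a percentile bootstrap confidence interval. The semi-parametric estimators of the thesis would first replace the empirical tail with an extreme-value fit; this sketch uses the ordinary estimator purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def gini(x):
    """Gini index from the empirical distribution function."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    # Standard order-statistics form: sum((2i - n - 1) x_(i)) / (n sum x).
    return np.sum((2 * i - n - 1) * x) / (n * np.sum(x))

# Illustrative heavy-tailed "income" sample (classical Pareto, alpha = 2.5).
sample = rng.pareto(2.5, size=500) + 1.0

# Percentile bootstrap confidence interval for the Gini index.
boot = np.array([gini(rng.choice(sample, size=sample.size, replace=True))
                 for _ in range(2000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"Gini = {gini(sample):.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```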
AFRIKAANS SUMMARY: Measures of inequality, which are also used as measures of concentration or diversity, are very popular in economics, especially for the quantification of inequality in income or wealth within a population and between populations. They also have applications in many other disciplines, for example ecology, linguistics, sociology, demography, epidemiology and information science.

Various measures for measuring inequality already exist. Examples include the Gini index, the generalized entropy measure, the Atkinson measure and the quintile share ratio. Measures of inequality are inherently dependent on the tails of the population (underlying distribution), and estimators for them are thus typically sensitive to data from such tails (nonrobust). Income distributions, for example, often have long right tails, which can lead to the occurrence of large values in samples. The traditional estimators are based on the empirical distribution function and are thus usually nonrobust to such large values. Since heavy-tailed distributions often occur in real data, corrections must be made in such cases.

These corrections can consist of either trimming extreme data or adjusting traditional estimators to make them more robust to extreme values. In this thesis the second option is followed, in that the traditional empirical distribution function as estimator is adjusted to make it more robust. By making use of results from extreme value theory, more reliable estimators of distributions are developed in a semi-parametric setting. These new estimators of the distribution then form the basis for more robust estimators of measures of inequality. These estimators are developed for the four most popular classes of measures, namely the Gini, generalized entropy, Atkinson and quintile share ratio measures. Properties of these estimators are studied, mainly by means of simulation studies. Approximate confidence intervals are developed by making use of limiting distribution theory and the bootstrap methodology. The proposed estimators are compared with traditional estimators by means of various simulation studies. The comparison is done in terms of mean squared error, relative impact of contamination, confidence interval length and coverage probability. In these studies the semi-parametric methods show a clear improvement over the traditional methods. The theoretical properties of the quintile share ratio have not yet received much attention in the literature. Consequently we derive its influence function as well as the asymptotic distribution of its non-parametric estimator.

In order to illustrate the methods that have been developed, they are applied to a number of real data sets. These applications show how the methods can be used for inference in practice. A method from the literature for sample representativeness is proposed and used to make a choice between the candidate parametric distributions. These examples show that the proposed methods can be used fruitfully to reach satisfactory conclusions in practice.
6. Aspects of model development using regression quantiles and elemental regressions
Ranganai, Edmore (2007)
Dissertation (PhD)--University of Stellenbosch, 2007.

ENGLISH ABSTRACT: It is well known that ordinary least squares (OLS) procedures are sensitive to deviations from
the classical Gaussian assumptions (outliers) as well as data aberrations in the design space.
The two major data aberrations in the design space are collinearity and high leverage.
Leverage points can also induce or hide collinearity in the design space. Such leverage points
are referred to as collinearity influential points. As a consequence, over the years, many
diagnostic tools to detect these anomalies as well as alternative procedures to counter them
were developed. To counter deviations from the classical Gaussian assumptions many robust
procedures have been proposed. One such class of procedures is the Koenker and Bassett (1978) Regression Quantiles (RQs), which are natural extensions of order statistics to the linear model. RQs can be found as solutions to linear programming problems (LPs). The basic
optimal solutions to these LPs (which are RQs) correspond to elemental subset (ES)
regressions, which consist of subsets of minimum size to estimate the necessary parameters of
the model.
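The LP formulation can be made concrete as follows: the tau-th regression quantile minimises the asymmetrically weighted absolute residuals, which becomes a linear programme once the coefficient vector is split into positive and negative parts. A minimal sketch (illustrative names; scipy's general-purpose solver rather than the specialised algorithms used in practice):

```python
import numpy as np
from scipy.optimize import linprog

def regression_quantile(X, y, tau):
    """Koenker-Bassett regression quantile via its LP formulation.

    Minimise tau*sum(u) + (1-tau)*sum(v) subject to y = X beta + u - v,
    with u, v >= 0 and beta = beta_plus - beta_minus, so that all LP
    variables are non-negative.
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(2 * p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y,
                  bounds=[(0, None)] * (2 * p + 2 * n), method="highs")
    return res.x[:p] - res.x[p:2 * p]

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.uniform(0, 10, 50)])
y = 2 + 0.5 * X[:, 1] + rng.standard_normal(50)
print(regression_quantile(X, y, tau=0.5))   # median regression, near [2, 0.5]
```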
On the one hand, some ESs correspond to RQs. On the other hand, in the literature it is shown
that many OLS statistics (estimators) are related to ES regression statistics (estimators).
Therefore there is an inherent relationship amongst the three sets of procedures. The
relationship between the ES procedure and the RQ one has been noted almost "casually" in the literature, while the latter has been fairly widely explored. Using these existing
relationships between the ES procedure and the OLS one as well as new ones, collinearity,
leverage and outlier problems in the RQ scenario were investigated. Also, a lasso procedure was proposed as a variable selection technique in the RQ scenario, and some tentative results were given for it. These results are promising.
Single case diagnostics were considered as well as their relationships to multiple case ones. In
particular, multiple cases of the minimum size to estimate the necessary parameters of the
model, were considered, corresponding to a RQ (ES). In this way regression diagnostics were
developed for both ESs and RQs. The main problems that affect RQs adversely are
collinearity and leverage due to the nature of the computational procedures and the fact that
RQs’ influence functions are unbounded in the design space but bounded in the response
variable. As a consequence of this, RQs have a high affinity for leverage points and a high
exclusion rate of outliers. The influential picture exhibited in the presence of both leverage points and outliers is the net result of these two antagonistic forces. Although RQs are
bounded in the response variable (and therefore fairly robust to outliers), outlier diagnostics
were also considered in order to have a more holistic picture.
The investigations comprised analytic methods as well as simulation. Furthermore,
applications were made to artificial computer generated data sets as well as standard data sets
from the literature. These revealed that the ES based statistics can be used to address
problems arising in the RQ scenario to some degree of success. However, due to the
interdependence between the different aspects, viz. the one between leverage and collinearity
and the one between leverage and outliers, “solutions” are often dependent on the particular
situation. In spite of this complexity, the research did produce some fairly general guidelines
that can be fruitfully used in practice.
AFRIKAANS SUMMARY: It is known that ordinary least squares (OLS) procedures are sensitive to deviations from the classical Gaussian assumptions (outliers), as well as to data aberrations in the design space. Two types of aberrations of importance in the latter case are collinearity and points with high leverage. The latter points can also induce or hide collinearity in the design. Such points are referred to as collinearity-influential leverage points. Over the years many diagnostic tools have been developed to identify these aberrations, and alternative procedures have been developed against them. To counter deviations from the Gaussian assumption, a good number of robust procedures have been developed. One such class of procedures is the Koenker and Bassett (1978) Regression Quantiles (RQs), which are natural extensions of order statistics to the linear model. RQs can be determined as solutions of linear programming problems (LPs). The basic optimal solutions of these LPs (which are RQs) correspond to the elemental subset (ES) regressions, which consist of subsets of minimum size with which the parameters of the model can be estimated.

On the one hand, certain ESs correspond to RQs. On the other hand, it is known from the literature that many OLS statistics (estimators) are related to ES regression statistics (estimators). This implies that there is an inherent relationship between the three classes of procedures. The relationship between the ES and the corresponding RQ procedures has been mentioned rather "casually" in the literature, while the latter procedures have been investigated fairly extensively. By making use of existing relationships between ES and OLS procedures, as well as new ones that were developed, collinearity, points with high leverage values and outlier problems in the RQ setting were investigated. Furthermore, a lasso procedure was proposed as a variable selection technique in the RQ situation, and some tentative results were given for it. These results appear to be promising, also with a view to further research.

Single-case diagnostic techniques were considered, as well as their relationship with multiple-case techniques. In particular, multiple cases of minimum size for estimating the parameters of the model, corresponding to an RQ (ES), were considered. With this approach, regression diagnostic techniques were developed for both ESs and RQs. The most important problems affecting RQs negatively are collinearity and points with high leverage values, owing to the nature of the computational procedures and the fact that the influence functions of RQs are bounded in the space of the dependent variable but unbounded in the design space. Consequently RQs have a high affinity for points with high leverage values and usually attempt to exclude outliers. The final outcome obtained when both points with high leverage values and outliers occur is then the net result of these two conflicting tendencies. Although RQs are bounded in the dependent variable (and thus reasonably robust with respect to outliers), outlier diagnostic techniques were also considered in order to obtain a more holistic picture.

The investigation used analytic as well as simulation techniques. Furthermore, use was also made of artificial data sets and standard data sets from the literature. These investigations showed that the ES-based statistics can be used with a reasonable degree of success to address problems in the RQ setting. It is, however, important to note that as a result of the interdependence between collinearity and points with high leverage values, as well as that between points with high leverage values and outliers, "solutions" are often dependent on the particular situation. In spite of this complexity, the research that was done nevertheless yielded fairly general guidelines that can be used fruitfully in practice.
7. Improved estimation procedures for a positive extreme value index
Berning, Thomas Louw (2010)
Thesis (PhD (Statistics))--University of Stellenbosch, 2010.

ENGLISH ABSTRACT: In extreme value theory (EVT) the emphasis is on extreme (very small or very large) observations. The crucial parameter when making inferences about extreme quantiles is called the extreme value index (EVI). This thesis concentrates on only the right tail of the underlying distribution (extremely large observations), and specifically on situations where the EVI is assumed to be positive. A positive EVI indicates that the underlying distribution of the data has a heavy right tail, as is the case with, for example, insurance claims data.
There are numerous areas of application of EVT, since there are a vast number of situations in which one would be interested in predicting extreme events accurately. Accurate prediction requires accurate estimation of the EVI, which has received ample attention in the literature from a theoretical as well as practical point of view.
Countless estimators of the EVI exist in the literature, but the practitioner has little information on how these estimators compare. An extensive simulation study was designed and conducted to compare the performance of a wide range of estimators, over a wide range of sample sizes and distributions.
A new procedure for the estimation of a positive EVI was developed, based on fitting the perturbed Pareto distribution (PPD) to observations above a threshold, using Bayesian methodology. Attention was also given to the development of a threshold selection technique.
One of the major contributions of this thesis is a measure which quantifies the stability (or rather instability) of estimates across a range of thresholds. This measure can be used to objectively obtain the range of thresholds over which the estimates are most stable. It is this measure which is used for the purpose of threshold selection for the proposed PPD estimator.
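To illustrate the stability idea, the sketch below computes the classical Hill estimator over a range of thresholds (numbers of upper order statistics k) and scores bands of thresholds by the variability of the estimates within them. The rolling standard deviation used here is only a crude stand-in for the stability measure developed in the thesis, and all settings are illustrative:

```python
import numpy as np

def hill(x, k):
    """Hill estimator of a positive EVI using the k largest observations."""
    xs = np.sort(x)[::-1]                     # descending order statistics
    return np.mean(np.log(xs[:k])) - np.log(xs[k])

rng = np.random.default_rng(0)
claims = rng.pareto(2.0, size=1000) + 1.0     # classical Pareto, true EVI = 0.5

ks = np.arange(10, 500)
est = np.array([hill(claims, k) for k in ks])

# Crude stability score: rolling standard deviation of the estimates over
# a band of thresholds; the band minimising it is taken as "most stable".
width = 50
instab = np.array([est[i:i + width].std() for i in range(len(ks) - width)])
best = np.argmin(instab)
print(f"most stable band starts at k = {ks[best]}, "
      f"EVI estimate there = {est[best:best + width].mean():.3f}")
```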
A case study of five insurance claims data sets illustrates how data sets can be analyzed in practice. It is shown to what extent discretion can/should be applied, as well as how different estimators can be used in a complementary fashion to give more insight into the nature of the data and the extreme tail of the underlying distribution. The analysis is carried out from the raw data through to the construction of tables which can be used directly to gauge the risk of the insurance portfolio over a given time frame.
AFRIKAANS SUMMARY: The field of extreme value theory (EVT) is concerned with extreme (very small or very large) observations. The parameter that is decisive when inferences about extreme quantiles are at issue is the so-called extreme value index (EVI). This dissertation concentrates on only the right tail of the underlying distribution (very large observations), and more specifically on situations where it is assumed that the EVI is positive. A positive EVI indicates that the underlying distribution has a heavy right tail, as is for example the case with insurance claims data.

There are various fields in which EVT is applied, since there are a large number of situations in which one would be interested in accurately predicting extreme events. Accurate prediction requires the accurate estimation of the EVI, which has already received ample attention in the literature, from both theoretical and practical points of view.

A large number of estimators of the EVI exist in the literature, but anyone intending to apply EVT in practice has little information on how these estimators compare with one another. An extensive simulation study was designed and carried out to compare the estimation accuracy of a large variety of estimators from the literature. The study includes a large variety of sample sizes and underlying distributions.

A new procedure for the estimation of a positive EVI was developed, based on fitting the perturbed Pareto distribution (PPD) to observations exceeding a given threshold, making use of Bayesian techniques. Attention was also given to the development of a threshold selection method.

One of the main contributions of this dissertation is a measure that quantifies the stability (or rather instability) of estimates across several thresholds. This measure offers an objective way to obtain a region (a set of threshold values) over which the estimates are most stable. It is this measure that is used to perform threshold selection in the case of the PPD estimator.

A case study of five sets of insurance claims data demonstrates how data can be analyzed in practice. It is shown to what extent discretion can/must be applied, as well as how different estimators can be employed in a complementary fashion to gain more insight into the nature of the data and the tail of the underlying distribution. The analysis is carried out from the point where only raw data are available to the point where tables have been compiled that can be used directly to determine the risk of the insurance portfolio over a given period.
8. Edgeworth-corrected small-sample confidence intervals for ratio parameters in linear regression
Binyavanga, Kamanzi-wa (2002)
Dissertation (PhD)--Stellenbosch University, 2002.

ENGLISH ABSTRACT: In this thesis we construct a central confidence interval for a smooth scalar non-linear function of
the parameter vector β in a single general linear regression model Y = Xβ + ε. We do this by first developing an Edgeworth expansion for the distribution function of a standardised point estimator.
The confidence interval is then constructed in the manner discussed. Simulation studies reported at
the end of the thesis show the interval to perform well in many small-sample situations.
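For orientation, the generic one-term Edgeworth expansion for a standardised estimator T_n with third cumulant κ₃ has the textbook form below; the thesis derives the regression-specific version of such an expansion using index notation:

```latex
% Generic one-term Edgeworth expansion (textbook form), where \Phi and
% \phi are the standard normal distribution function and density:
P(T_n \le x) = \Phi(x) - \phi(x)\,\frac{\kappa_3}{6\sqrt{n}}\,(x^2 - 1) + O(n^{-1})
```

The skewness term of order n^{-1/2} is precisely what small-sample corrections target, and its removal is the aim of skewness-reduction methods such as Hall's.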
Central to the development of the Edgeworth expansion is our use of the index notation which, in
statistics, has been popularised by McCullagh (1984, 1987).
The contributions made in this thesis are of two kinds. We revisit the complex McCullagh Index
Notation, modify and extend it in certain respects, as well as repackage it in a manner that is more accessible to other researchers.
Regarding the new contributions, in addition to the introduction of a new small-sample confidence interval, we extend the theory of stochastic polynomials (SPs) in three respects. Firstly, a method, which we believe to be the simplest and most transparent to date, is proposed for deriving their cumulants. Secondly, the theory of the cumulants of SPs is developed both in the context of Edgeworth expansions and in the regression setting. Thirdly, our new method enables us to propose a natural alternative to
the method of Hall (1992a, 1992b) regarding skewness-reduction in Edgeworth expansions.
AFRIKAANS SUMMARY: In this dissertation attention is given to the construction of a central confidence interval for a smooth scalar non-linear function of the parameter vector β in a single general linear regression model y = Xβ + ε. This firstly involves the development of an Edgeworth expansion for the distribution function of a standardised point estimator. The confidence interval is then constructed on the basis of this expansion. Simulation studies reported at the end of the dissertation show that the proposed interval performs well in various small-sample situations.

The use of index notation, which was introduced into statistics by McCullagh (1984, 1987), plays a central role in the development of the Edgeworth expansion.

The contribution made in this dissertation is of a twofold nature. The complicated Index Notation of McCullagh is examined, adapted and extended with respect to certain aspects. The notation is also presented in a form that will hopefully make it more accessible to other researchers.

Concerning the contributions made, a new small-sample confidence interval is proposed, and the theory of stochastic polynomials (SPs) is also extended in three respects. Firstly, a method is proposed for deriving the cumulants of SPs; we believe this to be the clearest and simplest method proposed for this purpose to date. Secondly, the theory of the cumulants of SPs is developed within the context of Edgeworth expansions, as well as in the context of regression. Thirdly, our new method enables us to propose a natural alternative to the method of Hall (1992a, 1992b) for the reduction of skewness in Edgeworth expansions.
9. Influential data cases when the C-p criterion is used for variable selection in multiple linear regression
Uys, Daniel Wilhelm (2003)
Dissertation (PhD)--Stellenbosch University, 2003.

ENGLISH ABSTRACT: In this dissertation we study the influence of data cases when the Cp criterion of Mallows (1973)
is used for variable selection in multiple linear regression. The influence is investigated in
terms of the predictive power and the predictor variables included in the resulting model when
variable selection is applied. In particular, we focus on the importance of identifying and
dealing with these so-called selection influential data cases before model selection and fitting
are performed. For this purpose we develop two new selection influence measures, both based
on the Cp criterion. The first measure is specifically developed to identify individual selection
influential data cases, whereas the second identifies subsets of selection influential data cases.
The success with which these influence measures identify selection influential data cases is evaluated in example data sets and in simulation. All results are derived in the coordinate-free context, with special application in multiple linear regression.
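The criterion itself is simple to compute: for a subset model with p parameters, Cp = RSS_p/s² − n + 2p, where s² is the residual mean square of the full model, and subsets whose Cp is close to p are consistent with unbiasedness. A small sketch with simulated data (all names illustrative):

```python
import numpy as np

def mallows_cp(X_full, X_sub, y):
    """Mallows (1973) Cp for a subset model: RSS_p / s^2 - n + 2p."""
    n = len(y)
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return r @ r
    s2 = rss(X_full) / (n - X_full.shape[1])   # full-model error variance
    p = X_sub.shape[1]
    return rss(X_sub) / s2 - n + 2 * p

rng = np.random.default_rng(0)
n = 60
X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])
y = X[:, :3] @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(n)
# The subset containing the two truly active predictors (plus intercept)
# should have Cp close to its parameter count of 3.
print(mallows_cp(X, X[:, :3], y))
```

Because both RSS_p and s² depend on individual observations, a single influential case can change which subset minimises Cp, which is the selection influence the dissertation quantifies.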
AFRIKAANS SUMMARY: Influential observations when the Cp criterion is used for variable selection in multiple linear regression: In this dissertation we investigate the influence of observations when the Cp criterion of Mallows (1973) is used for variable selection in multiple linear regression. The influence of observations on the predictive power and on the independent variables included in the final selected model is investigated. In particular, we focus on the importance of the identification and handling of so-called selection-influential observations before model selection and fitting are done. For this purpose two new influence measures, both based on the Cp criterion, are developed. The first measure is developed specifically to measure the influence of individual observations, while the second measures the influence of subsets of observations on the selection process. The success with which these influence measures identify selection-influential observations is assessed in example data sets and in simulation. All results are derived within the coordinate-free context, with special application in multiple linear regression.
10. Evaluating the properties of sensory tests using computer intensive and biplot methodologies
Meintjes, M. M. (Maria Magdalena) (2007)
Assignment (MComm)--University of Stellenbosch, 2007.

ENGLISH ABSTRACT: This study is the result of part-time work done at a product development centre. The organisation makes extensive use of trained panels in sensory trials designed to assess the quality of its products. Although standard statistical procedures are used for analysing the results arising from these trials, circumstances necessitate deviations from the prescribed protocols. The validity of conclusions drawn from these testing procedures might therefore be questionable. This assignment deals with these questions.
Sensory trials are vital in the development of new products, control of quality levels and the exploration of improvement in current products. Standard test procedures used to explore such questions exist but are in practice often implemented by investigators who have little or no statistical background. Thus test methods are implemented as black boxes and procedures are used blindly without checking all the appropriate assumptions and other statistical requirements. The specific product under consideration often warrants certain modifications to the standard methodology. These changes may have some unknown effect on the obtained results and therefore should be scrutinized to ensure that the results remain valid.
The aim of this study is to investigate the distribution and other characteristics of sensory data, comparing the hypothesised, observed and bootstrap distributions. Furthermore, the standard testing methods used to analyse sensory data sets will be evaluated. After comparing these methods, alternative testing methods may be introduced and then tested using newly generated data sets.
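As a small illustration of comparing hypothesised, observed and bootstrap distributions, consider a triangle test, a common sensory protocol in which each panellist must pick the odd sample out of three, so the null success probability is 1/3. The counts below are hypothetical, and the trials analysed in the assignment may follow different protocols:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical triangle-test results: 18 of 42 panellists identified the
# odd sample; under the null hypothesis the success probability is 1/3.
n, correct = 42, 18

# Hypothesised null distribution of the count is Binomial(n, 1/3);
# the p-value is P(X >= correct) under that distribution.
p_value = stats.binom.sf(correct - 1, n, 1.0 / 3.0)

# Bootstrap counterpart: resample the observed 0/1 responses to see how
# the observed distribution of the count compares with the hypothesised one.
responses = np.array([1] * correct + [0] * (n - correct))
boot = rng.choice(responses, size=(5000, n), replace=True).sum(axis=1)

print(f"binomial p-value = {p_value:.3f}")
print(f"bootstrap 95% interval for the count: "
      f"{np.percentile(boot, 2.5):.0f} to {np.percentile(boot, 97.5):.0f}")
```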
Graphical displays are also useful to get an overall impression of the data under consideration. Biplots are especially useful in the investigation of multivariate sensory data. The underlying relationships among attributes and their combined effect on the panellists' decisions can be visually investigated by constructing a biplot. Results obtained by implementing biplot methods are compared to those of sensory tests, i.e. whether a significant difference between objects corresponds to large distances between the points representing those objects in the display. In conclusion, some recommendations are made as to how the organisation under consideration should implement sensory procedures in future trials. However, these proposals are preliminary, and further research is necessary before final adoption. Some issues for further investigation are suggested.
AFRIKAANS SUMMARY: This study arises from part-time work at a product development centre. In all its sensory trials concerning the quality of its products, the organisation makes large-scale use of trained panels. Although standard procedures are used to analyse the results, certain circumstances necessitate that the prescribed protocol be implemented in an adapted form. These adaptations may mean that conclusions based on the results are invalid. This assignment investigates the above problem.

Sensory trials are essential in quality control, the improvement of existing products, and the development of new products. Standard test procedures exist to explore such questions, but these are often applied by researchers with little or no statistical knowledge. This leads to test procedures being implemented blindly and results being interpreted without checking the necessary assumptions and other statistical requirements. Although a specific product may justify modification of the standard method, these changes can have a large influence on the results. The validity of the results must therefore be examined carefully.

The aim of this study is to study the distribution as well as other properties of sensory data, by considering the distribution under the null hypothesis as well as the observed and bootstrap distributions. Furthermore, the standard test procedure currently in use for analysing sensory data also receives attention. Thereafter, alternative test procedures are proposed and evaluated on newly generated data sets.

Graphical displays are also useful for obtaining an overall picture of the data under discussion. Biplots are particularly handy for studying multidimensional sensory data. The underlying relationship between the attributes of a product, as well as their combined effect on a panel's decision, can thereby be investigated visually. Results obtained in the displays are compared with those of sensory test procedures to determine whether statistically significant differences in a product correspond to large distances between the relevant points in the biplot display.

Finally, certain recommendations regarding the implementation of sensory trials in future are made to the organisation concerned. These recommendations are made on the basis of the preceding investigations, but further research is needed before their final adoption. Where possible, suggestions for further investigation are made.