61 |
Data Imputation For Loss Reserving
Zhai, Yilong January 2024
This master's thesis applies machine learning predictive modelling to missing values in loss reserving, focusing on imputing missing values of individual features (age, accident year, etc.) and of annual insurance payments. We use machine learning techniques such as random forests and decision trees and compare their performance in missing-value prediction with that of traditional regression models. The study then aggregates individual payments into run-off triangles. It compares the data imputation models by the loss reserve estimates that the Mack and GLM reserving models produce from the imputed data set and from the complete data set. By evaluating these techniques, this research aims to contribute insights to predictive analytics in insurance and to guide industry practice toward more accurate and efficient modelling approaches. / Thesis / Master of Science (MSc)
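As an illustration of the step from individual payments to a run-off triangle, the following is a minimal sketch and not the thesis's implementation. The record layout `(accident_year, payment_year, amount)` and the function names are assumptions made for this example. The second function computes the volume-weighted chain-ladder development factors, which are the point estimates on which the Mack model is built.

```python
from collections import defaultdict


def run_off_triangle(payments):
    """Aggregate (accident_year, payment_year, amount) records (assumed layout)
    into a cumulative run-off triangle keyed by (accident_year, development_year)."""
    incremental = defaultdict(float)
    for accident_year, payment_year, amount in payments:
        incremental[(accident_year, payment_year - accident_year)] += amount

    accident_years = sorted({acc for acc, _ in incremental})
    latest_year = max(acc + dev for acc, dev in incremental)  # last observed calendar year

    triangle = {}
    for acc in accident_years:
        running_total = 0.0
        for dev in range(latest_year - acc + 1):
            running_total += incremental.get((acc, dev), 0.0)
            triangle[(acc, dev)] = running_total
    return triangle


def development_factors(triangle):
    """Volume-weighted chain-ladder factors from development year d to d + 1."""
    max_dev = max(dev for _, dev in triangle)
    factors = []
    for dev in range(max_dev):
        # Only accident years already observed at development year dev + 1 contribute.
        rows = [acc for (acc, d) in triangle if d == dev + 1]
        numerator = sum(triangle[(acc, dev + 1)] for acc in rows)
        denominator = sum(triangle[(acc, dev)] for acc in rows)
        factors.append(numerator / denominator)
    return factors


# Hypothetical toy data, for illustration only.
payments = [
    (2020, 2020, 100), (2020, 2021, 50), (2020, 2022, 30),
    (2021, 2021, 200), (2021, 2022, 100),
    (2022, 2022, 150),
]
triangle = run_off_triangle(payments)
factors = development_factors(triangle)
```

Running the same two functions on an imputed data set and on the complete data set gives two sets of factors that can be compared directly.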
|
62 |
Some statistical aspects of LULU smoothers
Jankowitz, Maria Dorothea 12 1900
Thesis (PhD (Statistics and Actuarial Science))--University of Stellenbosch, 2007. / The smoothing of time series plays a very important role in various practical applications. Estimating
the signal and removing the noise is the main goal of smoothing. Traditionally linear smoothers were
used, but nonlinear smoothers became more popular through the years.
From the family of nonlinear smoothers, the class of median smoothers, based on order statistics, is the
most popular. A new class of nonlinear smoothers, called LULU smoothers, was developed by using
the minimum and maximum selectors. These smoothers have very attractive mathematical properties.
In this thesis their statistical properties are investigated and compared to those of the class of median
smoothers.
Smoothing, together with related concepts, is discussed in general. Thereafter, the class of median
smoothers from the literature is discussed. The class of LULU smoothers is defined, their properties
are explained and new contributions are made. The compound LULU smoother is introduced and its
property of variation decomposition is discussed. The probability distributions of some LULU smoothers
with independent data are derived. LULU smoothers and median smoothers are compared according
to the properties of monotonicity, idempotency, co-idempotency, stability, edge preservation, output
distributions and variation decomposition. A comparison is made of their respective abilities for signal
recovery by means of simulations. The success of the smoothers in recovering the signal is measured
by the integrated mean square error and the regression coefficient calculated from the least squares
regression of the smoothed sequence on the signal. Finally, LULU smoothers are practically applied.
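The minimum and maximum selectors mentioned in this abstract can be sketched as follows. This is an illustrative sketch using the usual definitions of the basic operators from the LULU literature: L_n takes the maximum of the minima over all windows of n + 1 consecutive values containing a point, and U_n takes the minimum of the maxima. The function names and the boundary rule (repeating the end values) are assumptions made for this example, not details taken from the thesis.

```python
def _pad(x, n):
    """Extend the sequence by repeating its end values n times (assumed boundary rule)."""
    x = list(x)
    return [x[0]] * n + x + [x[-1]] * n


def lulu_L(x, n):
    """L_n: maximum over the minima of all length-(n + 1) windows containing each point.
    Removes upward spikes of width at most n."""
    p = _pad(x, n)
    mins = [min(p[j:j + n + 1]) for j in range(len(p) - n)]
    return [max(mins[i:i + n + 1]) for i in range(len(x))]


def lulu_U(x, n):
    """U_n: minimum over the maxima of all length-(n + 1) windows containing each point.
    Removes downward spikes of width at most n."""
    p = _pad(x, n)
    maxs = [max(p[j:j + n + 1]) for j in range(len(p) - n)]
    return [min(maxs[i:i + n + 1]) for i in range(len(x))]


up_spike = lulu_L([0, 0, 5, 0, 0], 1)      # the isolated upward spike is removed
down_spike = lulu_U([0, 0, -5, 0, 0], 1)   # the isolated downward spike is removed
wide_block = lulu_L([0, 5, 5, 0], 1)       # a block of width 2 survives L_1
```

Applying `lulu_L` twice gives the same result as applying it once, which is the idempotency property listed among the comparison criteria.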
|
63 |
Statistical inference for inequality measures based on semi-parametric estimators
Kpanzou, Tchilabalo Abozou 12 1900
Thesis (PhD)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Measures of inequality, also used as measures of concentration or diversity, are very popular in economics
and especially in measuring the inequality in income or wealth within a population and between
populations. However, they have applications in many other fields, e.g. in ecology, linguistics, sociology,
demography, epidemiology and information science.
A large number of measures have been proposed to measure inequality. Examples include the Gini
index, the generalized entropy, the Atkinson and the quintile share ratio measures. Inequality measures
are inherently dependent on the tails of the population (underlying distribution) and therefore their
estimators are typically sensitive to data from these tails (nonrobust). For example, income distributions
often exhibit a long tail to the right, leading to the frequent occurrence of large values in samples. Since
the usual estimators are based on the empirical distribution function, they are usually nonrobust to such
large values. Furthermore, heavy-tailed distributions often occur in real-life data sets, so remedial action
needs to be taken in such cases.
The remedial action can be either a trimming of the extreme data or a modification of the (traditional)
estimator to make it more robust to extreme observations. In this thesis we follow the second option,
modifying the traditional empirical distribution function as estimator to make it more robust. Using results
from extreme value theory, we develop more reliable distribution estimators in a semi-parametric
setting. These new estimators of the distribution then form the basis for more robust estimators of the
measures of inequality. These estimators are developed for the four most popular classes of measures,
viz. Gini, generalized entropy, Atkinson and quintile share ratio. Properties of such estimators
are studied especially via simulation. Using limiting distribution theory and the bootstrap methodology,
approximate confidence intervals were derived. Through the various simulation studies, the proposed
estimators are compared to the standard ones in terms of mean squared error, relative impact of contamination,
confidence interval length and coverage probability. In these studies the semi-parametric
methods show a clear improvement over the standard ones. The theoretical properties of the quintile
share ratio have not been studied much. Consequently, we also derive its influence function as well as
the limiting normal distribution of its nonparametric estimator. These results have not previously been
published.
In order to illustrate the methods developed, we apply them to a number of real life data sets. Using
such data sets, we show how the methods can be used in practice for inference. In order to choose
between the candidate parametric distributions, use is made of a measure of sample representativeness
from the literature. These illustrations show that the proposed methods can be used to reach
satisfactory conclusions in real life problems. / AFRIKAANSE OPSOMMING: Maatstawwe van ongelykheid, wat ook gebruik word as maatstawwe van konsentrasie of diversiteit,
is baie populêr in ekonomie en veral vir die kwantifisering van ongelykheid in inkomste of welvaart
binne ’n populasie en tussen populasies. Hulle het egter ook toepassings in baie ander dissiplines,
byvoorbeeld ekologie, linguistiek, sosiologie, demografie, epidemiologie en inligtingskunde.
Daar bestaan reeds verskeie maatstawwe vir die meet van ongelykheid. Voorbeelde sluit in die Gini
indeks, die veralgemeende entropie maatstaf, die Atkinson maatstaf en die kwintiel aandeel verhouding.
Maatstawwe van ongelykheid is inherent afhanklik van die sterte van die populasie (onderliggende
verdeling) en beramers daarvoor is tipies dus sensitief vir data uit sodanige sterte (nierobuust). Inkomste
verdelings het byvoorbeeld dikwels lang regtersterte, wat kan lei tot die voorkoms van groot
waardes in steekproewe. Die tradisionele beramers is gebaseer op die empiriese verdelingsfunksie, en
hulle is gewoonlik dus nierobuust teenoor sodanige groot waardes nie. Aangesien swaarstert verdelings
dikwels voorkom in werklike data, moet regstellings gemaak word in sulke gevalle.
Hierdie regstellings kan bestaan uit of die afknip van ekstreme data of die aanpassing van tradisionele
beramers om hulle meer robuust te maak teen ekstreme waardes. In hierdie tesis word die
tweede opsie gevolg deurdat die tradisionele empiriese verdelingsfunksie as beramer aangepas word
om dit meer robuust te maak. Deur gebruik te maak van resultate van ekstreemwaardeteorie, word
meer betroubare beramers vir verdelings ontwikkel in ’n semi-parametriese opset. Hierdie nuwe beramers
van die verdeling vorm dan die basis vir meer robuuste beramers van maatstawwe van ongelykheid.
Hierdie beramers word ontwikkel vir die vier mees populêre klasse van maatstawwe, naamlik
Gini, veralgemeende entropie, Atkinson en kwintiel aandeel verhouding. Eienskappe van hierdie
beramers word bestudeer, veral met behulp van simulasie studies. Benaderde vertrouensintervalle
word ontwikkel deur gebruik te maak van limietverdelingsteorie en die skoenlus metodologie. Die
voorgestelde beramers word vergelyk met tradisionele beramers deur middel van verskeie simulasie
studies. Die vergelyking word gedoen in terme van gemiddelde kwadraat fout, relatiewe impak van
kontaminasie, vertrouensinterval lengte en oordekkingswaarskynlikheid. In hierdie studies toon die
semi-parametriese metodes ’n duidelike verbetering teenoor die tradisionele metodes. Die kwintiel
aandeel verhouding se teoretiese eienskappe het nog nie veel aandag in die literatuur geniet nie.
Gevolglik lei ons die invloedfunksie asook die asimptotiese verdeling van die nie-parametriese beramer
daarvoor af.
Ten einde die metodes wat ontwikkel is te illustreer, word dit toegepas op ’n aantal werklike datastelle.
Hierdie toepassings toon hoe die metodes gebruik kan word vir inferensie in die praktyk. ’n Metode
in die literatuur vir steekproefverteenwoordiging word voorgestel en gebruik om ’n keuse tussen die
kandidaat parametriese verdelings te maak. Hierdie voorbeelde toon dat die voorgestelde metodes
met vrug gebruik kan word om bevredigende gevolgtrekkings in die praktyk te maak.
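For reference, the standard empirical (nonparametric) versions of two of the measures named in this abstract can be sketched as below. These are the usual estimators that the abstract describes as sensitive to large values, not the semi-parametric estimators developed in the thesis. The function names are assumptions, and the quintile share ratio here uses a simple count-based split (the lowest and highest fifth of the sorted sample), which is one of several possible conventions.

```python
def gini(incomes):
    """Empirical Gini index: sum_i (2i - n - 1) x_(i) / (n * sum_i x_i)."""
    x = sorted(incomes)
    n = len(x)
    total = sum(x)
    return sum((2 * i - n - 1) * v for i, v in enumerate(x, start=1)) / (n * total)


def quintile_share_ratio(incomes):
    """Total of the top fifth of the sample divided by the total of the bottom fifth.
    Assumes at least five observations and a positive bottom-fifth total."""
    x = sorted(incomes)
    k = len(x) // 5
    return sum(x[-k:]) / sum(x[:k])


equal_gini = gini([1, 1, 1, 1])            # perfect equality
extreme_gini = gini([0, 0, 0, 1])          # one unit holds everything
qsr = quintile_share_ratio(range(1, 11))   # (9 + 10) / (1 + 2)
```

Replacing the largest observation in a sample by a much larger one changes both estimates noticeably, which is the nonrobustness that motivates the tail modelling described in the abstract.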
|
64 |
Aspects of model development using regression quantiles and elemental regressions
Ranganai, Edmore 03 1900
Dissertation (PhD)--University of Stellenbosch, 2007. / ENGLISH ABSTRACT: It is well known that ordinary least squares (OLS) procedures are sensitive to deviations from
the classical Gaussian assumptions (outliers) as well as data aberrations in the design space.
The two major data aberrations in the design space are collinearity and high leverage.
Leverage points can also induce or hide collinearity in the design space. Such leverage points
are referred to as collinearity influential points. As a consequence, over the years, many
diagnostic tools to detect these anomalies as well as alternative procedures to counter them
were developed. To counter deviations from the classical Gaussian assumptions many robust
procedures have been proposed. One such class of procedures is the Koenker and Bassett
(1978) Regression Quantiles (RQs), which are natural extensions of order statistics to the
linear model. RQs can be found as solutions to linear programming problems (LPs). The basic
optimal solutions to these LPs (which are RQs) correspond to elemental subset (ES)
regressions, which consist of subsets of minimum size to estimate the necessary parameters of
the model.
On the one hand, some ESs correspond to RQs. On the other hand, in the literature it is shown
that many OLS statistics (estimators) are related to ES regression statistics (estimators).
Therefore there is an inherent relationship amongst the three sets of procedures. The
relationship between the ES procedure and the RQ one has been noted almost “casually” in
the literature, while the relationship between the ES procedure and the OLS one has been
fairly widely explored. Using these existing
relationships between the ES procedure and the OLS one as well as new ones, collinearity,
leverage and outlier problems in the RQ scenario were investigated. Also, a lasso procedure
was proposed as variable selection technique in the RQ scenario and some tentative results
were given for it. These results are promising.
Single case diagnostics were considered as well as their relationships to multiple case ones. In
particular, multiple cases of the minimum size needed to estimate the parameters of the
model, corresponding to an RQ (ES), were considered. In this way regression diagnostics were
developed for both ESs and RQs. The main problems that affect RQs adversely are
collinearity and leverage due to the nature of the computational procedures and the fact that
RQs’ influence functions are unbounded in the design space but bounded in the response
variable. As a consequence of this, RQs have a high affinity for leverage points and a high
exclusion rate of outliers. The influential picture exhibited in the presence of both leverage points and outliers is the net result of these two antagonistic forces. Although RQs are
bounded in the response variable (and therefore fairly robust to outliers), outlier diagnostics
were also considered in order to have a more holistic picture.
The investigations comprised analytic methods as well as simulation. Furthermore,
applications were made to artificial computer generated data sets as well as standard data sets
from the literature. These revealed that the ES based statistics can be used to address
problems arising in the RQ scenario with some degree of success. However, due to the
interdependence between the different aspects, viz. the one between leverage and collinearity
and the one between leverage and outliers, “solutions” are often dependent on the particular
situation. In spite of this complexity, the research did produce some fairly general guidelines
that can be fruitfully used in practice. / AFRIKAANSE OPSOMMING: Dit is bekend dat die gewone kleinste kwadraat (KK) prosedures sensitief is vir afwykings
vanaf die klassieke Gaussiese aannames (uitskieters) asook vir data afwykings in die
ontwerpruimte. Twee tipes afwykings van belang in laasgenoemde geval, is kollineariteit en
punte met hoë hefboom waarde. Laasgenoemde punte kan ook kollineariteit induseer of
versteek in die ontwerp. Na sodanige punte word verwys as kollinêre hefboom punte. Oor die
jare is baie diagnostiese hulpmiddels ontwikkel om hierdie afwykings te identifiseer en om
alternatiewe prosedures daarteen te ontwikkel. Om afwykings vanaf die Gaussiese aanname
teen te werk, is heelwat robuuste prosedures ontwikkel. Een sodanige klas van prosedures is
die Koenker en Bassett (1978) Regressie Kwantiele (RKe), wat natuurlike uitbreidings is van
rangorde statistieke na die lineêre model. RKe kan bepaal word as oplossings van lineêre
programmeringsprobleme (LPs). Die basiese optimale oplossings van hierdie LPs (wat RKe
is) kom ooreen met die elementale deelversameling (ED) regressies, wat bestaan uit
deelversamelings van minimum grootte waarmee die parameters van die model beraam kan
word.
Enersyds geld dat sekere EDs ooreenkom met RKe. Andersyds, uit die literatuur is dit bekend
dat baie KK statistieke (beramers) verwant is aan ED regressie statistieke (beramers). Dit
impliseer dat daar dus ‘n inherente verwantskap is tussen die drie klasse van prosedures. Die
verwantskap tussen die ED en die ooreenkomstige RK prosedures is redelik “terloops” van
melding gemaak in die literatuur, terwyl laasgenoemde prosedures redelik breedvoerig
ondersoek is. Deur gebruik te maak van bestaande verwantskappe tussen ED en KK
prosedures, sowel as nuwes wat ontwikkel is, is kollineariteit, punte met hoë hefboom
waardes en uitskieter probleme in die RK omgewing ondersoek. Voorts is ‘n lasso prosedure
as veranderlike seleksie tegniek voorgestel in die RK situasie en is enkele tentatiewe resultate
daarvoor gegee. Hierdie resultate blyk belowend te wees, veral ook vir verdere navorsing.
Enkel geval diagnostiese tegnieke is beskou sowel as hul verwantskap met meervoudige geval
tegnieke. In die besonder is veral meervoudige gevalle beskou wat van minimum grootte is
om die parameters van die model te kan beraam, en wat ooreenkom met ‘n RK (ED). Met
sodanige benadering is regressie diagnostiese tegnieke ontwikkel vir beide EDs en RKe. Die
belangrikste probleme wat RKe negatief beinvloed, is kollineariteit en punte met hoë
hefboom waardes agv die aard van die berekeningsprosedures en die feit dat RKe se invloedfunksies begrensd is in die ruimte van die afhanklike veranderlike, maar onbegrensd is
in die ontwerpruimte. Gevolglik het RKe ‘n hoë affiniteit vir punte met hoë hefboom waardes
en poog gewoonlik om uitskieters uit te sluit. Die finale uitset wat verkry word wanneer beide
punte met hoë hefboom waardes en uitskieters voorkom, is dan die netto resultaat van hierdie
twee teenstrydige pogings. Alhoewel RKe begrensd is in die afhanklike veranderlike (en
dus redelik robuust is tov uitskieters), is uitskieter diagnostiese tegnieke ook beskou om ‘n
meer holistiese beeld te verkry.
Die ondersoek het analitiese sowel as simulasie tegnieke gebruik. Voorts is ook gebruik
gemaak van kunsmatige datastelle en standaard datastelle uit die literatuur. Hierdie ondersoeke
het getoon dat die ED gebaseerde statistieke met ‘n redelike mate van sukses gebruik kan
word om probleme in die RK omgewing aan te spreek. Dit is egter belangrik om daarop te let
dat as gevolg van die interafhanklikheid tussen kollineariteit en punte met hoë hefboom
waardes asook dié tussen punte met hoë hefboom waardes en uitskieters, “oplossings”
dikwels afhanklik is van die bepaalde situasie. Ten spyte van hierdie kompleksiteit, is op
grond van die navorsing wat gedoen is, tog redelike algemene riglyne verkry wat nuttig in die
praktyk gebruik kan word.
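The link between regression quantiles and elemental subset regressions can be illustrated for a straight-line model, where an elemental subset is a pair of points. Because an optimal basic solution of the linear program fits as many points exactly as there are parameters, a regression quantile can be found (inefficiently) by fitting a line through every pair of points and keeping the fit with the smallest check loss. The sketch below does exactly that. It is an illustration under these assumptions, not the computational procedure used in the dissertation, and the function names are hypothetical.

```python
from itertools import combinations


def check_loss(residual, tau):
    """Koenker-Bassett check function: rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (residual < 0))


def regression_quantile(x, y, tau):
    """Brute-force tau-th regression quantile for y = a + b * x.
    Every pair of points with distinct x values is an elemental subset that
    defines a candidate line; the candidate with the smallest total check loss is kept."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # this pair cannot determine a slope
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - (intercept + slope * xk), tau)
                   for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Hypothetical data: four points on y = x plus one outlier in the response.
x = [0, 1, 2, 3, 4]
y = [0, 1, 2, 3, 40]
intercept, slope = regression_quantile(x, y, 0.5)
```

In this toy data set the median regression line ignores the response outlier, in line with the bounded influence in the response variable noted in the abstract. Moving the last point far out in the x direction instead would pull the fit toward it.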
|
65 |
Improved estimation procedures for a positive extreme value index
Berning, Thomas Louw 12 1900
Thesis (PhD (Statistics))--University of Stellenbosch, 2010. / ENGLISH ABSTRACT: In extreme value theory (EVT) the emphasis is on extreme (very small or very large) observations. The crucial parameter when making inferences about extreme quantiles, is called the extreme value index (EVI). This thesis concentrates on only the right tail of the underlying distribution (extremely large observations), and specifically situations where the EVI is assumed to be positive. A positive EVI indicates that the underlying distribution of the data has a heavy right tail, as is the case with, for example, insurance claims data.
There are numerous areas of application of EVT, since there are a vast number of situations in which one would be interested in predicting extreme events accurately. Accurate prediction requires accurate estimation of the EVI, which has received ample attention in the literature from a theoretical as well as practical point of view.
Countless estimators of the EVI exist in the literature, but the practitioner has little information on how these estimators compare. An extensive simulation study was designed and conducted to compare the performance of a wide range of estimators, over a wide range of sample sizes and distributions.
A new procedure for the estimation of a positive EVI was developed, based on fitting the perturbed Pareto distribution (PPD) to observations above a threshold, using Bayesian methodology. Attention was also given to the development of a threshold selection technique.
One of the major contributions of this thesis is a measure which quantifies the stability (or rather instability) of estimates across a range of thresholds. This measure can be used to objectively obtain the range of thresholds over which the estimates are most stable. It is this measure which is used for the purpose of threshold selection for the proposed PPD estimator.
A case study of five insurance claims data sets illustrates how data sets can be analyzed in practice. It is shown to what extent discretion can/should be applied, as well as how different estimators can be used in a complementary fashion to give more insight into the nature of the data and the extreme tail of the underlying distribution. The analysis is carried out from the point of raw data, to the construction of tables which can be used directly to gauge the risk of the insurance portfolio over a given time frame. / AFRIKAANSE OPSOMMING: Die veld van ekstreemwaardeteorie (EVT) is bemoeid met ekstreme (baie klein of baie groot) waarnemings. Die parameter wat deurslaggewend is wanneer inferensies aangaande ekstreme kwantiele ter sprake is, is die sogenaamde ekstreemwaarde-indeks (EVI). Hierdie verhandeling konsentreer op slegs die regterstert van die onderliggende verdeling (baie groot waarnemings), en meer spesifiek, op situasies waar aanvaar word dat die EVI positief is. ’n Positiewe EVI dui aan dat die onderliggende verdeling ’n swaar regterstert het, wat byvoorbeeld die geval is by versekeringseis data.
Daar is verskeie velde waar EVT toegepas word, aangesien daar ’n groot aantal situasies is waarin mens sou belangstel om ekstreme gebeurtenisse akkuraat te voorspel. Akkurate voorspelling vereis die akkurate beraming van die EVI, wat reeds ruim aandag in die literatuur geniet het, uit beide teoretiese en praktiese oogpunte.
’n Groot aantal beramers van die EVI bestaan in die literatuur, maar enige persoon wat die toepassing van EVT in die praktyk beoog, het min inligting oor hoe hierdie beramers met mekaar vergelyk. ’n Uitgebreide simulasiestudie is ontwerp en uitgevoer om die akkuraatheid van beraming van ’n groot verskeidenheid van beramers in die literatuur te vergelyk. Die studie sluit ’n groot verskeidenheid van steekproefgroottes en onderliggende verdelings in.
’n Nuwe prosedure vir die beraming van ’n positiewe EVI is ontwikkel, gebaseer op die passing van die gesteurde Pareto verdeling (PPD) aan waarnemings wat ’n gegewe drempel oorskrei, deur van Bayes tegnieke gebruik te maak. Aandag is ook geskenk aan die ontwikkeling van ’n drempelseleksiemetode.
Een van die hoofbydraes van hierdie verhandeling is ’n maatstaf wat die stabiliteit (of eerder onstabiliteit) van beramings oor verskeie drempels kwantifiseer. Hierdie maatstaf bied ’n objektiewe manier om ’n gebied (versameling van drempelwaardes) te verkry waaroor die beramings die stabielste is. Dit is hierdie maatstaf wat gebruik word om drempelseleksie te doen in die geval van die PPD beramer.
’n Gevallestudie van vyf stelle data van versekeringseise demonstreer hoe data in die praktyk geanaliseer kan word. Daar word getoon tot watter mate diskresie toegepas kan/moet word, asook hoe verskillende beramers op ’n komplementêre wyse ingespan kan word om meer insig te verkry met betrekking tot die aard van die data en die stert van die onderliggende verdeling. Die analise word uitgevoer vanaf die punt waar slegs rou data beskikbaar is, tot op die punt waar tabelle saamgestel is wat direk gebruik kan word om die risiko van die versekeringsportefeulje te bepaal oor ’n gegewe periode.
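To make the dependence of an EVI estimate on the threshold concrete, here is a minimal sketch of the Hill estimator, a standard estimator of a positive EVI. It is used purely as a familiar example; it is not the PPD-based procedure or the stability measure proposed in the thesis. The function name and the idealised test sample (exact quantiles of a Pareto distribution with EVI 0.5) are assumptions made for this example.

```python
import math


def hill_estimator(data, k):
    """Hill estimate of a positive EVI from the k largest observations:
    the mean of log(X_(n-i+1) / X_(n-k)) for i = 1, ..., k.
    Requires positive data and 0 < k < n."""
    x = sorted(data)
    n = len(x)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    threshold = x[n - k - 1]  # the (n - k)-th order statistic acts as the threshold
    return sum(math.log(v / threshold) for v in x[n - k:]) / k


# Idealised sample: quantiles of a strict Pareto distribution with EVI 0.5.
n, true_evi = 1000, 0.5
sample = [(1 - i / (n + 1)) ** (-true_evi) for i in range(1, n + 1)]

# Estimates over a range of thresholds (numbers of upper order statistics).
estimates = {k: hill_estimator(sample, k) for k in (50, 100, 200)}
```

Plotting or tabulating such estimates against k is the usual informal way of judging over which range of thresholds an estimator is stable.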
|
66 |
Edgeworth-corrected small-sample confidence intervals for ratio parameters in linear regression
Binyavanga, Kamanzi-wa 03 1900
Dissertation (PhD)--Stellenbosch University, 2002. / ENGLISH ABSTRACT: In this thesis we construct a central confidence interval for a smooth scalar non-linear function of
the parameter vector β in a single general linear regression model Y = Xβ + ε. We do this by first
developing an Edgeworth expansion for the distribution function of a standardised point estimator.
The confidence interval is then constructed from this expansion. Simulation studies reported at
the end of the thesis show the interval to perform well in many small-sample situations.
Central to the development of the Edgeworth expansion is our use of the index notation which, in
statistics, has been popularised by McCullagh (1984, 1987).
The contributions made in this thesis are of two kinds. We revisit the complex McCullagh index
notation, modify and extend it in certain respects, and repackage it in a manner that is more
accessible to other researchers.
As for the new contributions, in addition to the introduction of a new small-sample confidence interval,
we extend the theory of stochastic polynomials (SP) in three respects. A method, which we believe to
be the simplest and most transparent to date, is proposed for deriving cumulants for these. Secondly,
the theory of the cumulants of the SP is developed both in the context of Edgeworth expansion as well
as in the regression setting. Thirdly, our new method enables us to propose a natural alternative to
the method of Hall (1992a, 1992b) regarding skewness-reduction in Edgeworth expansions. / AFRIKAANSE OPSOMMING: In hierdie proefskrif word daar aandag gegee aan die konstruksie van 'n sentrale vertrouensinterval
vir 'n gladde skalare nie-lineêre funksie van die parametervektor β in 'n enkele algemene lineêre
regressiemodel Y = Xβ + ε. Dit behels eerstens die ontwikkeling van 'n Edgeworth uitbreiding
vir die verdelingsfunksie van 'n gestandaardiseerde puntberamer. Die vertrouensinterval word dan op
grond van hierdie uitbreiding gekonstrueer. Simulasiestudies wat aan die einde van die proefskrif
gerapporteer word, toon dat die voorgestelde interval goed vertoon in verskeie klein-steekproef
gevalle.
Die gebruik van indeksnotasie, wat in die statistiek deur McCullagh (1984, 1987) bekendgestel is,
speel 'n sentrale rol in die ontwikkeling van die Edgeworth uitbreiding.
Die bydrae wat in hierdie proefskrif gemaak word, is van 'n tweërlei aard. Die ingewikkelde
Indeksnotasie van McCullagh word ondersoek, aangepas en ten opsigte van sekere aspekte uitgebrei.
Die notasie word ook aangebied in 'n vorm wat dit hopelik meer toeganklik sal maak vir ander
navorsers.
Betreffende die bydrae wat gemaak word, word 'n nuwe klein-steekproef vertrouensinterval
voorgestel, en word die teorie van stogastiese polinome (SP) ook in drie opsigte uitgebrei. 'n Metode
word voorgestel om die kumulante van SP'e af te lei. Ons glo dat hierdie metode die duidelikste
en eenvoudigste metode is wat tot dusver hiervoor voorgestel is. Tweedens word die teorie van die
kumulante van SP'e ontwikkel binne die konteks van Edgeworth uitbreidings, sowel as die konteks
van regressie. Derdens stel ons nuwe metode ons in staat om 'n natuurlike alternatief voor te stel vir
die metode van Hall (1992a, 1992b) vir die vermindering van skeefheid in Edgeworth uitbreidings.
|
67 |
Influential data cases when the C-p criterion is used for variable selection in multiple linear regression
Uys, Daniel Wilhelm January 2003
Dissertation (PhD)--Stellenbosch University, 2003. / ENGLISH ABSTRACT: In this dissertation we study the influence of data cases when the Cp criterion of Mallows (1973)
is used for variable selection in multiple linear regression. The influence is investigated in
terms of the predictive power and the predictor variables included in the resulting model when
variable selection is applied. In particular, we focus on the importance of identifying and
dealing with these so-called selection influential data cases before model selection and fitting
are performed. For this purpose we develop two new selection influence measures, both based
on the Cp criterion. The first measure is specifically developed to identify individual selection
influential data cases, whereas the second identifies subsets of selection influential data cases.
The success with which these influence measures identify selection influential data cases, is
evaluated in example data sets and in simulation. All results are derived in the coordinate free
context, with special application in multiple linear regression. / AFRIKAANSE OPSOMMING: Invloedryke waarnemings as die C-p kriterium vir veranderlike seleksie in meervoudige lineêre regressie gebruik word: In hierdie proefskrif ondersoek ons die invloed van waarnemings as die Cp kriterium van Mallows
(1973) vir veranderlike seleksie in meervoudige lineêre regressie gebruik word. Die
invloed van waarnemings op die voorspellingskrag en die onafhanklike veranderlikes wat ingesluit
word in die finale geselekteerde model, word ondersoek. In besonder fokus ons op
die belangrikheid van identifisering van en handeling met sogenaamde seleksie invloedryke
waarnemings voordat model seleksie en passing gedoen word. Vir hierdie doel word twee
nuwe invloedsmaatstawwe, albei gebaseer op die Cp kriterium, ontwikkel. Die eerste maatstaf
is spesifiek ontwikkel om die invloed van individuele waarnemings te meet, terwyl die tweede
die invloed van deelversamelings van waarnemings op die seleksie proses meet. Die sukses
waarmee hierdie invloedsmaatstawwe seleksie invloedryke waarnemings identifiseer word
beoordeel in voorbeeld datastelle en in simulasie. Alle resultate word afgelei binne die koördinaatvrye
konteks, met spesiale toepassing in meervoudige lineêre regressie.
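For readers unfamiliar with the criterion, the usual textbook form of Mallows' Cp for a candidate submodel is given below. The notation is an assumption for this note and is not taken from the dissertation: RSS_p is the residual sum of squares of a submodel with p regression parameters (including the intercept), n is the number of data cases, and the error variance estimate is the residual mean square of the full model.

```latex
C_p = \frac{\mathrm{RSS}_p}{\hat{\sigma}^2} - n + 2p
```

Submodels with small Cp, and with Cp close to p, are preferred. A data case is selection influential in the sense of this abstract when leaving it out changes which submodel the criterion selects.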
|
68 |
Evaluating the properties of sensory tests using computer intensive and biplot methodologies
Meintjes, M. M. (Maria Magdalena) 03 1900
Assignment (MComm)--University of Stellenbosch, 2007. / ENGLISH ABSTRACT: This study is the result of part-time work done at a product development centre. The organisation makes extensive use of trained panels in sensory trials designed to assess the quality of its product. Although standard statistical procedures are used for analysing the results arising from these trials, circumstances necessitate deviations from the prescribed protocols. Therefore the validity of conclusions drawn as a result of these testing procedures might be questionable. This assignment deals with these questions.
Sensory trials are vital in the development of new products, control of quality levels and the exploration of improvement in current products. Standard test procedures used to explore such questions exist but are in practice often implemented by investigators who have little or no statistical background. Thus test methods are implemented as black boxes and procedures are used blindly without checking all the appropriate assumptions and other statistical requirements. The specific product under consideration often warrants certain modifications to the standard methodology. These changes may have some unknown effect on the obtained results and therefore should be scrutinized to ensure that the results remain valid.
The aim of this study is to investigate the distribution and other characteristics of sensory data, comparing the hypothesised, observed and bootstrap distributions. Furthermore, the standard testing methods used to analyse sensory data sets will be evaluated. After comparing these methods, alternative testing methods may be introduced and then tested using newly generated data sets.
Graphical displays are also useful to get an overall impression of the data under consideration. Biplots are especially useful in the investigation of multivariate sensory data. The underlying relationships among attributes and their combined effect on the panellists’ decisions can be visually investigated by constructing a biplot. Results obtained by implementing biplot methods are compared to those of sensory tests, i.e. whether a significant difference between objects will correspond to large distances between the points representing objects in the display. In conclusion some recommendations are made as to how the organisation under consideration should implement sensory procedures in future trials. However, these proposals are preliminary and further research is necessary before final adoption. Some issues for further investigation are suggested. / AFRIKAANS SUMMARY: This study arises from part-time work at a product development centre. In all its sensory trials on product quality, the organisation makes extensive use of trained panels. Although standard procedures are used to analyse the results, certain circumstances require the prescribed protocol to be implemented in a modified form. These modifications may render conclusions based on the results invalid. This study investigates that problem.
Sensory trials are essential in quality control, the improvement of existing products, and the development of new products. Standard test procedures exist to explore such questions, but they are often applied by researchers with little or no statistical knowledge. As a result, test procedures are implemented blindly and results are interpreted without checking the necessary assumptions and other statistical requirements. Although a specific product may justify modifying the standard method, such changes can have a large influence on the results. The validity of the results must therefore be examined carefully.
The aim of this study is to examine the distribution as well as other properties of sensory data by considering the distribution under the null hypothesis together with the observed and bootstrap distributions. The standard test procedure currently used to analyse sensory data also receives attention. Alternative test procedures are then proposed and evaluated on newly generated data sets.
Graphical displays are also useful for obtaining an overall picture of the data under discussion. Biplots are especially handy for studying multidimensional sensory data. The underlying relationship among the attributes of a product, as well as their combined effect on a panel’s decision, can be investigated visually in this way. Results obtained from the displays are compared with those of sensory test procedures to determine whether statistically significant differences in a product correspond to large distances between the relevant points in the biplot display.
Finally, certain recommendations on the implementation of sensory trials in future are made to the organisation concerned. These recommendations are based on the preceding investigations, but further research is needed before their final adoption. Where possible, suggestions for further investigation are made.
|
69 |
Estimating the window period and incidence of recently infected HIV patients. Du Toit, Cari 03 1900 (has links)
Thesis (MComm (Statistics and Actuarial Science))--University of Stellenbosch, 2009. / Incidence can be defined as the rate of occurrence of new infections of a disease like HIV and
is a useful estimate of trends in the epidemic. Annualised incidence can be expressed as a
proportion, namely the number of recent infections per year divided by the number of people at
risk of infection. The number of recent infections depends on the window period, which
is the period from seroconversion until an infection is first classified as long-term.
The BED capture enzyme immunoassay was developed to provide a way to
distinguish between recent and long-term infections. An optical density (OD) measurement is
obtained from this assay. The window period is defined as the number of days from seroconversion,
at a baseline OD value of 0.0476, until an optical density of 0.8 is reached. The
aim of this study is to describe different techniques to estimate the window period which may
subsequently lead to alternative estimates of annualised incidence of HIV infection. These
various techniques are applied to different subsets of the Zimbabwe Vitamin A for Mothers and
Babies (ZVITAMBO) dataset.
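The definition of annualised incidence given above can be sketched as a small calculation. This is a minimal sketch, not the estimator used in the thesis: it assumes that recent infections are counted in a single cross-sectional sample and scaled to a full year by the factor 365 / window period. The function name, its arguments and the example numbers are hypothetical.

```python
def annualised_incidence(n_recent, n_at_risk, window_days):
    """Annualised incidence as a proportion (illustrative sketch).

    n_recent    -- people classified as recently infected (hypothetical count)
    n_at_risk   -- people at risk of infection (hypothetical count)
    window_days -- window period in days, e.g. an estimated mean or median
    """
    if n_at_risk <= 0 or window_days <= 0:
        raise ValueError("n_at_risk and window_days must be positive")
    # Assumption: infections seen within one window period are scaled
    # up to a full year by the factor 365 / window_days.
    recent_per_year = n_recent * 365.0 / window_days
    return recent_per_year / n_at_risk


# Illustrative numbers only; these are not ZVITAMBO results.
rate = annualised_incidence(n_recent=30, n_at_risk=2000, window_days=180)
print(f"annualised incidence: {rate:.4f}")
```

Under this assumption a longer window period lowers the estimate for the same counts, which is why the estimated window period feeds directly into the incidence calculation.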
Three different approaches are described to analyse window periods: a non-parametric survival
analysis approach, the fitting of a general linear mixed model in a longitudinal data setting and
a Bayesian approach of assigning probability distributions to the parameters of interest. These
techniques are applied to different subsets and transformations of the data and the estimated
mean and median window periods are obtained and utilised in the calculation of incidence.
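To illustrate the first of these approaches, the sketch below implements a Kaplan-Meier estimator, one common non-parametric survival method. Whether the thesis uses exactly this estimator is an assumption. Here the event is the OD measurement reaching the 0.8 threshold, subjects who leave the study before that are treated as censored, and the durations are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each distinct event time.

    durations -- days from seroconversion to the event or to censoring
    observed  -- True if the OD threshold was reached, False if censored
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0
        removed = 0
        # Group all subjects that share the same time t.
        while i < len(pairs) and pairs[i][0] == t:
            events += 1 if pairs[i][1] else 0
            removed += 1
            i += 1
        if events:
            # Product-limit step: multiply by the share surviving past t.
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_time(curve):
    """First time at which the survival estimate is 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached, e.g. because of heavy censoring


# Hypothetical window periods in days; False marks a censored subject.
days = [150, 170, 170, 200, 210]
reached = [True, True, False, True, True]
curve = kaplan_meier(days, reached)
print(curve, median_time(curve))
```

The median read off the curve plays the role of the estimated median window period; a mean would require integrating the survival curve instead.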
|
70 |
A comparison of support vector machines and traditional techniques for statistical regression and classification Hechter, Trudie 04 1900 (has links)
Thesis (MComm)--Stellenbosch University, 2004. / ENGLISH ABSTRACT: Since its introduction in Boser et al. (1992), the support vector machine has become a
popular tool in a variety of machine learning applications. More recently, the support
vector machine has also been receiving increasing attention in the statistical
community as a tool for classification and regression. In this thesis support vector
machines are compared to more traditional techniques for statistical classification and
regression. The techniques are applied to data from a life assurance environment for a
binary classification problem and a regression problem. In the classification case the
problem is the prediction of policy lapses using a variety of input variables, while in
the regression case the goal is to estimate the income of clients from these variables.
The performance of the support vector machine is compared to that of discriminant
analysis and classification trees in the case of classification, and to that of multiple
linear regression and regression trees in regression, and it is found that support vector
machines generally perform well compared to the traditional techniques. / AFRIKAANS SUMMARY: Since the introduction of the support vector machine in Boser et al. (1992),
it has become a popular technique in a variety of machine learning applications.
More recently, the support vector machine has also begun to receive more attention in the
statistical community as a technique for classification and regression. In this thesis
support vector machines are compared with more traditional techniques for
statistical classification and regression. The techniques are applied to data from a
life assurance environment for a binary classification problem as well as a
regression problem. In the classification case the problem is the prediction of
policy lapses using a variety of input variables, while in the
regression case the aim is to predict the income of clients with the aid of these
variables. The results of the support vector machine are
compared with those of discriminant analysis and classification trees in the
classification case, and with those of multiple linear regression and regression trees in the
regression case. The conclusion is that support vector machines generally
fare well in comparison with the traditional techniques.
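The kind of classification comparison described in this abstract can be sketched on toy data. The example below is only an illustration in plain Python: a linear soft-margin support vector machine trained by batch subgradient descent on the regularised hinge loss, compared with a decision stump (a one-level classification tree). The data, function names and hyperparameters are hypothetical, accuracy is measured in-sample, and nothing here reproduces the thesis's data, methods or results.

```python
def accuracy(preds, y):
    """Share of predictions that match the labels."""
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b by batch subgradient descent on the regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        sum_yx, sum_y = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point violates the margin, so the hinge is active
                for j in range(d):
                    sum_yx[j] += yi * xi[j]
                sum_y += yi
        # Subgradient step: L2 penalty on w minus the averaged hinge term.
        w = [wj - lr * (lam * wj - sj / n) for wj, sj in zip(w, sum_yx)]
        b += lr * sum_y / n
    return w, b


def svm_predict(w, b, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1
            for xi in X]


def train_stump(X, y):
    """One-level classification tree: best single-feature threshold split."""
    best = (-1.0, 0, 0.0, 1)  # (accuracy, feature, threshold, polarity)
    for j in range(len(X[0])):
        values = sorted({xi[j] for xi in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            for polarity in (1, -1):
                preds = [polarity if xi[j] > thr else -polarity for xi in X]
                acc = accuracy(preds, y)
                if acc > best[0]:
                    best = (acc, j, thr, polarity)
    return best[1:]


def stump_predict(stump, X):
    j, thr, polarity = stump
    return [polarity if xi[j] > thr else -polarity for xi in X]


# Hypothetical toy data: the label is +1 when the two inputs sum to a
# positive value, so the true boundary is diagonal.
X = [(3, -1), (-1, 3), (2, 2), (3, 3), (-3, 1), (1, -3), (-2, -2), (-3, -3)]
y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X, y)
svm_acc = accuracy(svm_predict(w, b, X), y)
stump_acc = accuracy(stump_predict(train_stump(X, y), X), y)
print(f"linear SVM: {svm_acc:.3f}, decision stump: {stump_acc:.3f}")
```

On this contrived data the linear SVM separates all eight points while the stump misclassifies one, which only reflects the diagonal boundary built into the example. A real comparison would use held-out data and the full set of input variables.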
|