11 |
Modelling CD4+ count over time in HIV positive patients initiated on HAART in South Africa using linear mixed models. Yende Zuma, Nonhlanhla. January 2009 (has links)
HIV is a highly infectious and pathogenic disease with a high mortality rate. The spread of HIV is influenced by several individual-based epidemiological factors such as age, gender, mobility, sexual partner profile and the presence of sexually transmitted infections (STIs). CD4+ count over time provided the first surrogate marker of HIV disease progression and is currently used for the clinical management of HIV-positive patients. The CD4+ count as a key disease marker is repeatedly measured among individuals who test HIV positive to monitor the progression of the disease, since it is known that HIV/AIDS is a long-wave event. This gives rise to what is commonly known as longitudinal data. The aim of this project is to determine whether the patients' weight, baseline age, sex, viral load and clinic site influence the rate of change in CD4+ count over time. We use data on patients who commenced highly active antiretroviral therapy (HAART) in the Center for the AIDS Programme of Research in South Africa (CAPRISA) AIDS Treatment Project (CAT) between June 2004 and September 2006, including two years of follow-up for each patient. Analysis was done using linear mixed model methods for longitudinal data. The results showed that a larger increase in CD4+ count over time was observed in females and in younger individuals. However, upon fitting baseline log viral load in the model instead of log viral load at all visits, a larger increase in CD4+ count was observed in females and in individuals who were younger, had a higher baseline log viral load and a lower weight. / Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2009.
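To make the modelling approach concrete, here is a minimal sketch of a linear mixed model with a patient-specific random intercept and slope, fitted with Python's statsmodels. The data frame, its column names (cd4, time, sex, age, weight, log_vl, patient_id) and all simulated values are hypothetical stand-ins for the CAT cohort, and the formula is only one plausible reading of the model described, not the thesis's exact specification.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic stand-in for the cohort: 50 patients, 8 visits each.
    rng = np.random.default_rng(0)
    n, visits = 50, 8
    df = pd.DataFrame({
        "patient_id": np.repeat(np.arange(n), visits),
        "time": np.tile(np.arange(visits) * 3.0, n),         # months on HAART
        "sex": np.repeat(rng.integers(0, 2, n), visits),     # 0 = male, 1 = female
        "age": np.repeat(rng.normal(35, 8, n), visits),      # baseline age
        "weight": rng.normal(65, 10, n * visits),
        "log_vl": np.repeat(rng.normal(4.5, 1, n), visits),  # baseline log viral load
    })
    slope = np.repeat(rng.normal(5, 2, n), visits)           # patient-specific slope
    df["cd4"] = 150 + slope * df["time"] + rng.normal(0, 30, n * visits)

    # Fixed effects for covariates and their interaction with time; random
    # intercept and random slope for time, grouped by patient.
    fit = smf.mixedlm("cd4 ~ time + time:sex + time:age + weight + log_vl",
                      data=df, groups="patient_id", re_formula="~time").fit()
    print(fit.summary())

The time-by-covariate interactions carry the question of interest here: whether the rate of change in CD4+ count differs by sex and baseline age.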
|
12 |
Modeling environmental factors affecting the growth of eucalypt clones. January 2009 (has links)
Tree growth is influenced by environmental and genetic factors. The same tree growing in different areas will have different growth patterns. Trees with different genetic material, e.g. pine and Eucalyptus trees, growing under the same environmental conditions have different growth patterns. Plantation trees in South Africa are mainly used for pulp and paper production. Growth is an important economic factor in the pulp and paper industry. Plantations with fast growth will be available for processing earlier compared to slow-growth plantations. Consequently, it is important to understand the role played by environmental factors, especially climatic factors, in tree growth. This thesis investigated the climatic effects on the radial growth of two Eucalyptus clones using growth data collected daily over five years by Sappi. The general linear model and time series models were used to assess the effects of climate on radial growth of the two clones. It was found that the two clones have similar overall growth patterns over time, but differ in growth rates. The growth pattern of the two clones appears to be characterized by substantial jumps/changes in growth rates over time. The times at which the jumps/changes in growth rate occur are referred to as the “breakpoints”. The piecewise linear regression model was used to estimate when the breakpoints occur. After estimating the breakpoints, the climatic effects associated with these breakpoints were investigated. The linear and time series modeling results indicated that the contribution of climatic factors to radial growth of Eucalyptus clones was small. Most of the variation in radial growth was explained by the age of the trees. Consequently, this thesis also investigated the appropriate functional relationship between radial growth and age. In particular, nonlinear growth models were used to model the radial growth process. The investigated growth curve models were those which included the maximum radius and the age at which the radial growth rate is largest as some of the parameters. The maximum growth rate was calculated from the estimated model of each clone. The results indicated that the two clones reach the maximum growth rate at different times. In particular, the two clones reach the maximum growth rates at around 368 and 376 days, respectively. Furthermore, the maximum radius was found to be different for the two clones. / Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2009.
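One family of growth curves that contains exactly the two quantities mentioned above as parameters (the maximum radius and the age of maximum growth rate) is the logistic curve. The thesis does not name its growth model, so the sketch below, fitted to synthetic data with SciPy, should be read only as an illustration of the idea.

    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(t, K, m, c):
        # K = maximum radius (asymptote), m = age at which the radial
        # growth rate is largest, c = rate constant.
        return K / (1.0 + np.exp(-c * (t - m)))

    # Synthetic daily radial growth for one hypothetical clone over five years.
    rng = np.random.default_rng(1)
    age = np.arange(1.0, 5 * 365 + 1)
    radius = logistic(age, 80.0, 370.0, 0.01) + rng.normal(0, 1.0, age.size)

    (K, m, c), _ = curve_fit(logistic, age, radius, p0=[70.0, 300.0, 0.02])
    max_rate = K * c / 4.0  # logistic growth rate peaks at t = m with value K*c/4
    print(f"max radius ~ {K:.1f}; age of max growth rate ~ {m:.0f} days; "
          f"max rate ~ {max_rate:.3f} per day")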
|
13 |
Spatial analysis and efficiency of systematic designs in intercropping experiments. Wandiembe, Symon Peter. January 2002 (has links)
In studies involving intercropping plant populations, the main interest is to locate the position of the maximum response or to study the response pattern. Such studies normally require many plant population levels. Thus, designs such as spacing systematic designs that minimise experimental land area are desired. Randomised block designs may not perform well, as they allow few population levels, which may not span the maximum or enable exploration of other features of the response surface. However, the lack of complete randomisation in systematic designs may imply spatial variability (large-scale and small-scale variation, i.e. trend and spatial dependence) in observations. There is no established statistical method laid out for the analysis of data from such designs. Given that spacing systematic designs are not well explored in the literature, the main thrusts of this study are twofold; namely, to explore the use of spatial modelling techniques in analysing and modelling data from systematic designs, and to evaluate the efficiency of systematic designs used in intercropping experiments. Three classes of models for trend and error modelling are explored/introduced. These include spatial linear mixed models, semi-parametric mixed models and beta-hat models incorporating spatial variability. The reliability and precision of these methods are demonstrated. The relative efficiency of systematic designs to the completely randomised design is evaluated. The analysis of data from systematic designs is shown to be easily implemented. Measures of efficiency that include Φp-directed measures (A and E criteria), D1 and DB efficiencies for regression parameters, and power are used. Systematic designs are shown to be efficient; on average 72% for A- and E-efficiencies and 93% for D1 and DB efficiencies. Overall, these results suggest that systematic designs are suitable and reliable for intercropping plant population studies. / Thesis (M.Sc.) - University of Natal, Pietermaritzburg, 2002.
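As a rough illustration of what A- and E-type relative efficiencies measure, the sketch below compares two hypothetical design information matrices with NumPy. The criteria are stated in their textbook forms and the designs are invented; this is not claimed to reproduce the exact efficiency measures computed in the thesis.

    import numpy as np

    def a_efficiency(M_sys, M_crd):
        # A-criterion: the average variance of the parameter estimates is
        # proportional to trace(M^-1); efficiency is the ratio of traces.
        return np.trace(np.linalg.inv(M_crd)) / np.trace(np.linalg.inv(M_sys))

    def e_efficiency(M_sys, M_crd):
        # E-criterion: the worst-case variance of a normalised linear
        # combination c'beta is governed by the smallest eigenvalue of M.
        return np.linalg.eigvalsh(M_sys)[0] / np.linalg.eigvalsh(M_crd)[0]

    # Hypothetical quadratic response-surface designs in one population factor:
    # nine systematically spaced levels versus a randomised design on 3 levels.
    X_sys = np.array([[1, p, p**2] for p in np.linspace(1, 5, 9)], dtype=float)
    X_crd = np.array([[1, p, p**2] for p in [1, 1, 3, 3, 3, 5, 5, 5, 5]], dtype=float)
    M_sys, M_crd = X_sys.T @ X_sys, X_crd.T @ X_crd
    print(a_efficiency(M_sys, M_crd), e_efficiency(M_sys, M_crd))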
|
14 |
Stochastic volatility effects on defaultable bonds. Mkize, Thembisile. January 2009 (has links)
We study the effects of stochastic volatility on defaultable bonds using the first-passage structural approach. In this approach Black and Cox (1976) argued that default can happen at any time. This led to the development of a first-passage model, in which a firm (company) defaults when its value falls to a barrier. In the simplest structural model, the firm's debt is considered to be a single pure discount bond and default occurs only if the firm value falls below the face value of the bond at maturity. Here the firm's debt can be viewed as a portfolio composed of a risk-free bond and a short put option on the value of the firm. The classic Black-Scholes-Merton model considers only a single liability, and solvency is tested at the maturity date, while the extended Black-Scholes-Merton model allows for default at any time before maturity to cater for more complex capital structures; this extension was developed by Geske, Black and Cox, Leland, Leland and Toft, and others. In this work a review of the effect of stochastic volatility on defaultable bonds is given. In addition, a study from the first-passage structural approach and the reduced-form approach is made. We also introduce symmetry analysis to study some of the equations that appear in option-pricing models. This approach is quite recent and has produced successful results. In this work we lay the foundation of this method. Keywords: Stochastic Volatility, Defaultable Bonds, Lie Symmetries. / Thesis (M.Sc.)-University of KwaZulu-Natal, Westville, 2009.
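A simple way to see stochastic-volatility effects in the first-passage setting is Monte Carlo: simulate the firm value under Heston-type dynamics and record how often the path touches the default barrier before maturity. This is an illustrative sketch with made-up parameter values, not the model or calibration used in the thesis.

    import numpy as np

    rng = np.random.default_rng(2)
    V0, B, T, steps, paths = 100.0, 60.0, 5.0, 1000, 20_000   # illustrative values
    mu, kappa, theta, xi, rho, v0 = 0.05, 2.0, 0.04, 0.3, -0.5, 0.04
    dt = T / steps

    V = np.full(paths, V0)            # firm value paths
    v = np.full(paths, v0)            # stochastic variance paths
    alive = np.ones(paths, dtype=bool)
    for _ in range(steps):
        z1 = rng.standard_normal(paths)
        z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(paths)
        vp = np.maximum(v, 0.0)       # full-truncation Euler for the CIR variance
        V *= np.exp((mu - 0.5 * vp) * dt + np.sqrt(vp * dt) * z1)
        v += kappa * (theta - vp) * dt + xi * np.sqrt(vp * dt) * z2
        alive &= V > B                # first passage: default once V touches B
    print("first-passage default probability ~", 1.0 - alive.mean())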
|
15 |
Evaluation of strategies to combine multiple biomarkers in diagnostic testing. Mohammed, Muna Balla Elshareef. January 2012 (has links)
A challenge in clinical medicine is that of correct diagnosis of disease. Medical researchers invest
considerable time and effort to enhance accurate disease diagnosis. Diagnostic tests are important
components in modern medical practice. The receiver operating characteristic (ROC) is a commonly
used statistical tool for describing the discriminatory accuracy and performance of a diagnostic
test. A popular summary index of discriminatory accuracy is the area under ROC curve (AUC).
In the era of high-dimensional data, scientists are evaluating hundreds to multiple thousands of
biomarkers simultaneously. A critical challenge is the combination of these markers into models
that give insight into disease. In infectious disease, markers are often evaluated in the host as well
as in the microorganism or virus causing infection, adding more complexity to the analysis. In
addition to providing an improved understanding of factors associated with infection and disease
development, combinations of relevant markers are important to diagnose and treat disease. Taken
together, this presents many novel and major challenges to, and extends the role of, the statistical
analyst.
In this thesis, we will address the problem of how to select from multiple markers using existing
methods. Logistic regression models offer a simple method for combining markers. We applied
resampling methods (e.g., cross-validation and bootstrap) to adjust for overfitting associated with
model selection. We simulated several multivariate models to evaluate the performance of the resampling
approaches in this setting. We applied the methods to data collected from a study of
tuberculosis immune reconstitution inflammatory syndrome (TB-IRIS) in Cape Town. Baseline levels
of five biomarkers were evaluated and we used this dataset to evaluate whether a combination
of these biomarkers could accurately discriminate between Tuberculosis Immune Reconstitution
Inflammatory Syndrome (TB-IRIS) and non TB-IRIS patients, applying AUC analysis and resampling
methods. / Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2012.
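A minimal sketch of the evaluation strategy described above, using scikit-learn on synthetic data: five biomarkers are combined with logistic regression, and cross-validation gives an AUC adjusted for the optimism of evaluating the model on the data that fitted it. The data and effect sizes are invented, not the TB-IRIS measurements.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    # Synthetic stand-in: 100 patients, 5 baseline biomarkers, y = 1 for TB-IRIS.
    rng = np.random.default_rng(3)
    n = 100
    y = rng.integers(0, 2, n)
    X = rng.normal(0, 1, (n, 5)) + 0.6 * y[:, None]   # markers shifted in cases

    model = LogisticRegression(max_iter=1000)
    apparent = roc_auc_score(y, model.fit(X, y).predict_proba(X)[:, 1])
    cv = cross_val_score(model, X, y, scoring="roc_auc",
                         cv=StratifiedKFold(5, shuffle=True, random_state=0))
    print(f"apparent AUC {apparent:.3f} vs cross-validated AUC {cv.mean():.3f}")

The gap between the apparent and the cross-validated AUC is exactly the overfitting that the resampling methods are meant to correct for.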
|
16 |
Use of statistical modelling and analyses of malaria rapid diagnostic test outcome in Ethiopia. Ayele, Dawit Getnet. 12 December 2013 (has links)
The transmission of malaria is among the leading public health problems in
Ethiopia. Of the total area of Ethiopia, more than 75% is malarious. Identifying
the infectiousness of malaria by socio-economic, demographic and geographic risk
factors based on the malaria rapid diagnosis test (RDT) survey results has several
advantages for planning, monitoring and controlling, and eventual malaria
eradication efforts. Such a study requires a thorough understanding of the disease
process and associated factors. However, such studies are limited. Therefore, the
aim of this study was to use different statistical tools suitable to identify socio-economic,
demographic and geographic risk factors of malaria based on the
malaria rapid diagnosis test (RDT) survey results in Ethiopia. A total of 224
clusters of about 25 households were selected from the Amhara, Oromiya and
Southern Nation Nationalities and People (SNNP) regions of Ethiopia. Accordingly,
a number of binary response statistical analysis models were used. Multiple
correspondence analysis was carried out to identify the associations among socio-economic,
demographic and geographic factors. Moreover, a number of binary
response models such as survey logistic, GLMM, GLMM with spatial correlation,
joint models and semi-parametric models were applied. To test and investigate how well the observed malaria RDT result, use of mosquito nets and use of indoor residual spray data fit the expectations of the model, the Rasch model was used. The fitted models have their own strengths and weaknesses. Application of
these models was carried out by analysing data on malaria RDT result. The data
used in this study, which was conducted from December 2006 to January 2007 by
The Carter Center, is from a baseline malaria indicator survey in Amhara, Oromiya
and Southern Nation Nationalities and People (SNNP) regions of Ethiopia.
Correspondence analysis and the survey logistic regression model were used to
identify predictors which affect malaria RDT results. The effects of the identified socio-economic,
demographic and geographic factors were subsequently explored by
fitting a generalized linear mixed model (GLMM), i.e., to assess the covariance
structures of the random components (to assess the association structure of the
data). To examine whether the data displayed any spatial autocorrelation, i.e.,
whether surveys that are near to each other in space have malaria prevalence or incidence that
is more similar than surveys that are far apart, spatial statistical analysis was
performed. This was done by introducing spatial autocorrelation structure in
GLMM. Moreover, the customary two-variable joint modelling approach was
extended to three variables by exploring the joint effect of malaria RDT
result, use of mosquito nets and indoor residual spray in the last twelve months.
Assessing the association between these outcomes was also of interest.
Furthermore, the relationships between the response and some confounding
covariates may have unknown functional form. This led to proposing the use of
semiparametric additive models which are less restrictive in their specification.
Therefore, generalized additive mixed models were used to model the effect of age,
family size, number of rooms per person, number of nets per person, altitude and
the number of months since the room was sprayed nonparametrically. The results of the study
suggest that the correct use of mosquito nets, indoor residual spraying and
other preventative measures, coupled with factors such as the number of rooms in
a house, is associated with a decrease in the incidence of malaria as determined
by the RDT. However, the study also suggests that the poor are less likely to use
these preventative measures to effectively counteract the spread of malaria. In
order to determine whether or not the limited number of respondents had undue
influence on the malaria RDT result, a Rasch model was used. The result shows
that none of the responses had such influences. Therefore, application of the
Rasch model has supported the viability of all sixteen (socio-economic,
demographic and geographic) items for measuring malaria RDT result, use of
indoor residual spray and use of mosquito nets. From the analysis it can be seen
that the scale shows high reliability. Hence, the result from the Rasch model supports the analyses carried out with the previous models. / Thesis (Ph.D.)-University of KwaZulu-Natal, Pietermaritzburg, 2013.
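One simple way to fit cluster-correlated binary RDT outcomes of the kind described above is a GEE logistic model with an exchangeable working correlation, sketched below with statsmodels on synthetic data. The covariates, effect sizes and the GEE choice itself are illustrative stand-ins for the survey-logistic and GLMM analyses of the thesis.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Synthetic stand-in: 224 clusters of 25 households with a shared
    # latent cluster effect, a net-use covariate and altitude.
    rng = np.random.default_rng(4)
    n_clusters, m = 224, 25
    cluster = np.repeat(np.arange(n_clusters), m)
    u = np.repeat(rng.normal(0, 0.5, n_clusters), m)   # latent cluster effect
    nets = rng.integers(0, 2, n_clusters * m)          # household uses mosquito nets
    altitude = np.repeat(rng.normal(1800, 300, n_clusters), m)
    logit = -2.0 - 0.8 * nets - 0.001 * (altitude - 1800) + u
    rdt = rng.binomial(1, 1 / (1 + np.exp(-logit)))    # positive RDT indicator
    df = pd.DataFrame(dict(rdt=rdt, nets=nets, altitude=altitude, cluster=cluster))

    fit = smf.gee("rdt ~ nets + altitude", groups="cluster", data=df,
                  family=sm.families.Binomial(),
                  cov_struct=sm.cov_struct.Exchangeable()).fit()
    print(fit.summary())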
|
17 |
Statistical modelling of availability of major food cereals in Lesotho : application of regression models and diagnostics. Khoeli, Makhala Bernice. January 2012 (has links)
Oftentimes, application of regression models to analyse cereals data is limited to estimating and
predicting crop production or yield. The general approach has been to fit the model without much
consideration of the problems that accompany application of regression models to real life data, such
as collinearity, models not fitting the data correctly and violation of assumptions. These problems
may interfere with applicability and usefulness of the models, and compromise validity of results
if they are not corrected when fitting the model. We applied regression models and diagnostics
on national and household data to model availability of main cereals in Lesotho, namely, maize,
sorghum and wheat. The application includes the linear regression model, regression and collinearity
diagnostics, Box-Cox transformation, ridge regression, quantile regression, logistic regression and
its extensions with multiple nominal and ordinal responses.
The linear model with a first-order autoregressive process, AR(1), was used to determine factors
that affected availability of cereals at the national level. Case deletion diagnostics were used to
identify extreme observations with influence on different quantities of the fitted regression model,
such as estimated parameters, predicted values, and covariance matrix of the estimates. Collinearity
diagnostics detected the presence of more than one collinear relationship coexisting in the data
set. They also determined variables involved in each relationship, and assessed potential negative
impact of collinearity on estimated parameters. Ridge regression remedied collinearity problems
by controlling the inflation and instability of estimates. The Box-Cox transformation corrected for non-constant
variance and for the longer and heavier tails of the distribution of the data. These corrections increased the applicability
and usefulness of the linear models in modelling availability of cereals.
Quantile regression, as a robust regression, was applied to the household data as an alternative
to classical regression. Classical regression estimates from the ordinary least squares method are sensitive
to distributions with longer and heavier tails than the normal distribution, as well as to
outliers. Quantile regression estimates appear to be more efficient than least squares estimates for
a wide range of error term distributions. We studied availability of cereals further by categorizing
households according to availability of different cereals, and applied the logistic regression model
and its extensions. Logistic regression was applied to model availability and non-availability of
cereals. Multinomial logistic regression was applied to model availability with nominal multiple
categories. Ordinal logistic regression was applied to model availability with ordinal categories and
this made full use of the available information. The three variants of the logistic regression model gave
results that agree with one another, and also with the results from the linear regression
model and the quantile regression model. / Thesis (Ph.D.)-University of KwaZulu-Natal, Durban, 2012.
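The contrast between least squares and quantile regression that motivates the household analysis can be seen in a few lines of statsmodels code. The variables are hypothetical: a heavy-tailed "availability" response regressed on household size, with t(2) errors standing in for the long-tailed distributions mentioned above.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    n = 500
    hh_size = rng.integers(1, 10, n).astype(float)
    avail = 20 + 5 * hh_size + 15 * rng.standard_t(2, n)   # heavy-tailed errors
    df = pd.DataFrame(dict(avail=avail, hh_size=hh_size))

    ols = smf.ols("avail ~ hh_size", df).fit()
    print(f"OLS slope {ols.params['hh_size']:.2f}")
    for q in (0.25, 0.50, 0.75):
        qr = smf.quantreg("avail ~ hh_size", df).fit(q=q)
        print(f"quantile {q}: slope {qr.params['hh_size']:.2f}")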
|
18 |
Some statistical aspects of LULU smoothers. Jankowitz, Maria Dorothea. 12 1900 (has links)
Thesis (PhD (Statistics and Actuarial Science))--University of Stellenbosch, 2007. / The smoothing of time series plays a very important role in various practical applications. Estimating
the signal and removing the noise is the main goal of smoothing. Traditionally linear smoothers were
used, but nonlinear smoothers became more popular through the years.
From the family of nonlinear smoothers, the class of median smoothers, based on order statistics, is the
most popular. A new class of nonlinear smoothers, called LULU smoothers, was developed by using
the minimum and maximum selectors. These smoothers have very attractive mathematical properties.
In this thesis their statistical properties are investigated and compared to those of the class of median
smoothers.
Smoothing, together with related concepts, is discussed in general. Thereafter, the class of median
smoothers from the literature is discussed. The class of LULU smoothers is defined, their properties
are explained and new contributions are made. The compound LULU smoother is introduced and its
property of variation decomposition is discussed. The probability distributions of some LULU smoothers
with independent data are derived. LULU smoothers and median smoothers are compared according
to the properties of monotonicity, idempotency, co-idempotency, stability, edge preservation, output
distributions and variation decomposition. A comparison is made of their respective abilities for signal
recovery by means of simulations. The success of the smoothers in recovering the signal is measured
by the integrated mean square error and the regression coefficient calculated from the least squares
regression of the smoothed sequence on the signal. Finally, LULU smoothers are practically applied.
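Because the LULU operators are built purely from minimum and maximum selectors, the basic width-one smoothers can be written in a few lines of NumPy. The sketch below gives the textbook construction of L, U and the compound smoother UL applied to a signal with impulsive noise; it illustrates the operators' definition and is not code from the thesis.

    import numpy as np

    def L(x):
        # Lower smoother of width 1: at each interior point, the maximum of
        # the minima over the two overlapping windows {i-1, i} and {i, i+1}.
        # Removes upward spikes of width one.
        lo = np.minimum(x[:-1], x[1:])
        y = x.copy()
        y[1:-1] = np.maximum(lo[:-1], lo[1:])
        return y

    def U(x):
        # Upper smoother of width 1: dually, removes downward spikes.
        hi = np.maximum(x[:-1], x[1:])
        y = x.copy()
        y[1:-1] = np.minimum(hi[:-1], hi[1:])
        return y

    t = np.linspace(0, 2 * np.pi, 200)
    x = np.sin(t)
    x[50], x[120] = 5.0, -5.0            # two impulses of opposite sign
    smoothed = U(L(x))                   # compound smoother: L then U
    print(np.abs(smoothed - np.sin(t)).max())   # both impulses are removed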
|
19 |
Statistical inference for inequality measures based on semi-parametric estimators. Kpanzou, Tchilabalo Abozou. 12 1900 (links)
Thesis (PhD)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Measures of inequality, also used as measures of concentration or diversity, are very popular in economics
and especially in measuring the inequality in income or wealth within a population and between
populations. However, they have applications in many other fields, e.g. in ecology, linguistics, sociology,
demography, epidemiology and information science.
A large number of measures have been proposed to measure inequality. Examples include the Gini
index, the generalized entropy, the Atkinson and the quintile share ratio measures. Inequality measures
are inherently dependent on the tails of the population (underlying distribution) and therefore their
estimators are typically sensitive to data from these tails (nonrobust). For example, income distributions
often exhibit a long tail to the right, leading to the frequent occurrence of large values in samples. Since
the usual estimators are based on the empirical distribution function, they are usually nonrobust to such
large values. Furthermore, heavy-tailed distributions often occur in real life data sets; remedial action
therefore needs to be taken in such cases.
The remedial action can be either a trimming of the extreme data or a modification of the (traditional)
estimator to make it more robust to extreme observations. In this thesis we follow the second option,
modifying the traditional empirical distribution function as estimator to make it more robust. Using results
from extreme value theory, we develop more reliable distribution estimators in a semi-parametric
setting. These new estimators of the distribution then form the basis for more robust estimators of the
measures of inequality. These estimators are developed for the four most popular classes of measures,
viz. Gini, generalized entropy, Atkinson and quintile share ratio. Properties of such estimators
are studied especially via simulation. Using limiting distribution theory and the bootstrap methodology,
approximate confidence intervals were derived. Through the various simulation studies, the proposed
estimators are compared to the standard ones in terms of mean squared error, relative impact of contamination,
confidence interval length and coverage probability. In these studies the semi-parametric
methods show a clear improvement over the standard ones. The theoretical properties of the quintile
share ratio have not been studied much. Consequently, we also derive its influence function as well as
the limiting normal distribution of its nonparametric estimator. These results have not previously been
published.
In order to illustrate the methods developed, we apply them to a number of real life data sets. Using
such data sets, we show how the methods can be used in practice for inference. In order to choose
between the candidate parametric distributions, use is made of a measure of sample representativeness
from the literature. These illustrations show that the proposed methods can be used to reach
satisfactory conclusions in real life problems. / AFRIKAANS SUMMARY: Measures of inequality, also
used as measures of concentration or diversity, are very popular in economics, especially for
quantifying inequality in income or wealth within a population and between populations. However,
they also have applications in many other disciplines, for example ecology, linguistics, sociology,
demography, epidemiology and information science.
Several measures already exist for measuring inequality. Examples include the Gini index, the
generalized entropy measure, the Atkinson measure and the quintile share ratio. Measures of
inequality are inherently dependent on the tails of the population (underlying distribution), and
estimators of them are thus typically sensitive to data from such tails (nonrobust). Income
distributions, for example, often have long right tails, which can lead to the occurrence of large
values in samples. The traditional estimators are based on the empirical distribution function and
are therefore usually nonrobust to such large values. Since heavy-tailed distributions often occur
in real data, corrections must be made in such cases.
These corrections can consist of either trimming the extreme data or adjusting traditional
estimators to make them more robust against extreme values. In this thesis the second option is
followed, in that the traditional empirical distribution function as estimator is adjusted to make it
more robust. Using results from extreme value theory, more reliable estimators of distributions are
developed in a semi-parametric setting. These new estimators of the distribution then form the
basis for more robust estimators of measures of inequality. These estimators are developed for the
four most popular classes of measures, namely Gini, generalized entropy, Atkinson and quintile
share ratio. Properties of these estimators are studied, especially by means of simulation studies.
Approximate confidence intervals are developed using limiting distribution theory and the
bootstrap methodology. The proposed estimators are compared with the traditional estimators in
several simulation studies, in terms of mean squared error, relative impact of contamination,
confidence interval length and coverage probability. In these studies the semi-parametric methods
show a clear improvement over the traditional methods. The theoretical properties of the quintile
share ratio have not yet received much attention in the literature. Consequently, we derive its
influence function as well as the asymptotic distribution of its nonparametric estimator.
In order to illustrate the methods developed, they are applied to a number of real data sets. These
applications show how the methods can be used for inference in practice. A measure of sample
representativeness from the literature is used to choose between the candidate parametric
distributions. These examples show that the proposed methods can be fruitfully used to reach
satisfactory conclusions in practice.
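For reference, here are plug-in (empirical distribution function based) estimators of two of the four measures discussed, the Gini index and the quintile share ratio, in NumPy; the contamination step at the end illustrates the nonrobustness to a single extreme value that motivates the semi-parametric estimators. The Pareto sample and the contamination are invented for illustration.

    import numpy as np

    def gini(x):
        # Sorted-sample formula: G = 2*sum(i * x_(i)) / (n * sum(x)) - (n + 1)/n.
        x = np.sort(np.asarray(x, dtype=float))
        n = x.size
        i = np.arange(1, n + 1)
        return 2.0 * np.sum(i * x) / (n * x.sum()) - (n + 1.0) / n

    def qsr(x):
        # Quintile share ratio: total income of the richest 20% divided by
        # the total income of the poorest 20% (a simple version ignoring ties).
        x = np.asarray(x, dtype=float)
        q20, q80 = np.quantile(x, [0.2, 0.8])
        return x[x >= q80].sum() / x[x <= q20].sum()

    rng = np.random.default_rng(7)
    income = rng.pareto(3.0, 1000) + 1.0       # heavy right tail
    print(gini(income), qsr(income))
    income[0] = 50 * income.max()              # one extreme contaminant
    print(gini(income), qsr(income))           # both estimates move sharply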
|
20 |
Aspects of model development using regression quantiles and elemental regressions. Ranganai, Edmore. 03 1900 (links)
Dissertation (PhD)--University of Stellenbosch, 2007. / ENGLISH ABSTRACT: It is well known that ordinary least squares (OLS) procedures are sensitive to deviations from
the classical Gaussian assumptions (outliers) as well as data aberrations in the design space.
The two major data aberrations in the design space are collinearity and high leverage.
Leverage points can also induce or hide collinearity in the design space. Such leverage points
are referred to as collinearity influential points. As a consequence, over the years, many
diagnostic tools to detect these anomalies as well as alternative procedures to counter them
were developed. To counter deviations from the classical Gaussian assumptions many robust
procedures have been proposed. One such class of procedures is the Koenker and Bassett
(1978) Regression Quantiles (RQs), which are natural extensions of order statistics to the
linear model. RQs can be found as solutions to linear programming problems (LPs). The basic
optimal solutions to these LPs (which are RQs) correspond to elemental subset (ES)
regressions, which consist of subsets of minimum size to estimate the necessary parameters of
the model.
On the one hand, some ESs correspond to RQs. On the other hand, in the literature it is shown
that many OLS statistics (estimators) are related to ES regression statistics (estimators).
Therefore there is an inherent relationship amongst the three sets of procedures. The
relationship between the ES procedure and the RQ one, has been noted almost “casually” in
the literature while the latter has been fairly widely explored. Using these existing
relationships between the ES procedure and the OLS one as well as new ones, collinearity,
leverage and outlier problems in the RQ scenario were investigated. Also, a lasso procedure
was proposed as a variable selection technique in the RQ scenario and some tentative results
were given for it. These results are promising.
Single case diagnostics were considered as well as their relationships to multiple case ones. In
particular, multiple cases of the minimum size to estimate the necessary parameters of the
model were considered, corresponding to an RQ (ES). In this way regression diagnostics were
developed for both ESs and RQs. The main problems that affect RQs adversely are
collinearity and leverage due to the nature of the computational procedures and the fact that
RQs’ influence functions are unbounded in the design space but bounded in the response
variable. As a consequence of this, RQs have a high affinity for leverage points and a high
exclusion rate of outliers. The influential picture exhibited in the presence of both leverage points and outliers is the net result of these two antagonistic forces. Although RQs are
bounded in the response variable (and therefore fairly robust to outliers), outlier diagnostics
were also considered in order to have a more holistic picture.
The investigations comprised analytic means as well as simulation. Furthermore,
applications were made to artificial computer generated data sets as well as standard data sets
from the literature. These revealed that the ES based statistics can be used to address
problems arising in the RQ scenario with some degree of success. However, due to the
interdependence between the different aspects, viz. the one between leverage and collinearity
and the one between leverage and outliers, “solutions” are often dependent on the particular
situation. In spite of this complexity, the research did produce some fairly general guidelines
that can be fruitfully used in practice. / AFRIKAANS SUMMARY: It is well known that ordinary least
squares (OLS) procedures are sensitive to deviations from the classical Gaussian assumptions
(outliers) as well as to data aberrations in the design space. Two aberrations of interest in the
latter case are collinearity and points with high leverage. Such points can also induce or hide
collinearity in the design; they are referred to as collinearity influential points. Over the years
many diagnostic tools have been developed to identify these aberrations, together with alternative
procedures to counter them. To counter deviations from the Gaussian assumption, many robust
procedures have been developed. One such class of procedures is the Koenker and Bassett (1978)
Regression Quantiles (RQs), which are natural extensions of order statistics to the linear model.
RQs can be determined as solutions of linear programming problems (LPs). The basic optimal
solutions of these LPs (which are RQs) correspond to the elemental subset (ES) regressions, which
consist of subsets of minimum size with which the parameters of the model can be estimated.
On the one hand, certain ESs correspond to RQs. On the other hand, it is known from the
literature that many OLS statistics (estimators) are related to ES regression statistics (estimators).
This implies that there is an inherent relationship among the three classes of procedures. The
relationship between the ES and the corresponding RQ procedures has been mentioned rather
“casually” in the literature, while the latter procedures have been explored fairly extensively. Using
existing relationships between the ES and OLS procedures, as well as newly developed ones,
collinearity, high-leverage and outlier problems in the RQ setting were investigated. Furthermore,
a lasso procedure was proposed as a variable selection technique in the RQ situation and some
tentative results were given for it. These results appear promising, also for further research.
Single case diagnostic techniques were considered, as well as their relationship with multiple case
techniques. In particular, multiple cases of minimum size to estimate the parameters of the model
were considered, corresponding to an RQ (ES). With this approach, regression diagnostic
techniques were developed for both ESs and RQs. The main problems that affect RQs adversely
are collinearity and points with high leverage, owing to the nature of the computational procedures
and the fact that the influence functions of RQs are bounded in the space of the response variable
but unbounded in the design space. Consequently, RQs have a high affinity for points with high
leverage and usually attempt to exclude outliers. The final outcome obtained when both high-leverage
points and outliers are present is then the net result of these two antagonistic forces.
Although RQs are bounded in the response variable (and therefore fairly robust to outliers),
outlier diagnostic techniques were also considered in order to obtain a more holistic picture.
The investigation used analytic as well as simulation techniques. Furthermore, use was made of
artificial data sets and standard data sets from the literature. These investigations showed that the
ES based statistics can be used with a reasonable degree of success to address problems arising in
the RQ setting. It is, however, important to note that, owing to the interdependence between
collinearity and points with high leverage, and between high-leverage points and outliers,
“solutions” are often dependent on the particular situation. In spite of this complexity, the research
nevertheless produced fairly general guidelines that can be fruitfully used in practice.
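The statement that RQs are solutions of linear programs can be made concrete in a few lines: the sketch below solves the Koenker and Bassett (1978) primal LP with SciPy for a median regression on synthetic data. The formulation is the standard textbook one, not code from the dissertation.

    import numpy as np
    from scipy.optimize import linprog

    def rq(X, y, tau):
        # Koenker-Bassett regression quantile: minimise
        # tau*1'u + (1-tau)*1'v subject to X beta + u - v = y, u, v >= 0.
        n, p = X.shape
        c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
        A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
        bounds = [(None, None)] * p + [(0, None)] * (2 * n)
        res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
        return res.x[:p]

    rng = np.random.default_rng(8)
    x = rng.uniform(0, 10, 100)
    y = 1.0 + 2.0 * x + rng.standard_t(3, 100)
    X = np.column_stack([np.ones_like(x), x])
    print(rq(X, y, 0.5))   # close to [1, 2] for tau = 0.5 (median regression)

A basic optimal solution of this LP interpolates exactly p observations, which is precisely the elemental subset (ES) connection discussed above.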
|