Global ETD Search

11	Modelling CD4+ count over time in HIV positive patients initiated on HAART in South Africa using linear mixed models. Yende Zuma, Nonhlanhla. January 2009 (has links) HIV is among the highly infectious and pathogenic diseases with a high mortality rate. The spread of HIV is influenced by several individual-based epidemiological factors such as age, gender, mobility, sexual partner profile and the presence of sexually transmitted infections (STI). CD4+ count over time provided the first surrogate marker of HIV disease progression and is currently used for clinical management of HIV-positive patients. The CD4+ count, as a key disease marker, is repeatedly measured among those individuals who test HIV positive to monitor the progression of the disease, since it is known that HIV/AIDS is a long wave event. This gives rise to what is commonly known as longitudinal data. The aim of this project is to determine whether the patients' weight, baseline age, sex, viral load and clinic site influence the rate of change in CD4+ count over time. We use data from patients who commenced highly active antiretroviral therapy (HAART) from the Center for the AIDS Programme of Research in South Africa (CAPRISA) in the AIDS Treatment Project (CAT) between June 2004 and September 2006, including two years of follow-up for each patient. Analysis was done using linear mixed model methods for longitudinal data. The results showed that a larger increase in CD4+ count over time was observed in females and in younger individuals. However, when baseline log viral load was fitted in the model instead of the log viral load at all visits, a larger increase in CD4+ count was observed in females and in individuals who were younger, had a higher baseline log viral load and had a lower weight. / Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2009. AIDS (Disease) HIV infections.
12	Modeling environmental factors affecting the growth of eucalypt clones. January 2009 (has links) Tree growth is influenced by environment and genetic factors. The same tree growing in different areas will have different growth patterns. Trees with different genetic material, e.g. pine and Eucalyptus trees, growing under the same environmental conditions have different growth patterns. Plantation trees in South Africa are mainly used for pulp and paper production. Growth is an important economic factor in the pulp and paper industry. Plantations with fast growth will be available for processing earlier compared to a slow growth plantation. Consequently, it is important to understand the role played by environmental factors, especially climatic factors, on tree growth. This thesis investigated the climatic effects on the radial growth of two Eucalyptus clones using growth data collected daily over five years by Sappi. The general linear model and the time series models were used to assess the effects of climate on radial growth of the two clones. It was found that the two clones have similar overall growth patterns over time, but differ in growth rates. The growth pattern of the two clones appears to be characterized by substantial jumps/changes in growth rates over time. The times at which the jumps/changes in growth rate occur are referred to as the “breakpoints”. The piecewise linear regression model was used to estimate when the breakpoints occur. After estimating the breakpoints, the climatic effects associated with these breakpoints were investigated. The linear and time series modeling results indicated that the contribution of climatic factors on radial growth of Eucalyptus clones was small. Most of the variation in radial growth was explained by the age of the trees. Consequently, this thesis also investigated the appropriate functional relationship between radial growth and age. 
In particular, nonlinear growth models were used to model the radial growth process. The growth curve models investigated were those whose parameters include the maximum radius and the age at which the radial growth rate is largest. The maximum growth rate was calculated from the estimated model of each clone. The results indicated that the two clones reach the maximum growth rate at different times, at around 368 and 376 days, respectively. Furthermore, the maximum radius was found to be different for the two clones. / Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2009. Eucalyptus--Properties. Forest site qualities. Clones. Theses--Statistics.
13	Spatial analysis and efficiency of systematic designs in intercropping experiments. Wandiembe, Symon Peter. January 2002 (has links) In studies involving intercropping plant populations, the main interest is to locate the position of the maximum response or to study the response pattern. Such studies normally require many plant population levels. Thus, designs such as spacing systematic designs that minimise experimental land area are desired. Randomised block designs may not perform well, as they allow few population levels, which may not span the maximum or enable exploration of other features of the response surface. However, lack of complete randomisation in systematic designs may imply spatial variability (large-scale and small-scale variations, i.e. trend and spatial dependence) in observations. There is no established statistical method for analysing data from such designs. Given that spacing systematic designs are not well explored in the literature, the main thrusts of this study are twofold: to explore the use of spatial modelling techniques in analysing and modelling data from systematic designs, and to evaluate the efficiency of systematic designs used in intercropping experiments. Three classes of models for trend and error modelling are explored/introduced. These include spatial linear mixed models, semi-parametric mixed models and beta-hat models incorporating spatial variability. The reliability and precision of these methods are demonstrated. The relative efficiency of systematic designs compared with a completely randomised design is evaluated. The analysis of data from systematic designs is shown to be easily implemented. Measures of efficiency that include Φp-directed measures (A and E criteria), D1 and DB efficiencies for regression parameters, and power are used. Systematic designs are shown to be efficient: on average 72% for A- and E-efficiencies and 93% for D1 and DB efficiencies. 
Overall, these results suggest that systematic designs are suitable and reliable for intercropping plant population studies. / Thesis (M.Sc.) - University of Natal, Pietermaritzburg, 2002 Spatial Analysis (Statistics) Intercropping--Experiments. Plant Communities. Theses--Statistics.
14	Stochastic volatility effects on defaultable bonds. Mkize, Thembisile. January 2009 (has links) We study the effects of stochastic volatility on defaultable bonds using the first-passage structural approach. In this approach Black and Cox (1976) argued that default can happen at any time. This then led to the development of a first-passage model, in which a firm (company) default occurs when its value falls to a barrier. In the first-passage model the firm debt is considered to be a single pure discount bond and default occurs only if the firm value falls below the face value of the bond at maturity. Here the firm's debt can be viewed as a portfolio composed of a risk-free bond and a short put option on the value of a firm. The classic Black-Scholes-Merton model only considers a single liability and the solvency is tested at the maturity date, while the extended Black-Scholes-Merton model allows for default at any time before maturity to cater for more complex capital structures and was delivered by Geske, Black-Cox, Leland, Leland and Toft and others. In this work a review of the effect of stochastic volatility on defaultable bonds is given. In addition a study from the first-passage structural approach and reduced-form approach is made. We also introduce symmetry analysis to study some of the equations that appear in option-pricing models. This approach is quite recent and has produced successful results. In this work we lay the foundation of this method. Keywords: Stochastic Volatility, Defaultable bonds, Lie Symmetries. / Thesis (M.Sc.)-University of KwaZulu-Natal, Westville, 2009. Bond issues. Stochastic processes.
15	Evaluation of strategies to combine multiple biomarkers in diagnostic testing. Mohammed, Muna Balla Elshareef. January 2012 (has links) A challenge in clinical medicine is that of correct diagnosis of disease. Medical researchers invest considerable time and effort to enhance accurate disease diagnosis. Diagnostic tests are important components in modern medical practice. The receiver operating characteristic (ROC) curve is a commonly used statistical tool for describing the discriminatory accuracy and performance of a diagnostic test. A popular summary index of discriminatory accuracy is the area under the ROC curve (AUC). In the era of high-dimensional data, scientists are evaluating hundreds to multiple thousands of biomarkers simultaneously. A critical challenge is the combination of these markers into models that give insight into disease. In infectious disease, markers are often evaluated in the host as well as in the microorganism or virus causing infection, adding more complexity to the analysis. In addition to providing an improved understanding of factors associated with infection and disease development, combinations of relevant markers are important for diagnosing and treating disease. Taken together, this presents many novel and major challenges to, and extends the role of, the statistical analyst. In this thesis, we address the problem of how to select from multiple markers using existing methods. Logistic regression models offer a simple method for combining markers. We applied resampling methods (e.g., cross-validation and the bootstrap) to adjust for overfitting associated with model selection. We simulated several multivariate models to evaluate the performance of the resampling approaches in this setting. We applied the methods to data collected from a study of tuberculosis immune reconstitution inflammatory syndrome (TB-IRIS) in Cape Town. 
Baseline levels of five biomarkers were evaluated and we used this dataset to evaluate whether a combination of these biomarkers could accurately discriminate between Tuberculosis Immune Reconstitution Inflammatory Syndrome (TB-IRIS) and non TB-IRIS patients, applying AUC analysis and resampling methods. / Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2012. Biochemical markers. Drug development.
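As a minimal illustration of the AUC summary index mentioned in the abstract above (this is not the thesis's own code; the function name and the toy marker values are hypothetical), the empirical AUC can be computed as the proportion of case/control pairs in which the case has the higher marker value, with ties counted as one half:

```python
def empirical_auc(cases, controls):
    """Empirical AUC: share of (case, control) pairs where the case scores
    higher than the control, counting ties as one half."""
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                wins += 1.0
            elif case_value == control_value:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical marker values, for illustration only.
toy_cases = [2.1, 3.4, 1.9, 4.0]
toy_controls = [1.0, 2.0, 1.5, 2.5]
toy_auc = empirical_auc(toy_cases, toy_controls)
```

An AUC estimated on the same data used to select and fit a marker combination tends to be optimistic, which is why the abstract pairs AUC analysis with cross-validation and bootstrap adjustments.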
16	Use of statistical modelling and analyses of malaria rapid diagnostic test outcome in Ethiopia. Ayele, Dawit Getnet. 12 December 2013 (has links) The transmission of malaria is among the leading public health problems in Ethiopia. More than 75% of the total area of Ethiopia is malarious. Identifying the infectiousness of malaria by socio-economic, demographic and geographic risk factors based on the malaria rapid diagnosis test (RDT) survey results has several advantages for planning, monitoring and controlling malaria, and for eventual eradication efforts. Such a study requires a thorough understanding of the disease process and associated factors. However, such studies are limited. Therefore, the aim of this study was to use different statistical tools suitable for identifying socio-economic, demographic and geographic risk factors of malaria based on the malaria rapid diagnosis test (RDT) survey results in Ethiopia. A total of 224 clusters of about 25 households were selected from the Amhara, Oromiya and Southern Nation Nationalities and People (SNNP) regions of Ethiopia. Accordingly, a number of binary response statistical analysis models were used. Multiple correspondence analysis was carried out to identify the association among socio-economic, demographic and geographic factors. Moreover, a number of binary response models such as survey logistic, GLMM, GLMM with spatial correlation, joint models and semi-parametric models were applied. To test and investigate how well the observed malaria RDT result, use of mosquito nets and use of indoor residual spray data fit the expectations of the model, a Rasch model was used. The fitted models have their own strengths and weaknesses. Application of these models was carried out by analysing data on malaria RDT result. 
The data used in this study come from the baseline malaria indicator survey conducted from December 2006 to January 2007 by The Carter Center in the Amhara, Oromiya and Southern Nation Nationalities and People (SNNP) regions of Ethiopia. Correspondence analysis and the survey logistic regression model were used to identify predictors which affect malaria RDT results. The effects of the identified socio-economic, demographic and geographic factors were subsequently explored by fitting a generalized linear mixed model (GLMM), i.e., to assess the covariance structures of the random components (to assess the association structure of the data). To examine whether the data displayed any spatial autocorrelation, i.e., whether surveys that are near in space have more similar malaria prevalence or incidence than surveys that are far apart, spatial statistics analysis was performed. This was done by introducing a spatial autocorrelation structure in the GLMM. Moreover, the customary two-variable joint modelling approach was extended to three variables by exploring the joint effect of malaria RDT result, use of mosquito nets and indoor residual spray in the last twelve months. Assessing the association between these outcomes was also of interest. Furthermore, the relationships between the response and some confounding covariates may have unknown functional form. This led to proposing the use of semiparametric additive models, which are less restrictive in their specification. Therefore, generalized additive mixed models were used to model nonparametrically the effects of age, family size, number of rooms per person, number of nets per person, altitude and number of months the room was sprayed. The results of the study suggest that the correct use of mosquito nets, indoor residual spraying and other preventative measures, coupled with factors such as the number of rooms in a house, are associated with a decrease in the incidence of malaria as determined by the RDT. 
However, the study also suggests that the poor are less likely to use these preventative measures to effectively counteract the spread of malaria. In order to determine whether or not the limited number of respondents had undue influence on the malaria RDT result, a Rasch model was used. The result shows that none of the responses had such influence. Therefore, application of the Rasch model has supported the viability of the total of sixteen (socio-economic, demographic and geographic) items for measuring malaria RDT result, use of indoor residual spray and use of mosquito nets. From the analysis it can be seen that the scale shows high reliability. Hence, the result from the Rasch model supports the analysis carried out with the previous models. / Thesis (Ph.D.)-University of KwaZulu-Natal, Pietermaritzburg, 2013. Mathematical statistics. Probabilities. Linear models (Statistics)
17	Statistical modelling of availability of major food cereals in Lesotho : application of regression models and diagnostics. Khoeli, Makhala Bernice. January 2012 (has links) Oftentimes, application of regression models to analyse cereals data is limited to estimating and predicting crop production or yield. The general approach has been to fit the model without much consideration of the problems that accompany application of regression models to real life data, such as collinearity, models not fitting the data correctly and violation of assumptions. These problems may interfere with applicability and usefulness of the models, and compromise validity of results if they are not corrected when fitting the model. We applied regression models and diagnostics on national and household data to model availability of main cereals in Lesotho, namely, maize, sorghum and wheat. The application includes the linear regression model, regression and collinear diagnostics, Box-Cox transformation, ridge regression, quantile regression, logistic regression and its extensions with multiple nominal and ordinal responses. The Linear model with first-order autoregressive process AR(1) was used to determine factors that affected availability of cereals at the national level. Case deletion diagnostics were used to identify extreme observations with influence on different quantities of the fitted regression model, such as estimated parameters, predicted values, and covariance matrix of the estimates. Collinearity diagnostics detected the presence of more than one collinear relationship coexisting in the data set. They also determined variables involved in each relationship, and assessed potential negative impact of collinearity on estimated parameters. Ridge regression remedied collinearity problems by controlling inflation and instability of estimates. The Box-Cox transformation corrected non-constant variance, longer and heavier tails of the distribution of data. 
These increased applicability and usefulness of the linear models in modeling availability of cereals. Quantile regression, as a robust regression, was applied to the household data as an alternative to classical regression. Classical regression estimates from ordinary least squares method are sensitive to distributions with longer and heavier tails than the normal distribution, as well as to outliers. Quantile regression estimates appear to be more efficient than least squares estimates for a wide range of error term distribution. We studied availability of cereals further by categorizing households according to availability of different cereals, and applied the logistic regression model and its extensions. Logistic regression was applied to model availability and non-availability of cereals. Multinomial logistic regression was applied to model availability with nominal multiple categories. Ordinal logistic regression was applied to model availability with ordinal categories and this made full use of available information. The three variants of logistic regression model gave results that are in agreement, which are also in agreement with the results from the linear regression model and quantile regression model. / Thesis (Ph.D.)-University of KwaZulu-Natal, Durban, 2012. Mathematical statistics. Probabilities. Linear models (Statistics)
18	Some statistical aspects of LULU smoothers Jankowitz, Maria Dorothea 12 1900 (has links) Thesis (PhD (Statistics and Actuarial Science))--University of Stellenbosch, 2007. / The smoothing of time series plays a very important role in various practical applications. Estimating the signal and removing the noise is the main goal of smoothing. Traditionally linear smoothers were used, but nonlinear smoothers became more popular through the years. From the family of nonlinear smoothers, the class of median smoothers, based on order statistics, is the most popular. A new class of nonlinear smoothers, called LULU smoothers, was developed by using the minimum and maximum selectors. These smoothers have very attractive mathematical properties. In this thesis their statistical properties are investigated and compared to those of the class of median smoothers. Smoothing, together with related concepts, is discussed in general. Thereafter, the class of median smoothers from the literature is discussed. The class of LULU smoothers is defined, their properties are explained and new contributions are made. The compound LULU smoother is introduced and its property of variation decomposition is discussed. The probability distributions of some LULU smoothers with independent data are derived. LULU smoothers and median smoothers are compared according to the properties of monotonicity, idempotency, co-idempotency, stability, edge preservation, output distributions and variation decomposition. A comparison is made of their respective abilities for signal recovery by means of simulations. The success of the smoothers in recovering the signal is measured by the integrated mean square error and the regression coefficient calculated from the least squares regression of the smoothed sequence on the signal. Finally, LULU smoothers are practically applied. Smoothing (Statistics) Estimation theory
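To make the "minimum and maximum selectors" in the abstract above concrete, here is a minimal sketch of the two basic LULU operators as they are usually defined: L_n takes the maximum of the minima over the n+1 windows of length n+1 that contain a point, and U_n is the dual (minimum of the maxima). The function names and the edge handling (windows that fall off either end of the sequence are simply skipped) are assumptions made for this illustration and are not taken from the thesis.

```python
def _lulu(x, n, inner, outer):
    """Apply outer-of-inner over all full windows of length n+1 containing i."""
    size = len(x)
    out = []
    for i in range(size):
        window_values = []
        for start in range(i - n, i + 1):
            end = start + n
            if start < 0 or end >= size:
                continue  # assumption: skip windows that leave the sequence
            window_values.append(inner(x[start:end + 1]))
        # If no full window fits, leave the value unchanged.
        out.append(outer(window_values) if window_values else x[i])
    return out


def lulu_L(x, n):
    """L_n: max of window minima; removes upward pulses of width <= n."""
    return _lulu(x, n, min, max)


def lulu_U(x, n):
    """U_n: min of window maxima; removes downward pulses of width <= n."""
    return _lulu(x, n, max, min)


# Toy sequence with one upward and one downward spike (illustrative only).
signal = [0, 0, 5, 0, 0, -4, 0, 0]
after_L = lulu_L(signal, 1)      # upward spike removed
after_UL = lulu_U(after_L, 1)    # downward spike removed as well
```

Applying `lulu_L` a second time to `after_L` leaves it unchanged, which illustrates the idempotency property listed in the abstract.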
19	Statistical inference for inequality measures based on semi-parametric estimators Kpanzou, Tchilabalo Abozou 12 1900 (has links) Thesis (PhD)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Measures of inequality, also used as measures of concentration or diversity, are very popular in economics and especially in measuring the inequality in income or wealth within a population and between populations. However, they have applications in many other fields, e.g. in ecology, linguistics, sociology, demography, epidemiology and information science. A large number of measures have been proposed to measure inequality. Examples include the Gini index, the generalized entropy, the Atkinson and the quintile share ratio measures. Inequality measures are inherently dependent on the tails of the population (underlying distribution) and therefore their estimators are typically sensitive to data from these tails (nonrobust). For example, income distributions often exhibit a long tail to the right, leading to the frequent occurrence of large values in samples. Since the usual estimators are based on the empirical distribution function, they are usually nonrobust to such large values. Furthermore, heavy-tailed distributions often occur in real life data sets, remedial action therefore needs to be taken in such cases. The remedial action can be either a trimming of the extreme data or a modification of the (traditional) estimator to make it more robust to extreme observations. In this thesis we follow the second option, modifying the traditional empirical distribution function as estimator to make it more robust. Using results from extreme value theory, we develop more reliable distribution estimators in a semi-parametric setting. These new estimators of the distribution then form the basis for more robust estimators of the measures of inequality. These estimators are developed for the four most popular classes of measures, viz. 
Gini, generalized entropy, Atkinson and quintile share ratio. Properties of such estimators are studied especially via simulation. Using limiting distribution theory and the bootstrap methodology, approximate confidence intervals were derived. Through the various simulation studies, the proposed estimators are compared to the standard ones in terms of mean squared error, relative impact of contamination, confidence interval length and coverage probability. In these studies the semi-parametric methods show a clear improvement over the standard ones. The theoretical properties of the quintile share ratio have not been studied much. Consequently, we also derive its influence function as well as the limiting normal distribution of its nonparametric estimator. These results have not previously been published. In order to illustrate the methods developed, we apply them to a number of real life data sets. Using such data sets, we show how the methods can be used in practice for inference. In order to choose between the candidate parametric distributions, use is made of a measure of sample representativeness from the literature. These illustrations show that the proposed methods can be used to reach satisfactory conclusions in real life problems. / AFRIKAANSE OPSOMMING (English translation): Measures of inequality, which are also used as measures of concentration or diversity, are very popular in economics, especially for quantifying inequality in income or wealth within a population and between populations. They also have applications in many other disciplines, for example ecology, linguistics, sociology, demography, epidemiology and information science. Several measures of inequality already exist. Examples include the Gini index, the generalized entropy measure, the Atkinson measure and the quintile share ratio. 
Measures of inequality depend inherently on the tails of the population (underlying distribution), and their estimators are therefore typically sensitive to data from those tails (nonrobust). Income distributions, for example, often have long right tails, which can lead to large values occurring in samples. The traditional estimators are based on the empirical distribution function and are therefore usually nonrobust to such large values. Since heavy-tailed distributions often occur in real data, corrections must be made in such cases. These corrections can consist of either trimming the extreme data or adapting the traditional estimators to make them more robust to extreme values. In this thesis the second option is followed: the traditional empirical distribution function is adapted as an estimator to make it more robust. Using results from extreme value theory, more reliable estimators of distributions are developed in a semi-parametric setting. These new estimators of the distribution then form the basis for more robust estimators of measures of inequality. These estimators are developed for the four most popular classes of measures, namely Gini, generalized entropy, Atkinson and quintile share ratio. Properties of these estimators are studied, especially by means of simulation studies. Approximate confidence intervals are developed using limiting distribution theory and the bootstrap methodology. The proposed estimators are compared with traditional estimators by means of various simulation studies. The comparison is made in terms of mean squared error, relative impact of contamination, confidence interval length and coverage probability. In these studies the semi-parametric methods show a clear improvement over the traditional methods. 
The theoretical properties of the quintile share ratio have not yet received much attention in the literature. Consequently, we derive its influence function as well as the asymptotic distribution of its nonparametric estimator. In order to illustrate the methods developed, they are applied to a number of real data sets. These applications show how the methods can be used for inference in practice. A method from the literature for sample representativeness is presented and used to choose between the candidate parametric distributions. These examples show that the proposed methods can be used fruitfully to reach satisfactory conclusions in practice. Extreme value theory Semi-parametric estimation Confidence intervals
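For orientation, the standard (purely empirical) versions of two of the measures named in the abstract above can be sketched as follows. This is not the semi-parametric estimator developed in the thesis: the function names are hypothetical, values are assumed non-negative, and the quintile share ratio uses one simple convention (the floor(n/5) smallest and largest observations) among several in use.

```python
def gini(values):
    """Empirical Gini index from the sorted-sample formula
    G = sum_i (2i - n - 1) * x_(i) / (n * sum_i x_(i)),  i = 1..n."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty sample with positive total")
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def quintile_share_ratio(values):
    """Total of the top fifth of the sample divided by the total of the
    bottom fifth (simple convention: floor(n/5) observations per fifth)."""
    xs = sorted(values)
    k = len(xs) // 5
    if k == 0:
        raise ValueError("need at least five observations")
    bottom = sum(xs[:k])
    top = sum(xs[-k:])
    if bottom <= 0:
        raise ValueError("bottom quintile total must be positive")
    return top / bottom


# Toy incomes (illustrative only): one extreme value changes the picture.
clean = [1, 2, 3, 4]
contaminated = [1, 2, 3, 400]
gini_clean = gini(clean)
gini_contaminated = gini(contaminated)
```

The jump from `gini_clean` to `gini_contaminated` caused by a single large observation is a small-scale picture of the tail sensitivity that motivates the more robust estimators described in the abstract.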
20	Aspects of model development using regression quantiles and elemental regressions Ranganai, Edmore 03 1900 (has links) Dissertation (PhD)--University of Stellenbosch, 2007. / ENGLISH ABSTRACT: It is well known that ordinary least squares (OLS) procedures are sensitive to deviations from the classical Gaussian assumptions (outliers) as well as data aberrations in the design space. The two major data aberrations in the design space are collinearity and high leverage. Leverage points can also induce or hide collinearity in the design space. Such leverage points are referred to as collinearity influential points. As a consequence, over the years, many diagnostic tools to detect these anomalies as well as alternative procedures to counter them were developed. To counter deviations from the classical Gaussian assumptions many robust procedures have been proposed. One such class of procedures is the Koenker and Bassett (1978) Regressions Quantiles (RQs), which are natural extensions of order statistics, to the linear model. RQs can be found as solutions to linear programming problems (LPs). The basic optimal solutions to these LPs (which are RQs) correspond to elemental subset (ES) regressions, which consist of subsets of minimum size to estimate the necessary parameters of the model. On the one hand, some ESs correspond to RQs. On the other hand, in the literature it is shown that many OLS statistics (estimators) are related to ES regression statistics (estimators). Therefore there is an inherent relationship amongst the three sets of procedures. The relationship between the ES procedure and the RQ one, has been noted almost “casually” in the literature while the latter has been fairly widely explored. Using these existing relationships between the ES procedure and the OLS one as well as new ones, collinearity, leverage and outlier problems in the RQ scenario were investigated. 
Also, a lasso procedure was proposed as variable selection technique in the RQ scenario and some tentative results were given for it. These results are promising. Single case diagnostics were considered as well as their relationships to multiple case ones. In particular, multiple cases of the minimum size to estimate the necessary parameters of the model, were considered, corresponding to a RQ (ES). In this way regression diagnostics were developed for both ESs and RQs. The main problems that affect RQs adversely are collinearity and leverage due to the nature of the computational procedures and the fact that RQs’ influence functions are unbounded in the design space but bounded in the response variable. As a consequence of this, RQs have a high affinity for leverage points and a high exclusion rate of outliers. The influential picture exhibited in the presence of both leverage points and outliers is the net result of these two antagonistic forces. Although RQs are bounded in the response variable (and therefore fairly robust to outliers), outlier diagnostics were also considered in order to have a more holistic picture. The investigations used comprised analytic means as well as simulation. Furthermore, applications were made to artificial computer generated data sets as well as standard data sets from the literature. These revealed that the ES based statistics can be used to address problems arising in the RQ scenario to some degree of success. However, due to the interdependence between the different aspects, viz. the one between leverage and collinearity and the one between leverage and outliers, “solutions” are often dependent on the particular situation. In spite of this complexity, the research did produce some fairly general guidelines that can be fruitfully used in practice. 
/ AFRIKAANSE OPSOMMING (English translation): It is well known that ordinary least squares (OLS) procedures are sensitive to deviations from the classical Gaussian assumptions (outliers) as well as to data aberrations in the design space. Two types of aberration of interest in the latter case are collinearity and points with high leverage. The latter points can also induce or hide collinearity in the design. Such points are referred to as collinearity influential points. Over the years, many diagnostic tools have been developed to identify these aberrations, along with alternative procedures to counter them. To counter deviations from the Gaussian assumption, many robust procedures have been developed. One such class of procedures is the Koenker and Bassett (1978) Regression Quantiles (RQs), which are natural extensions of order statistics to the linear model. RQs can be determined as solutions of linear programming problems (LPs). The basic optimal solutions of these LPs (which are RQs) correspond to elemental subset (ES) regressions, which consist of subsets of minimum size with which the parameters of the model can be estimated. On the one hand, certain ESs correspond to RQs. On the other hand, it is known from the literature that many OLS statistics (estimators) are related to ES regression statistics (estimators). This implies that there is an inherent relationship among the three classes of procedures. The relationship between the ES and the corresponding RQ procedures has been mentioned rather “casually” in the literature, while the latter procedures have been investigated fairly extensively. Using existing relationships between ES and OLS procedures, as well as new ones that were developed, collinearity, high-leverage points and outlier problems in the RQ setting were investigated. Furthermore, a lasso procedure was proposed as a variable selection technique in the RQ setting, and some tentative results were given for it. 
These results appear promising, especially for further research. Single-case diagnostic techniques were considered, as well as their relationship to multiple-case techniques. In particular, multiple cases of the minimum size needed to estimate the parameters of the model, corresponding to an RQ (ES), were considered. With this approach, regression diagnostic techniques were developed for both ESs and RQs. The most important problems that affect RQs adversely are collinearity and high-leverage points, owing to the nature of the computational procedures and the fact that RQs' influence functions are bounded in the space of the dependent variable but unbounded in the design space. Consequently, RQs have a high affinity for high-leverage points and usually tend to exclude outliers. The final output obtained when both high-leverage points and outliers occur is then the net result of these two opposing tendencies. Although RQs are bounded in the response variable (and are therefore fairly robust to outliers), outlier diagnostic techniques were also considered in order to obtain a more holistic picture. The investigation used analytical as well as simulation techniques. Use was also made of artificial data sets and standard data sets from the literature. These investigations showed that the ES-based statistics can be used with a reasonable degree of success to address problems in the RQ setting. It is important to note, however, that because of the interdependence between collinearity and high-leverage points, as well as that between high-leverage points and outliers, “solutions” often depend on the particular situation. In spite of this complexity, the research did yield fairly general guidelines that can be used fruitfully in practice. Regression analysis Least squares Estimation theory
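The link between regression quantiles and elemental subsets described in the abstract above can be sketched for simple linear regression: because a regression quantile is a basic optimal solution of a linear programme, some line passing exactly through two of the observations minimises the check (pinball) loss, so a brute-force search over all pairs recovers it. This is a toy illustration only, not the thesis's procedure; the function names and data are hypothetical, and the enumeration is far too slow for anything beyond small samples.

```python
from itertools import combinations


def check_loss(residuals, tau):
    """Koenker-Bassett check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return sum(r * (tau - (1.0 if r < 0 else 0.0)) for r in residuals)


def rq_by_elemental_subsets(x, y, tau=0.5):
    """Fit y = a + b*x at quantile tau by trying every elemental subset
    (every pair of points with distinct x) and keeping the line with the
    smallest check loss. Returns (intercept, slope, subset_indices)."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical pair does not determine a line y = a + b*x
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        residuals = [yk - (intercept + slope * xk) for xk, yk in zip(x, y)]
        loss = check_loss(residuals, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope, (i, j))
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2], best[3]


# Toy data with one outlier in the response (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 4.0, 50.0]
intercept, slope, subset = rq_by_elemental_subsets(xs, ys, tau=0.5)
```

On this toy data the median regression line ignores the outlying response at x = 5, which reflects the boundedness in the response variable noted in the abstract; the same fit would not be protected against a point that is extreme in x.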

Search results