Global ETD Search

21	Improved estimation procedures for a positive extreme value index Berning, Thomas Louw 12 1900 (has links) Thesis (PhD (Statistics))--University of Stellenbosch, 2010. / ENGLISH ABSTRACT: In extreme value theory (EVT) the emphasis is on extreme (very small or very large) observations. The crucial parameter when making inferences about extreme quantiles, is called the extreme value index (EVI). This thesis concentrates on only the right tail of the underlying distribution (extremely large observations), and specifically situations where the EVI is assumed to be positive. A positive EVI indicates that the underlying distribution of the data has a heavy right tail, as is the case with, for example, insurance claims data. There are numerous areas of application of EVT, since there are a vast number of situations in which one would be interested in predicting extreme events accurately. Accurate prediction requires accurate estimation of the EVI, which has received ample attention in the literature from a theoretical as well as practical point of view. Countless estimators of the EVI exist in the literature, but the practitioner has little information on how these estimators compare. An extensive simulation study was designed and conducted to compare the performance of a wide range of estimators, over a wide range of sample sizes and distributions. A new procedure for the estimation of a positive EVI was developed, based on fitting the perturbed Pareto distribution (PPD) to observations above a threshold, using Bayesian methodology. Attention was also given to the development of a threshold selection technique. One of the major contributions of this thesis is a measure which quantifies the stability (or rather instability) of estimates across a range of thresholds. This measure can be used to objectively obtain the range of thresholds over which the estimates are most stable. It is this measure which is used for the purpose of threshold selection for the proposed PPD estimator. A case study of five insurance claims data sets illustrates how data sets can be analyzed in practice. It is shown to what extent discretion can/should be applied, as well as how different estimators can be used in a complementary fashion to give more insight into the nature of the data and the extreme tail of the underlying distribution. The analysis is carried out from the point of raw data, to the construction of tables which can be used directly to gauge the risk of the insurance portfolio over a given time frame. / AFRIKAANSE OPSOMMING: Die veld van ekstreemwaardeteorie (EVT) is bemoeid met ekstreme (baie klein of baie groot) waarnemings. Die parameter wat deurslaggewend is wanneer inferensies aangaande ekstreme kwantiele ter sprake is, is die sogenaamde ekstreemwaarde-indeks (EVI). Hierdie verhandeling konsentreer op slegs die regterstert van die onderliggende verdeling (baie groot waarnemings), en meer spesifiek, op situasies waar aanvaar word dat die EVI positief is. ’n Positiewe EVI dui aan dat die onderliggende verdeling ’n swaar regterstert het, wat byvoorbeeld die geval is by versekeringseis data. Daar is verskeie velde waar EVT toegepas word, aangesien daar ’n groot aantal situasies is waarin mens sou belangstel om ekstreme gebeurtenisse akkuraat te voorspel. Akkurate voorspelling vereis die akkurate beraming van die EVI, wat reeds ruim aandag in die literatuur geniet het, uit beide teoretiese en praktiese oogpunte. ’n Groot aantal beramers van die EVI bestaan in die literatuur, maar enige persoon wat die toepassing van EVT in die praktyk beoog, het min inligting oor hoe hierdie beramers met mekaar vergelyk. ’n Uitgebreide simulasiestudie is ontwerp en uitgevoer om die akkuraatheid van beraming van ’n groot verskeidenheid van beramers in die literatuur te vergelyk. Die studie sluit ’n groot verskeidenheid van steekproefgroottes en onderliggende verdelings in. ’n Nuwe prosedure vir die beraming van ’n positiewe EVI is ontwikkel, gebaseer op die passing van die gesteurde Pareto verdeling (PPD) aan waarnemings wat ’n gegewe drempel oorskrei, deur van Bayes tegnieke gebruik te maak. Aandag is ook geskenk aan die ontwikkeling van ’n drempelseleksiemetode. Een van die hoofbydraes van hierdie verhandeling is ’n maatstaf wat die stabiliteit (of eerder onstabiliteit) van beramings oor verskeie drempels kwantifiseer. Hierdie maatstaf bied ’n objektiewe manier om ’n gebied (versameling van drempelwaardes) te verkry waaroor die beramings die stabielste is. Dit is hierdie maatstaf wat gebruik word om drempelseleksie te doen in die geval van die PPD beramer. ’n Gevallestudie van vyf stelle data van versekeringseise demonstreer hoe data in die praktyk geanaliseer kan word. Daar word getoon tot watter mate diskresie toegepas kan/moet word, asook hoe verskillende beramers op ’n komplementêre wyse ingespan kan word om meer insig te verkry met betrekking tot die aard van die data en die stert van die onderliggende verdeling. Die analise word uitgevoer vanaf die punt waar slegs rou data beskikbaar is, tot op die punt waar tabelle saamgestel is wat direk gebruik kan word om die risiko van die versekeringsportefeulje te bepaal oor ’n gegewe periode. Extreme value index (EVI) Extreme value theory (EVT)
22	Edgeworth-corrected small-sample confidence intervals for ratio parameters in linear regression Binyavanga, Kamanzi-wa 03 1900 (has links) Dissertation (PhD)--Stellenbosch University, 2002. / ENGLISH ABSTRACT: In this thesis we construct a central confidence interval for a smooth scalar non-linear function of parameter vector f3 in a single general linear regression model Y = X f3 + c. We do this by first developing an Edgeworth expansion for the distribution function of a standardised point estimator. The confidence interval is then constructed in the manner discussed. Simulation studies reported at the end of the thesis show the interval to perform well in many small-sample situations. Central to the development of the Edgeworth expansion is our use of the index notation which, in statistics, has been popularised by McCullagh (1984, 1987). The contributions made in this thesis are of two kinds. We revisit the complex McCullagh Index Notation, modify and extend it in certain respects as well as repackage it in the manner that is more accessible to other researchers. On the new contributions, in addition to the introduction of a new small-sample confidence interval, we extend the theory of stochastic polynomials (SP) in three respects. A method, which we believe to be the simplest and most transparent to date, is proposed for deriving cumulants for these. Secondly, the theory of the cumulants of the SP is developed both in the context of Edgeworth expansion as well as in the regression setting. Thirdly, our new method enables us to propose a natural alternative to the method of Hall (1992a, 1992b) regarding skewness-reduction in Edgeworth expansions. / AFRIKAANSE OPSOMMING: In hierdie proefskrif word daar aandag gegee aan die konstruksie van 'n sentrale vertrouensinterval vir 'n gladde skalare nie-lineêre funksie van die parametervektor (3 in 'n enkele algemene lineêre regressiemodel y = X (3 + e.. Dit behels eerstens die ontwikkeling van 'n Edgeworth uitbreiding vir die verdelingsfunksie van 'n gestandaardiseerde puntberamer. Die vertrouensinterval word dan op grond van hierdie uitbreiding gekonstrueer. Simulasiestudies wat aan die einde van die proefskrif gerapporteer word, toon dat die voorgestelde interval goed vertoon in verskeie klein-steekproef gevalle. Die gebruik van indeksnotasie, wat in die statistiek deur McCullagh (1984, 1987) bekendgestel is, speel 'n sentrale rol in die ontwikkeling van die Edgeworth uitbreiding. Die bydrae wat in hierdie proefskrif gemaak word, is van 'n tweërlei aard. Die ingewikkelde Indeksnotasie van McCullagh word ondersoek, aangepas en ten opsigte van sekere aspekte uitgebrei. Die notasie word ook aangebied in 'n vorm wat dit hopelik meer toeganklik sal maak vir ander navorsers. Betreffende die bydrae wat gemaak word, word 'n nuwe klein-steekproef vertrouensinterval voorgestel, en word die teorie van stogastiese polinome (SP) ook in drie opsigte uitgebrei. 'n Metode word voorgestelom die kumulante van SP'e af te lei. Ons glo dat hierdie metode die duidelikste en eenvoudigste metode is wat tot dusver hiervoor voorgestel is. Tweedens word die teorie van die kumulante van SP'e ontwikkel binne die konteks van Edgeworth uitbreidings, sowel as die konteks van regressie. Derdens stelons nuwe metode ons in staat om 'n natuurlike alternatief voor te stel vir die metode van Hall (1992a, 1992b) vir die vermindering van skeefheid in Edgeworth uitbreidings. Confidence intervals Regression analysis Edgeworth expansions
23	Influential data cases when the C-p criterion is used for variable selection in multiple linear regression Uys, Daniel Wilhelm January 2003 (has links) Dissertation (PhD)--Stellenbosch University, 2003. / ENGLISH ABSTRACT: In this dissertation we study the influence of data cases when the Cp criterion of Mallows (1973) is used for variable selection in multiple linear regression. The influence is investigated in terms of the predictive power and the predictor variables included in the resulting model when variable selection is applied. In particular, we focus on the importance of identifying and dealing with these so called selection influential data cases before model selection and fitting are performed. For this purpose we develop two new selection influence measures, both based on the Cp criterion. The first measure is specifically developed to identify individual selection influential data cases, whereas the second identifies subsets of selection influential data cases. The success with which these influence measures identify selection influential data cases, is evaluated in example data sets and in simulation. All results are derived in the coordinate free context, with special application in multiple linear regression. / AFRIKAANSE OPSOMMING: Invloedryke waarnemings as die C-p kriterium vir veranderlike seleksie in meervoudigelineêre regressie gebruik word: In hierdie proefskrif ondersoek ons die invloed van waarnemings as die Cp kriterium van Mallows (1973) vir veranderlike seleksie in meervoudige lineêre regressie gebruik word. Die invloed van waarnemings op die voorspellingskrag en die onafhanklike veranderlikes wat ingesluit word in die finale geselekteerde model, word ondersoek. In besonder fokus ons op die belangrikheid van identifisering van en handeling met sogenaamde seleksie invloedryke waarnemings voordat model seleksie en passing gedoen word. Vir hierdie doel word twee nuwe invloedsmaatstawwe, albei gebaseer op die Cp kriterium, ontwikkel. Die eerste maatstaf is spesifiek ontwikkelom die invloed van individuele waarnemings te meet, terwyl die tweede die invloed van deelversamelings van waarnemings op die seleksie proses meet. Die sukses waarmee hierdie invloedsmaatstawwe seleksie invloedryke waarnemings identifiseer word beoordeel in voorbeeld datastelle en in simulasie. Alle resultate word afgelei binne die koërdinaatvrye konteks, met spesiale toepassing in meervoudige lineêre regressie. Regression analysis C-p criterion Variable selection
24	Evaluating the properties of sensory tests using computer intensive and biplot methodologies Meintjes, M. M. (Maria Magdalena) 03 1900 (has links) Assignment (MComm)--University of Stellenbosch, 2007. / ENGLISH ABSTRACT: This study is the result of part-time work done at a product development centre. The organisation extensively makes use of trained panels in sensory trials designed to asses the quality of its product. Although standard statistical procedures are used for analysing the results arising from these trials, circumstances necessitate deviations from the prescribed protocols. Therefore the validity of conclusions drawn as a result of these testing procedures might be questionable. This assignment deals with these questions. Sensory trials are vital in the development of new products, control of quality levels and the exploration of improvement in current products. Standard test procedures used to explore such questions exist but are in practice often implemented by investigators who have little or no statistical background. Thus test methods are implemented as black boxes and procedures are used blindly without checking all the appropriate assumptions and other statistical requirements. The specific product under consideration often warrants certain modifications to the standard methodology. These changes may have some unknown effect on the obtained results and therefore should be scrutinized to ensure that the results remain valid. The aim of this study is to investigate the distribution and other characteristics of sensory data, comparing the hypothesised, observed and bootstrap distributions. Furthermore, the standard testing methods used to analyse sensory data sets will be evaluated. After comparing these methods, alternative testing methods may be introduced and then tested using newly generated data sets. Graphical displays are also useful to get an overall impression of the data under consideration. Biplots are especially useful in the investigation of multivariate sensory data. The underlying relationships among attributes and their combined effect on the panellists’ decisions can be visually investigated by constructing a biplot. Results obtained by implementing biplot methods are compared to those of sensory tests, i.e. whether a significant difference between objects will correspond to large distances between the points representing objects in the display. In conclusion some recommendations are made as to how the organisation under consideration should implement sensory procedures in future trials. However, these proposals are preliminary and further research is necessary before final adoption. Some issues for further investigation are suggested. / AFRIKAANSE OPSOMMING: Hierdie studie spruit uit deeltydse werk by ’n produk-ontwikkeling-sentrum. Die organisasie maak in al hul sensoriese proewe rakende die kwaliteit van hul produkte op groot skaal gebruik van opgeleide panele. Alhoewel standaard prosedures ingespan word om die resultate te analiseer, noodsaak sekere omstandighede dat die voorgeskrewe protokol in ’n aangepaste vorm geïmplementeer word. Dié aanpassings mag meebring dat gevolgtrekkings gebaseer op resultate ongeldig is. Hierdie werkstuk ondersoek bogenoemde probleem. Sensoriese proewe is noodsaaklik in kwaliteitbeheer, die verbetering van bestaande produkte, asook die ontwikkeling van nuwe produkte. Daar bestaan standaard toets- prosedures om vraagstukke te verken, maar dié word dikwels toegepas deur navorsers met min of geen statistiese kennis. Dit lei daartoe dat toetsprosedures blindelings geïmplementeer en resultate geïnterpreteer word sonder om die nodige aannames en ander statistiese vereistes na te gaan. Alhoewel ’n spesifieke produk die wysiging van die standaard metode kan regverdig, kan hierdie veranderinge ’n groot invloed op die resultate hê. Dus moet die geldigheid van die resultate noukeurig ondersoek word. Die doel van hierdie studie is om die verdeling sowel as ander eienskappe van sensoriese data te bestudeer, deur die verdeling onder die nulhipotese sowel as die waargenome- en skoenlusverdelings te beskou. Verder geniet die standaard toetsprosedure, tans in gebruik om sensoriese data te analiseer, ook aandag. Na afloop hiervan word alternatiewe toetsprosedures voorgestel en dié geëvalueer op nuut gegenereerde datastelle. Grafiese voorstellings is ook nuttig om ’n geheelbeeld te kry van die data onder bespreking. Bistippings is veral handig om meerdimensionele sensoriese data te bestudeer. Die onderliggende verband tussen die kenmerke van ’n produk sowel as hul gekombineerde effek op ’n paneel se besluit, kan hierdeur visueel ondersoek word. Resultate verkry in die voorstellings word vergelyk met dié van sensoriese toetsprosedures om vas te stel of statisties betekenisvolle verskille in ’n produk korrespondeer met groot afstande tussen die relevante punte in die bistippingsvoorstelling. Ten slotte word sekere aanbevelings rakende die implementering van sensoriese proewe in die toekoms aan die betrokke organisasie gemaak. Hierdie aanbevelings word gemaak op grond van die voorafgaande ondersoeke, maar verdere navorsing is nodig voor die finale aanvaarding daarvan. Waar moontlik, word voorstelle vir verdere ondersoeke gedoen.
25	Estimating the window period and incidence of recently infected HIV patients. Du Toit, Cari 03 1900 (has links) Thesis (MComm (Statistics and Actuarial Science))--University of Stellenbosch, 2009. / Incidence can be defined as the rate of occurence of new infections of a disease like HIV and is an useful estimate of trends in the epidemic. Annualised incidence can be expressed as a proportion, namely the number of recent infections per year divided by the number of people at risk of infection. This number of recent infections is dependent on the window period, which is basically the period of time from seroconversion to being classified as a long-term infection for the first time. The BED capture enzyme immunoassay was developed to provide a way to distinguish between recent and long-term infections. An optical density (OD) measurement is obtained from this assay. Window period is defined as the number of days since seroconversion, with a baseline OD value of 0, 0476 to the number of days to reach an optical density of 0, 8.The aim of this study is to describe different techniques to estimate the window period which may subsequently lead to alternative estimates of annualised incidence of HIV infection. These various techniques are applied to different subsets of the Zimbabwe Vitamin A for Mothers and Babies (ZVITAMBO) dataset. Three different approaches are described to analyse window periods: a non-parametric survival analysis approach, the fitting of a general linear mixed model in a longitudinal data setting and a Bayesian approach of assigning probability distributions to the parameters of interest. These techniques are applied to different subsets and transformations of the data and the estimated mean and median window periods are obtained and utilised in the calculation of incidence. Window period HIV infections HIV infections -- Statistical methods
26	A comparison of support vector machines and traditional techniques for statistical regression and classification Hechter, Trudie 04 1900 (has links) Thesis (MComm)--Stellenbosch University, 2004. / ENGLISH ABSTRACT: Since its introduction in Boser et al. (1992), the support vector machine has become a popular tool in a variety of machine learning applications. More recently, the support vector machine has also been receiving increasing attention in the statistical community as a tool for classification and regression. In this thesis support vector machines are compared to more traditional techniques for statistical classification and regression. The techniques are applied to data from a life assurance environment for a binary classification problem and a regression problem. In the classification case the problem is the prediction of policy lapses using a variety of input variables, while in the regression case the goal is to estimate the income of clients from these variables. The performance of the support vector machine is compared to that of discriminant analysis and classification trees in the case of classification, and to that of multiple linear regression and regression trees in regression, and it is found that support vector machines generally perform well compared to the traditional techniques. / AFRIKAANSE OPSOMMING: Sedert die bekendstelling van die ondersteuningspuntalgoritme in Boser et al. (1992), het dit 'n populêre tegniek in 'n verskeidenheid masjienleerteorie applikasies geword. Meer onlangs het die ondersteuningspuntalgoritme ook meer aandag in die statistiese gemeenskap begin geniet as 'n tegniek vir klassifikasie en regressie. In hierdie tesis word ondersteuningspuntalgoritmes vergelyk met meer tradisionele tegnieke vir statistiese klassifikasie en regressie. Die tegnieke word toegepas op data uit 'n lewensversekeringomgewing vir 'n binêre klassifikasie probleem sowel as 'n regressie probleem. In die klassifikasiegeval is die probleem die voorspelling van polisvervallings deur 'n verskeidenheid invoer veranderlikes te gebruik, terwyl in die regressiegeval gepoog word om die inkomste van kliënte met behulp van hierdie veranderlikes te voorspel. Die resultate van die ondersteuningspuntalgoritme word met dié van diskriminant analise en klassifikasiebome vergelyk in die klassifikasiegeval, en met veelvoudige linêere regressie en regressiebome in die regressiegeval. Die gevolgtrekking is dat ondersteuningspuntalgoritmes oor die algemeen goed vaar in vergelyking met die tradisionele tegnieke. Machine learning Regression analysis
27	Empirical Bayes estimation of the extreme value index in an ANOVA setting Jordaan, Aletta Gertruida 04 1900 (has links) Thesis (MComm)-- Stellenbosch University, 2014. / ENGLISH ABSTRACT: Extreme value theory (EVT) involves the development of statistical models and techniques in order to describe and model extreme events. In order to make inferences about extreme quantiles, it is necessary to estimate the extreme value index (EVI). Numerous estimators of the EVI exist in the literature. However, these estimators are only applicable in the single sample setting. The aim of this study is to obtain an improved estimator of the EVI that is applicable to an ANOVA setting. An ANOVA setting lends itself naturally to empirical Bayes (EB) estimators, which are the main estimators under consideration in this study. EB estimators have not received much attention in the literature. The study begins with a literature study, covering the areas of application of EVT, Bayesian theory and EB theory. Different estimation methods of the EVI are discussed, focusing also on possible methods of determining the optimal threshold. Specifically, two adaptive methods of threshold selection are considered. A simulation study is carried out to compare the performance of different estimation methods, applied only in the single sample setting. First order and second order estimation methods are considered. In the case of second order estimation, possible methods of estimating the second order parameter are also explored. With regards to obtaining an estimator that is applicable to an ANOVA setting, a first order EB estimator and a second order EB estimator of the EVI are derived. A case study of five insurance claims portfolios is used to examine whether the two EB estimators improve the accuracy of estimating the EVI, when compared to viewing the portfolios in isolation. The results showed that the first order EB estimator performed better than the Hill estimator. However, the second order EB estimator did not perform better than the “benchmark” second order estimator, namely fitting the perturbed Pareto distribution to all observations above a pre-determined threshold by means of maximum likelihood estimation. / AFRIKAANSE OPSOMMING: Ekstreemwaardeteorie (EWT) behels die ontwikkeling van statistiese modelle en tegnieke wat gebruik word om ekstreme gebeurtenisse te beskryf en te modelleer. Ten einde inferensies aangaande ekstreem kwantiele te maak, is dit nodig om die ekstreem waarde indeks (EWI) te beraam. Daar bestaan talle beramers van die EWI in die literatuur. Hierdie beramers is egter slegs van toepassing in die enkele steekproef geval. Die doel van hierdie studie is om ’n meer akkurate beramer van die EWI te verkry wat van toepassing is in ’n ANOVA opset. ’n ANOVA opset leen homself tot die gebruik van empiriese Bayes (EB) beramers, wat die fokus van hierdie studie sal wees. Hierdie beramers is nog nie in literatuur ondersoek nie. Die studie begin met ’n literatuurstudie, wat die areas van toepassing vir EWT, Bayes teorie en EB teorie insluit. Verskillende metodes van EWI beraming word bespreek, insluitend ’n bespreking oor hoe die optimale drempel bepaal kan word. Spesifiek word twee aanpasbare metodes van drempelseleksie beskou. ’n Simulasiestudie is uitgevoer om die akkuraatheid van beraming van verskillende beramingsmetodes te vergelyk, in die enkele steekproef geval. Eerste orde en tweede orde beramingsmetodes word beskou. In die geval van tweede orde beraming, word moontlike beramingsmetodes van die tweede orde parameter ook ondersoek. ’n Eerste orde en ’n tweede orde EB beramer van die EWI is afgelei met die doel om ’n beramer te kry wat van toepassing is vir die ANAVA opset. ’n Gevallestudie van vyf versekeringsportefeuljes word gebruik om ondersoek in te stel of die twee EB beramers die akkuraatheid van beraming van die EWI verbeter, in vergelyking met die EWI beramers wat verkry word deur die portefeuljes afsonderlik te ontleed. Die resultate toon dat die eerste orde EB beramer beter gevaar het as die Hill beramer. Die tweede orde EB beramer het egter slegter gevaar as die tweede orde beramer wat gebruik is as maatstaf, naamlik die passing van die gesteurde Pareto verdeling (PPD) aan alle waarnemings bo ’n gegewe drempel, met behulp van maksimum aanneemlikheidsberaming. Extreme Value Theory Analysis of variance UCTD
28	Nearest hypersphere classification : a comparison with other classification techniques Van der Westhuizen, Cornelius Stephanus 12 1900 (has links) Thesis (MCom)--Stellenbosch University, 2014. / ENGLISH ABSTRACT: Classification is a widely used statistical procedure to classify objects into two or more classes according to some rule which is based on the input variables. Examples of such techniques are Linear and Quadratic Discriminant Analysis (LDA and QDA). However, classification of objects with these methods can get complicated when the number of input variables in the data become too large (􀝊 ≪ 􀝌), when the assumption of normality is no longer met or when classes are not linearly separable. Vapnik et al. (1995) introduced the Support Vector Machine (SVM), a kernel-based technique, which can perform classification in cases where LDA and QDA are not valid. SVM makes use of an optimal separating hyperplane and a kernel function to derive a rule which can be used for classifying objects. Another kernel-based technique was proposed by Tax and Duin (1999) where a hypersphere is used for domain description of a single class. The idea of a hypersphere for a single class can be easily extended to classification when dealing with multiple classes by just classifying objects to the nearest hypersphere. Although the theory of hyperspheres is well developed, not much research has gone into using hyperspheres for classification and the performance thereof compared to other classification techniques. In this thesis we will give an overview of Nearest Hypersphere Classification (NHC) as well as provide further insight regarding the performance of NHC compared to other classification techniques (LDA, QDA and SVM) under different simulation configurations. We begin with a literature study, where the theory of the classification techniques LDA, QDA, SVM and NHC will be dealt with. In the discussion of each technique, applications in the statistical software R will also be provided. An extensive simulation study is carried out to compare the performance of LDA, QDA, SVM and NHC for the two-class case. Various data scenarios will be considered in the simulation study. This will give further insight in terms of which classification technique performs better under the different data scenarios. Finally, the thesis ends with the comparison of these techniques on real-world data. / AFRIKAANSE OPSOMMING: Klassifikasie is ’n statistiese metode wat gebruik word om objekte in twee of meer klasse te klassifiseer gebaseer op ’n reël wat gebou is op die onafhanklike veranderlikes. Voorbeelde van hierdie metodes sluit in Lineêre en Kwadratiese Diskriminant Analise (LDA en KDA). Wanneer die aantal onafhanklike veranderlikes in ’n datastel te veel raak, die aanname van normaliteit nie meer geld nie of die klasse nie meer lineêr skeibaar is nie, raak die toepassing van metodes soos LDA en KDA egter te moeilik. Vapnik et al. (1995) het ’n kern gebaseerde metode bekendgestel, die Steun Vektor Masjien (SVM), wat wel vir klassifisering gebruik kan word in situasies waar metodes soos LDA en KDA misluk. SVM maak gebruik van ‘n optimale skeibare hipervlak en ’n kern funksie om ’n reël af te lei wat gebruik kan word om objekte te klassifiseer. ’n Ander kern gebaseerde tegniek is voorgestel deur Tax and Duin (1999) waar ’n hipersfeer gebruik kan word om ’n gebied beskrywing op te stel vir ’n datastel met net een klas. Dié idee van ’n enkele klas wat beskryf kan word deur ’n hipersfeer, kan maklik uitgebrei word na ’n multi-klas klassifikasie probleem. Dit kan gedoen word deur slegs die objekte te klassifiseer na die naaste hipersfeer. Alhoewel die teorie van hipersfere goed ontwikkeld is, is daar egter nog nie baie navorsing gedoen rondom die gebruik van hipersfere vir klassifikasie nie. Daar is ook nog nie baie gekyk na die prestasie van hipersfere in vergelyking met ander klassifikasie tegnieke nie. In hierdie tesis gaan ons ‘n oorsig gee van Naaste Hipersfeer Klassifikasie (NHK) asook verdere insig in terme van die prestasie van NHK in vergelyking met ander klassifikasie tegnieke (LDA, KDA en SVM) onder sekere simulasie konfigurasies. Ons gaan begin met ‘n literatuurstudie, waar die teorie van die klassifikasie tegnieke LDA, KDA, SVM en NHK behandel gaan word. Vir elke tegniek gaan toepassings in die statistiese sagteware R ook gewys word. ‘n Omvattende simulasie studie word uitgevoer om die prestasie van die tegnieke LDA, KDA, SVM en NHK te vergelyk. Die vergelyking word gedoen vir situasies waar die data slegs twee klasse het. ‘n Verskeidenheid van data situasies gaan ook ondersoek word om verdere insig te toon in terme van wanneer watter tegniek die beste vaar. Die tesis gaan afsluit deur die genoemde tegnieke toe te pas op praktiese datastelle. UCTD Classification Machine learning Kernel functions
29	Modelling of multi-state panel data : the importance of the model assumptions Mafu, Thandile John 12 1900 (has links) Thesis (MCom)--Stellenbosch University, 2014. / ENGLISH ABSTRACT: A multi-state model is a way of describing a process in which a subject moves through a series of states in continuous time. The series of states might be the measurement of a disease for example in state 1 we might have subjects that are free from disease, in state 2 we might have subjects that have a disease but the disease is mild, in state 3 we might have subjects having a severe disease and in last state 4 we have those that die because of the disease. So Markov models estimates the transition probabilities and transition intensity rates that describe the movement of subjects between these states. The transition might be for example a particular subject or patient might be slightly sick at age 30 but after 5 years he or she might be worse. So Markov model will estimate what probability will be for that patient for moving from state 2 to state 3. Markov multi-state models were studied in this thesis with the view of assessing the Markov models assumptions such as homogeneity of the transition rates through time, homogeneity of the transition rates across the subject population and Markov property or assumption. The assessments of these assumptions were based on simulated panel or longitudinal dataset which was simulated using the R package named msm package developed by Christopher Jackson (2014). The R code that was written using this package is attached as appendix. Longitudinal dataset consists of repeated measurements of the state of a subject and the time between observations. The period of time with observations in longitudinal dataset is being made on subject at regular or irregular time intervals until the subject dies then the study ends. / AFRIKAANSE OPSOMMING: ’n Meertoestandmodel is ’n manier om ’n proses te beskryf waarin ’n subjek in ’n ononderbroke tydperk deur verskeie toestande beweeg. Die verskillende toestande kan byvoorbeeld vir die meting van siekte gebruik word, waar toestand 1 uit gesonde subjekte bestaan, toestand 2 uit subjekte wat siek is, dog slegs matig, toestand 3 uit subjekte wat ernstig siek is, en toestand 4 uit subjekte wat aan die siekte sterf. ’n Markov-model raam die oorgangswaarskynlikhede en -intensiteit wat die subjekte se vordering deur hierdie toestande beskryf. Die oorgang is byvoorbeeld wanneer ’n bepaalde subjek of pasiënt op 30-jarige ouderdom net lig aangetas is, maar na vyf jaar veel ernstiger siek is. Die Markov-model raam dus die waarskynlikheid dat so ’n pasiënt van toestand 2 tot toestand 3 sal vorder. Hierdie tesis het ondersoek ingestel na Markov-meertoestandmodelle ten einde die aannames van die modelle, soos die homogeniteit van oorgangstempo’s oor tyd, die homogeniteit van oorgangstempo’s oor die subjekpopulasie en tipiese Markov-eienskappe, te beoordeel. Die beoordeling van hierdie aannames was gegrond op ’n gesimuleerde paneel of longitudinale datastel wat met behulp van Christopher Jackson (2014) se R-pakket genaamd msm gesimuleer is. Die R-kode wat met behulp van hierdie pakket geskryf is, word as bylae aangeheg. Die longitudinale datastel bestaan uit herhaalde metings van die toestand waarin ’n subjek verkeer en die tydsverloop tussen waarnemings. Waarnemings van die longitudinale datastel word met gereelde of ongereelde tussenposes onderneem totdat die subjek sterf, wanneer die studie dan ook ten einde loop. UCTD Markov processes Longitudinal method
30	Completion of an incomplete market by quadratic variation assets. Mgobhozi, S. W. January 2011 (has links) It is well known that the general geometric L´evy market models are incomplete, except for the geometric Brownian and the geometric Poissonian, but such a market can be completed by enlarging it with power-jump assets as Corcuera and Nualart [12] did on their paper. With the knowledge that an incomplete market due to jumps can be completed, we look at other cases of incompleteness. We will consider incompleteness due to more sources of randomness than tradable assets, transactions costs and stochastic volatility. We will show that such markets are incomplete and propose a way to complete them. By doing this we show that such markets can be completed. In the case of incompleteness due to more randomness than tradable assets, we will enlarge the market using the market’s underlying quadratic variation assets. By doing this we show that the market can be completed. Looking at a market paying transactional costs, which is also an incomplete market model due to indifference between the buyers and sellers price, we will show that a market paying transactional costs as the one given by, Cvitanic and Karatzas [13] can be completed. Empirical findings have shown that the Black and Scholes assumption of constant volatility is inaccurate (see Tompkins [40] for empirical evidence). Volatility is in some sense stochastic, and is divided into two broad classes. The first class being single-factor models, which have only one source of randomness, and are complete markets models. The other class being the multi-factor models in which other random elements are introduced, hence are an incomplete markets models. In this project we look at some commonly used multi-factor models and attempt to complete one of them. / Thesis (M.Sc.)-University of KwaZulu-Natal, Durban, 2011. Financial risk management. Stochastic analysis. Stochastic processes.

Search results