71 |
Empirical Bayes estimation of the extreme value index in an ANOVA setting. Jordaan, Aletta Gertruida, 04 1900.
Thesis (MComm)-- Stellenbosch University, 2014. / ENGLISH ABSTRACT: Extreme value theory (EVT) involves the development of statistical models and techniques in order to describe and model extreme events. In order to make inferences about extreme quantiles, it is necessary to estimate the extreme value index (EVI). Numerous estimators of the EVI exist in the literature. However, these estimators are only applicable in the single sample setting. The aim of this study is to obtain an improved estimator of the EVI that is applicable to an ANOVA setting.
An ANOVA setting lends itself naturally to empirical Bayes (EB) estimators, which are the main estimators under consideration in this study. EB estimators have not received much attention in the literature.
The study begins with a literature study, covering the areas of application of EVT, Bayesian theory and EB theory. Different estimation methods of the EVI are discussed, focusing also on possible methods of determining the optimal threshold. Specifically, two adaptive methods of threshold selection are considered.
A simulation study is carried out to compare the performance of different estimation methods, applied only in the single sample setting. First order and second order estimation methods are considered. In the case of second order estimation, possible methods of estimating the second order parameter are also explored.
With regard to obtaining an estimator that is applicable to an ANOVA setting, a first order EB estimator and a second order EB estimator of the EVI are derived. A case study of five insurance claims portfolios is used to examine whether the two EB estimators improve the accuracy of estimating the EVI, compared with viewing the portfolios in isolation.
The results showed that the first order EB estimator performed better than the Hill estimator. However, the second order EB estimator did not perform better than the “benchmark” second order estimator, namely fitting the perturbed Pareto distribution to all observations above a pre-determined threshold by means of maximum likelihood estimation. / AFRIKAANSE OPSOMMING (translated): Extreme value theory (EVT) involves the development of statistical models and techniques used to describe and model extreme events. In order to make inferences about extreme quantiles, it is necessary to estimate the extreme value index (EVI). Numerous estimators of the EVI exist in the literature, but these estimators are only applicable in the single sample setting. The aim of this study is to obtain a more accurate estimator of the EVI that is applicable in an ANOVA setting.
An ANOVA setting lends itself to the use of empirical Bayes (EB) estimators, which are the focus of this study. These estimators have not yet been investigated in the literature.
The study begins with a literature review covering the areas of application of EVT, Bayesian theory and EB theory. Different methods of estimating the EVI are discussed, including a discussion of how the optimal threshold can be determined. Specifically, two adaptive methods of threshold selection are considered.
A simulation study is carried out to compare the estimation accuracy of different estimation methods in the single sample setting. First order and second order estimation methods are considered. In the case of second order estimation, possible methods of estimating the second order parameter are also investigated.
A first order and a second order EB estimator of the EVI are derived with the aim of obtaining an estimator that is applicable in the ANOVA setting. A case study of five insurance claims portfolios is used to investigate whether the two EB estimators improve the accuracy of estimating the EVI, compared with the EVI estimators obtained by analysing the portfolios separately. The results show that the first order EB estimator performed better than the Hill estimator. The second order EB estimator, however, performed worse than the second order estimator used as benchmark, namely fitting the perturbed Pareto distribution (PPD) to all observations above a given threshold by means of maximum likelihood estimation.
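As a point of reference for the first order comparison above, the following is a minimal sketch of the Hill estimator in R. The simulated Pareto-type sample, the choice of k and the function name are illustrative assumptions; the EB estimators derived in the thesis are not reproduced here.

```r
# Minimal Hill estimator sketch: the estimate is the mean log-excess of the k
# largest observations over the (k+1)-th largest. Sample and k are assumptions.
hill_estimator <- function(x, k) {
  x_sorted <- sort(x, decreasing = TRUE)                  # descending order statistics
  mean(log(x_sorted[1:k]) - log(x_sorted[k + 1]))         # first order EVI estimate
}

set.seed(1)
claims <- runif(1000)^(-0.5)      # Pareto-type sample with true EVI = 0.5
hill_estimator(claims, k = 100)   # should land reasonably close to 0.5
```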
|
72 |
Nearest hypersphere classification : a comparison with other classification techniques. Van der Westhuizen, Cornelius Stephanus, 12 1900.
Thesis (MCom)--Stellenbosch University, 2014. / ENGLISH ABSTRACT: Classification is a widely used statistical procedure to classify objects into two or more
classes according to some rule which is based on the input variables. Examples of such
techniques are Linear and Quadratic Discriminant Analysis (LDA and QDA). However,
classification of objects with these methods can get complicated when the number of input
variables in the data becomes too large (n ≪ p), when the assumption of normality is no
longer met or when classes are not linearly separable. Vapnik et al. (1995) introduced the
Support Vector Machine (SVM), a kernel-based technique, which can perform classification
in cases where LDA and QDA are not valid. SVM makes use of an optimal separating
hyperplane and a kernel function to derive a rule which can be used for classifying objects.
Another kernel-based technique was proposed by Tax and Duin (1999) where a hypersphere
is used for domain description of a single class. The idea of a hypersphere for a single class
can be easily extended to classification when dealing with multiple classes by just classifying
objects to the nearest hypersphere.
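As a rough illustration of this idea, the sketch below implements a deliberately simplified nearest hypersphere classifier in R: each class is summarised by a sphere centred at its mean, with radius equal to the largest distance to that mean, and new objects are assigned to the class whose sphere surface is nearest. The thesis uses the Tax and Duin (1999) support vector data description rather than this naive construction, and all names and data here are illustrative.

```r
# Naive nearest-hypersphere sketch (class mean + max radius), not the SVDD of
# Tax & Duin; the iris data and function names are assumptions for illustration.
fit_spheres <- function(X, y) {
  lapply(split(as.data.frame(X), y), function(Xc) {
    centre <- colMeans(Xc)
    radius <- max(sqrt(rowSums(sweep(as.matrix(Xc), 2, centre)^2)))
    list(centre = centre, radius = radius)
  })
}

predict_nhc <- function(spheres, Xnew) {
  Xnew <- as.matrix(Xnew)
  scores <- sapply(spheres, function(s)
    sqrt(rowSums(sweep(Xnew, 2, s$centre)^2)) - s$radius)  # distance to sphere surface
  colnames(scores)[max.col(-scores)]                       # class of the nearest sphere
}

spheres <- fit_spheres(iris[, 1:4], iris$Species)
mean(predict_nhc(spheres, iris[, 1:4]) == iris$Species)    # resubstitution accuracy
```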
Although the theory of hyperspheres is well developed, not much research has gone into
using hyperspheres for classification and the performance thereof compared to other
classification techniques. In this thesis we will give an overview of Nearest Hypersphere
Classification (NHC) as well as provide further insight regarding the performance of NHC
compared to other classification techniques (LDA, QDA and SVM) under different
simulation configurations.
We begin with a literature study, where the theory of the classification techniques LDA,
QDA, SVM and NHC will be dealt with. In the discussion of each technique, applications in
the statistical software R will also be provided. An extensive simulation study is carried out
to compare the performance of LDA, QDA, SVM and NHC for the two-class case. Various
data scenarios will be considered in the simulation study. This will give further insight in
terms of which classification technique performs better under the different data scenarios.
Finally, the thesis ends with a comparison of these techniques on real-world data. / AFRIKAANSE OPSOMMING (translated): Classification is a statistical method used to classify objects into two or more classes based on a rule built on the input variables. Examples of such methods include Linear and Quadratic Discriminant Analysis (LDA and QDA). However, when the number of input variables in a data set becomes too large, when the assumption of normality no longer holds, or when the classes are no longer linearly separable, the application of methods such as LDA and QDA becomes too difficult. Vapnik et al. (1995) introduced a kernel-based method, the Support Vector Machine (SVM), which can be used for classification in situations where methods such as LDA and QDA fail. SVM makes use of an optimal separating hyperplane and a kernel function to derive a rule that can be used to classify objects. Another kernel-based technique was proposed by Tax and Duin (1999), where a hypersphere is used to construct a domain description of a data set containing only one class. This idea of a single class described by a hypersphere can easily be extended to a multi-class classification problem by simply classifying objects to the nearest hypersphere.
Although the theory of hyperspheres is well developed, little research has been done on the use of hyperspheres for classification or on their performance compared with other classification techniques. In this thesis we give an overview of Nearest Hypersphere Classification (NHC), as well as further insight into the performance of NHC compared with other classification techniques (LDA, QDA and SVM) under certain simulation configurations.
We begin with a literature study in which the theory of the classification techniques LDA, QDA, SVM and NHC is covered. For each technique, applications in the statistical software R are also shown. A comprehensive simulation study is carried out to compare the performance of LDA, QDA, SVM and NHC, for situations where the data have only two classes. A variety of data scenarios is also investigated to give further insight into when each technique performs best. The thesis concludes by applying these techniques to practical data sets.
|
73 |
Modelling of multi-state panel data : the importance of the model assumptions. Mafu, Thandile John, 12 1900.
Thesis (MCom)--Stellenbosch University, 2014. / ENGLISH ABSTRACT: A multi-state model is a way of describing a process in which a subject moves through a series of states in continuous time. The states might, for example, represent the progression of a disease: in state 1 subjects are free of the disease, in state 2 subjects have a mild form of the disease, in state 3 subjects have a severe form of the disease, and in the final state 4 subjects have died of the disease. A Markov model estimates the transition probabilities and transition intensity rates that describe the movement of subjects between these states. For example, a particular subject or patient might be mildly ill at age 30 but considerably worse five years later; the Markov model then estimates the probability that such a patient moves from state 2 to state 3.
Markov multi-state models were studied in this thesis with a view to assessing the model assumptions, such as homogeneity of the transition rates through time, homogeneity of the transition rates across the subject population, and the Markov property itself.
The assessment of these assumptions was based on a simulated panel (longitudinal) dataset, generated using the R package msm developed by Christopher Jackson (2014). The R code written using this package is attached as an appendix.
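For readers unfamiliar with the package, a hedged sketch of a typical msm workflow is given below, fitting a four-state Markov model to the panel data shipped with the package. The intensity matrix Q, the use of the built-in cav data and the argument names (taken from the msm documentation as best recalled) are assumptions for illustration; the thesis simulates its own panel data rather than using cav.

```r
# Illustrative msm workflow: four-state model with state 4 absorbing (death).
library(msm)

statetable.msm(state, PTNUM, data = cav)   # observed state-to-state transitions

# Allowed instantaneous transitions, with rough starting values
Q <- rbind(c(0,    0.25, 0,    0.25),
           c(0.17, 0,    0.17, 0.17),
           c(0,    0.25, 0,    0.25),
           c(0,    0,    0,    0))

fit <- msm(state ~ years, subject = PTNUM, data = cav,
           qmatrix = Q, deathexact = 4)    # death times observed exactly

qmatrix.msm(fit)                           # estimated transition intensities
pmatrix.msm(fit, t = 5)                    # five-year transition probabilities
```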
A longitudinal dataset consists of repeated measurements of the state of a subject and the times between observations. Observations are made on a subject at regular or irregular time intervals until the subject dies, at which point the study ends for that subject. / AFRIKAANSE OPSOMMING (translated): A multi-state model is a way of describing a process in which a subject moves through several states in continuous time. The states can, for example, be used to measure disease, where state 1 consists of healthy subjects, state 2 of subjects who are ill, though only mildly, state 3 of subjects who are severely ill, and state 4 of subjects who die of the disease. A Markov model estimates the transition probabilities and intensities that describe the subjects' progression through these states. A transition occurs, for example, when a particular subject or patient is only mildly affected at the age of 30 but is much more severely ill five years later. The Markov model thus estimates the probability that such a patient will progress from state 2 to state 3.
This thesis investigated Markov multi-state models in order to assess the assumptions of these models, such as the homogeneity of transition rates over time, the homogeneity of transition rates across the subject population, and the Markov property itself.
The assessment of these assumptions was based on a simulated panel or longitudinal dataset generated with Christopher Jackson's (2014) R package msm. The R code written with this package is attached as an appendix. The longitudinal dataset consists of repeated measurements of the state in which a subject finds itself and the time elapsed between observations. Observations in the longitudinal dataset are made at regular or irregular intervals until the subject dies, at which point the study also comes to an end.
|
74 |
Robust principal component analysis biplots. Wedlake, Ryan Stuart, 03 1900.
Thesis (MSc (Mathematical Statistics))--University of Stellenbosch, 2008. / In this study several procedures for finding robust principal components (RPCs) for low and high dimensional data sets are investigated in parallel with robust principal component analysis (RPCA) biplots. These RPCA biplots will be used for the simultaneous visualisation of the observations and variables in the subspace spanned by the RPCs. Chapter 1 contains: a brief overview of the difficulties that are encountered when graphically investigating patterns and relationships in multidimensional data and why PCA can be used to circumvent these difficulties; the objectives of this study; a summary of the work done in order to meet these objectives; certain results in matrix algebra that are needed throughout this study.
In Chapter 2 the derivation of the classic sample principal components (SPCs) is first discussed in detail, since they are the 'building blocks' of classic principal component analysis (CPCA) biplots. Secondly, the traditional CPCA biplot of Gabriel (1971) is reviewed. Thirdly, modifications to this biplot using the new philosophy of Gower & Hand (1996) are given attention. Reasons why this modified biplot has several advantages over the traditional biplot – some of which are aesthetical in nature – are given. Lastly, changes that can be made to the Gower & Hand (1996) PCA biplot to optimally visualise the correlations between the variables are discussed.
Because the SPCs determine the position of the observations as well as the orientation of the arrows (traditional biplot) or axes (Gower and Hand biplot) in the PCA biplot subspace, it is useful to give estimates of the standard errors of the SPCs together with the biplot display as an indication of the stability of the biplot. A computer-intensive statistical technique called the Bootstrap is firstly discussed that is used to calculate the standard errors of the SPCs without making underlying distributional assumptions. Secondly, the influence of outliers on Bootstrap results is investigated. Lastly, a robust form of the Bootstrap is briefly discussed for calculating standard error estimates that remain stable with or without the presence of outliers in the sample. All the preceding topics are the subject matter of Chapter 3.
In Chapter 4, reasons why a PC analysis should be made robust in the presence of outliers are firstly discussed. Secondly, different types of outliers are discussed. Thirdly, a method for identifying influential observations and a method for identifying outlying observations are investigated. Lastly, different methods for constructing robust estimates of location and dispersion for the observations receive attention. These robust estimates are used in numerical procedures that calculate RPCs.
In Chapter 5, an overview of some of the procedures that are used to calculate RPCs for lower and higher dimensional data sets is firstly discussed. Secondly, two numerical procedures that can be used to calculate RPCs for lower dimensional data sets are discussed and compared in detail. Details and examples of robust versions of the Gower & Hand (1996) PCA biplot that can be constructed using these RPCs are also provided.
In Chapter 6, five numerical procedures for calculating RPCs for higher dimensional data sets are discussed in detail. Once RPCs have been obtained by using these methods, they are used to construct robust versions of the PCA biplot of Gower & Hand (1996). Details and examples of these robust PCA biplots are also provided.
An extensive software library has been developed so that the biplot methodology discussed in this study can be used in practice. The functions in this library are given in an appendix at the end of this study. This software library is used on data sets from various fields so that the merit of the theory developed in this study can be visually appraised.
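To make the bootstrap step described above concrete, the sketch below resamples rows of a data set and records the loadings of the first sample principal component, giving bootstrap standard errors. It uses plain prcomp() on a built-in data set; the robust procedures and the more careful treatment of eigenvector sign and order in the thesis are not addressed, and all names here are illustrative.

```r
# Bootstrap standard errors for the loadings of the first SPC (sketch only).
set.seed(1)
X <- as.matrix(iris[, 1:4])
B <- 500

boot_loadings <- replicate(B, {
  Xb <- X[sample(nrow(X), replace = TRUE), ]      # resample observations
  v  <- prcomp(Xb, scale. = TRUE)$rotation[, 1]   # first PC loading vector
  if (v[1] < 0) v <- -v                           # crude fix for sign indeterminacy
  v
})

apply(boot_loadings, 1, sd)                       # bootstrap standard errors
```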
|
75 |
The implementation of noise addition partial least squares. Moller, Jurgen Johann, 03 1900.
Thesis (MComm (Statistics and Actuarial Science))--University of Stellenbosch, 2009. / When determining the chemical composition of a specimen, traditional laboratory techniques are often both expensive and time-consuming. It is therefore preferable to employ more cost-effective spectroscopic techniques such as near infrared (NIR). Traditionally, the calibration problem has been solved by means of multiple linear regression to specify the model between the spectral measurements X and the chemical reference values Y. Traditional regression techniques, however, quickly fail when using spectroscopic data, as the number of wavelengths can easily be several hundred, often exceeding the number of chemical samples. This scenario, together with the high level of collinearity between wavelengths, necessarily leads to singularity problems when calculating the regression coefficients.
Ways of dealing with the collinearity problem include principal component regression (PCR), ridge regression (RR) and PLS regression. Both PCR and RR require a significant amount of computation when the number of variables is large. PLS overcomes the collinearity problem in a similar way as PCR, by modelling both the chemical and spectral data as functions of common latent variables.
The quality of the employed reference method greatly affects the coefficients of the regression model and therefore the quality of its predictions. With both X and Y subject to random error, the quality of the predictions of Y is reduced as the level of noise increases. Previous research focused mainly on the effects of noise in X. This thesis focuses on a method proposed by Dardenne and Fernández Pierna, called Noise Addition Partial Least Squares (NAPLS), which attempts to deal with the problem of poor reference values.
Some aspects of the theory behind PCR, PLS and model selection are discussed, followed by a discussion of the NAPLS algorithm. Both PLS and NAPLS are implemented on various datasets that arise in practice, in order to determine cases where NAPLS is beneficial over conventional PLS. For each dataset, specific attention is given to the analysis of outliers, influential values and the linearity between X and Y, using graphical techniques.
Lastly, the performance of the NAPLS algorithm is evaluated for various levels of added noise.
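As a hedged illustration of the conventional PLS fit that NAPLS is compared against, the sketch below uses the pls package and its bundled NIR data set, and mimics degraded reference values by adding noise to them before refitting. The noise level and this crude perturbation are assumptions; the actual NAPLS algorithm is specified in the thesis and is not reproduced here.

```r
# Conventional PLS on NIR data, with and without artificially noisy references.
library(pls)
data(gasoline)                      # NIR spectra with octane reference values

fit_clean <- plsr(octane ~ NIR, ncomp = 10, data = gasoline, validation = "CV")

noisy <- gasoline
noisy$octane <- noisy$octane + rnorm(length(noisy$octane), sd = 0.2)  # degraded Y
fit_noisy <- plsr(octane ~ NIR, ncomp = 10, data = noisy, validation = "CV")

RMSEP(fit_clean)                    # cross-validated prediction error, clean references
RMSEP(fit_noisy)                    # and with noisy reference values
```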
|
76 |
Modelling market risk with SAS Risk Dimensions : a step by step implementation. Du Toit, Carl, 03 1900.
Thesis (MComm (Statistics and Actuarial Science))--University of Stellenbosch, 2005. / Financial institutions invest in financial securities like equities, options and
government bonds. Two measures, namely return and risk, are associated with
each investment position. Return is a measure of the profit or loss of the
investment, whilst risk is defined as the uncertainty about return.
A financial institution that holds a portfolio of securities is exposed to different
types of risk. The most well-known types are market, credit, liquidity, operational
and legal risk. An institution needs to quantify, for each type of risk, the extent of its exposure. Currently, standard risk measures that aim to quantify risk exist only for market and credit risk. Extensive calculations are usually required to obtain values for these risk measures. The investment positions that form the portfolio, as well as the market information that is used in the risk measure calculations, change during each trading day. Hence, the financial institution needs a business tool that can calculate the various standard risk measures for dynamic market and position data at the end of each trading day.
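To make the notion of a standard market risk measure concrete, the sketch below computes a historical-simulation Value-at-Risk and expected shortfall from a simulated daily profit-and-loss series in R. This is a generic illustration under assumed numbers, not SAS Risk Dimensions code.

```r
# Historical-simulation VaR and expected shortfall from a daily P&L series.
set.seed(1)
pnl <- rnorm(500, mean = 0, sd = 1e6)    # assumed daily profit-and-loss (currency units)

var_hist <- function(pnl, level = 0.99) {
  -quantile(pnl, probs = 1 - level, names = FALSE)       # loss exceeded with prob 1 - level
}
es_hist <- function(pnl, level = 0.99) {
  -mean(pnl[pnl <= quantile(pnl, probs = 1 - level)])    # average loss beyond the VaR point
}

var_hist(pnl)   # 99% one-day VaR
es_hist(pnl)    # 99% one-day expected shortfall
```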
SAS Risk Dimensions is a software package that provides a solution to the
calculation problem. A risk management system is created with this package and
is used to calculate all the relevant risk measures on a daily basis.
The purpose of this document is to explain and illustrate all the steps that should
be followed to create a suitable risk management system with SAS Risk
Dimensions.
|
77 |
Pricing and Hedging the Guaranteed Minimum Withdrawal Benefits in Variable Annuities. Liu, Yan, January 2010.
The Guaranteed Minimum Withdrawal Benefits (GMWBs) are optional riders provided
by insurance companies in variable annuities. They guarantee the policyholders' ability to get the initial investment back by making periodic withdrawals regardless of the
impact of poor market performance. With GMWBs attached, variable annuities become more attractive. This type of guarantee can be challenging to price and hedge.
We employ two approaches to price GMWBs. Under the constant static withdrawal
assumption, the first approach is to decompose the GMWB and the variable annuity
into an arithmetic average strike Asian call option and an annuity certain. The second
approach is to treat the GMWB alone as a put option whose maturity and payoff are
random.
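As a rough numerical companion to the first approach, the sketch below prices an arithmetic average strike Asian call by Monte Carlo under geometric Brownian motion in R. The parameter values are arbitrary assumptions, and the annuity-certain component of the decomposition is omitted.

```r
# Monte Carlo value of an arithmetic average strike Asian call under GBM.
set.seed(1)
S0 <- 100; r <- 0.03; sigma <- 0.2; maturity <- 10; m <- 120; n_paths <- 20000
dt <- maturity / m

z <- matrix(rnorm(n_paths * m), n_paths, m)
increments <- (r - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * z
S <- S0 * exp(t(apply(increments, 1, cumsum)))   # simulated price paths
avg <- rowMeans(S)                               # arithmetic average acts as the strike
payoff <- pmax(S[, m] - avg, 0)
exp(-r * maturity) * mean(payoff)                # discounted expected payoff
```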
Hedging helps insurers specify and manage the risks of writing GMWBs, as well
as find their fair prices. We propose semi-static hedging strategies that offer several
advantages over dynamic hedging. The idea is to construct a portfolio of European
options that replicate the conditional expected GMWB liability in a short time period,
and update the portfolio after the options expire. This strategy requires fewer portfolio
adjustments, and outperforms the dynamic strategy when there are random jumps in
the underlying price. We also extend the semi-static hedging strategies to the Heston
stochastic volatility model.
|
79 |
Analysis of Financial Data using a Difference-Poisson Autoregressive Model. Baroud, Hiba, January 2011.
Box and Jenkins methodologies have contributed enormously to the analysis of time series data. However, the assumptions used in these methods impose constraints on the type of data. As a result, difficulties arise when those tools are applied to more general types of data (e.g. count, categorical or integer-valued data) rather than the classical continuous, or more specifically Gaussian, type. The literature proposes alternative methods for modelling discrete-valued time series data; among these is Pegram's operator (1980).
We use this operator to build an AR(p) model for integer-valued time series (including both positive and negative integers). The innovations follow the differenced Poisson distribution, also known as the Skellam distribution. While the model includes the usual AR(p) correlation structure, it can be made more general: the operator can be extended so that some components contribute to positive correlation while others contribute to negative correlation. As an illustration, the process is used to model the change in a stock's price, where three variations are presented: Variation I, Variation II and Variation III. The first model disregards outliers, whereas the second and third include large price changes associated with the effect of large volume trades and market openings.
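As a hedged sketch of the basic building block, the code below simulates a Pegram-type AR(1) with Skellam (difference-of-Poissons) innovations in base R: the next value equals the previous one with probability phi and is a fresh innovation otherwise. The AR(p) structure, the mixed positive/negative-correlation extension and the estimation used in the thesis go well beyond this toy version, and all parameter values are assumptions.

```r
# Pegram-type AR(1) with Skellam innovations (integer-valued, can be negative).
simulate_pegram_skellam_ar1 <- function(n, phi, lambda1, lambda2) {
  innov <- rpois(n, lambda1) - rpois(n, lambda2)         # Skellam draws
  x <- numeric(n)
  x[1] <- innov[1]
  for (t in 2:n) {
    x[t] <- if (runif(1) < phi) x[t - 1] else innov[t]   # Pegram mixture step
  }
  x
}

set.seed(1)
price_changes <- simulate_pegram_skellam_ar1(1000, phi = 0.4, lambda1 = 2, lambda2 = 2)
acf(price_changes, lag.max = 5, plot = FALSE)            # lag-1 autocorrelation near phi
```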
Parameters of the model are estimated using maximum likelihood methods. We use several model selection criteria to select the best order for each variation of the model, as well as to determine which variation of the model is best. The most adequate order for all variations of the model is AR(3). While the best fit for the data is Variation II, residual diagnostic plots suggest that Variation III represents a better correlation structure for the model.
|
80 |
Markovian Approaches to Joint-life Mortality with Applications in Risk Management. Ji, Min, 28 July 2011.
The combined survival status of the insured lives is a critical problem when pricing and reserving insurance products with more than one life. Our preliminary experience examination of bivariate annuity data from a large Canadian insurance company shows that the relative risk of mortality for an individual increases after the loss of his/her spouse, and that the increase is especially dramatic shortly after bereavement. This preliminary result is supported by the empirical studies over the past 50 years, which suggest dependence between a husband and wife.
The dependence between a married couple may be significant in risk management of joint-life policies. This dissertation progressively explores Markovian models in pricing and risk management of joint-life policies, illuminating their advantages in dependent modeling of joint time-until-death (or other exit time) random variables. This dissertation argues that in the dependent modeling of joint-life dependence, Markovian models are flexible, transparent, and easily extended.
Multiple state models have been widely used in historical data analysis, particularly in the modeling of failures that have event-related dependence. This dissertation introduces a "common shock" factor into a standard Markov joint-life mortality model, and then extends it to a semi-Markov model to capture the decaying effect of the "broken heart" factor. The proposed models transparently and intuitively measure the extent of three types of dependence: the instantaneous dependence, the short-term impact of bereavement, and the long-term association between lifetimes. Some copula-based dependence measures, such as upper tail dependence, can also be derived from Markovian approaches.
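As a small numerical sketch of the underlying structure, the code below sets up the standard four-state joint-life Markov model (1 = both alive, 2 = husband only alive, 3 = wife only alive, 4 = both dead) with a common shock intensity allowing simultaneous deaths, and approximates the transition probability matrix. The constant intensities are arbitrary assumptions; the thesis works with semi-Markov and stochastic-intensity extensions of this structure.

```r
# Four-state joint-life Markov model with a common shock intensity lambda12.
mu_h <- 0.02        # husband's force of mortality
mu_w <- 0.015       # wife's force of mortality
lambda12 <- 0.002   # common shock: both die simultaneously

Q <- rbind(c(-(mu_h + mu_w + lambda12), mu_w, mu_h, lambda12),
           c(0, -mu_h, 0, mu_h),
           c(0, 0, -mu_w, mu_w),
           c(0, 0, 0, 0))

# P(t) = exp(Qt), approximated here by an Euler product to stay in base R
transition_prob <- function(Q, t, steps = 10000) {
  P <- diag(nrow(Q))
  step <- diag(nrow(Q)) + Q * (t / steps)
  for (i in seq_len(steps)) P <- P %*% step
  P
}

P10 <- transition_prob(Q, t = 10)
P10[1, ]        # joint status probabilities after 10 years, starting from both alive
1 - P10[1, 4]   # probability at least one spouse is still alive (last survivor)
```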
Very often, death is not the only mode of decrement. Entry into long-term care and voluntary prepayment, for instance, can affect reverse mortgage terminations. The semi-Markov joint-life model is extended to incorporate more exit modes, to model joint-life reverse mortgage termination speed. The event-triggered dependence between a husband and wife is modeled. For example, one spouse's death increases the survivor's inclination to move close to kin. We apply the proposed model specifically to develop the valuation formulas for roll-up mortgages in the UK and Home Equity Conversion Mortgages in the US. We test the significance of each termination mode and then use the model to investigate the mortgage insurance premiums levied on Home Equity Conversion Mortgage borrowers.
Finally, this thesis extends the semi-Markov joint-life mortality model to having stochastic transition intensities, for modeling joint-life longevity risk in last-survivor annuities. We propose a natural extension of Gompertz' law to have correlated stochastic dynamics for its two parameters, and incorporate it into the semi-Markov joint-life mortality model. Based on this preliminary joint-life longevity model, we examine the impact of mortality improvement on the cost of a last survivor annuity, and investigate the market prices of longevity risk in last survivor annuities using risk-neutral pricing theory.
|