71

Improving the accuracy of prediction using singular spectrum analysis by incorporating internet activity

Badenhorst, Dirk Jakobus Pretorius 03 1900 (has links)
Thesis (MComm)--Stellenbosch University, 2013. / ENGLISH ABSTRACT: Researchers and investors have been attempting to predict stock market activity for years. The possible financial gain that accurate predictions would offer has driven researchers of many kinds, but after repeated failures some began to hypothesize that such a goal is not only improbable but impossible. Previous predictions were based on historical data of the stock market activity itself and would often incorporate different types of auxiliary data. This auxiliary data ranged as far as imagination allowed, in an attempt to find some correlation, and some insight into the future, that could in turn lead to the figurative pot of gold. More often than not, the auxiliary data did not prove helpful. With the birth of the internet, however, endless new sources of auxiliary data presented themselves. In this thesis I propose that the near-infinite amount of data available on the internet could provide information that improves stock market predictions. With this goal in mind, the different sources of information available on the internet are considered. Previous studies on similar topics suggested ways in which internet activity can be measured, which might relate to stock market activity, and gave some insight into the advantages and disadvantages of these sources. These considerations are investigated in this thesis. Since much of this work is based on the prediction of a time series, it was necessary to choose a prediction algorithm. Previously used linear methods seemed too simple for prediction of stock market activity, and a newer non-linear method, Singular Spectrum Analysis (SSA), is therefore considered. A detailed study of this algorithm is done to ensure that it is an appropriate prediction methodology. Furthermore, since auxiliary information will be included, multivariate extensions of the algorithm are considered as well. Some of the inaccuracies and inadequacies of the current multivariate extensions are studied, and an alternative multivariate technique that addresses them is proposed and tested. With the methodology and the sources of auxiliary information chosen, a concluding chapter examines whether predictions that include auxiliary information obtained from the internet improve on baseline predictions based only on historical stock market data.
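Since the thesis rests on SSA as its prediction engine, a minimal sketch of basic univariate SSA may help fix ideas: embed the series in a trajectory matrix, take a truncated SVD, and reconstruct by diagonal averaging. This is an illustration on synthetic data only, not the multivariate extension the thesis proposes; the window and rank values are arbitrary choices.

```python
import numpy as np

def ssa_reconstruct(series, window, rank):
    """Basic univariate Singular Spectrum Analysis: embed the series in a
    Hankel (trajectory) matrix, take a truncated SVD, and reconstruct by
    anti-diagonal averaging."""
    n = len(series)
    k = n - window + 1
    # Trajectory (Hankel) matrix: lagged copies of the series as columns.
    X = np.column_stack([series[i:i + window] for i in range(k)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the leading `rank` components (the "signal" subspace).
    Xr = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    # Diagonal averaging (Hankelisation) back to a one-dimensional series.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        recon[j:j + window] += Xr[:, j]
        counts[j:j + window] += 1
    return recon / counts

# Noisy sine: a rank-2 SSA should recover the smooth oscillation.
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 20)
rng = np.random.default_rng(0)
noisy = signal + 0.3 * rng.standard_normal(200)
smooth = ssa_reconstruct(noisy, window=40, rank=2)
print(round(float(np.mean((smooth - signal) ** 2)), 4))
```

A forecast then follows by extrapolating in the low-rank subspace; the sketch stops at decomposition, which is the part the thesis studies in detail before extending it.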
72

PCA and CVA biplots : a study of their underlying theory and quality measures

Brand, Hilmarie 03 1900 (has links)
Thesis (MComm)--Stellenbosch University, 2013. / ENGLISH ABSTRACT: The main topics of study in this thesis are the Principal Component Analysis (PCA) and Canonical Variate Analysis (CVA) biplots, with the primary focus falling on the quality measures associated with these biplots. A detailed study of the different routes along which PCA and CVA can be derived precedes the study of the PCA biplot and CVA biplot respectively. Different perspectives on PCA and CVA highlight different aspects of the underlying theory and so contribute to a more solid understanding of these biplots and their interpretation. PCA is studied via the routes followed by Pearson (1901) and Hotelling (1933). CVA is studied from the perspectives of Linear Discriminant Analysis, Canonical Correlation Analysis, as well as a two-step approach introduced in Gower et al. (2011). The close relationship between CVA and Multivariate Analysis of Variance (MANOVA) also receives attention. An explanation of the construction of the PCA biplot follows the study of PCA, and thereafter an in-depth investigation of the quality measures of the PCA biplot and the relationships between these measures. Specific attention is given to the effect of standardisation on the PCA biplot and its quality measures. Following the study of CVA is an explanation of the construction of the weighted CVA biplot as well as two different unweighted CVA biplots based on the two-step approach to CVA. Specific attention is given to the effect that accounting for group sizes in the construction of the CVA biplot has on the representation of the group structure underlying a data set. It was found that larger groups tend to be better separated from other groups in the weighted CVA biplot than in the corresponding unweighted CVA biplots. Similarly, it was found that smaller groups tend to be separated to a greater extent from other groups in the unweighted CVA biplots than in the corresponding weighted CVA biplot. A detailed investigation of previously defined quality measures of the CVA biplot follows. It was found that the accuracy with which the group centroids of larger groups are approximated in the weighted CVA biplot is usually higher than in the corresponding unweighted CVA biplots. Three new quality measures that assess the accuracy of the Pythagorean distances in the CVA biplot are also defined. These assess, respectively, the accuracy of the Pythagorean distances between the group centroids, between the individual samples, and between the individual samples and group centroids.
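The overall quality of a two-dimensional PCA biplot is conventionally measured as the fraction of total variance captured by the first two principal components. A minimal sketch on synthetic data (the function name and test values are illustrative):

```python
import numpy as np

def pca_quality(X, r=2):
    """Centre X, do PCA via SVD, and return the overall quality of an
    r-dimensional PCA biplot: the fraction of total variance captured
    by the first r principal components."""
    Xc = X - X.mean(axis=0)
    _, s, _ = np.linalg.svd(Xc, full_matrices=False)
    eig = s ** 2                      # proportional to the PC variances
    return eig[:r].sum() / eig.sum()

rng = np.random.default_rng(1)
# Data with two dominant directions and small noise in the rest.
scores = rng.standard_normal((100, 2)) * [5.0, 3.0]
load = rng.standard_normal((2, 6))
X = scores @ load + 0.1 * rng.standard_normal((100, 6))
q = pca_quality(X, r=2)
print(round(q, 3))
```

Standardising the columns (dividing by their standard deviations) before the SVD yields the correlation-based variant, whose quality measures generally differ — the effect of standardisation that the abstract singles out.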
73

Modelling longitudinally measured HIV outcome biomarkers with immunogenetic parameters.

Bryan, Susan Ruth. January 2011 (has links)
According to the Joint United Nations Programme on HIV/AIDS (UNAIDS) 2009 AIDS epidemic update, there were 33.3 million (31.4 million–35.3 million) people living with HIV worldwide in 2009, the majority of them in Sub-Saharan Africa: 22.5 million (20.9 million–24.2 million). There were 1.8 million (1.6 million–2.0 million) new infections and 1.3 million (1.1 million–1.5 million) AIDS-related deaths in Sub-Saharan Africa in 2009 (UNAIDS, 2009). Statistical models and analysis are required to further understand the dynamics of HIV/AIDS and to design intervention and control strategies. Despite the prevalence of this disease, its pathogenesis is still poorly understood, and a thorough understanding of HIV and the factors that influence progression of the disease is required to prevent the further spread of the virus. Modelling provides a means to understand and predict the progression of the disease better. Certain genetic factors play a key role in the way the disease progresses in the human body. For example, HLA-B types and IL-10 genotypes are genetic factors that have been independently associated with the control of HIV infection. Both HLA-B and IL-10 may influence the quality and magnitude of immune responses, and IL-10 has also been shown to down-regulate the expression of certain HLA molecules. Studies are therefore required to investigate how HLA-B types and IL-10 genotypes may interact to affect HIV infection outcomes. This dissertation uses the Sinikithemba study data from the HIV Pathogenesis Programme (HPP) at the Medical School, University of KwaZulu-Natal, involving 450 HIV-positive and treatment-naive individuals, to model how certain outcome biomarkers (CD4+ counts and viral loads) are associated with immunogenetic parameters (HLA-B types and IL-10 genotypes). The work also seeks to exploit novel statistical methods for longitudinal data in order to model longitudinally measured HIV outcome data efficiently. Statistical techniques such as linear mixed models and generalized estimating equations were used to model these data. The findings from the current work agree quite closely with what is expected from the biological understanding of the disease. / Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2011.
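The dissertation fits linear mixed models and generalized estimating equations; as a minimal stand-in, the classical two-stage approach (per-subject least-squares slopes, then a between-group comparison) can be sketched on simulated trajectories. All numbers below are hypothetical and only illustrate the shape of such an analysis, not the Sinikithemba data:

```python
import numpy as np

rng = np.random.default_rng(2)
times = np.arange(0, 5)  # hypothetical visit times (years)

def simulate_subject(slope_mean):
    """Square-root-CD4-like trajectory with a random intercept and noise
    (illustrative values only)."""
    intercept = 22 + rng.normal(0, 2)
    slope = rng.normal(slope_mean, 0.3)
    return intercept + slope * times + rng.normal(0, 1, times.size)

def fitted_slope(y):
    # Ordinary least-squares slope of y on time for one subject.
    return np.polyfit(times, y, 1)[0]

# Two hypothetical genotype groups with different mean biomarker decline.
group_a = [fitted_slope(simulate_subject(-1.5)) for _ in range(80)]
group_b = [fitted_slope(simulate_subject(-0.5)) for _ in range(80)]
print(round(np.mean(group_a), 2), round(np.mean(group_b), 2))
```

A mixed model improves on this two-stage sketch by pooling information across subjects and weighting by within-subject precision, which is why the dissertation prefers it.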
74

Estimation and analysis of measures of disease for HIV infection in childbearing women using serial seroprevalence data.

Sewpaul, Ronel. January 2011 (has links)
The prevalence and the incidence are two primary epidemiological parameters in infectious disease modelling. The incidence is also closely related to the force of infection, or, in survival analysis terms, the hazard of infection; these two measures carry the same information about a disease because both measure the rate at which new infections occur. The disease prevalence gives the proportion of infected individuals in the population at a given time, while the incidence is the rate of new infections. The thesis discusses methods for estimating HIV prevalence, incidence rates and the force of infection, against age and time, using cross-sectional seroprevalence data for pregnant women attending antenatal clinics. The data were collected on women aged 12 to 47 in rural KwaZulu-Natal for each of the years 2001 to 2006. The generalized linear model for a binomial response is used extensively. First, logistic regression is used to estimate annual HIV prevalence by age. The estimated prevalence for each year increases with age, to peaks of between 36% and 57% in the mid-to-late twenties, before declining steadily toward the forties. Fitted prevalence for 2001 is lower than for the other years across all ages. Several models for estimating the force of infection, which measures the potential risk of infection per individual per unit time, are discussed and applied. The fitted force of infection rises with age to a peak of 0.074 at age 15, and then decreases toward higher ages. A proportional hazards model of the age to infection is applied to the data, and shows that additional variables such as partner's age and the number of previous pregnancies have a significant effect on the infection hazard. Studies for estimating incidence from multiple prevalence surveys are reviewed. The relative inclusion rate (RIR), which accounts for the fact that the probability of inclusion in a prevalence sample depends on the individual's HIV status, and its role in incidence estimation are discussed as a possible future extension of the current work. / Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2011.
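Under a logistic model for prevalence by age, the force of infection among those still susceptible follows from the catalytic relation lambda(a) = p'(a) / (1 - p(a)). A small sketch with illustrative (not fitted) coefficients:

```python
import math

# Hypothetical logistic prevalence-by-age model, p(a) = expit(b0 + b1*a);
# the coefficients below are illustrative, not the fitted KZN values.
b0, b1 = -4.0, 0.25

def prevalence(age):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))

def force_of_infection(age, h=1e-5):
    """Catalytic-model estimate: lambda(a) = p'(a) / (1 - p(a)),
    the hazard of infection among those still susceptible at age a."""
    dp = (prevalence(age + h) - prevalence(age - h)) / (2 * h)
    return dp / (1.0 - prevalence(age))

for a in (15, 20, 25):
    print(a, round(prevalence(a), 3), round(force_of_infection(a), 4))
```

For a logistic curve this force of infection is monotone increasing (it equals b1 times p(a)), so it cannot reproduce the peak at age 15 reported in the abstract — one reason the thesis considers several richer models for the force of infection.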
75

Statistical modelling and estimation of solar radiation.

Nzuza, Mphiliseni Bongani. 15 October 2014 (has links)
Solar radiation is a primary driving force behind a number of solar energy applications, such as photovoltaic systems for electricity generation. Hence, accurate modelling and prediction of the solar flux incident at a particular location is essential for the design and performance prediction of solar energy conversion systems. In this regard, the literature shows that time series models such as the Box-Jenkins Seasonal/Non-seasonal Autoregressive Integrated Moving Average (S/ARIMA) stochastic models have considerable efficacy for describing, monitoring and forecasting solar radiation data series at various sites on the earth's surface (see e.g. Reikard, 2009). This success is attributable to their ability to capture the stochastic component of the irradiance series due to the effects of ever-changing atmospheric conditions. At the top of the atmosphere, on the other hand, there are no such conditions, and deterministic models have been used successfully to model extra-terrestrial solar radiation. One such modelling procedure is the use of sinusoidal predictors at determined harmonic (Fourier) frequencies to capture the inherent periodicities (seasonalities) due to the diurnal cycle. We combine this deterministic model component with SARIMA models to construct harmonically coupled SARIMA (HCSARIMA) models for the resulting mixture of stochastic and deterministic components of solar radiation recorded at the earth's surface. A comparative study of these two classes of models is undertaken for the horizontal global solar irradiance incident on the solar panels at UKZN Howard College (UKZN HC), located at 29.9° South, 30.98° East, elevation 151.3 m. The results indicate that both SARIMA and HCSARIMA models describe the underlying data-generating processes well for all data series with respect to different diagnostics. In terms of predictive ability, the HCSARIMA models generally had a competitive edge over the SARIMA models. A tentative study of long-range dependence (long memory) shows this phenomenon to be inherent in the high-frequency data series; autoregressive fractionally integrated moving average (ARFIMA) models are therefore recommended for further studies on high-frequency irradiance. / M.Sc. University of KwaZulu-Natal, Durban 2014.
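The deterministic component of an HCSARIMA model is a harmonic regression: sine/cosine pairs at the diurnal frequency (and optionally its overtones) fitted by least squares, with the residual left for the SARIMA part. A sketch on synthetic hourly data, not the UKZN HC series:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hourly irradiance-like series: a diurnal harmonic plus noise
# (synthetic stand-in with arbitrary units).
t = np.arange(24 * 60, dtype=float)          # 60 days of hourly points
period = 24.0
signal = 300 + 250 * np.sin(2 * np.pi * t / period - 1.0)
y = signal + 20 * rng.standard_normal(t.size)

# Harmonic (Fourier) design matrix: sin/cos pairs at the diurnal
# frequency and its first overtone, fitted by ordinary least squares.
cols = [np.ones_like(t)]
for k in (1, 2):
    cols += [np.sin(2 * np.pi * k * t / period),
             np.cos(2 * np.pi * k * t / period)]
X = np.column_stack(cols)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
print(round(float(beta[0]), 1), round(float(resid.std()), 1))
```

In the HCSARIMA construction, `resid` — the stochastic component left after removing the deterministic harmonics — is what the SARIMA part would then model.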
76

A framework for estimating risk

Kroon, Rodney Stephen 03 1900 (has links)
Thesis (PhD (Statistics and Actuarial Sciences))--Stellenbosch University, 2008. / We consider the problem of model assessment by risk estimation. Various approaches to risk estimation are considered in a unified framework. This framework is an extension of a decision-theoretic framework proposed by David Haussler. Point and interval estimation based on test samples and training samples is discussed, with interval estimators being classified based on the measure of deviation they attempt to bound. The main contribution of this thesis is in the realm of training sample interval estimators, particularly covering number-based and PAC-Bayesian interval estimators. The thesis discusses a number of approaches to obtaining such estimators. The first type of training sample interval estimator to receive attention is estimators based on classical covering number arguments. A number of these estimators were generalized in various directions. Typical generalizations included: extension of results from misclassification loss to other loss functions; extending results to allow arbitrary ghost sample size; extending results to allow arbitrary scale in the relevant covering numbers; and extending results to allow an arbitrary parameter choice in the use of symmetrization lemmas. These extensions were applied to covering number-based estimators for various measures of deviation, as well as for the special cases of misclassification loss estimators, realizable case estimators, and margin bounds. Extended results were also provided for stratification by (algorithm- and data-dependent) complexity of the decision class. In order to facilitate application of these covering number-based bounds, a discussion of various complexity dimensions and approaches to obtaining bounds on covering numbers is also presented. The second type of training sample interval estimator discussed in the thesis is Rademacher bounds. These bounds use advanced concentration inequalities, so a chapter discussing such inequalities is provided. Our discussion of Rademacher bounds leads to the presentation of an alternative, slightly stronger, form of the core result used for deriving local Rademacher bounds, by avoiding a few unnecessary relaxations. Next, we turn to a discussion of PAC-Bayesian bounds. Using an approach developed by Olivier Catoni, we develop new PAC-Bayesian bounds based on results underlying Hoeffding's inequality. By utilizing Catoni's concept of "exchangeable priors", these results allowed the extension of a covering number-based result to averaging classifiers, as well as its corresponding algorithm- and data-dependent result. The last contribution of the thesis is the development of a more flexible shell decomposition bound: by using Hoeffding's tail inequality rather than Hoeffding's relative entropy inequality, we extended the bound to general loss functions, allowed the use of an arbitrary number of bins, and introduced between-bin and within-bin "priors". Finally, to illustrate the calculation of these bounds, we applied some of them to the UCI spam classification problem, using decision trees and boosted stumps.
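For the simplest case the framework covers — a test-sample interval estimate for a single fixed classifier — Hoeffding's tail inequality already gives a bound. A sketch with illustrative numbers; the thesis's training-sample bounds add capacity terms (covering numbers, Rademacher complexity, PAC-Bayesian divergences) on top of this:

```python
import math

def hoeffding_upper_bound(emp_risk, n, delta):
    """One-sided Hoeffding bound for a [0, 1]-valued loss: with
    probability at least 1 - delta, the true risk of a FIXED classifier
    lies below emp_risk + sqrt(log(1/delta) / (2n))."""
    return emp_risk + math.sqrt(math.log(1.0 / delta) / (2.0 * n))

# Illustrative: 120 errors on a test sample of 2000, 95% confidence.
bound = hoeffding_upper_bound(120 / 2000, n=2000, delta=0.05)
print(round(bound, 4))
```

The bound shrinks at rate 1/sqrt(n); training-sample estimators must replace the "fixed classifier" assumption with a uniform statement over the decision class, which is where the covering number and PAC-Bayesian machinery enters.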
77

Interest rate model theory with reference to the South African market

Van Wijck, Tjaart 03 1900 (has links)
Thesis (MComm (Statistics and Actuarial Science))--University of Stellenbosch, 2006. / An overview of modern and historical interest rate model theory is given, with the specific aim of derivative pricing. A variety of stochastic interest rate models are discussed within a South African market context. The models are compared with respect to characteristics such as mean reversion, positivity of interest rates, the volatility structures and yield curve shapes they can represent, and whether analytical bond and derivative prices can be found. The distribution of the interest rates implied by some of these models is also derived under various measures. The calibration of these models receives attention with respect to instruments available in the South African market, and problems associated with calibrating the modern models are discussed.
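Mean reversion, the first characteristic listed, is exhibited by the classical Vasicek short-rate model. The abstract does not single out a specific model, so the sketch below simulates Vasicek as a representative example with hypothetical parameters:

```python
import random

def simulate_vasicek(r0, kappa, theta, sigma, dt, steps, seed=4):
    """Euler scheme for the Vasicek short-rate model
    dr = kappa*(theta - r) dt + sigma dW: mean-reverting toward theta,
    but rates can go negative (one trade-off such comparisons weigh)."""
    rng = random.Random(seed)
    r, path = r0, [r0]
    for _ in range(steps):
        r += kappa * (theta - r) * dt + sigma * (dt ** 0.5) * rng.gauss(0, 1)
        path.append(r)
    return path

# Hypothetical parameters: start at 12%, long-run mean 8%, ten years daily.
path = simulate_vasicek(r0=0.12, kappa=0.8, theta=0.08,
                        sigma=0.01, dt=1 / 252, steps=252 * 10)
print(round(path[-1], 4))
```

Models such as Cox-Ingersoll-Ross modify the diffusion term to keep rates positive, at the cost of a different (non-Gaussian) rate distribution — exactly the kind of trade-off the thesis compares across models.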
78

An analysis of income and poverty in South Africa

Malherbe, Jeanine Elizabeth 03 1900 (has links)
Thesis (MComm (Statistics and Actuarial Science))--University of Stellenbosch, 2007. / The aim of this study is to assess the welfare of South Africa in terms of poverty and inequality. This is done using the Income and Expenditure Survey (IES) of 2000, released by Statistics South Africa, and reviewing the distribution of income in the country. A brief literature review of similar studies is given, along with a broad definition of poverty and inequality. A detailed description of the dataset used is given, together with aspects of concern surrounding the dataset. An analysis of poverty and income inequality is made using datasets containing the continuous income variable, as well as a derived grouped income variable. Results from these datasets are compared, and conclusions are drawn on the use of continuous versus grouped income variables. Covariate analysis is also applied in the form of biplots. A brief overview of biplots is given, and they are then used to obtain a graphical description of the data and to identify patterns. Lastly, the conclusions of the study are put forward and some future research is mentioned.
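The abstract does not name the specific inequality measures used; as one standard example of the kind of statistic such an analysis computes, the Gini coefficient can be obtained directly from individual incomes:

```python
def gini(incomes):
    """Gini coefficient from individual incomes: half the mean absolute
    difference between all pairs, divided by the mean income."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    # Equivalent O(n log n) form using the sorted-rank identity.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n

# Perfect equality gives 0; concentrating all income raises it toward 1.
print(round(gini([10, 10, 10, 10]), 3))
print(round(gini([0, 0, 0, 100]), 3))
```

Grouped income data, as compared in the study, would require a binned variant of such measures, which is one reason the continuous-versus-grouped comparison matters.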
79

Value at risk and expected shortfall : traditional measures and extreme value theory enhancements with a South African market application

Dicks, Anelda 12 1900 (has links)
Thesis (MComm)--Stellenbosch University, 2013. / ENGLISH ABSTRACT: Accurate estimation of Value at Risk (VaR) and Expected Shortfall (ES) is critical in the management of extreme market risks. These risks occur with small probability, but their financial impact can be large. Traditional models for estimating VaR and ES are investigated. Following usual practice, 99% 10-day VaR and ES measures are calculated. A comprehensive theoretical background is first provided, and the models are then applied to the Africa Financials Index from 29/01/1996 to 30/04/2013. The models considered include independent, identically distributed (i.i.d.) models and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) stochastic volatility models. Extreme Value Theory (EVT) models, which focus specifically on extreme market returns, are also investigated; for this, the Peaks Over Threshold (POT) approach to EVT is followed. For the calculation of VaR, various methods of scaling from one day to ten days are considered and their performance evaluated. The GARCH models fail to converge during periods of extreme returns, during which EVT forecasts may be used instead. As a novel approach, this study considers augmenting the GARCH models with EVT forecasts. The two-step procedure of pre-filtering with a GARCH model and then applying EVT, as suggested by McNeil (1999), is also investigated. The study identifies some of the practical issues in model fitting. It is shown that no single forecasting model is universally optimal: the choice depends on the nature of the data. For this data series, the best approach was to augment the GARCH stochastic volatility models with EVT forecasts during the periods where the former do not converge. Model performance is judged by comparing the actual number of VaR and ES violations to the expected number, taken as the number of return observations over the entire sample period multiplied by 0.01 for the 99% VaR and ES calculations.
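The baseline historical-simulation estimates of VaR and ES, together with the crude square-root-of-time scaling from one day to ten days that the scaling study benchmarks against, can be sketched as follows (synthetic heavy-tailed returns, not the Africa Financials Index):

```python
import numpy as np

def var_es(returns, level=0.99):
    """Historical-simulation VaR and ES at the given level: VaR is the
    empirical loss quantile; ES is the mean loss at or beyond VaR."""
    losses = -np.asarray(returns)          # losses are negated returns
    var = np.quantile(losses, level)
    es = losses[losses >= var].mean()
    return var, es

rng = np.random.default_rng(5)
# Synthetic heavy-tailed daily returns (Student t with 4 df).
rets = 0.01 * rng.standard_t(df=4, size=5000)
var1, es1 = var_es(rets, 0.99)
# Crude square-root-of-time scaling from 1 day to 10 days.
print(round(float(var1 * np.sqrt(10)), 4), round(float(es1 * np.sqrt(10)), 4))
```

ES always exceeds VaR at the same level, since it averages the losses beyond the quantile; square-root-of-time scaling is exact only for i.i.d. normal returns, which is why the thesis evaluates alternative scaling methods.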
80

The identification and application of common principal components

Pepler, Pieter Theo 12 1900 (has links)
Thesis (PhD)--Stellenbosch University, 2014. / ENGLISH ABSTRACT: When estimating the covariance matrices of two or more populations, the covariance matrices are often assumed to be either equal or completely unrelated. The common principal components (CPC) model provides an alternative which is situated between these two extreme assumptions: The assumption is made that the population covariance matrices share the same set of eigenvectors, but have di erent sets of eigenvalues. An important question in the application of the CPC model is to determine whether it is appropriate for the data under consideration. Flury (1988) proposed two methods, based on likelihood estimation, to address this question. However, the assumption of multivariate normality is untenable for many real data sets, making the application of these parametric methods questionable. A number of non-parametric methods, based on bootstrap replications of eigenvectors, is proposed to select an appropriate common eigenvector model for two population covariance matrices. Using simulation experiments, it is shown that the proposed selection methods outperform the existing parametric selection methods. If appropriate, the CPC model can provide covariance matrix estimators that are less biased than when assuming equality of the covariance matrices, and of which the elements have smaller standard errors than the elements of the ordinary unbiased covariance matrix estimators. A regularised covariance matrix estimator under the CPC model is proposed, and Monte Carlo simulation results show that it provides more accurate estimates of the population covariance matrices than the competing covariance matrix estimators. Covariance matrix estimation forms an integral part of many multivariate statistical methods. Applications of the CPC model in discriminant analysis, biplots and regression analysis are investigated. 
It is shown that, in cases where the CPC model is appropriate, CPC discriminant analysis provides significantly smaller misclassification error rates than both ordinary quadratic discriminant analysis and linear discriminant analysis. A framework for the comparison of different types of biplots for data with distinct groups is developed, and CPC biplots constructed from common eigenvectors are compared to other types of principal component biplots using this framework. A subset of data from the Vermont Oxford Network (VON), of infants admitted to participating neonatal intensive care units in South Africa and Namibia during 2009, is analysed using the CPC model. It is shown that the proposed non-parametric methodology offers an improvement over the known parametric methods in the analysis of this data set, which originated from a non-normally distributed multivariate population. CPC regression is compared to principal component regression and partial least squares regression in the fitting of models to predict neonatal mortality and length of stay for infants in the VON data set. The fitted regression models, using readily available day-of-admission data, can be used by medical staff and hospital administrators to counsel parents and improve the allocation of medical care resources. Predicted values from these models can also be used in benchmarking exercises to assess the performance of neonatal intensive care units in the Southern African context, as part of larger quality improvement programmes.
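The proposed non-parametric selection methods rest on bootstrap replications of eigenvectors. The snippet below is a hypothetical sketch of one ingredient of that idea — resampling each group and checking how well bootstrap eigenvectors of one group align with the eigenvector basis of the other — and is not the thesis's actual selection rule; the simulated populations and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, n_boot = 200, 3, 200

# Two simulated samples that genuinely share eigenvectors (the standard
# basis here) but have different eigenvalues, as under the CPC model.
x1 = rng.normal(size=(n, p)) * np.sqrt([9.0, 4.0, 1.0])
x2 = rng.normal(size=(n, p)) * np.sqrt([1.0, 5.0, 2.0])

def eigvecs(x):
    """Eigenvectors of the sample covariance matrix (columns, ascending)."""
    _, vecs = np.linalg.eigh(np.cov(x, rowvar=False))
    return vecs

cosines = []
for _ in range(n_boot):
    b1 = x1[rng.integers(0, n, n)]   # bootstrap resample of group 1
    b2 = x2[rng.integers(0, n, n)]   # bootstrap resample of group 2
    v1 = eigvecs(b1)[:, -1]          # leading eigenvector, group 1
    V2 = eigvecs(b2)                 # full eigenvector basis, group 2
    # |cos angle| between v1 and its best-matching eigenvector in group 2;
    # values near 1 across replications are consistent with a shared axis.
    cosines.append(np.max(np.abs(V2.T @ v1)))

print(f"mean |cos angle| over {n_boot} bootstrap reps: {np.mean(cosines):.3f}")
```

Because no normality assumption enters the resampling step, diagnostics of this kind remain usable for the non-normally distributed populations that motivate the thesis's move away from Flury's likelihood-based tests.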
