41
Comparative approaches to handling missing data, with particular focus on multiple imputation for both cross-sectional and longitudinal models. Hassan, Ali Satty Ali. January 2012.
Much data-based research is characterized by the unavoidable problem of incompleteness
as a result of missing or erroneous values. This thesis discusses some of the
various strategies and basic issues in statistical data analysis for addressing the missing
data problem, and deals with both missing covariates and missing outcomes.
We restrict our attention to methodologies which address a specific
missing data pattern, namely monotone missingness.
The thesis is divided into two parts. The first part places particular emphasis on
the so-called missing at random (MAR) assumption, but focuses the bulk of its attention
on multiple imputation techniques. The main aim of this part is to investigate various
modelling techniques using application studies, to identify the most appropriate
techniques, and to gain insight into their appropriateness for the analysis of
incomplete data. The thesis first deals with the problem of missing
covariate values when estimating regression parameters under a monotone missing covariate
pattern. The study is devoted to a comparison of different imputation techniques,
namely Markov chain Monte Carlo (MCMC), regression, propensity score (PS) and last
observation carried forward (LOCF). The results from the application study revealed
that, when the missing data pattern is monotone, some imputation methods handle missing
covariates consistently better than others. Of the methods explored, the MCMC and regression methods
of imputation for estimating regression parameters with monotone missingness were
preferable to the PS and LOCF methods. The study is also concerned with a comparative
analysis of the techniques applied to incomplete Gaussian longitudinal outcome
or response data due to random dropout. Three different methods are assessed and
investigated, namely multiple imputation (MI), inverse probability weighting (IPW)
and direct likelihood analysis. The findings in general favoured MI over IPW in the
case of continuous outcomes, even when the MAR mechanism holds. The findings further suggest that the use of MI and direct likelihood techniques leads to accurate and
equivalent results, as both techniques arrive at the same substantive conclusions. The
study also compares and contrasts several statistical methods for analyzing incomplete
non-Gaussian longitudinal outcomes when the underlying study is subject to ignorable
dropout. The methods considered include weighted generalized estimating equations
(WGEE), multiple imputation after generalized estimating equations (MI-GEE) and
generalized linear mixed model (GLMM). The current study found that the MI-GEE
method was considerably robust, performing better than all the other methods for both
small and large sample sizes, regardless of the dropout rate.
The primary interest of the second part of the thesis falls under the non-ignorable
dropout (MNAR) modelling frameworks that rely on sensitivity analysis in modelling
incomplete Gaussian longitudinal data. The aim of this part is to deal with non-random
dropout by explicitly modelling the assumptions behind the dropout process,
incorporating this additional sub-model into the model for the measurement data, and
assessing the sensitivity of the modelling assumptions. The study pays attention to
the analysis of repeated Gaussian measures subject to potentially non-random dropout,
in order to study the influence that the dropout process might exert on inference.
We consider the construction of a particular type of selection model,
namely the Diggle-Kenward model, as a tool for assessing the sensitivity of a selection
model in terms of the modelling assumptions. The major conclusions drawn were that
there was evidence in favour of the MAR process rather than an MCAR process in
the context of the assumed model. In addition, there was the need to obtain further
insight into the data by comparing various sensitivity analysis frameworks. Lastly,
two families of models were also compared and contrasted to investigate the potential
influence on inference that dropout might exert on the dependent measurement
data considered, and to deal with incomplete sequences. The models were based on
the selection and pattern-mixture frameworks, used for sensitivity analysis to jointly model
the distribution of the dropout process and longitudinal measurement process. The
results of the sensitivity analysis were in agreement and hence led to similar parameter
estimates. Additional confidence in the findings was gained as both models led to
similar results for significant effects such as marginal treatment effects. / Thesis (M.Sc.)-University of KwaZulu-Natal, Pietermaritzburg, 2012.
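As a rough illustration of the multiple imputation idea compared in the first part of the thesis, the sketch below carries out regression-based multiple imputation for a single monotonely missing covariate and pools the regression estimates with Rubin's rules. It is a minimal sketch only, using simulated data, an arbitrary number of imputations and simple linear models; it is not the imputation procedure or the application data used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (hypothetical): outcome y on covariates x1 (complete) and x2
# (monotonely missing: once missing, it stays missing for later-ordered units).
n = 300
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.8, size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)
missing = np.arange(n) > int(0.7 * n)           # last 30% of x2 unobserved
x2_obs = np.where(missing, np.nan, x2)

def ols(X, y):
    """Least-squares fit returning coefficients, their covariance and sigma^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    dof = X.shape[0] - X.shape[1]
    sigma2 = np.sum((y - X @ beta) ** 2) / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, cov, sigma2

M = 20                                          # number of imputations (arbitrary)
estimates, variances = [], []
obs = ~missing
# Imputation model x2 | x1, fitted on the complete cases.
a, a_cov, a_sig2 = ols(np.column_stack([np.ones(obs.sum()), x1[obs]]), x2_obs[obs])
for _ in range(M):
    # "Proper" imputation: draw imputation-model coefficients, then draw x2
    # (the residual variance is kept fixed here for brevity).
    a_draw = rng.multivariate_normal(a, a_cov)
    x2_imp = x2_obs.copy()
    x2_imp[missing] = (a_draw[0] + a_draw[1] * x1[missing]
                       + rng.normal(scale=np.sqrt(a_sig2), size=missing.sum()))
    X = np.column_stack([np.ones(n), x1, x2_imp])
    beta, cov, _ = ols(X, y)
    estimates.append(beta)
    variances.append(np.diag(cov))

# Rubin's rules: pooled estimate, within- and between-imputation variance.
est = np.mean(estimates, axis=0)
W = np.mean(variances, axis=0)
B = np.var(estimates, axis=0, ddof=1)
total_var = W + (1 + 1 / M) * B
print("pooled coefficients:", est)
print("pooled standard errors:", np.sqrt(total_var))
```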
42
Likelihood based statistical methods for estimating HIV incidence rate. Gabaitiri, Lesego. January 2013.
Estimation of current levels of human immunodeficiency virus (HIV) incidence is essential
for monitoring the impact of an epidemic, determining public health priorities,
assessing the impact of interventions and for planning purposes. However, there is
often insufficient data on incidence as compared to prevalence. A direct approach
is to estimate incidence from longitudinal cohort studies. Although this approach
can provide a direct and unbiased measure of incidence for the settings where the study is
conducted, it is often too expensive and time-consuming. An alternative approach is
to estimate incidence from cross-sectional surveys using biomarkers that distinguish
between recent and non-recent/long-standing infections. The original biomarker-based
approach proposes the detection of HIV-1 p24 antigen in the pre-seroconversion period
to identify persons with acute infection for estimating HIV incidence. However,
this approach requires large sample sizes in order to obtain reliable estimates of HIV
incidence, because the duration of antigenemia before antibody detection is short,
about 22.5 days. Subsequently, another method, involving a dual antibody testing
system, was developed. In stage one, a sensitive test is used to diagnose HIV infection,
and a less sensitive test is used in the second stage to distinguish between long-standing
and recent infections among those who tested positive for HIV
in stage one. The question is: how do we combine these data with other relevant information,
such as the time an individual takes to go from being undetectable to being detectable
by the less sensitive test, to estimate incidence?
The main objective of this thesis is therefore to develop likelihood-based methods
that can be used to estimate HIV incidence when data are derived from cross-sectional
surveys and the disease classification is achieved by combining two biomarker or
assay tests. The thesis builds on the dual antibody testing approach and extends the
statistical framework that uses the multinomial distribution to derive the maximum
likelihood estimators of HIV incidence for different settings.
In order to improve incidence estimation, we develop a model for estimating HIV
incidence that incorporates information on past prevalence, and derive
maximum likelihood estimators of incidence assuming the incidence density is constant
over a specified period. Later, we extend the method to settings where a proportion
of subjects remain non-reactive to a less sensitive test long after seroconversion.
Diagnostic tests used to determine recent infections are prone to errors. To address
this problem, we considered a method that simultaneously adjusts for
sensitivity and specificity. In addition, we also showed that the sensitivity is similar to
the proportion of subjects who eventually pass through the “recent infection” state.
We also relax the assumption of constant incidence density by proposing a linear incidence
density to accommodate settings where incidence might be declining or increasing.
We extend the standard adjusted model for estimating incidence to settings where
some subjects who tested positive for HIV antibodies were not tested by the less sensitive
test, resulting in missing outcome data. Models for the risk factors (covariates)
of HIV incidence are considered in the penultimate chapter. We used data from
the Botswana AIDS Impact Survey (BAIS) III of 2008 to illustrate the proposed methods. The
general conclusions and recommendations for future work are provided in the final
chapter. / Thesis (Ph.D.)-University of KwaZulu-Natal, Pietermaritzburg, 2013.
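A highly simplified version of the cross-sectional, biomarker-based incidence estimation described above is sketched below: survey respondents are classified as HIV-negative, recently infected or long-standing infected, and incidence is estimated from the count of recent infections via the snapshot relation lambda ≈ R / (omega × N_neg), with a multinomial bootstrap confidence interval. The counts and the assumed mean window period omega are hypothetical, and the sketch omits the prevalence information, misclassification adjustments and missing-data extensions developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical cross-sectional survey counts (not from BAIS III):
n_neg = 8200      # HIV-negative
n_recent = 45     # HIV-positive, non-reactive on the less sensitive test ("recent")
n_long = 1350     # HIV-positive and reactive ("long-standing")
omega = 0.5       # assumed mean window period (years) spent in the "recent" state

def incidence(n_neg, n_recent, omega):
    """Simple snapshot estimator: recent infections accrue at rate
    lambda * omega per susceptible person, so lambda ~= R / (omega * N_neg)."""
    return n_recent / (omega * n_neg)

lam_hat = incidence(n_neg, n_recent, omega)

# Multinomial (parametric) bootstrap for a percentile confidence interval.
n = n_neg + n_recent + n_long
p_hat = np.array([n_neg, n_recent, n_long]) / n
boot = []
for _ in range(5000):
    b_neg, b_recent, b_long = rng.multinomial(n, p_hat)
    boot.append(incidence(b_neg, b_recent, omega))
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"estimated incidence: {lam_hat:.4f} per person-year")
print(f"95% bootstrap CI: ({lo:.4f}, {hi:.4f})")
```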
43
Statistical methods for analysing complex survey data: an application to HIV/AIDS in Ethiopia. Mohammed, Mohammed O. M. 12 February 2014.
The HIV/AIDS pandemic is currently the most challenging public health matter that
faces third world countries, especially those in Sub-Saharan Africa. Ethiopia, in East
Africa, with a generalised and highly heterogeneous epidemic, is no exception, with
HIV/AIDS affecting most sectors of the economy. The first case of HIV in Ethiopia
was reported in 1984. Since then, HIV/AIDS has become a major public health concern,
leading the Government of Ethiopia to declare a public health emergency in
2002. In 2011, the adult HIV/AIDS prevalence in Ethiopia was estimated at 1.5%.
Approximately 1.2 million Ethiopians were living with HIV/AIDS in 2010.
Surveys are an important and popular tool for collecting data. Analytical use of survey
data, especially health survey data, has become very common, with a focus on the association of particular outcome variables with explanatory variables at the population
level. In this study we used the data from the 2005 Ethiopian Demographic and Health
Survey (EDHS 2005), and identified key demographic, socioeconomic, sociocultural,
behavioral and proximate determinants of HIV/AIDS risk. Most survey
analysts ignore complex survey design issues such as clustering, stratification and unequal probabilities of selection (weights). This study takes the design aspect
into account, because failure to do so leads to biased parameter estimates and standard errors, wide confidence intervals and incorrect statistical tests.
In this study, three statistical approaches were used to analyse the complex survey
data. The first approach was survey logistic regression, used to model the binary
outcome (HIV serostatus) in terms of a set of explanatory variables (the HIV risk
factors). The difference between survey logistic regression and ordinary
logistic regression is that the survey logistic regression approach takes the study design
into account during the analysis. The second approach was a multilevel logistic regression
model, which assumed that the data structure in the population was hierarchical, and
that individuals within households were selected from clusters that were randomly selected
from a national sampling frame. We considered a three-level model for our analysis.
This second approach considered results from both frequentist and Bayesian multilevel
models. Bayesian methods can provide accurate estimates of the parameters and of the
uncertainty associated with them. The third approach was a spatial modelling
approach, in which model parameters were estimated under the Integrated Nested Laplace
Approximation (INLA) paradigm. / Thesis (Ph.D.)-University of KwaZulu-Natal, Pietermaritzburg, 2013.
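The sketch below illustrates, on simulated data, why the survey design matters in the first (survey logistic regression) approach: the regression coefficients are estimated by weighted (pseudo maximum likelihood) logistic regression, and a linearized variance estimator that sums score contributions within primary sampling units is compared with the naive model-based variance. The data, weights and cluster structure are invented, stratification is ignored for brevity, and this is not the EDHS 2005 analysis itself.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical survey data: y = binary outcome, X = design matrix, w = sampling
# weights, psu = primary sampling unit (cluster) identifiers.
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.integers(0, 2, size=n)])
psu = rng.integers(0, 50, size=n)
w = rng.uniform(0.5, 3.0, size=n)               # unequal selection probabilities
true_beta = np.array([-2.0, 0.6, 0.8])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))

def weighted_logit(X, y, w, n_iter=50):
    """Pseudo maximum likelihood: Newton-Raphson on the weighted log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = -(X * (w * p * (1 - p))[:, None]).T @ X
        beta = beta - np.linalg.solve(hess, grad)
    return beta, hess

beta_hat, hess = weighted_logit(X, y, w)

# Linearized (Taylor series) variance estimate: sum score contributions within
# each PSU and form a sandwich around the inverse of the negative Hessian.
p = 1 / (1 + np.exp(-X @ beta_hat))
scores = X * (w * (y - p))[:, None]
psu_totals = np.array([scores[psu == g].sum(axis=0) for g in np.unique(psu)])
G = psu_totals.shape[0]
centred = psu_totals - psu_totals.mean(axis=0)
meat = (G / (G - 1)) * centred.T @ centred
bread = np.linalg.inv(-hess)
cov_design = bread @ meat @ bread

print("coefficients:", beta_hat)
print("design-based SEs:", np.sqrt(np.diag(cov_design)))
print("naive (model-based) SEs:", np.sqrt(np.diag(bread)))
```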
44
Multivariate analysis of the BRICS financial markets. Ijumba, Claire. January 2013.
The co-movements and integration of financial markets have been a subject of great concern among
many researchers and economists, due to an interest in the impacts of stock market integration in
terms of international portfolio diversification, asset allocation and asset pricing efficiency. Understanding
the interdependence among financial markets is thus of immense importance especially to
investors and stakeholders in making viable decisions, managing risks and monitoring portfolio performances.
In this thesis, we investigated the levels of interdependence and dynamic linkages among
the five emerging economies known as the BRICS: Brazil, Russia, India, China and South Africa,
using vector autoregressive (VAR), univariate GARCH(1,1) and multivariate GARCH models. Our
data sample consisted of the BRICS weekly returns from the period of January 2000 to December
2012. We used a VAR model to examine the linear dependence among the BRICS markets. The
results from the VAR model analysis provided some evidence of unidirectional linear dependencies
of the Indian and Chinese markets on the Brazilian stock market. The univariate GARCH(1,1) and
multivariate GARCH models were employed to explore the volatility and dynamic correlation in the
BRICS stock returns respectively. The results of the univariate GARCH model suggested volatility
persistence among all the BRICS stock returns where China appeared to be the most volatile
followed by the Russian stock market while the South African market was found to be the least
volatile. Results from the multivariate GARCH models revealed similar volatility persistence. Furthermore,
we found that the correlations among the five emerging markets varied with time. From
this study, evidence of interdependence among the BRICS cannot be rejected. Moreover, it appears
that there are other factors apart from the internal markets themselves that may affect the volatility
and correlation among the BRICS. / M.Sc. University of KwaZulu-Natal, Durban, 2013.
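As a small illustration of the univariate GARCH(1,1) component of the analysis above, the following sketch fits a Gaussian GARCH(1,1) model to a simulated return series by numerically maximizing the likelihood. The series, starting values and optimizer choice are arbitrary and not taken from the BRICS data.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# Simulated weekly returns (hypothetical, not the BRICS series used in the thesis).
T = 700
omega_t, alpha_t, beta_t = 0.05, 0.08, 0.90
r = np.empty(T)
h = omega_t / (1 - alpha_t - beta_t)            # unconditional variance
for t in range(T):
    r[t] = np.sqrt(h) * rng.normal()
    h = omega_t + alpha_t * r[t] ** 2 + beta_t * h

def garch11_negloglik(params, r):
    """Gaussian GARCH(1,1) negative log-likelihood with variance recursion
    h_t = omega + alpha * r_{t-1}^2 + beta * h_{t-1}."""
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf
    h = np.empty_like(r)
    h[0] = np.var(r)                            # initialise with the sample variance
    for t in range(1, len(r)):
        h[t] = omega + alpha * r[t - 1] ** 2 + beta * h[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi * h) + r ** 2 / h)

res = minimize(garch11_negloglik, x0=[0.1, 0.1, 0.8], args=(r,), method="Nelder-Mead")
omega_hat, alpha_hat, beta_hat = res.x
print(f"omega={omega_hat:.4f}, alpha={alpha_hat:.4f}, beta={beta_hat:.4f}")
print(f"volatility persistence (alpha + beta) = {alpha_hat + beta_hat:.3f}")
```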
45
Robust principal component analysis biplots. Wedlake, Ryan Stuart.
Thesis (MSc (Mathematical Statistics))--University of Stellenbosch, 2008. / In this study several procedures for finding robust principal components (RPCs) for low and high dimensional data sets are investigated in parallel with robust principal component analysis (RPCA) biplots. These RPCA biplots will be used for the simultaneous visualisation of the observations and variables in the subspace spanned by the RPCs. Chapter 1 contains: a brief overview of the difficulties that are encountered when graphically investigating patterns and relationships in multidimensional data and why PCA can be used to circumvent these difficulties; the objectives of this study; a summary of the work done in order to meet these objectives; certain results in matrix algebra that are needed throughout this study.

In Chapter 2 the derivation of the classic sample principal components (SPCs) is first discussed in detail since they are the 'building blocks' of classic principal component analysis (CPCA) biplots. Secondly, the traditional CPCA biplot of Gabriel (1971) is reviewed. Thirdly, modifications to this biplot using the new philosophy of Gower & Hand (1996) are given attention. Reasons why this modified biplot has several advantages over the traditional biplot – some of which are aesthetical in nature – are given. Lastly, changes that can be made to the Gower & Hand (1996) PCA biplot to optimally visualise the correlations between the variables are discussed.

Because the SPCs determine the position of the observations as well as the orientation of the arrows (traditional biplot) or axes (Gower and Hand biplot) in the PCA biplot subspace, it is useful to give estimates of the standard errors of the SPCs together with the biplot display as an indication of the stability of the biplot. A computer-intensive statistical technique called the Bootstrap is firstly discussed that is used to calculate the standard errors of the SPCs without making underlying distributional assumptions. Secondly, the influence of outliers on Bootstrap results is investigated. Lastly, a robust form of the Bootstrap is briefly discussed for calculating standard error estimates that remain stable with or without the presence of outliers in the sample. All the preceding topics are the subject matter of Chapter 3.

In Chapter 4, reasons why a PC analysis should be made robust in the presence of outliers are firstly discussed. Secondly, different types of outliers are discussed. Thirdly, a method for identifying influential observations and a method for identifying outlying observations are investigated. Lastly, different methods for constructing robust estimates of location and dispersion for the observations receive attention. These robust estimates are used in numerical procedures that calculate RPCs.

In Chapter 5, an overview of some of the procedures that are used to calculate RPCs for lower and higher dimensional data sets is firstly discussed. Secondly, two numerical procedures that can be used to calculate RPCs for lower dimensional data sets are discussed and compared in detail. Details and examples of robust versions of the Gower & Hand (1996) PCA biplot that can be constructed using these RPCs are also provided.

In Chapter 6, five numerical procedures for calculating RPCs for higher dimensional data sets are discussed in detail. Once RPCs have been obtained by using these methods, they are used to construct robust versions of the PCA biplot of Gower & Hand (1996). Details and examples of these robust PCA biplots are also provided. An extensive software library has been developed so that the biplot methodology discussed in this study can be used in practice. The functions in this library are given in an appendix at the end of this study. This software library is used on data sets from various fields so that the merit of the theory developed in this study can be visually appraised.
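A minimal sketch of the biplot machinery discussed above: principal component scores and variable loadings are obtained from the singular value decomposition of the centred and scaled data, once with the classical mean and standard deviation and once with a crude robust alternative (median and MAD). This is only an illustration on simulated data; it is not one of the specific RPC procedures compared in the study, which robustify the component estimation itself rather than only the centring and scaling.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data with a few gross outliers (hypothetical example).
n, p = 100, 4
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
X[:5] += 15.0                                   # contaminate the first five rows

def biplot_coordinates(X, center, scale):
    """Return 2-D row (observation) and column (variable) biplot coordinates
    from the SVD of the centred and scaled data matrix."""
    Z = (X - center) / scale
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    rows = U[:, :2] * d[:2]                     # principal component scores
    cols = Vt[:2].T                             # variable loadings (axis directions)
    return rows, cols

# Classical biplot: mean / standard deviation.
rows_c, cols_c = biplot_coordinates(X, X.mean(axis=0), X.std(axis=0, ddof=1))

# Crude "robust" biplot: median / MAD (scaled for consistency under normality).
mad = 1.4826 * np.median(np.abs(X - np.median(X, axis=0)), axis=0)
rows_r, cols_r = biplot_coordinates(X, np.median(X, axis=0), mad)

print("classical loadings (first 2 PCs):\n", np.round(cols_c, 2))
print("robustly scaled loadings (first 2 PCs):\n", np.round(cols_r, 2))
```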
46
Bayesian approaches of Markov models embedded in unbalanced panel data. Muller, Christoffel Joseph Brand.
Thesis (PhD)--Stellenbosch University, 2012. / ENGLISH ABSTRACT: Multi-state models are used in this dissertation to model panel data, also known as longitudinal
or cross-sectional time-series data. These are data sets which include units that are observed
across two or more points in time. These models have been used extensively in medical studies
where the disease states of patients are recorded over time.
A theoretical overview of the current multi-state Markov models when applied to panel data
is presented and, based on this theory, a simulation procedure is developed to generate panel
data sets for given Markov models. Through the use of this procedure a simulation study
is undertaken to investigate the properties of the standard likelihood approach when fitting
Markov models and then to assess its shortcomings. One of the main shortcomings highlighted
by the simulation study is the unstable estimates obtained by the standard likelihood models,
especially when fitted to small data sets.
A Bayesian approach is introduced to develop multi-state models that can overcome these
unstable estimates by incorporating prior knowledge into the modelling process. Two Bayesian
techniques are developed and presented, and their properties are assessed through the use of
extensive simulation studies.
Firstly, Bayesian multi-state models are developed by specifying prior distributions for the
transition rates, constructing a likelihood using standard Markov theory and then obtaining
the posterior distributions of the transition rates. A selected few priors are used in these
models. Secondly, Bayesian multi-state imputation techniques are presented that make use
of suitable prior information to impute missing observations in the panel data sets. Once
imputed, standard likelihood-based Markov models are fitted to the imputed data sets to
estimate the transition rates. Two different Bayesian imputation techniques are presented.
The first approach makes use of the Dirichlet distribution and imputes the unknown states at
all time points with missing observations. The second approach uses a Dirichlet process to
estimate the time at which a transition occurred between two known observations and then a
state is imputed at that estimated transition time.
The simulation studies show that these Bayesian methods yield more stable results, even
when small samples are available. / AFRIKAANSE OPSOMMING: Multi-state models are used in this dissertation to model panel data, also known as
longitudinal or cross-sectional time-series data. These are data sets which include units
that are observed across two or more points in time. This type of model is often used in
medical studies when different stages of a disease are observed over time.
A theoretical overview of the current multi-state Markov models applied to panel data is
given. Based on this theory, a simulation procedure is developed to simulate panel data sets
for given Markov models. This procedure is then used in a simulation study
to investigate the properties of the standard likelihood approach to fitting Markov
models and then to assess any resulting shortcomings. One of the main
shortcomings highlighted by the simulation study is the unstable estimates that are
obtained when fitting to small data sets in particular.
A Bayesian approach to the modelling of multi-state panel data is developed to overcome this
instability by incorporating prior information into the modelling process. Two
Bayesian techniques are developed and presented, and their properties are investigated through a
comprehensive simulation study.
Firstly, Bayesian multi-state models are developed by specifying prior distributions for the
transition rates, constructing the likelihood function using standard
Markov theory and determining the posterior distributions of the transition rates.
A selected number of prior distributions are used in these models. Secondly, Bayesian multi-state
imputation techniques are proposed that make use of prior information to fill in, or impute,
missing values in the panel data sets. Once the values have been imputed,
standard Markov models are fitted to the imputed data set to estimate the transition rates.
Two different Bayesian multi-state imputation techniques are discussed. The first
technique makes use of a Dirichlet distribution to impute the missing state at all
time points with a missing observation. The second approach uses a Dirichlet process
to estimate the transition time between two observations and then imputes the missing state
at that estimated transition time.
The simulation studies show that the Bayesian methods yield results that are more stable, even
when small data sets are available.
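As a rough illustration of the standard likelihood machinery that the Bayesian methods above are designed to improve on, the sketch below simulates panel observations from a three-state continuous-time Markov model and evaluates the panel log-likelihood, in which the transition probability matrix over each observation interval is the matrix exponential of the intensity matrix. The states, intensities and observation times are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 3-state continuous-time Markov model (e.g. healthy / ill / dead),
# specified through a transition intensity matrix Q with rows summing to zero.
Q = np.array([[-0.15, 0.10, 0.05],
              [0.07, -0.20, 0.13],
              [0.00, 0.00, 0.00]])              # third state absorbing

rng = np.random.default_rng(6)

def simulate_panel(Q, times, n_units):
    """Simulate states observed only at the given panel times (jumps in between
    are not recorded), starting every unit in state 0."""
    records = []
    for _ in range(n_units):
        state, prev_t = 0, times[0]
        for t in times[1:]:
            P = expm(Q * (t - prev_t))
            probs = P[state] / P[state].sum()   # guard against rounding error
            new_state = rng.choice(len(Q), p=probs)
            records.append((state, new_state, t - prev_t))
            state, prev_t = new_state, t
    return records

def panel_loglik(Q, records):
    """Log-likelihood of observed state pairs: sum of log P(t)[from, to],
    with P(t) = expm(Q t) for each observation interval."""
    ll = 0.0
    for frm, to, dt in records:
        ll += np.log(expm(Q * dt)[frm, to])
    return ll

records = simulate_panel(Q, times=[0, 1, 2, 3, 4], n_units=200)
print("log-likelihood at the true intensities:", round(panel_loglik(Q, records), 2))
```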
47
Exploratory and inferential multivariate statistical techniques for multidimensional count and binary data with applications in R. Ntushelo, Nombasa Sheroline.
Thesis (MComm)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: The analysis of multidimensional (multivariate) data sets is a very important area of
research in applied statistics. Over the decades many techniques have been developed to
deal with such datasets. The multivariate techniques that have been developed include
inferential analysis, regression analysis, discriminant analysis, cluster analysis and many
more exploratory methods. Most of these methods deal with cases where the data contain
numerical variables. However, there are powerful methods in the literature that also deal
with multidimensional binary and count data.
The primary purpose of this thesis is to discuss the exploratory and inferential techniques
that can be used for binary and count data. In Chapter 2 of this thesis we give the detail of
correspondence analysis and canonical correspondence analysis. These methods are used
to analyze the data in contingency tables. Chapter 3 is devoted to cluster analysis. In this
chapter we explain four well-known clustering methods and we also discuss the distance
(dissimilarity) measures available in the literature for binary and count data. Chapter 4
contains an explanation of metric and non-metric multidimensional scaling. These
methods can be used to represent binary or count data in a lower dimensional Euclidean
space. In Chapter 5 we give a method for inferential analysis called the analysis of
distance. This method uses similar reasoning to the analysis of variance, but the
inference is based on a pseudo F-statistic, with the p-value obtained using permutations of
the data. Chapter 6 contains real-world applications of the above methods to two
special data sets, the Biolog data and the Barents Fish data.
The secondary purpose of the thesis is to demonstrate how the above techniques can be
performed in the software package R. Several R packages and functions are discussed
throughout this thesis. The usage of these functions is also demonstrated with appropriate
examples. Attention is also given to the interpretation of the output and graphics. The
thesis ends with some general conclusions and ideas for further research. / AFRIKAANSE OPSOMMING: The analysis of multidimensional (multivariate) data sets is an important area of
research in applied statistics. Over the past decades various techniques have been developed to
analyse such data. The multivariate techniques that have been developed include
inferential analysis, regression analysis, discriminant analysis, cluster analysis and many
more exploratory data analysis techniques. The majority of these methods handle cases
where the data contain numerical variables. There are also powerful methods in the
literature for the analysis of multidimensional binary and count data.
The primary aim of this thesis is to discuss techniques for the exploratory and inferential
analysis of binary and count data. In Chapter 2 of this thesis we discuss
correspondence analysis and canonical correspondence analysis. These methods are used
to analyse data in contingency tables. Chapter 3 contains techniques for cluster analysis. In
this chapter we explain four popular cluster analysis methods. We also discuss the distance
measures available in the literature for binary and count data. Chapter 4
contains an explanation of metric and non-metric multidimensional scaling. These
methods can be used to represent binary or count data in a low-dimensional Euclidean
space. In Chapter 5 we describe an inferential method known as the analysis of
distances. This method uses similar reasoning to the analysis of variance. The
inference here is based on a pseudo F-test statistic and the p-values are obtained by making
use of permutations of the data. Chapter 6 contains applications of the above techniques
to real data sets known as the Biolog data and the Barents Fish data.
The secondary aim of the thesis is to demonstrate how these techniques are carried out
in the R software. Various R packages and functions are discussed throughout the
thesis. The use of the functions is demonstrated with appropriate examples.
Attention is also given to the interpretation of the output and the graphics. The thesis
concludes with general conclusions and suggestions for further research.
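As a small companion to the correspondence analysis of Chapter 2 described above, the following sketch computes row and column principal coordinates from the singular value decomposition of the standardised residuals of a contingency table. It is written in Python rather than R, the contingency table is invented for illustration, and in R the same analysis could be obtained from dedicated packages.

```python
import numpy as np

# Hypothetical two-way contingency table of counts (rows = groups, cols = categories).
N = np.array([[30, 12,  8],
              [10, 25, 15],
              [ 5,  9, 36]], dtype=float)

def correspondence_analysis(N):
    """Simple correspondence analysis: SVD of the matrix of standardised
    residuals, returning principal row/column coordinates and inertias."""
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * d) / np.sqrt(r)[:, None]      # principal coordinates
    col_coords = (Vt.T * d) / np.sqrt(c)[:, None]
    inertia = d ** 2                      # principal inertias (eigenvalues)
    return row_coords, col_coords, inertia

rows, cols, inertia = correspondence_analysis(N)
print("share of inertia per dimension:", np.round(inertia / inertia.sum(), 3))
print("row coordinates (first 2 dims):\n", np.round(rows[:, :2], 3))
print("column coordinates (first 2 dims):\n", np.round(cols[:, :2], 3))
```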
48
Aspects of copulas and goodness-of-fit. Kpanzou, Tchilabalo Abozou.
Thesis (MComm (Statistics and Actuarial Science))--Stellenbosch University, 2008. / The goodness-of-fit of a statistical model describes how well it fits a set of observations. Measures
of goodness-of-fit typically summarize the discrepancy between observed values and the values
expected under the model in question. Such measures can be used in statistical hypothesis
testing, for example to test for normality, to test whether two samples are drawn from identical
distributions, or whether outcome frequencies follow a specified distribution. Goodness-of-fit
for copulas is a special case of the more general problem of testing multivariate models, but is
complicated due to the difficulty of specifying marginal distributions.
In this thesis, the goodness-of-fit test statistics for general distributions and the tests for copulas
are investigated, but prior to that an understanding of copulas and their properties is developed.
In fact copulas are useful tools for understanding relationships among multivariate variables, and
are important tools for describing the dependence structure between random variables. Several
univariate, bivariate and multivariate test statistics are investigated, the emphasis being on
tests for normality. Among goodness-of-fit tests for copulas, tests based on the probability integral
transform, Rosenblatt's transformation, as well as some dimension reduction techniques are
considered. Bootstrap procedures are also described. Simulation studies are conducted to first
compare the power of rejection of the null hypothesis of the Clayton copula by four different test
statistics under the alternative of the Gumbel-Hougaard copula, and also to compare the power
of rejection of the null hypothesis of the Gumbel-Hougaard copula under the alternative of the
Clayton copula. An application of the described techniques is made to a practical data set.
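A small illustration of the kind of copula goodness-of-fit testing discussed above: the Clayton copula parameter is estimated from pseudo-observations by inverting Kendall's tau, a Rosenblatt-type transform is applied, and the transformed margin is compared with the uniform distribution. The data are simulated, not the practical data set of the thesis, and a proper test of this kind would normally obtain p-values by parametric bootstrap rather than from a single Kolmogorov-Smirnov statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Simulate bivariate data from a Clayton copula (hypothetical sample).
theta_true, n = 2.0, 500
u = rng.uniform(size=n)
w = rng.uniform(size=n)
# Conditional inversion for the Clayton copula: draw v given u.
v = ((w ** (-theta_true / (1 + theta_true)) - 1) * u ** (-theta_true) + 1) ** (-1 / theta_true)

# Pseudo-observations (ranks scaled to (0, 1)), as used when margins are unknown.
u_hat = stats.rankdata(u) / (n + 1)
v_hat = stats.rankdata(v) / (n + 1)

# Estimate theta by inverting Kendall's tau: for Clayton, tau = theta / (theta + 2).
tau, _ = stats.kendalltau(u_hat, v_hat)
theta_hat = 2 * tau / (1 - tau)

def rosenblatt_clayton(u, v, theta):
    """Rosenblatt transform for the Clayton copula: Z1 = u and
    Z2 = dC(u, v)/du, which are independent Uniform(0,1) under the null."""
    z2 = u ** (-theta - 1) * (u ** (-theta) + v ** (-theta) - 1) ** (-1 / theta - 1)
    return u, z2

z1, z2 = rosenblatt_clayton(u_hat, v_hat, theta_hat)
ks_stat, p_value = stats.kstest(z2, "uniform")
print(f"theta_hat = {theta_hat:.2f}, KS statistic = {ks_stat:.3f}, p-value = {p_value:.3f}")
```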
49
'n Ondersoek na die eindige steekproefgedrag van inferensiemetodes in ekstreemwaarde-teorie. Van Deventer, Dewald.
Thesis (MComm (Statistics and Actuarial Science))--University of Stellenbosch, 2005. / Extremes are unusual or rare events. However, when such events – for example
earthquakes, tidal waves and market crashes – do take place, they typically cause
enormous losses, both in terms of human lives and monetary value. For this reason,
it is of critical importance to accurately model extremal events. Extreme value theory
entails the development of statistical models and techniques in order to describe and
model such rare observations.
In this document we discuss aspects of extreme value theory. This theory consists of
two approaches: the classical maxima method, based on the properties of the
maximum of a sample, and the more popular threshold theory, based upon the
properties of exceedances of a specified threshold value. This document provides
the practitioner with the theoretical and practical tools for both these approaches.
This will enable him/her to perform extreme value analyses with confidence.
Extreme value theory – for both approaches – is based upon asymptotic arguments.
For finite samples, the limiting result for the sample maximum holds only
approximately. Similarly, for finite choices of the threshold, the limiting distribution for
exceedances of that threshold holds only approximately. In this document we
investigate the quality of extreme value based inferences with regard to the unknown
underlying distribution when the sample size or threshold is finite. Estimation of
extreme tail quantiles of the underlying distribution, as well as the calculation of
confidence intervals, are typically the most important objectives of an extreme value
analysis. For that reason, we evaluate the accuracy of extreme value based inferences in
terms of these estimates. This investigation was carried out using a simulation study,
performed with the software package S-Plus.
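As a rough illustration of the threshold (peaks-over-threshold) approach referred to above, the sketch below fits a generalized Pareto distribution to exceedances of a chosen threshold by maximum likelihood and converts the fit into an extreme tail quantile estimate. The data, threshold choice and quantile level are arbitrary, and the sketch is in Python although the study itself used S-Plus.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)

# Hypothetical heavy-tailed sample (Student-t with 3 degrees of freedom).
n = 5000
x = stats.t.rvs(df=3, size=n, random_state=rng)

# Peaks-over-threshold: keep exceedances above a chosen high threshold.
u = np.quantile(x, 0.95)                  # threshold = empirical 95% quantile (arbitrary)
exceedances = x[x > u] - u
n_u = exceedances.size

# Fit the generalized Pareto distribution (GPD) to the exceedances by maximum
# likelihood, with the location parameter fixed at zero.
xi_hat, _, sigma_hat = stats.genpareto.fit(exceedances, floc=0)

def tail_quantile(p, u, xi, sigma, n, n_u):
    """POT estimate of the p-quantile (p close to 1):
    x_p = u + (sigma / xi) * (((n / n_u) * (1 - p)) ** (-xi) - 1)."""
    return u + (sigma / xi) * (((n / n_u) * (1 - p)) ** (-xi) - 1)

p = 0.999
print(f"threshold u = {u:.3f}, xi_hat = {xi_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
print(f"estimated {p:.1%} quantile: {tail_quantile(p, u, xi_hat, sigma_hat, n, n_u):.3f}")
print(f"empirical {p:.1%} quantile: {np.quantile(x, p):.3f}")
```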
50
Confidence intervals for estimators of welfare indices under complex sampling. Kirchoff, Retha.
Thesis (MComm (Statistics and Actuarial Science))--University of Stellenbosch, 2010. / ENGLISH ABSTRACT: The aim of this study is to obtain estimates and confidence intervals for welfare
indices under complex sampling. It begins by looking at sampling in general with
specific focus on complex sampling and weighting. For the estimation of the welfare
indices, two resampling techniques, viz. jackknife and bootstrap, are discussed.
They are used for the estimation of bias and standard error under simple random
sampling and complex sampling. Three confidence intervals are discussed, viz. standard
(asymptotic), percentile and bootstrap-t. An overview of welfare indices and
their estimation is given. The indices are categorized into measures of poverty and
measures of inequality. Two Laeken indices, viz. at-risk-of-poverty and quintile
share ratio, are included in the discussion. The study considers two poverty lines,
namely an absolute poverty line based on percy (ratio of total household income
to household size) and a relative poverty line based on equivalized income (ratio of
total household income to equivalized household size). The data set used as surrogate
population for the study is the Income and Expenditure Survey 2005/2006
conducted by Statistics South Africa, and details of it are provided and discussed.
An analysis of simulation data from the surrogate population was carried out using
techniques mentioned above and the results were graphed, tabulated and discussed.
Two issues were considered, namely whether the design of the survey should be taken
into account and whether resampling techniques provide reliable results, especially for
confidence intervals. The results were a mixed bag. Overall, however, it was found
that weighting showed promise in many cases, especially in the improvement of the
coverage probabilities of the confidence intervals. It was also found that the bootstrap
resampling technique was reliable (judging by the standard errors). Further
research options are mentioned as possible solutions towards the mixed results. / AFRIKAANSE OPSOMMING: The aim of the study is to obtain estimates and confidence intervals for
welfare indices under complex sampling. A general discussion of sampling is given,
with a specific focus on complex sampling and weighting. Two resampling techniques,
viz. jackknife and bootstrap resampling, are discussed as methods for estimating the
indices. These techniques are used for the estimation of bias as well as of standard errors
under simple random sampling and complex sampling. Three confidence intervals are discussed, viz.
the standard (asymptotic), the percentile and the bootstrap-t confidence intervals.
There is also an overview of welfare indices and their estimation. These welfare
indices form two categories, viz. measures of poverty and measures of inequality. Also
included in this discussion are the at-risk-of-poverty and quintile share ratio indices,
which form part of the Laeken indices. Two poverty lines, an absolute and a relative
line, are used in this study. The absolute poverty line is based on percy, the ratio of
the total household income to the household size, while the relative poverty line is based
on equivalized income, the ratio of the total household income to the equivalized
household size. The data set that served as surrogate population in this study is the
Income and Expenditure Survey of 2005/2006 conducted by Statistics South Africa.
Information regarding this survey is also given. Simulated data from the surrogate
population were analysed by means of the resampling techniques mentioned. The
results of the simulation are presented and discussed by means of graphs and tables.
Two questions arose from the simulation, viz. whether the design of a survey, and thus
weighting, ought to be taken into account, and whether the resampling techniques
deliver reliable results, especially in the case of the confidence intervals. The results
obtained varied considerably. It was, however, determined that weighting in general
delivered promising results in many of the cases, but not in all of them. In particular,
it improved the coverage probabilities of the confidence intervals. It was also
determined, by looking at the standard errors of the bootstrap estimators, that the
bootstrap technique delivered reliable results. Further research possibilities are
mentioned as potential improvements on the mixed results obtained.
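To make the welfare indices and the resampling ideas above concrete, the sketch below computes a weighted at-risk-of-poverty rate (share of the population below 60% of the weighted median income) and a quintile share ratio from simulated data, with percentile bootstrap confidence intervals obtained by resampling whole primary sampling units within strata. The data, weights and design structure are invented, and the simple PSU bootstrap shown here is cruder than the procedures evaluated in the study.

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical complex sample: income, weight, stratum and PSU per household member.
n = 4000
stratum = rng.integers(0, 5, size=n)
psu = stratum * 100 + rng.integers(0, 20, size=n)     # 20 PSUs per stratum
income = rng.lognormal(mean=8.0 + 0.1 * stratum, sigma=0.7, size=n)
weight = rng.uniform(1.0, 4.0, size=n)

def weighted_quantile(x, w, q):
    order = np.argsort(x)
    cw = np.cumsum(w[order])
    return np.interp(q * cw[-1], cw, x[order])

def welfare_indices(income, weight):
    """At-risk-of-poverty rate (below 60% of the weighted median) and an
    approximate quintile share ratio S80/S20 (top vs bottom 20% income share)."""
    poverty_line = 0.6 * weighted_quantile(income, weight, 0.5)
    arpr = np.sum(weight[income < poverty_line]) / np.sum(weight)
    q20 = weighted_quantile(income, weight, 0.2)
    q80 = weighted_quantile(income, weight, 0.8)
    s20 = np.sum((weight * income)[income <= q20])
    s80 = np.sum((weight * income)[income >= q80])
    return arpr, s80 / s20

arpr, qsr = welfare_indices(income, weight)

# Design-respecting bootstrap: resample whole PSUs with replacement within strata.
boot = []
for _ in range(200):
    idx = []
    for s in np.unique(stratum):
        psus = np.unique(psu[stratum == s])
        for g in rng.choice(psus, size=psus.size, replace=True):
            idx.append(np.flatnonzero(psu == g))
    idx = np.concatenate(idx)
    boot.append(welfare_indices(income[idx], weight[idx]))
boot = np.array(boot)
ci = np.percentile(boot, [2.5, 97.5], axis=0)

print(f"at-risk-of-poverty rate: {arpr:.3f}, 95% CI ({ci[0, 0]:.3f}, {ci[1, 0]:.3f})")
print(f"quintile share ratio:    {qsr:.2f}, 95% CI ({ci[0, 1]:.2f}, {ci[1, 1]:.2f})")
```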