151 |
Statistical models of breast cancer tumour growth for mammography screening data
Abrahamsson, Linda, January 2012
No description available.
|
152 |
Regularised iterative multiple correspondence analysis in multiple imputation
Nienkemper, Johané, 07 August 2014
Non-responses in survey data are a prevalent problem. Various techniques for the handling of missing data have been studied and published. The application of a regularised iterative multiple correspondence analysis (RIMCA) algorithm in single imputation (SI) has been suggested for the handling of missing data in survey analysis.
Multiple correspondence analysis (MCA) as an imputation procedure is appropriate for survey data, since MCA is concerned with the relationships among the variables in the data. Therefore, missing data can be imputed by exploiting the relationship between observed and missing data.
The RIMCA algorithm expresses MCA as a weighted principal component analysis (PCA) of a data triplet consisting of a weighted data matrix, a metric and a diagonal matrix containing row masses. Performing PCA on such a triplet involves the generalised singular value decomposition of the weighted data matrix; standard singular value decomposition (SVD) does not suffice, since the weighting imposes constraints on the rows and columns.
The success of this algorithm lies in the fact that all eigenvalues are shrunk and the last components are omitted; thus a "double shrinkage" occurs, which reduces variance and stabilises predictions. RIMCA seems to overcome overfitting and underfitting problems with regard to categorical missing data in surveys.
The idea of applying the RIMCA algorithm in MI was appealing, since MI has advantages over SI, such as an increase in the accuracy of estimations and the attainment of valid inferences when combining multiple datasets.
The aim of this study was to establish the performance of RIMCA in MI. This was addressed through two objectives: to determine whether RIMCA in MI outperforms RIMCA in SI, and to determine the accuracy of predictions made from RIMCA in MI as an imputation model.
Real and simulated data were used. A simulation protocol was followed in which data were drawn from multivariate normal distributions with both high and low correlation structures. Varying percentages of missing values and different missingness mechanisms (missing completely at random (MCAR) and missing at random (MAR)) were created in the data, as is done by Josse et al. (2012).
The first objective was achieved by applying RIMCA in both SI and MI to real and simulated data. The performance of RIMCA in SI and MI was compared with regard to the obtained mean estimates and confidence intervals. In the case of the real data, the estimates were compared to the mean estimates of the incomplete data, whereas for the simulated data the true mean values and confidence intervals could be compared to the estimates obtained from the imputation procedures.
The second objective was achieved by calculating the apparent error rates of predictions made by the RIMCA algorithm in SI and MI in simulated datasets. Along with the apparent error rates, approximate overall success rates were calculated in order to establish the accuracy of imputations made by SI and MI.
The results of this study show that the confidence intervals provided by MI are wider in most of the cases, which confirms the incorporation of additional variance. It was found that for some of the variables the SI procedures were statistically different from the true confidence intervals, which shows that SI was not suitable for imputation in these instances. Overall, the mean estimates provided by MI were closer to the true values for both the simulated and real data. A summary of the bias, mean square errors and coverage for the imputation techniques over a thousand simulations was provided, which also confirmed that RIMCA in MI was a better model than RIMCA in SI in the contexts considered in this research.
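The core imputation step can be illustrated with a minimal numerical sketch. The code below is not the published RIMCA algorithm (which operates on the MCA indicator matrix and uses a different shrinkage rule); it is the continuous analogue, an iterative low-rank SVD imputation with shrunken singular values, intended only to convey the "double shrinkage" idea. All function names and tuning values are illustrative assumptions.

```python
import numpy as np

def regularised_iterative_pca_impute(X, n_components=2, reg=1.0,
                                     max_iter=200, tol=1e-6):
    """Impute missing entries of X (NaN) by iterative low-rank SVD with
    shrunken singular values -- a simplified, continuous analogue of the
    regularised iterative MCA step described in the abstract."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)            # start from column means
    X_hat = np.where(missing, col_means, X)
    for _ in range(max_iter):
        mu = X_hat.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_hat - mu, full_matrices=False)
        # shrink singular values and drop the trailing components
        s_shrunk = np.maximum(s[:n_components] - reg, 0.0)
        low_rank = (U[:, :n_components] * s_shrunk) @ Vt[:n_components] + mu
        X_new = np.where(missing, low_rank, X)   # overwrite only missing cells
        if np.linalg.norm(X_new - X_hat) < tol * max(np.linalg.norm(X_hat), 1.0):
            return X_new
        X_hat = X_new
    return X_hat

# toy usage on synthetic data with about 20% missing values
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
X[rng.random(X.shape) < 0.2] = np.nan
X_imputed = regularised_iterative_pca_impute(X)
```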
|
153 |
Modelling electricity demand in South Africa
Sigauke, Caston, 19 August 2014
Peak electricity demand is an energy policy concern for countries throughout the world, causing blackouts and increasing electricity tariffs for consumers. This calls for load curtailment strategies to either redistribute or reduce electricity demand during peak periods. This thesis attempts to address this problem by providing relevant information through a frequentist and Bayesian modelling framework for daily peak electricity demand using South African data.
The thesis is divided into two parts. The first part deals with modelling of short-term daily peak electricity demand. This is done through the investigation of important drivers of electricity demand using (i) piecewise linear regression models, (ii) a multivariate adaptive regression splines (MARS) modelling approach, (iii) a regression with seasonal autoregressive integrated moving average (Reg-SARIMA) model and (iv) a Reg-SARIMA model with generalized autoregressive conditional heteroskedastic errors (Reg-SARIMA-GARCH). The second part of the thesis explores the use of extreme value theory in modelling winter peaks, extreme daily positive changes in hourly peak electricity demand and same day of the week increases in peak electricity demand. This is done through fitting the generalized Pareto, generalized single Pareto and generalized extreme value distributions.
One of the major contributions of this thesis is the quantification of the amount of electricity which should be shifted to off-peak hours. This is achieved through accurate assessment of the level and frequency of future extreme load forecasts. This modelling approach provides a policy framework for load curtailment and determination of the number of critical peak days for power utility companies. To the best of our knowledge, this has not been done for electricity demand in the context of South Africa. The thesis further extends the autoregressive moving average-exponential generalized autoregressive conditional heteroskedasticity model to an autoregressive moving average-exponential generalized autoregressive conditional heteroskedasticity-generalized single Pareto distribution. The benefit of this hybrid model is in risk modelling of under- and over-demand predictions of peak electricity demand.
Some of the key findings of this thesis are: (i) peak electricity demand is influenced by the tails of probability distributions as well as by means or averages, (ii) electricity demand in South Africa rises significantly for average temperature values below 18°C and rises slightly for average temperature values above 22°C, and (iii) modelling under- and over-demand electricity forecasts provides a basis for risk assessment and quantification of the risk associated with forecasting uncertainty, including demand variability.
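As an illustration of the peaks-over-threshold idea used in the second part of the thesis, the following sketch fits a generalized Pareto distribution to exceedances of a high threshold and computes a return level. The data are simulated, the threshold choice is arbitrary, and the code is not the thesis's model; it only shows the generic mechanics using scipy.

```python
import numpy as np
from scipy import stats

# Simulated daily peak demand (MW); in practice this would be the observed
# South African daily peak load series.
rng = np.random.default_rng(1)
t = np.arange(3650)
demand = 30000 + 2000 * rng.standard_normal(3650) + 1500 * np.sin(2 * np.pi * t / 365)

# Peaks-over-threshold: model exceedances above a high quantile with the GPD
threshold = np.quantile(demand, 0.95)
exceedances = demand[demand > threshold] - threshold

# Fit the generalized Pareto distribution (location fixed at 0)
shape, loc, scale = stats.genpareto.fit(exceedances, floc=0)

# Return level: demand exceeded on average once every m days
m = 100
p_exceed = len(exceedances) / len(demand)
return_level = threshold + stats.genpareto.ppf(1 - 1 / (m * p_exceed),
                                               shape, loc=0, scale=scale)
print(f"shape={shape:.3f}, scale={scale:.1f}, {m}-day return level={return_level:.0f} MW")
```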
|
154 |
Predicting Gleason score upgrading and downgrading between biopsy Gleason score and prostatectomy Gleason score – A population-based cohort study
Folkvaljon, Yasin, January 2013
No description available.
|
155 |
Bootstrap and empirical likelihood methods in statistical shape analysis
Amaral, Getulio J. A., January 2004
The aim of this thesis is to propose bootstrap and empirical likelihood confidence regions and hypothesis tests for use in statistical shape analysis. Bootstrap and empirical likelihood methods have some advantages when compared to conventional methods. In particular, they are nonparametric methods, so it is not necessary to choose a family of distributions for building confidence regions or testing hypotheses. There has been very little work on bootstrap and empirical likelihood methods in statistical shape analysis. Only one paper (Bhattacharya and Patrangenaru, 2003) has considered bootstrap methods in statistical shape analysis, and only for constructing confidence regions. There are no published papers on the use of empirical likelihood methods in statistical shape analysis. Existing methods for building confidence regions and testing hypotheses in shape analysis have some limitations. The Hotelling and Goodall confidence regions and hypothesis tests are not appropriate for data sets with low concentration. The main reason is that these methods are designed for data with high concentration, and if this assumption is violated, the methods do not perform well. On the other hand, simulation results have shown that the bootstrap and empirical likelihood methods developed in this thesis are appropriate for the statistical shape analysis of data sets with low concentration. For highly concentrated data sets all the methods show similar performance. Theoretical aspects of bootstrap and empirical likelihood methods are also considered. Both methods are based on asymptotic results, and those results are explained in this thesis. It is proved that the bootstrap methods proposed in this thesis are asymptotically pivotal. Computational aspects are discussed. All the bootstrap algorithms are implemented in “R”. An algorithm for computing empirical likelihood tests for several populations is also implemented in “R”.
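The basic resampling idea behind bootstrap confidence regions can be sketched as follows. This is a generic nonparametric percentile bootstrap for a scalar summary (here a hypothetical vector of Procrustes-type distances), not the pivotal shape-space methods developed in the thesis; the data and names are illustrative.

```python
import numpy as np

def bootstrap_percentile_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap percentile confidence interval for a statistic."""
    rng = np.random.default_rng(seed)
    n = len(data)
    boot_stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample observations with replacement
        boot_stats[b] = statistic(data[idx])
    lower, upper = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Example: CI for the mean of a hypothetical univariate shape summary,
# e.g. Procrustes distances from a reference configuration.
rng = np.random.default_rng(1)
distances = rng.gamma(shape=2.0, scale=0.05, size=40)
print(bootstrap_percentile_ci(distances, np.mean))
```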
|
156 |
The determination of regression relationships using stepwise regression techniques
Payne, D. John, January 1973
Stepwise regression routines are rapidly becoming a standard feature of large-scale computer statistical packages. They provide, in particular, a certain degree of flexibility in the selection of 'optimum' regression equations when one has available a large set of potential regressor variables. A major problem in the use of such routines is the determination of appropriate 'cut-off' criteria for terminating the procedures. There is a tendency in practice for standard F or t-statistics to be calculated at each step of the procedure, and for these values to be compared with conventional critical values. In this thesis an attempt has been made to provide a more satisfactory rationale for (single-step) stepwise procedures. The approach taken is to assume that a 'true' model exists (the regressors in which are a subset of those available) and to investigate the distribution of statistics which, at each stage, seem relevant to the termination decision. This leads to the consideration of alternative tests at each step to those usually employed. In the presence of considerable analytical complexity, a simulation approach is used to obtain a comparison of the relative performance of various procedures. This study encompasses the use of forward, backward and mixed forward/backward procedures in both orthogonal and non-orthogonal set-ups. Procedures are evaluated both in terms of the 'closeness' of the finally selected model to the true one, and also in terms of prediction mean square error. The study ends with an investigation into the usefulness of stepwise regression in identifying and estimating stochastic regression relationships of the type encountered in the analysis of time series.
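A minimal sketch of forward stepwise selection with the conventional partial F-test cut-off discussed above is given below. The data, entry level and helper names are illustrative; this is the naive single-step procedure whose termination criteria the thesis questions, not the alternative tests it proposes.

```python
import numpy as np
from scipy import stats

def forward_stepwise(X, y, alpha_enter=0.05):
    """Forward selection: at each step add the regressor with the largest
    partial F-statistic, stopping when its p-value exceeds alpha_enter."""
    n, p = X.shape
    selected, remaining = [], list(range(p))

    def rss(cols):
        A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        return resid @ resid

    current_rss = rss([])
    while remaining:
        best_j, best_F, best_rss = None, 0.0, current_rss
        for j in remaining:
            new_rss = rss(selected + [j])
            df_resid = n - len(selected) - 2           # intercept + selected + candidate
            F = (current_rss - new_rss) / (new_rss / df_resid)
            if F > best_F:
                best_j, best_F, best_rss = j, F, new_rss
        df_resid = n - len(selected) - 2
        p_value = 1 - stats.f.cdf(best_F, 1, df_resid)
        if best_j is None or p_value >= alpha_enter:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current_rss = best_rss
    return selected

# toy data: only the first two of six regressors matter
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=100)
print(forward_stepwise(X, y))
```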
|
157 |
Bayesian inference for stochastic epidemic models using Markov chain Monte Carlo methods
Demiris, Nikolaos, January 2004
This thesis is concerned with statistical methodology for the analysis of stochastic SIR (Susceptible -> Infective -> Removed) epidemic models. We adopt the Bayesian paradigm and we develop suitably tailored Markov chain Monte Carlo (MCMC) algorithms. The focus is on methods that are easy to generalise in order to accommodate epidemic models with complex population structures. Additionally, the models are general enough to be applicable to a wide range of infectious diseases. We introduce the stochastic epidemic models of interest and the MCMC methods we shall use, and we review existing methods of statistical inference for epidemic models. We develop algorithms that utilise multiple precision arithmetic to overcome the well-known numerical problems in the calculation of the final size distribution for the generalised stochastic epidemic. Consequently, we use these exact results to evaluate the precision of asymptotic theorems previously derived in the literature. We also use the exact final size probabilities to obtain the posterior distribution of the threshold parameter R_0. We proceed to develop methods of statistical inference for an epidemic model with two levels of mixing. This model assumes that the population is partitioned into subpopulations and permits infection on both local (within-group) and global (population-wide) scales. We adopt two different data augmentation algorithms. The first method introduces an appropriate latent variable, the final severity, for which we have asymptotic information in the event of an outbreak among a population with a large number of groups. Hence, approximate inference can be performed conditional on a "major" outbreak, a common assumption for stochastic processes with threshold behaviour such as epidemics and branching processes. In the last part of this thesis we use a random graph representation of the epidemic process and we impute more detailed information about the infection spread. The augmented state-space contains aspects of the infection spread that have been impossible to obtain before. Additionally, the method is exact in the sense that it works for any (finite) population and group sizes and it does not assume that the epidemic is above threshold. Potential uses of the extra information include the design and testing of appropriate prophylactic measures like different vaccination strategies. An attractive feature is that the two algorithms complement each other, in the sense that when the number of groups is large the approximate method (which is faster) is almost as accurate as the exact one and can be used instead. Finally, it is straightforward to extend our methods to more complex population structures like overlapping groups, small-world and scale-free networks.
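To fix ideas, the following sketch simulates only the forward model: a Markovian stochastic SIR epidemic via the Gillespie algorithm. It does not implement the MCMC or data-augmentation machinery described above, and the parameter values are illustrative.

```python
import numpy as np

def simulate_sir(n_susceptible, n_infective, beta, gamma, seed=0):
    """Gillespie simulation of the general (Markovian) stochastic SIR epidemic.
    Infection rate: beta * S * I / N;  removal rate: gamma * I."""
    rng = np.random.default_rng(seed)
    S, I = n_susceptible, n_infective
    N = S + I
    t, events = 0.0, [(0.0, S, I)]
    while I > 0:
        infection_rate = beta * S * I / N
        removal_rate = gamma * I
        total = infection_rate + removal_rate
        t += rng.exponential(1.0 / total)          # time to next event
        if rng.random() < infection_rate / total:
            S, I = S - 1, I + 1                    # infection event
        else:
            I -= 1                                 # removal event
        events.append((t, S, I))
    return events

# Final size of one simulated outbreak with R0 = beta / gamma = 2
events = simulate_sir(n_susceptible=199, n_infective=1, beta=2.0, gamma=1.0)
final_size = 199 - events[-1][1]
print(f"final size: {final_size}")
```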
|
158 |
Bayesian interpretation of radiocarbon results
Christen, José Andrés, January 1994
Over the last thirty years radiocarbon dating has been widely used in archaeology and related fields to address a wide range of chronological questions. Because of some inherent stochastic factors of a complex nature, radiocarbon dating presents a rich source of challenging statistical problems. The chronological questions posed commonly involve the interpretation of groups of radiocarbon determinations, and often substantial amounts of a priori information are available. The statistical techniques used up to very recently could only deal with the analysis of one determination at a time, and no prior information could be included in the analysis. However, over the last few years some problems have been successfully tackled using the Bayesian paradigm. In this thesis we expand that work and develop a general statistical framework for the Bayesian interpretation of radiocarbon determinations. Firstly, we consider the problem of radiocarbon calibration and develop a novel approach. Secondly, we develop a statistical framework which permits the inclusion of prior archaeological knowledge and illustrate its use with a wide range of examples. We discuss various generic problems, some of which are replications, summarisation, floating chronologies and archaeological phase structures. The techniques used to obtain the posterior distributions of interest are numerical and, in most of the cases, we have used Markov chain Monte Carlo (MCMC) methods. We also discuss the sampling routines needed for the implementation of the MCMC methods used in our examples. Thirdly, we address the very important problem of outliers in radiocarbon dating and develop an original methodology for the identification of outliers in sets of radiocarbon determinations. We show how our framework can be extended to permit the identification of outliers. Finally, we apply this extended framework to the analysis of a substantial archaeological dating problem.
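The flavour of the Bayesian calibration problem can be conveyed with a toy Metropolis sampler for a single calendar age. The "calibration curve" below is a made-up linear stand-in (real analyses use the tabulated calibration curve and its uncertainty), and all numbers are illustrative assumptions; this is not the methodology of the thesis, only the generic MCMC mechanics.

```python
import numpy as np

# Hypothetical, purely illustrative calibration "curve"
def calib_mean(theta):          # radiocarbon age expected at calendar age theta
    return 1.03 * theta + 15.0

def calib_sd(theta):            # curve uncertainty (constant for simplicity)
    return 8.0

def log_post(theta, y, sigma, theta_lo=0.0, theta_hi=5000.0):
    """Posterior for a single calendar age: uniform prior on [theta_lo, theta_hi],
    normal likelihood combining laboratory and curve uncertainty."""
    if not (theta_lo <= theta <= theta_hi):
        return -np.inf
    var = sigma**2 + calib_sd(theta)**2
    return -0.5 * (y - calib_mean(theta))**2 / var - 0.5 * np.log(var)

def metropolis(y, sigma, n_iter=20000, step=25.0, seed=0):
    rng = np.random.default_rng(seed)
    theta = (y - 15.0) / 1.03              # crude starting value
    lp = log_post(theta, y, sigma)
    samples = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal()
        lp_prop = log_post(prop, y, sigma)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject
            theta, lp = prop, lp_prop
        samples[i] = theta
    return samples

samples = metropolis(y=2450.0, sigma=30.0)[5000:]   # drop burn-in
print(np.percentile(samples, [2.5, 50, 97.5]))
```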
|
159 |
Model misspecification in time series analysis
Davies, Neville, January 1977
The Box and Jenkins (1970) methodology of time series model building, using an iterative cycle of identification, estimation and diagnostic checking to produce a forecasting mechanism, is by now well known and widely applied. This thesis is mainly concerned with aspects of the diagnostic checking and forecasting parts of their methodology. For diagnostic checking, a study is made of the overall or 'portmanteau' statistics suggested by Box and Pierce (1970) and Ljung and Box (1976) with regard to their ability to detect misspecified models; analytic results are complemented by simulation power studies when the fitted model is known to be misspecified. For forecasting, a general approach is proposed for determining the asymptotic forecasting loss when using any fitted model in the class of structures proposed by Box and Jenkins, when the true process follows any other in that same class. Specialisation is made by conducting a thorough study of the asymptotic loss incurred when pure autoregressive models are fitted and used to forecast any other process. In finite samples the Box-Pierce statistic has its mean well below that predicted by asymptotic theory (so that true significance levels will be below those assumed), whilst the Ljung-Box statistic has its mean approximately correct. However, both statistics are shown to be rather weak at detecting misspecified models, with only a few exceptions. Asymptotic forecasting loss is likely to be high when using even high-order autoregressive models to predict certain simple processes. This is especially the case when allowance is made for estimation error in the fitted models. Finally, some outstanding problems are outlined. One of these, namely the problem of misspecified error structures in time series regression analysis, is examined in detail.
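For reference, the two portmanteau statistics studied in the thesis can be computed directly from the residual autocorrelations, as in the sketch below; the lag choice and degrees-of-freedom correction are illustrative and left to the user.

```python
import numpy as np
from scipy import stats

def portmanteau_tests(residuals, m=20, n_fitted_params=0):
    """Box-Pierce and Ljung-Box statistics from residual autocorrelations
    r_1, ..., r_m, with chi-square p-values on m - n_fitted_params df."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    n = len(e)
    denom = np.sum(e**2)
    r = np.array([np.sum(e[k:] * e[:-k]) / denom for k in range(1, m + 1)])
    q_bp = n * np.sum(r**2)                                   # Box-Pierce
    q_lb = n * (n + 2) * np.sum(r**2 / (n - np.arange(1, m + 1)))  # Ljung-Box
    df = m - n_fitted_params
    return {"Box-Pierce": (q_bp, stats.chi2.sf(q_bp, df)),
            "Ljung-Box": (q_lb, stats.chi2.sf(q_lb, df))}

# white-noise residuals: both statistics should be unremarkable
rng = np.random.default_rng(3)
print(portmanteau_tests(rng.standard_normal(200), m=20))
```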
|
160 |
Aspects of recursive Bayesian estimation
West, Mike, January 1982
This thesis is concerned with the theoretical and practical aspects of some problems in Bayesian time series analysis and recursive estimation. In particular, we examine procedures for accommodating outliers in dynamic linear models which involve the use of heavy-tailed error distributions as alternatives to normality. Initially we discuss the basic principles of the Bayesian approach to robust estimation in general, and develop those ideas in the context of linear time series models. Following this, the main body of the thesis attacks the problem of intractability of analysis under outlier-accommodating assumptions. For both the dynamic linear model and the classical autoregressive-moving average schemes we develop methods for parameter estimation, forecasting and smoothing with non-normal data. This involves the theoretical examination of non-linear recursive filtering algorithms as robust alternatives to the Kalman filter, and numerical examples of the use of these procedures on simulated data. The asymptotic behaviour of some special recursions is also detailed in connection with the theory of stochastic approximation. Finally, we report on an application of Bayesian time series analysis in the monitoring of medical time series, the particular problem involving kidney transplant patients.
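A minimal sketch of the kind of robustified recursion discussed above is given below: a Kalman filter for the local level model with a crude Huber-type down-weighting of large innovations. This is only a simplified stand-in for the non-linear recursive filtering algorithms developed in the thesis; the model, weighting rule and tuning values are illustrative assumptions.

```python
import numpy as np

def robust_local_level_filter(y, q=1.0, r=1.0, m0=0.0, c0=1e6, k_huber=3.0):
    """Kalman filter for the local level model
        y_t = mu_t + v_t,   mu_t = mu_{t-1} + w_t,
    with a crude Huber-type robustification: observations whose standardised
    innovation exceeds k_huber are down-weighted, mimicking the effect of a
    heavy-tailed observation error distribution."""
    m, c = m0, c0
    filtered = []
    for obs in y:
        a, R = m, c + q                     # prediction step
        f, Q = obs - a, R + r               # innovation and its variance
        z = abs(f) / np.sqrt(Q)
        weight = 1.0 if z < k_huber else k_huber / z
        K = weight * R / Q                  # down-weighted Kalman gain
        m, c = a + K * f, (1 - K) * R       # update step
        filtered.append(m)
    return np.array(filtered)

# series with one gross outlier at t = 50
rng = np.random.default_rng(4)
level = np.cumsum(rng.normal(scale=0.3, size=100))
y = level + rng.normal(scale=1.0, size=100)
y[50] += 15.0
print(robust_local_level_filter(y, q=0.09, r=1.0)[48:53])
```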
|