131

Iterative proportional fitting : theoretical synthesis and practical limitations

Založnik, Maja January 2011
Iterative proportional fitting (IPF) is described formally and historically, and its advantages and limitations are investigated through two practical simulation exercises using UK census microdata. The theoretical review is unique in being comprehensive and interdisciplinary. It is structured by progressing through three levels of understanding IPF: contingency table analysis in classic applications, analysis using log-linear models, and finally an understanding of IPF as a method for maximizing entropy. An elaborate methodological section develops the measures and technical tools for the analysis, and explores the geographical aspects of the dataset by providing a unique and exhaustive overview of the ecological fallacy, Simpson's paradox and the modifiable areal unit problem. The practical section investigates the behaviour of IPF under different sampling scenarios and different data availability conditions using a large-scale computer simulation based on the UK Samples of Anonymised Records. By systematically and comprehensively investigating the theoretical and practical issues related to IPF, this thesis supplements the fragmentary and piecemeal nature of the current literature, and does so in an accessible and interdisciplinary manner.
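IPF itself reduces to alternately rescaling a seed table so that it matches each set of target margins in turn. The following is a minimal Python sketch of that update for a two-way table (not the author's code; the seed table, margins and tolerance are invented for illustration):

```python
import numpy as np

def ipf_2d(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Fit a two-way table to fixed row and column margins by IPF."""
    table = seed.astype(float).copy()
    for _ in range(max_iter):
        # Rescale rows to match the target row totals.
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Rescale columns to match the target column totals.
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Stop when the row margins are (near-)exact again after the column step.
        if np.max(np.abs(table.sum(axis=1) - row_targets)) < tol:
            break
    return table

# Invented seed table (e.g. a sample cross-tabulation) and known margins.
seed = np.array([[10.0, 20.0],
                 [30.0, 40.0]])
fitted = ipf_2d(seed,
                row_targets=np.array([40.0, 60.0]),
                col_targets=np.array([35.0, 65.0]))
print(fitted)
print(fitted.sum(axis=1), fitted.sum(axis=0))
```

The same multiplicative updates extend directly to higher-dimensional tables, which is where the log-linear and maximum-entropy interpretations discussed in the thesis come into play.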
132

An elementary estimation of gamma parameters and the analysis of disdrometer data

Brawn, Dan R. January 2009
This thesis considers the problem of estimating the three parameters of a gamma distribution with grouped and truncated data. Sample sizes may be small and commonly subject to lower and perhaps upper truncation. A new simple, closed-form, reliable, almost unbiased and low-variance method is developed. It is shown that the new 'hybrid' approach is a first-order approximation to maximum likelihood with a constraint that incorporates a single sample moment ratio. Traditional moment methods require three distinct sample moments and typically give biased estimates. The operation of the hybrid method is discussed and extended to encompass the estimation of parameters for a Generalized Gamma model.
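For context, the 'traditional moment methods' the abstract contrasts against can be illustrated with the textbook two-parameter case, where the gamma shape and scale follow from the sample mean and variance; the thesis's hybrid estimator for the three-parameter, grouped and truncated setting is not reproduced here. A hedged sketch with invented data:

```python
import numpy as np

def gamma_moment_fit(x):
    """Two-parameter gamma fit by the method of moments.

    Uses E[X] = k*theta and Var[X] = k*theta**2, so that
    k = mean**2 / variance and theta = variance / mean.
    """
    m, v = x.mean(), x.var(ddof=1)
    return m * m / v, v / m

# Invented sample: small n, as in the disdrometer setting, is exactly where
# moment estimators become unreliable.
rng = np.random.default_rng(0)
sample = rng.gamma(shape=3.0, scale=2.0, size=50)
k_hat, theta_hat = gamma_moment_fit(sample)
print(k_hat, theta_hat)
```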
133

Bayesian variable selection in cluster analysis

Dimitrakopoulou, Vasiliki January 2012
Statistical analysis of data sets of high dimensionality has attracted great interest over the past years, with applications in disciplines such as medicine, neuroscience, pattern recognition, image analysis and many others. The vast number of available variables, in contrast to the limited sample size, often masks the cluster structure of the data. It is often the case that some variables do not help in distinguishing the different clusters in the data; patterns over the sampled observations are thus usually confined to a small subset of variables. We are therefore interested in identifying the variables that best discriminate the sample, simultaneously with recovering the actual cluster structure of the objects under study. With Markov chain Monte Carlo methodology being widely established, we investigate the performance of the combined tasks of variable selection and clustering within the Bayesian framework. Motivated by the work of Tadesse et al. (2005), we identify the set of discriminating variables with the use of a latent vector and formulate the clustering procedure within the finite mixture models methodology. Using Markov chains we draw inference not just on the set of selected variables and the cluster allocations, but also on the actual number of components, using the Reversible Jump MCMC sampler (Green, 1995) and a variation of the SAMS sampler of Dahl (2005). However, sensitivity to the hyperparameter settings of the covariance structure of the suggested model motivated our interest in an Empirical Bayes procedure to pre-specify the crucial hyperparameters. Further addressing the problem of hyperparameter sensitivity, we suggest several different covariance structures for the mixture components. Developing MATLAB code for all models introduced in this thesis, we apply and compare the various models suggested on a set of simulated data, as well as on three real data sets: the iris, the crabs and the arthritis data sets.
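A minimal sketch of the kind of model the abstract describes (in the spirit of Tadesse et al., 2005, but not the thesis's implementation): a binary latent vector flags the discriminating variables, which follow cluster-specific Gaussians, while the remaining variables follow a single background Gaussian. Function and variable names are illustrative, and only the likelihood evaluation is shown, not the MCMC samplers:

```python
import numpy as np
from scipy.stats import norm

def log_lik(X, gamma, z, cluster_means, cluster_vars, bg_mean, bg_var):
    """Log-likelihood for one configuration of the variable-selection mixture.

    gamma : boolean vector, True for the 'discriminating' variables
    z     : cluster allocation of each observation
    Selected variables follow cluster-specific Gaussians (diagonal covariance
    for simplicity); the remaining variables follow one background Gaussian.
    """
    sel, bg = gamma, ~gamma
    ll = 0.0
    for i, x in enumerate(X):
        k = z[i]
        ll += norm.logpdf(x[sel], cluster_means[k, sel],
                          np.sqrt(cluster_vars[k, sel])).sum()
        ll += norm.logpdf(x[bg], bg_mean[bg], np.sqrt(bg_var[bg])).sum()
    return ll

# Invented toy configuration: 20 observations, 5 variables, 2 clusters,
# with only the first two variables flagged as discriminating.
rng = np.random.default_rng(1)
n, p, K = 20, 5, 2
X = rng.normal(size=(n, p))
gamma = np.array([True, True, False, False, False])
z = rng.integers(0, K, size=n)
mu = rng.normal(size=(K, p))
var = np.ones((K, p))
print(log_lik(X, gamma, z, mu, var, np.zeros(p), np.ones(p)))
```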
134

Modelling individual heterogeneity in mark-recapture studies

Oliver, L. J. January 2012
Since its original derivation, researchers have developed different extensions of the Cormack-Jolly-Seber (CJS) model in order to accommodate variation in the survival and recapture probabilities across the population. More recent developments have allowed each of these probabilities to vary as a function of observable covariates. The relationship between the parameters and covariates is often expressed in the form of a logistic regression. Such regressions are useful because they reduce the overall number of parameters in the model and may offer important ecological insight into the survival and recapture processes. However, the use of covariates in mark-recapture studies also gives rise to two important problems: (1) the covariates may contain missing values; and (2) the covariates may be subject to measurement error. To date, the latter issue has only been addressed in the context of closed population models. In this thesis we demonstrate the effects of measurement error in the CJS model. More specifically, we consider the case where the survival probabilities are modelled as a logistic function of an error-prone time-varying covariate. The covariate is then subject to both missing values and measurement error. Although a conditional likelihood approach can be used to handle the missing values, the resulting model makes no provision for errors in the covariate. A simulation study shows that, when these errors are ignored, the regression coefficients are estimated with bias and the effect of the covariate is understated. Furthermore, the bias becomes more severe as the magnitude of the errors increases. To accommodate measurement error in the model, we use a refinement of the regression calibration (RRC) method, which is based on deriving an approximate model for the survival probabilities given the observed covariate values in terms of the true regression parameters.
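The attenuation effect described above is easy to reproduce outside the CJS setting. The sketch below (not the thesis's simulation study; all settings are invented) fits a plain logistic regression with an error-prone covariate and shows the slope estimate shrinking towards zero:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n, beta0, beta1, sigma_u = 5000, 0.0, 1.0, 0.8   # invented settings

x = rng.normal(size=n)                            # true covariate
p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))    # survival-type probability
y = rng.binomial(1, p)                            # binary outcome
w = x + rng.normal(scale=sigma_u, size=n)         # error-prone measurement of x

# Naive analysis uses the mismeasured covariate w in place of x.
naive = sm.Logit(y, sm.add_constant(w)).fit(disp=0)
oracle = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print("true slope:   ", beta1)
print("oracle slope: ", oracle.params[1])   # close to the truth
print("naive slope:  ", naive.params[1])    # attenuated towards zero
```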
135

An objective Bayesian approach for discrete scenarios

Villa, Cristiano January 2013
Objective prior distributions represent a fundamental part of Bayesian inference. Although several approaches for continuous parameter spaces have been developed, Bayesian theory lacks a general method for obtaining priors in the discrete case. In the present work we propose a novel idea, based on losses, to derive objective priors for discrete parameter spaces. We objectively measure the worth of each parameter value, and link it to the prior probability by means of the self-information loss function. The worth is measured by taking into consideration the surroundings of each element of the parameter space. Bayes' theorem is then re-interpreted, with prior and posterior beliefs expressed not as probabilities but as losses. The approach allows meaning to be retained from the beginning to the end of the Bayesian updating process. The prior distribution obtained with the above approach is identified as the Villa-Walker prior. We illustrate the approach by applying it to various scenarios. We derive objective priors for five specific models: a population size model, the Hypergeometric and multivariate Hypergeometric models, the Binomial-Beta model, and the Binomial model. We also derive the Villa-Walker prior for the number of degrees of freedom of a t distribution. An important result in this last case is that the objective prior has to be truncated.
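As a rough illustration of the loss-based construction (a sketch of one reading of the approach, not the thesis's derivations): the worth of a parameter value is taken to be the Kullback-Leibler divergence to its nearest neighbouring model, and the self-information loss links worth to prior mass via pi(theta) proportional to exp{min KL} - 1. The discrete parameter space below (a Poisson mean restricted to a finite grid) is invented for illustration:

```python
import numpy as np

def kl_poisson(lam, lam_prime):
    """Kullback-Leibler divergence from Poisson(lam) to Poisson(lam_prime)."""
    return lam * np.log(lam / lam_prime) - lam + lam_prime

def loss_based_prior(grid, kl):
    """pi(theta) proportional to exp{ min over theta' != theta of KL } - 1."""
    worth = np.array([min(kl(t, s) for s in grid if s != t) for t in grid])
    unnorm = np.exp(worth) - 1.0
    return unnorm / unnorm.sum()

# Invented discrete parameter space: a Poisson mean restricted to a grid.
grid = np.arange(1.0, 11.0)
print(np.round(loss_based_prior(grid, kl_poisson), 4))
```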
136

Bayesian survival analysis for prognostic index development with many covariates and missing data

Zhao, Xiaohui January 2010
Modern computational methods have made the use of complicated models in Bayesian survival analysis more feasible. In this thesis, we consider the Bayesian analysis of a large survival data set with more than 100 explanatory variables and 2025 patients, with diffuse large B-cell lymphoma, collected by the Scotland and Newcastle Lymphoma Group. The aim of the analysis is to use Bayesian survival modelling to produce a prognostic system offering advantages over existing prognostic indices. The system is intended for use in healthcare and also by the pharmaceutical industry in clinical trial design. It will make possible the use of more variables, and a more developed model, than existing indices, and thereby, it is hoped, will give improved prognostic precision, but will also allow computation of prognoses when only some of these variables are observed. We adopt an approach using Weibull mixture models. A difficulty arises when covariate values are missing. Omitting cases with missing values would seriously reduce the number of cases available and might distort our inference. We consider how to model the dependence of survival time on covariates and, in particular, how to construct a missing data model for such a large and diverse set of variables, both for the initial analysis and for the use of the system with new patients when only some covariates are observed. We compare different approaches, which involve factorising the joint probability density of survival time and the covariates in different ways. In particular we introduce a model in which the joint distribution is constructed by modelling the conditional distribution of some covariates on the survival time. The methodology developed should be applicable to Bayesian analysis of other similar large survival data sets where there are missing covariate values.
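As background to the modelling choice, a Weibull survival model with right censoring and a log-linear covariate effect has a simple likelihood; the sketch below shows that building block only (not the thesis's mixture model or its missing-data machinery), with invented data:

```python
import numpy as np
from scipy.optimize import minimize

def weibull_loglik(params, t, delta, X):
    """Right-censored Weibull log-likelihood.

    params = (log_shape, beta...); per-subject scale_i = exp(X_i @ beta);
    delta_i = 1 if the event was observed, 0 if the time is censored.
    """
    k = np.exp(params[0])
    beta = params[1:]
    lam = np.exp(X @ beta)
    z = t / lam
    log_f = np.log(k) - np.log(lam) + (k - 1) * np.log(z) - z**k   # density
    log_S = -(z**k)                                                # survivor
    return np.sum(delta * log_f + (1 - delta) * log_S)

# Invented data: one covariate, Weibull event times, exponential censoring.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
scale = np.exp(X @ np.array([1.0, 0.5]))
t_event = scale * rng.weibull(1.5, size=n)
c = rng.exponential(5.0, size=n)
t = np.minimum(t_event, c)
delta = (t_event <= c).astype(float)

fit = minimize(lambda p: -weibull_loglik(p, t, delta, X),
               x0=np.zeros(3), method="Nelder-Mead")
print(fit.x)   # [log shape, intercept, covariate effect]
```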
137

Methods for the improved implementation of the spatial scan statistic when applied to binary labelled point data

Read, Simon January 2011
This thesis investigates means of improving, applying, and measuring the success of, the Spatial Scan Statistic (SSS) when applied to binary labelled point data (BLPD). As the SSS is an established means of detecting anomalies in spatial data (also known as clusters), this work has potential application in many fields, notably epidemiology. Firstly, the thesis considers the capacity of the SSS to correctly identify the presence of anomalies, irrespective of location. The most important contribution is the identification that p-values produced by the standard algorithm for implementing the SSS are sometimes conservative, and thus may lead to lower-than-expected statistical power. A novel means of rectifying this is presented, along with a study of how this can be used in conjunction with an existing technique (Gumbel smoothing) for reducing the computational expense of the SSS. A novel version of the SSS for BLPD is also derived and tested, together with an alternative algorithm for selecting circular scan windows. Secondly, the thesis considers the capacity of the SSS to correctly identify the location of anomalies. This is an under-researched area, and this work is relevant to all forms of data to which the SSS is applied, not just BLPD. A synthesis of current research is presented as a five-level framework, facilitating the comparison and hybridisation of existing spatial accuracy measures for the SSS. Two novel measures of spatial accuracy are derived, both compatible with this framework: one works in conjunction with power, the other is independent of power. Both use a single parameter to encapsulate complex information about spatial accuracy performance, which previously required two or more parameters, or an arbitrarily weighted combination of two or more parameters. All novel techniques are benchmark-tested against established software, and the statistical significance of performance improvements is measured.
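The Bernoulli-model likelihood ratio at the heart of the SSS for binary labelled points can be sketched as follows (a standard Kulldorff-style statistic, not the thesis's novel variants; the window enumeration and Monte Carlo p-value steps are omitted and the counts are invented):

```python
from scipy.special import xlogy   # xlogy(a, b) = a*log(b) with 0*log(0) = 0

def bernoulli_llr(c, n, C, N):
    """Bernoulli-model scan statistic for one candidate zone.

    c: cases inside the zone,  n: points inside the zone,
    C: total cases,            N: total points.
    Returns 0 unless the inside rate exceeds the outside rate.
    """
    inside, outside = c / n, (C - c) / (N - n)
    if inside <= outside:
        return 0.0
    l1 = (xlogy(c, inside) + xlogy(n - c, 1 - inside)
          + xlogy(C - c, outside) + xlogy(N - n - (C - c), 1 - outside))
    l0 = xlogy(C, C / N) + xlogy(N - C, 1 - C / N)
    return float(l1 - l0)

# Invented example: a 50-point window holding 20 of the 60 cases among 500 points.
print(bernoulli_llr(c=20, n=50, C=60, N=500))
```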
138

Statistical inference for spatial and spatio-temporal processes

Dimitriou-Fakalou, Chrysoula January 2006
Time series analysis was introduced first and is widely used in the statistical world. The analysis of spatio-temporal processes followed, taking into account not only when, but also where, the phenomenon under observation takes place. We mainly focus on stationary processes that are assumed to take place regularly over both time and space. We examine ways of estimating the parameters involved without the risk of producing a very large bias for our estimators; such bias is the typical problem in estimating the parameters of stationary processes on Z^d, for any d > 2. We particularly study the cases of spatio-temporal ARMA processes and spatial auto-normal formulations on Z^d. For both cases and any positive integer d, we propose estimators that are consistent, asymptotically unbiased and normal, if certain conditions are satisfied. We study not only spatio-temporal processes that are observed regularly over space, but also those for which we have recordings at a fixed number of locations anywhere; in that case we may follow the route of a multivariate time series methodology, so that the asymptotic behavior of the proposed estimators can be analyzed as the number of recordings over time alone tends to infinity.
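A minimal sketch of the kind of process involved (not the thesis's estimators): a spatio-temporal autoregression on a regular grid in which each site depends on its own past value and the past average of its four neighbours, with the coefficients recovered here by ordinary least squares. Grid size and coefficients are invented:

```python
import numpy as np

rng = np.random.default_rng(3)
G, T = 20, 200        # grid side length and number of time points (invented)
a, b = 0.5, 0.3       # own-lag and neighbour-lag coefficients (invented)

def neighbour_mean(field):
    """Average of the four nearest neighbours on a wrap-around grid."""
    return 0.25 * (np.roll(field, 1, 0) + np.roll(field, -1, 0)
                   + np.roll(field, 1, 1) + np.roll(field, -1, 1))

# Simulate X_t(s) = a*X_{t-1}(s) + b*(neighbour average at t-1) + noise.
X = np.zeros((T, G, G))
for t in range(1, T):
    X[t] = a * X[t - 1] + b * neighbour_mean(X[t - 1]) + rng.normal(size=(G, G))

# Recover (a, b) by least squares on the own-lag and neighbour-lag regressors.
y = X[1:].ravel()
Z = np.column_stack([
    X[:-1].ravel(),
    np.array([neighbour_mean(X[t]) for t in range(T - 1)]).ravel(),
])
a_hat, b_hat = np.linalg.lstsq(Z, y, rcond=None)[0]
print(a_hat, b_hat)
```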
139

"Thinking inside the box" : using derivatives to improve Bayesian black box emulation of computer simulators with application to compartmental models

Killeya, Matthew R. H. January 2004
Increasingly, science relies on complex numerical models to aid understanding of physical phenomena. Often the equations in such models contain a high number of poorly known parameters so that the resulting output encodes much uncertainty. A 'computer simulator', which comprises the model equations together with a solver routine, produces a solution for a given choice of these 'input' parameters. In cases where the dimension of the input parameter space is high, we can only hope to obtain a thin coverage of the space by running the simulator. Building a representation of the simulator output as a function of the input, then, is a statistical problem in which we observe output at a collection of input choices and, based on these observations, infer output values for unseen inputs about which we are uncertain. In a Bayesian context, this representation, termed the 'emulator', encodes our beliefs about the relationships between inputs and outputs. Our interest is in exploiting the structure of compartmental models to aid in this process. Compartmental models are widely applied to model systems in the absence of fundamental equations to describe the processes of interest. We show that the structure of such models enables us to efficiently generate additional function information, in the form of input derivatives, each time we run the simulator and we adapt the emulator methodology to allow for derivatives. We show that considering derivatives offers a range of natural ways to aid assessment of prior beliefs and that updating based on derivatives can lead to substantial reduction in emulator uncertainty. We show that, in addition, the model structure allows us to derive estimates of increased costs of generating derivatives which we can compare against the corresponding reduction in uncertainties. We are motivated throughout by the problem of calibrating a compartmental model of plankton cycles at multiple locations in the sea, and we show that a knock on effect of reduction of uncertainty by derivatives is an improvement in our ability to perform this calibration. The search for a model which could accurately reproduce plankton cycles at various physical locations, if successful, is thought to have significant ramifications for understanding climate change.
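A minimal one-dimensional sketch of how derivative information can enter an emulator (a generic Gaussian-process formulation, not the thesis's treatment of compartmental models): derivatives of the squared-exponential covariance give the cross-covariances between values and gradients, so simulator derivatives can be conditioned on alongside simulator outputs. The toy simulator and kernel settings are invented:

```python
import numpy as np

s2, ell, nugget = 1.0, 0.6, 1e-8    # invented kernel settings

def k(x, y):        # cov(f(x), f(y)): squared-exponential covariance
    d = x[:, None] - y[None, :]
    return s2 * np.exp(-d**2 / (2 * ell**2))

def k_fd(x, y):     # cov(f(x), f'(y)) = dk/dy
    d = x[:, None] - y[None, :]
    return k(x, y) * d / ell**2

def k_dd(x, y):     # cov(f'(x), f'(y)) = d2k/(dx dy)
    d = x[:, None] - y[None, :]
    return k(x, y) * (1.0 / ell**2 - d**2 / ell**4)

# 'Simulator' runs: values and derivatives at a few design inputs (toy function).
X = np.array([0.0, 0.5, 1.0, 1.5])
f, df = np.sin(2 * X), 2 * np.cos(2 * X)

# Joint covariance of the observed [values, derivatives] and the observations.
K = np.block([[k(X, X),      k_fd(X, X)],
              [k_fd(X, X).T, k_dd(X, X)]]) + nugget * np.eye(2 * len(X))
obs = np.concatenate([f, df])

# Emulate f at unseen inputs, conditioning on values and derivatives together.
Xs = np.linspace(0.0, 1.5, 7)
Ks = np.hstack([k(Xs, X), k_fd(Xs, X)])
mean = Ks @ np.linalg.solve(K, obs)
var = s2 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
print(np.c_[Xs, mean, np.sin(2 * Xs), var])
```

Dropping the derivative blocks recovers the plain value-only emulator, which makes the reduction in predictive variance from including derivatives easy to see.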
140

Development of the theoretical and methodological aspects of the singular spectrum analysis and its application for analysis and forecasting of economics data

Hassani, Hossein January 2009
In recent years Singular Spectrum Analysis (SSA), a powerful technique in time series analysis, has been developed and applied to many practical problems. The aim of this research is to develop theoretical and methodological aspects of the SSA technique and to demonstrate that SSA can be considered a powerful method of time series analysis and forecasting, particularly for economic time series. For the practical aspects and empirical results, various economic and financial time series are used. First, the SSA technique is applied as a noise reduction method. The performance of SSA is examined in noise reduction of several important financial series. The daily closing prices of several stock market indices are examined to analyse whether noise reduction matters in measuring dependencies of the financial series. The effect of noise reduction is considered on the linear and nonlinear measures of dependence between two series. The results are compared with those obtained with linear and nonlinear methods for filtering time series, and show that the performance of SSA is much better than that of the competing methods. Second, we consider the performance of SSA in forecasting various time series. For consistency with the forecasting results obtained with other current forecasting methods, the performance of the SSA technique is examined by applying it to a well-known time series data set, namely, monthly accidental deaths in the USA. The results are compared with those obtained using Box-Jenkins SARIMA models, the ARAR algorithm and the Holt-Winter algorithm, and show that the SSA technique gives a much more accurate forecast than the other methods indicated above. As another example, the performance of the SSA technique is assessed by applying it to 24 series measuring the monthly seasonally unadjusted industrial production for important sectors of the German, French and UK economies. The results confirm that at longer horizons SSA significantly outperforms the ARIMA and Holt-Winter methods. Moreover, the application of SSA to the analysis and forecasting of Iranian national accounts data, which are rather short, is considered in order to examine the capability of SSA in forecasting short time series. The results confirm that SSA works very well for short time series as well as for long time series. The univariate and multivariate SSA are also employed in predicting the value and the changes in direction of the inflation series for the United States. The consumer price indices and real-time chain-weighted GDP price index series are used in these prediction exercises. Moreover, our out-of-sample one-step-ahead moving prediction results are compared with prediction results based on methods such as the activity-based NAIRU Phillips curve, AR(p), and random walk models, with the latter as a naive forecasting method. Short-run (quarterly) and long-run (one to six years) time windows are utilized for the predictions. The results clearly confirm that prediction of the inflation rate in the United States during the period of the "Great Moderation" is less challenging than during the more volatile inflationary period of 1970-1985. Furthermore, univariate and multivariate SSA are used for predicting the value and the direction of changes in the daily pound/dollar exchange rate. Empirical results show that the forecast based on multivariate SSA compares favorably to the forecast of the random walk model, both for predicting the value and the direction of changes in the daily pound/dollar exchange rate.
The SSA forecasting results are also compared to prediction results based on a vector error correction (VEC) model in the context of a restricted vector autoregressive model. The results show that the VEC results are inferior. For the theoretical development of the technique, two new versions of SSA are introduced: one based on the minimum variance estimator and one based on perturbation theory. The new versions are examined in reconstructing and forecasting time series. The results are compared with the current version of SSA and indicate that the new versions improve the quality of the reconstruction step as well as the forecasting results. We also consider the concept of a causal relationship between two time series based on the SSA technique. We introduce several criteria which characterize this causality. The criteria are based on forecasting accuracy and predictability of the direction of change. The performance of the proposed test is examined using different real time series.
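The basic SSA decomposition underlying all of the above can be sketched in a few lines: embed the series in a trajectory matrix, take its singular value decomposition, keep a chosen group of components, and diagonal-average back to a series. This is a generic illustration (window length, component choice and toy series invented), not the thesis's minimum-variance or perturbation-based variants:

```python
import numpy as np

def ssa_reconstruct(x, L, components):
    """Basic SSA: embed, decompose, group chosen components, diagonal-average."""
    N, K = len(x), len(x) - L + 1
    # 1. Trajectory (Hankel) matrix of lagged vectors.
    X = np.column_stack([x[i:i + L] for i in range(K)])
    # 2. Singular value decomposition.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # 3. Grouping: keep only the selected elementary matrices.
    X_hat = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in components)
    # 4. Diagonal averaging (Hankelisation) back to a series.
    rec, counts = np.zeros(N), np.zeros(N)
    for i in range(L):
        for j in range(K):
            rec[i + j] += X_hat[i, j]
            counts[i + j] += 1
    return rec / counts

# Invented toy series: trend + annual cycle + noise; keep the leading components.
rng = np.random.default_rng(7)
t = np.arange(240)
x = 0.02 * t + np.sin(2 * np.pi * t / 12) + 0.3 * rng.normal(size=240)
smooth = ssa_reconstruct(x, L=60, components=[0, 1, 2])
print(np.round(smooth[:12], 2))
```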
