1

Differential cumulants, hierarchical models and monomial ideals

Bruynooghe, Daniel January 2011
No description available.
2

Statistical methods for sparse image time series of remote-sensing lake environmental measurements

Gong, Mengyi January 2017
Remote-sensing technology is widely used in Earth observation, from everyday weather forecasting to long-term monitoring of the air, sea and land. The remarkable coverage and resolution of remote-sensing data are extremely beneficial to the investigation of environmental problems, such as the state and function of lakes under climate change. However, the attractive features of remote-sensing data bring new challenges to statistical analysis. The wide coverage and high resolution mean that the data are usually of large volume. The orbit track of the satellite and the occasional obscuring of the instruments due to atmospheric factors can result in substantial missing observations. Applying conventional statistical methods to this type of data can be ineffective and computationally intensive because of its volume and dimensionality, and modifications to existing methods are often required to accommodate the missingness. There is therefore a great need for novel statistical approaches to tackle these challenges. This thesis aims to investigate and develop statistical approaches that can be used in the analysis of sparse remote-sensing image time series of environmental data. Specifically, three aspects of the data are considered: (a) the high dimensionality, associated with the volume and dimension of the data; (b) the sparsity, in the sense of high missing percentages; and (c) the spatial/temporal structures, including the patterns and the correlations. Initially, methods for temporal and spatial modelling are explored and implemented with care, e.g. harmonic regression and bivariate spline regression with residual correlation structures. Recognizing the drawbacks of these methods, functional data analysis is adopted as the general approach in this thesis. Specifically, functional principal component analysis (FPCA) is used to achieve dimension reduction. Bivariate basis functions are proposed to transform the satellite image data, which typically consist of thousands or millions of pixels, into functional data with low-dimensional representations. This approach has the advantage of identifying spatial variation patterns through the principal component (PC) loadings, i.e. the eigenfunctions. To overcome the high missing percentages that would invalidate the standard implementation of FPCA, the mixed-model FPCA (MM-FPCA) is investigated in Chapter 3. By estimating the PCs using a mixed-effect model, the influence of sparsity can be accounted for appropriately. Data imputation can be obtained from the fitted model using the (truncated) Karhunen-Loève expansion. The method's applicability to sparse image series is examined through a simulation study. To incorporate temporal dependence into the MM-FPCA, a novel spatio-temporal model consisting of a state space component and an FPCA component is proposed in Chapter 4. The model, referred to as SS-FPCA in the thesis, is developed within the dynamic spatio-temporal model framework. The SS-FPCA exploits a flexible hierarchical design with (a) a data model consisting of a time-varying mean function and a random component for the common spatial variation patterns, formulated as the FPCA, (b) a process model specifying the type of temporal dynamics of the mean function and (c) a parameter model ensuring the identifiability of the model components. A two-cycle alternating expectation-conditional maximization (AECM) algorithm is proposed to estimate the SS-FPCA model.
The AECM algorithm allows different data augmentations and parameter combinations in the various cycles within an iteration, which in this case yields analytical solutions for all the MLEs of the model parameters. The algorithm uses the Kalman filter/smoother to update the system states according to the data model and the process model. Model investigations are carried out in Chapter 5, including a simulation study on a one-dimensional space to assess the performance of the model and the algorithm. This is accompanied by a brief summary of the asymptotic results for EM-type algorithms, some of which can be used to approximate the standard errors of the model estimates. Applications of the MM-FPCA and SS-FPCA to the remote-sensing lake surface water temperature and chlorophyll data of Lake Victoria (obtained from the European Space Agency's Envisat mission) are presented at the end of Chapters 3 and 5. Remarks on the implications and limitations of these two methods are provided in Chapter 6, along with potential future extensions of both methods. The Appendices provide some additional theorems, computation and derivation details of the methods investigated in the thesis.
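As a rough illustration of the FPCA dimension-reduction and truncated Karhunen-Loève reconstruction described above, the following minimal Python sketch applies an SVD-based FPCA to a fully observed image time series; it does not reproduce the bivariate basis construction or the mixed-model treatment of sparsity developed in the thesis, and all names and data are illustrative.

    import numpy as np

    def fpca_reconstruct(X, n_pc=3):
        """Illustrative FPCA on an image time series.

        X : array of shape (n_times, n_pixels), assumed fully observed here;
        the thesis handles high missingness via a mixed-model formulation,
        which this sketch does not attempt.
        """
        mu = X.mean(axis=0)                      # mean function (pixelwise)
        Xc = X - mu                              # centre the series
        # SVD of the centred data gives PC scores (U*S) and spatial
        # eigenfunctions (rows of Vt), i.e. the spatial variation patterns.
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        scores = U[:, :n_pc] * S[:n_pc]          # Karhunen-Loeve scores
        eigenfuncs = Vt[:n_pc]                   # PC loadings over pixels
        # Truncated Karhunen-Loeve expansion: mean + scores x eigenfunctions
        X_hat = mu + scores @ eigenfuncs
        return X_hat, scores, eigenfuncs

    # toy usage: 50 "images" of 400 pixels each
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 400))
    X_hat, scores, efs = fpca_reconstruct(X, n_pc=3)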
3

Predicting hypotensive episodes in the traumatic brain injury domain

Donald, Rob January 2014
The domain with which this research is concerned is traumatic brain injury and models which attempt to predict hypotensive (low blood pressure) events occurring in a hospital intensive care unit environment. The models process anonymised, clinical, minute-by-minute physiological data from the BrainIT consortium. The research reviews three predictive modelling techniques: classic time series analysis; hidden Markov models; and classifier models, which are the main focus of this thesis. The data preparation part of this project is extensive, and six applications have been developed: an event list generator, used to process a given event definition; a data set generation tool, which produces a series of base data sets that can be used to train machine learning models; a training and test set generation application, which produces randomly drawn training and test data sets; an application used to build and assess a series of logistic regression models; an application to test the statistical models on unseen data, which uses anonymised real clinical data from intensive care unit bedside monitors; and finally, an application that implements a proposed clinical warning protocol, which attempts to assess a model’s performance in terms of usefulness to a clinical team. These applications are being made available under a public domain licence to enable further research (see Appendix A for details). Six logistic regression models and two Bayesian neural network models are examined using the physiological signals heart rate and arterial blood pressure, along with the demographic variables age and gender. Model performance is assessed using the standard ROC technique to give the AUC metric; an alternative performance metric, the H score, is also investigated. Using unseen clinical data, two of the models are assessed in a manner which mimics the ICU environment. This approach shows that models may perform better than standard assessment metrics would suggest. The results of the modelling experiments are compared with those of a recent similar project in the healthcare domain and show that logistic regression models could form the basis of a practical early warning system for use in a neuro intensive care unit.
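The general pattern described here, fitting a logistic regression classifier on physiological and demographic covariates and assessing it by ROC AUC, can be sketched as below; the feature values, labels and model settings are purely illustrative stand-ins, not the BrainIT variables, event definitions or models of the thesis.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    # Illustrative stand-in data: columns = heart rate, arterial blood
    # pressure, age, gender; toy labels for "hypotensive event" follow.
    rng = np.random.default_rng(1)
    X = np.column_stack([
        rng.normal(80, 10, 1000),    # heart rate
        rng.normal(90, 15, 1000),    # arterial blood pressure
        rng.integers(18, 80, 1000),  # age
        rng.integers(0, 2, 1000),    # gender indicator
    ])
    y = rng.integers(0, 2, 1000)     # event label (toy, random here)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])  # standard AUC metric
    print(f"AUC on held-out data: {auc:.3f}")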
4

Bayesian mixture models for count data

Chanialidis, Charalampos January 2015
Regression models for count data are usually based on the Poisson distribution. This thesis is concerned with Bayesian inference in more flexible models for count data. Two classes of models and algorithms are presented and studied. The first employs a generalisation of the Poisson distribution called the COM-Poisson distribution, which can represent both overdispersed and underdispersed data. We also propose a density regression technique for count data which, albeit centered around the Poisson distribution, can represent arbitrary discrete distributions. The key contributions of this thesis are MCMC-based methods for posterior inference in these models. One key challenge in COM-Poisson-based models is that the normalisation constant of the COM-Poisson distribution is not known in closed form. We propose two exact MCMC algorithms which address this problem. One is based on the idea of retrospective sampling: we first sample the uniform random variable used to decide on the acceptance (or rejection) of the proposed new state of the unknown parameter, and then evaluate only bounds for the acceptance probability, in the hope that we will not need to know the acceptance probability exactly in order to decide whether to accept or reject the newly proposed value. This strategy is based on an efficient scheme for computing lower and upper bounds for the normalisation constant, and it can be applied to a number of discrete distributions, including the COM-Poisson distribution. The other MCMC algorithm proposed is based on the exchange algorithm. The latter requires sampling from the COM-Poisson distribution, and we describe how this can be done efficiently using rejection sampling. We also present simulation studies which show the advantages of using the COM-Poisson regression model compared to the alternative models commonly used in the literature (Poisson and negative binomial). Three real-world applications are presented: the number of emergency hospital admissions in Scotland in 2010, the number of papers published by Ph.D. students, and fertility data from the second German Socio-Economic Panel. COM-Poisson distributions are also the cornerstone of the proposed density regression technique based on Dirichlet process mixture models. Density regression can be thought of as a competitor to quantile regression. Quantile regression estimates the quantiles of the conditional distribution of the response variable given the covariates, which is especially useful when the dispersion changes across the covariates. Instead of estimating the conditional mean, quantile regression estimates the conditional quantile function across different quantiles, so it models both location and shape shifts of the conditional distribution. This allows for a better understanding of how the covariates affect the conditional distribution of the response variable. Almost all quantile regression techniques deal with a continuous response; quantile regression models for count data have so far received little attention. A technique that has been suggested is adding uniform random noise ('jittering'), thus overcoming the problem that, for a discrete distribution, the conditional quantile function is not a continuous function of the parameters of interest. Even though this enables us to estimate the conditional quantiles of the response variable, it has disadvantages.
For small values of the response variable Y, the added noise can have a large influence on the estimated quantiles. In addition, the problem of 'crossing quantiles' still exists for the jittering method. We eliminate all the aforementioned problems by estimating the density of the data, rather than the quantiles. Simulation studies show that the proposed approach performs better than the already established jittering method. To illustrate the new method we analyse fertility data from the second German Socio-Economic Panel.
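To make the intractable normalisation constant concrete, the sketch below evaluates the COM-Poisson log-pmf with a brute-force truncated sum for Z(lambda, nu); the truncation point is an arbitrary illustrative choice, and the bounding scheme, retrospective sampler and exchange algorithm of the thesis are not reproduced.

    import numpy as np
    from math import lgamma, exp, log

    def com_poisson_logpmf(y, lam, nu, truncation=200):
        """Illustrative COM-Poisson log-pmf using a brute-force truncated
        normalisation constant Z(lam, nu) = sum_j lam^j / (j!)^nu.
        The thesis avoids this truncation by bounding Z, or by the exchange
        algorithm combined with rejection sampling; neither is shown here."""
        log_terms = [j * log(lam) - nu * lgamma(j + 1) for j in range(truncation)]
        log_z = np.logaddexp.reduce(log_terms)          # log of truncated Z
        return y * log(lam) - nu * lgamma(y + 1) - log_z

    # nu < 1 gives overdispersion, nu > 1 underdispersion, nu = 1 is Poisson
    print(exp(com_poisson_logpmf(3, lam=2.5, nu=0.7)))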
5

A Bayesian hierarchical model of compositional data with zeros: classification and evidence evaluation of forensic glass

Napier, Gary January 2014
A Bayesian hierarchical model is proposed for modelling compositional data containing large concentrations of zeros. Two data transformations were used and compared: the commonly used additive log-ratio (alr) transformation for compositional data, and the square root of the compositional ratios. For these data the square root transformation was found to stabilise the variability better, and it also had no difficulty dealing with the large concentrations of zeros. To deal with the zeros, two different approaches have been implemented: the data augmentation approach and the composite model approach. The data augmentation approach treats any zero values as rounded zeros, i.e. traces of components below the limits of detection, and updates those zero values with non-zero values. This is better than the simple approach of adding constant values to zeros, as updating the zeros as part of the modelling procedure reduces any artificial correlation. However, due to the small detection limit it does not necessarily alleviate the problems of having a point mass very close to zero. The composite model approach treats any zero components as being absent from a composition. This is done by splitting the data into subsets according to the presence or absence of certain components, producing different data configurations that are then modelled separately. The models are applied to a database consisting of the elemental configurations of forensic glass fragments with many levels of variability and of various use types. The main purposes of the model are (i) to derive expressions for the posterior predictive probabilities of newly observed glass fragments to infer their use type (classification) and (ii) to compute the evidential value of glass fragments under two complementary propositions about their source (forensic evidence evaluation). Simulation studies using cross-validation are carried out to assess both model approaches; both perform well at classifying glass fragments of the use types bulb, headlamp and container, but less well when classifying car and building windows. The composite model approach marginally outperforms the data augmentation approach at the classification task, and both approaches have the edge over support vector machines (SVM). Both model approaches also perform well when evaluating the evidential value of glass fragments, with false negative and false positive error rates below 5%. The results from glass classification and evidence evaluation are an improvement over existing methods. Assessment of the models as part of the evidence evaluation simulation study also leads to a restriction being placed upon the reported strength of the value of this type of evidence: to prevent strong support in favour of the wrong proposition, it is recommended that this glass evidence should provide, at most, moderately strong support in favour of a proposition. The classification and evidence evaluation procedures are implemented in an online web application, which outputs the corresponding results for a given set of elemental composition measurements. The web application provides a quick and easy-to-use tool for forensic scientists who deal with this type of forensic evidence in real-life casework.
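A minimal sketch of the two transformations compared in the abstract follows. The alr form is the standard definition; the "square root of the compositional ratios" is assumed here to mean sqrt(x_i / x_D), and the zero replacement shown is only a crude stand-in for the thesis's data-augmentation and composite-model treatments.

    import numpy as np

    def alr(x, eps=None):
        """Additive log-ratio transform: log(x_i / x_D) for i = 1..D-1.
        Zeros must be replaced first (eps), a crude stand-in for the
        rounded-zero data augmentation used in the thesis."""
        x = np.asarray(x, dtype=float)
        if eps is not None:
            x = np.where(x == 0, eps, x)
        x = x / x.sum()
        return np.log(x[:-1] / x[-1])

    def sqrt_ratio(x):
        """Square root of the compositional ratios, sqrt(x_i / x_D);
        assumed form of the second transformation, which copes with
        exact zeros directly."""
        x = np.asarray(x, dtype=float)
        x = x / x.sum()
        return np.sqrt(x[:-1] / x[-1])

    composition = np.array([70.2, 12.8, 9.1, 0.0, 7.9])  # toy glass composition (%)
    print(alr(composition, eps=1e-4))
    print(sqrt_ratio(composition))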
6

Aspects of generative and discriminative classifiers

Xue, Jinghao January 2008
Meanwhile, we suggest that the so-called output-dependent HMMs could be represented in a state-dependent manner, and vice versa, essentially by application of Bayes' theorem. Finally, we present discriminative approaches to histogram-based image thresholding, in which the optimal threshold is derived from the maximum likelihood based on the conditional distribution p(y|x) of y, the class indicator of a grey level x, given x. The discriminative approaches can be regarded as discriminative extensions of traditional generative approaches to thresholding, such as Otsu's method (Otsu, 1979) and Kittler and Illingworth's minimum error thresholding (MET) (Kittler and Illingworth, 1986). As illustrations, we develop discriminative versions of Otsu's method and MET, using discriminant functions corresponding to the original methods to represent p(y|x). These two discriminative thresholding approaches are compared with their original counterparts on selecting thresholds for a variety of histograms of mixture distributions. Results show that the discriminative Otsu method consistently provides relatively good performance. Although it has higher computational complexity than the original methods in parameter estimation, its robustness and model simplicity justify the discriminative Otsu method for scenarios in which the risk of model mis-specification is high and the computation is not demanding.
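For reference, a sketch of the classical (generative) Otsu criterion on a grey-level histogram, the method whose discriminative extension the chapter develops; the discriminative versions themselves are not reproduced, and the histogram below is a toy example.

    import numpy as np

    def otsu_threshold(hist):
        """Classical Otsu: choose the threshold maximising the between-class
        variance of the two classes induced by the grey-level histogram."""
        hist = np.asarray(hist, dtype=float)
        p = hist / hist.sum()                      # histogram -> probabilities
        levels = np.arange(len(p))
        best_t, best_var = 0, -np.inf
        for t in range(1, len(p)):
            w0, w1 = p[:t].sum(), p[t:].sum()      # class weights
            if w0 == 0 or w1 == 0:
                continue
            mu0 = (levels[:t] * p[:t]).sum() / w0  # class means
            mu1 = (levels[t:] * p[t:]).sum() / w1
            between = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance
            if between > best_var:
                best_t, best_var = t, between
        return best_t

    # toy bimodal histogram over 256 grey levels
    rng = np.random.default_rng(2)
    hist = np.histogram(np.concatenate([rng.normal(60, 10, 5000),
                                        rng.normal(170, 20, 5000)]),
                        bins=256, range=(0, 255))[0]
    print(otsu_threshold(hist))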
7

Monotone local linear estimation of transducer functions

Hughes, David January 2014
Local polynomial regression has received a great deal of attention in the past. It is a highly adaptable regression method when the true response model is not known. However, estimates obtained in this way are not guaranteed to be monotone. In some situations the response is known to depend monotonically upon some variables. Various methods have been suggested for constraining nonparametric local polynomial regression to be monotone. The earliest of these is known as the Pool Adjacent Violators algorithm (PAVA) and was first suggested by Brunk (1958). Kappenman (1987) suggested that a non-parametric estimate could be made monotone by simply increasing the bandwidth used until the estimate was monotone. Dette et al. (2006) have suggested a monotonicity constraint which they call the DNP method; their method involves calculating a density estimate of the unconstrained regression estimate and using this to calculate an estimate of the inverse of the regression function. Fan, Heckman and Wand (1995) generalized local polynomial regression to quasi-likelihood based settings. Such estimates are again not guaranteed to be monotone, whilst in many practical situations monotonicity of the response is required. In this thesis I discuss how the above-mentioned monotonicity constraint methods can be adapted to the quasi-likelihood setting. I am particularly interested in the estimation of monotone psychometric functions and, more generally, biological transducer functions, for which the response is often known to follow a distribution belonging to the exponential family. I consider some of the key theoretical properties of the monotonised local linear estimators in the quasi-likelihood setting. I establish asymptotic expressions for the bias and variance of my adaptation of the DNP method (called the LDNP method) and show that this estimate is asymptotically normally distributed and first-order equivalent to competing methods. I demonstrate that this adaptation overcomes some of the problems of using the DNP method in likelihood-based settings. I also investigate the choice of the second bandwidth used in the density estimation step. I compare the LDNP method, the PAVA method and the bandwidth method by means of a simulation study, investigating a variety of response models, including binary, Poisson and exponential. In each study I calculate monotone estimates of the response curve using each method and compare their bias, variance, MSE and MISE. I also apply these methods to the analysis of data from various hearing and vision studies, and show some of the deficiencies of using local polynomial estimates as opposed to local likelihood estimates.
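A sketch of the Pool Adjacent Violators algorithm mentioned above, in its basic unweighted least-squares form; the quasi-likelihood adaptations and the LDNP method studied in the thesis are not shown.

    import numpy as np

    def pava(y):
        """Pool Adjacent Violators: return the non-decreasing sequence closest
        to y in least squares, by repeatedly pooling adjacent violating blocks."""
        y = np.asarray(y, dtype=float)
        values = list(y)            # current block means
        weights = [1.0] * len(y)    # block sizes
        i = 0
        while i < len(values) - 1:
            if values[i] > values[i + 1]:           # monotonicity violated
                w = weights[i] + weights[i + 1]
                v = (weights[i] * values[i] + weights[i + 1] * values[i + 1]) / w
                values[i:i + 2] = [v]               # pool the two blocks
                weights[i:i + 2] = [w]
                i = max(i - 1, 0)                   # re-check the previous block
            else:
                i += 1
        return np.repeat(values, np.asarray(weights, dtype=int))

    print(pava([1.0, 3.0, 2.0, 4.0, 3.5, 5.0]))  # -> monotone fit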
8

Asymmetry and other distributional properties in medical research data

Partlett, Christopher January 2015
The central theme of this thesis is to investigate the use of non-parametric methods for making inferences about a random sample with an unknown distribution function. The overarching aim is the development of new methods for making inferences about the nature of the unknown distribution, in order to enhance medical research. Initially, the focus is exclusively on the asymmetry of a random variable. In particular, a recently proposed measure of asymmetry provides the foundation for the proposal and development of a new test for symmetry. The test and measure are applied to a number of medical research settings, including randomised trials, and guidance is provided on their implementation, with particular emphasis on the problem of small-sample estimation. This investigation is then generalised to examine asymmetry across multiple studies. In particular, meta-analysis methods are used to synthesise information about the amount of asymmetry in several studies. Further, a detailed simulation study is carried out to investigate the impact of asymmetry on linear models and meta-analyses of randomised trials, in terms of the accuracy of the treatment effect estimate and the coverage of confidence and prediction intervals. Finally, the scope of the investigation is widened to encompass the problem of comparing and synthesising information about the probability density function and the cumulative distribution function, based on samples from multiple studies. The meta-analysis of the smooth distribution function estimate is then applied to propose new methods for conducting meta-analyses of diagnostic test accuracy, which have a number of merits compared with the existing methodology.
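The abstract does not specify the asymmetry measure it builds on, so the following is only a generic illustration of how a randomisation test of symmetry about the median can be constructed, with the standardised third moment used as a placeholder statistic; it is not the test developed in the thesis.

    import numpy as np

    def symmetry_test(x, n_boot=2000, seed=0):
        """Illustrative sign-flipping test of symmetry about the sample median,
        using sample skewness as a placeholder statistic; the asymmetry measure
        actually studied in the thesis is not reproduced here."""
        rng = np.random.default_rng(seed)
        x = np.asarray(x, dtype=float)
        med = np.median(x)

        def skew(z):
            zc = z - z.mean()
            return (zc ** 3).mean() / (zc ** 2).mean() ** 1.5

        observed = skew(x)
        # Under symmetry about the median, reflecting observations about the
        # median with random signs leaves the distribution unchanged.
        boot = np.empty(n_boot)
        for b in range(n_boot):
            signs = rng.choice([-1.0, 1.0], size=x.size)
            boot[b] = skew(med + signs * (x - med))
        return 2 * min((boot >= observed).mean(), (boot <= observed).mean())

    print(symmetry_test(np.random.default_rng(1).exponential(size=200)))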
9

Model selection and model averaging in the presence of missing values

Gopal Pillay, Khuneswari January 2015
Model averaging has been proposed as an alternative to model selection, intended to overcome the underestimation of standard errors that is a consequence of model selection. Both model selection and model averaging become more complicated in the presence of missing data. Three model selection approaches (RR, STACK and M-STACK) and model averaging using three model-building strategies (non-overlapping variable sets, inclusive and restrictive strategies) were explored for combining results from multiply-imputed data sets, using a Monte Carlo simulation study on some simple linear and generalized linear models. Imputation was carried out using chained equations (via the "norm" method in the R package MICE). The simulation results showed that the STACK method performs better than RR and M-STACK in terms of model selection and prediction, whereas model averaging performs slightly better than STACK in terms of prediction. The inclusive and restrictive strategies perform better in terms of prediction, but the non-overlapping variable sets strategy performs better for model selection. STACK and model averaging using all three model-building strategies were then used to combine the results from a multiply-imputed data set from the Gateshead Millennium Study (GMS). The performance of STACK and model averaging was compared using the mean square error of prediction (MSE(P)) in a 10% cross-validation test. The results showed that STACK with an inclusive strategy provided better prediction than model averaging, which coincides with the results obtained through a simulation study mimicking the GMS data. In addition, the inclusive strategy for building imputation and prediction models was better than the non-overlapping variable sets and restrictive strategies. The presence of highly correlated covariates and response is believed to have led to better prediction in this particular context. Model averaging using non-overlapping variable sets performs better only if an auxiliary variable is available, whereas STACK with an inclusive strategy performs well when no auxiliary variable is available. Therefore, it is advisable to use STACK with an inclusive model-building strategy and highly correlated covariates (where available) to make predictions in the presence of missing data. Alternatively, model averaging with non-overlapping variable sets can be used if an auxiliary variable is available.
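A hedged Python analogue of the overall workflow (multiple imputation by chained equations, a model fitted per imputed data set, results combined across imputations) is sketched below. The thesis uses the R package MICE with the "norm" method and its own RR/STACK/M-STACK combination rules; here sklearn's IterativeImputer and a plain prediction average are only stand-ins.

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 4))
    y = X @ np.array([1.0, 0.5, 0.0, -1.0]) + rng.normal(scale=0.5, size=200)
    X_miss = X.copy()
    X_miss[rng.random(X.shape) < 0.2] = np.nan   # ~20% values missing at random

    m = 5                                        # number of imputed data sets
    preds = []
    for i in range(m):
        # chained-equations-style imputation; a stand-in for MICE's "norm" method
        imp = IterativeImputer(sample_posterior=True, random_state=i)
        X_i = imp.fit_transform(X_miss)
        fit = LinearRegression().fit(X_i, y)     # model fitted to each imputed set
        preds.append(fit.predict(X_i))

    # simplest combination: average predictions over imputations; the RR, STACK
    # and M-STACK approaches compared in the thesis combine results differently
    y_hat = np.mean(preds, axis=0)
    print(np.mean((y - y_hat) ** 2))             # in-sample MSE(P)-style summary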
10

Survival modelling in mathematical and medical statistics

Hua, Hairui January 2015
An essential aspect of survival analysis is the estimation and prediction of survival probabilities for individuals. For this purpose, mathematical modelling of the hazard rate function is a fundamental issue. This thesis focuses on the novel estimation and application of hazard rate functions in mathematical and medical research. In the mathematical research we focus on the development of a semiparametric kernel-based estimate of the hazard rate function and an L1-error optimal kernel hazard rate estimate. In the medical research we concentrate on the development and validation of survival models using individual participant data (IPD) from multiple studies. We also consider how to fit survival models that predict individual response to treatment effectiveness, given IPD from multiple trials.
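As background to kernel hazard rate estimation, the sketch below smooths Nelson-Aalen increments with a Gaussian kernel, a standard nonparametric estimator; it is not the semiparametric or L1-optimal estimators developed in the thesis, and the bandwidth and data are illustrative.

    import numpy as np

    def kernel_hazard(times, events, grid, bandwidth=1.0):
        """Kernel hazard estimate: smooth the Nelson-Aalen increments
        dN(t_i)/Y(t_i) with a Gaussian kernel of the given bandwidth."""
        order = np.argsort(times)
        t = np.asarray(times, dtype=float)[order]
        d = np.asarray(events, dtype=int)[order]       # 1 = event, 0 = censored
        n = len(t)
        at_risk = n - np.arange(n)                     # Y(t_i) for ordered times
        increments = d / at_risk                       # Nelson-Aalen jump sizes

        def gauss(u):
            return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

        return np.array([np.sum(gauss((g - t) / bandwidth) * increments) / bandwidth
                         for g in grid])

    rng = np.random.default_rng(4)
    times = rng.exponential(scale=2.0, size=300)       # true hazard = 0.5
    events = np.ones_like(times, dtype=int)            # no censoring in this toy
    grid = np.linspace(0.1, 4.0, 8)
    print(kernel_hazard(times, events, grid, bandwidth=0.5))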
