61

Statistical methods for sparse image time series of remote-sensing lake environmental measurements

Gong, Mengyi January 2017 (has links)
Remote-sensing technology is widely used in Earth observation, from everyday weather forecasting to long-term monitoring of the air, sea and land. The remarkable coverage and resolution of remote-sensing data are extremely beneficial to the investigation of environmental problems, such as the state and function of lakes under climate change. However, the attractive features of remote-sensing data bring new challenges to statistical analysis. The wide coverage and high resolution mean that data are usually of large volume. The orbit track of the satellite and the occasional obscuring of the instruments due to atmospheric factors can result in substantial missing observations. Applying conventional statistical methods to this type of data can be ineffective and computationally intensive due to its volume and dimensionality. Modifications to existing methods are often required in order to incorporate the missingness. There is a great need for novel statistical approaches to tackle these challenges. This thesis aims to investigate and develop statistical approaches that can be used in the analysis of sparse remote-sensing image time series of environmental data. Specifically, three aspects of the data are considered: (a) the high dimensionality, which is associated with the volume and the dimension of the data, (b) the sparsity, in the sense of high missing percentages, and (c) the spatial/temporal structures, including the patterns and the correlations. Initially, methods for temporal and spatial modelling are explored and implemented with care, e.g. harmonic regression and bivariate spline regression with residual correlation structures. Recognizing the drawbacks of these methods, functional data analysis is employed as a general approach in this thesis. Specifically, functional principal component analysis (FPCA) is used to achieve the goal of dimension reduction. Bivariate basis functions are proposed to transform the satellite image data, which typically consist of thousands or millions of pixels, into functional data with low-dimensional representations. This approach has the advantage of identifying spatial variation patterns through the principal component (PC) loadings, i.e. the eigenfunctions. To overcome the high missing percentages that might invalidate the standard implementation of the FPCA, the mixed-model FPCA (MM-FPCA) is investigated in Chapter 3. By estimating the PCs using a mixed-effects model, the influence of sparsity can be accounted for appropriately. Data imputation can be obtained from the fitted model using the (truncated) Karhunen-Loeve expansion. The method's applicability to sparse image series is examined through a simulation study. To incorporate the temporal dependence into the MM-FPCA, a novel spatio-temporal model consisting of a state space component and an FPCA component is proposed in Chapter 4. The model, referred to as SS-FPCA in the thesis, is developed within the dynamic spatio-temporal model framework. The SS-FPCA exploits a flexible hierarchical design with (a) a data model consisting of a time-varying mean function and a random component for the common spatial variation patterns formulated as the FPCA, (b) a process model specifying the type of temporal dynamics of the mean function and (c) a parameter model ensuring the identifiability of the model components. A two-cycle alternating expectation-conditional maximization (AECM) algorithm is proposed to estimate the SS-FPCA model.
The AECM algorithm allows different data augmentations and parameter combinations in different cycles within an iteration, which in this case results in analytical solutions for all the MLEs of the model parameters. The algorithm uses the Kalman filter/smoother to update the system states according to the data model and the process model. Model investigations are carried out in Chapter 5, including a simulation study on a 1-dimensional space to assess the performance of the model and the algorithm. This is accompanied by a brief summary of the asymptotic results of EM-type algorithms, some of which can be used to approximate the standard errors of the model estimates. Applications of the MM-FPCA and SS-FPCA to the remote-sensing lake surface water temperature and chlorophyll data of Lake Victoria (obtained from the European Space Agency's Envisat mission) are presented at the end of Chapters 3 and 5. Remarks on the implications and limitations of these two methods are provided in Chapter 6, along with potential future extensions of both methods. The Appendices provide some additional theorems and the computational and derivation details of the methods investigated in the thesis.
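As a rough, self-contained illustration of the FPCA idea behind this abstract (spatial variation patterns as PC loadings, and imputation via the truncated Karhunen-Loeve expansion), the Python sketch below works on a small synthetic image series with missing pixels. It is not the MM-FPCA mixed-model estimator itself; the image size, number of components and missingness rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an image time series: T images, each with P pixels
# (rows = time, columns = vectorised pixels), with entries missing at random.
T, P, K = 60, 100, 3
grid = np.linspace(0, 1, P)
true_pcs = np.vstack([np.sin((k + 1) * np.pi * grid) for k in range(K)])  # K x P
scores = rng.normal(scale=[3.0, 2.0, 1.0], size=(T, K))
images = scores @ true_pcs + 0.2 * rng.normal(size=(T, P))
mask = rng.random((T, P)) < 0.4          # 40% of pixels missing
data = np.where(mask, np.nan, images)

# Pairwise-complete estimates of the mean and covariance across pixels.
mu = np.nanmean(data, axis=0)
centred = data - mu
obs = ~np.isnan(centred)
filled = np.where(obs, centred, 0.0)
counts = obs.T.astype(float) @ obs.astype(float)
cov = (filled.T @ filled) / np.maximum(counts - 1, 1)

# Leading eigenvectors (PC loadings) play the role of the spatial patterns.
_, eigvec = np.linalg.eigh(cov)
phi = eigvec[:, ::-1][:, :K]             # P x K, top K loadings

# For each image, estimate PC scores by least squares on the observed pixels,
# then impute missing pixels with the truncated Karhunen-Loeve expansion.
imputed = data.copy()
for t in range(T):
    o = obs[t]
    s, *_ = np.linalg.lstsq(phi[o], centred[t, o], rcond=None)
    recon = mu + phi @ s
    imputed[t, ~o] = recon[~o]

err = np.sqrt(np.mean((imputed[mask] - images[mask]) ** 2))
print(f"RMSE on imputed pixels: {err:.3f}")
```

Each image is thus summarised by a handful of scores rather than its full set of pixels, which is the dimension-reduction step the abstract describes.
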
62

Sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution

Davies, Vinny January 2016 (has links)
Understanding how virus strains offer protection against closely related emerging strains is vital for creating effective vaccines. For many viruses, including Foot-and-Mouth Disease Virus (FMDV) and the Influenza virus, where multiple serotypes often co-circulate, in vitro testing of large numbers of vaccines can be infeasible. Therefore, the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Vaccines will offer cross-protection against closely related strains, but not against those that are antigenically distinct. To be able to predict cross-protection, we must understand the antigenic variability within a virus serotype, or within distinct lineages of a virus, and identify the antigenic residues and evolutionary changes that cause the variability. In this thesis we present a family of sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution (SABRE), as well as an extended version of the method, the extended SABRE (eSABRE) method, which better takes into account the data collection process. The SABRE methods are a family of sparse Bayesian hierarchical models that use spike and slab priors to identify sites in the viral protein which are important for the neutralisation of the virus. In this thesis we demonstrate how the SABRE methods can be used to identify antigenic residues within different serotypes and show how the SABRE method outperforms established methods (mixed-effects models based on forward variable selection or ℓ1 regularisation) on both synthetic and viral datasets. In addition we test a number of different versions of the SABRE method, comparing conjugate and semi-conjugate prior specifications as well as an alternative to the spike and slab prior: the binary mask model. We also propose novel proposal mechanisms for the Markov chain Monte Carlo (MCMC) simulations, which improve mixing and convergence over the established component-wise Gibbs sampler. The SABRE method is then applied to datasets from FMDV and the Influenza virus in order to identify a number of known antigenic residues and to provide hypotheses about other potentially antigenic residues. We also demonstrate how the SABRE methods can be used to create accurate predictions of the important evolutionary changes of the FMDV serotypes. In this thesis we provide an extended version of the SABRE method, the eSABRE method, based on a latent variable model. The eSABRE method takes further into account the structure of the datasets for FMDV and the Influenza virus through the latent variable model and gives an improvement in the modelling of the error. We show how the eSABRE method outperforms the SABRE methods in simulation studies and propose a new information criterion for selecting the random effects factors that should be included in the eSABRE method: the block integrated Widely Applicable Information Criterion (biWAIC). We demonstrate how biWAIC performs comparably to two other methods for selecting the random effects factors and combine it with the eSABRE method to apply it to two large Influenza datasets. Inference in these large datasets is computationally infeasible with the SABRE methods, but as a result of the improved structure of the likelihood, we are able to show how the eSABRE method offers a computational improvement, leading it to be used on these datasets.
The results of the eSABRE method show that we can use the method in a fully automatic manner to identify a large number of antigenic residues on a variety of the antigenic sites of two Influenza serotypes, as well as making predictions of a number of nearby sites that may also be antigenic and are worthy of further experimental investigation.
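To make the spike-and-slab idea concrete, here is a minimal component-wise Gibbs sampler for a continuous spike-and-slab linear regression (in the style of George and McCulloch) on synthetic data. It illustrates the established sampler the abstract refers to, not the SABRE hierarchical models or the novel proposal mechanisms; all prior settings and data sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "residue effect" regression: only the first 3 of 20 covariates
# genuinely affect the response (a crude stand-in for antigenic sites).
n, p = 150, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.5, -1.0, 0.8]
y = X @ beta_true + 0.5 * rng.normal(size=n)

# Continuous spike-and-slab: small spike variance v0, large slab variance v1,
# prior inclusion probability pi0, inverse-gamma prior on the noise variance.
v0, v1, pi0 = 1e-4, 1.0, 0.2
a0, b0 = 2.0, 1.0

beta = np.zeros(p)
gamma = np.zeros(p, dtype=int)
sigma2 = 1.0
n_iter, burn = 2000, 500
incl = np.zeros(p)
xtx = np.sum(X ** 2, axis=0)

for it in range(n_iter):
    # Component-wise Gibbs update of each coefficient and its indicator.
    for j in range(p):
        r = y - X @ beta + X[:, j] * beta[j]          # residual excluding x_j
        v_j = v1 if gamma[j] == 1 else v0
        prec = xtx[j] / sigma2 + 1.0 / v_j
        mean = (X[:, j] @ r / sigma2) / prec
        beta[j] = rng.normal(mean, np.sqrt(1.0 / prec))
        # Indicator update given the sampled coefficient.
        log_slab = np.log(pi0) - 0.5 * np.log(v1) - 0.5 * beta[j] ** 2 / v1
        log_spike = np.log(1 - pi0) - 0.5 * np.log(v0) - 0.5 * beta[j] ** 2 / v0
        prob = 1.0 / (1.0 + np.exp(log_spike - log_slab))
        gamma[j] = rng.random() < prob
    # Conjugate inverse-gamma update for the noise variance.
    resid = y - X @ beta
    sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + resid @ resid / 2))
    if it >= burn:
        incl += gamma

print("posterior inclusion probabilities:", np.round(incl / (n_iter - burn), 2))
```

Covariates with posterior inclusion probability near one are the ones the spike-and-slab prior flags as relevant, which is the role the antigenic residues play in the SABRE models.
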
63

Methods for change-point detection with additional interpretability

Schröder, Anna Louise January 2016 (has links)
The main purpose of this dissertation is to introduce and critically assess some novel statistical methods for change-point detection that help better understand the nature of processes underlying observable time series. First, we advocate the use of change-point detection for local trend estimation in financial return data and propose a new approach developed to capture the oscillatory behaviour of financial returns around piecewise-constant trend functions. At the core of the method is a data-adaptive, hierarchically ordered basis of Unbalanced Haar vectors which decomposes the piecewise-constant trend underlying observed daily returns into a binary-tree structure of one-step constant functions. We illustrate how this framework can provide a new perspective for the interpretation of change points in financial returns. Moreover, the approach yields a family of forecasting operators for financial return series which can be adjusted flexibly depending on the forecast horizon or the loss function. Second, we discuss change-point detection under model misspecification, focusing in particular on normally distributed data with changing mean and variance. We argue that ignoring the presence of changes in mean or variance when testing for changes in, respectively, variance or mean, can negatively affect the application of statistical methods. After illustrating the difficulties arising from this kind of model misspecification, we propose a new method to address them using sequential testing on intervals of varying length and show in a simulation study how this approach compares to competitors in mixed-change situations. The third contribution of this thesis is a data-adaptive procedure to evaluate EEG data, which can improve the understanding of an epileptic seizure recording. This change-point detection method characterizes the evolution of frequency-specific energy as measured on the human scalp. It provides new insights into these high-dimensional, high-frequency data and has attractive computational and scalability features. In addition to contrasting our method with existing approaches, we analyse and interpret the method's output in the application to a seizure data set.
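For readers unfamiliar with change-point detection for a piecewise-constant mean, the following sketch runs standard binary segmentation with a CUSUM statistic on simulated data. It is a generic illustration (assuming unit noise variance and an ad hoc threshold), not the Unbalanced Haar decomposition or the sequential interval testing developed in the dissertation.

```python
import numpy as np

def cusum(x):
    """CUSUM statistic for a single change in mean at each interior split point."""
    n = x.size
    k = np.arange(1, n)
    left = np.cumsum(x)[:-1]
    total = x.sum()
    stat = np.sqrt((n - k) / (n * k)) * left - np.sqrt(k / (n * (n - k))) * (total - left)
    return np.abs(stat)

def binary_segmentation(x, threshold, start=0):
    """Recursively split wherever the maximal CUSUM statistic exceeds the threshold."""
    if x.size < 2:
        return []
    stat = cusum(x)
    b = int(np.argmax(stat))
    if stat[b] < threshold:
        return []
    cp = start + b + 1
    return (binary_segmentation(x[: b + 1], threshold, start)
            + [cp]
            + binary_segmentation(x[b + 1 :], threshold, cp))

rng = np.random.default_rng(2)
# Piecewise-constant mean with three segments plus unit-variance Gaussian noise.
signal = np.concatenate([np.zeros(200), 1.5 * np.ones(150), 0.5 * np.ones(150)])
x = signal + rng.normal(size=signal.size)
threshold = 1.3 * np.sqrt(2 * np.log(x.size))   # a common universal-type threshold
print("estimated change points:", sorted(binary_segmentation(x, threshold)))
```
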
64

On variable selection in high dimensions, segmentation and multiscale time series

Baranowski, Rafal January 2016 (has links)
In this dissertation, we study the following three statistical problems. First, we consider a high-dimensional data framework, where the number of covariates potentially affecting the response is large relative to the sample size. In this setting, some of the covariates may appear to have an impact on the response spuriously. Addressing this issue, we rank the covariates according to their impact on the response and use a subsampling scheme to identify the covariates which non-spuriously appear at the top of the ranking. We study the conditions under which such a set is unique and show that, with high probability, it can be recovered from the data by our procedure, for rankings based on measures commonly used in statistics. We illustrate its good practical performance in an extensive comparative simulation study and on microarray data. Second, we propose a generic approach to the problem of detecting the unknown number of features in the time series of interest, such as changes in trend or jumps in the mean, occurring at unknown locations in time. Those locations naturally imply a decomposition of the data into segments of homogeneity, the knowledge of which is useful in, e.g., estimation of the mean of the series. We provide a precise description of the type of features we are interested in and, in two important scenarios, demonstrate that our methodology enjoys appealing theoretical properties. We show that the performance of our proposal matches or surpasses the state of the art in the scenarios tested and present its applications on three real datasets: oil price log-returns, temperature anomalies data and the UK House Price Index. Finally, we introduce a class of univariate multiscale time series models and propose an estimation procedure to fit those models to data. We demonstrate that our proposal, with large probability, correctly identifies important timescales, under a framework in which the largest timescale in the model diverges with the sample size. The good empirical performance of the method is illustrated in an application to high-frequency financial returns for stocks listed on the New York Stock Exchange. For all proposed methods, we provide efficient and publicly available computer implementations.
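A toy version of the ranking-plus-subsampling idea in the first contribution might look as follows: covariates are ranked by absolute marginal correlation on random half-samples, and those that persistently sit near the top of the ranking are retained. This is only a sketch under made-up settings (the ranking measure, the subset size k and the 0.9 stability threshold are arbitrary), not the procedure or the guarantees developed in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(3)

# High-dimensional setting: p >> n, only the first 3 covariates matter.
n, p, k_true = 200, 1000, 3
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:k_true] = 2.0
y = X @ beta + rng.normal(size=n)

def top_set(Xs, ys, k):
    """Indices of the k covariates with the largest absolute sample correlation."""
    Xc = Xs - Xs.mean(axis=0)
    yc = ys - ys.mean()
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return set(np.argsort(corr)[-k:])

# Subsampling scheme: covariates whose top-k membership is stable across
# random half-samples are treated as the non-spurious ones.
k, n_sub = 10, 200
counts = np.zeros(p)
for _ in range(n_sub):
    idx = rng.choice(n, size=n // 2, replace=False)
    for j in top_set(X[idx], y[idx], k):
        counts[j] += 1

stable = np.where(counts / n_sub > 0.9)[0]
print("covariates selected as non-spurious:", stable)
```
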
65

Improving predictability of the future by grasping probability less tightly

Wheatcroft, Edward January 2015 (has links)
In the last 30 years, whilst there has been an explosion in our ability to make quantitative predictions, less progress has been made in terms of building useful forecasts to aid decision support. In most real-world systems, single point forecasts are fundamentally limited because they only simulate a single scenario and thus do not account for observational uncertainty. Ensemble forecasts aim to account for this uncertainty but are of limited use since it is unclear how they should be interpreted. Building probabilistic forecast densities is a theoretically sound approach with an end result that is easy to interpret for decision makers; however, it is not clear how to implement this approach given finite ensemble sizes and structurally imperfect models. This thesis explores methods that aid the interpretation of model simulations into predictions of the real world. This includes evaluation of forecasts, evaluation of the models used to make forecasts and evaluation of the techniques used to interpret ensembles of simulations as forecasts. Bayes' theorem is a fundamental relationship used to update a prior probability of the occurrence of some event given new information. Under the assumption that each of the probabilities in Bayes' theorem is perfect, it can be shown to make optimal use of the information available. Bayes' theorem can also be applied to probability density functions, and thus updating some previously constructed forecast density with a new one can be expected to improve forecast skill, as long as each forecast density gives a good representation of the uncertainty at that point in time. The relevance of the probability calculus, however, is in doubt when the forecasting system is imperfect, as is always the case in real-world systems. Taking the view that we wish to maximise the logarithm of the probability density placed on the outcome, two new approaches to the combination of forecast densities formed at different lead times are introduced and shown to be informative even in the imperfect model scenario, that is, a case where the Bayesian approach is shown to fail.
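As a small illustration of scoring and combining forecast densities by the logarithm of the density placed on the outcome, the sketch below kernel-dresses two synthetic ensembles into densities and then picks the linear-pool weight that maximises the mean log score. The kernel width, the ensemble construction and the pooling rule are arbitrary assumptions; this is not one of the two combination approaches introduced in the thesis.

```python
import numpy as np

rng = np.random.default_rng(4)

def dressed_density(ensemble, outcome, sigma):
    """Density an ensemble places on the outcome after Gaussian kernel dressing."""
    z = (outcome - ensemble) / sigma
    return np.mean(np.exp(-0.5 * z ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Two imperfect ensemble forecast systems for the same 500 outcomes
# (think of them as forecasts formed at two different lead times).
n, m, sigma = 500, 20, 0.5
truth = rng.normal(size=n)
ens_a = truth[:, None] + rng.normal(0.3, 0.8, size=(n, m))   # biased but sharp
ens_b = truth[:, None] + rng.normal(0.0, 1.2, size=(n, m))   # unbiased but wide

dens_a = np.array([dressed_density(ens_a[t], truth[t], sigma) for t in range(n)])
dens_b = np.array([dressed_density(ens_b[t], truth[t], sigma) for t in range(n)])
print("mean log score, system A:", np.log(dens_a + 1e-300).mean())
print("mean log score, system B:", np.log(dens_b + 1e-300).mean())

# Combine the two forecast densities with a linear pool and choose the weight
# that maximises the mean log density placed on the verifying outcomes.
weights = np.linspace(0, 1, 101)
pool_scores = [np.log(w * dens_a + (1 - w) * dens_b + 1e-300).mean() for w in weights]
best = weights[int(np.argmax(pool_scores))]
print("best blending weight on system A:", best)
```
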
66

In-sample forecasting : structured models and reserving

Hiabu, Munir January 2016 (has links)
In most developed countries, the insurance sector accounts for around eight percent of GDP. In Europe alone the insurers' liabilities are estimated at around €900 billion. Every insurance company regularly estimates its liabilities and reports them, in conjunction with statements about capital and assets, to the regulators. The liabilities determine the insurer's solvency and also its pricing and investment strategy. The new EU directive, Solvency II, which came into effect at the beginning of 2016, states that those liabilities should be estimated with ‘realistic assumptions’ using ‘relevant actuarial and statistical methods’. However, modern statistics has not found its way into the reserving departments of today's insurance companies. This thesis attempts to contribute to the connection between the world of mathematical statistics and the reserving practice in general insurance. As part of this thesis, it is shown in particular that today's reserving practice can be understood as a non-parametric estimation approach in a structured model setting. The forecast of future claims is done without the use of exposure information, i.e., without knowledge of the number of underwritten policies. New statistical estimation techniques and properties are derived which are built on this motivating application.
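For context, the classical reserving calculation alluded to here (exposure-free forecasting of future claims from a run-off triangle) is the chain-ladder method; a minimal sketch on a made-up triangle follows. It illustrates current practice rather than the structured non-parametric estimators developed in the thesis, and the numbers are invented.

```python
import numpy as np

# A toy run-off triangle of cumulative claims (accident years x development years).
# NaN marks development periods that lie in the future and must be forecast.
triangle = np.array([
    [100., 180., 220., 240.],
    [110., 200., 245., np.nan],
    [120., 215., np.nan, np.nan],
    [130., np.nan, np.nan, np.nan],
])

n = triangle.shape[0]
# Chain-ladder development factors: ratio of column sums over the rows where
# both development years are observed.
factors = []
for j in range(n - 1):
    rows = ~np.isnan(triangle[:, j + 1])
    factors.append(triangle[rows, j + 1].sum() / triangle[rows, j].sum())

# Project each accident year to ultimate by applying the remaining factors.
completed = triangle.copy()
for i in range(n):
    for j in range(n - 1):
        if np.isnan(completed[i, j + 1]):
            completed[i, j + 1] = completed[i, j] * factors[j]

ultimates = completed[:, -1]
latest = np.array([triangle[i, ~np.isnan(triangle[i])][-1] for i in range(n)])
print("development factors:", np.round(factors, 3))
print("outstanding reserve:", round(float((ultimates - latest).sum()), 1))
```
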
67

The educational and labour market expectations of adolescents and young adults

Jerrim, John January 2011 (has links)
Understanding why some suitably qualified young adults go on to enter higher education and others do not has been the subject of extensive research by social scientists from a range of disciplines. Economists suggest that young adults’ willingness to invest in a tertiary qualification depends upon what they believe the costs and benefits of this investment will be. On the other hand, sociologists stress that an early expectation of completing university is a key driver of later participation in higher education. Children's subjective beliefs about the future (their “expectations”) are a consistent theme within these distinctively different approaches. Researchers from both disciplines might argue that children's low or mistaken expectations (of future income, financial returns, or their ability to complete university) might lead them into making inappropriate educational choices. For instance, young adults who do not have a proper understanding of the graduate labour market may mistakenly invest (or not invest) in tertiary education. Alternatively, some academically talented children may not enter university if they do not see it as a realistic possibility, or feel that it is 'not for the likes of them'. I take an interdisciplinary approach within this thesis to tackle both of these issues. Specifically, I investigate whether young adults have realistic expectations about their future in the labour market and whether disadvantaged children scoring high marks on a maths assessment at age 15 believe they can complete university.
68

Model-based approaches to the estimation of poverty measures in small areas

Donbavand, Steven January 2015 (has links)
No description available.
69

Bias corrections in multilevel modelling of survey data with applications to small area estimation

Correa, Solange Trinidade January 2008 (has links)
In this thesis, a general approach for correcting the bias of an estimator and for obtaining estimates of the accuracy of the bias-corrected estimator is proposed. The method, termed extended bootstrap bias correction (EBS), is based on the bootstrap resampling technique and attempts to identify the functional relationship between the estimates obtained from the original and bootstrap samples and the true parameter values, drawn from a plausible parameter space. The bootstrap samples are used for studying the behaviour of the bias and, consequently, for the bias correction itself. The EBS approach is assessed by extensive Monte Carlo studies in three different applications of multilevel analysis of survey data. First, the proposed EBS method is applied to bias adjustment of unweighted and probability-weighted estimators of two-level model parameters under informative sampling designs with small sample sizes. Second, the EBS approach is considered for estimating the mean squared error (MSE) of predictors of small area means under the area-level Fay-Herriot model for different distributions of the model error terms. Finally, the EBS procedure is applied to MSE estimation of predictors of small area proportions under a unit-level generalized linear mixed model. The general conclusion emerging from this thesis is that the EBS approach is effective in providing bias-corrected estimators in all three cases considered.
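The classical bootstrap bias correction that the EBS method extends can be sketched in a few lines; the example below uses the divisor-n sample variance as a deliberately biased estimator on synthetic data. The EBS refinement (modelling the relationship between estimates and true values over a plausible parameter space) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(5)

# The sample variance with divisor n is a biased estimator of the population
# variance; it serves here to illustrate the classical bootstrap bias correction.
def biased_var(x):
    return np.mean((x - x.mean()) ** 2)

x = rng.normal(loc=0.0, scale=2.0, size=30)     # true variance = 4
theta_hat = biased_var(x)

B = 2000
boot = np.array([biased_var(rng.choice(x, size=x.size, replace=True)) for _ in range(B)])

bias_est = boot.mean() - theta_hat               # bootstrap estimate of the bias
theta_corrected = theta_hat - bias_est           # i.e. 2*theta_hat - mean(bootstrap)
se_boot = boot.std(ddof=1)                       # bootstrap standard error

print(f"plug-in estimate       : {theta_hat:.3f}")
print(f"bias-corrected estimate: {theta_corrected:.3f} (bootstrap SE ~ {se_boot:.3f})")
```
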
70

Estimation of population totals from imperfect census, survey and administrative records

Baffour-Awuah, Bernard January 2009 (has links)
The theoretical framework for estimating the population totals from the Census, Survey and an Administrative Records List is based on capture-recapture methodology, which has traditionally been employed for the measurement of abundance of biological populations. Under this framework, in order to estimate the unknown population total, N, an initial set of individuals is captured. Further subsequent captures are taken at later periods. The possible capture histories can be represented by the cells of a 2^r contingency table, where r is the number of captures. This contingency table will have one cell missing, corresponding to the population missed in all r captures. If this cell count can be estimated, adding it to the sum of the observed cells will yield the population size of interest. There are a number of models that may be specified based on the incomplete 2^r
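In the simplest two-capture case (r = 2), the missing cell can be estimated by assuming the two lists operate independently, which gives the classical dual-system (Lincoln-Petersen) estimator. The sketch below uses invented counts purely for illustration and does not reflect the more general models referred to in the abstract.

```python
# Dual-system (two-capture) illustration: a census list and a survey list.
# The 2^2 contingency table has one unobservable cell: people missed by both.
n_census_only = 4200    # counted in the census, missed by the survey
n_survey_only = 600     # in the survey, missed by the census
n_both = 5200           # counted in both

# Under independence of the two captures, the odds ratio of the table is 1,
# so the missing cell and the total can be estimated (Lincoln-Petersen):
n_missed = n_census_only * n_survey_only / n_both
n_census = n_census_only + n_both
n_survey = n_survey_only + n_both
N_hat = n_census * n_survey / n_both            # equivalently: observed + n_missed

observed = n_census_only + n_survey_only + n_both
print(f"estimated missed by both lists: {n_missed:.0f}")
print(f"estimated population total    : {N_hat:.0f} (observed: {observed})")
```
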
