Global ETD Search

1	Precision Aggregated Local Models Edwards, Adam Michael 28 January 2021 (has links) Large scale Gaussian process (GP) regression is infeasible for larger data sets due to cubic scaling of flops and quadratic storage involved in working with covariance matrices. Remedies in recent literature focus on divide-and-conquer, e.g., partitioning into sub-problems and inducing functional (and thus computational) independence. Such approximations can speedy, accurate, and sometimes even more flexible than an ordinary GPs. However, a big downside is loss of continuity at partition boundaries. Modern methods like local approximate GPs (LAGPs) imply effectively infinite partitioning and are thus pathologically good and bad in this regard. Model averaging, an alternative to divide-and-conquer, can maintain absolute continuity but often over-smooth, diminishing accuracy. Here I propose putting LAGP-like methods into a local experts-like framework, blending partition-based speed with model-averaging continuity, as a flagship example of what I call precision aggregated local models (PALM). Using N_C LAGPs, each selecting n from N data pairs, I illustrate a scheme that is at most cubic in n, quadratic in N_C, and linear in N, drastically reducing computational and storage demands. Extensive empirical illustration shows how PALM is at least as accurate as LAGP, can be much faster in terms of speed, and furnishes continuous predictive surfaces. Finally, I propose sequential updating scheme which greedily refines a PALM predictor up to a computational budget, and several variations on the basic PALM that may provide predictive improvements. / Doctor of Philosophy / Occasionally, when describing the relationship between two variables, it may be helpful to use a so-called ``non-parametric" regression that is agnostic to the function that connects them. Gaussian Processes (GPs) are a popular method of non-parametric regression used for their relative flexibility and interpretability, but they have the unfortunate drawback of being computationally infeasible for large data sets. Past work into solving the scaling issues for GPs has focused on ``divide and conquer" style schemes that spread the data out across multiple smaller GP models. While these model make GP methods much more accessible to large data sets they do so either at the expense of local predictive accuracy of global surface continuity. Precision Aggregated Local Models (PALM) is a novel divide and conquer method for GP models that is scalable for large data while maintaining local accuracy and a smooth global model. I demonstrate that PALM can be built quickly, and performs well predictively compared to other state of the art methods. This document also provides a sequential algorithm for selecting the location of each local model, and variations on the basic PALM methodology. approximate kriging neighborhoods Gaussian process surrogate nonparametric regression nearest neighbor boosting sequential design active learning
2	Some Advances in Local Approximate Gaussian Processes Sun, Furong 03 October 2019 (has links) Nowadays, Gaussian Process (GP) has been recognized as an indispensable statistical tool in computer experiments. Due to its computational complexity and storage demand, its application in real-world problems, especially in "big data" settings, is quite limited. Among many strategies to tailor GP to such settings, Gramacy and Apley (2015) proposed local approximate GP (laGP), which constructs approximate predictive equations by constructing small local designs around the predictive location under certain criterion. In this dissertation, several methodological extensions based upon laGP are proposed. One methodological contribution is the multilevel global/local modeling, which deploys global hyper-parameter estimates to perform local prediction. The second contribution comes from extending the laGP notion of "locale" to a set of predictive locations, along paths in the input space. These two contributions have been applied in the satellite drag emulation, which is illustrated in Chapter 3. Furthermore, the multilevel GP modeling strategy has also been applied to synthesize field data and computer model outputs of solar irradiance across the continental United States, combined with inverse-variance weighting, which is detailed in Chapter 4. Last but not least, in Chapter 5, laGP's performance has been tested on emulating daytime land surface temperatures estimated via satellites, in the settings of irregular grid locations. / Doctor of Philosophy / In many real-life settings, we want to understand a physical relationship/phenomenon. Due to limited resources and/or ethical reasons, it is impossible to perform physical experiments to collect data, and therefore, we have to rely upon computer experiments, whose evaluation usually requires expensive simulation, involving complex mathematical equations. To reduce computational efforts, we are looking for a relatively cheap alternative, which is called an emulator, to serve as a surrogate model. Gaussian process (GP) is such an emulator, and has been very popular due to fabulous out-of-sample predictive performance and appropriate uncertainty quantification. However, due to computational complexity, full GP modeling is not suitable for “big data” settings. Gramacy and Apley (2015) proposed local approximate GP (laGP), the core idea of which is to use a subset of the data for inference and further prediction at unobserved inputs. This dissertation provides several extensions of laGP, which are applied to several real-life “big data” settings. The first application, detailed in Chapter 3, is to emulate satellite drag from large simulation experiments. A smart way is figured out to capture global input information in a comprehensive way by using a small subset of the data, and local prediction is performed subsequently. This method is called “multilevel GP modeling”, which is also deployed to synthesize field measurements and computational outputs of solar irradiance across the continental United States, illustrated in Chapter 4, and to emulate daytime land surface temperatures estimated by satellites, discussed in Chapter 5. nonparametric regression approximate kriging multilevel modeling calibration space-filling design inverse-variance weighting

Search results

Precision Aggregated Local Models

Some Advances in Local Approximate Gaussian Processes