91

Role of Majorization in Learning the Kernel within a Gaussian Process Regression Framework

Kapat, Prasenjit 21 October 2011 (has links)
No description available.
92

Studies on Nonlinear Optimal Control System Design Based on Data-Intensive Approach

Beppu, Hirofumi 23 March 2022 (has links)
Kyoto University / New-system doctoral course / Doctor of Philosophy (Engineering) / Degree No. Kou 23888 / Engineering Doctorate No. 4975 / Call number 新制||工||1777 (University Library) / Department of Aeronautics and Astronautics, Graduate School of Engineering, Kyoto University / Examining committee: (Chair) Professor Kenji Fujimoto, Professor Manabu Kano, Associate Professor Ichiro Maruta, Professor Fumitoshi Matsuno / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Philosophy (Engineering) / Kyoto University / DGAM
93

Likelihood-based testing and model selection for hazard functions with unknown change-points

Williams, Matthew Richard 03 May 2011 (has links)
The focus of this work is the development of testing procedures for the existence of change-points in parametric hazard models of various types. Hazard functions and the related survival functions are common units of analysis for survival and reliability modeling. We develop a methodology to test for the alternative of a two-piece hazard against a simpler one-piece hazard. The location of the change is unknown and the tests are irregular due to the presence of the change-point only under the alternative hypothesis. Our approach is to consider the profile log-likelihood ratio test statistic as a process with respect to the unknown change-point. We then derive its limiting process and find the supremum distribution of the limiting process to obtain critical values for the test statistic. We first reexamine existing work based on Taylor series expansions for abrupt changes in exponential data. We generalize these results to include Weibull data with known shape parameter. We then develop new tests for two-piece continuous hazard functions using local asymptotic normality (LAN). Finally, we generalize our earlier results for abrupt changes to include covariate information using the LAN techniques. While we focus on the cases of no censoring, simple right censoring, and censoring generated by staggered entry, our derivations reveal that our framework should apply to much broader censoring scenarios. / Ph. D.
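As a rough illustration of the kind of statistic the abstract describes, the sketch below computes a profile log-likelihood ratio process over a grid of candidate change-points for a two-piece constant (exponential) hazard with uncensored data and takes its supremum. The simulated data, grid choice, and function names are mine rather than the dissertation's, and the critical values would come from the supremum distribution of the limiting process rather than a standard chi-square table.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.exponential(scale=1.0, size=200)   # illustrative uncensored survival times

def loglik_onepiece(t):
    lam = len(t) / t.sum()                 # MLE of a constant hazard
    return len(t) * np.log(lam) - lam * t.sum()

def loglik_twopiece(t, tau):
    d1, d2 = np.sum(t < tau), np.sum(t >= tau)
    e1 = np.minimum(t, tau).sum()          # exposure before the change-point
    e2 = np.maximum(t - tau, 0.0).sum()    # exposure after the change-point
    if d1 == 0 or d2 == 0:
        return -np.inf
    lam1, lam2 = d1 / e1, d2 / e2          # piecewise-constant hazard MLEs
    return d1 * np.log(lam1) + d2 * np.log(lam2) - lam1 * e1 - lam2 * e2

# Profile log-likelihood ratio as a process over candidate change-points;
# the test statistic is its supremum over an interior grid.
grid = np.quantile(t, np.linspace(0.1, 0.9, 81))
lr_process = np.array([2 * (loglik_twopiece(t, tau) - loglik_onepiece(t))
                       for tau in grid])
sup_lr = lr_process.max()
print(f"sup LR = {sup_lr:.3f} at tau = {grid[lr_process.argmax()]:.3f}")
```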
94

Precision Aggregated Local Models

Edwards, Adam Michael 28 January 2021 (has links)
Large-scale Gaussian process (GP) regression is infeasible for larger data sets due to cubic scaling of flops and quadratic storage involved in working with covariance matrices. Remedies in recent literature focus on divide-and-conquer, e.g., partitioning into sub-problems and inducing functional (and thus computational) independence. Such approximations can be speedy, accurate, and sometimes even more flexible than an ordinary GP. However, a big downside is loss of continuity at partition boundaries. Modern methods like local approximate GPs (LAGPs) imply effectively infinite partitioning and are thus pathologically good and bad in this regard. Model averaging, an alternative to divide-and-conquer, can maintain absolute continuity but often over-smooths, diminishing accuracy. Here I propose putting LAGP-like methods into a local experts-like framework, blending partition-based speed with model-averaging continuity, as a flagship example of what I call precision aggregated local models (PALM). Using N_C LAGPs, each selecting n from N data pairs, I illustrate a scheme that is at most cubic in n, quadratic in N_C, and linear in N, drastically reducing computational and storage demands. Extensive empirical illustration shows how PALM is at least as accurate as LAGP, can be much faster, and furnishes continuous predictive surfaces. Finally, I propose a sequential updating scheme that greedily refines a PALM predictor up to a computational budget, and several variations on the basic PALM that may provide predictive improvements. / Doctor of Philosophy / Occasionally, when describing the relationship between two variables, it may be helpful to use a so-called "non-parametric" regression that is agnostic to the function that connects them. Gaussian Processes (GPs) are a popular method of non-parametric regression used for their relative flexibility and interpretability, but they have the unfortunate drawback of being computationally infeasible for large data sets. Past work into solving the scaling issues for GPs has focused on "divide and conquer" style schemes that spread the data out across multiple smaller GP models. While these models make GP methods much more accessible for large data sets, they do so at the expense of either local predictive accuracy or global surface continuity. Precision Aggregated Local Models (PALM) is a novel divide-and-conquer method for GP models that is scalable for large data while maintaining local accuracy and a smooth global model. I demonstrate that PALM can be built quickly and performs well predictively compared to other state-of-the-art methods. This document also provides a sequential algorithm for selecting the location of each local model, and variations on the basic PALM methodology.
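The following sketch illustrates the local-approximate-GP idea that PALM builds on: each prediction point fits a small GP on its n nearest neighbors out of N training points, so the dense-matrix cost is cubic in n rather than N. It is a generic illustration under assumed kernel and hyperparameter choices, not the PALM or laGP implementation; because each prediction uses its own local model, neighboring predictions need not agree, which is exactly the continuity issue PALM targets.

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=0.2, signal_var=1.0):
    # Squared-exponential covariance between two sets of input rows.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / lengthscale**2)

def local_gp_predict(X, y, x_star, n=50, noise_var=1e-4):
    # Select the n nearest training points, then do ordinary GP prediction
    # on that subset: O(n^3) flops instead of O(N^3).
    idx = np.argsort(((X - x_star) ** 2).sum(1))[:n]
    Xn, yn = X[idx], y[idx]
    K = sq_exp_kernel(Xn, Xn) + noise_var * np.eye(n)
    k = sq_exp_kernel(Xn, x_star[None, :])
    alpha = np.linalg.solve(K, yn)
    return float(k.T @ alpha)               # predictive mean at x_star

# Toy data: N is large, but each prediction only touches n << N points.
rng = np.random.default_rng(1)
X = rng.uniform(size=(5000, 1))
y = np.sin(8 * X[:, 0]) + 0.05 * rng.standard_normal(5000)
print(local_gp_predict(X, y, np.array([0.3])))
```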
95

Statistical Methods for Variability Management in High-Performance Computing

Xu, Li 15 July 2021 (has links)
High-performance computing (HPC) variability management is an important topic in computer science. Research topics include experimental designs for efficient data collection, surrogate models for predicting the performance variability, and system configuration optimization. Due to the complex architecture of HPC systems, a comprehensive study of HPC variability needs large-scale datasets, and experimental design techniques are useful for improved data collection. Surrogate models are essential to understand the variability as a function of system parameters, which can be obtained by mathematical and statistical models. After predicting the variability, optimization tools are needed for future system designs. This dissertation focuses on HPC input/output (I/O) variability through three main chapters. After the general introduction in Chapter 1, Chapter 2 focuses on the prediction models for the scalar description of I/O variability. A comprehensive comparison study is conducted, and major surrogate models for computer experiments are investigated. In addition, a tool is developed for system configuration optimization based on the chosen surrogate model. Chapter 3 conducts a detailed study for the multimodal phenomena in I/O throughput distribution and proposes an uncertainty estimation method for the optimal number of runs for future experiments. Mixture models are used to identify the number of modes for throughput distributions at different configurations. This chapter also addresses the uncertainty in parameter estimation and derives a formula for sample size calculation. The developed method is then applied to HPC variability data. Chapter 4 focuses on the prediction of functional outcomes with both qualitative and quantitative factors. Instead of a scalar description of I/O variability, the distribution of I/O throughput provides a comprehensive description of I/O variability. We develop a modified Gaussian process for functional prediction and apply the developed method to the large-scale HPC I/O variability data. Chapter 5 contains some general conclusions and areas for future work. / Doctor of Philosophy / This dissertation focuses on three projects that are all related to statistical methods in performance variability management in high-performance computing (HPC). HPC systems are computer systems that create high performance by aggregating a large number of computing units. The performance of HPC is measured by the throughput of a benchmark called the IOzone Filesystem Benchmark. The performance variability is the variation among throughputs when the system configuration is fixed. Variability management involves studying the relationship between performance variability and the system configuration. In Chapter 2, we use several existing prediction models to predict the standard deviation of throughputs given different system configurations and compare the accuracy of predictions. We also conduct HPC system optimization using the chosen prediction model as the objective function. In Chapter 3, we use the mixture model to determine the number of modes in the distribution of throughput under different system configurations. In addition, we develop a model to determine the number of additional runs for future benchmark experiments. In Chapter 4, we develop a statistical model that can predict the throughput distributions given the system configurations. We also compare the prediction of summary statistics of the throughput distributions with existing prediction models.
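As a hedged sketch of the mode-identification step described above, the snippet below fits Gaussian mixture models with one to five components to a simulated throughput sample at a single configuration and selects the component count by BIC. The data, the use of scikit-learn's GaussianMixture, and BIC as the selection criterion are illustrative assumptions; the dissertation's treatment of estimation uncertainty and its sample-size formula are not reproduced.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Simulated throughputs at one fixed system configuration: two modes.
rng = np.random.default_rng(2)
throughput = np.concatenate([rng.normal(400, 15, 120),
                             rng.normal(520, 20, 80)]).reshape(-1, 1)

# Fit mixtures with 1..5 components and pick the count that minimizes BIC.
fits = {k: GaussianMixture(n_components=k, n_init=5, random_state=0).fit(throughput)
        for k in range(1, 6)}
bic = {k: m.bic(throughput) for k, m in fits.items()}
best_k = min(bic, key=bic.get)
print("BIC by k:", {k: round(v, 1) for k, v in bic.items()})
print("selected number of components:", best_k)
```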
96

Modeling of the fundamental mechanical interactions of unit load components during warehouse racking storage

Molina Montoya, Eduardo 04 February 2021 (has links)
The global supply chain has been built on the material handling capabilities provided by the use of pallets and corrugated boxes. Current pallet design methodologies frequently underestimate the load carrying capacity of pallets by assuming they will only carry uniformly distributed, flexible payloads. But by considering the effect of various payload characteristics and their interactions during the pallet design process, the structure of pallets can be optimized. This, in turn, will reduce the material consumption required to support the pallet industry. In order to understand the mechanical interactions between stacked boxes and pallet decks, and how these interactions affect the bending moment of pallets, a finite element model was developed and validated. The model developed was two-dimensional, nonlinear and implicitly dynamic. It allowed for evaluations of the effects of different payload configurations on the pallet bending response. The model accurately predicted the deflection of the pallet segment and the movement of the packages for each scenario simulated. The second phase of the study characterized the effects, significant factors, and interactions influencing load bridging on unit loads. It provided a clear understanding of the load bridging effect and how it can be successfully included during the unit load design process. It was concluded that pallet yield strength could be increased by over 60% when accounting for the load bridging effect. To provide a more efficient and cost-effective solution, a surrogate model was developed using Gaussian process regression. A detailed analysis of the payloads' effects on pallet deflection was conducted. Four factors were identified as generating significant influence: the number of columns in the unit load, the height of the payload, the friction coefficient of the payload's contact with the pallet deck, and the contact friction between the packages. Additionally, it was identified that complex interactions exist between these significant factors, so they must always be considered. / Doctor of Philosophy / Pallets are a key element of an efficient global supply chain. Most products that are transported are commonly packaged in corrugated boxes and handled by stacking these boxes on pallets. Currently, pallet design methods do not take into consideration the product that is being carried, instead using generic flexible loads for the determination of the pallet's load carrying capacity. In practice, most pallets carry discrete loads, such as corrugated boxes. It has been proven that a pallet, when carrying certain types of packages, can have increased performance compared to the design's estimated load carrying capacity. This is caused by the load redistribution across the pallet deck through an effect known as load bridging. Being able to incorporate the load bridging effect on pallet performance during the design process can allow for the optimization of pallets for specific uses and the reduction in costs and in material consumption. Historically, this effect has been evaluated through physical testing, but that is a slow and cumbersome process that does not allow control of all of the variables for the development of a general model. This research study developed a computer simulation model of a simplified unit load to demonstrate and replicate the load bridging effect. Additionally, a surrogate model was developed in order to conduct a detailed analysis of the main factors and their interactions.
These models provide pallet designers with an efficient method for identifying opportunities to modify the unit load's characteristics and improve pallet performance for specific conditions of use.
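The sketch below is a minimal, hypothetical example of a Gaussian process surrogate over the four significant factors named above (number of columns, payload height, pallet-deck friction, and box-to-box friction). The design ranges and the synthetic response standing in for finite-element output are invented for illustration, and scikit-learn is an assumed tool choice rather than the one used in the study.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Hypothetical design matrix: [columns, payload height, deck friction, box friction].
rng = np.random.default_rng(3)
X = np.column_stack([rng.integers(1, 6, 60),          # number of columns
                     rng.uniform(0.5, 2.0, 60),       # payload height (m)
                     rng.uniform(0.2, 0.6, 60),       # pallet-deck friction coeff.
                     rng.uniform(0.2, 0.6, 60)])      # box-to-box friction coeff.
# Synthetic deflection response standing in for finite-element output.
y = 10.0 / X[:, 0] + 2.0 * X[:, 1] - 3.0 * X[:, 2] - X[:, 3] + 0.1 * rng.standard_normal(60)

kernel = ConstantKernel() * RBF(length_scale=[1.0] * 4)   # anisotropic lengthscales
surrogate = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
mean, sd = surrogate.predict(np.array([[3, 1.2, 0.4, 0.3]]), return_std=True)
print(f"predicted deflection {mean[0]:.2f} +/- {sd[0]:.2f}")
```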
97

Neural Network Gaussian Process considering Input Uncertainty and Application to Composite Structures Assembly

Lee, Cheol Hei 18 May 2020 (has links)
Developing machine-learning-enabled smart manufacturing is promising for the composite structures assembly process. It requires accurate predictive analysis of the deformation of the composite structures to improve the production quality and efficiency of composite structures assembly. The novel composite structures assembly involves two challenges: (i) the highly nonlinear and anisotropic properties of composite materials; and (ii) inevitable uncertainty in the assembly process. To overcome those problems, we propose a neural network Gaussian process model considering input uncertainty for composite structures assembly. The deep architecture of our model allows us to approximate a complex system better, and consideration of input uncertainty enables robust modeling with complete incorporation of the process uncertainty. Our case study shows that the proposed method performs better than benchmark methods for highly nonlinear systems. / Master of Science / Composite materials are becoming more popular in many areas due to their nice properties, yet computational modeling of them is not an easy task due to their complex structures. Moreover, real-world problems are generally subject to uncertainty that cannot be observed, which makes them more difficult to solve. Therefore, successful predictive modeling of a composite-material product requires consideration of the various uncertainties in the problem. The neural network Gaussian process (NNGP) is a statistical technique that has been developed recently and can be applied to machine learning. The most interesting property of the NNGP is that it is derived from the equivalence between deep neural networks and Gaussian processes, which has drawn much attention in machine learning fields. However, related work has so far ignored uncertainty in the input data, which may be an inappropriate assumption in real problems. In this paper, we derive the NNGP considering input uncertainty (NNGPIU) based on the unique characteristics of composite materials. Although our motivation comes from the manipulation of composite materials, NNGPIU can be applied to any problem where the input data are corrupted by unknown noise. Our work shows how NNGPIU can be derived theoretically and demonstrates that the proposed method performs better than benchmark methods for highly nonlinear systems.
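As background for the NNGP connection the abstract relies on, the sketch below implements the standard kernel recursion for an infinitely wide fully connected ReLU network and uses it for plain GP regression on toy data. The depth and variance hyperparameters are illustrative, and the input-uncertainty extension (NNGPIU) developed in the thesis is not reproduced here.

```python
import numpy as np

def nngp_kernel(X, depth=3, sigma_w2=1.6, sigma_b2=0.1):
    """NNGP covariance for an infinitely wide fully connected ReLU network."""
    d = X.shape[1]
    K = sigma_b2 + sigma_w2 * (X @ X.T) / d        # input-layer covariance
    for _ in range(depth):
        diag = np.sqrt(np.diag(K))
        norm = np.outer(diag, diag)
        theta = np.arccos(np.clip(K / norm, -1.0, 1.0))
        # Arc-cosine (ReLU) angular integral, then affine readout of the layer.
        K = sigma_b2 + sigma_w2 * norm * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
    return K

# Plain GP regression with the NNGP kernel on toy data (no input noise modeled).
rng = np.random.default_rng(4)
X = rng.standard_normal((30, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
K = nngp_kernel(X) + 1e-3 * np.eye(30)
alpha = np.linalg.solve(K, y)
print("posterior mean at first training point:", float(nngp_kernel(X)[0] @ alpha))
```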
98

Semiparametric Bayesian Kernel Survival Model for Highly Correlated High-Dimensional Data

Zhang, Lin 01 May 2018 (has links)
We are living in an era in which many mysteries related to science, technologies and design can be answered by "learning" the huge amount of data accumulated over the past few decades. In the processes of those endeavors, highly-correlated high-dimensional data are frequently observed in many areas including predicting shelf life, controlling manufacturing processes, and identifying important pathways related to diseases. We define a "set" as a group of highly-correlated high-dimensional (HCHD) variables that possess a certain practical meaning or control a certain process, and define an "element" as one of the HCHD variables within a certain set. Such an elements-within-a-set structure is very complicated because: (i) the dimensions of elements in different sets can vary dramatically, ranging from two to hundreds or even thousands; (ii) the true relationships, including element-wise associations, set-wise interactions, and element-set interactions, are unknown; and (iii) the sample size (n) is usually much smaller than the dimension of the elements (p). The goal of this dissertation is to provide a systematic way to identify both the set effects and the element effects associated with survival outcomes from heterogeneous populations using Bayesian survival kernel models. By connecting kernel machines with semiparametric Bayesian hierarchical models, the proposed unified model frameworks can identify significant elements as well as sets regardless of mis-specifications of distributions or kernels. The proposed methods can potentially be applied to a vast range of fields to solve real-world problems. / PHD / We are living in an era in which many mysteries related to science, technologies and design can be answered by "learning" the huge amount of data accumulated over the past few decades. In the processes of those endeavors, highly-correlated high-dimensional data are frequently observed in many areas including predicting shelf life, controlling manufacturing processes, and identifying important pathways related to diseases. For example, for a group of 30 patients in a medical study, values for an immense number of variables like gender, age, height, weight, and blood pressure of each patient are recorded. High-dimensional means the number of variables (i.e. p) could be very large (e.g. p > 500), while the number of subjects or the sample size (i.e. n) is small (n = 30). We define a "set" as a group of highly-correlated high-dimensional (HCHD) variables that possess a certain practical meaning or control a certain process, and define an "element" as one of the HCHD variables within a certain set. Such an elements-within-a-set structure is very complicated because: (i) the dimensions of elements in different sets can vary dramatically, ranging from two to hundreds or even thousands; (ii) the true relationships, including element-wise associations, set-wise interactions, and element-set interactions, are unknown; and (iii) the sample size (n) is usually much smaller than the dimension of the elements (p). The goal of this dissertation is to provide a systematic way to identify both the set effects and the element effects associated with survival outcomes from heterogeneous populations using different proposed statistical models. The proposed models can incorporate prior knowledge to boost the model performance. The proposed methods can potentially be applied to a vast range of fields to solve real-world problems.
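The snippet below is only a generic illustration of the elements-within-a-set structure: one kernel is built per variable set (with sets of very different dimensions) and the kernels are combined with non-negative weights that play the role of set effects, so a weight shrunk to zero drops an entire set. The hypothetical data, RBF kernels, and fixed weights stand in for the semiparametric Bayesian hierarchical machinery actually proposed in the dissertation.

```python
import numpy as np

def rbf(Xa, Xb, ls=1.0):
    # Squared-exponential kernel between two sets of rows.
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

# Hypothetical data: 30 subjects, three "sets" of correlated variables with
# very different dimensions (e.g., pathways of 5, 40, and 200 elements).
rng = np.random.default_rng(5)
sets = {"set_A": rng.standard_normal((30, 5)),
        "set_B": rng.standard_normal((30, 40)),
        "set_C": rng.standard_normal((30, 200))}

# One kernel per set; non-negative weights play the role of set effects,
# so a weight shrunk to zero removes the whole set from the model.
weights = {"set_A": 1.0, "set_B": 0.3, "set_C": 0.0}
K = sum(weights[name] * rbf(X, X) for name, X in sets.items())
print("combined kernel matrix shape:", K.shape)
```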
99

High-dimensional Multimodal Bayesian Learning

Salem, Mohamed Mahmoud 12 December 2024 (has links)
High-dimensional datasets are fast becoming a cornerstone across diverse domains, fueled by advancements in data-capturing technology like DNA sequencing, medical imaging techniques, and social media. This dissertation delves into the inherent opportunities and challenges posed by these types of datasets. We develop three Bayesian methods: (1) Multilevel Network Recovery for Genomics, (2) Network Recovery for Functional data, and (3) Bayesian Inference in Transformer-based Models. Chapter 2 in our work examines a two-tiered data structure; to simultaneously explore variable selection and identify dependency structures among both higher- and lower-level variables, we propose a multi-level nonparametric kernel machine approach, utilizing variational inference to jointly identify multi-level variables as well as build the network. Chapter 3 addresses the development of a simultaneous selection of functional domain subsets, selection of functional graphical nodes, and continuous response modeling given both scalar and functional covariates under semiparametric, nonadditive models, which allow us to capture unknown, possibly nonlinear, interaction terms among high-dimensional functional variables. In Chapter 4, we extend our investigation of leveraging structure in high-dimensional datasets to the relatively new transformer architecture; we introduce a new penalty structure to the Bayesian classification transformer, leveraging the multi-tiered structure of the transformer-based model. This allows for increased, likelihood-based regularization, which is needed given the high-dimensional nature of our motivating dataset. This new regularization approach allows us to integrate Bayesian inference via variational approximations into our transformer-based model and improves the calibration of probability estimates. / Doctor of Philosophy / In today's data-driven landscape, high-dimensional datasets have emerged as a cornerstone across diverse domains, fueled by advancements in technology like sensor networks, genomics, and social media platforms. This dissertation delves into the inherent opportunities and challenges posed by these datasets, emphasizing their potential for uncovering hidden patterns and correlations amidst their complexity. As high-dimensional datasets proliferate, researchers face significant challenges in effectively analyzing and interpreting them. This research focuses on leveraging Bayesian methods as a robust approach to address these challenges. Bayesian approaches offer unique advantages, particularly in handling small sample sizes and complex models. By providing robust uncertainty quantification and regularization techniques, Bayesian methods ensure reliable inference and model generalization, even in the face of sparse or noisy data. Furthermore, this work examines the strategic integration of structured information as a regularization technique. By exploiting patterns and dependencies within the data, structured regularization enhances the interpretability and resilience of statistical models across various domains. Whether the structure arises from spatial correlations, temporal dependencies, or coordinated actions among covariates, incorporating this information enriches the modeling process and improves the reliability of the results. By exploring these themes, this research contributes to advancing the understanding and application of high-dimensional data analysis.
Through a thorough examination of Bayesian methods and structured regularization techniques, this dissertation aims to support researchers in effectively navigating and extracting meaningful insights from the complex landscape of high-dimensional datasets.
100

A Dual Metamodeling Perspective for Design and Analysis of Stochastic Simulation Experiments

Wang, Wenjing 17 July 2019 (has links)
Fueled by a growing number of applications in science and engineering, the development of stochastic simulation metamodeling methodologies has gained momentum in recent years. A majority of the existing methods, such as stochastic kriging (SK), only focus on efficiently metamodeling the mean response surface implied by a stochastic simulation experiment. As the simulation outputs are stochastic with the simulation variance varying significantly across the design space, suitable methods for variance modeling are required. This thesis takes a dual metamodeling perspective and aims at exploiting the benefits of fitting the mean and variance functions simultaneously for achieving an improved predictive performance. We first explore the effects of replacing the sample variances with various smoothed variance estimates on the performance of SK and propose a dual metamodeling approach to obtain an efficient simulation budget allocation rule. Second, we articulate the links between SK and least-squares support vector regression and propose to use a "dense and shallow" initial design to facilitate selection of important design points and efficient allocation of the computational budget. Third, we propose a variational Bayesian inference-based Gaussian process (VBGP) metamodeling approach to accommodate the situation where either one or multiple simulation replications are available at every design point. VBGP can fit the mean and variance response surfaces simultaneously, while taking into full account the uncertainty in the heteroscedastic variance. Lastly, we generalize VBGP for handling large-scale heteroscedastic datasets based on the idea of "transductive combination of GP experts." / Doctor of Philosophy / In solving real-world complex engineering problems, it is often helpful to learn the relationship between the decision variables and the response variables to better understand the real system of interest. Directly conducting experiments on the real system can be impossible or impractical, due to the high cost or time involved. Instead, simulation models are often used as a surrogate to model the complex stochastic systems for conducting simulation-based design and analysis. However, even simulation models can be very expensive to run. To alleviate the computational burden, a metamodel is often built based on the outputs of the simulation runs at some selected design points to map the performance response surface as a function of the controllable decision variables, or uncontrollable environmental variables, to approximate the behavior of the original simulation model. There has been a plethora of work in the simulation research community dedicated to studying stochastic simulation metamodeling methodologies suitable for analyzing stochastic simulation experiments in science and engineering. A majority of the existing methods, such as stochastic kriging (SK), have been known as effective metamodeling tools for approximating a mean response surface implied by a stochastic simulation. Although SK has been extensively used as an effective metamodeling methodology for stochastic simulations, SK and metamodeling techniques alike still face four methodological barriers: 1) lack of study of variance estimation methods; 2) absence of an efficient experimental design for simultaneous mean and variance metamodeling; 3) lack of flexibility to accommodate situations where simulation replications are not available; and 4) lack of scalability.
To overcome the aforementioned barriers, this thesis takes a dual metamodeling perspective and aims at exploiting the benefits of fitting the mean and variance functions simultaneously for achieving an improved predictive performance. We first explore the effects of replacing the sample variances with various smoothed variance estimates on the performance of SK and propose a dual metamodeling approach to obtain an efficient simulation budget allocation rule. Second, we articulate the links between SK and least-square support vector regression and propose to use a “dense and shallow” initial design to facilitate selection of important design points and efficient allocation of the computational budget. Third, we propose a variational Bayesian inference-based Gaussian process (VBGP) metamodeling approach to accommodate the situation where either one or multiple simulation replications are available at every design point. VBGP can fit the mean and variance response surfaces simultaneously, while taking into full account the uncertainty in the heteroscedastic variance. Lastly, we generalize VBGP for handling large-scale heteroscedastic datasets based on the idea of “transductive combination of GP experts.”
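As a minimal sketch of the stochastic-kriging idea that this dual-metamodeling work starts from, the snippet below averages replicated simulation outputs at each design point and plugs the sample variances, divided by the number of replications, into the GP noise term. In practice those variances would be smoothed or modeled jointly with the mean, which is the thesis's contribution and is not reproduced here; all data and hyperparameters are synthetic.

```python
import numpy as np

def sq_exp(A, B, ls=0.3):
    # One-dimensional squared-exponential covariance.
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / ls**2)

# Simulated stochastic-simulation output: r replications at each of 20 design
# points, with heteroscedastic noise whose variance grows with x.
rng = np.random.default_rng(6)
x = np.linspace(0, 1, 20)
r = 10
Y = np.sin(6 * x)[:, None] + (0.1 + 0.4 * x)[:, None] * rng.standard_normal((20, r))

ybar = Y.mean(axis=1)                 # sample means at the design points
s2 = Y.var(axis=1, ddof=1)            # sample variances (could be smoothed first)

# Stochastic-kriging-style predictor: intrinsic noise enters as a diagonal
# matrix of sample variances divided by the number of replications.
K = sq_exp(x, x) + np.diag(s2 / r)
xs = np.linspace(0, 1, 5)
mean_pred = sq_exp(xs, x) @ np.linalg.solve(K, ybar)
print(np.round(mean_pred, 3))
```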
