Return to search

Bayesian Sparse Regression with Application to Data-driven Understanding of Climate

Sparse regressions based on constraining the L1-norm of the coefficients became popular due to their ability to handle high dimensional data unlike the regular regressions which suffer from overfitting and model identifiability issues especially when sample size is small. They are often the method of choice in many fields of science and engineering for simultaneously selecting covariates and fitting parsimonious linear models that are better generalizable and easily interpretable. However, significant challenges may be posed by the need to accommodate extremes and other domain constraints such as dynamical relations among variables, spatial and temporal constraints, need to provide uncertainty estimates and feature correlations, among others. We adopted a hierarchical Bayesian version of the sparse regression framework and exploited its inherent flexibility to accommodate the constraints. We applied sparse regression for the feature selection problem of statistical downscaling of the climate variables with particular focus on their extremes. This is important for many impact studies where the climate change information is required at a spatial scale much finer than that provided by the global or regional climate models. Characterizing the dependence of extremes on covariates can help in identification of plausible causal drivers and inform extremes downscaling. We propose a general-purpose sparse Bayesian framework for covariate discovery that accommodates the non-Gaussian distribution of extremes within a hierarchical Bayesian sparse regression model. We obtain posteriors over regression coefficients, which indicate dependence of extremes on the corresponding covariates and provide uncertainty estimates, using a variational Bayes approximation. The method is applied for selecting informative atmospheric covariates at multiple spatial scales as well as indices of large scale circulation and global warming related to frequency of precipitation extremes over continental United States. Our results confirm the dependence relations that may be expected from known precipitation physics and generates novel insights which can inform physical understanding. We plan to extend our model to discover covariates for extreme intensity in future. We further extend our framework to handle the dynamic relationship among the climate variables using a nonparametric Bayesian mixture of sparse regression models based on Dirichlet Process (DP). The extended model can achieve simultaneous clustering and discovery of covariates within each cluster. Moreover, the a priori knowledge about association between pairs of data-points is incorporated in the model through must-link constraints on a Markov Random Field (MRF) prior. A scalable and efficient variational Bayes approach is developed to infer posteriors on regression coefficients and cluster variables. / Computer and Information Science

Identiferoai:union.ndltd.org:TEMPLE/oai:scholarshare.temple.edu:20.500.12613/2753
Date January 2015
CreatorsDas, Debasish
ContributorsObradovic, Zoran, Ganguly, Auroop R., Vucetic, Slobodan, Latecki, Longin
PublisherTemple University. Libraries
Source SetsTemple University
LanguageEnglish
Detected LanguageEnglish
TypeThesis/Dissertation, Text
Format85 pages
RightsIN COPYRIGHT- This Rights Statement can be used for an Item that is in copyright. Using this statement implies that the organization making this Item available has determined that the Item is in copyright and either is the rights-holder, has obtained permission from the rights-holder(s) to make their Work(s) available, or makes the Item available under an exception or limitation to copyright (including Fair Use) that entitles it to make the Item available., http://rightsstatements.org/vocab/InC/1.0/
Relationhttp://dx.doi.org/10.34944/dspace/2735, Theses and Dissertations

Page generated in 0.0033 seconds