1 |
Additive models with shape constraints
Pya, Natalya, January 2010
In many practical situations, when analyzing the dependence of a response variable on one or more explanatory variables, it is essential to assume that the relationship of interest obeys certain shape constraints, such as monotonicity, or monotonicity together with convexity/concavity. In this thesis a new approach to shape-preserving smoothing within generalized additive models is developed. In contrast with previous methods based on quadratic programming, the project develops intermediate-rank penalized smoothers with shape constraints, based on re-parameterized B-splines and on penalties that follow the P-spline ideas of Eilers and Marx (1996). Two settings are considered: univariate smooths under monotonicity constraints, or under monotonicity together with convexity/concavity; and bivariate smooths with monotonicity restrictions on both covariates or on only one of them. The proposed shape constrained smoothing has been incorporated into generalized additive models with a mixture of unconstrained and shape-restricted smooth terms (mono-GAM). A fitting procedure for mono-GAM is developed. Since a major challenge of any flexible regression method is its implementation in a computationally efficient and stable manner, issues such as convergence, rank deficiency of the working model matrix, initialization, and others have been dealt with thoroughly. The limiting posterior distribution of the model parameters is derived, which allows Bayesian confidence intervals for the mono-GAM smooth terms to be constructed by means of the delta method. The performance of these confidence intervals is examined by assessing realized coverage probabilities in simulation studies. The proposed modelling approach has been implemented in an R package, monogam. The model setup is the same as in mgcv(gam), with the addition of shape constrained smooths. To be consistent with the unconstrained GAM, the package provides key functions similar to those associated with mgcv(gam). Performance and timing comparisons of mono-GAM with alternative methods have been undertaken. The simulation studies show that the new method has practical advantages over the alternatives considered. Applications of mono-GAM to various data sets are presented, which demonstrate its ability to model many practical situations.
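As an illustration of the re-parameterization idea mentioned in this abstract: a B-spline whose coefficients are non-decreasing is itself non-decreasing, so one way to enforce monotonicity is to write each coefficient as a running sum of exponentials of unconstrained parameters. The sketch below assumes that construction; it is not the monogam implementation, and the function name `monotone_coefficients` and the parameter values are hypothetical.

```python
import math


def monotone_coefficients(gamma):
    """Map unconstrained parameters to strictly increasing spline coefficients.

    beta[0] = gamma[0]
    beta[j] = beta[j - 1] + exp(gamma[j])   for j >= 1

    Because exp(.) is always positive, the result is increasing for any
    real-valued input, so an optimizer can work on gamma without constraints.
    """
    beta = [gamma[0]]
    for g in gamma[1:]:
        beta.append(beta[-1] + math.exp(g))
    return beta


# Hypothetical unconstrained parameter values, chosen only for illustration.
gamma = [-1.0, 0.5, -2.0, 0.0, 1.5]
beta = monotone_coefficients(gamma)
print(beta)
```

The design trade-off is that the model becomes nonlinear in the working parameters, so fitting generally needs an iterative scheme rather than a single penalized least-squares solve.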
|
2 |
Natural and anthropogenic controls of landslides on Vancouver Island
Goetz, Jason, 30 April 2012
Empirically based models of landslide distribution and susceptibility are currently the most commonly used approach for mapping probabilities of landslide initiation and analyzing their association with natural and anthropogenic environmental factors. In general, these models statistically estimate susceptibility based on the predisposition of an area to experience a landslide given a range of environmental factors, which may include land use, topography, hydrology and other spatial attributes. Novel statistical approaches include the generalized additive model (GAM), a non-parametric regression technique, which is used in this study to explore the relationship of landslide initiation to topography, rainfall, forest land cover and logging roads on Vancouver Island, British Columbia.
The analysis is centered on an inventory of 639 landslides of winter 2006/07. Data sources representing potentially relevant environmental conditions of landslide initiation are based on: terrain analysis derived from a 20-m CDED digital elevation model; forest land cover classified from Landsat TM scenes for the summer before the 2006 rainy season; geostatistically interpolated antecedent rainfall patterns representing different temporal scales of rainfall (a major storm, winter and annual rainfall); and the main lithological units of surface geology.
To assess the incremental contribution of these data sources to predicting landslide susceptibility, the predictive performance of GAM-based models is compared using spatial cross-validation estimates of the area under the ROC curve (AUROC), and variable selection frequencies are used to determine the prevalence of non-parametric associations with landslides.
In addition to topographic variables, forest land cover (e.g., deforestation), and logging roads showed a strong association with landslide initiation, followed by rainfall patterns and the very general lithological classification as less important controls of landscape-scale landslide activity in this area. Annual rainfall patterns are found not to contribute significantly to model prediction improvement and may lead to model overfitting. Comparisons to generalized linear models (i.e., logistic regression) indicate that GAMs are significantly better for modeling landslide susceptibility.
Overall, based on the model predictions, the most susceptible 4% of the study area had 29 times higher density of landslide initiation points than the least susceptible 73% of the study area (0.156 versus 0.005 landslides/km²).
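For readers unfamiliar with the AUROC used to compare models in this abstract, the following minimal sketch computes it as the probability that a randomly chosen landslide location receives a higher susceptibility score than a randomly chosen non-landslide location. The scores and labels are made up for illustration, the function name `auroc` is hypothetical, and the spatial cross-validation step described in the abstract is not reproduced here.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    case has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up susceptibility scores; 1 = landslide observed, 0 = none.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
value = auroc(scores, labels)
print(round(value, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of landslide and non-landslide locations.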
|
4 |
Prediction and variable selection in sparse ultrahigh dimensional additive models
Ramirez, Girly Manguba, January 1900
Doctor of Philosophy / Department of Statistics / Haiyan Wang / Advances in technology have enabled many fields to collect datasets in which the number of covariates (p) tends to be much larger than the number of observations (n), the so-called ultrahigh dimensionality. In this setting, classical regression methodologies are invalid. There is a great need for methods that can explain the variation of the response variable using only a parsimonious set of covariates. In recent years, there have been significant developments in variable selection procedures. However, the available procedures usually select too many false variables. In addition, most of them are appropriate only when the response variable is linearly associated with the covariates. Motivated by these concerns, we propose another procedure for variable selection in the ultrahigh dimensional setting that is able to reduce the number of false positive variables. Moreover, this procedure can be applied when the response variable is continuous or binary, and when the response variable is linearly or non-linearly related to the covariates. Inspired by the Least Angle Regression approach, we develop two multi-step algorithms to select variables in sparse ultrahigh dimensional additive models. The variables go through a series of nonlinear dependence evaluations following a Most Significant Regression (MSR) algorithm. In addition, the MSR algorithm is designed to predict the response variable. The first algorithm, called MSR-continuous (MSRc), is appropriate for a dataset with a continuous response variable. Simulation results demonstrate that this algorithm works well. Comparisons with other methods, such as greedy-INIS by Fan et al. (2011) and the generalized correlation procedure by Hall and Miller (2009), showed that MSRc not only has a false positive rate significantly lower than both methods, but also has accuracy and a true positive rate comparable with greedy-INIS. The second algorithm, called MSR-binary (MSRb), is appropriate when the response variable is binary. Simulations demonstrate that MSRb is competitive in terms of prediction accuracy and true positive rate, and better than GLMNET in terms of false positive rate. An application of MSRb to real datasets is also presented. In general, the MSR algorithm selects fewer variables while preserving the accuracy of predictions.
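The true positive and false positive rates used to compare selection procedures in this abstract can be made concrete with a small sketch. It assumes the usual definitions (the abstract does not spell them out): the true positive rate is the share of truly relevant covariates that were selected, and the false positive rate is the share of irrelevant covariates that were selected. The function name `selection_rates` and the example indices are hypothetical.

```python
def selection_rates(selected, true_vars, p):
    """Return (true positive rate, false positive rate) of a variable selection.

    selected  -- indices of covariates chosen by the procedure
    true_vars -- indices of covariates that truly affect the response
    p         -- total number of candidate covariates
    """
    selected, true_vars = set(selected), set(true_vars)
    n_irrelevant = p - len(true_vars)
    if not true_vars or n_irrelevant <= 0:
        raise ValueError("need at least one relevant and one irrelevant covariate")
    true_pos = len(selected & true_vars)
    false_pos = len(selected - true_vars)
    return true_pos / len(true_vars), false_pos / n_irrelevant


# Illustrative setting: p = 1000 covariates, 4 of which are truly relevant.
tpr, fpr = selection_rates(selected=[0, 3, 7, 42], true_vars=[0, 3, 5, 7], p=1000)
print(tpr, fpr)
```

With p much larger than the number of relevant covariates, even a small false positive rate can correspond to many wrongly selected variables, which is why the abstract stresses reducing false positives.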
|
5 |
Air Pollution and Health: Time Series Tools and Analysis
Burr, Wesley Samuel, 29 October 2012
This thesis is concerned, loosely, with time series analysis. It is also, loosely, concerned with smoothers and Generalized Additive Models. And, finally, it is also concerned with the estimation of health risk due to air pollution.
In the field of time series analysis, we develop two data-driven interpolation algorithms for interpolation of mixed time series data; that is, data which has a stationary or “almost” stationary background with embedded deterministic trend and sinusoidal components. These interpolators are developed to deal with the problem of estimating power spectra under the condition that some observations of the series are unavailable.
We examine the structure of time-based cubic regression spline smoothers in Generalized Additive Models and demonstrate several interpretation problems with the resultant models. We propose, implement, and test a replacement smoother and show dramatic improvement. We further demonstrate a new, spectrally motivated way of examining residuals in Generalized Additive Models which drives many of the findings of this thesis.
Finally, we create and analyze a large-scale Canadian air pollution and mortality database. In the course of analyzing the data we rebuild the standard risk estimation model and demonstrate several improvements. We conclude with a comparison of the original model and the updated model and show that the new model gives consistently more positive risk estimates. / Thesis (Ph.D, Mathematics & Statistics) -- Queen's University, 2012-10-26 14:32:00.678
|
6 |
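Análise da correlação entre o Ibovespa e o ativo PETR4: estimação via modelos GARCH e modelos aditivos [Analysis of the correlation between the Ibovespa and the PETR4 stock: estimation via GARCH models and additive models]
Nunes, Fábio Magalhães, January 2009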
The estimation and forecasting of asset volatility are of great importance to financial markets. Themes such as risk and uncertainty in economic theory have encouraged the search for methods capable of modeling a conditional variance that evolves over time. The main objective of this dissertation was to model the IBOVESPA index and the PETR4 stock using ARCH/GARCH models and additive models, in order to analyze whether the estimated volatilities are correlated. For the parametric method, asset volatility was estimated with EGARCH models; for the nonparametric method, additive models with 5 lags were used.
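To make the notion of a time-varying conditional variance concrete, the sketch below runs the plain GARCH(1,1) recursion, in which today's variance depends on yesterday's squared return and yesterday's variance. This is a simpler relative of the EGARCH models used in the dissertation, not the author's model; zero-mean returns are assumed, and the function name `garch11_variance`, the parameter values and the return series are all made up for illustration.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances from a GARCH(1,1) recursion.

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]

    The recursion starts at the unconditional variance
    omega / (1 - alpha - beta), which requires alpha + beta < 1.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r ** 2 + beta * sigma2[-1])
    return sigma2


# Made-up daily returns and parameters, for illustration only.
returns = [0.01, -0.02, 0.015, -0.03, 0.005]
sigma2 = garch11_variance(returns, omega=1e-5, alpha=0.1, beta=0.85)
print(sigma2)
```

Two volatility series estimated this way (for example, one per asset) could then be compared with an ordinary correlation coefficient, which is the kind of question the abstract poses.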
|
7 |
An Additive Bivariate Hierarchical Model for Functional Data and Related Computations
Redd, Andrew Middleton, 2010 August 1900
The work presented in this dissertation centers on the theme of regression and computation methodology. Functional data is an important class of longitudinal data, and principal component analysis is an important approach to regression with this type of data. Here we present an additive hierarchical bivariate functional data model employing principal components to identify random effects. This additive model extends the univariate functional principal component model. These models are implemented in the pfda package for R. To fit the curves from this class of models, orthogonalized spline bases are used to reduce the dimensionality of the fit while retaining flexibility. Methods for handling spline basis functions in a purely analytical manner, including the orthogonalizing process and the computation of the penalty matrices used to fit the principal component models, are presented. The methods are implemented in the R package orthogonalsplinebasis.
The projects discussed involve complicated coding for the implementations in R. To facilitate this I created the NppToR utility to add R functionality to the popular Windows code editor Notepad++. A brief overview of the use of the utility is also included.
|
8 |
Trends in Forest Soil Acidity: A GAM Based Approach with Application on Swedish Forest Soil Inventory Data
Betnér, Staffan, January 2018
The acidification of soils has been a continuous process since at least the beginning of the 20th century. Therefore, an inquiry into how and when soil pH levels have changed is relevant to gaining a better understanding of this process. The aim of this thesis is to study the average national soil pH level over time in Sweden and the local spatial differences within Sweden over time. With data from the Swedish National Forest Inventory, soil pH surfaces are estimated for each surveyed year, together with the national average soil pH, using a generalized additive modeling approach with one model for each pair of consecutive years. A decreasing trend in the average national soil pH was found, together with some very weak evidence of year-to-year differences in the spatial structure of soil pH.
|