1 |
Avoiding the redundant effect on regression analyses of including an outcome in the imputation model
Tamegnon, Monelle 01 January 2018
Imputation is a well-recognized method for handling missing data. Multiple imputation provides a framework for imputing missing data that incorporates uncertainty about the imputations at the analysis stage. An important factor to consider when performing multiple imputation is the imputation model. In particular, a careful choice of the covariates to include in the model is crucial. The current recommendation by several authors in the literature (Van Buuren, 2012; Moons et al., 2006; Little and Rubin, 2002) is to include all variables that will appear in the analytical model, including the outcome, as covariates in the imputation model. When the goal of the analysis is to explore the relationship between the outcome and the variable with missing data (the target variable), this recommendation seems questionable. Should we use the outcome to fill in the missing observations of the target variable and then use these filled-in observations, along with the observed data on the target variable, to explore the relationship of the target variable with the outcome? We believe that this approach is circular. Instead, we have designed multiple imputation approaches rooted in machine learning techniques that avoid the use of the outcome at the imputation stage and maintain reasonable inferential properties. We also compare the performance of our approaches to that of currently available methods.
|
2 |
Zonal And Regional Load Forecasting In The New England Wholesale Electricity Market: A Semiparametric Regression Approach
Farland, Jonathan 01 January 2013
Power system planning, reliability analysis and economically efficient capacity scheduling all rely heavily on electricity demand forecasting models. In the context of a deregulated wholesale electricity market, scheduling a region’s bulk electricity generation is inherently linked to future values of demand. Predictive models are used by municipalities and suppliers to bid into the day-ahead market and by utilities to arrange contractual interchanges with neighboring utilities. These numerical predictions are therefore pervasive in the energy industry.
This research seeks to develop a regression-based forecasting model. Specifically, electricity demand is modeled as a function of calendar effects, lagged demand effects, weather effects, and a stochastic disturbance. Variables such as temperature, wind speed, cloud cover and humidity are known to be among the strongest predictors of electricity demand and as such are used as model inputs. It is well known, however, that the relationship between demand and weather can be highly nonlinear. Rather than assuming a linear functional form, the structural change in these relationships is explored. Those variables that indicate a nonlinear relationship with demand are accommodated with penalized splines in a semiparametric regression framework. The equivalence between penalized splines and a special case of the mixed model formulation allows for model estimation with currently available statistical packages such as R, Stata and SAS.
Historical data are available for the entire New England region as well as for the smaller zones that collectively make up the regional grid. As such, a secondary research objective of this thesis is to explore whether an aggregation of zonal forecasts might perform better than forecasts produced from a single regional model. Prior to this research, neither the applicability of a semiparametric regression-based approach to load forecasting nor the potential improvement in forecasting performance resulting from zonal load forecasting had been investigated for the New England wholesale electricity market.
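To illustrate the penalized-spline idea mentioned in this abstract, the following is a minimal sketch and not the thesis's model. It assumes a truncated-line basis, a ridge penalty on the knot coefficients only (the form that corresponds to treating those coefficients as random effects in a mixed model), and a fixed smoothing parameter `lam`. The temperature and load values are synthetic, and all function names are hypothetical.

```python
import numpy as np

def truncated_line_basis(x, knots):
    """Design matrix with columns [1, x, (x - k_1)_+, ..., (x - k_K)_+]."""
    x = np.asarray(x, dtype=float)
    fixed = np.column_stack([np.ones_like(x), x])
    spline = np.maximum(x[:, None] - np.asarray(knots, dtype=float)[None, :], 0.0)
    return np.hstack([fixed, spline])

def fit_penalized_spline(x, y, knots, lam):
    """Minimise ||y - C b||^2 + lam * (sum of squared knot coefficients).

    The intercept and slope are left unpenalised, as fixed effects would be
    in the mixed-model formulation.
    """
    C = truncated_line_basis(x, knots)
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    return np.linalg.solve(C.T @ C + lam * D, C.T @ np.asarray(y, dtype=float))

# Synthetic example (assumed data): demand is U-shaped in temperature.
rng = np.random.default_rng(0)
temp = np.linspace(0.0, 35.0, 200)
load = 50.0 + 0.08 * (temp - 18.0) ** 2 + rng.normal(0.0, 1.0, temp.size)

knots = np.linspace(2.0, 33.0, 15)
coef = fit_penalized_spline(temp, load, knots, lam=1.0)
fitted = truncated_line_basis(temp, knots) @ coef
```

In a mixed-model fit the smoothing parameter would be estimated as a ratio of variance components rather than fixed by hand; here it is fixed only to keep the sketch short.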
|
3 |
Essays on High-dimensional Nonparametric Smoothing and Its Applications to Asset Pricing
Wu, Chaojiang 25 October 2013
No description available.
|
4 |
Semiparametric Methods for the Generalized Linear Model
Chen, Jinsong 01 July 2010
The generalized linear model (GLM) is a popular model in many research areas. In the GLM, each outcome of the dependent variable is assumed to be generated from a particular distribution function in the exponential family. The mean of the distribution depends on the independent variables. The link function provides the relationship between the linear predictor and the mean of the distribution function. In this dissertation, two semiparametric extensions of the GLM will be developed. In the first part of this dissertation, we have proposed a new model, called a semiparametric generalized linear model with a log-concave random component (SGLM-L). In this model, the estimate of the distribution of the random component has a nonparametric form while the estimate of the systematic part has a parametric form. In the second part of this dissertation, we have proposed a model, called a generalized semiparametric single-index mixed model (GSSIMM). A nonparametric component with a single index is incorporated into the mean function in the generalized linear mixed model (GLMM) assuming that the random component is following a parametric distribution.
In the first part of this dissertation, since most of the literature on the GLM deals with a parametric random component, we relax the parametric distribution assumption for the random component of the GLM and impose a log-concave constraint on the distribution. An iterative numerical algorithm for computing the estimators in the SGLM-L is developed. We construct a log-likelihood ratio test for inference. In the second part of this dissertation, we use a single-index model to generalize the GLMM so that a linear combination of covariates enters the model via a nonparametric mean function, because the linear model in the GLMM is not complex enough to capture the underlying relationship between the response and its associated covariates. The marginal likelihood is approximated using the Laplace method. A penalized quasi-likelihood approach is proposed to estimate the nonparametric function and the parameters, including the single-index coefficients, in the GSSIMM. We estimate variance components using marginal quasi-likelihood. Asymptotic properties of the estimators are developed using an idea similar to that of Yu (2008). A simulation example is carried out to compare the performance of the GSSIMM with that of the GLMM. We demonstrate the advantage of our approach using a study of the association between daily air pollutants and daily mortality, adjusted for temperature and wind speed, in various counties of North Carolina. / Ph. D.
|
5 |
Splines multidimensionnelles pénalisées pour modéliser le taux de survenue d’un événement : application au taux de mortalité en excès et à la survie nette en épidémiologie des maladies chroniques / Multidimensional penalized splines for hazard modelling: application to excess mortality hazard and net survival in chronic disease epidemiology
Fauvernier, Mathieu 24 September 2019
Time-to-event analysis is a very important field in statistics. When the event under study is death, the analysis focuses on the probability of survival of the subjects as well as on their mortality hazard, that is, on the "force of mortality" that applies at any given moment. Patients with a chronic disease usually have an excess mortality compared to a population that does not have the disease. Studying the excess mortality hazard associated with a disease and investigating the impact of prognostic factors on this hazard are important public health issues in epidemiology. From a statistical point of view, modelling the (excess) mortality hazard involves taking into account potentially non-linear and time-dependent effects of prognostic factors as well as their interactions. Regression splines (i.e., parametric and flexible piecewise polynomials) are ideal for dealing with such complexity. They make it easy to build nonlinear effects and, for interactions between continuous variables, to form a multidimensional spline from two or more marginal one-dimensional splines. However, the flexibility of regression splines presents a risk of overfitting. To avoid this risk, penalized regression splines have been proposed as part of generalized additive models. Their principle is to associate each spline with one or more penalty terms controlled by smoothing parameters. The smoothing parameters represent the desired degrees of penalization. In practice, these parameters are unknown and have to be estimated just like the regression parameters. This thesis describes the development of a method to model the (excess) hazard using multidimensional penalized regression splines. Restricted cubic splines were used as one-dimensional splines or as marginal bases to form multidimensional splines by tensor products. The optimization process relies on two nested Newton-Raphson algorithms. Smoothing parameter estimation is performed by optimizing a cross-validation criterion or the marginal likelihood of the smoothing parameters with an outer Newton-Raphson algorithm. At fixed smoothing parameters, the regression parameters are estimated by maximizing the penalized likelihood with an inner Newton-Raphson algorithm. The good properties of this approach in terms of statistical performance and numerical stability were then demonstrated through simulation. The described method was then implemented within the R package survPen. Finally, the method was applied to real data to investigate two epidemiological issues: the impact of social deprivation on the excess mortality in cervical cancer patients and the impact of current age on the excess mortality in multiple sclerosis patients.
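The tensor-product construction described in this abstract can be sketched as follows. This is an illustration under stated assumptions and not the survPen implementation: the marginal bases here are simple truncated-line bases rather than restricted cubic splines, the marginal penalties are plain diagonal matrices, and every function name is hypothetical. The sketch only shows how two marginal bases combine into one multidimensional basis with one penalty (and hence one smoothing parameter) per margin.

```python
import numpy as np

def marginal_basis(x, knots):
    """Simple one-dimensional basis: [1, x, (x - k)_+ for each knot]."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def tensor_product_basis(B1, B2):
    """Row-wise tensor product: column j * q2 + k equals B1[:, j] * B2[:, k]."""
    n = B1.shape[0]
    return (B1[:, :, None] * B2[:, None, :]).reshape(n, -1)

def tensor_penalties(S1, S2):
    """One penalty matrix per margin, matching the column order above."""
    q1, q2 = S1.shape[0], S2.shape[0]
    return np.kron(S1, np.eye(q2)), np.kron(np.eye(q1), S2)

# Assumed toy covariates: follow-up time and age for five subjects.
time = np.array([0.5, 1.0, 2.0, 3.5, 5.0])
age = np.array([40.0, 55.0, 62.0, 70.0, 81.0])

B1 = marginal_basis(time, knots=[1.0, 3.0])   # 4 columns
B2 = marginal_basis(age, knots=[60.0])        # 3 columns
q1, q2 = B1.shape[1], B2.shape[1]

T = tensor_product_basis(B1, B2)              # 5 x 12 design matrix

# Marginal penalties acting only on the knot coefficients.
S1 = np.diag([0.0, 0.0, 1.0, 1.0])
S2 = np.diag([0.0, 0.0, 1.0])
P1, P2 = tensor_penalties(S1, S2)

# With smoothing parameters lam1 and lam2, the total penalty on a coefficient
# vector beta would be beta @ (lam1 * P1 + lam2 * P2) @ beta.
```

The penalized likelihood would then be maximized over the coefficients for fixed smoothing parameters, with the smoothing parameters themselves chosen in an outer loop, as the abstract describes.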
|
6 |
Variable Selection and Function Estimation Using Penalized Methods
Xu, Ganggang 2011 December 1900
Penalized methods are becoming more and more popular in statistical research. This dissertation research covers two major aspects of applications of penalized methods: variable selection and nonparametric function estimation. The following two paragraphs give brief introductions to each of the two topics.
Infinite variance autoregressive models are important for modeling heavy-tailed time series. We use a penalty method to conduct model selection for autoregressive models with innovations in the domain of attraction of a stable law indexed by $\alpha \in (0, 2)$. We show that by combining the least absolute deviation loss function and the adaptive lasso penalty, we can consistently identify the true model. At the same time, the resulting coefficient estimator converges at a rate of $n^{-1/\alpha}$. The proposed approach gives a unified variable selection procedure for both the finite and infinite variance autoregressive models.
While automatic smoothing parameter selection for nonparametric function estimation has been extensively researched for independent data, it is much less so for clustered and longitudinal data. Although leave-subject-out cross-validation (CV) has been widely used, its theoretical properties are unknown and its minimization is computationally expensive, especially when there are multiple smoothing parameters. By focusing on penalized modeling methods, we show that leave-subject-out CV is optimal in that its minimization is asymptotically equivalent to the minimization of the true loss function. We develop an efficient Newton-type algorithm to compute the smoothing parameters that minimize the CV criterion. Furthermore, we derive a simplification of the leave-subject-out CV, which leads to a more efficient algorithm for selecting the smoothing parameters. We show that the simplified CV criterion is asymptotically equivalent to the unsimplified one and thus enjoys the same optimality property. This CV criterion also provides a completely data-driven approach to selecting the working covariance structure when using generalized estimating equations in longitudinal data analysis. Our results are applicable to additive, linear varying-coefficient, and nonlinear models with data from exponential families.
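Leave-subject-out cross-validation, as discussed in this abstract, can be sketched for a single smoothing parameter as follows. This is a brute-force illustration and not the dissertation's method: it uses a plain grid search instead of the Newton-type algorithm, a truncated-line penalized spline with an independence working structure, and synthetic clustered data. All names are hypothetical.

```python
import numpy as np

def basis(x, knots):
    """Truncated-line basis: [1, x, (x - k)_+ for each knot]."""
    x = np.asarray(x, dtype=float)
    return np.column_stack(
        [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    )

def leave_subject_out_cv(x, y, subject, knots, lam):
    """Mean squared prediction error when each subject is held out in turn."""
    C = basis(x, knots)
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))  # penalise knot terms only
    total = 0.0
    for s in np.unique(subject):
        held = subject == s
        # Refit without any observation from subject s.
        coef = np.linalg.solve(
            C[~held].T @ C[~held] + lam * D, C[~held].T @ y[~held]
        )
        total += np.sum((y[held] - C[held] @ coef) ** 2)
    return total / len(y)

# Synthetic longitudinal data (assumed): 20 subjects, 8 observations each,
# with a subject-level random intercept inducing within-subject correlation.
rng = np.random.default_rng(1)
n_subj, n_obs = 20, 8
subject = np.repeat(np.arange(n_subj), n_obs)
x = rng.uniform(0.0, 1.0, n_subj * n_obs)
y = (
    np.sin(2.0 * np.pi * x)
    + np.repeat(rng.normal(0.0, 0.3, n_subj), n_obs)
    + rng.normal(0.0, 0.2, x.size)
)

knots = np.linspace(0.1, 0.9, 9)
grid = 10.0 ** np.arange(-4, 3)
scores = [leave_subject_out_cv(x, y, subject, knots, lam) for lam in grid]
best_lam = grid[int(np.argmin(scores))]
```

Holding out whole subjects, rather than single observations, keeps correlated measurements from the same subject out of the fitting set. The cost of refitting once per subject for every candidate value is what motivates faster algorithms and simplified criteria.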
|
7 |
Estimation and Inference in Special Nonparametric Models with Applications to Topics in Development Economics / Schätzung und Inferenz in speziellen nichtparametrischen Modellen mit Andwendungen in der Entwicklungsökonomie
Wiesenfarth, Manuel 11 May 2012
No description available.
|
8 |
Advances on the Birnbaum-Saunders distribution / Avanços na distribuição Birnbaum-Saunders
Nakamura, Luiz Ricardo 26 August 2016
The Birnbaum-Saunders (BS) distribution is the most popular model used to describe lifetime processes under fatigue. Over the years, this distribution has received a wide range of applications, demanding more flexible extensions to solve more complex problems. One of the best-known extensions of the BS distribution is the generalized Birnbaum-Saunders (GBS) family of distributions, which includes the Birnbaum-Saunders special-case (BS-SC) and the Birnbaum-Saunders generalized t (BSGT) models as special cases. Although the BS-SC distribution was previously developed in the literature, it was never studied in depth; hence, in this thesis, we provide a full Bayesian study and develop a tool to generate random numbers from this distribution. Further, we develop a very flexible regression model that admits different degrees of skewness and kurtosis, based on the BSGT distribution, using the generalized additive models for location, scale and shape (GAMLSS) framework. We also introduce a new extension of the BS distribution called the Birnbaum-Saunders power (BSP) family of distributions, which contains several special or limiting cases already published in the literature, including the GBS family. The main feature of the new family is that it can produce both unimodal and bimodal shapes depending on its parameter values. We also introduce this new family of distributions into the GAMLSS framework, in order to model any or all of the parameters of the distribution using parametric linear and/or nonparametric smooth functions of explanatory variables. Throughout this thesis we present five different applications to real data sets in order to illustrate the developed theoretical results.
|