111 |
R implementation of binary regression models with parametric link / Santos, Bernardo Pereira dos (27 February 2013)
Binary data analysis is usually conducted with logistic regression, but this model has limitations. Modifying the link function of the regression allows greater flexibility in modelling, and several proposals have been made in this area. However, to date there are no statistical packages capable of estimating these models, which makes them difficult to use. The present work develops an R implementation of four binary regression models with parametric link functions, under both the frequentist and the Bayesian approach.
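The abstract does not name the four link families implemented, so as a minimal sketch of the general idea, the R code below fits a binary regression with the Aranda-Ordaz asymmetric link (which reduces to the logit at alpha = 1) by direct maximum likelihood. The link choice, the simulated data, and the function names are illustrative assumptions, not the thesis's package.

```r
## Hedged sketch: ML fit of a binary regression with the Aranda-Ordaz
## asymmetric link, F(eta; alpha) = 1 - (1 + alpha*exp(eta))^(-1/alpha),
## which reduces to the logistic link at alpha = 1. Illustrative only.
aranda_ordaz <- function(eta, alpha) {
  p <- 1 - (1 + alpha * exp(eta))^(-1 / alpha)
  pmin(pmax(p, 1e-12), 1 - 1e-12)          # keep probabilities away from 0 and 1
}

negloglik <- function(theta, y, X) {
  beta  <- theta[-length(theta)]
  alpha <- exp(theta[length(theta)])       # enforce alpha > 0 via the log scale
  p <- aranda_ordaz(drop(X %*% beta), alpha)
  -sum(y * log(p) + (1 - y) * log(1 - p))
}

set.seed(1)
n <- 500
X <- cbind(1, rnorm(n))
y <- rbinom(n, 1, aranda_ordaz(drop(X %*% c(-0.5, 1)), alpha = 2))

start <- c(coef(glm(y ~ X[, 2], family = binomial)), log_alpha = 0)
fit <- optim(start, negloglik, y = y, X = X, method = "BFGS")
fit$par                                    # regression coefficients and log(alpha)
```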
|
112 |
Modeling time series data with semi-reflective boundaries / Johnson, Amy May (01 December 2013)
High-frequency time series data have become increasingly common. In many settings, such as the medical sciences or economics, these series may additionally display semi-reflective boundaries. These are boundaries, either physically existing, arbitrarily set, or determined based on inherent qualities of the series, which may be exceeded and yet, based on probable consequences, offer incentives to return to mid-range levels. In a lane-control setting, Dawson, Cavanaugh, Zamba, and Rizzo (2010) previously developed a weighted third-order autoregressive model utilizing flat, linear, and quadratic projections with a signed error term in order to depict key features of driving behavior, where the probability of a negative residual is predicted via logistic regression. In this driving application, the intercept (Λ0) of the logistic regression model describes the central tendency of a particular driver, while the slope parameter (Λ1) can be intuitively interpreted as representing the propensity of the series to return to mid-range levels. We therefore call this the "re-centering" parameter, though this is a slight misnomer since the logistic model does not describe the position of the series, but rather the probability of a negative residual. Within this framework, a multi-step estimation algorithm, which we label the Single-Pass method, was provided.
In addition to investigating the statistical properties of the Single-Pass method, several other estimation techniques are investigated. These include an Iterated Grid Search, which utilizes the underlying likelihood model, and four modified versions of the Single-Pass method. These Modified Single-Pass (MSP) techniques, respectively: use unconstrained least squares estimation for the vector of projection coefficients (Β); use unconstrained linear regression with a post-hoc application of the summation constraint; reduce the regression model to include only the flat and linear projections; or implement the Least Absolute Shrinkage and Selection Operator (LASSO). For each of these techniques, mean bias, confidence intervals, and coverage probabilities were calculated, indicating that of the modifications only the first two were promising alternatives.
In a driving application, we therefore considered these two modified techniques along with the Single-Pass and Iterated Grid Search methods. It was found that though each of these methods remains biased, with generally lower than ideal coverage probabilities, in a lane-control setting they are each able to distinguish between two populations based on disease status. It was also found that the re-centering parameter, estimated from data collected in a driving simulator among a control population, is significantly correlated with neuropsychological outcomes as well as with driving errors performed on-road. Several of these correlations were apparent regardless of the estimation technique, indicating real-world validity of the model across related assessments. Additionally, the Iterated Grid Search produces estimates that are most distinct, with generally lower bias and improved coverage, with the exception of the estimate of Λ1. However, this method also requires potentially large time and memory commitments compared to the other techniques considered. Thus the optimal estimation scheme depends on the situation. When feasible, the Iterated Grid Search appears to be the best overall method currently available. However, if time or memory is a limiting factor, or if a reliable estimate of the re-centering parameter with reasonably accurate estimation of the Β vector is desired, the Modified Single-Pass technique utilizing unconstrained linear regression followed by implementation of the summation constraint is a sensible alternative.
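As a toy illustration of the re-centering mechanism described above, the R sketch below simulates a series in which the probability of a negative residual grows with the current level, using a simplified flat (random-walk) projection in place of the full weighted flat/linear/quadratic third-order model; the parameter values are assumptions for illustration only.

```r
## Toy simulation of a semi-reflective series: the probability of a negative
## residual rises with the current level, pulling the series back toward
## mid-range values. Parameter values are assumed for illustration.
set.seed(42)
n <- 2000
lambda0 <- 0     # central tendency (intercept of the logistic model)
lambda1 <- 0.8   # "re-centering" parameter: strength of the pull to mid-range
y <- numeric(n)
for (t in 2:n) {
  p_neg <- plogis(lambda0 + lambda1 * y[t - 1])      # P(negative residual)
  sgn   <- ifelse(runif(1) < p_neg, -1, 1)
  y[t]  <- y[t - 1] + sgn * abs(rnorm(1, sd = 0.1))  # signed-magnitude error
}
plot(y, type = "l", main = "Semi-reflective behavior around mid-range levels")
```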
|
113 |
Improved effort estimation of software projects based on metrics / Andersson, Veronika; Sjöstedt, Hanna (January 2005)
Saab Ericsson Space AB develops products for space at a predetermined price. Since the price is fixed, it is crucial to have a reliable prediction model to estimate the effort needed to develop the product. Software effort estimation is difficult in general, and at the software department this is a problem. By analyzing metrics collected from former projects, different prediction models are developed to estimate the number of person-hours a software project will require. Models for predicting the effort before a project begins are developed first; only a few variables are known at this stage of a project. The models developed are compared to a current model used at the company. Linear regression models improve the estimation error by nine percentage points, and nonlinear regression models improve the result even more. The model used today is also calibrated to improve its predictions, and a principal component regression model is developed as well. In addition, a model to improve the estimate during an ongoing project is developed. This is a new approach, and comparison with the first estimate is the only evaluation. The result is an improved prediction model. Several models perform better than the one used today. In the discussion, positive and negative aspects of the models are debated, leading to the choice of a model recommended for future use.
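A hedged sketch of the kind of models compared here: ordinary linear regression and principal component regression on pre-project metrics. The data frame `metrics` and its `hours` response are hypothetical placeholders, since the thesis's actual variables are not listed in the abstract.

```r
## Hypothetical sketch: `metrics` is an assumed data frame with a person-hours
## response `hours` and numeric pre-project metric columns.
fit_lm <- lm(hours ~ ., data = metrics)        # ordinary linear regression

pc <- prcomp(metrics[, names(metrics) != "hours"], scale. = TRUE)
k  <- 2                                        # number of retained components
fit_pcr <- lm(metrics$hours ~ pc$x[, 1:k])     # principal component regression

## Leave-one-out prediction error (PRESS) as a simple comparison criterion
press <- function(fit) sum((residuals(fit) / (1 - hatvalues(fit)))^2)
c(lm = press(fit_lm), pcr = press(fit_pcr))
```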
|
114 |
Reduced-order, trajectory piecewise-linear models for nonlinear computational fluid dynamics / Gratton, David; Willcox, Karen E. (01 1900)
A trajectory piecewise-linear (TPWL) approach is developed for a computational fluid dynamics (CFD) model of the two-dimensional Euler equations. The approach uses a weighted combination of linearized models to represent the nonlinear CFD system. The proper orthogonal decomposition (POD) is then used to create a reduced-space basis, onto which the TPWL model is projected. This projection yields an efficient reduced-order model of the nonlinear system, which does not require the evaluation of any full-order system residuals. The method is applied to the case of flow through an actively controlled supersonic diffuser. With an appropriate choice of linearization points and POD basis vectors, the method is found to yield accurate results, including cases with significant shock motion. / Singapore-MIT Alliance (SMA)
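The following R sketch shows the skeleton of the TPWL-POD construction described above: a POD basis extracted from snapshots via the SVD, and a reduced right-hand side formed as a distance-weighted combination of projected linearized models. The weighting rule, toy data, and function names are assumptions; a real CFD implementation would supply the snapshots, Jacobians, and linearization points.

```r
## Schematic TPWL-POD skeleton; snapshots, Jacobians A_i, offsets b_i, and
## linearization states x_i would come from the CFD model in practice.
pod_basis <- function(S, k) svd(S)$u[, 1:k]    # k-dimensional POD basis

tpwl_rhs <- function(z, V, lin_states, A_list, b_list) {
  x <- V %*% z                                           # lift to full space
  d <- sapply(lin_states, function(xi) sum((x - xi)^2))  # distances to points
  w <- exp(-d / max(min(d), 1e-12))                      # assumed weighting rule
  w <- w / sum(w)
  r <- numeric(length(z))
  for (i in seq_along(A_list))                           # weighted, projected
    r <- r + w[i] * (t(V) %*% (A_list[[i]] %*% x + b_list[[i]]))
  drop(r)
}

## Tiny toy usage with random stand-ins for CFD quantities
set.seed(1)
n <- 100; k <- 5
V <- pod_basis(matrix(rnorm(n * 20), n, 20), k)
A_list <- list(-diag(n), -2 * diag(n))
b_list <- list(rnorm(n), rnorm(n))
lin_states <- list(rnorm(n), rnorm(n))
tpwl_rhs(rnorm(k), V, lin_states, A_list, b_list)
```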
|
115 |
The Turkish Catastrophe Insurance Pool Claims Modeling 2000-2008 Data / Saribekir, Gozde (01 March 2013)
After the 1999 Marmara Earthquake, social, economic, and engineering studies on earthquakes became more intensive. The Turkish Catastrophe Insurance Pool (TCIP) was established after the Marmara Earthquake to share the resulting deficit in the Government's budget. The TCIP has become a data source for researchers, providing variables such as the number of claims, claim amounts, and earthquake magnitude. In this thesis, the TCIP earthquake claims collected between 2000 and 2008 are studied. The number of claims and the claim payments (aggregate claim amount) are modeled using Generalized Linear Models (GLM). Observed sudden jumps in the claim data are represented with an exponential kernel function. Model parameters are estimated by Maximum Likelihood Estimation (MLE). The results can serve as a recommendation for computing the expected value of the aggregate claim amounts and the premiums of the TCIP.
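A rough sketch of this modeling strategy, assuming a hypothetical data frame `tcip` with claim counts, claim amounts, event magnitude, and a time variable: a Poisson GLM for claim frequency, a Gamma GLM for claim amounts, and an exponential kernel term exp(-t/tau) standing in for the sudden jumps. The column names and the decay scale tau are assumptions, not the thesis's specification.

```r
## Hypothetical sketch: `tcip` is an assumed data frame with columns n_claims,
## claim_amount, magnitude, and t (time since a major event, in months).
tau <- 6                                   # assumed decay scale of the kernel
tcip$kernel <- exp(-tcip$t / tau)          # exponential kernel for sudden jumps

fit_counts <- glm(n_claims ~ magnitude + kernel,
                  family = poisson(link = "log"), data = tcip)
fit_amount <- glm(claim_amount ~ magnitude + kernel,
                  family = Gamma(link = "log"),
                  data = subset(tcip, claim_amount > 0))

## Fitted expected frequencies and severities feed into premium calculations
head(predict(fit_counts, type = "response"))
```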
|
117 |
Bayesian Semiparametric Models for Heterogeneous Cross-platform Differential Gene Expression / Dhavala, Soma Sekhar (December 2010)
We are concerned with testing for differential expression and consider three different aspects of such testing procedures. First, we develop an exact ANOVA-type model for discrete gene expression data, produced by technologies such as Massively Parallel Signature Sequencing (MPSS), Serial Analysis of Gene Expression (SAGE), or other next-generation sequencing technologies. We adopt two Bayesian hierarchical models: one parametric and the other semiparametric with a Dirichlet process prior that has the ability to borrow strength across related signatures, where a signature is a specific arrangement of the nucleotides. We utilize the discreteness of the Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches, while controlling the false discovery rate.
Next, we consider ways to combine expression data from different studies, possibly produced by different technologies resulting in mixed-type responses, such as microarrays and MPSS. Depending on the technology, the expression data can be continuous or discrete and can have different technology-dependent noise characteristics. Adding to the difficulty, genes can have an arbitrary correlation structure both within and across studies. Performing several hypothesis tests for differential expression could also lead to false discoveries. We propose to address all of the above challenges using a hierarchical Dirichlet process with a spike-and-slab base prior on the random effects, while smoothing splines model the unknown link functions that map different technology-dependent manifestations to latent processes upon which inference is based.
Finally, we propose an algorithm for controlling different error measures in Bayesian multiple testing under generic loss functions, including the widely used uniform loss function. We do not make any specific assumptions about the underlying probability model, but require that indicator variables for the individual hypotheses are available as a component of the inference. Given this information, we recast multiple hypothesis testing as a combinatorial optimization problem, in particular the 0-1 knapsack problem, which can be solved efficiently using a variety of algorithms, both approximate and exact in nature.
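To make the knapsack connection concrete, the sketch below takes posterior alternative probabilities (the averaged hypothesis indicators from the MCMC output) and applies the greedy solution that arises when the constraint is on the estimated false discovery rate; this is a simplified special case of the generic-loss formulation in the thesis, shown under assumed probabilities.

```r
## v[i] = posterior probability that hypothesis i is non-null (the average of
## its MCMC indicator variable). Greedy rule for the knapsack-type problem
## under a bound on the estimated Bayesian false discovery rate.
bayes_fdr_reject <- function(v, alpha = 0.05) {
  ord  <- order(v, decreasing = TRUE)            # most probable signals first
  efdr <- cumsum(1 - v[ord]) / seq_along(ord)    # estimated FDR after each add
  k    <- max(c(0, which(efdr <= alpha)))
  reject <- logical(length(v))
  reject[ord[seq_len(k)]] <- TRUE
  reject
}

v <- c(0.99, 0.95, 0.80, 0.40, 0.10)             # assumed posterior probabilities
bayes_fdr_reject(v, alpha = 0.10)                # rejects the three most probable
```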
|
118 |
Testing Lack-of-Fit of Generalized Linear Models via Laplace Approximation / Glab, Daniel Laurence (May 2011)
In this study we develop a new method for testing the null hypothesis that the predictor function in a canonical link regression model has a prescribed linear form. The class of models, which we refer to as canonical link regression models, constitutes arguably the most important subclass of generalized linear models and includes several of the most popular generalized linear models. In addition to the primary contribution of this study, we revisit several other tests in the existing literature. The common feature among the proposed test and the existing tests is that they are all based on orthogonal series estimators and used to detect departures from a null model.
Our proposal for a new lack-of-fit test is inspired by the recent contribution of Hart and is based on a Laplace approximation to the posterior probability of the null hypothesis. Despite having a Bayesian construction, the resulting statistic is implemented in a frequentist fashion. The formulation of the statistic is based on characterizing departures from the predictor function in terms of Fourier coefficients, and subsequently testing that all of these coefficients are 0. The resulting test statistic can be characterized as a weighted sum of exponentiated squared Fourier coefficient estimators, where the weights depend on user-specified prior probabilities. The prior probabilities give the investigator the flexibility to examine specific departures from the prescribed model. Alternatively, the use of noninformative priors produces a new omnibus lack-of-fit statistic.
We present a thorough numerical study of the proposed test and the various existing orthogonal series-based tests in the context of the logistic regression model. Simulation studies demonstrate that the test statistics under consideration possess desirable power properties against alternatives that have been identified in the existing literature as being important.
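A schematic version of such a statistic is sketched below in R: a weighted sum of exponentiated squared Fourier coefficient estimates, calibrated by Monte Carlo simulation under the null. It assumes standardized coefficients that are approximately N(0,1) under the null and omits the constants supplied by the Laplace approximation, so it illustrates the form of the test rather than the thesis's exact statistic.

```r
## Schematic statistic: weighted sum of exponentiated squared Fourier
## coefficient estimates b (assumed ~ N(0,1) under the null), with weights
## given by prior probabilities; Laplace-approximation constants are omitted.
laplace_lof_stat <- function(b, prior = rep(1 / length(b), length(b))) {
  sum(prior * exp(b^2 / 2))
}

set.seed(7)
null_stats <- replicate(5000, laplace_lof_stat(rnorm(10)))  # null calibration
obs <- laplace_lof_stat(c(3.2, rnorm(9)))    # one large coefficient: lack of fit
mean(null_stats >= obs)                      # Monte Carlo p-value
```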
|
119 |
Parameter Estimation In Generalized Partial Linear Models With Conic Quadratic Programming / Celik, Gul (01 September 2010)
In statistics, regression analysis is a technique used to understand and model the relationship between a dependent variable and one or more independent variables. Multivariate Adaptive Regression Splines (MARS) is a form of regression analysis. It is a non-parametric regression technique and can be seen as an extension of linear models that automatically models non-linearities and interactions. MARS is very important in both classification and regression, with an increasing number of applications in many areas of science, economy, and technology.
In our study, we analyzed Generalized Partial Linear Models (GPLMs), which are particular semiparametric models. GPLMs separate the input variables into two parts and additively integrate a classical linear model with a nonlinear model part. In order to smooth this nonparametric part, we use Conic Multivariate Adaptive Regression Splines (CMARS), a modified form of MARS. MARS is very beneficial for high-dimensional problems and does not require any particular class of relationship between the regressor variables and the outcome variable of interest. This technique offers a great advantage for fitting nonlinear multivariate functions. The contribution of the basis functions can also be estimated by MARS, so that both the additive and interaction effects of the regressors are allowed to determine the dependent variable. There are two steps in the MARS algorithm: the forward and backward stepwise algorithms. In the first step, the model is constructed by adding basis functions until a maximum level of complexity is reached. Conversely, in the second step, the backward stepwise algorithm reduces the complexity by discarding the least significant basis functions from the model.
In this thesis, we suggest not using the backward stepwise algorithm; instead, we employ a Penalized Residual Sum of Squares (PRSS). We construct the PRSS for MARS as a Tikhonov regularization problem. We treat this problem using continuous optimization techniques, which we consider an important complementary technology and alternative to the backward stepwise algorithm. In particular, we apply the elegant framework of Conic Quadratic Programming (CQP), a very well-structured area of convex optimization that resembles linear programming and therefore permits the use of interior-point methods.
At the end of this study, we compare the CQP approach with the Tikhonov regularization problem on two different data sets, one with and one without interaction effects. Moreover, using two further data sets, we compare CMARS with two other classification methods, Infinite Kernel Learning (IKL) and Tikhonov regularization, whose results are taken from a thesis that is in progress.
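A rough R sketch of the CMARS idea using the `earth` package: run only the MARS forward pass, then replace the backward pruning with a Tikhonov-type penalty on the basis-function coefficients. Ridge regression is used here as a simplified stand-in for the thesis's conic quadratic programming solution, and the stock `trees` data set is purely illustrative.

```r
## Forward pass of MARS via the earth package, with ridge regression as a
## simplified stand-in for the CQP solution of the Tikhonov-regularized PRSS.
library(earth)
library(MASS)

data(trees)                                       # stock data, illustrative only
fwd <- earth(Volume ~ ., data = trees, pmethod = "none")  # no backward pruning
B   <- model.matrix(fwd)[, -1, drop = FALSE]      # basis functions, sans intercept

ridge <- lm.ridge(trees$Volume ~ B, lambda = seq(0, 10, by = 0.1))
MASS::select(ridge)                               # GCV choice of the penalty weight
```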
|
120 |
Log-linear Rasch-type models for repeated categorical data with a psychobiological application / Hatzinger, Reinhold; Katzenbeisser, Walter (January 2008)
The purpose of this paper is to generalize regression models for repeated categorical data based on maximizing a conditional likelihood. Some existing methods, such as those proposed by Duncan (1985), Fischer (1989), and Agresti (1993, 1997), are special cases of this latent-variable approach, used to account for dependencies in clustered observations. The generalization concerns the incorporation of rather general data structures, such as subject-specific time-dependent covariates, a variable number of observations per subject, and time periods of arbitrary length, in order to evaluate treatment effects on a categorical response variable via a linear parameterization. The response may be polytomous, ordinal, or dichotomous. The main tool is the log-linear representation of appropriately parameterized Rasch-type models, which can be fitted using standard software, e.g., R. The proposed method is applied to data from a psychiatric study on the evaluation of psychobiological variables in the therapy of depression. The effects of plasma levels of the antidepressant drug Clomipramine and of neuroendocrinological variables on the presence or absence of anxiety symptoms in 45 female patients are analyzed. The individual measurements of the time-dependent variables were recorded on 2 to 11 occasions. The findings show that certain combinations of the variables investigated are favorable for the treatment outcome. (author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics
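As a minimal sketch of the log-linear representation for the dichotomous case, the R code below fits a Rasch-type model by a Poisson GLM on the full table of response patterns, conditioning on the raw score. The simulated items stand in for the paper's repeated anxiety-symptom indicators, and the extension to time-dependent covariates is omitted.

```r
## Simulated dichotomous items; a Rasch-type model is fitted by a Poisson GLM
## on the full table of response patterns, with the raw score (the sufficient
## statistic for the person parameter) entering as a factor.
set.seed(3)
n <- 300; k <- 4
theta <- rnorm(n)                        # person parameters
beta  <- c(-1, 0, 0.5, 1)                # item difficulties
Y <- as.data.frame(sapply(beta, function(b) rbinom(n, 1, plogis(theta - b))))

tab <- as.data.frame(table(Y))           # all 2^k patterns, zeros included
tab[1:k] <- lapply(tab[1:k], function(f) as.integer(as.character(f)))
tab$score <- factor(rowSums(tab[1:k]))   # conditioning on the raw score

fit <- glm(Freq ~ score + V1 + V2 + V3, family = poisson, data = tab)
coef(fit)[c("V1", "V2", "V3")]           # item parameters relative to item 4
```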
|