71 |
Working correlation selection in generalized estimating equations. Jang, Mi Jin, 01 December 2011 (has links)
Longitudinal data analysis is common in biomedical research. The generalized estimating equations (GEE) approach is widely used for longitudinal marginal models. The GEE method is known to provide consistent regression parameter estimates regardless of the choice of working correlation structure, provided that root-n consistent estimates of the nuisance parameters are used. However, it is important to use an appropriate working correlation structure in small samples, since it improves the statistical efficiency of the β estimate. Several working correlation selection criteria have been proposed (Rotnitzky and Jewell, 1990; Pan, 2001; Hin and Wang, 2009; Shults et al., 2009). However, these criteria share the limitation that they perform poorly when over-parameterized structures are included among the candidates. In this dissertation, new working correlation selection criteria are developed based on generalized eigenvalues. A set of generalized eigenvalues is used to measure the disparity between the bias-corrected sandwich variance estimator under the hypothesized working correlation matrix and the model-based variance estimator under a working independence assumption. A summary measure based on this set of generalized eigenvalues indicates the disparity between the true correlation structure and a misspecified working correlation structure. Motivated by the test statistics in MANOVA, three working correlation selection criteria are proposed: PT (Pillai's trace type criterion), WR (Wilks' ratio type criterion) and RMR (Roy's maximum root type criterion). The relationship between these generalized eigenvalues and the CIC measure is also revealed.
In addition, this dissertation proposes a method to penalize over-parameterized working correlation structures. An over-parameterized structure converges to the true correlation structure by using extra parameters, so the true correlation structure and the over-parameterized structure tend to give similar variance estimates of the estimated β and similar values of the working correlation selection criteria. However, the over-parameterized structure is more likely to be chosen as the best working correlation structure under "the smaller the better" rule for criterion values, because over-parameterization leads to a negatively biased sandwich variance estimator and hence a smaller selection criterion value. In this dissertation, the over-parameterized structure is penalized through cluster detection and an optimization function. In order to find the group ("cluster") of working correlation structures that are similar to each other, a cluster detection method is developed based on spacings of the order statistics of the selection criterion measures. Once a cluster is found, an optimization function that considers the trade-off between bias and variability provides the choice of the "best" approximating working correlation structure.
The performance of our proposed criterion measures relative to other relevant criteria (QIC, RJ and CIC) is examined in a series of simulation studies.
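To make the construction above concrete, the sketch below is a minimal illustration, not the dissertation's exact estimators: it computes the generalized eigenvalues of a sandwich covariance estimate relative to a model-based one and summarizes them with Pillai-, Wilks-, and Roy-type statistics. The two covariance matrices are hypothetical placeholders for what a fitted GEE would supply.

```python
import numpy as np

def correlation_selection_summaries(V_sandwich, V_model):
    """Summarize the disparity between a sandwich covariance estimate and a
    model-based covariance estimate through the generalized eigenvalues of
    V_sandwich relative to V_model (both p x p, symmetric, positive definite).
    Illustrative only: these are MANOVA-style summaries, not the dissertation's
    exact PT, WR and RMR definitions."""
    lam = np.linalg.eigvals(np.linalg.solve(V_model, V_sandwich)).real
    return {
        "PT": np.sum(lam / (1.0 + lam)),   # Pillai's-trace-type summary
        "WR": np.prod(1.0 / (1.0 + lam)),  # Wilks'-ratio-type summary
        "RMR": np.max(lam),                # Roy's-maximum-root-type summary
        "trace": np.sum(lam),              # trace summary, analogous in spirit to CIC
    }

# Hypothetical 2 x 2 covariance estimates, standing in for the output of a fitted GEE.
V_model = np.array([[0.10, 0.02], [0.02, 0.08]])
V_sandwich = np.array([[0.12, 0.03], [0.03, 0.09]])
print(correlation_selection_summaries(V_sandwich, V_model))
```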
|
72 |
Semiparametric regression analysis of zero-inflated data. Liu, Hai, 01 July 2009 (has links)
Zero-inflated data abound in ecological studies as well as in other scientific and quantitative fields. Nonparametric regression with zero-inflated response may be studied via the zero-inflated generalized additive model (ZIGAM). ZIGAM assumes that the conditional distribution of the response variable belongs to the zero-inflated 1-parameter exponential family which is a probabilistic mixture of the zero atom and the 1-parameter exponential family, where the zero atom accounts for an excess of zeroes in the data. We propose the constrained zero-inflated generalized additive model (COZIGAM) for analyzing zero-inflated data, with the further assumption that the probability of non-zero-inflation is some monotone function of the (non-zero-inflated) exponential family distribution mean. When the latter assumption obtains, the new approach provides a unified framework for modeling zero-inflated data, which is more parsimonious and efficient than the unconstrained ZIGAM. We develop an iterative algorithm for model estimation based on the penalized likelihood approach, and derive formulas for constructing confidence intervals of the maximum penalized likelihood estimator. Some asymptotic properties including the consistency of the regression function estimator and the limiting distribution of the parametric estimator are derived. We also propose a Bayesian model selection criterion for choosing between the unconstrained and the constrained ZIGAMs. We consider several useful extensions of the COZIGAM, including imposing additive-component-specific proportional and partial constraints, and incorporating threshold effects to account for regime shift phenomena. The new methods are illustrated with both simulated data and real applications. An R package COZIGAM has been developed for model fitting and model selection with zero-inflated data.
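As a rough sketch of the constraint idea only (not the COZIGAM estimation algorithm or its R implementation), the code below writes the log-likelihood of a zero-inflated Poisson model in which the probability of non-zero-inflation is a monotone logistic function of the log mean; the covariate, link choices, parameter values, and data are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def negloglik(theta, x, y):
    """Negative log-likelihood of a constrained zero-inflated Poisson model:
    log mu_i = b0 + b1 * x_i and logit(p_i) = alpha + delta * log mu_i,
    where p_i is the probability of the non-zero-inflated (Poisson) regime."""
    b0, b1, alpha, delta = theta
    log_mu = b0 + b1 * x
    mu = np.exp(log_mu)
    p = expit(alpha + delta * log_mu)            # monotone in the Poisson mean
    log_pois = y * log_mu - mu - gammaln(y + 1)
    # Mixture: zeros can come from either the zero atom or the Poisson part.
    like_zero = (1 - p) + p * np.exp(-mu)
    ll = np.where(y == 0, np.log(like_zero), np.log(p) + log_pois)
    return -np.sum(ll)

# Simulated data under the constrained model (all values made up).
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, 300)
mu = np.exp(0.3 + 0.8 * x)
p = expit(-0.5 + 1.0 * np.log(mu))
y = np.where(rng.uniform(size=300) < p, rng.poisson(mu), 0)

fit = minimize(negloglik, x0=[0.0, 0.5, 0.0, 0.5], args=(x, y), method="Nelder-Mead")
print(fit.x)  # rough estimates of (b0, b1, alpha, delta)
```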
|
73 |
Spatio-temporal hidden Markov models for incorporating interannual variability in rainfall. Frost, Andrew James, January 2004 (has links)
Two new spatio-temporal hidden Markov models (HMM) are introduced in this thesis, with the purpose of capturing the persistent, spatially non-homogeneous nature of climate influence on annual rainfall series observed in Australia. The models extend the two-state HMM applied by Thyer (2001) by relaxing the assumption that all sites are under the same climate control. The Switch HMM (SHMM) allows at-site anomalous states, whilst still maintaining a regional control. The Regional HMM (RHMM), on the other hand, allows sites to be partitioned into different Markovian state regions. The analyses were conducted using a Bayesian framework to explicitly account for parameter uncertainty and to select between competing hypotheses. Bayesian model averaging was used for comparison of the HMM and its generalisations. The HMM, SHMM and RHMM were applied to four groupings of four sites located on the Eastern coast of Australia, an area that has previously shown evidence of interannual persistence. In the majority of case studies, the RHMM variants showed the greatest posterior weight, indicating that the data favoured the multiple-region RHMM over the single-region HMM or the SHMM variants. In no case does the HMM produce the maximum marginal likelihood when compared to the SHMM and RHMM. The HMM state series and preferred model variants were sensitive to the parameterisation of the small-scale site-to-site correlation structure. Several parameterisations of the small-scale Gaussian correlation were trialled, namely Fitted Correlation, Exponential Decay Correlation, Empirical and Zero Correlation. Significantly, it was shown that annual rainfall data outliers can have a large effect on inference for a model that uses Gaussian distributions. The practical value of this modelling is demonstrated by conditioning the event-based point rainfall model DRIP on the hidden state series of the HMM variants. Short-timescale models typically underestimate annual variability because there is no explicit structure to incorporate long-term persistence. The two-state conditioned DRIP model was shown to reproduce the observed annual variability more closely than the single-state DRIP. / PhD Doctorate
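For reference, the building block these models extend is a hidden Markov model for an annual series; the sketch below evaluates the likelihood of a two-state Gaussian HMM via the scaled forward algorithm. The transition matrix, state means, and simulated rainfall are illustrative only, and the spatial SHMM/RHMM extensions and Bayesian inference used in the thesis are not reproduced.

```python
import numpy as np
from scipy.stats import norm

def hmm_loglik(y, P, means, sds, init):
    """Log-likelihood of a two-state Gaussian hidden Markov model, computed
    with the scaled forward algorithm. P is the 2x2 transition matrix and
    init the initial state distribution."""
    alpha = init * norm.pdf(y[0], means, sds)
    loglik = 0.0
    for t in range(1, len(y)):
        c = alpha.sum()
        loglik += np.log(c)
        alpha = (alpha / c) @ P * norm.pdf(y[t], means, sds)
    return loglik + np.log(alpha.sum())

# Illustrative "dry" and "wet" climate states for an annual rainfall series (mm).
P = np.array([[0.9, 0.1], [0.2, 0.8]])                 # persistent hidden states
means, sds = np.array([700.0, 1100.0]), np.array([150.0, 200.0])
rng = np.random.default_rng(1)
states = [0]
for _ in range(99):
    states.append(rng.choice(2, p=P[states[-1]]))      # simulate the state series
y = rng.normal(means[states], sds[states])             # simulate annual rainfall
print(hmm_loglik(y, P, means, sds, init=np.array([0.5, 0.5])))
```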
|
74 |
雙線性時間序列模式選取之研究 / Model Selection of Bilinear Time Series. 劉瑞芝, Liou, Ruey Chih, Unknown Date (has links)
Time series models have been discussed intensively over the past twenty years, and the vast majority of the literature studies linear time series models. In practice, however, many series do not satisfy the linearity assumption, so in the last decade many researchers have turned to nonlinear time series models. Among these, the bilinear model has attracted wide attention because its properties resemble those of linear models. In this thesis we estimate the parameters using the iterative equations proposed by Subba Rao and Gabr (1984) together with the Gauss-Seidel iteration method, and combine this with the nested search procedure of Subba Rao (1981) to select the order of a bilinear model. The resulting model selections are compared with those obtained by AIC, BIC, and a modified PKK selection method.
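The abstract does not spell out the bilinear recursion, so as an illustration only (not the Subba Rao-Gabr iterative scheme used in the thesis), the sketch below simulates a simple bilinear series y_t = a*y_{t-1} + b*y_{t-1}*e_{t-1} + e_t, fits a linear AR(1) model and the bilinear model by conditional least squares, and compares them by AIC; all parameter values are made up.

```python
import numpy as np
from scipy.optimize import minimize

def simulate_bilinear(n, a, b, sigma=1.0, seed=0):
    """Simulate y_t = a*y_{t-1} + b*y_{t-1}*e_{t-1} + e_t, a simple bilinear model."""
    rng = np.random.default_rng(seed)
    e = rng.normal(0, sigma, n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = a * y[t - 1] + b * y[t - 1] * e[t - 1] + e[t]
    return y

def css_bilinear(params, y):
    """Conditional sum of squares for the bilinear recursion, with the
    innovations reconstructed recursively from the observed series."""
    a, b = params
    e = np.zeros(len(y))
    for t in range(1, len(y)):
        e[t] = y[t] - a * y[t - 1] - b * y[t - 1] * e[t - 1]
    return np.sum(e[1:] ** 2)

def aic(rss, n_obs, k):
    # Gaussian AIC up to an additive constant: n*log(rss/n) + 2k.
    return n_obs * np.log(rss / n_obs) + 2 * k

y = simulate_bilinear(500, a=0.4, b=0.3)

# Candidate 1: linear AR(1), fitted by least squares.
a_ar = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)
rss_ar = np.sum((y[1:] - a_ar * y[:-1]) ** 2)

# Candidate 2: bilinear model, fitted by minimizing the conditional sum of squares.
fit = minimize(css_bilinear, x0=[0.1, 0.0], args=(y,), method="Nelder-Mead")
rss_bl = fit.fun

print("AIC AR(1):   ", aic(rss_ar, len(y) - 1, k=1))
print("AIC bilinear:", aic(rss_bl, len(y) - 1, k=2))
```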
|
75 |
Essays in empirical asset pricing. Parmler, Johan, January 2005 (has links)
The Capital Asset Pricing Model (CAPM) is the most widely used model in asset pricing. It evaluates the asset return in relation to the market return and the sensitivity of the security to the market. However, the evidence supporting the CAPM is mixed. Alternatives to the CAPM for determining the expected rate of return on portfolios and stocks were introduced through the Arbitrage Pricing Theory and through the Intertemporal CAPM. The introduction of these more general models raised the following important question: how should the risk factors in a multifactor pricing model be specified? Since multifactor model theory is not very explicit regarding the number or nature of the factors, the selection of factors has, to a large extent, become an empirical issue. In the first and second chapters, we conduct an exhaustive evaluation of multifactor asset pricing models based on observable factors. In the first chapter we find strong evidence that a multifactor pricing model should include the market excess return, the size premium, and the value premium. In the second chapter we relax the assumption of normally distributed returns. Even though this new setup does not alter the selected factors, we find strong evidence of deviation from normality, which makes our approach more appropriate. In contrast to the first two chapters, the third chapter takes the approach of using latent factors. Using data from the US market, 4 to 6 pervasive factors were generally found. Furthermore, the data speak in favor of an approximate factor structure with time series dependence across assets. In the final chapter, we examine whether a momentum strategy is superior to a benchmark model once the effects of data-snooping have been accounted for. Data snooping occurs when a given set of data is used more than once for inference or model selection. The results show that data-snooping bias can be very substantial; in this study, neglecting the problem would lead to very different conclusions. For the US data there is strong evidence of a momentum effect and we reject the hypothesis of weak market efficiency. For the Swedish data the results indicate that momentum strategies based on individual stocks generate positive and significant profits. Interestingly, a very weak momentum effect, or none at all, is found when stocks are sorted by size, book-to-market and industry. / Diss. Stockholm : Handelshögskolan, 2005. Johan Parmler was previously named Johan Ericsson.
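As a hedged illustration of the kind of specification favored in the first chapter, the sketch below runs a time-series regression of excess portfolio returns on a market excess return, a size premium, and a value premium; the factor and return series are simulated rather than the data used in the thesis.

```python
import numpy as np

# Simulated monthly factor returns (percent); coefficients and scales are made up.
rng = np.random.default_rng(42)
T = 600
mkt = rng.normal(0.5, 4, T)      # market excess return
smb = rng.normal(0.2, 3, T)      # size premium
hml = rng.normal(0.3, 3, T)      # value premium
excess_ret = 0.1 + 1.0 * mkt + 0.4 * smb + 0.6 * hml + rng.normal(0, 2, T)

# Ordinary least squares with an intercept, plus conventional standard errors.
X = np.column_stack([np.ones(T), mkt, smb, hml])
beta, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
resid = excess_ret - X @ beta
sigma2 = resid @ resid / (T - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

for name, b, s in zip(["alpha", "market", "size", "value"], beta, se):
    print(f"{name:>7}: {b: .3f} (se {s:.3f})")
```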
|
76 |
Information Matrices in Estimating Function Approach: Tests for Model Misspecification and Model Selection. Zhou, Qian, January 2009 (has links)
Estimating functions have been widely used for parameter estimation in various statistical problems. Regular estimating functions produce parameter estimators which have desirable properties, such as consistency and asymptotic normality. In quasi-likelihood inference, an important example of estimating functions, correct specification of the first two moments of the underlying distribution leads to information unbiasedness, which states that the two forms of the information matrix, the negative sensitivity matrix (the negative expectation of the first-order derivative of an estimating function) and the variability matrix (the variance of an estimating function), are equal; in other words, the analogue of the Fisher information is equivalent to the Godambe information. Consequently, information unbiasedness indicates that the model-based covariance matrix estimator and the sandwich covariance matrix estimator are equivalent. By comparing the model-based and sandwich variance estimators, we propose information ratio (IR) statistics for testing model misspecification of the variance/covariance structure under a correctly specified mean structure, in the context of linear regression models, generalized linear regression models and generalized estimating equations. Asymptotic properties of the IR statistics are discussed. In addition, through intensive simulation studies, we show that the IR statistics are powerful in various applications: testing for heteroscedasticity in linear regression models, testing for overdispersion in count data, and testing for a misspecified variance function and/or a misspecified working correlation structure. Moreover, the IR statistics appear more powerful than the classical information matrix test proposed by White (1982).
In the literature, model selection criteria have been intensively discussed, but almost all of them target choosing the optimal mean structure. In this thesis, two model selection procedures are proposed for selecting the optimal variance/covariance structure among a collection of candidate structures. One is based on a sequence of the IR tests for all the competing variance/covariance structures. The other is based on an "information discrepancy criterion" (IDC), which provides a measure of the discrepancy between the negative sensitivity matrix and the variability matrix. In fact, the IDC characterizes the relative efficiency loss incurred when using a certain candidate variance/covariance structure instead of the true but unknown structure. Through simulation studies and analyses of two data sets, it is shown that the two proposed model selection methods both have a high rate of detecting the true/optimal variance/covariance structure. In particular, since the IDC magnifies the differences among the competing structures, it is highly sensitive in detecting the most appropriate variance/covariance structure.
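As a rough numerical illustration of the two matrices being compared (not the thesis's exact IR or IDC statistics), the sketch below fits a Poisson log-linear model to simulated overdispersed counts, forms the negative sensitivity and variability matrices, and inspects the generalized eigenvalues of one relative to the other; under information unbiasedness these eigenvalues would all be close to one.

```python
import numpy as np

# Simulated overdispersed counts: negative binomial data fitted with a Poisson
# GLM, so the variance function is misspecified while the mean is correct.
rng = np.random.default_rng(7)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
mu_true = np.exp(0.5 + 0.8 * X[:, 1])
y = rng.negative_binomial(n=5, p=5 / (5 + mu_true))

beta = np.zeros(2)
for _ in range(25):                                    # Fisher scoring for the Poisson GLM
    mu = np.exp(X @ beta)
    beta += np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))

mu = np.exp(X @ beta)
S = X.T @ (mu[:, None] * X)                 # negative sensitivity matrix
V = X.T @ (((y - mu) ** 2)[:, None] * X)    # variability matrix (outer-product form)

# Generalized eigenvalues of V relative to S: all near 1 if information-unbiased.
lam = np.linalg.eigvals(np.linalg.solve(S, V)).real
print("eigenvalues of S^{-1} V:", np.round(lam, 2))
print("trace-based discrepancy:", round(lam.sum() - len(lam), 2))
```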
|
77 |
Fully Bayesian Analysis of Switching Gaussian State Space Models. Frühwirth-Schnatter, Sylvia, January 2000 (has links) (PDF)
In the present paper we study switching state space models from a Bayesian point of view. For estimation, the model is reformulated as a hierarchical model. We discuss various MCMC methods for Bayesian estimation, among them unconstrained Gibbs sampling, constrained sampling and permutation sampling. We address in detail the problem of unidentifiability, and discuss potential information available from an unidentified model. Furthermore the paper discusses issues in model selection such as selecting the number of states or testing for the presence of Markov switching heterogeneity. The model likelihoods of all possible hypotheses are estimated by using the method of bridge sampling. We conclude the paper with applications to simulated data as well as to modelling the U.S./U.K. real exchange rate. (author's abstract) / Series: Forschungsberichte / Institut für Statistik
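As a small illustration of how estimated model likelihoods feed into such selection questions (the bridge-sampling estimation itself is not reproduced here), the sketch below converts hypothetical log marginal likelihoods for one-, two- and three-state specifications into posterior model probabilities under equal prior weights.

```python
import numpy as np

# Hypothetical log marginal likelihoods, e.g. as estimated by bridge sampling,
# for specifications with 1, 2 and 3 hidden states.
log_ml = np.array([-412.7, -398.3, -399.1])
log_prior = np.log(np.ones(3) / 3)        # equal prior model probabilities

log_post = log_ml + log_prior
log_post -= np.max(log_post)              # stabilize before exponentiating
post = np.exp(log_post) / np.exp(log_post).sum()

for k, p in zip([1, 2, 3], post):
    print(f"P(model with {k} state(s) | data) = {p:.3f}")
```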
|
79 |
Bayesian Adjustment for Multiplicity. Scott, James Gordon, January 2009 (has links)
This thesis is about Bayesian approaches for handling multiplicity. It considers three main kinds of multiple-testing scenarios: tests of exchangeable experimental units, tests for variable inclusion in linear regression models, and tests for conditional independence in jointly normal vectors. Multiplicity adjustment in these three areas will be seen to have many common structural features. Though the modeling approach throughout is Bayesian, frequentist reasoning regarding error rates will often be employed.
Chapter 1 frames the issues in the context of historical debates about Bayesian multiplicity adjustment. Chapter 2 confronts the problem of large-scale screening of functional data, where control over Type-I error rates is a crucial issue. Chapter 3 develops new theory for comparing Bayes and empirical-Bayes approaches for multiplicity correction in regression variable selection. Chapters 4 and 5 describe new theoretical and computational tools for Gaussian graphical-model selection, where multiplicity arises in performing many simultaneous tests of pairwise conditional independence. Chapter 6 introduces a new approach to sparse-signal modeling based upon local shrinkage rules. Here the focus is not on multiplicity per se, but rather on using ideas from Bayesian multiple-testing models to motivate a new class of multivariate scale-mixture priors. Finally, Chapter 7 describes some directions for future study, many of which are the subjects of my current research agenda. / Dissertation
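As a minimal sketch of the general idea of fully Bayesian multiplicity adjustment in variable selection (an illustration of the standard beta-binomial prior over model size, not the thesis's specific analyses), the code below shows how the prior odds of including one particular variable shrink automatically as the number of candidate variables grows.

```python
from math import exp, lgamma

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def prior_model_prob(k, p, a=1.0, b=1.0):
    """Prior probability of a specific model with k of p candidate variables
    under a beta-binomial(a, b) prior on the inclusion probability. With
    a = b = 1 this is the 1 / ((p + 1) * C(p, k)) multiplicity-adjusting prior."""
    return exp(log_beta(a + k, b + p - k) - log_beta(a, b))

for p in (10, 100, 1000):
    # Prior odds of a specific one-variable model against the null model: under
    # the fully Bayes prior these odds shrink as the number of candidates grows,
    # which is the automatic multiplicity penalty. A fixed inclusion probability
    # of 1/2 would give odds of 1 regardless of p.
    odds = prior_model_prob(1, p) / prior_model_prob(0, p)
    print(f"p = {p:5d}: prior odds (one specific variable in vs null) = {odds:.4f}")
```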
|
80 |
Managing Information Collection in Simulation-Based Design. Ling, Jay Michael, 22 May 2006 (has links)
An important element of successful engineering design is the effective management of resources to support design decisions. Design decisions can be thought of as having two phases: a formulation phase and a solution phase. As part of the formulation phase, engineers must decide how much information to collect and which models to use to support the design decision. Since more information and more accurate models come at a greater cost, a cost-benefit trade-off must be made. Previous work has considered such trade-offs in decision problems when all aspects of the decision problem can be represented using precise probabilities, an assumption that is not justified when information is sparse.
In this thesis, we use imprecise probabilities to manage the information cost-benefit trade-off for two decision problems in which the quality of the information is imprecise: 1) The decision of when to stop collecting statistical data about a quantity that is characterized by a probability distribution with unknown parameters; and 2) The selection of the most preferred model to help guide a particular design decision when the model accuracy is characterized as an interval. For each case, a separate novel approach is developed in which the principles of information economics are incorporated into the information management decision.
The problem of statistical data collection is explored with a pressure vessel design. This design problem requires the characterization of the probability distribution that describes a novel material's strength. The model selection approach is explored with the design of an I-beam structure. The designer must decide how accurate a model to use to predict the maximum deflection in the span of the structure. For both problems, it is concluded that the information economic approach developed in this thesis can assist engineers in their information management decisions.
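As a generic information-economics illustration of the data-collection trade-off described above (not the imprecise-probability formulation developed in the thesis), the sketch below keeps testing a hypothetical material's strength until the estimated benefit of one more observation, taken here as the reduction in the standard error of the mean times a made-up penalty, falls below the per-sample cost.

```python
import numpy as np

rng = np.random.default_rng(3)
true_mean, true_sd = 520.0, 40.0      # hypothetical material strength (MPa)
cost_per_sample = 2.0                 # hypothetical cost of one more test
penalty_per_unit_se = 60.0            # hypothetical cost of decision uncertainty

samples = list(rng.normal(true_mean, true_sd, size=5))    # initial pilot data
while True:
    n = len(samples)
    se_now = np.std(samples, ddof=1) / np.sqrt(n)
    se_next = np.std(samples, ddof=1) / np.sqrt(n + 1)     # anticipated after one more test
    expected_benefit = penalty_per_unit_se * (se_now - se_next)
    if expected_benefit < cost_per_sample or n >= 200:
        break
    samples.append(rng.normal(true_mean, true_sd))         # collect one more observation

print(f"stopped after {len(samples)} samples; mean strength estimate "
      f"{np.mean(samples):.1f} +/- {np.std(samples, ddof=1) / np.sqrt(len(samples)):.1f}")
```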
|