  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

Model selection criteria in the presence of missing data based on the Kullback-Leibler discrepancy

Sparks, JonDavid 01 December 2009 (has links)
An important challenge in statistical modeling involves determining an appropriate structural form for a model to be used in making inferences and predictions. Missing data is a very common occurrence in most research settings and can easily complicate the model selection problem. Many useful procedures have been developed to estimate parameters and standard errors in the presence of missing data; however, few methods exist for determining the actual structural form of a model when the data is incomplete. In this dissertation, we propose model selection criteria based on the Kullback-Leibler discrepancy that can be used in the presence of missing data. The criteria are developed by accounting for missing data using principles related to the expectation-maximization (EM) algorithm and bootstrap methods. We formulate the criteria for three specific modeling frameworks: the normal multivariate linear regression model, the generalized linear model, and the normal longitudinal regression model. In each framework, a simulation study is presented to investigate the performance of the criteria relative to their traditional counterparts. We consider a setting where the missingness is confined to the outcome, and also a setting where the missingness may occur in the outcome and/or the covariates. The results from the simulation studies indicate that our criteria provide better protection against underfitting than their traditional analogues. We outline the implementation of our methodology for a general discrepancy measure. An application is presented where the proposed criteria are utilized in a study that evaluates the driving performance of individuals with Parkinson's disease under low-contrast (fog) conditions in a driving simulator.
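For context, the Kullback-Leibler discrepancy that criteria of this type target, and its familiar complete-data AIC-style estimator, can be written as follows; this is the standard formulation, not the dissertation's missing-data adaptation.

```latex
% Kullback-Leibler discrepancy of a fitted candidate f(. | theta-hat) from the
% generating model g, and the usual AIC-type estimator of its expected value
% (k = number of estimated parameters); the dissertation's criteria adapt such
% estimators to incomplete data via EM-related principles and the bootstrap.
d(\hat{\theta}) = \mathrm{E}_{g}\!\left[\, -2 \log f(Y \mid \theta) \,\right]\big|_{\theta=\hat{\theta}},
\qquad
\widehat{\mathrm{E}_{g}\!\left[\, d(\hat{\theta}) \,\right]} \approx -2 \log f(y \mid \hat{\theta}) + 2k .
```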
62

INFERENCE USING BHATTACHARYYA DISTANCE TO MODEL INTERACTION EFFECTS WHEN THE NUMBER OF PREDICTORS FAR EXCEEDS THE SAMPLE SIZE

Janse, Sarah A. 01 January 2017 (has links)
In recent years, statistical analyses, algorithms, and modeling of big data have been constrained due to computational complexity. Further, the added complexity of relationships among response and explanatory variables, such as higher-order interaction effects, makes identifying predictors using standard statistical techniques difficult. These difficulties are only exacerbated in the case of small sample sizes in some studies. Recent analyses have targeted the identification of interaction effects in big data, but the development of methods to identify higher-order interaction effects has been limited by computational concerns. One recently studied method is the Feasible Solutions Algorithm (FSA), a fast, flexible method that aims to find a set of statistically optimal models via a stochastic search algorithm. Although FSA has shown promise, one of its current limitations is that the user must choose the number of times to run the algorithm. Here, statistical guidance is provided for this number of iterations by deriving a lower bound on the probability of obtaining the statistically optimal model in a given number of iterations of FSA. Moreover, logistic regression is severely limited when two predictors can perfectly separate the two outcomes. In the case of small sample sizes, this occurs quite often by chance, especially when there is a large number of predictors. The Bhattacharyya distance (B-distance) is proposed as an alternative method to address this limitation. However, little is known about the theoretical properties or distribution of B-distance. Thus, properties and the distribution of this distance measure are derived here. A hypothesis test and confidence interval are developed and tested on both simulated and real data.
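For reference, the closed-form Bhattacharyya distance between two multivariate normal distributions is standard and is sketched below; the thesis' distributional theory, hypothesis test, and confidence interval are not reproduced, and the example values are invented.

```python
import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate normal distributions
    (standard closed form); larger values indicate better-separated groups."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1 = np.atleast_2d(cov1).astype(float)
    cov2 = np.atleast_2d(cov2).astype(float)
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    # slogdet for numerical stability
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term2 = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term1 + term2

# Example: separation of one predictor between two outcome groups
d = bhattacharyya_gaussian([0.0], [[1.0]], [1.5], [[1.2]])
print(round(d, 4))
```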
63

EXAMINING THE CONFIRMATORY TETRAD ANALYSIS (CTA) AS A SOLUTION OF THE INADEQUACY OF TRADITIONAL STRUCTURAL EQUATION MODELING (SEM) FIT INDICES

Liu, Hangcheng 01 January 2018 (has links)
Structural Equation Modeling (SEM) is a framework of statistical methods that allows us to represent complex relationships between variables. SEM is widely used in economics, genetics, and the behavioral sciences (e.g., psychology, psychobiology, sociology, and medicine). Model complexity is defined as a model's ability to fit different data patterns, and it plays an important role in model selection when applying SEM. As in linear regression, the number of free model parameters is typically used in traditional SEM model fit indices as a measure of model complexity. However, using only the number of free model parameters to indicate SEM model complexity is crude, since other contributing factors, such as the type of constraint or the functional form, are ignored. To address this problem, a special technique, Confirmatory Tetrad Analysis (CTA), is examined. A tetrad refers to the difference in the products of certain covariances (or correlations) among four random variables. A structural equation model often implies that some tetrads should be zero. These model-implied zero tetrads are called vanishing tetrads. In CTA, the goodness of fit can be determined by testing the null hypothesis that the model-implied vanishing tetrads are equal to zero. CTA can help improve model selection because different functional forms may affect the number of model-implied vanishing tetrads (t), and models that are not nested according to the traditional likelihood ratio test may be nested in terms of tetrads. In this dissertation, an R package was created to perform CTA, a two-step method was developed to determine SEM model complexity using simulated data, and it is demonstrated how the number of vanishing tetrads can help indicate SEM model complexity in some situations.
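A minimal sketch of the tetrad computation described above, assuming only a covariance matrix as input; the thesis' R package and the CTA test statistic itself are not reproduced here.

```python
import numpy as np
from itertools import combinations

def tetrads(S):
    """All tetrad differences implied by a covariance matrix S. For each quartet
    (i, j, k, l) there are three tetrads, e.g. S[i,j]*S[k,l] - S[i,k]*S[j,l];
    a 'vanishing tetrad' is one that the model forces to zero."""
    S = np.asarray(S, float)
    out = {}
    for i, j, k, l in combinations(range(S.shape[0]), 4):
        out[(i, j, k, l)] = (
            S[i, j] * S[k, l] - S[i, k] * S[j, l],
            S[i, j] * S[k, l] - S[i, l] * S[j, k],
            S[i, k] * S[j, l] - S[i, l] * S[j, k],
        )
    return out

# One-factor model with 4 indicators: all three tetrads of the quartet vanish
lam = np.array([0.8, 0.7, 0.6, 0.5])
S = np.outer(lam, lam) + np.diag([0.3, 0.4, 0.5, 0.6])
print(tetrads(S)[(0, 1, 2, 3)])   # ~ (0.0, 0.0, 0.0)
```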
64

Probabilistic pairwise model comparisons based on discrepancy measures and a reconceptualization of the p-value

Riedle, Benjamin N. 01 May 2018 (has links)
Discrepancy measures are often employed in problems involving the selection and assessment of statistical models. A discrepancy gauges the separation between a fitted candidate model and the underlying generating model. In this work, we consider pairwise comparisons of fitted models based on a probabilistic evaluation of the ordering of the constituent discrepancies. An estimator of the probability is derived using the bootstrap. In the framework of hypothesis testing, nested models are often compared on the basis of the p-value. Specifically, the simpler null model is favored unless the p-value is sufficiently small, in which case the null model is rejected and the more general alternative model is retained. Using suitably defined discrepancy measures, we mathematically show that, in general settings, the Wald, likelihood ratio (LR) and score test p-values are approximated by the bootstrapped discrepancy comparison probability (BDCP). We argue that the connection between the p-value and the BDCP leads to potentially new insights regarding the utility and limitations of the p-value. The BDCP framework also facilitates discrepancy-based inferences in settings beyond the limited confines of nested model hypothesis testing.
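The sketch below illustrates one way to bootstrap the ordering of estimated discrepancies for two nested linear models; it is a rough illustration of the idea, not the dissertation's BDCP estimator, and all names and settings are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def neg2_gauss_loglik(X, y, beta):
    """-2 x profiled Gaussian log-likelihood (up to constants), used here as an
    empirical discrepancy between a fitted model and the observed data."""
    r = y - X @ beta
    return len(y) * np.log(np.mean(r ** 2))

def bdcp_sketch(X1, X2, y, B=500):
    """Fraction of bootstrap fits for which model 1's empirical discrepancy is
    smaller than model 2's -- an illustration of a bootstrapped discrepancy
    comparison probability, not the dissertation's exact estimator."""
    n = len(y)
    wins = 0
    for _ in range(B):
        idx = rng.integers(0, n, n)
        b1 = fit_ols(X1[idx], y[idx])
        b2 = fit_ols(X2[idx], y[idx])
        wins += neg2_gauss_loglik(X1, y, b1) < neg2_gauss_loglik(X2, y, b2)
    return wins / B

# Simulated nested models: the smaller (null) model matches the generating mechanism
n = 200
x = rng.normal(size=(n, 2))
y = 1.0 + 0.8 * x[:, 0] + rng.normal(size=n)
X_null = np.column_stack([np.ones(n), x[:, 0]])   # true structure
X_alt = np.column_stack([np.ones(n), x])          # adds a noise covariate
print(bdcp_sketch(X_null, X_alt, y))
```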
65

Passive detection of radionuclides from weak and poorly resolved gamma-ray energy spectra

Kump, Paul 01 July 2012 (has links)
Large passive detectors used in screening for special nuclear materials at ports of entry are characterized by poor spectral resolution, making identification of radionuclides a difficult task. Most identification routines, which fit empirical shapes and use derivatives, are impractical in these situations. Here I develop new, physics-based methods to determine the presence of spectral signatures of one or more of a set of isotopes. Gamma-ray counts are modeled as Poisson processes, where the mean is taken to be the model and the difference between the observed gamma-ray counts and the mean is treated as random noise. In the linear part, the unknown coefficients represent the intensities of the isotopes. Therefore, it is of great interest not to estimate each coefficient, but rather to determine whether each coefficient is non-zero, corresponding to the presence of the isotope. This thesis provides new selection algorithms, and, since detector data is necessarily finite, the work emphasizes selection when the data is fixed and finite.
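The sketch below illustrates the general idea of fitting Poisson-distributed counts as a nonnegative combination of isotope template spectra, with a sparsity penalty so that a near-zero coefficient signals an absent isotope; the templates, penalty, and optimizer are illustrative assumptions, not the thesis' algorithms.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def neg_log_poisson(x, A, y, lam):
    """Penalized negative Poisson log-likelihood: templates A (channels x sources),
    nonnegative intensities x, and an L1 penalty encouraging exact zeros."""
    mu = A @ x + 1e-8
    return np.sum(mu - y * np.log(mu)) + lam * np.sum(x)

# Toy 'template' spectra for three hypothetical isotopes plus a flat background
channels = np.arange(64)
templates = np.column_stack(
    [np.exp(-0.5 * ((channels - c) / 6.0) ** 2) for c in (15, 30, 50)]
    + [np.ones_like(channels, dtype=float)]
)
true_intensity = np.array([40.0, 0.0, 25.0, 2.0])   # second isotope absent
y = rng.poisson(templates @ true_intensity)

res = minimize(neg_log_poisson, x0=np.ones(4), args=(templates, y, 1.0),
               bounds=[(0, None)] * 4, method="L-BFGS-B")
print(np.round(res.x, 2))   # a near-zero estimate flags the absent isotope
```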
66

Working correlation selection in generalized estimating equations

Jang, Mi Jin 01 December 2011 (has links)
Longitudinal data analysis is common in biomedical research. The generalized estimating equations (GEE) approach is widely used for longitudinal marginal models. The GEE method is known to provide consistent regression parameter estimates regardless of the choice of working correlation structure, provided that root-n-consistent estimates of the nuisance parameters are used. However, it is important to use an appropriate working correlation structure in small samples, since it improves the statistical efficiency of the β estimate. Several working correlation selection criteria have been proposed (Rotnitzky and Jewell, 1990; Pan, 2001; Hin and Wang, 2009; Shults et al., 2009). However, these selection criteria share the limitation that they perform poorly when over-parameterized structures are considered as candidates. In this dissertation, new working correlation selection criteria are developed based on generalized eigenvalues. A set of generalized eigenvalues is used to measure the disparity between the bias-corrected sandwich variance estimator under the hypothesized working correlation matrix and the model-based variance estimator under a working independence assumption. A summary measure based on the set of generalized eigenvalues provides an indication of the disparity between the true correlation structure and the misspecified working correlation structure. Motivated by the test statistics in MANOVA, three working correlation selection criteria are proposed: PT (Pillai's trace type criterion), WR (Wilks' ratio type criterion), and RMR (Roy's maximum root type criterion). The relationship between these generalized eigenvalues and the CIC measure is revealed. In addition, this dissertation proposes a method to penalize over-parameterized working correlation structures. An over-parameterized structure converges to the true correlation structure by using extra parameters. Thus, the true correlation structure and the over-parameterized structure tend to provide similar variance estimates of the estimated β and similar working correlation selection criterion values. However, the over-parameterized structure is more likely to be chosen as the best working correlation structure by "the smaller the better" rule for criterion values, because over-parameterization leads to a negatively biased sandwich variance estimator and hence a smaller selection criterion value. In this dissertation, the over-parameterized structure is penalized through cluster detection and an optimization function. In order to find the group ("cluster") of working correlation structures that are similar to each other, a cluster detection method is developed based on spacings of the order statistics of the selection criterion measures. Once a cluster is found, an optimization function considering the trade-off between bias and variability provides the choice of the "best" approximating working correlation structure. The performance of our proposed criterion measures relative to other relevant criteria (QIC, RJ and CIC) is examined in a series of simulation studies.
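As a rough illustration of the generalized-eigenvalue idea, the sketch below compares a sandwich covariance estimate with a model-based one and forms MANOVA-style summaries; the exact PT, WR, and RMR definitions follow the dissertation and are not reproduced here, and the matrices are invented.

```python
import numpy as np
from scipy.linalg import eigh

def correlation_selection_summaries(V_sandwich, V_model):
    """Generalized eigenvalues of a sandwich covariance relative to a model-based
    (working-independence) covariance, with Pillai/Wilks/Roy-type analogues as
    illustrative summaries of their disparity."""
    lam = eigh(V_sandwich, V_model, eigvals_only=True)
    pillai_type = np.sum(lam / (1.0 + lam))
    wilks_type = np.prod(1.0 / (1.0 + lam))
    roy_type = np.max(lam)
    return lam, pillai_type, wilks_type, roy_type

# Toy covariance estimates for a 3-dimensional regression parameter
V_model = np.diag([0.04, 0.02, 0.05])
V_sandwich = np.array([[0.05, 0.01, 0.00],
                       [0.01, 0.03, 0.00],
                       [0.00, 0.00, 0.05]])
print(correlation_selection_summaries(V_sandwich, V_model))
```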
67

Semiparametric regression analysis of zero-inflated data

Liu, Hai 01 July 2009 (has links)
Zero-inflated data abound in ecological studies as well as in other scientific and quantitative fields. Nonparametric regression with a zero-inflated response may be studied via the zero-inflated generalized additive model (ZIGAM). ZIGAM assumes that the conditional distribution of the response variable belongs to the zero-inflated 1-parameter exponential family, which is a probabilistic mixture of the zero atom and the 1-parameter exponential family, where the zero atom accounts for an excess of zeroes in the data. We propose the constrained zero-inflated generalized additive model (COZIGAM) for analyzing zero-inflated data, with the further assumption that the probability of non-zero-inflation is some monotone function of the (non-zero-inflated) exponential family distribution mean. When the latter assumption obtains, the new approach provides a unified framework for modeling zero-inflated data, which is more parsimonious and efficient than the unconstrained ZIGAM. We develop an iterative algorithm for model estimation based on the penalized likelihood approach, and derive formulas for constructing confidence intervals of the maximum penalized likelihood estimator. Some asymptotic properties, including the consistency of the regression function estimator and the limiting distribution of the parametric estimator, are derived. We also propose a Bayesian model selection criterion for choosing between the unconstrained and the constrained ZIGAMs. We consider several useful extensions of the COZIGAM, including imposing additive-component-specific proportional and partial constraints, and incorporating threshold effects to account for regime-shift phenomena. The new methods are illustrated with both simulated data and real applications. An R package, COZIGAM, has been developed for model fitting and model selection with zero-inflated data.
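A parametric sketch of the constraint idea, assuming a zero-inflated Poisson response whose non-zero-inflation probability is a monotone (logit-linear) function of the log mean; the thesis works with penalized-likelihood additive models, so the functional forms and names below are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def cozip_neg_loglik(params, X, y):
    """Negative log-likelihood of a constrained zero-inflated Poisson:
    log(mu) = X @ beta and logit(p_nonzero) = alpha + delta * log(mu), so the
    probability of the non-degenerate component is a monotone function of the
    mean. A parametric sketch only, not the thesis' penalized-spline estimator."""
    k = X.shape[1]
    beta, alpha, delta = params[:k], params[k], params[k + 1]
    eta = np.clip(X @ beta, -20, 20)          # log mean of the Poisson component
    mu = np.exp(eta)
    p = np.clip(expit(alpha + delta * eta), 1e-10, 1 - 1e-10)
    loglik_pois = y * eta - mu - gammaln(y + 1)
    ll = np.where(y == 0,
                  np.log((1 - p) + p * np.exp(-mu)),
                  np.log(p) + loglik_pois)
    return -np.sum(ll)

# Simulated example with an excess of zeroes
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
mu_true = np.exp(X @ np.array([0.5, 1.0]))
p_true = expit(-0.2 + 0.8 * np.log(mu_true))
y = np.where(rng.uniform(size=n) < p_true, rng.poisson(mu_true), 0)

fit = minimize(cozip_neg_loglik, x0=np.zeros(4), args=(X, y), method="BFGS")
print(np.round(fit.x, 2))    # [beta0, beta1, alpha, delta]
```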
68

Spatio-temporal hidden Markov models for incorporating interannual variability in rainfall

Frost, Andrew James January 2004 (has links)
Two new spatio-temporal hidden Markov models (HMM) are introduced in this thesis, with the purpose of capturing the persistent, spatially non-homogeneous nature of climate influence on annual rainfall series observed in Australia. The models extend the two-state HMM applied by Thyer (2001) by relaxing the assumption that all sites are under the same climate control. The Switch HMM (SHMM) allows at-site anomalous states, whilst still maintaining a regional control. The Regional HMM (RHMM), on the other hand, allows sites to be partitioned into different Markovian state regions. The analyses were conducted using a Bayesian framework to explicitly account for parameter uncertainty and to select between competing hypotheses. Bayesian model averaging was used for comparison of the HMM and its generalisations. The HMM, SHMM and RHMM were applied to four groupings of four sites located on the eastern coast of Australia, an area that has previously shown evidence of interannual persistence. In the majority of case studies, the RHMM variants showed the greatest posterior weight, indicating that the data favoured the multiple-region RHMM over the single-region HMM or the SHMM variants. In no case does the HMM produce the maximum marginal likelihood when compared to the SHMM and RHMM. The HMM state series and preferred model variants were sensitive to the parameterisation of the small-scale site-to-site correlation structure. Several parameterisations of the small-scale Gaussian correlation were trialled, namely Fitted Correlation, Exponential Decay Correlation, Empirical and Zero Correlation. Significantly, it was shown that annual rainfall data outliers can have a large effect on inference for a model that uses Gaussian distributions. The practical value of this modelling is demonstrated by conditioning the event-based point rainfall model DRIP on the hidden state series of the HMM variants. Short-timescale models typically underestimate annual variability because there is no explicit structure to incorporate long-term persistence. The two-state conditioned DRIP model was shown to reproduce the observed annual variability to a greater degree than the single-state DRIP. / PhD Doctorate
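The two-state Gaussian hidden Markov model underlying these extensions can be evaluated with the standard scaled forward recursion, sketched below on simulated annual rainfall; the transition probabilities, state means, and scales are invented for illustration and do not come from the thesis.

```python
import numpy as np
from scipy.stats import norm

def hmm_forward_loglik(y, P, means, sds, init):
    """Log-likelihood of a two-state Gaussian hidden Markov model via the scaled
    forward recursion -- the single-site building block that the spatial SHMM and
    RHMM variants extend."""
    dens = norm.pdf(y[:, None], loc=means, scale=sds)   # (T, 2) emission densities
    alpha = init * dens[0]
    loglik = 0.0
    for t in range(1, len(y)):
        c = alpha.sum()
        loglik += np.log(c)
        alpha = (alpha / c) @ P * dens[t]
    return loglik + np.log(alpha.sum())

# Two persistent climate states ('dry' and 'wet'); all numbers are illustrative
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
means, sds = np.array([600.0, 900.0]), np.array([80.0, 80.0])   # annual rainfall (mm)
rng = np.random.default_rng(3)
states = [0]
for _ in range(99):
    states.append(rng.choice(2, p=P[states[-1]]))
y = rng.normal(loc=means[states], scale=sds[states])
print(hmm_forward_loglik(y, P, means, sds, init=np.array([0.5, 0.5])))
```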
69

雙線性時間序列模式選取之研究 / Model Selection of Bilinear Time Series

劉瑞芝, Liou, Ruey Chih Unknown Date (has links)
Time series analysis has received extensive attention over the past twenty years, and the great majority of the literature studies linear time series models. In practice, however, many time series do not satisfy the linearity assumption, so over the last decade many researchers have devoted their efforts to nonlinear time series models. Among these, the bilinear model has attracted wide attention because its properties resemble those of linear models. In this thesis, we estimate the parameters using the iterative equations proposed by Subba Rao and Gabr (1984) together with the Gauss-Seidel iterative method, and, combined with the nested search procedure proposed by Subba Rao (1981), we select the order of the bilinear model. The resulting model selections are compared with those obtained from AIC, BIC, and a modified PKK model selection method.
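For illustration, a simple first-order bilinear series of the kind whose order such procedures select can be simulated as below; the coefficients are arbitrary, and the estimation and nested search steps of the thesis are not reproduced.

```python
import numpy as np

def simulate_bilinear(n, a=0.4, b=0.3, sigma=1.0, seed=0):
    """Simulate a simple bilinear series X_t = a*X_{t-1} + b*X_{t-1}*e_{t-1} + e_t.
    Order selection would compare such candidate structures (e.g. by AIC/BIC or a
    nested search); the coefficients here are illustrative only."""
    rng = np.random.default_rng(seed)
    e = rng.normal(scale=sigma, size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + b * x[t - 1] * e[t - 1] + e[t]
    return x

series = simulate_bilinear(500)
print(series[:5].round(3))
```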
70

Essays in empirical asset pricing

Parmler, Johan January 2005 (has links)
The Capital Asset Pricing Model (CAPM) is the most widely used model in asset pricing. This model evaluates the asset return in relation to the market return and the sensitivity of the security to the market. However, the evidence supporting the CAPM is mixed. Alternatives to the CAPM for determining the expected rate of return on portfolios and stocks were introduced through the Arbitrage Pricing Theory and through the Intertemporal CAPM. The introduction of these more general models raised the following important question: how should the risk factors in a multifactor pricing model be specified? Since multifactor model theory is not very explicit regarding the number or nature of the factors, the selection of factors has, to a large extent, become an empirical issue. In the first and second chapters, we conduct an exhaustive evaluation of multifactor asset pricing models based on observable factors. In the first chapter we find strong evidence that a multifactor pricing model should include the market excess return, the size premium, and the value premium. In the second chapter we relax the assumption of normally distributed returns. Even if this new setup does not alter the selected factors, we find strong evidence of deviation from normality, which makes our approach more appropriate. In contrast to the first two chapters, the third chapter takes the approach of using latent factors. Using data from the US market, 4 to 6 pervasive factors were generally found. Furthermore, the data speak in favor of an approximate factor structure with time-series dependence across assets. In the final chapter, we examine whether a momentum strategy is superior to a benchmark model once the effects of data snooping have been accounted for. Data snooping occurs when a given set of data is used more than once for inference or model selection. The results show that the data-snooping bias can be very substantial; in this study, neglecting the problem would lead to very different conclusions. For the US data there is strong evidence of a momentum effect, and we reject the hypothesis of weak market efficiency. For the Swedish data, the results indicate that momentum strategies based on individual stocks generate positive and significant profits. Interestingly, a very weak momentum effect, or none at all, can be found when stocks are sorted by size, book-to-market, and industry. / Diss. Stockholm : Handelshögskolan, 2005. Johan Parmler was previously named Johan Ericsson.
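A minimal sketch of the time-series regression behind such observable-factor models, using synthetic market, size, and value factors; it is not the thesis' Bayesian factor-selection or data-snooping analysis.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 240                                            # 20 years of monthly observations (synthetic)
factors = rng.normal(scale=0.04, size=(T, 3))      # market excess, size, value premia
betas_true = np.array([1.1, 0.4, 0.3])
excess_ret = factors @ betas_true + rng.normal(scale=0.02, size=T)

# Time-series regression of a portfolio's excess return on the three factors;
# a near-zero intercept (alpha) is consistent with the factor model pricing the asset.
X = np.column_stack([np.ones(T), factors])
coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
print(np.round(coef, 3))   # [alpha, beta_mkt, beta_size, beta_value]
```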
