271

Statistical inference in high dimensional linear and AFT models

Chai, Hao 01 July 2014 (has links)
Variable selection procedures for high-dimensional data have been proposed and studied in a large body of literature over the last few years. Most of the previous research focuses on selection properties as well as point estimation properties. In this work, our goal is to construct confidence intervals for some low-dimensional parameters in the high-dimensional setting. The models we study are partially penalized linear and accelerated failure time (AFT) models in the high-dimensional setting. In our model setup, all variables are split into two groups. The first group consists of a relatively small number of variables that are of primary interest. The second group consists of a large number of variables that can be potentially correlated with the response variable. We propose an approach that selects the variables from the second group and produces confidence intervals for the parameters in the first group. We show the sign consistency of the selection procedure and give a bound on the estimation error. Based on this result, we provide sufficient conditions for the asymptotic normality of the low-dimensional parameter estimators. The high-dimensional selection consistency and the low-dimensional asymptotic normality are developed for both linear and AFT models with high-dimensional data.
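
As a point of reference, a minimal sketch of the partially penalized criterion in the linear-model case, written in our own notation (X for the first-group design, Z for the second-group design, and a generic sparsity penalty p_lambda; the dissertation's exact penalty and the AFT analogue are not reproduced here). Only the high-dimensional second-group coefficients gamma are penalized, so selection acts on Z while the low-dimensional beta of interest stays unpenalized:

```latex
(\hat\beta, \hat\gamma) \;=\; \arg\min_{\beta,\gamma}\;
\frac{1}{2n}\,\lVert y - X\beta - Z\gamma \rVert_2^2
\;+\; \sum_{j=1}^{q} p_\lambda\!\left(|\gamma_j|\right),
```

where X is the n-by-k matrix of the few variables of interest and Z is the n-by-q matrix of potentially correlated variables with q possibly much larger than n.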
272

Statistical detection with weak signals via regularization

Li, Jinzheng 01 July 2012 (has links)
There has been increasing interest in uncovering smuggled nuclear materials in connection with the War on Terror. Detection of special nuclear materials hidden in cargo containers is a major challenge in national and international security. We propose a new physics-based method to determine the presence of the spectral signature of one or more nuclides from a poorly resolved spectrum with weak signatures. The method differs from traditional methods that rely primarily on peak-finding algorithms. The new approach considers each of the signatures in the library to be a linear combination of subspectra, obtained by assuming a signature consisting of just one of the unique gamma rays emitted by the nuclei. We propose a Poisson regression model for deducing which nuclides are present in the observed spectrum. In recognition that a radiation source generally comprises only a few nuclear materials, the underlying Poisson model is sparse, i.e., most of the regression coefficients are zero (positive coefficients correspond to the presence of nuclear materials). We develop an iterative algorithm for penalized likelihood estimation that promotes sparsity. We illustrate the efficacy of the proposed method by simulations covering a variety of poorly resolved, low signal-to-noise ratio (SNR) situations, which show that the proposed approach enjoys excellent empirical performance even with SNR as low as -15 dB. The proposed method is shown to be variable-selection consistent, in the framework of increasing detection time and under mild regularity conditions. We also study the problem of testing for shielding, i.e., the presence of intervening materials that attenuate the gamma-ray signal. We show that, as detection time increases to infinity, the Lagrange multiplier test, the likelihood ratio test, and the Wald test are asymptotically equivalent under the null hypothesis, and that their asymptotic null distribution is chi-square. We also derive the local power of these tests. Finally, we develop a nonparametric approach for detecting spectra indicative of the presence of SNM. This approach characterizes the shape change in a spectrum relative to background radiation, via a dissimilarity function that captures the complete shape change of a spectrum from the background over all energy channels. We derive the asymptotic null distributions of the tests in terms of functionals of the Brownian bridge. Simulation results show that the proposed approach is very powerful and promising for detecting weak signals; it is able to accurately detect weak signals with SNR as low as -37 dB.
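
A minimal sketch of one way such a sparse Poisson fit could be carried out, assuming an identity-link model in which channel counts are Poisson with mean A b, where the columns of A hold the library subspectra and b is nonnegative and L1-penalized; the dissertation's actual parameterization and iterative algorithm may differ, and the function name is hypothetical:

```python
import numpy as np

def sparse_poisson_unmix(A, y, lam, step=1e-4, iters=20000, eps=1e-10):
    """Projected proximal-gradient (ISTA-style) sketch for an identity-link,
    L1-penalized Poisson model: y_i ~ Poisson((A b)_i), b >= 0 and sparse.
    Nonzero entries of the returned b flag nuclides judged present."""
    n_channels, n_nuclides = A.shape
    b = np.full(n_nuclides, 1e-3)
    for _ in range(iters):
        mu = A @ b + eps                      # fitted channel intensities
        grad = A.T @ (1.0 - y / mu)           # gradient of the Poisson neg. log-likelihood
        b = b - step * grad                   # gradient step
        b = np.maximum(b - step * lam, 0.0)   # prox step: soft-threshold and keep b >= 0
    return b
```

With a suitably small step size this converges to a sparse, nonnegative coefficient vector; larger values of lam zero out more nuclides.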
273

Two-level lognormal frailty model and competing risks model with missing cause of failure

Tang, Xiongwen 01 May 2012 (has links)
In clustered survival data, unobservable cluster effects may exert powerful influences on the outcomes and thus induce correlation among subjects within the same cluster. The ordinary partial likelihood approach does not account for this dependence. Frailty models, as an extension of Cox regression, incorporate multiplicative random effects, called frailties, into the hazard model and have become a very popular way to account for within-cluster dependence. We study in particular the two-level nested lognormal frailty model and propose an estimation approach based on the complete-data likelihood with the frailty terms integrated out. We adopt B-splines to model the baseline hazards and adaptive Gauss-Hermite quadrature to approximate the integrals efficiently. Furthermore, in finding the maximum likelihood estimators, Gauss-Seidel and BFGS methods are used in place of the Newton-Raphson algorithm to improve the stability and efficiency of the estimation procedure. We also study competing risks models with missing cause of failure in the context of Cox proportional hazards models. In competing risks data, there is more than one cause of failure, and each observed failure is exclusively linked to one cause; conceptually, the causes are interpreted as competing risks before the failure is observed. Competing risks models are constructed by specifying a proportional hazards model for each cause of failure, which can be estimated using the partial likelihood approach. However, the ordinary partial likelihood is not applicable when the cause of failure is missing for some subjects. We propose a weighted partial likelihood approach based on complete-case data, where the weights are the inverses of the selection probabilities and the selection probability is estimated by a logistic regression model. The asymptotic properties of the regression coefficient estimators are investigated using counting process and martingale theory. We further develop a doubly robust approach based on the full data to improve efficiency as well as robustness.
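
A rough sketch of the inverse-probability-weighting idea for the cause-specific Cox model, assuming weights estimated by a logistic model for whether the cause is observed and a weighted partial likelihood in which weights enter both the score terms and the risk sets; function names and the exact placement of weights are illustrative, not the dissertation's specification:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def selection_weights(aux, cause_observed):
    """Estimate P(cause observed | auxiliary covariates) with logistic regression
    and return inverse-probability-of-selection weights."""
    p_hat = LogisticRegression(max_iter=1000).fit(aux, cause_observed).predict_proba(aux)[:, 1]
    return 1.0 / p_hat

def weighted_partial_loglik(beta, time, is_cause1_event, X, w):
    """Weighted Cox partial log-likelihood for the cause of interest, restricted
    to complete-case events (cause known); w are inverse selection probabilities."""
    eta = X @ beta
    ll = 0.0
    for i in np.where(is_cause1_event == 1)[0]:
        at_risk = time >= time[i]                              # risk set at t_i
        ll += w[i] * (eta[i] - np.log(np.sum(w[at_risk] * np.exp(eta[at_risk]))))
    return ll
```

Maximizing this weighted partial log-likelihood over beta gives the complete-case IPW estimator described above.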
274

From valuing equity-linked death benefits to pricing American options

Zhou, Zhenhao 01 May 2017 (has links)
Motivated by Guaranteed Minimum Death Benefits (GMDB) in variable annuities, we are interested in valuing equity-linked options whose expiry date is the time of death of the policyholder. Because the time-until-death distribution can be approximated by linear combinations of exponential distributions or mixtures of Erlang distributions, the analysis can be reduced to the case where the time-until-death distribution is exponential or Erlang. We present two probabilistic methods to price American options with an exponential expiry date; both methods give the same results. An American option with an Erlang expiry date can be seen as an extension of the exponential case. We calculate its price as the sum of the price of the corresponding European option and the early exercise premium. Because the optimal exercise boundary takes the form of a staircase, the pricing formula is a triple sum. We determine the optimal exercise boundary recursively by imposing the “smooth pasting” condition. Examples of the put option, the exchange option, and the maximum option are provided to illustrate how the methods work. Another issue related to variable annuities is the surrender behavior of policyholders. To model this behavior, we suggest using barrier options. We generalize the reflection principle and use it to derive explicit formulas for outside barrier options, double barrier options with constant barriers, and double barrier options with time-varying exponential barriers. Finally, we provide a method to approximate the distribution of the time-until-death random variable by combinations of exponential distributions or mixtures of Erlang distributions. Compared to directly fitting the distributions, our method has two advantages: (1) it is more robust to the initial guess, and (2) it is more likely to obtain the global minimizer.
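
For concreteness, one common parameterization of the mixture-of-Erlangs approximation mentioned above (a shared rate lambda, nonnegative mixing weights; the dissertation's fitted parameters and fitting procedure are not reproduced here) is

```latex
f_{T}(t) \;\approx\; \sum_{k=1}^{K} p_k \,
\frac{\lambda^{n_k}\, t^{\,n_k-1}\, e^{-\lambda t}}{(n_k-1)!},
\qquad p_k \ge 0,\quad \sum_{k=1}^{K} p_k = 1,
```

so that valuing an option expiring at the random time T reduces to a weighted sum of valuations with Erlang expiry dates (and, when n_k = 1, exponential expiry dates).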
275

Likelihood-based inference for antedependence (Markov) models for categorical longitudinal data

Xie, Yunlong 01 July 2011 (has links)
Antedependence (AD) of order p, also known as the Markov property of order p, is a property of index-ordered random variables in which each variable, given at least p immediately preceding variables, is independent of all further preceding variables. Zimmerman and Nunez-Anton (2010) present statistical methodology for fitting and performing inference for AD models for continuous (primarily normal) longitudinal data, but analogous AD-model methodology for categorical longitudinal data has not yet been well developed. In this thesis, we derive maximum likelihood estimators of transition probabilities under antedependence of any order, and we use these estimators to develop likelihood-based methods for determining the order of antedependence of categorical longitudinal data. Specifically, we develop a penalized likelihood method for determining variable-order antedependence structure, and we derive the likelihood ratio test, score test, Wald test, and an adaptation of Fisher's exact test for pth-order antedependence against the unstructured (saturated) multinomial model. Simulation studies show that the score (Pearson's chi-square) test performs better than the other methods for complete and monotone missing data, while the likelihood ratio test is applicable to data with an arbitrary missing pattern. Because the likelihood ratio test rejects too often under the null hypothesis, we modify it by equating the expectation of the test statistic to its degrees of freedom, so that its actual size is closer to the nominal size. Additionally, we modify the likelihood ratio tests for use in testing pth-order antedependence against qth-order antedependence, where q > p, and for testing nested variable-order antedependence models. We extend the methods to deal with data having a monotone or arbitrary missing pattern. For antedependence models of constant order p, we develop methods for testing transition probability stationarity and strict stationarity, and for maximum likelihood estimation of parametric generalized linear models that are transition-probability-stationary AD(p) models. The methods are illustrated using three data sets.
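
In symbols (restating the definition in the abstract's first sentence; the notation is ours), Y_1, ..., Y_n are AD(p) if for every t > p

```latex
Y_t \,\perp\!\!\!\perp\, (Y_1,\dots,Y_{t-p-1}) \,\big|\, (Y_{t-p},\dots,Y_{t-1}),
\qquad\text{so}\qquad
P(y_1,\dots,y_n) \;=\; P(y_1,\dots,y_p)\,
\prod_{t=p+1}^{n} P\bigl(y_t \mid y_{t-p},\dots,y_{t-1}\bigr),
```

which is why the order-p transition probabilities are the natural parameters for the likelihood-based inference developed here.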
276

Semiparametric regression analysis of zero-inflated data

Liu, Hai 01 July 2009 (has links)
Zero-inflated data abound in ecological studies as well as in other scientific and quantitative fields. Nonparametric regression with a zero-inflated response may be studied via the zero-inflated generalized additive model (ZIGAM). The ZIGAM assumes that the conditional distribution of the response variable belongs to the zero-inflated one-parameter exponential family, a probabilistic mixture of a zero atom and a one-parameter exponential family distribution, where the zero atom accounts for the excess zeroes in the data. We propose the constrained zero-inflated generalized additive model (COZIGAM) for analyzing zero-inflated data, with the further assumption that the probability of non-zero-inflation is some monotone function of the (non-zero-inflated) exponential family distribution mean. When the latter assumption holds, the new approach provides a unified framework for modeling zero-inflated data that is more parsimonious and efficient than the unconstrained ZIGAM. We develop an iterative algorithm for model estimation based on the penalized likelihood approach and derive formulas for constructing confidence intervals for the maximum penalized likelihood estimator. Some asymptotic properties, including the consistency of the regression function estimator and the limiting distribution of the parametric estimator, are derived. We also propose a Bayesian model selection criterion for choosing between the unconstrained and constrained ZIGAMs. We consider several useful extensions of the COZIGAM, including imposing additive-component-specific proportional and partial constraints and incorporating threshold effects to account for regime-shift phenomena. The new methods are illustrated with both simulated data and real applications. An R package, COZIGAM, has been developed for model fitting and model selection with zero-inflated data.
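
A compact statement of the mixture underlying the ZIGAM, in our own notation, with one illustrative form of the constraint (the dissertation describes the constraint only as a monotone link between the non-zero-inflation probability and the mean, so the logit-linear version below is an assumption, not the exact COZIGAM specification):

```latex
P(Y = y \mid x) \;=\; \bigl(1 - p(x)\bigr)\,\mathbf{1}\{y = 0\}
\;+\; p(x)\, f\!\bigl(y;\, \mu(x)\bigr),
\qquad g\bigl(\mu(x)\bigr) = \sum_{j} s_j(x_j),
\qquad \operatorname{logit} p(x) = \alpha + \delta\, g\bigl(\mu(x)\bigr),\ \delta > 0,
```

where f is a one-parameter exponential family density, g is its link, the s_j are smooth additive components, and the last equation ties the non-zero-inflation probability monotonically to the mean, which is what makes the constrained model more parsimonious than fitting p(x) and mu(x) separately.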
277

Hedging out the mark-to-market volatility for structured credit portfolios

Ilerisoy, Mahmut 01 December 2009 (has links)
Credit derivatives are among the most criticized financial instruments of the recent credit crisis. Given their short history, finance professionals are still searching for effective ways to reduce the mark-to-market (MTM) volatility of credit derivatives, especially in turbulent market conditions. Many credit portfolios have struggled to find appropriate tools and techniques to help them navigate the crisis and hedge the mark-to-market volatility in their portfolios. In this study we provide a toolkit, based on data analysis and statistical methods, to help reduce the pricing fluctuations in structured credit portfolios. In Chapter One we provide a snapshot of the credit derivatives market by summarizing the different types of credit derivatives, including single-name credit default swaps (CDS), market credit indices, bespoke portfolios, market index tranches, and bespoke tranches (synthetic CDOs). In Chapter Two we illustrate a method to calculate a stable hedge ratio (beta) by combining industry practice and statistical techniques. Choosing an appropriate hedge ratio is critical for funds that wish to hedge mark-to-market volatility: many credit portfolios suffered 40%-80% market value losses in 2008 and 2009 due to the mark-to-market volatility of their long positions. In this chapter we introduce ten different betas for hedging a long bespoke portfolio with liquid market indices, and we measure their effectiveness by two criteria: stability and mark-to-market volatility reduction. Among all the betas we present, we conclude that the following are appropriate hedge ratios: the implied beta, the quarterly regression beta on spread levels, the yearly regression beta on spread levels, the up beta, and the down beta. In Chapter Three we analyze the risk factors that drive MTM volatility in CDS tranches, namely spread risk, correlation risk, dispersion risk, and curve risk. We focus our analysis on the equity tranche, as this is the riskiest tranche in the capital structure, and show that all four risks are critical in explaining its MTM volatility. We also perform multiple regression analysis to quantify the relationships between the risk factors, showing that, when combined, spread, correlation, and dispersion risks are the most important factors for explaining MTM fluctuations in the equity tranche, while curve risk can be used as an add-on to explain local instances. With this understanding of the risk factors, we analyze two instances in 2008 of significant spread widening in the equity tranche; both examples show that a good understanding of the risks that drive MTM changes in CDS tranches is critical to making informed trading decisions. In Chapter Four we focus on two topics: portfolio stratification and index selection. While portfolio stratification helps us better understand the composition of a portfolio, index selection shows us which indices are more suitable for hedging long bespoke positions. In stratifying a portfolio, we define Class-A as the widest credits, Class-B as the middle tier, and Class-C as the tightest credits in a credit portfolio, and we show that Class-A has a significant impact on the overall portfolio. We use five risk measures to analyze the properties of the three classes: sum of spreads (SOS), sigma/mu, basis point volatility (BPVOL), skewness, and kurtosis. For all risk measures we find high correlation between Class-A and the whole portfolio, and we show that it is critical to monitor the risks in Class-A to understand the spread moves of the overall portfolio. In the second part of Chapter Four, we analyze which credit index should be used to hedge a long bespoke portfolio, comparing four credit indices on their ability to track the bespoke portfolio in spread levels and in spread changes. The analysis shows that the CDX.HY and CDX.IG indices fit best for hedging our sample bespoke portfolio in terms of spread levels and spread changes, respectively. Finally, we perform multiple regression analysis using backward selection, forward selection, and stepwise regression to determine whether multiple indices should be used in hedging; this analysis again identifies CDX.HY and CDX.IG as the best candidates to hedge the sample bespoke portfolio.
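
A minimal sketch of the regression-beta calculation described in Chapter Two, assuming only that the hedge ratio is the OLS slope of bespoke spreads (or spread changes) on index spreads; the implied, up, and down betas discussed above require additional inputs not shown here, and the function name is illustrative:

```python
import numpy as np

def regression_hedge_beta(bespoke_spreads, index_spreads, use_changes=True):
    """Hedge ratio as the OLS slope of bespoke spreads (or spread changes)
    on index spreads: beta = cov(index, bespoke) / var(index)."""
    y = np.asarray(bespoke_spreads, dtype=float)
    x = np.asarray(index_spreads, dtype=float)
    if use_changes:
        y, x = np.diff(y), np.diff(x)       # daily spread changes
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# To hedge a long bespoke notional N, one would short roughly beta * N of the index.
```

Computing this slope over different windows (quarterly vs. yearly) and on levels vs. changes yields several of the candidate betas whose stability is compared above.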
278

Statistical Algorithms for Optimal Experimental Design with Correlated Observations

Li, Chang 01 May 2013 (has links)
This research has three parts with different, although related, objectives. The first part develops an efficient, modified simulated annealing algorithm to solve the D-optimal (determinant maximization) design problem for 2-way polynomial regression with correlated observations. Much of the previous work on D-optimal design for regression models with correlated errors focused on polynomial models with a single predictor variable, in large part because of the intractability of an analytic solution. In this research, we present an improved simulated annealing algorithm, providing practical approaches to specifying the annealing cooling parameters, thresholds, and search neighborhoods for the perturbation scheme, which finds approximate D-optimal designs for 2-way polynomial regression for a variety of specific correlation structures with a given correlation coefficient. Results in each correlated-errors case are compared with the best design selected from the class of designs known to be D-optimal in the uncorrelated case: the annealing results had generally higher D-efficiency than the best comparison design, especially when the correlation parameter was well away from 0. The second objective, using Balanced Incomplete Block Designs (BIBDs), was to construct weakly universal optimal block designs for the nearest-neighbor correlation structure and multiple block sizes, for the hub correlation structure with any block size, and for circulant correlation with odd block size. We also constructed approximately weakly universal optimal block designs for block-structured correlation. Lastly, we developed an improved Particle Swarm Optimization (PSO) algorithm with time-varying parameters and used it to solve the D-optimal design problem for linear regression. Building on that improved algorithm, we combined the nonlinear regression problem with decision making and developed a nested PSO algorithm that finds (nearly) optimal experimental designs under each of the pessimistic, index-of-optimism, and regret criteria for the Michaelis-Menten model and the logistic regression model.
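
A bare-bones simulated annealing sketch for the D-optimal criterion with correlated errors, maximizing log det(X' Sigma^{-1} X). The cooling schedule, neighborhood, and the helper `model_matrix` (which expands chosen design points into polynomial model-matrix rows) are illustrative placeholders, not the dissertation's tuned algorithm:

```python
import numpy as np

def d_criterion(X, Sigma_inv):
    """log-determinant of the information matrix X' Sigma^{-1} X (maximize)."""
    sign, logdet = np.linalg.slogdet(X.T @ Sigma_inv @ X)
    return logdet if sign > 0 else -np.inf

def anneal_design(candidates, n_runs, Sigma_inv, model_matrix,
                  T0=1.0, cool=0.995, iters=5000, rng=np.random.default_rng(0)):
    """Perturb one design point at a time; accept worse moves with
    probability exp(delta / T) under a geometric cooling schedule."""
    idx = rng.choice(len(candidates), n_runs, replace=True)
    cur = best = d_criterion(model_matrix(candidates[idx]), Sigma_inv)
    best_idx, T = idx.copy(), T0
    for _ in range(iters):
        prop = idx.copy()
        prop[rng.integers(n_runs)] = rng.integers(len(candidates))   # perturb one run
        val = d_criterion(model_matrix(candidates[prop]), Sigma_inv)
        if val > cur or rng.random() < np.exp((val - cur) / T):
            idx, cur = prop, val
            if cur > best:
                best, best_idx = cur, idx.copy()
        T *= cool
    return candidates[best_idx], best
```

Here `candidates` is the grid of admissible design points, `n_runs` is the design size, and `Sigma_inv` is the inverse of the assumed n_runs-by-n_runs error correlation matrix.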
279

Exact Approaches for Bias Detection and Avoidance with Small, Sparse, or Correlated Categorical Data

Schwartz, Sarah E. 01 December 2017 (has links)
Every day, traditional statistical methods are used worldwide to study a variety of topics and provide insight into countless subjects. Each technique is based on a distinct set of assumptions to ensure valid results. Additionally, many statistical approaches rely on large-sample behavior and may collapse or degenerate in the presence of small, sparse, or correlated data. This dissertation details several advancements to detect these conditions, avoid their consequences, and analyze data in a different way to yield trustworthy results. One of the most commonly used modeling techniques for outcomes with only two possible categorical values (e.g., live/die, pass/fail, better/worse, etc.) is logistic regression. While some potential complications of this approach are widely known, many investigators are unaware that their particular data do not meet the foundational assumptions, since these are not easy to verify. We have developed a routine for determining whether a researcher should be concerned about potential bias in logistic regression results, so that they can take steps to mitigate the bias or use a different procedure altogether to model the data. Correlated data may arise in common situations such as multi-site medical studies, research on family units, or investigations of student achievement within classrooms. In these circumstances, the associations between cluster members must be included in any statistical analysis testing the hypothesis of a connection between two variables in order for the results to be valid. Previously, investigators had to choose between a method intended for small or sparse data that assumes independence between observations and a method that allows for correlation between observations but requires large samples to be reliable. We present a new method that allows small, clustered samples to be assessed for a relationship between a two-level predictor (e.g., treatment/control) and a categorical outcome (e.g., low/medium/high).
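
As a hedged illustration of the kind of screening the bias-detection routine performs (these heuristics are commonly used rules of thumb, not the dissertation's exact procedure, and the function name is hypothetical): flag low events-per-variable and coefficient blow-up in a nearly unpenalized fit, which often signals (quasi-)complete separation and hence biased or degenerate maximum likelihood estimates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bias_risk_flags(X, y, min_epv=10, big_coef=10.0):
    """Illustrative small-sample screening for a binary-outcome logistic fit:
    events-per-variable check plus a crude separation check."""
    n_events = int(min(np.sum(y == 1), np.sum(y == 0)))       # count of the rarer outcome
    epv = n_events / X.shape[1]
    fit = LogisticRegression(C=1e6, max_iter=5000).fit(X, y)  # effectively unpenalized
    return {
        "events_per_variable": epv,
        "low_epv": epv < min_epv,
        "possible_separation": bool(np.any(np.abs(fit.coef_) > big_coef)),
    }
```

When either flag is raised, an exact or penalized method would typically be preferred over the ordinary large-sample fit.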
280

Existence and Multiplicity Results on Standing Wave Solutions of Some Coupled Nonlinear Schrodinger Equations

Tian, Rushun 01 May 2013 (has links)
Coupled nonlinear Schrodinger equations (CNLS) govern many physical phenomena, such as nonlinear optics and Bose-Einstein condensates. Because of their wide applications, many studies have been carried out by physicists, mathematicians, and engineers from different perspectives. In this dissertation, we focus on standing wave solutions, which are of particular interest for their relatively simple form and the important roles they play in the study of other wave solutions. We study the multiplicity of this type of solution of CNLS via variational methods and bifurcation methods. Variational methods are useful tools for studying differential equations and systems of differential equations that possess the so-called variational structure. For such an equation or system, a weak solution can be found by finding a critical point of a corresponding energy functional. If the equation or system is also invariant under a certain symmetric group, multiple solutions are often expected. In this work, an integer-valued function that measures the symmetries of CNLS is used to determine critical values. Besides variational methods, bifurcation methods may also be used to find solutions of a differential equation or system, provided some trivial solution branch exists and the system is degenerate somewhere on this branch. If local bifurcations exist, then new solutions can be found in a neighborhood of each bifurcation point. If global bifurcation branches exist, then there is a continuous solution branch emanating from each bifurcation point. We consider two types of CNLS. First, for a fully symmetric system, we introduce a new index and use it to construct a sequence of critical energy levels; using variational methods and the symmetric structure, we prove that there is at least one solution at each of these critical energy levels. Second, we study the bifurcation phenomena of a two-equation asymmetric system. All of these bifurcations take place with respect to a positive solution branch that is already known, and the locations of the bifurcation points are determined through an equation in a coupling parameter. A few nonexistence results for positive solutions are also given.
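
For concreteness, one standard cubic CNLS form and its standing-wave reduction (the dissertation's precise system and coupling coefficients may differ; the notation is ours): substituting the standing-wave ansatz turns the time-dependent system into an elliptic system whose solutions are the standing waves studied by the variational and bifurcation methods above.

```latex
i\,\partial_t \Phi_j + \Delta \Phi_j
  + \Bigl(\sum_{k=1}^{m} \beta_{jk}\,|\Phi_k|^{2}\Bigr)\Phi_j = 0,
\qquad \Phi_j(x,t) = e^{i\lambda_j t}\, u_j(x)
\;\;\Longrightarrow\;\;
-\Delta u_j + \lambda_j u_j \;=\; \sum_{k=1}^{m} \beta_{jk}\, u_k^{2}\, u_j,
\qquad j = 1,\dots,m.
```

The fully symmetric case corresponds to equal lambda_j and symmetric couplings beta_{jk}, while the two-equation asymmetric case (m = 2) is where the bifurcation analysis in the coupling parameter takes place.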
