51 |
A Study of the Asymptotic Properties of Lasso Estimates for Correlated Data
In this thesis we investigate post-model selection properties of L1 penalized weighted least squares estimators in regression models with a large number of variables M and correlated errors. We focus on correct subset selection and on the asymptotic distribution of the penalized estimators. In the simple case of AR(1) errors we give conditions under which correct subset selection can be achieved via our procedure. We then provide a detailed generalization of this result to models with errors that have a weak-dependency structure (Doukhan 1996). In all cases, the number M of regression variables is allowed to exceed the sample size n. We further investigate the asymptotic distribution of our estimates, when M < n, and show that under appropriate choices of the tuning parameters the limiting distribution is multivariate normal. This generalizes to the case of correlated errors the result of Knight and Fu (2000), obtained for regression models with independent errors. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of
Doctor of Philosophy. / Summer Semester, 2009. / May 1, 2009. / Lasso, Correlated Data, Asymptotic / Includes bibliographical references. / Florentina Bunea, Professor Directing Dissertation; Joshua Gert, Outside Committee Member; Myles Hollander, Committee Member; Marten Wegkamp, Committee Member.
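The idea of L1 penalized weighted least squares under AR(1) errors can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of the thesis: the AR(1) coefficient `rho` is treated as known, the weighting is done with the standard Prais-Winsten whitening transform, and the penalized fit uses plain coordinate descent. The function names `ar1_whiten`, `soft_threshold` and `lasso_cd` and the tuning parameter `lam` are hypothetical.

```python
import numpy as np

def ar1_whiten(a, rho):
    """Prais-Winsten transform of a vector or matrix (rows = time points).

    If the regression errors are AR(1) with coefficient rho, the transformed
    errors are uncorrelated with constant variance. rho is assumed known here.
    """
    a = np.asarray(a, dtype=float)
    out = np.empty_like(a)
    out[0] = np.sqrt(1.0 - rho ** 2) * a[0]
    out[1:] = a[1:] - rho * a[:-1]
    return out

def soft_threshold(z, t):
    """Soft-thresholding operator, the closed-form one-dimensional lasso step."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                      # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]       # remove coordinate j from the fit
            z = X[:, j] @ r / n
            b[j] = soft_threshold(z, lam) / col_sq[j]
            r -= X[:, j] * b[j]       # put the updated coordinate back
    return b
```

A call such as `lasso_cd(ar1_whiten(X, rho), ar1_whiten(y, rho), lam)` then returns a sparse coefficient vector whose nonzero positions give the selected subset.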
|
52 |
Adaptive Series Estimators for Copula Densities
In this thesis, based on an orthonormal series expansion, we propose a new nonparametric method to estimate copula density functions. Since the basis coefficients turn out to be expectations, empirical averages are used to estimate these coefficients. We propose estimators of the variance of the estimated basis coefficients and establish their consistency. We derive the asymptotic distribution of the estimated coefficients under mild conditions. We derive a simple oracle inequality for the copula density estimator based on a finite series using the estimated coefficients. We propose a stopping rule for selecting the number of coefficients used in the series and we prove that this rule minimizes the mean integrated squared error. In addition, we consider hard and soft thresholding techniques for sparse representations. We obtain oracle inequalities that hold with prescribed probability for various norms of the difference between the copula density and our threshold series density estimator. Uniform confidence bands are derived as well. The oracle inequalities clearly reveal that our estimator adapts to the unknown degree of sparsity of the series representation of the copula density. A simulation study indicates that our method is extremely easy to implement and works very well, and it compares favorably to the popular kernel based copula density estimator, especially around the boundary points, in terms of mean squared error. Finally, we have applied our method to an insurance dataset. After comparing our method with the previous data analyses, we reach the same conclusion as the parametric methods in the literature and as such we provide additional justification for the use of the developed parametric model. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of
Doctor of Philosophy. / Summer Semester, 2009. / April 30, 2009. / Copula, Nonparametric Estimation, Copula Density / Includes bibliographical references. / Marten Wegkamp, Professor Directing Dissertation; Robert A. van Engelen, Outside Committee Member; Xufeng Niu, Committee Member; Fred Huffer, Committee Member.
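The statement that the basis coefficients are expectations, and can therefore be estimated by empirical averages, can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the abstract does not name a basis, so a cosine basis on [0, 1] is used, the observations are replaced by rescaled ranks, and the hard threshold is a fixed user-supplied constant rather than a data-driven rule. All function names are hypothetical.

```python
import numpy as np

def pseudo_obs(x):
    """Rescaled ranks in (0, 1); ties are not handled in this sketch."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (len(x) + 1.0)

def cosine_basis(u, J):
    """(n, J) matrix of the orthonormal cosine basis on [0, 1] (assumed basis)."""
    u = np.asarray(u, dtype=float)
    j = np.arange(J)
    phi = np.sqrt(2.0) * np.cos(np.pi * j[None, :] * u[:, None])
    phi[:, 0] = 1.0                   # the constant function has norm one already
    return phi

def series_coefficients(u, v, J, threshold=0.0):
    """Empirical averages of phi_j(U) * phi_k(V), followed by hard thresholding."""
    pu, pv = cosine_basis(u, J), cosine_basis(v, J)
    theta = pu.T @ pv / len(pu)       # theta[j, k] estimates E[phi_j(U) phi_k(V)]
    kept = np.where(np.abs(theta) >= threshold, theta, 0.0)
    kept[0, 0] = theta[0, 0]          # always keep the constant term
    return kept

def copula_density(theta, u, v):
    """Evaluate the finite series sum_jk theta[j, k] phi_j(u) phi_k(v) pointwise."""
    pu = cosine_basis(u, theta.shape[0])
    pv = cosine_basis(v, theta.shape[1])
    return np.einsum("ij,jk,ik->i", pu, theta, pv)
```

With a very large threshold only the constant term survives and the estimate reduces to the independence copula density, which is identically one.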
|
53 |
Ultrafast Lattice Dynamics in Metal Thin Films and Nano-Particles
This thesis presents the development of the 3rd generation femtosecond diffractometer (FED) in Professor Jim Cao's group and its application to the study of ultrafast structural dynamics in solid state materials. The 3rd generation FED improves on its predecessor and on similar FED instruments through a DC electron gun that can generate much higher energy electron pulses and a more efficient imaging system. This combination, together with miscellaneous improvements, significantly boosts the signal-to-noise ratio and thus enables us to study more complex solid state materials. Two main thrusts are discussed in detail in this thesis. The first is the dynamics of coherent phonon generation by ultrafast heating in gold thin films and nano-particles, which emphasizes the electronic thermal stress. The other is the ultrafast dynamics in nickel, which shows that the mutual interactions among the lattice, spin and electron sub-systems can significantly alter the ultrafast lattice dynamics. In these studies, we exploit the advantage of the FED instrument as an ideal tool that can directly and simultaneously monitor the coherent and random motion of the lattice. / A Dissertation Submitted to the Department of Physics in Partial Fulfillment of the
Requirements for the Degree of Ph.D. / Fall Semester, 2010. / September 23, 2010. / Ultrafast, Lattice dynamics, UED, Time-resolved / Includes bibliographical references. / Jim Cao, Professor Directing Dissertation; Wei Yang, University Representative; Nicholas Bonesteel, Committee Member; Mark Riley, Committee Member; Peng Xiong, Committee Member.
|
54 |
Transformation Models for Survival Data Analysis and Applications
It is often assumed that all uncensored subjects will eventually experience the event of interest in standard survival models. However, in some situations when the event considered is not death, it will never occur for a proportion of subjects. Survival models with a cure fraction are becoming popular in analyzing this type of study. We propose a generalized transformation model motivated by Zeng et al.'s (2006) transformed proportional time cure model. In our proposed model, fractional polynomials are used instead of the simple linear combination of the covariates. The proposed models give us more flexibility without losing any good properties of the original model, such as asymptotic consistency and asymptotic normality of the regression coefficients. The proposed model will better fit data in which the relationship between a response variable and covariates is non-linear. We also provide a power selection procedure based on the likelihood function. A simulation study is carried out to show the accuracy of the proposed power selection procedure. The proposed models are applied to coronary heart disease and cancer related medical data from both observational cohort studies and clinical trials. / A Dissertation Submitted to the Department of Statistics in Partial Fulfillment of
the Requirements for the Degree of Doctor of Philosophy. / Spring Semester, 2009. / March 23, 2009. / Cure Rate Models, Survival Models / Includes bibliographical references. / Xu-Feng Niu, Professor Directing Dissertation; Donald Lloyd, Outside Committee Member; Dan McGee, Committee Member; Debajyoti Sinha, Committee Member.
|
55 |
Multistate Intensity Model with AR-GARCH Random Effect for Corporate Credit Rating Transition Analysis
This thesis presents a stochastic process and time series study on corporate credit rating and market implied rating transitions. By extending an existing model, this thesis incorporates generalized autoregressive conditional heteroscedastic (GARCH) random effects to capture volatility changes in the instantaneous transition rates. The GARCH model is a crucial part of financial research, since its ability to model volatility changes gives market practitioners the flexibility to build more accurate models on high frequency financial data. Corporate rating transition modeling has historically dealt with low frequency data, for which there was no need to specify the volatility. However, the newly published Moody's market implied ratings exhibit much higher transition frequencies. Therefore, we feel that it is necessary to capture the volatility component and to extend existing models to reflect this fact. The theoretical model specification and estimation details are discussed thoroughly in this dissertation. The performance of our models is studied on several simulated data sets and compared to the original model. Finally, the models are applied to both Moody's issuer rating and market implied rating transition data. / A Dissertation Submitted to the Department of Statistics in Partial Fulfillment of
the Requirements for the Degree of Doctor of Philosophy. / Fall Semester, 2010. / October 19, 2010. / Rating Transition Analysis / Includes bibliographical references. / Xufeng Niu, Professor Co-Directing Dissertation; Fred Huffer, Professor Co-Directing Dissertation; Alec Kercheval, Outside Committee Member; Wei Wu, Committee Member.
|
56 |
The Effect of Risk Factors on Coronary Heart Disease: An Age-Relevant Multivariate Meta Analysis
The importance of major risk factors, such as hypertension, total cholesterol, body mass index, diabetes and smoking, for predicting incidence and mortality of Coronary Heart Disease (CHD) is well known. In light of the fact that age is also a major risk factor for CHD death, a natural question is whether the risk effects on CHD change with age. This thesis focuses on examining the interaction between age and risk factors using data from multiple studies containing differing age ranges. The aim of my research is to use statistical methods to determine whether we can combine these diverse results to obtain an overall summary, from which one can find how the risk effects on CHD death change with age. One intuitive approach is to use classical meta analysis based on generalized linear models. More specifically, one can fit a logistic model with CHD death as response and age, a risk factor and their interaction as covariates for each of the studies, and conduct meta analysis on every set of three coefficients in the multivariate setting to obtain 'synthesized' coefficients. Another aspect of the thesis is a new method, meta analysis with respect to curves, that goes beyond linear models. The basic idea is that one can choose the same spline with the same knots on covariates, say age and systolic blood pressure (SBP), for all the studies to ensure common basis functions. The knot-based tensor product basis coefficients obtained from penalized logistic regression can be used for multivariate meta analysis. Using the common basis functions and the 'synthesized' knot-based basis coefficients from meta analysis, a two-dimensional smooth surface on the age-SBP domain is estimated. By cutting through the smooth surface along the two axes, the resulting slices show how the risk effect on CHD death changes at an arbitrary age as well as how the age effect on CHD death changes at an arbitrary SBP value. The application to multiple studies will be presented.
/ A Dissertation Submitted to the Department of Statistics in Partial Fulfillment of
the Requirements for the Degree of Doctor of Philosophy. / Fall Semester, 2010. / August 18, 2010. / Meta Analysis, Coronary Heart Disease / Includes bibliographical references. / Dan McGee, Professor Co-Directing Dissertation; Yiyuan She, Professor Co-Directing Dissertation; Ike Eberstein, University Representative; Xufeng Niu, Committee Member.
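The step of combining per-study coefficient vectors into 'synthesized' coefficients can be sketched with inverse-variance (generalized least squares) pooling. This is an assumption made for illustration: the abstract does not say whether a fixed-effect or random-effects model is used, and the sketch below is the fixed-effect version only. It takes the per-study estimates and their covariance matrices as given (for example, from separately fitted logistic models) and the function name `pool_fixed_effect` is hypothetical.

```python
import numpy as np

def pool_fixed_effect(betas, covs):
    """Fixed-effect multivariate meta-analysis by inverse-variance weighting.

    betas: list of per-study coefficient vectors (e.g. age, risk factor, interaction)
    covs:  list of the corresponding estimated covariance matrices
    Returns the pooled coefficient vector and its covariance matrix.
    """
    precisions = [np.linalg.inv(np.asarray(V, dtype=float)) for V in covs]
    pooled_cov = np.linalg.inv(sum(precisions))
    weighted = sum(P @ np.asarray(b, dtype=float) for P, b in zip(precisions, betas))
    return pooled_cov @ weighted, pooled_cov
```

Studies with more precise estimates receive more weight, and because full covariance matrices are used, the correlation among the three coefficients within a study is carried into the pooled estimate.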
|
57 |
Flexible Additive Risk Models Using Piecewise Constant Hazard Functions
We study a weighted least squares (WLS) estimator for Aalen's additive risk model which allows for a very flexible handling of covariates. We divide the follow-up period into intervals and assume a constant hazard rate in each interval. The model is motivated as a piecewise approximation of a hazard function composed of three parts: arbitrary nonparametric functions for some covariate effects, smoothly varying functions for others, and known (or constant) functions for yet others. The proposed estimator is an extension of the grouped data version of the Huffer-McKeague estimator (1991). Our estimator may also be regarded as a piecewise constant analog of the semiparametric estimates of McKeague & Sasieni (1994), and Lin & Ying (1994). By using a fairly large number of intervals, we should get an essentially semiparametric model similar to the McKeague-Sasieni and Lin-Ying approaches. For our model, since the number of parameters is finite (although large), conventional approaches (such as maximum likelihood) are easy to formulate and implement. The approach is illustrated by simulations, and is applied to data from the Framingham Heart Study. / A Dissertation Submitted to the Department of Statistics in Partial Fulfillment of
the Requirements for the Degree of Doctor of Philosophy. / Fall Semester, 2007. / August 10, 2007. / Risk, Additive, Hazard / Includes bibliographical references. / Fred W. Huffer, Professor Directing Dissertation; Alec Kercheval, Outside Committee Member; Dan McGee, Committee Member; Xufeng Niu, Committee Member.
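The basic device of dividing the follow-up period into intervals with a constant hazard in each can be shown in its simplest form, without covariates. This is not the WLS estimator for the additive risk model studied in the thesis; it is the standard occurrence/exposure rate (events divided by person-time at risk) for each interval, given here only to make the piecewise constant assumption concrete. The function name and argument layout are hypothetical.

```python
import numpy as np

def piecewise_constant_hazard(time, event, cuts):
    """Occurrence/exposure hazard estimate on the intervals (cuts[k], cuts[k+1]].

    time:  follow-up time of each subject
    event: 1 if the event was observed at `time`, 0 if censored
    cuts:  increasing interval boundaries, starting at 0
    Intervals with no person-time at risk would give a division by zero.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    cuts = np.asarray(cuts, dtype=float)
    lo, hi = cuts[:-1], cuts[1:]
    # Time each subject spends at risk inside each interval, shape (n, k).
    exposure = np.clip(time[:, None], lo, hi) - lo
    # Events are assigned to the interval containing the event time.
    in_interval = (time[:, None] > lo) & (time[:, None] <= hi)
    events = (in_interval & event[:, None]).sum(axis=0)
    return events / exposure.sum(axis=0)
```

With many narrow intervals the resulting step function approaches a nonparametric hazard estimate, which mirrors the remark in the abstract that a large number of intervals gives an essentially semiparametric model.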
|
58 |
Association Models for Clustered Data with Binary and Continuous Responses
This dissertation develops novel single random effect models as well as a bivariate correlated random effects model for clustered data with bivariate mixed responses. Logit and identity link functions are used for the binary and continuous responses. For ease of interpretation of the regression effects, the random effect of the binary response has a bridge distribution, so that the marginal model of the mean of the binary response, after integrating out the random effect, preserves the logistic form, and the marginal regression function of the continuous response preserves the linear form. Within-cluster and within-subject associations can be measured by our proposed models. For the bivariate correlated random effects model, we illustrate how different levels of association between the two random effects induce different Kendall's tau values for the association between the binary and continuous responses from the same cluster. Fully parametric and semiparametric Bayesian methods as well as the maximum likelihood method are illustrated for model analysis. In the semiparametric Bayesian model, the normality assumption on the regression error for the continuous response is relaxed by using a nonparametric Dirichlet Process prior. Robustness of the bivariate correlated random effects model using the ML method to misspecification of the regression function as well as of the random effect distribution is investigated by simulation studies. The Bayesian and likelihood methods are applied to a developmental toxicity study of ethylene glycol in mice. / A Dissertation Submitted to the Department of Statistics in Partial Fulfillment of
the Requirements for the Degree of Doctor of Philosophy. / Spring Semester, 2009. / April 8, 2009. / Dirichlet Process Prior, Bivariate Binary And Continuous Responses, Copula Model, Bridge Distribution, Bayesian Analysis, MCMC / Includes bibliographical references. / Debajyoti Sinha, Professor Directing Dissertation; Myra Hurt, Outside Committee Member; Stuart R. Lipsitz, Committee Member; Daniel McGee, Committee Member.
|
59 |
A Method for Finding the Nadir of Non-Monotonic Relationships
Different methods have been proposed to model the J-shaped or U-shaped relationship between a risk factor and mortality so that the optimal risk-factor value (nadir) associated with the lowest mortality can be estimated. The basic model considered is the Cox Proportional Hazards model. Current methods include a quadratic method, a method with transformation, fractional polynomials, a change point method and fixed-knot spline regression. The quadratic method contains both the linear and the quadratic term of the risk factor; it is simple but often generates unrealistic nadir estimates. The transformation method converts the original risk factor so that after transformation it has a Normal distribution, but this may not work when there is no good transformation to normality. Fractional polynomials are an extended class of regular polynomials that apply negative and fractional powers to the risk factor. Compared with the quadratic method or the transformation method, this approach does not always have a good model interpretation, and inferences about it do not incorporate the uncertainty coming from pre-selection of powers and degree. The change point method models the prognostic index using two pieces of upward quadratic functions that meet at their common nadir. This method assumes the knot and the nadir are the same, which is not always true. Fixed-knot spline regression has also been used to model non-linear prognostic indices, but its inference does not account for variation arising from knot selection. Here we consider spline regressions with free knots, a natural generalization of the quadratic, the change point and the fixed-knot spline methods. They can be applied to risk factors that do not have a good transformation to normality while keeping intuitive model interpretations. Asymptotic normality and consistency of the maximum partial likelihood estimators are established under a certain condition.
When the condition is not satisfied, simulations are used to explore asymptotic properties. The new method is motivated by and applied to nadir estimation in non-monotonic relationships between BMI (body mass index) and all-cause mortality. Its performance is compared with that of existing methods, adopting criteria of nadir estimation ability and goodness of fit. / A Dissertation Submitted to the Department of Statistics in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy. / Fall Semester, 2007. / November 8, 2007. / nadir estimation, polynomial spline, free-knot, survival analysis, non-linear regression, Cox model / Includes bibliographical references. / Daniel McGee, Professor Directing Dissertation; Donald Lloyd, Outside Committee Member; Fred Huffer, Committee Member; Xufeng Niu, Committee Member; Gareth Dutton, Committee Member.
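For the quadratic method mentioned in the abstract, the nadir has a closed form. The notation below is assumed for illustration and does not come from the thesis: x is the risk factor, and the prognostic index of the Cox model contains a linear term with coefficient \beta_1 and a quadratic term with coefficient \beta_2.

```latex
\[
  \eta(x) = \beta_1 x + \beta_2 x^2, \qquad
  \eta'(x) = \beta_1 + 2\beta_2 x = 0
  \;\Longrightarrow\;
  x^{*} = -\frac{\beta_1}{2\beta_2}, \qquad \beta_2 > 0 .
\]
```

The condition \beta_2 > 0 makes the index U-shaped, so that x^{*} is a minimum. Because x^{*} is a ratio of estimated coefficients, it becomes very sensitive when the estimate of \beta_2 is close to zero.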
|
60 |
A Class of Mixed-Distribution Models with Applications in Financial Data Analysis
Statisticians often encounter data in the form of a combination of discrete and continuous outcomes. A special case is zero-inflated longitudinal data, where the response variable has a large portion of zeros. These data exhibit correlation because observations are obtained on the same subjects over time. In this dissertation, we propose a two-part mixed distribution model for zero-inflated longitudinal data. The first part of the model is a logistic regression model for the probability of a nonzero response; the other part is a linear model for the mean response given that the outcome is not zero. Random effects with AR(1) covariance structure are introduced into both parts of the model to allow for serial correlation and subject-specific effects. Estimating the two-part model is challenging because of the high dimensional integration necessary to obtain the maximum likelihood estimates. We propose a Monte Carlo EM algorithm for computing the maximum likelihood estimates of the parameters. Through a simulation study, we demonstrate the good performance of the MCEM method in parameter and standard error estimation. To illustrate, we apply the two-part model with correlated random effects and the model with autoregressive random effects to executive compensation data to investigate potential determinants of CEO stock option grants. / A Dissertation Submitted to the Department of Statistics in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy. / Spring Semester, 2011. / March 16, 2011. / MCEM Algorithm, Mixed-Distribution Models, CEO Compensation / Includes bibliographical references. / Xufeng Niu, Professor Directing Dissertation; Yingmei Cheng, University Representative; Wei Wu, Committee Member; Fred Huffer, Committee Member.
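The structure of the two-part model can be made concrete with a sketch of its log-likelihood in a deliberately simplified setting. The assumptions are substantial and are not those of the thesis: the random effects (and hence the AR(1) covariance structure and the Monte Carlo EM algorithm) are left out entirely, the same design matrix `X` is used in both parts, and the nonzero responses are taken to be normal with standard deviation `sigma`. The function name and parameter names are hypothetical.

```python
import numpy as np

def two_part_loglik(y, X, gamma, beta, sigma):
    """Log-likelihood of a two-part model WITHOUT random effects.

    Part 1: logistic regression for P(y != 0), with coefficients gamma.
    Part 2: normal linear model for y given y != 0, with coefficients beta
            and error standard deviation sigma.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    nonzero = y != 0
    eta = X @ gamma
    # log P(nonzero) = -log(1 + exp(-eta)) and log P(zero) = -log(1 + exp(eta)),
    # written with logaddexp for numerical stability.
    log_p1 = -np.logaddexp(0.0, -eta)
    log_p0 = -np.logaddexp(0.0, eta)
    binary_part = np.where(nonzero, log_p1, log_p0).sum()
    resid = y[nonzero] - X[nonzero] @ beta
    continuous_part = (
        -0.5 * np.log(2.0 * np.pi * sigma ** 2) - 0.5 * (resid / sigma) ** 2
    ).sum()
    return binary_part + continuous_part
```

Without random effects the two parts separate and can be maximized independently. Once shared or correlated random effects enter both parts, the likelihood involves an integral over them, which is the high dimensional integration the abstract refers to.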
|