41. Bayesian Portfolio Optimization with Time-Varying Factor Models
We develop a modeling framework to simultaneously evaluate various types of predictability in stock returns, including stocks' sensitivity ("betas") to systematic risk factors, stocks' abnormal returns unexplained by risk factors ("alphas"), and returns of risk factors in excess of the risk-free rate ("risk premia"). Both firm-level characteristics and macroeconomic variables are used to predict stocks' time-varying alphas and betas, and macroeconomic variables are used to predict the risk premia. All of the models are specified in a Bayesian framework to account for estimation risk, and informative prior distributions on both stock returns and model parameters are adopted to reduce estimation error. To gauge the economic significance of the predictability, we apply the models to the U.S. stock market and construct optimal portfolios based on model predictions. Out-of-sample performance of the portfolios is evaluated to compare the models. The empirical results confirm predictability from all of the sources considered in our model: (1) The equity risk premium is time-varying and predictable using macroeconomic variables; (2) Stocks' alphas and betas differ cross-sectionally and are predictable using firm-level characteristics; and (3) Stocks' alphas and betas are also time-varying and predictable using macroeconomic variables. Comparison of different sub-periods shows that the predictability of stocks' betas is persistent over time, but the predictability of stocks' alphas and the risk premium has diminished to some extent. The empirical results also suggest that Bayesian statistical techniques, especially the use of informative prior distributions, help reduce model estimation error and result in portfolios that outperform the passive indexing strategy. The findings are robust in the presence of transaction costs. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of
Doctor of Philosophy. / Spring Semester, 2011. / February 11, 2011. / Stock Return Predictability, Bayesian Portfolio Optimization / Includes bibliographical references. / Xufeng Niu, Professor Directing Dissertation; Yingmei Cheng, University Representative; Fred W. Huffer, Committee Member; Jinfeng Zhang, Committee Member.
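As an editorial aside, a minimal Python sketch of the core idea that an informative prior shrinks noisy sample mean returns before they enter mean-variance portfolio weights. This is not the dissertation's factor model; the asset count, prior values, and returns below are hypothetical and simulated.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical monthly excess returns for 3 assets (T x N); purely simulated.
    T, N = 60, 3
    true_mu = np.array([0.004, 0.006, 0.008])
    returns = rng.normal(true_mu, 0.05, size=(T, N))

    sample_mu = returns.mean(axis=0)
    sigma2 = returns.var(axis=0, ddof=1) / T          # variance of each sample mean

    # Conjugate normal prior on each asset's mean return, centered on a common
    # "market" value -- an informative prior that pulls extreme estimates inward.
    prior_mu, prior_var = 0.005, 0.002 ** 2
    post_var = 1.0 / (1.0 / prior_var + 1.0 / sigma2)
    post_mu = post_var * (prior_mu / prior_var + sample_mu / sigma2)

    # Plug posterior means into standard mean-variance (tangency-style) weights.
    cov = np.cov(returns, rowvar=False)
    raw_w = np.linalg.solve(cov, post_mu)
    weights = raw_w / raw_w.sum()
    print("sample means:   ", np.round(sample_mu, 4))
    print("posterior means:", np.round(post_mu, 4))
    print("weights:        ", np.round(weights, 3))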
42. Goodness-of-Fit Tests for Logistic Regression
The generalized linear model, and particularly the logistic model, is widely used in public health, medicine, and epidemiology. Goodness-of-fit tests for these models are popularly used to describe how well a proposed model fits a set of observations. These goodness-of-fit tests each have individual advantages and disadvantages. In this thesis, we mainly consider the performance of the Hosmer-Lemeshow test, Pearson's chi-square test, the unweighted sum of squares test, and the cumulative residual test. We compare their performance in a series of empirical studies as well as in particular simulation scenarios. We conclude that the unweighted sum of squares test and the cumulative residual test give better overall performance than the other two. We also conclude that the commonly suggested practice of treating a p-value less than 0.15 as an indication of lack of fit during the initial steps of model diagnostics should be adopted. Additionally, D'Agostino et al. presented the relationship between stacked logistic regression and the Cox regression model in the Framingham Heart Study, so in future work we will examine the feasibility of adapting these goodness-of-fit tests to the Cox proportional hazards model using stacked logistic regression. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of
Doctor of Philosophy. / Fall Semester, 2010. / August 19, 2010. / Generalized Linear Model, Stacked Logistic Regression, Goodness-of-fit Tests, Logistic Regression / Includes bibliographical references. / Dan L. McGee, Professor Co-Directing Dissertation; Jinfeng Zhang, Professor Co-Directing Dissertation; Myra Hurt, University Representative; Debajyoti Sinha, Committee Member.
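For readers unfamiliar with the tests being compared, a minimal Python sketch of the Hosmer-Lemeshow statistic on simulated data, using the standard decile-of-risk construction with a chi-square reference on G − 2 degrees of freedom. This is an illustration, not code from the thesis.

    import numpy as np
    from scipy.stats import chi2

    def hosmer_lemeshow(y, p, groups=10):
        """Hosmer-Lemeshow goodness-of-fit statistic and p-value.

        y : 0/1 outcomes, p : fitted probabilities from the logistic model.
        """
        order = np.argsort(p)
        y, p = np.asarray(y)[order], np.asarray(p)[order]
        bins = np.array_split(np.arange(len(y)), groups)   # deciles of risk
        stat = 0.0
        for idx in bins:
            n_g = len(idx)
            obs = y[idx].sum()            # observed events in the group
            exp = p[idx].sum()            # expected events under the model
            pbar = exp / n_g
            stat += (obs - exp) ** 2 / (n_g * pbar * (1.0 - pbar))
        df = groups - 2
        return stat, chi2.sf(stat, df)

    # Tiny simulated example: a correctly specified logistic model.
    rng = np.random.default_rng(1)
    x = rng.normal(size=2000)
    p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x)))
    y = rng.binomial(1, p_true)
    print(hosmer_lemeshow(y, p_true))     # p-value should usually be large here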
43. Analysis of Multivariate Data with Random Cluster Size
In this dissertation, we examine correlated binary data in which a cluster component may be present or absent, or in which missing data are related to the binary responses of interest. Depending on the data structure, correlated binary data can be referred to as clustered data when the sampling unit is a cluster of subjects, or as longitudinal data when they involve repeated measurements of the same subject over time. We propose novel models for these two data structures and illustrate them with real data applications. In biomedical studies involving clustered binary responses, the cluster size can vary because some components of the cluster can be absent. When both the presence of a cluster component and the binary disease status of a present component are treated as responses of interest, we propose a novel two-stage random effects logistic regression framework. For ease of interpretation of regression effects, both the marginal probability of presence/absence of a component and the conditional probability of disease status of a present component preserve approximate logistic regression forms. We present a maximum likelihood method of estimation implementable using standard statistical software. We compare our models and the physical interpretation of regression effects with competing methods from the literature. We also present a simulation study to assess the robustness of our procedure to wrong specification of the random effects distribution and to compare the finite sample performance of estimates with existing methods. The methodology is illustrated by analyzing a study of periodontal health status in a diabetic Gullah population. We extend this model to longitudinal studies with a binary longitudinal response and informative missing data. In longitudinal studies, when treating each subject as a cluster, the cluster size is the total number of observations for each subject. When data are informatively missing, the cluster size of each subject can vary and is related to the binary response of interest; we are also interested in the missingness mechanism. This is a modified version of the clustered binary data setting with present/absent components. We modify and adapt our proposed two-stage random effects logistic regression model so that both the marginal and conditional probabilities of the binary response and the missingness indicator preserve logistic regression forms. We present a Bayesian framework for this model and illustrate it on an AIDS data example. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of
Doctor of Philosophy. / Spring Semester, 2011. / December 2, 2010. / Clustered data, Longitudinal data analysis, Informative missing, Categorical data analysis, Logistic regression, Bridge distribution / Includes bibliographical references. / Debajyoti Sinha, Professor Directing Dissertation; Yi Zhou, University Representative; Dan McGee, Committee Member; Stuart Lipsitz, Committee Member.
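A rough Python sketch of the kind of data structure described above, simulated from a shared cluster-level random effect that drives both component presence and disease status. This is a generic illustration, not the dissertation's exact two-stage model; all parameter values are hypothetical.

    import numpy as np

    rng = np.random.default_rng(2)

    def expit(z):
        return 1.0 / (1.0 + np.exp(-z))

    n_clusters, max_size = 200, 6
    beta_presence = np.array([0.5, -0.8])   # hypothetical effects on being present
    beta_disease = np.array([-1.0, 1.2])    # hypothetical effects on disease status
    rows = []
    for i in range(n_clusters):
        b_i = rng.normal(0.0, 1.0)          # shared cluster-level random effect
        x_i = rng.normal(size=max_size)     # component-level covariate
        present = rng.binomial(1, expit(beta_presence[0] + beta_presence[1] * x_i + b_i))
        # Disease status is only observed for present components, so the
        # effective cluster size is random and informative.
        disease = rng.binomial(1, expit(beta_disease[0] + beta_disease[1] * x_i + b_i))
        for j in range(max_size):
            if present[j]:
                rows.append((i, x_i[j], disease[j]))

    print("mean observed cluster size:", len(rows) / n_clusters)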
44. Covariance on Manifolds
With the ever increasing complexity of observational and theoretical data models, the sufficiency of classical statistical techniques, designed to be applied only to vector quantities, is being challenged. Nonlinear statistical analysis has become an area of intensive research in recent years. Despite impressive progress in this direction, a unified and consistent framework has not been reached. In this regard, the following work is an attempt to improve our understanding of random phenomena on non-Euclidean spaces. More specifically, the motivating goal of the present dissertation is to generalize the notion of distribution covariance, which in standard settings is defined only in Euclidean spaces, to arbitrary manifolds equipped with a metric. We introduce a tensor field structure, named the covariance field, that is consistent with the heterogeneous nature of manifolds. It not only describes the variability imposed by a probability distribution but also provides alternative distribution representations. The covariance field combines the distribution density with geometric characteristics of its domain and thus fills the gap between the two. We present some of the properties of covariance fields and argue that they can be successfully applied to various statistical problems. In particular, we provide a systematic approach for defining parametric families of probability distributions on manifolds, parameter estimation for regression analysis, nonparametric statistical tests for comparing probability distributions, and interpolation between such distributions. We then present several application areas where this new theory may have potential impact. One of them is the branch of directional statistics, with a domain of influence ranging from the geosciences to medical image analysis. The fundamental level at which the covariance-based structures are introduced also opens a new area for future research. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of
Doctor of Philosophy. / Spring Semester, 2009. / March 25, 2009. / Statistics, Manifolds, Covariance / Includes bibliographical references. / Anuj Srivastava, Professor Directing Dissertation; Eric Klassen, Outside Committee Member; Victor Patrangenaru, Committee Member; Daniel McGee, Committee Member.
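As a toy illustration of the Euclidean-embedded quantities this work generalizes, a short Python sketch of an extrinsic sample mean and a tangent-plane sample covariance for directions on the ordinary sphere. This is a simple directional-statistics example, not the dissertation's covariance-field construction.

    import numpy as np

    rng = np.random.default_rng(3)

    # Simulated directions on the unit sphere S^2, concentrated around a pole.
    n = 500
    pts = rng.normal(loc=[0.0, 0.0, 1.0], scale=0.2, size=(n, 3))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)

    # Extrinsic sample mean: Euclidean average projected back onto the sphere.
    euclid_mean = pts.mean(axis=0)
    extrinsic_mean = euclid_mean / np.linalg.norm(euclid_mean)

    # Residuals projected into the tangent plane at the extrinsic mean,
    # and their sample covariance (a rank-2 matrix in the ambient space).
    tangent_res = pts - np.outer(pts @ extrinsic_mean, extrinsic_mean)
    tangent_cov = tangent_res.T @ tangent_res / n

    print("extrinsic mean:", np.round(extrinsic_mean, 3))
    print("tangent-plane covariance:\n", np.round(tangent_cov, 4))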
45. Some New Methods for Design and Analysis of Survival Data
For survival outcomes, statistical equivalence tests for showing that a new treatment is therapeutically equivalent to a standard treatment are usually based on the Cox (1972) proportional hazards assumption. We present an alternative method based on the linear transformation model (LTM) for two treatment arms, and show the advantages of using this equivalence test instead of tests based on the Cox model. The LTM is a very general class of models that includes the proportional odds survival model (POSM). We present a sufficient condition for checking whether log-rank based tests have inflated Type I error rates, and show that the POSM and some other commonly used survival models within the LTM class all satisfy this condition. Simulation studies show that using our test instead of log-rank based tests is a safer statistical practice. Our second goal is to develop a practical Bayesian model for survival data with a high-dimensional covariate vector. We develop the Information Matrix (IM) and Information Matrix Ridge (IMR) priors for commonly used survival models, including the Cox model and the cure rate model proposed by Chen et al. (1999), and examine many desirable theoretical properties, including sufficient conditions for the existence of the moment generating functions of these priors and the corresponding posterior distributions. The performance of these priors in practice is compared with that of competing priors via the Bayesian analysis of a study investigating the relationship between lung cancer survival time and a large number of genetic markers. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of
Doctor of Philosophy. / Fall Semester, 2010. / September 14, 2010. / Type I Error, Fisher Information, Prior Elicitation, Semiparametric model, Therapeutic Equivalence / Includes bibliographical references. / Debajyoti Sinha, Professor Directing Dissertation; Bahram H. Arjmandi, University Representative; Dan McGee, Committee Member; Xufeng Niu, Committee Member; Kai Yu, Committee Member.
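Since the behaviour of log-rank based tests is central to the first part of this work, a self-contained Python sketch of the standard two-sample log-rank statistic on simulated data may help. It is hand-coded from the usual formula and is not code from the dissertation.

    import numpy as np
    from scipy.stats import chi2

    def logrank(time, event, group):
        """Two-sample log-rank chi-square statistic (1 df) and p-value."""
        time, event, group = map(np.asarray, (time, event, group))
        obs_minus_exp, var = 0.0, 0.0
        for t in np.unique(time[event == 1]):
            at_risk = time >= t
            n, n1 = at_risk.sum(), (at_risk & (group == 1)).sum()
            d = ((time == t) & (event == 1)).sum()
            d1 = ((time == t) & (event == 1) & (group == 1)).sum()
            obs_minus_exp += d1 - d * n1 / n
            if n > 1:
                var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        stat = obs_minus_exp ** 2 / var
        return stat, chi2.sf(stat, 1)

    # Simulated example with equal exponential hazards in both treatment arms.
    rng = np.random.default_rng(4)
    n = 200
    group = np.repeat([0, 1], n // 2)
    t_event = rng.exponential(1.0, n)
    t_cens = rng.exponential(2.0, n)
    time = np.minimum(t_event, t_cens)
    event = (t_event <= t_cens).astype(int)
    print(logrank(time, event, group))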
46. Bayesian Generalized Polychotomous Response Models and Applications
Polychotomous quantal response models are widely used in medical and econometric studies to analyze categorical or ordinal data. In this study, we apply Bayesian methodology through a mixed-effects polychotomous quantal response model. For the Bayesian polychotomous quantal response model, we assume uniform improper priors for the regression coefficients and explore sufficient conditions for a proper joint posterior distribution of the model parameters. Simulation results from Gibbs sampling estimates are compared to traditional maximum likelihood estimates to show the strength of using uniform improper priors for the regression coefficients. Motivated by the relationship between BMI categories and several risk factors, we carry out application studies to examine the impact of risk factors on BMI categories, especially the "Overweight" and "Obese" categories. By applying the mixed-effects Bayesian polychotomous response model with uniform improper priors, we obtain interpretations of the association between risk factors and BMI that are similar to findings in the literature. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of
Doctor of Philosophy. / Fall Semester, 2010. / October 19, 2010. / Bayesian, Polychotomous / Includes bibliographical references. / Xu-Feng Niu, Professor Directing Dissertation; Suzanne B. Johnson, University Representative; Dan McGee, Committee Member; Fred Huffer, Committee Member.
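A minimal Python sketch of the Bayesian idea described above, reduced to a binary (rather than polychotomous) toy model and using a random-walk Metropolis sampler in place of Gibbs sampling: under a uniform improper prior the log-posterior equals the log-likelihood up to a constant. All data are simulated; this is not the thesis's model.

    import numpy as np

    rng = np.random.default_rng(5)

    # Simulated binary response with one covariate plus an intercept.
    n = 500
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    beta_true = np.array([-0.3, 0.9])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

    def log_post(beta):
        eta = X @ beta
        return np.sum(y * eta - np.log1p(np.exp(eta)))  # flat prior: log-likelihood

    # Random-walk Metropolis (a stand-in for the Gibbs sampler used in the study).
    beta = np.zeros(2)
    cur = log_post(beta)
    draws = []
    for it in range(20000):
        prop = beta + rng.normal(scale=0.1, size=2)
        cand = log_post(prop)
        if np.log(rng.uniform()) < cand - cur:
            beta, cur = prop, cand
        if it >= 5000:                       # discard burn-in draws
            draws.append(beta.copy())

    print("posterior means:", np.round(np.mean(draws, axis=0), 3))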
47. A Probabilistic and Graphical Analysis of Evidence in O.J. Simpson's Murder Case Using Bayesian Networks
This research is an attempt to illustrate the versatility and wide applicability of statistical science. Specifically, it involves the application of statistics in the field of law, focusing on the sub-fields of evidence and criminal law using one of the most celebrated cases in the history of American jurisprudence: the 1994 O.J. Simpson murder case in California. Our task here is to carry out a probabilistic and graphical analysis of the body of evidence in this case using Bayesian networks. We begin the analysis by constructing our main hypothesis regarding the guilt or non-guilt of the accused; this main hypothesis is supplemented by a series of ancillary hypotheses. Using graphs and probability concepts, we evaluate the probative force, or strength, of the evidence and how well the body of evidence at hand supports our main hypothesis. We employ Bayes' rule, likelihoods, and likelihood ratios to carry out such an evaluation. Sensitivity analyses are carried out by varying the degree of our prior beliefs or probabilities and evaluating the effect of such variations on the conclusions regarding our main hypothesis. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of
Ph.D. / Fall Semester, 2010. / October 14, 2010. / O.J. Simpson, Bayesian Networks, Analysis of Evidence / Includes bibliographical references. / Fred Huffer, Professor Directing Dissertation; Valerie Shute, University Representative; Debajyoti Sinha, Committee Member; Xufeng Niu, Committee Member; Wayne Logan, Committee Member.
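A tiny Python sketch of the evidence-combination arithmetic underlying such an analysis. The likelihood ratios below are hypothetical (not from the case), and the sketch assumes conditional independence of the evidence items given the hypothesis, which a full Bayesian network would relax through its graph structure.

    import numpy as np

    # Posterior odds = prior odds x product of likelihood ratios, one LR per
    # piece of evidence; values here are purely illustrative.
    likelihood_ratios = [10.0, 4.0, 0.5, 20.0]

    for prior in [0.01, 0.1, 0.5]:           # sensitivity to the prior belief
        prior_odds = prior / (1.0 - prior)
        post_odds = prior_odds * np.prod(likelihood_ratios)
        post_prob = post_odds / (1.0 + post_odds)
        print(f"prior P(H)={prior:.2f} -> posterior P(H|evidence)={post_prob:.3f}")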
48. Investigating the Use of Mortality Data as a Surrogate for Morbidity Data
We are interested in differences between risk models based on Coronary Heart Disease (CHD) incidence, or morbidity, and risk models based on CHD death. Risk models based on morbidity have been developed from the Framingham Heart Study, while the European SCORE project developed a risk model for CHD death. Our goal is to determine whether these two developed models differ in treatment decisions concerning patient heart health. We begin by reviewing recent metrics for surrogate variables and prognostic model performance. We then conduct bootstrap hypothesis tests between two Cox proportional hazards models fitted to Framingham data, one with incidence as the response and one with death as the response, and find that the coefficients differ for the age covariate but show no significant differences for the other risk factors. To understand how surrogacy can be applied to our case, where the surrogate variable is nested within the true variable of interest, we examine models based on a composite event compared to models based on singleton events. We also conduct a simulation, generating times to CHD incidence and times from CHD incidence to CHD death, censoring at 25 years to represent the end of a study. We compare a Cox model with death as the response to a Cox model based on incidence using bootstrapped confidence intervals, and find differences in the coefficients for age and systolic blood pressure. We continue the simulation by using the Net Reclassification Index (NRI) to evaluate the treatment decision performance of the two models, and find that the two models do not perform significantly differently in correctly classifying events if the decisions are based on the risk ranks of the individuals. As long as the relative order of patients' risks is preserved across different risk models, treatment decisions based on classifying a specified upper percentage as high risk will not be significantly different. We conclude the dissertation with statements about future methods for approaching our question. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of
Doctor of Philosophy. / Summer Semester, 2011. / June 7, 2011. / Risk Models, Morbidity, Mortality, Cox Proportional Hazards / Includes bibliographical references. / Myles Hollander, Professor Co-Directing Dissertation; Daniel McGee, Professor Co-Directing Dissertation; Myra Hurt, University Representative; Wei Wu, Committee Member; Jinfeng Zhang, Committee Member.
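A short Python sketch of the categorical Net Reclassification Index used in the comparison, for a single high-risk cutoff. The threshold and the two sets of predictions below are hypothetical, not the Framingham or SCORE models.

    import numpy as np

    def nri(risk_old, risk_new, event, threshold=0.2):
        """Categorical Net Reclassification Index for one high-risk cutoff."""
        up = (risk_new >= threshold) & (risk_old < threshold)
        down = (risk_new < threshold) & (risk_old >= threshold)
        ev, nonev = event == 1, event == 0
        nri_events = up[ev].mean() - down[ev].mean()
        nri_nonevents = down[nonev].mean() - up[nonev].mean()
        return nri_events + nri_nonevents

    # Hypothetical predictions from two competing risk models on the same people.
    rng = np.random.default_rng(6)
    risk_old = rng.uniform(0, 0.5, 1000)
    risk_new = np.clip(risk_old + rng.normal(0, 0.05, 1000), 0, 1)
    event = rng.binomial(1, risk_old)
    print(round(nri(risk_old, risk_new, event), 4))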
49. Nonparametric Estimation of Three Dimensional Projective Shapes with Applications in Medical Imaging and in Pattern Recognition
This dissertation concerns the analysis of invariants of a 3D configuration from its 2D images, without requiring any restriction on the camera positioning relative to the scene pictured. We briefly review some of the main results found in the literature. The methodology used is nonparametric and manifold-based, combined with standard computer vision reconstruction techniques. More specifically, we use asymptotic results for the extrinsic sample mean and the extrinsic sample covariance to construct bootstrap confidence regions for mean projective shapes of 3D configurations. Chapters 4, 5 and 6 contain new results. In Chapter 4, we develop tests for coplanarity. Chapter 5 is on reconstruction of 3D polyhedral scenes, including texture, from arbitrary partial views. In Chapter 6, we develop a nonparametric methodology for estimating the mean change for matched samples on a Lie group. We then note that for k ≥ 4, the manifold of projective shapes of k-ads in general position in 3D has the structure of a (3k − 15)-dimensional Lie group (P-Quaternions) that is equivariantly embedded in a Euclidean space; therefore, testing for mean 3D projective shape change amounts to a one-sample test for extrinsic mean P-Quaternion objects. The Lie group technique leads to a large-sample and nonparametric bootstrap test for a one-population extrinsic mean on a projective shape space, as recently developed by Patrangenaru, Liu and Sughatadasa. On the other hand, in the absence of occlusions, the 3D projective shape of a spatial configuration can be recovered from a stereo pair of images, thus allowing us to test for glaucomatous mean 3D projective shape change from standard stereo pairs of eye images. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of
Doctor of Philosophy. / Summer Semester, 2010. / May 3, 2010. / Extrinsic Mean, Statistics on Manifolds, Nonparametric Bootstrap, Image Analysis / Includes bibliographical references. / Victor Patrangenaru, Professor Directing Dissertation; Xiuwen Liu, University Representative; Fred W. Huffer, Committee Member; Debajyoti Sinha, Committee Member.
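As a simplified stand-in for the projective shape space (which is not implemented here), a brief Python sketch of nonparametric bootstrap uncertainty for an extrinsic mean on the ordinary sphere, illustrating the style of bootstrap confidence region used.

    import numpy as np

    rng = np.random.default_rng(7)

    def extrinsic_mean(pts):
        m = pts.mean(axis=0)
        return m / np.linalg.norm(m)

    # Simulated directions on S^2 standing in for points on a shape manifold.
    n = 100
    pts = rng.normal(loc=[1.0, 0.0, 0.0], scale=0.15, size=(n, 3))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)
    mean_hat = extrinsic_mean(pts)

    # Nonparametric bootstrap: resample cases, recompute the extrinsic mean,
    # and record its angular distance from the full-sample estimate.
    B = 2000
    dists = np.empty(B)
    for b in range(B):
        boot = pts[rng.integers(0, n, n)]
        m_b = extrinsic_mean(boot)
        dists[b] = np.arccos(np.clip(m_b @ mean_hat, -1.0, 1.0))

    print("95% bootstrap bound on angular deviation:", np.quantile(dists, 0.95))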
50. Estimating the Probability of Cardiovascular Disease: A Comparison of Methods
Risk prediction plays an important role in clinical medicine. It not only helps in educating patients to improve their lifestyle and in targeting individuals at high risk, but also guides treatment decisions. So far, various instruments have been used for risk assessment in different countries, and the risk predictions from these different models are not consistent. For public use, reliable risk prediction is necessary. This thesis discusses the models that have been developed for risk assessment and evaluates prediction performance at two levels: the overall level and the individual level. At the overall level, cross-validation and simulation are used to assess the risk prediction, while at the individual level, the "Parametric Bootstrap" and the delta method are used to evaluate the uncertainty of the individual risk prediction. Further exploration of the reasons for the differing performance among the models is ongoing. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of
Doctor of Philosophy. / Fall Semester, 2009. / May 5, 2009. / Cardiovascular Disease, Risk Prediction, Logistic, Cox PH, Weibull, VLDAFT / Includes bibliographical references. / Daniel McGee, Professor Directing Dissertation; Myra Hurt, University Representative; XuFeng Niu, Committee Member; Fred Huffer, Committee Member.
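A short Python sketch of one common version of the parametric bootstrap for an individual's predicted risk, drawing coefficient vectors from their estimated sampling distribution after fitting a logistic model. It assumes the statsmodels package is available; the data are simulated and the procedure may differ from the thesis's exact approach.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)

    # Hypothetical data: one risk factor, binary disease outcome.
    n = 2000
    x = rng.normal(size=n)
    X = sm.add_constant(x)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-2.0 + 0.8 * x))))

    fit = sm.Logit(y, X).fit(disp=0)

    # Parametric bootstrap for an individual's risk: draw coefficient vectors
    # from the estimated normal sampling distribution and recompute the risk.
    draws = rng.multivariate_normal(fit.params, fit.cov_params(), size=5000)
    x_new = np.array([1.0, 1.5])                 # intercept + covariate value
    risks = 1.0 / (1.0 + np.exp(-draws @ x_new))
    point = 1.0 / (1.0 + np.exp(-(fit.params @ x_new)))

    print("point estimate:", round(float(point), 3))
    print("95% interval:  ", np.round(np.quantile(risks, [0.025, 0.975]), 3))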