61 |
Interrelating of Longitudinal Processes: An Empirical Example
The Barker Hypothesis states that maternal and in utero attributes during pregnancy affect a child's cardiovascular health throughout life. We present an analysis of a unique longitudinal dataset from Jamaica that consists of three longitudinal processes: (i) Maternal longitudinal process: blood pressure and anthropometric measurements of the mother at seven time points during pregnancy. (ii) In utero measurements: ultrasound measurements of the fetus taken at six time points during pregnancy. (iii) Birth-to-present process: children's anthropometric and blood pressure measurements at 24 time points from birth to 14 years. A comprehensive analysis of the interrelationship of these three longitudinal processes is presented using joint modeling for multivariate longitudinal profiles. We propose a new methodology for examining a child's cardiovascular risk by extending a current view of likelihood estimation. Joint modeling of multivariate longitudinal profiles is carried out, and the extension of the traditional likelihood method is applied and compared with the maximum likelihood estimates. Our main goal is to examine whether the process in mothers predicts fetal development, which in turn predicts the future cardiovascular health of the children. One of the difficulties with in utero and early childhood data is that certain variables are highly correlated, so dimension reduction techniques are well suited to this scenario. Principal component analysis (PCA) is used to create a lower-dimensional set of uncorrelated variables, which is then used in a longitudinal analysis setting. These principal components are then used in an optimal linear mixed model for longitudinal data, which indicates that in utero and early childhood attributes predict the future cardiovascular health of the children. 
This dissertation has added a body of knowledge to developmental origins of adult diseases and has supplied some significant results while utilizing a rich diversity of statistical methodologies. / A Dissertation Submitted to the Department of Statistics in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy. / Summer Semester, 2011. / May 16, 2011. / Principal Component, Cardiovascular, Fetal Origins, Pseudolikelihood, Linear Mixed Model, Longitudinal / Includes bibliographical references. / Daniel McGee, Professor Directing Dissertation; Cathy Levenson, University Representative; Debajyoti Sinha, Committee Member; Clive Osmond, Committee Member; Xufeng Niu, Committee Member.
|
62 |
Logistic Regression, Measures of Explained Variation, and the Base Rate Problem
One of the desirable properties of the coefficient of determination (R² measure) is that its values for different models should be comparable whether the models differ in one or more predictors, or in the dependent variable, or whether the models are specified as being different for different subsets of a dataset. This allows researchers to compare the adequacy of models across subgroups of the population or models with different but related dependent variables. However, the various analogs of the R² measure used for logistic regression analysis are highly sensitive to the base rate (proportion of successes in the sample) and thus do not possess this property. An R² measure that is sensitive to the base rate is not suitable for comparing the same or different models across different datasets, different subsets of a dataset, or different but related dependent variables. We evaluated 14 R² measures that have been suggested or might be useful for measuring the explained variation in logistic regression models, based on three criteria: 1) intuitively reasonable interpretability; 2) numerical consistency with the ρ² of the underlying model; and 3) base rate sensitivity. We carried out a Monte Carlo simulation study to examine the numerical consistency and the base rate dependency of the various R² measures for logistic regression analysis. We found all of the parametric R² measures to be substantially sensitive to the base rate. The magnitude of the base rate sensitivity of these measures tends to be further influenced by the ρ² of the underlying model. None of the measures considered in our study was found to perform equally well on all three evaluation criteria. While R²_L stands out for its intuitively reasonable interpretability as a measure of explained variation as well as its independence from the base rate, it appears to severely underestimate the underlying ρ². 
We found R²_CS to be numerically most consistent with the underlying ρ², with R²_N its nearest competitor. In addition, the base rate sensitivity of these two measures appears to be very close to that of R²_L, the most base rate invariant parametric R² measure. Therefore, we suggest using R²_CS and R²_N for logistic regression modeling, especially when it is reasonable to believe that an underlying latent variable exists. However, when the latent variable does not exist, comparability with the underlying ρ² is not an issue and R²_L might be a better choice than all the other R² measures. / A Dissertation Submitted to the Department of Statistics in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy. / Summer Semester, 2006. / June 26, 2006. / Logistic Regression, Explained Variation, Base Rate, Base Rate Problem, Coefficient of Determinant, R^2 Statistics, Latent Variable / Includes bibliographical references. / Daniel L. McGee, Sr., Professor Directing Dissertation; Myra Hurt, Outside Committee Member; Xu-Feng Niu, Committee Member; Eric Chicken, Committee Member.
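The three measures singled out in this abstract can be illustrated with a short sketch. It assumes, since the abstract does not define them, that the labels R2L, R2CS, and R2N denote the usual likelihood-ratio, Cox-Snell, and Nagelkerke statistics; the function names and the numbers below are hypothetical and are not taken from the dissertation.

```python
import math


def null_log_likelihood(n, base_rate):
    """Log-likelihood of an intercept-only logistic model for n cases."""
    p = base_rate
    return n * (p * math.log(p) + (1 - p) * math.log(1 - p))


def pseudo_r2(ll_model, ll_null, n):
    """Return (R2_L, R2_CS, R2_N) from fitted and null log-likelihoods.

    Assumed definitions (not confirmed by the abstract):
      R2_L  = 1 - ll_model / ll_null              (likelihood-ratio type)
      R2_CS = 1 - exp(2 * (ll_null - ll_model) / n)   (Cox-Snell)
      R2_N  = R2_CS / (1 - exp(2 * ll_null / n))      (Nagelkerke rescaling)
    """
    r2_l = 1.0 - ll_model / ll_null
    r2_cs = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    r2_cs_max = 1.0 - math.exp(2.0 * ll_null / n)  # ceiling of Cox-Snell
    r2_n = r2_cs / r2_cs_max
    return r2_l, r2_cs, r2_n


# Hypothetical illustration: 100 cases, base rate 0.5, fitted log-likelihood -50
n = 100
ll_null = null_log_likelihood(n, base_rate=0.5)
ll_model = -50.0
r2_l, r2_cs, r2_n = pseudo_r2(ll_model, ll_null, n)
print(f"R2_L={r2_l:.3f}  R2_CS={r2_cs:.3f}  R2_N={r2_n:.3f}")
```

Under these assumed definitions the ceiling of the Cox-Snell statistic, `1 - exp(2 * ll_null / n)`, is itself a function of the base rate, which is one simple way to see how a base rate dependency can enter such measures.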
|
63 |
Optimal Linear Representations of Images under Diverse Criteria
Image analysis often requires dimension reduction before statistical analysis, in order to apply sophisticated procedures. Motivated by eventual applications, a variety of criteria have been proposed: reconstruction error, class separation, non-Gaussianity using kurtosis, sparseness, mutual information, recognition of objects, and their combinations. Although some criteria have analytical solutions, the remaining ones require numerical approaches. We present geometric tools for finding linear projections that optimize a given criterion for a given data set. The main idea is to formulate a problem of optimization on a Grassmann or a Stiefel manifold, and to use differential geometry of the underlying space to construct optimization algorithms. Purely deterministic updates lead to local solutions, and addition of random components allows for stochastic gradient searches that eventually lead to global solutions. We demonstrate these results using several image datasets, including natural images and facial images. / A Dissertation Submitted to the Department of Statistics in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy. / Summer Semester, 2006. / June 29, 2006. / Tangent Spaces of Manifolds, Efficient Algorithm, Matrix Exponent, Gradient Optimization, Entropy / Includes bibliographical references. / Anuj Srivastava, Professor Directing Dissertation; Xiuwen Liu, Outside Committee Member; Fred Huffer, Committee Member; Eric Chicken, Committee Member.
|
64 |
A Bayesian Approach to Meta-Regression: The Relationship Between Body Mass Index and All-Cause Mortality
This thesis presents a Bayesian approach to Meta-Regression and Individual Patient Data (IPD) Meta-analysis. The focus of the research is on establishing the relationship between Body Mass Index (BMI) and all-cause mortality. This has been an area of continuing interest in the medical and public health communities, and no consensus has been reached on what the optimal weight for individuals is. Standards are usually specified in terms of body mass index (BMI = weight (kg) / height (m)²), which is associated with body fat percentage. Many studies in the literature have modelled the relationship between BMI and mortality and reported a variety of relationships, including U-shaped, J-shaped, and linear curves. The aim of my research was to use statistical methods to determine whether we can combine these diverse results and obtain a single estimated relationship, from which one can find the point of minimum mortality and establish reasonable ranges for optimal BMI, or how we can best examine the reasons for the heterogeneity of results. Commonly used techniques of Meta-analysis and Meta-regression are explored, and a problem with the estimation procedure in the multivariate setting is presented. A Bayesian approach using a Hierarchical Generalized Linear Mixed Model is suggested and implemented to overcome this drawback of standard estimation techniques. Another area that is explored briefly is Individual Patient Data meta-analysis. A Frailty model, or Random Effects Proportional Hazards Survival model, approach is proposed to carry out IPD meta-regression and arrive at a single estimated relationship between BMI and mortality, adjusting for the variation between studies. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester, 2007. / June 3, 2007. / Bayesian Hierarchical Models, Frailty Models, BMI, Meta-Analysis, Meta-Regression / Includes bibliographical references. 
/ Dan McGee, Professor Directing Dissertation; Myra Hurt, Outside Committee Member; Xiufeng Niu, Committee Member; Fred Huffer, Committee Member.
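As a small worked illustration of two quantities mentioned in this abstract, the sketch below computes BMI from weight and height and locates the point of minimum risk for a U-shaped curve. It assumes, purely for illustration, that the log-hazard is quadratic in BMI; the abstract does not state the functional form used, and the coefficients shown are hypothetical, not estimates from the dissertation.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def bmi_at_minimum_risk(b1, b2):
    """Vertex of an assumed quadratic log-hazard b0 + b1*BMI + b2*BMI**2.

    A U-shaped curve requires b2 > 0; the minimum is then at -b1 / (2 * b2).
    """
    if b2 <= 0:
        raise ValueError("a U-shaped curve needs a positive quadratic coefficient")
    return -b1 / (2.0 * b2)


# Hypothetical values for illustration only
example_bmi = bmi(70.0, 1.75)
nadir = bmi_at_minimum_risk(b1=-0.25, b2=0.005)
print(f"BMI = {example_bmi:.2f}, minimum-risk BMI = {nadir:.1f}")
```

For a J-shaped or linear relationship the same vertex formula would not apply, which is part of why combining heterogeneous published curves is not straightforward.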
|
65 |
Stochastic Models and Inferences for Commodity Futures Pricing
The stochastic modeling of financial assets is essential to the valuation of financial products and investment decisions. These models are governed by certain parameters that are estimated through a process known as calibration. Current procedures typically perform a grid-search optimization of a given objective function over a specified parameter space. These methods can be computationally intensive and require restrictions on the parameter space to achieve timely convergence. In this thesis, we propose an alternative Kalman Smoother Expectation Maximization (KSEM) procedure that can jointly estimate all the parameters and produces a better model fit than alternative estimation procedures. Further, we consider the additional complexity of modeling jumps or spikes that may occur in a time series. For this calibration we develop a Particle Smoother Expectation Maximization (PSEM) procedure for the optimization of nonlinear systems. This is an entirely new estimation approach, and we provide several examples of its application. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester, 2009. / July 17, 2009. / Particle Smoothing, EM Algorithm, Particle Filter, Kalman Filter, Kalman Smoothing, Parameter Learning, Gaussian Mixture / Includes bibliographical references. / Anuj Srivastava, Professor Co-Directing Dissertation; James Doran, Professor Co-Directing Dissertation; Patrick Mason, Outside Committee Member; Xufeng Niu, Committee Member; Fred Huffer, Committee Member; Wei Wu, Committee Member.
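The filtering recursion on which Kalman-smoother-based procedures build can be sketched for the simplest case. The example below is not the KSEM procedure described in this abstract; it is only a scalar Kalman filter for a local-level model, with hypothetical function and parameter names, shown under the assumption that the noise variances are already known.

```python
def kalman_filter(observations, q, r, m0, p0):
    """Scalar Kalman filter for x_t = x_{t-1} + w_t, y_t = x_t + v_t.

    q is Var(w_t), r is Var(v_t), and (m0, p0) are the prior mean and
    variance of the initial state. Returns filtered means and variances.
    """
    m, p = m0, p0
    means, variances = [], []
    for y in observations:
        # Predict: the state is a random walk, so only the variance grows.
        m_pred = m
        p_pred = p + q
        # Update: blend prediction and observation using the Kalman gain.
        gain = p_pred / (p_pred + r)
        m = m_pred + gain * (y - m_pred)
        p = (1.0 - gain) * p_pred
        means.append(m)
        variances.append(p)
    return means, variances


# One-step check: prior N(0, 1), observation 2.0 with unit noise, no drift
one_means, one_vars = kalman_filter([2.0], q=0.0, r=1.0, m0=0.0, p0=1.0)
# Longer hypothetical series of constant observations
long_means, long_vars = kalman_filter([5.0] * 50, q=0.01, r=1.0, m0=0.0, p0=1.0)
print(one_means, one_vars, round(long_means[-1], 3))
```

In an EM-style calibration, quantities of this kind (followed by a backward smoothing pass) would form the expectation step, while `q` and `r` would be re-estimated in the maximization step rather than fixed as they are here.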
|
66 |
A Bayesian MRF Framework for Labeling Terrain Using Hyperspectral Imaging
We explore the non-Gaussianity of hyperspectral data and present probability models that capture variability of hyperspectral images. In particular, we present a nonparametric probability distribution that models the distribution of the hyperspectral data after reducing the dimension of the data via either principal components or Fisher's discriminant analysis. We also explore the directional differences in observed images and present two parametric distributions, the generalized Laplacian and the Bessel K form, that well model the non-Gaussian behavior of the directional differences. We then propose a model that labels each spatial site, using Bayesian inference and Markov random fields, that incorporates the information of the non-parametric distribution of the data, and the parametric distributions of the directional differences, along with a prior distribution that favors smooth labeling. We then test our model on actual hyperspectral data and present the results of our model, using the Washington D.C. Mall and Indian Springs rural area data sets. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester, 2004. / August 27, 2004. / Hyperspectral, Bayesian, Labeling, Gibbs Random Fields, Markov Random Fields / Includes bibliographical references. / Anuj Srivastava, Professor Directing Dissertation; Xiuwen Liu, Outside Committee Member; Fred Huffer, Committee Member; Marten Wegkamp, Committee Member.
|
67 |
Spatiotemporal Bayesian Hierarchical Models, with Application to Birth Outcomes
A class of hierarchical Bayesian models is introduced for adverse birth outcomes such as preterm birth, which are assumed to follow a conditional binomial distribution. The log-odds of an adverse outcome in a particular county, logit(p(i)), follows a linear model which includes observed covariates and normally-distributed random effects. Spatial dependence between neighboring regions is allowed for by including an intrinsic autoregressive (IAR) prior or an IAR convolution prior in the linear predictor. Temporal dependence is incorporated by including a temporal IAR term also. It is shown that the variance parameters underlying these random effects (IAR, convolution, convolution plus temporal IAR) are identifiable. The same results are also shown to hold when the IAR is replaced by a conditional autoregressive (CAR) model. Furthermore, properties of the CAR parameter ρ are explored. The Deviance Information Criterion (DIC) is considered as a way to compare spatial hierarchical models. Simulations are performed to test whether the DIC can identify whether binomial outcomes come from an IAR, an IAR convolution, or independent normal deviates. Having established the theoretical foundations of the class of models and validated the DIC as a means of comparing models, we examine preterm birth and low birth weight counts in the state of Arkansas from 1994 to 2005. We find that preterm birth and low birth weight have different spatial patterns of risk, and that rates of low birth weight can be fit with a strikingly simple model that includes a constant spatial effect for all periods, a linear trend, and three covariates. It is also found that the risks of each outcome are increasing over time, even with adjustment for covariates. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester, 2008. / November 16, 2007. 
/ Conditional Autoregressive, Intrinsically Autoregressive, Disease Mapping, Spatial Statistics, Preterm Birth, Low Birth Weight / Includes bibliographical references. / Xufeng Niu, Professor Directing Dissertation; Isaac Eberstein, Outside Committee Member; Fred Huffer, Committee Member; Daniel McGee, Committee Member.
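The spatial term in such a model can be made concrete with a small sketch. It assumes the standard form of the intrinsic autoregressive prior, in which the random effect of one region, given all others, is normal with mean equal to the average of its neighbors' effects and variance equal to a common variance divided by the number of neighbors. The four-county map, the effect values, and the intercept below are hypothetical.

```python
import math


def iar_full_conditional(i, phi, neighbors, sigma2):
    """Mean and variance of phi[i] given the other effects under an IAR prior."""
    nbrs = neighbors[i]
    if not nbrs:
        raise ValueError("region has no neighbors; the conditional is undefined")
    mean = sum(phi[j] for j in nbrs) / len(nbrs)
    variance = sigma2 / len(nbrs)
    return mean, variance


def inv_logit(eta):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical map: four counties in a row, each adjacent to the next
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
phi = [0.2, -0.1, 0.4, 0.0]

cond_mean, cond_var = iar_full_conditional(1, phi, neighbors, sigma2=1.0)
# Linear predictor with a hypothetical intercept and no covariates
p_county_1 = inv_logit(-2.0 + phi[1])
print(cond_mean, cond_var, round(p_county_1, 4))
```

Counties with more neighbors get a smaller conditional variance, which is how the prior smooths risk estimates toward those of adjacent areas.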
|
68 |
Impact of Missing Data on Building Prognostic Models and Summarizing Models Across Studies
We examine the impact of missing data in two settings: the development of prognostic models and the addition of new risk factors to existing risk functions. Most statistical software presently available performs complete case analysis, wherein only participants with known values for all of the characteristics being analyzed are included in model development. Missing data also impacts the summarization of evidence across multiple studies using meta-analytic techniques. As we progress in medical research, new covariates become available for studying various outcomes. While we want to investigate the influence of new factors on the outcome, we also do not want to discard the historical datasets that do not have information about these markers. Our research plan is to investigate different methods to estimate parameters for a model when some of the covariates are missing. These methods include likelihood based inference for the study-level coefficients and likelihood based inference for the logistic model on the person-level data. We compare the results from our methods to the corresponding results from complete case analysis. We focus our empirical investigation on a historical example, the addition of high density lipoproteins to existing equations for predicting death due to coronary heart disease. We verify our methods through simulation studies on this example. / A Dissertation Submitted to the Department of Statistics in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy. / Fall Semester, 2005. / September 9, 2005. / Coronary Heart Disease, Stratified Model, Summary Coefficients, Maximum Likelihood Estimation, Logistic Model, Missing Data / Includes bibliographical references. / Daniel McGee, Sr., Professor Directing Dissertation; Isaac Eberstein, Outside Committee Member; Myles Hollander, Committee Member; Xufeng Niu, Committee Member; Somesh Chattopadhyay, Committee Member.
|
69 |
Statistical Modelling and Applications of Neural Spike Trains
In this thesis we investigate statistical modelling of neural activity in the brain. We first develop a framework which is an extension of the state-space Generalized Linear Model (GLM) by Eden and colleagues [20] to include the effects of hidden states. These states, collectively, represent variables which are not observed (or even observable) in the modeling process but nonetheless can have an impact on the neural activity. We then develop a framework that allows us to input a priori target information into the model. We examine both of these modelling frameworks on motor cortex data recorded from monkeys performing different target-driven hand and arm movement tasks. Finally, we perform temporal coding analysis of sensory stimulation using principled statistical models and show the efficacy of our approach. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester, 2011. / March 24, 2011. / generalized linear model, Neural coding, state space model / Includes bibliographical references. / Wei Wu, Professor Directing Thesis; Robert J. Contreras, University Representative; Anuj Srivastava, Committee Member; Fred Huffer, Committee Member; Xufeng Niu, Committee Member.
|
70 |
Statistical Models on Human Shapes with Application to Bayesian Image Segmentation and Gait Recognition
In this dissertation we develop probability models for human shapes and apply those probability models to the problems of image segmentation and human identification by gait recognition. To build probability models on human shapes, we consider human shapes to be realizations of random variables on a space of simple closed curves and a space of elastic curves. Both of these spaces are quotient spaces of infinite dimensional manifolds. Our probability models arise through Tangent Principal Component Analysis, a method of studying probability models on manifolds by projecting them onto a tangent plane to the manifold. Since we put the tangent plane at the Karcher mean of sample shapes, we begin our study by examining statistical properties of Karcher means on manifolds. We derive theoretical results for the location of Karcher means on certain manifolds, and perform a simulation study of properties of Karcher means on our shape space. Turning to the specific problem of distributions on human shapes, we examine alternatives for probability models and find that kernel density estimators perform well. We use this model to sample shapes and to perform shape testing. The first application we consider is human detection in infrared images. We pursue this application using Bayesian image segmentation, in which our proposed human in an image is a maximum likelihood estimate, obtained using a prior distribution on human shapes and a likelihood arising from a divergence measure on the pixels in the image. We then consider human identification by gait recognition. We examine human gait as a cyclo-stationary process on the space of elastic curves and develop a metric on processes based on the geodesic distance between sequences on that space. 
We develop and demonstrate a framework for gait recognition based on this metric, which includes the following elements: automatic detection of gait cycles, interpolation to register gait cycles, computation of a mean gait cycle, and identification by matching a test cycle to the nearest member of a training set. We perform the matching both by an exhaustive search of the training set and through an expedited method using cluster-based trees and boosting. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester, 2005. / September 14, 2005. / Gait Recognition, Statistical Shape Analysis, Image Segmentation / Includes bibliographical references. / Anuj Srivastava, Professor Directing Dissertation; Washington Mio, Outside Committee Member; Eric Chicken, Committee Member; Marten Wegkamp, Committee Member.
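The notion of a Karcher mean used in this abstract can be illustrated on a much simpler manifold than a shape space. The sketch below computes a Karcher mean of angles on the unit circle by gradient descent on the sum of squared geodesic distances. It is a toy example with hypothetical function names, and it does not reproduce the dissertation's computations on spaces of curves.

```python
import math


def wrap(angle):
    """Wrap an angle to the interval (-pi, pi]; this is the signed geodesic offset."""
    return math.atan2(math.sin(angle), math.cos(angle))


def karcher_mean_circle(angles, step=1.0, tol=1e-12, max_iter=1000):
    """Karcher mean of points on the unit circle, given as angles in radians.

    Each iteration averages the tangent vectors (signed shortest arcs) from the
    current estimate to the data and moves the estimate along that average.
    The result is a local minimizer and may depend on the starting point.
    """
    mu = angles[0]
    for _ in range(max_iter):
        update = sum(wrap(a - mu) for a in angles) / len(angles)
        mu = wrap(mu + step * update)
        if abs(update) < tol:
            break
    return mu


# Points clustered near angle 0.2
mean_small = karcher_mean_circle([0.1, 0.2, 0.3])
# Points on either side of the cut at pi: the arithmetic mean of the raw
# angles is 0, but the intrinsic mean sits near pi (equivalently -pi)
mean_cut = karcher_mean_circle([3.0, -3.0])
print(round(mean_small, 6), round(abs(mean_cut), 6))
```

The second case shows why an intrinsic mean is needed: averaging coordinates ignores the geometry of the space, whereas the Karcher mean respects geodesic distance.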
|