41

Multilevel logistic regression: An illustration examining high school drop-outs

The purpose of this study was to provide an exposition on multilevel logistic regression. Included in the study was a description of a two-level logistic model and a comparison between the two-level logistic model and traditional logistic regression. A strategy for building and analyzing multilevel logistic models was devised and offered in the study, and the modeling strategy was illustrated in the analysis of high school drop-outs. The illustration demonstrated the flexibility and utility of the multilevel model in analyzing the impact of school characteristics on students' decisions to leave school. / Source: Dissertation Abstracts International, Volume: 54-02, Section: B, page: 0921. / Major Professor: F. Craig Johnson. / Thesis (Ph.D.)--The Florida State University, 1993.
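To make the two-level structure concrete, here is a minimal simulation sketch in Python (all parameter values hypothetical): each school contributes a random intercept to a student's log-odds of dropping out, producing the within-school correlation that traditional logistic regression ignores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Level 2: each school j has a random intercept u_j ~ N(0, tau^2).
n_schools, per_school = 50, 40
tau = 0.8                                  # hypothetical between-school SD
u = rng.normal(0.0, tau, n_schools)

# Level 1: a student's log-odds of dropping out depend on a student-level
# covariate x and on the school the student attends.
beta0, beta1 = -1.5, 0.6                   # hypothetical fixed effects
school = np.repeat(np.arange(n_schools), per_school)
x = rng.normal(size=school.size)
log_odds = beta0 + beta1 * x + u[school]
p = 1.0 / (1.0 + np.exp(-log_odds))
dropout = rng.binomial(1, p)

# Between-school heterogeneity that a single-level logistic model ignores:
rates = np.array([dropout[school == j].mean() for j in range(n_schools)])
print(f"school dropout rates range from {rates.min():.2f} to {rates.max():.2f}")
```

Fitting such data in practice would call for a random-intercept logistic (mixed-effects) model rather than a single-level fit; the sketch only shows the data-generating structure.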
42

Generalized Pearson-Fisher chi-square goodness of fit tests, with applications to models with life history data

Suppose that $X_1,\ldots,X_n$ are i.i.d. $\sim F$, and we wish to test the null hypothesis that $F$ is a member of the parametric family $\mathcal{F} = \{F_\theta(x);\ \theta \in \Theta\}$, where $\Theta \subset \mathbb{R}^q$. The classical Pearson-Fisher chi-square test involves partitioning the real axis into $k$ cells $I_1,\ldots,I_k$ and forming the chi-square statistic $X^2 = \sum_{i=1}^{k} (O_i - nF_{\hat\theta}(I_i))^2 / nF_{\hat\theta}(I_i)$, where $O_i$ is the number of observations falling into cell $i$ and $\hat\theta$ is the value of $\theta$ minimizing $\sum_{i=1}^{k} (O_i - nF_\theta(I_i))^2 / nF_\theta(I_i)$. We obtain a generalization of this test to any situation for which there is available a nonparametric estimator $\hat F$ of $F$ for which $n^{1/2}(\hat F - F) \stackrel{d}{\to} W$, where $W$ is a continuous zero-mean Gaussian process satisfying a mild regularity condition. We allow the cells to be data dependent. Essentially, we estimate $\theta$ by the value $\hat\theta$ that minimizes a "distance" between the vectors $(\hat F(I_1),\ldots,\hat F(I_k))$ and $(F_\theta(I_1),\ldots,F_\theta(I_k))$, where distance is measured through an arbitrary positive definite quadratic form, and then form a chi-square type test statistic based on the difference between $(\hat F(I_1),\ldots,\hat F(I_k))$ and $(F_{\hat\theta}(I_1),\ldots,F_{\hat\theta}(I_k))$. We prove that this test statistic has asymptotically a chi-square distribution with $k - q - 1$ degrees of freedom, and point out some errors in the literature on chi-square tests in survival analysis. Our procedure is very general and applies to a number of well-known models in survival analysis, such as right censoring and left truncation. We apply our method to deal with questions of model selection in the problem of estimating the distribution of the length of the incubation period of the AIDS virus using the CDC's data on blood-transfusion related AIDS. Our analysis suggests some models that seem to fit better than those used in the literature. / Source: Dissertation Abstracts International, Volume: 53-07, Section: B, page: 3576. / Major Professor: Hani Doss. / Thesis (Ph.D.)--The Florida State University, 1992.
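For orientation, here is a minimal sketch of the classical Pearson-Fisher test that this work generalizes, for the normal location family (so $q = 1$), with data-dependent cells; all settings are illustrative.

```python
import numpy as np
from scipy import optimize, stats

# Classical Pearson-Fisher chi-square test of H0: F = N(mu, 1), with mu
# estimated by minimum chi-square (q = 1 parameter); cells are data-dependent.
rng = np.random.default_rng(1)
x = rng.normal(0.3, 1.0, 500)
n, k = x.size, 8

edges = np.quantile(x, np.linspace(0, 1, k + 1))
edges[0], edges[-1] = -np.inf, np.inf       # cells partition the real axis
O = np.histogram(x, edges)[0]               # observed cell counts O_i

def chisq(mu):
    cell = np.diff(stats.norm.cdf(edges, loc=mu))   # F_theta(I_i)
    return np.sum((O - n * cell) ** 2 / (n * cell))

# theta-hat minimizes the chi-square "distance" between observed and
# expected cell probabilities.
theta_hat = optimize.minimize_scalar(chisq, bounds=(-3, 3), method="bounded").x
X2, df = chisq(theta_hat), k - 1 - 1        # k - q - 1 degrees of freedom
print(f"X2 = {X2:.2f}, p-value = {stats.chi2.sf(X2, df):.3f}")
```

The dissertation's generalization replaces the empirical cell counts with cell probabilities computed from a nonparametric estimator such as the Kaplan-Meier curve, which the classical statistic above cannot accommodate.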
43

Individual Patient-Level Data Meta-Analysis: A Comparison of Methods for the Diverse Populations Collaboration Data Set

DerSimonian and Laird define meta-analysis as "the statistical analysis of a collection of analytic results for the purpose of integrating their findings." One alternative to classical meta-analytic approaches is known as Individual Patient-Level Data, or IPD, meta-analysis. Rather than depending on summary statistics calculated for individual studies, IPD meta-analysis analyzes the complete data from all included studies. Two potential approaches to incorporating IPD into the meta-analytic framework are investigated. A two-stage analysis is first conducted, in which individual models are fit for each study and summarized using classical meta-analysis procedures. Second, a one-stage approach that models the data jointly and summarizes the information across studies is investigated. Data from the Diverse Populations Collaboration (DPC) data set are used to investigate the differences between these two methods in a specific example. The bootstrap procedure is used to determine whether the two methods produce statistically different results in the DPC example. Finally, a simulation study is conducted to investigate the accuracy of each method in given scenarios. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Ph.D. / Degree Awarded: Spring Semester, 2011. / Date of Defense: December 2, 2010. / Individual Patient-level Data, IPD meta-analysis, Meta-analysis / Includes bibliographical references. / Daniel McGee, Professor Directing Dissertation; Betsy Becker, University Representative; Xufeng Niu, Committee Member; Jinfeng Zhang, Committee Member.
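The two approaches can be contrasted on a toy linear model (hypothetical data; the DPC analyses themselves involve far richer models):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical IPD: five studies share a true slope of 0.5 but differ in
# intercept and sample size.
sizes, intercepts = [80, 120, 60, 150, 100], [0.0, 0.3, -0.2, 0.1, 0.5]
studies = []
for n, a in zip(sizes, intercepts):
    x = rng.normal(size=n)
    studies.append((x, a + 0.5 * x + rng.normal(size=n)))

# Two-stage: estimate the slope and its variance within each study, then
# pool the per-study estimates with inverse-variance weights.
est, var = [], []
for x, y in studies:
    X = np.column_stack([np.ones_like(x), x])
    beta, ssr, *_ = np.linalg.lstsq(X, y, rcond=None)
    cov = ssr[0] / (len(y) - 2) * np.linalg.inv(X.T @ X)
    est.append(beta[1]); var.append(cov[1, 1])
w = 1.0 / np.asarray(var)
print("two-stage pooled slope:", np.sum(w * est) / np.sum(w))

# One-stage: stack all rows into one model with study-specific intercepts
# (dummy columns) and a single common slope.
rows, ys = [], []
for j, (x, y) in enumerate(studies):
    D = np.zeros((x.size, len(studies)))
    D[:, j] = 1.0
    rows.append(np.column_stack([D, x])); ys.append(y)
beta, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(ys), rcond=None)
print("one-stage common slope:", beta[-1])
```

With a common true effect the two summaries roughly agree; differences emerge when effects vary across studies, which is what the dissertation's bootstrap comparison probes.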
44

Minimax Tests for Nonparametric Alternatives with Applications to High Frequency Data

We present a general methodology for developing asymptotically distribution-free, asymptotically minimax tests. The tests are constructed via a nonparametric density-quantile function, and the limiting distribution is derived by a martingale approach. The procedure can be viewed as a novel nonparametric extension of the classical parametric likelihood ratio test. The proposed tests are shown to be omnibus within an extremely large class of nonparametric global alternatives characterized by simple conditions. Furthermore, we establish that the proposed tests provide better minimax distinguishability: they have much greater power for detecting high-frequency nonparametric alternatives than classical tests such as the Kolmogorov-Smirnov and Cramer-von Mises tests. The good performance of the proposed tests is demonstrated by Monte Carlo simulations and applications in High Energy Physics. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Degree Awarded: Summer Semester, 2006. / Date of Defense: April 24, 2006. / Nonparametric Alternatives, Nonparametric Likelihood Ratio, Minimaxity, Kullback-Leibler / Includes bibliographical references. / Kai-Sheng Song, Professor Directing Dissertation; Jack Quine, Outside Committee Member; Fred Huffer, Committee Member; Dan McGee, Committee Member.
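The motivation can be seen in a small Monte Carlo sketch (illustrative only, not the proposed test): an oscillating departure from uniformity of higher frequency moves the empirical CDF less, so Kolmogorov-Smirnov power collapses even though the density deviation stays the same size.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def sample_cosine(n, k):
    # Rejection sampling from density 1 + 0.5*cos(2*pi*k*x) on [0, 1].
    out = np.empty(0)
    while out.size < n:
        x, u = rng.uniform(size=(2, 2 * n))
        out = np.concatenate([out, x[u * 1.5 <= 1 + 0.5 * np.cos(2 * np.pi * k * x)]])
    return out[:n]

# KS rejection rate at level 0.05 against a low- vs high-frequency alternative.
for k in (1, 10):
    rej = sum(stats.kstest(sample_cosine(500, k), "uniform").pvalue < 0.05
              for _ in range(200))
    print(f"oscillation frequency k={k:2d}: KS power ≈ {rej / 200:.2f}")
```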
45

Testing for the Equality of Two Distributions on High Dimensional Object Spaces and Nonparametric Inference for Location Parameters

Our view is that while some of the basic principles of data analysis are going to remain unchanged, others are to be gradually replaced with geometry and topology methods. Linear methods still make sense for functional data analysis, or in the context of tangent bundles of object spaces. Complex nonstandard data is represented on object spaces. An object space admitting a manifold stratification may be embedded in a Euclidean space. One defines the extrinsic energy distance associated with two probability measures on an arbitrary object space embedded in a numerical space, and one introduces an extrinsic energy statistic to test for homogeneity of distributions of two random objects (r.o.'s) on such an object space. This test is validated via a simulation example on the Kendall space of planar k-ads with a Veronese-Whitney (VW) embedding. One considers an application to medical imaging: testing for homogeneity of the distributions of Kendall shapes of the midsections of the corpus callosum in a clinically normal population vs. a population of ADHD-diagnosed individuals. Surprisingly, due to the high dimensionality, these distributions are not significantly different, although they are known to have significantly different VW-means. New spread and location parameters are to be added to reflect the nontrivial topology of certain object spaces. TDA is going to be adapted to object spaces, and hypothesis testing for distributions is going to be based on extrinsic energy methods. For a random point on an object space embedded in a Euclidean space, the mean vector cannot be represented as a point on that space, except when the embedded space is convex. To address this shortcoming, since the mean vector is the minimizer of the expected squared distance, following Frechet (1948), on an embedded compact object space one may consider both minimizers and maximizers of the expected squared distance to a given point on the embedded object space as the mean, respectively the anti-mean, of the random point. Of all distances on an object space, one considers here the chord distance associated with the embedding of the object space, since for such distances one can give a necessary and sufficient condition for the existence of a unique Frechet mean (respectively Frechet anti-mean). For such distributions these location parameters are called the extrinsic mean (respectively extrinsic anti-mean), and the corresponding sample statistics are consistent estimators of their population counterparts. Moreover, around an extrinsic mean (or anti-mean) located at a smooth point, one derives the limit distribution of such estimators. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2017. / June 14, 2017. / Includes bibliographical references. / Vic Patrangenaru, Professor Directing Dissertation; Washington Mio, University Representative; Adrian Barbu, Committee Member; Jonathan Bradley, Committee Member.
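For intuition, on the simplest embedded object space, the unit circle in the plane with the chord distance, the extrinsic mean and anti-mean are the nearest and farthest projections of the Euclidean mean onto the circle, and both are unique exactly when that mean is nonzero. A short sketch:

```python
import numpy as np

rng = np.random.default_rng(4)

# Sample angles concentrated around 45 degrees, mapped to the unit circle
# S^1 embedded in R^2.
theta = rng.normal(np.pi / 4, 0.3, size=200)
pts = np.column_stack([np.cos(theta), np.sin(theta)])

# For the chord (Euclidean) distance, E||x - p||^2 = 2 - 2 p.mu on the unit
# circle, so the minimizer is the radial projection of mu and the maximizer
# is its antipode; both exist uniquely iff mu != 0.
mu = pts.mean(axis=0)
assert np.linalg.norm(mu) > 0, "extrinsic mean/anti-mean not unique"
extrinsic_mean = mu / np.linalg.norm(mu)
extrinsic_anti_mean = -extrinsic_mean

print("extrinsic mean:", extrinsic_mean.round(3))
print("extrinsic anti-mean:", extrinsic_anti_mean.round(3))
```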
46

Regression Methods for Skewed and Heteroscedastic Response with High-Dimensional Covariates

The rise of studies with high-dimensional potential covariates has invited a renewed interest in dimension reduction that promotes more parsimonious models, ease of interpretation and computational tractability. However, current variable selection methods restricted to continuous response often assume a Gaussian response for methodological as well as theoretical developments. In this thesis, we consider regression models that induce sparsity, gain prediction power, and accommodate response distributions beyond the Gaussian with common variance. The first part of this thesis is a transform-both-sides Bayesian variable selection model (TBS) which allows skewness, heteroscedasticity and extremely heavy-tailed responses. Our method develops a framework which facilitates computationally feasible inference in spite of inducing non-local priors on the original regression coefficients. Even though the transformed conditional mean is no longer linear with respect to covariates, we still prove the consistency of our Bayesian TBS estimators. Simulation studies and real data analysis demonstrate the advantages of our methods. Another main part of this thesis addresses the above challenges from a frequentist standpoint. This model incorporates a penalized likelihood to accommodate skewed responses arising from an epsilon-skew-normal (ESN) distribution. With suitable optimization techniques to handle this two-piece penalized likelihood, our method demonstrates substantial gains in sensitivity and specificity even under high-dimensional settings. We conclude this thesis with a novel Bayesian semi-parametric modal regression method along with its implementation and simulation studies. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2017. / June 9, 2017. / Includes bibliographical references. / Debajyoti Sinha, Professor Directing Dissertation; Miles Taylor, University Representative; Debdeep Pati, Committee Member; Yiyuan She, Committee Member; Yun Yang, Committee Member.
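For reference, here is a sketch of the epsilon-skew-normal density in the common two-piece parameterization of Mudholkar and Hutson (2000); the thesis's exact parameterization may differ in details.

```python
import numpy as np
from scipy import stats

# Epsilon-skew-normal density: the halves left and right of the mode theta
# get scales sigma*(1 + eps) and sigma*(1 - eps), so eps in (-1, 1) controls
# skewness and eps = 0 recovers N(theta, sigma^2). The two half-masses are
# (1 + eps)/2 and (1 - eps)/2, so the density integrates to one.
def esn_pdf(x, theta=0.0, sigma=1.0, eps=0.3):
    x = np.asarray(x, dtype=float)
    scale = np.where(x < theta, sigma * (1 + eps), sigma * (1 - eps))
    return stats.norm.pdf((x - theta) / scale) / sigma

grid = np.linspace(-10.0, 10.0, 20001)
print("integrates to:", esn_pdf(grid).sum() * (grid[1] - grid[0]))  # ~1.0
print("density at the mode:", float(esn_pdf(0.0)))                  # 1/sqrt(2*pi)
```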
47

Nonparametric Change Point Detection Methods for Profile Variability

Because seeing profile changes matters in devices such as medical apparatus, detecting change points in the variability of a sequence of functions is important. In a sequence of functional observations (each of the same length), we wish to determine as quickly as possible when a change in the observations has occurred. Wavelet-based change point methods are proposed that determine when the variability of the noise in a sequence of functional profiles (e.g., the precision profile of a medical device) goes out of control, either from a known, fixed value or from an estimated in-control value. Various methods have been proposed that focus on changes in the form of the function. One method, the NEWMA, based on the EWMA, focuses on changes in both the function and the noise; its drawback, however, is that the form of the in-control function must be known. Other methods, including the χ² methods for Phase I and Phase II, make some assumption about the function. Our interest, by contrast, is in detecting changes in the variance from one function to the next. In particular, we are interested not in differences from one profile to another (variance between), but rather in differences in variance within profiles (variance within). The functional portion of the profiles is allowed to come from a large class of functions and may vary from profile to profile. The estimator is evaluated under a variety of conditions, including allowing the wavelet noise subspace to be substantially contaminated by the profile's functional structure, and is compared to two competing noise monitoring methods. Nikoo and Noorossana (2013) propose a nonparametric wavelet regression method that uses two change point techniques to monitor the variance: a nonparametric control chart, via the mean of m median control charts, and a parametric control chart, via the χ² distribution. We propose improvements to their method by incorporating prior data and making use of likelihood ratios. Our methods make use of the orthogonal properties of wavelet projections to accurately and efficiently monitor the level of noise from one profile to the next and to detect changes in noise in a Phase II setting. We show through simulation results that our proposed methods have better power and are more robust against the confounding effect between variance estimation and function estimation. The proposed methods are shown, through an extensive simulation study, to be very efficient at detecting when the variability has changed. Extensions are considered that explore the use of windowing and estimated in-control values for the MAD method, and the effect of the exact distribution under normality rather than the asymptotic distribution. These developments are implemented in the parametric, nonparametric scale, and completely nonparametric settings. The proposed methodologies are tested through simulation, are applicable to various biometric and health related topics, and have the potential to improve computational efficiency and reduce the number of assumptions required. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2017. / April 25, 2017. / Change Point, Noise Profiling, Nonparametric, Profile Variability, Statistical Process Control, Wavelets / Includes bibliographical references. / Eric Chicken, Professor Directing Dissertation; Guosheng Liu, University Representative; Debajyoti Sinha, Committee Member; Xin Zhang, Committee Member.
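A sketch of the core idea under assumed details (a db4 wavelet and a MAD scale estimate; not the dissertation's exact procedure): the finest-scale wavelet detail coefficients are dominated by noise, so a robust scale estimate computed from them monitors within-profile variability even while the functional form changes from profile to profile.

```python
import numpy as np
import pywt  # PyWavelets

# Robust noise-level estimate from the finest-scale detail coefficients.
def noise_sigma(profile, wavelet="db4"):
    detail = pywt.wavedec(profile, wavelet, level=1)[-1]   # finest level
    return np.median(np.abs(detail - np.median(detail))) / 0.6745

rng = np.random.default_rng(5)
t = np.linspace(0.0, 1.0, 256)
for i in range(6):
    sigma = 0.1 if i < 4 else 0.3              # noise SD shifts at profile 4
    profile = np.sin(2 * np.pi * (i + 1) * t)  # the function itself varies
    noisy = profile + rng.normal(0.0, sigma, t.size)
    print(f"profile {i}: sigma_hat = {noise_sigma(noisy):.3f}")
```

A control chart on these per-profile estimates would then signal when the noise level leaves its in-control value, regardless of how the smooth part changes.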
48

Scalable and Structured High Dimensional Covariance Matrix Estimation

With rapid advances in data acquisition and storage techniques, modern scientific investigations in epidemiology, genomics, imaging and networks are increasingly producing challenging data structures in the form of high-dimensional vectors, matrices and multiway arrays (tensors), rendering traditional statistical and computational tools inappropriate. One hope for meaningful inference in such situations is to discover an inherent lower-dimensional structure that explains the physical or biological process generating the data. The structural assumptions impose constraints that force the objects of interest to lie in lower-dimensional spaces, thereby facilitating their estimation and interpretation while, at the same time, reducing the computational burden. The assumption of an inherent structure, motivated by various scientific applications, is often adopted as the guiding light in the analysis and is fast becoming a standard tool for parsimonious modeling of such high-dimensional data structures. The content of this thesis is specifically directed towards the methodological development of statistical tools, with attractive computational properties, for drawing meaningful inferences through such structures. The third chapter of this thesis proposes a distributed computing framework, based on a divide-and-conquer strategy and hierarchical modeling, to accelerate posterior inference for high-dimensional Bayesian factor models. Our approach distributes the task of high-dimensional covariance matrix estimation to multiple cores, solves each subproblem separately via a latent factor model, and then combines these estimates to produce a global estimate of the covariance matrix. Existing divide-and-conquer methods focus exclusively on dividing the total number of observations n into subsamples while keeping the dimension p fixed. The approach is novel in this regard: it includes all of the n samples in each subproblem and, instead, splits the dimension p into smaller subsets for each subproblem. The subproblems themselves can be challenging to solve when p is large due to the dependencies across dimensions. To circumvent this issue, a novel hierarchical structure is specified on the latent factors that allows for flexible dependencies across dimensions while still maintaining computational efficiency. Our approach is readily parallelizable and is shown to be several orders of magnitude more computationally efficient than fitting a full factor model. The fourth chapter of this thesis proposes a novel way of estimating a covariance matrix that can be represented as a sum of a low-rank matrix and a diagonal matrix. The proposed method compresses high-dimensional data, computes the sample covariance in the compressed space, and lifts it back to the ambient space via a decompression operation. A salient feature of our approach relative to the existing literature on combining sparsity and low-rank structures in covariance matrix estimation is that we do not require the low-rank component to be sparse. A principled framework for estimating the compressed dimension using Stein's Unbiased Risk Estimation theory is demonstrated. In the final chapter of this thesis, we tackle the problem of variable selection in high dimensions. Consistent model selection in high dimensions has received substantial interest in recent years and is an extremely challenging problem for Bayesians. The literature on model selection with continuous shrinkage priors is even less developed, due to the unavailability of exact zeros in the posterior samples of the parameter of interest. Heuristic methods based on thresholding the posterior mean are often used in practice, but these lack theoretical justification, and inference is highly sensitive to the choice of the threshold. We aim to address the problem of selecting variables through a novel method of post-processing the posterior samples. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2017. / May 16, 2017. / Bayesian, Compressed Sensing, Covariance Matrix, Divide and Conquer, Factor models, Low-Rank / Includes bibliographical references. / Debdeep Pati, Professor Directing Dissertation; Alec Kercheval, University Representative; Debajyoti Sinha, Committee Member; Eric Chicken, Committee Member.
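The low-rank-plus-diagonal structure targeted in the fourth chapter can be illustrated with a simple eigendecomposition-based fit (a sketch under hypothetical settings, not the proposed compression-based estimator):

```python
import numpy as np

# Fit Sigma ≈ L L^T + D by keeping the top-k eigencomponents of the sample
# covariance as the low-rank part and absorbing the remainder's diagonal,
# as in a fitted factor model.
rng = np.random.default_rng(6)
n, p, k = 400, 50, 3

L_true = rng.normal(size=(p, k))
X = rng.normal(size=(n, k)) @ L_true.T + rng.normal(size=(n, p))  # factors + noise

S = np.cov(X, rowvar=False)
vals, vecs = np.linalg.eigh(S)                    # eigenvalues ascending
L_hat = vecs[:, -k:] * np.sqrt(vals[-k:])         # rank-k component
D_hat = np.diag(np.diag(S - L_hat @ L_hat.T))     # nonnegative diagonal remainder

Sigma_true = L_true @ L_true.T + np.eye(p)
err = np.linalg.norm(L_hat @ L_hat.T + D_hat - Sigma_true)
print("relative error:", err / np.linalg.norm(Sigma_true))
```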
49

An Examination of the Relationship between Alcohol and Dementia in a Longitudinal Study

The high mortality rate and huge expenditure caused by dementia make it a pressing concern for public health researchers. Among the potential risk factors in diet and nutrition, the relation between alcohol usage and dementia has been investigated in many studies, but no clear picture has emerged: the association has been reported as protective, neurotoxic, U-shaped, and insignificant in different sources. An individual's alcohol usage is dynamic and can change over time; however, to our knowledge, only one study has taken this time-varying nature into account when assessing the association between alcohol intake and cognition. Using Framingham Heart Study (FHS) data, our work fills an important gap in that both alcohol use and dementia status were included in the analysis longitudinally. Furthermore, we incorporated a gender-specific categorization of alcohol consumption. In this study, we examined three aspects of the association: (1) concurrent alcohol usage and dementia, longitudinally; (2) past alcohol usage and later dementia; and (3) cumulative alcohol usage and dementia. The data consisted of 2,192 FHS participants who took Exams 17-23 during 1981-1996, which included dementia assessment, and had complete data on alcohol use (mean follow-up = 40 years) and key covariates. Cognitive status was determined using information from the Mini-Mental State Examinations (MMSE) and the examiner's assessment. Alcohol consumption was measured in oz/week and also categorized as none, moderate, and heavy. We investigated both total alcohol consumption and consumption by type of alcoholic beverage. Results showed that the association between alcohol and dementia may differ by gender and by beverage type. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2018. / May 7, 2018. / Includes bibliographical references. / Dan McGee, Professor Co-Directing Dissertation; Elizabeth H. Slate, Professor Co-Directing Dissertation; Myra M. Hurt, University Representative; Xufeng Niu, Committee Member.
50

A Study of Some Issues of Goodness-of-Fit Tests for Logistic Regression

Goodness-of-fit tests are important for assessing how well a model fits a set of observations. The Hosmer-Lemeshow (HL) test is a popular and commonly used method for assessing goodness-of-fit in logistic regression. However, there are two issues with using the HL test. The first is that one must specify the number of partition groups, and different groupings often suggest different decisions. In this study, we therefore propose several grouping tests that combine multiple HL tests with varying numbers of groups to reach a decision, instead of using one arbitrary grouping or searching for an optimal one; the best grouping is data-dependent and not easy to find. The other drawback of the HL test is that it has little power to detect missing interactions between continuous and dichotomous covariates. We therefore propose global and interaction tests to capture such violations. Simulation studies are carried out to assess the Type I error rates and powers of all the proposed tests. The tests are illustrated using the bone mineral density data from NHANES III. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2018. / July 17, 2018. / Includes bibliographical references. / Dan McGee, Professor Co-Directing Dissertation; Qing Mai, Professor Co-Directing Dissertation; Cathy Levenson, University Representative; Xufeng Niu, Committee Member.
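For concreteness, here is a minimal sketch of the standard single-grouping HL test that these proposals build on (simulated data; not the proposed combined tests). It also shows how the statistic moves with the number of groups g, which is the first issue noted above.

```python
import numpy as np
from scipy import stats

# Hosmer-Lemeshow statistic with g groups formed from quantiles of the
# fitted probabilities, compared against chi-square with g - 2 df.
def hosmer_lemeshow(y, p_hat, g=10):
    edges = np.quantile(p_hat, np.linspace(0, 1, g + 1))
    grp = np.clip(np.searchsorted(edges, p_hat, side="right") - 1, 0, g - 1)
    hl = 0.0
    for j in range(g):
        m = grp == j
        obs, exp, n_j = y[m].sum(), p_hat[m].sum(), m.sum()
        hl += (obs - exp) ** 2 / (exp * (1.0 - exp / n_j))
    return hl, stats.chi2.sf(hl, g - 2)

rng = np.random.default_rng(7)
x = rng.normal(size=1000)
p_true = 1.0 / (1.0 + np.exp(-(0.5 * x - 0.2)))
y = rng.binomial(1, p_true)

# Using the true probabilities as stand-ins for fitted values, the test
# should not reject; note how the statistic varies with the choice of g.
for g in (6, 10, 14):
    hl, pval = hosmer_lemeshow(y, p_true, g)
    print(f"g={g:2d}: HL = {hl:.2f}, p = {pval:.3f}")
```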
