41

Nonparametric Change Point Detection Methods for Profile Variability

Unknown Date (has links)
Because profile changes in devices such as medical apparatus must be detected quickly, locating the change point in the variability of a sequence of functions is an important problem. In a sequence of functional observations (each of the same length), we wish to determine as quickly as possible when a change in the observations has occurred. Wavelet-based change point methods are proposed that determine when the variability of the noise in a sequence of functional profiles (e.g., the precision profile of a medical device) goes out of control, either from a known, fixed value or from an estimated in-control value. Various existing methods focus on changes in the form of the function. One method, the NEWMA, based on the EWMA, focuses on changes in both the function and the noise; its drawback is that the form of the in-control function must be known. Other methods, including χ² charts for Phase I and Phase II, also make assumptions about the function. Our interest, however, is in detecting changes in the variance from one function to the next. In particular, we are interested not in differences from one profile to another (variance between), but rather in differences in variance within profiles (variance within). The functional portion of the profiles is allowed to come from a large class of functions and may vary from profile to profile. The estimator is evaluated under a variety of conditions, including allowing the wavelet noise subspace to be substantially contaminated by the profile's functional structure, and is compared to two competing noise-monitoring methods. Nikoo and Noorossana (2013) propose a nonparametric wavelet regression method that uses two change point techniques to monitor the variance: a nonparametric control chart, via the mean of m median control charts, and a parametric control chart, via the χ² distribution. We propose improvements to their method by incorporating prior data and making use of likelihood ratios. Our methods use the orthogonal properties of wavelet projections to accurately and efficiently monitor the level of noise from one profile to the next and to detect changes in noise in a Phase II setting. We show through simulation that our proposed methods have better power and are more robust against the confounding effect between variance estimation and function estimation. An extensive simulation study shows the proposed methods to be very efficient at detecting when the variability has changed. Extensions are considered that explore the use of windowing and estimated in-control values for the MAD method, and the effect of using the exact distribution under normality rather than the asymptotic distribution. These developments are implemented in the parametric, nonparametric scale, and completely nonparametric settings. The proposed methodologies are tested through simulation, are applicable to various biometric and health-related topics, and have the potential to improve computational efficiency and reduce the number of assumptions required. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2017. / April 25, 2017. / Change Point, Noise Profiling, Nonparametric, Profile Variability, Statistical Process Control, Wavelets / Includes bibliographical references. / Eric Chicken, Professor Directing Dissertation; Guosheng Liu, University Representative; Debajyoti Sinha, Committee Member; Xin Zhang, Committee Member.
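A minimal sketch of the general idea behind wavelet-based noise monitoring (not the dissertation's exact procedure): estimate each profile's noise level from its finest-scale wavelet detail coefficients via the MAD estimator, and flag a profile whose detail energy exceeds a simple χ² control limit for a known in-control value. The wavelet choice, the limit, and the function names are illustrative assumptions.

```python
import numpy as np
import pywt
from scipy import stats

def noise_sigma_mad(profile, wavelet="db4"):
    """Estimate the noise SD of one profile from its finest detail coefficients (MAD estimator)."""
    _, detail = pywt.dwt(profile, wavelet)          # one-level discrete wavelet transform
    return np.median(np.abs(detail)) / 0.6745       # Donoho-Johnstone MAD estimate of sigma

def out_of_control(profile, sigma0, alpha=0.005, wavelet="db4"):
    """Flag a profile whose noise variance appears to exceed the in-control value sigma0."""
    _, detail = pywt.dwt(profile, wavelet)
    n = len(detail)
    # Under (approximately) iid N(0, sigma0^2) noise, sum(d^2)/sigma0^2 is roughly chi-square(n).
    statistic = np.sum(detail ** 2) / sigma0 ** 2
    return statistic > stats.chi2.ppf(1 - alpha, df=n)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256)
in_control = np.sin(4 * np.pi * t) + rng.normal(0, 0.1, t.size)
shifted    = np.sin(4 * np.pi * t) + rng.normal(0, 0.3, t.size)   # noise-variance change
print(noise_sigma_mad(in_control), out_of_control(in_control, sigma0=0.1))
print(noise_sigma_mad(shifted),    out_of_control(shifted, sigma0=0.1))
```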
42

Scalable and Structured High Dimensional Covariance Matrix Estimation

Unknown Date (has links)
With rapid advances in data acquisition and storage techniques, modern scientific investigations in epidemiology, genomics, imaging and networks are increasingly producing challenging data structures in the form of high-dimensional vectors, matrices and multiway arrays (tensors), rendering traditional statistical and computational tools inappropriate. One hope for meaningful inferences in such situations is to discover an inherent lower-dimensional structure that explains the physical or biological process generating the data. The structural assumptions impose constraints that force the objects of interest to lie in lower-dimensional spaces, thereby facilitating their estimation and interpretation and, at the same time, reducing computational burden. The assumption of an inherent structure, motivated by various scientific applications, is often adopted as the guiding light in the analysis and is fast becoming a standard tool for parsimonious modeling of such high dimensional data structures. The content of this thesis is specifically directed towards methodological development of statistical tools, with attractive computational properties, for drawing meaningful inferences through such structures. The third chapter of this thesis proposes a distributed computing framework, based on a divide and conquer strategy and hierarchical modeling, to accelerate posterior inference for high-dimensional Bayesian factor models. Our approach distributes the task of high-dimensional covariance matrix estimation to multiple cores, solves each subproblem separately via a latent factor model, and then combines these estimates to produce a global estimate of the covariance matrix. Existing divide and conquer methods focus exclusively on dividing the total number of observations n into subsamples while keeping the dimension p fixed. Our approach is novel in this regard: it includes all of the n samples in each subproblem and, instead, splits the dimension p into smaller subsets for each subproblem. The subproblems themselves can be challenging to solve when p is large due to the dependencies across dimensions. To circumvent this issue, a novel hierarchical structure is specified on the latent factors that allows for flexible dependencies across dimensions, while still maintaining computational efficiency. Our approach is readily parallelizable and is shown to be several orders of magnitude more computationally efficient than fitting a full factor model. The fourth chapter of this thesis proposes a novel way of estimating a covariance matrix that can be represented as a sum of a low-rank matrix and a diagonal matrix. The proposed method compresses high-dimensional data, computes the sample covariance in the compressed space, and lifts it back to the ambient space via a decompression operation. A salient feature of our approach relative to the existing literature on combining sparsity and low-rank structures in covariance matrix estimation is that we do not require the low-rank component to be sparse. A principled framework for estimating the compressed dimension using Stein's Unbiased Risk Estimation theory is demonstrated. In the final chapter of this thesis, we tackle the problem of variable selection in high dimensions. Consistent model selection in high dimensions has received substantial interest in recent years and is an extremely challenging problem for Bayesians.
The literature on model selection with continuous shrinkage priors is even less developed due to the unavailability of exact zeros in the posterior samples of the parameters of interest. Heuristic methods based on thresholding the posterior mean are often used in practice but lack theoretical justification, and inference is highly sensitive to the choice of the threshold. We aim to address the problem of selecting variables through a novel method of post-processing the posterior samples. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2017. / May 16, 2017. / Bayesian, Compressed Sensing, Covariance Matrix, Divide and Conquer, Factor models, Low-Rank / Includes bibliographical references. / Debdeep Pati, Professor Directing Dissertation; Alec Kercheval, University Representative; Debajyoti Sinha, Committee Member; Eric Chicken, Committee Member.
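A hedged sketch of the compress / covariance / decompress idea described in the fourth chapter summary above. The random-projection compression and pseudo-inverse "lift" used here are illustrative assumptions, not the estimator or the SURE-based choice of compressed dimension developed in the dissertation.

```python
import numpy as np

def compressed_covariance(X, k, seed=None):
    """X: n x p data matrix; k: compressed dimension (k << p). Returns a p x p estimate."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Phi = rng.normal(0.0, 1.0 / np.sqrt(k), size=(p, k))   # compression matrix (assumed Gaussian)
    Z = X @ Phi                                             # n x k compressed data
    S_z = np.cov(Z, rowvar=False)                           # sample covariance in compressed space
    lift = Phi @ np.linalg.pinv(Phi.T @ Phi)                # simple decompression operator
    low_rank = lift @ S_z @ lift.T                          # lifted low-rank component
    diag = np.maximum(np.var(X, axis=0) - np.diag(low_rank), 0.0)
    return low_rank + np.diag(diag)                         # low-rank + diagonal estimate

rng = np.random.default_rng(1)
n, p, r = 200, 500, 5
L = rng.normal(size=(p, r))
X = rng.normal(size=(n, r)) @ L.T + rng.normal(scale=0.5, size=(n, p))
Sigma_hat = compressed_covariance(X, k=50, seed=2)
print(Sigma_hat.shape)
```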
43

An Examination of the Relationship between Alcohol and Dementia in a Longitudinal Study

Unknown Date (has links)
The high mortality rate and huge expenditure caused by dementia make it a pressing concern for public health researchers. Among the potential risk factors in diet and nutrition, the relation between alcohol usage and dementia has been investigated in many studies, but no clear picture has emerged. The association has been reported as protective, neurotoxic, U-shaped, and insignificant in different sources. An individual's alcohol usage is dynamic and can change over time; however, to our knowledge, only one study has taken this time-varying nature into account when assessing the association between alcohol intake and cognition. Using Framingham Heart Study (FHS) data, our work fills an important gap in that both alcohol use and dementia status were included in the analysis longitudinally. Furthermore, we incorporated a gender-specific categorization of alcohol consumption. In this study, we examined three aspects of the association: (1) concurrent alcohol usage and dementia, longitudinally; (2) past alcohol usage and later dementia; and (3) cumulative alcohol usage and dementia. The data consisted of 2,192 FHS participants who took Exams 17-23 during 1981-1996, which included dementia assessment, and had complete data on alcohol use (mean follow-up = 40 years) and key covariates. Cognitive status was determined using information from the Mini-Mental State Examination (MMSE) and the examiner's assessment. Alcohol consumption was measured in oz/week and also categorized as none, moderate, and heavy. We investigated both total alcohol consumption and consumption by type of alcoholic beverage. Results showed that the association between alcohol and dementia may differ by gender and by beverage type. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2018. / May 7, 2018. / Includes bibliographical references. / Dan McGee, Professor Co-Directing Dissertation; Elizabeth H. Slate, Professor Co-Directing Dissertation; Myra M. Hurt, University Representative; Xufeng Niu, Committee Member.
44

A Study of Some Issues of Goodness-of-Fit Tests for Logistic Regression

Unknown Date (has links)
Goodness-of-fit tests are important for assessing how well a model fits a set of observations. The Hosmer-Lemeshow (HL) test is a popular and commonly used method for assessing goodness-of-fit in logistic regression. However, there are two issues with using the HL test. The first is that the number of partition groups must be specified, and different groupings often suggest different decisions. In this study, we therefore propose several grouping tests that combine multiple HL tests with varying numbers of groups, instead of using one arbitrary grouping or searching for an optimal one, since the best choice of groups is data-dependent and not easy to find. The other drawback of the HL test is that it has little power to detect missing interactions between continuous and dichotomous covariates. Therefore, we propose global and interaction tests in order to capture such violations. Simulation studies are carried out to assess the Type I error rates and power of all the proposed tests. The tests are illustrated with the bone mineral density data from NHANES III. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2018. / July 17, 2018. / Includes bibliographical references. / Dan McGee, Professor Co-Directing Dissertation; Qing Mai, Professor Co-Directing Dissertation; Cathy Levenson, University Representative; Xufeng Niu, Committee Member.
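To illustrate the sensitivity to the number of groups, the sketch below computes the classic HL statistic for several choices of g and combines the resulting p-values with a simple Bonferroni minimum-p rule. The combination rule and group sizes are placeholders for illustration, not the grouping tests proposed in the dissertation.

```python
import numpy as np
from scipy import stats

def hosmer_lemeshow(y, p_hat, g=10):
    """Classic HL statistic with g groups formed from quantiles of the fitted probabilities."""
    edges = np.quantile(p_hat, np.linspace(0, 1, g + 1))
    groups = np.clip(np.searchsorted(edges, p_hat, side="right") - 1, 0, g - 1)
    stat = 0.0
    for j in range(g):
        idx = groups == j
        n_j, obs, exp = idx.sum(), y[idx].sum(), p_hat[idx].sum()
        if n_j == 0:
            continue
        pbar = exp / n_j
        stat += (obs - exp) ** 2 / (n_j * pbar * (1 - pbar) + 1e-12)
    return stat, stats.chi2.sf(stat, df=g - 2)       # usual chi-square(g-2) reference

def combined_hl(y, p_hat, group_sizes=(6, 8, 10, 12)):
    """Illustrative combination: Bonferroni-adjusted minimum p-value over several groupings."""
    pvals = [hosmer_lemeshow(y, p_hat, g)[1] for g in group_sizes]
    return min(min(pvals) * len(pvals), 1.0)

rng = np.random.default_rng(0)
x = rng.normal(size=500)
p_true = 1 / (1 + np.exp(-(0.5 + x)))
y = rng.binomial(1, p_true)
print(combined_hl(y, p_true))                        # well-specified model: large p-value expected
```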
45

Elastic Functional Regression Model

Unknown Date (has links)
Functional variables serve important roles as predictors in a variety of pattern recognition and vision applications. Focusing on a specific subproblem, termed scalar-on-function regression, most current approaches adopt the standard L2 inner product to form a link between functional predictors and scalar responses. These methods may perform poorly when predictor functions contain nuisance phase variability, i.e., predictors are temporally misaligned due to noise. While a simple solution could be to pre-align predictors as a pre-processing step before applying a regression model, this alignment is seldom optimal from the perspective of regression. In this dissertation, we propose a new approach, termed elastic functional regression, where alignment is included in the regression model itself, and is performed in conjunction with the estimation of other model parameters. This model is based on a norm-preserving warping of predictors, not the standard time warping of functions, and provides better prediction in situations where the shape or the amplitude of the predictor is more useful than its phase. We demonstrate the effectiveness of this framework using simulated and real data. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 17, 2018. / Functional Data Analysis, Functional Regression Model, Phase Variation, Scalar-on-Function Regression / Includes bibliographical references. / Anuj Srivastava, Professor Directing Thesis; Eric Klassen, University Representative; Wei Wu, Committee Member; Fred Huffer, Committee Member.
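For reference, here is a minimal sketch of the standard (non-elastic) scalar-on-function baseline mentioned above, y_i = a + ⟨x_i, β⟩ + ε_i, with the L2 inner product discretized on a grid and β expanded in a small Fourier basis. The basis, grid, and function names are assumptions; the elastic (norm-preserving warping) step that the dissertation adds on top of this baseline is not shown.

```python
import numpy as np

def fourier_basis(t, K):
    """Constant plus K sine/cosine pairs evaluated on grid t."""
    cols = [np.ones_like(t)]
    for k in range(1, K + 1):
        cols += [np.sin(2 * np.pi * k * t), np.cos(2 * np.pi * k * t)]
    return np.column_stack(cols)                     # len(t) x (2K+1)

def fit_sofr(X, y, t, K=3):
    """X: n x len(t) discretized predictors; returns intercept and beta(t) on the grid."""
    B = fourier_basis(t, K)
    dt = t[1] - t[0]
    Z = X @ B * dt                                   # <x_i, basis_j> by a Riemann sum
    design = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], B @ coef[1:]                     # intercept, estimated beta on t

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 101)
beta_true = np.sin(2 * np.pi * t)
X = rng.normal(size=(150, t.size))
y = X @ beta_true * (t[1] - t[0]) + rng.normal(scale=0.02, size=150)
a_hat, beta_hat = fit_sofr(X, y, t)
print(round(a_hat, 3), round(np.corrcoef(beta_hat, beta_true)[0, 1], 3))
```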
46

Elastic Functional Principal Component Analysis for Modeling and Testing of Functional Data

Unknown Date (has links)
Statistical analysis of functional data requires tools for comparing, summarizing and modeling observed functions as elements of a function space. A key issue in Functional Data Analysis (FDA) is the presence of phase variability in the observed data. A successful statistical model of functional data has to account for phase variability; otherwise, the ensuing inferences can be inferior. Recent methods for FDA include steps for phase separation or functional alignment. For example, Elastic Functional Principal Component Analysis (Elastic FPCA) uses the strengths of Functional Principal Component Analysis (FPCA), along with tools from Elastic FDA, to perform joint phase-amplitude separation and modeling. A related problem in FDA is to quantify and test for the amount of phase variability in given data. We develop two types of hypothesis tests for testing the significance of phase variability: a metric-based approach and a model-based approach. The metric-based approach treats phase and amplitude as independent components and uses their respective metrics to apply the Friedman-Rafsky test, Schilling's nearest neighbors test, and the energy test to assess the differences between functions and their amplitudes. In the model-based approach, we use the Concordance Correlation Coefficient as a tool to quantify the agreement between functions and their reconstructions using FPCA and Elastic FPCA. We demonstrate this framework on a number of simulated and real datasets, including the weather, Tecator, and growth data. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 19, 2018. / Includes bibliographical references. / Anuj Srivastava, Professor Directing Thesis; Eric Klassen, University Representative; Fred Huffer, Committee Member; Wei Wu, Committee Member.
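A small sketch of Lin's concordance correlation coefficient (CCC), the agreement measure referenced in the model-based test above, applied to a function and a reconstruction. The "reconstruction" here is a noisy stand-in rather than an FPCA or Elastic FPCA model fit, and the function names are illustrative.

```python
import numpy as np

def ccc(x, y):
    """Lin's concordance correlation coefficient between two equal-length vectors."""
    sxy = np.cov(x, y, bias=True)[0, 1]
    return 2 * sxy / (np.var(x) + np.var(y) + (np.mean(x) - np.mean(y)) ** 2)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
f = np.sin(2 * np.pi * t)                             # "true" function
f_observed = f + rng.normal(scale=0.2, size=t.size)   # noisy observation
f_recon = f + rng.normal(scale=0.05, size=t.size)     # stand-in for an FPCA reconstruction
print(round(ccc(f_observed, f), 3), round(ccc(f_recon, f), 3))   # better reconstruction -> higher CCC
```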
47

Building a Model Performance Measure for Examining Clinical Relevance Using Net Benefit Curves

Unknown Date (has links)
ROC curves are often used to evaluate the predictive accuracy of statistical prediction models. This thesis studies other measures that incorporate not only the statistical but also the clinical consequences of using a particular prediction model. Depending on the disease and population under study, the misclassification costs of false positives and false negatives vary. Decision Curve Analysis (DCA) takes this cost into account by using the threshold probability (the probability above which a patient opts for treatment). Using the DCA technique, a net benefit curve is built by plotting "Net Benefit", a function of the expected benefit and expected harm of using a model, against the threshold probability. Only the threshold probability range relevant to the disease and the population under study is used when plotting the net benefit curve for a particular statistical model. This thesis concentrates on constructing a summary measure to determine which predictive model yields the highest net benefit. The most intuitive approach is to calculate the area under the net benefit curve. We examined whether using weights, such as the estimated empirical distribution of the threshold probability, to compute a weighted area under the curve creates a better summary measure. Real data from multiple cardiovascular research studies, the Diverse Population Collaboration (DPC) datasets, are used to compute the summary measures: area under the ROC curve (AUROC), area under the net benefit curve (ANBC), and weighted area under the net benefit curve (WANBC). The results of the analysis are used to compare these measures, to examine whether they agree with each other, and to determine which would be best to use in specified clinical scenarios. For different models, the summary measures and their standard errors (SE) were calculated to study the variability in each measure. Meta-analysis is used to summarize these estimated summary measures and to reveal whether there is significant variability among the studies. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 11, 2018. / Area under ROC Curve, Meta analysis, Net Benefit Curve, Predictive Accuracy, Summary Measure, Threshold Probability / Includes bibliographical references. / Daniel L. McGee, Professor Directing Dissertation; Myra Hurt, University Representative; Elizabeth Slate, Committee Member; Debajyoti Sinha, Committee Member.
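A hedged sketch of the decision-curve quantities discussed above: the standard net benefit of a risk model at threshold probability pt, NB(pt) = TP/n − (FP/n)·pt/(1 − pt), and simple unweighted and weighted areas under the net benefit curve. The threshold range and the weight function are assumed for illustration and are not the WANBC construction of the dissertation.

```python
import numpy as np

def net_benefit(y, risk, pt):
    """Standard DCA net benefit of treating patients with predicted risk >= pt."""
    treat = risk >= pt
    n = len(y)
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    return tp / n - (fp / n) * pt / (1 - pt)

def _trapezoid(vals, x):
    return float(np.sum((vals[1:] + vals[:-1]) / 2 * np.diff(x)))

def area_under_nb(y, risk, thresholds, weights=None):
    """(Weighted) average net benefit over a threshold range, via the trapezoid rule."""
    nb = np.array([net_benefit(y, risk, pt) for pt in thresholds])
    w = np.ones_like(nb) if weights is None else np.asarray(weights)
    return _trapezoid(nb * w, thresholds) / _trapezoid(w, thresholds)

rng = np.random.default_rng(0)
risk = rng.uniform(size=1000)
y = rng.binomial(1, risk)                          # toy, well-calibrated risk model
pts = np.linspace(0.05, 0.50, 46)                  # assumed clinically relevant range
print(area_under_nb(y, risk, pts))                 # unweighted ANBC-style summary
print(area_under_nb(y, risk, pts, weights=np.exp(-5 * pts)))   # hypothetical threshold weighting
```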
48

Non-Parametric and Semi-Parametric Estimation and Inference with Applications to Finance and Bioinformatics

Unknown Date (has links)
In this dissertation, we develop tools from non-parametric and semi-parametric statistics to perform estimation and inference. In the first chapter, we propose a new method called Non-Parametric Outlier Identification and Smoothing (NOIS), which robustly smooths stock prices, automatically detects outliers and constructs pointwise confidence bands around the resulting curves. In real-world examples of high-frequency data, NOIS successfully detects erroneous prices as outliers and uncovers borderline cases for further study. NOIS can also highlight notable features and reveal new insights in inter-day chart patterns. In the second chapter, we focus on a method for non-parametric inference called empirical likelihood (EL). Computation of EL in the case of a fixed parameter vector is a convex optimization problem easily solved by Lagrange multipliers. In the case of a composite empirical likelihood (CEL) test where certain components of the parameter vector are free to vary, the optimization problem becomes non-convex and much more difficult. We propose a new algorithm for the CEL problem named the BI-Linear Algorithm for Composite EmPirical Likelihood (BICEP). We extend the BICEP framework by introducing a new method called Robust Empirical Likelihood (REL) that detects outliers and greatly improves the inference in comparison to the non-robust EL. The REL method is combined with CEL by the TRI-Linear Algorithm for Composite EmPirical Likelihood (TRICEP). We demonstrate the efficacy of the proposed methods on simulated and real world datasets. We present a novel semi-parametric method for variable selection with interesting biological applications in the final chapter. In bioinformatics datasets the experimental units often have structured relationships that are non-linear and hierarchical. For example, in microbiome data the individual taxonomic units are connected to each other through a phylogenetic tree. Conventional techniques for selecting relevant taxa either do not account for the pairwise dependencies between taxa, or assume linear relationships. In this work we propose a new framework for variable selection called Semi-Parametric Affinity Based Selection (SPAS), which has the flexibility to utilize structured and non-parametric relationships between variables. In synthetic data experiments SPAS outperforms existing methods and on real world microbiome datasets it selects taxa according to their phylogenetic similarities. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 19, 2018. / Bioinformatics, Empirical likelihood, Finance, Non-parametric, Outlier detection, Variable selection / Includes bibliographical references. / Yiyuan She, Professor Directing Dissertation; Giray Okten, University Representative; Eric Chicken, Committee Member; Xufeng Niu, Committee Member; Minjing Tao, Committee Member.
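A compact sketch of the convex fixed-parameter case mentioned above: one-dimensional empirical likelihood for a mean, solved through its Lagrange dual (Owen's formulation). The function names are illustrative, and the composite-EL (BICEP/TRICEP) and robust-EL extensions developed in the dissertation are not implemented here.

```python
import numpy as np
from scipy import optimize, stats

def el_ratio_stat(x, mu0):
    """-2 log empirical likelihood ratio for H0: E[X] = mu0 (mu0 must lie inside the data range)."""
    z = x - mu0
    if z.min() >= 0 or z.max() <= 0:
        return np.inf                                   # mu0 outside the convex hull of the data
    eps = 1e-8
    lo = -1.0 / z.max() + eps                           # keep all weights positive
    hi = -1.0 / z.min() - eps
    g = lambda lam: np.sum(z / (1.0 + lam * z))         # dual estimating equation in lambda
    lam = optimize.brentq(g, lo, hi)                    # monotone, so a bracketing root-finder works
    w = 1.0 / (len(x) * (1.0 + lam * z))                # EL weights
    return -2.0 * np.sum(np.log(len(x) * w))

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=200)
stat = el_ratio_stat(x, mu0=1.0)
print(round(stat, 3), round(stats.chi2.sf(stat, df=1), 3))   # asymptotically chi-square(1) under H0
```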
49

Generalized Mahalanobis Depth in Point Process and Its Application in Neural Coding and Semi-Supervised Learning in Bioinformatics

Unknown Date (has links)
In the first project, we propose to generalize the notion of depth to temporal point process observations. The new depth is defined as a weighted product of two probability terms: 1) a term for the number of events in each process, and 2) a term for the center-outward ranking of the event times conditioned on the number of events. In this study, we adopt the Poisson distribution for the first term and the Mahalanobis depth for the second term. We propose an efficient bootstrapping approach to estimate the parameters in the defined depth. In the case of a Poisson process, the observed event times are order statistics, and the parameters can be estimated robustly with respect to sample size. We demonstrate the use of the new depth by ranking realizations from a Poisson process. We also test the new method in classification problems using simulations as well as real neural spike train data. It is found that the new framework provides more accurate and robust classifications than commonly used likelihood methods. In the second project, we demonstrate the value of semi-supervised dimension reduction in a clinical setting. Semi-supervised dimension reduction uses unlabeled data in addition to labeled data when performing dimension reduction, and it can help build a more precise prediction model than common supervised dimension reduction techniques. After thorough comparison with embedding methods that use labeled data only, we show the improvement offered by semi-supervised dimension reduction with unlabeled data in a breast cancer chemotherapy application. In our semi-supervised dimension reduction study, we explore not only adding unlabeled data to linear dimension reduction such as PCA, but also semi-supervised non-linear dimension reduction, such as semi-supervised LLE and semi-supervised Isomap. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / March 21, 2018. / depth, point process, semi-supervised learning / Includes bibliographical references. / Wei Wu, Professor Directing Dissertation; Xiaoqiang Wang, University Representative; Jinfeng Zhang, Committee Member; Qing Mai, Committee Member.
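A hedged sketch of the two-term depth described above: a Poisson probability for the event count multiplied by a Mahalanobis-style depth for the event-time vector, conditioned on that count. The per-count mean and covariance are estimated naively from training realizations, and the weighting scheme is a simple assumption; the robust bootstrap estimation used in the dissertation is not reproduced.

```python
import numpy as np
from scipy import stats

def point_process_depth(times, train, rate, t_max=1.0, w=0.5):
    """times: sorted event times of one realization on [0, t_max]; train: list of such arrays."""
    k = len(times)
    count_term = stats.poisson.pmf(k, mu=rate * t_max)          # term 1: event count
    same_k = [np.sort(s) for s in train if len(s) == k]
    if k == 0 or len(same_k) < k + 2:                            # too little data for a covariance
        return count_term ** w
    T = np.vstack(same_k)
    mu = T.mean(axis=0)
    cov = np.cov(T, rowvar=False) + 1e-6 * np.eye(k)
    diff = np.sort(times) - mu
    d2 = diff @ np.linalg.solve(cov, diff)
    mahal_term = 1.0 / (1.0 + d2)                                # term 2: Mahalanobis depth of times
    return count_term ** w * mahal_term ** (1 - w)               # weighted product

rng = np.random.default_rng(0)
rate, t_max = 10.0, 1.0
def simulate():
    return np.sort(rng.uniform(0, t_max, size=rng.poisson(rate * t_max)))   # homogeneous Poisson
train = [simulate() for _ in range(500)]
test = simulate()
print(len(test), point_process_depth(test, train, rate))
```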
50

Wavelet-Based Bayesian Approaches to Sequential Profile Monitoring

Unknown Date (has links)
We consider change-point detection and estimation in sequences of functional observations. This setting often arises when the quality of a process is characterized by such observations, termed profiles, and monitoring profiles for changes in structure can be used to ensure the stability of the process over time. While interest in profile monitoring has grown, few methods approach the problem from a Bayesian perspective. In this dissertation, we propose three wavelet-based Bayesian approaches to profile monitoring -- the last of which can be extended to a general process monitoring setting. First, we develop a general framework for the problem of interest in which we base inference on the posterior distribution of the change point without placing restrictive assumptions on the form of the profiles. The proposed method uses an analytic form of the posterior distribution in order to run online without relying on Markov chain Monte Carlo (MCMC) simulation. Wavelets, an effective tool for estimating nonlinear signals from noise-contaminated observations, enable the method to flexibly distinguish between sustained changes in profiles and the inherent variability of the process. Second, we modify the initial framework in a posterior approximation algorithm designed to utilize past information in a computationally efficient manner. We show that the approximation can detect changes of smaller magnitude better than traditional alternatives for curbing computational cost. Third, we introduce a monitoring scheme that allows an unchanged process to run infinitely long without a false alarm, while maintaining the ability to detect a change with probability one. We include theoretical results regarding these properties and illustrate the implementation of the scheme in the previously established framework. We demonstrate the efficacy of the proposed methods on simulated data, where they significantly outperform a relevant frequentist competitor. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 20, 2018. / Includes bibliographical references. / Eric Chicken, Professor Co-Directing Dissertation; Antonio Linero, Professor Co-Directing Dissertation; Kevin Huffenberger, University Representative; Yun Yang, Committee Member.
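A toy illustration of basing inference on a closed-form change point posterior (no MCMC): scalar summaries of the profiles (for example, a per-profile wavelet noise summary) are modeled as Gaussian with known pre- and post-change means under a flat prior on the change location. This is a simplified stand-in under stated assumptions, not the dissertation's wavelet-based posterior or its approximation algorithm.

```python
import numpy as np
from scipy import stats

def change_point_posterior(y, mu0, mu1, sigma):
    """Posterior over tau = last in-control index; tau = n means 'no change yet'."""
    n = len(y)
    loglik0 = stats.norm.logpdf(y, mu0, sigma)          # in-control log-likelihoods
    loglik1 = stats.norm.logpdf(y, mu1, sigma)          # out-of-control log-likelihoods
    cum0 = np.concatenate([[0.0], np.cumsum(loglik0)])
    cum1 = np.concatenate([[0.0], np.cumsum(loglik1)])
    # log p(y | tau = k) = sum_{i<=k} loglik0_i + sum_{i>k} loglik1_i, flat prior on k = 0..n
    logpost = np.array([cum0[k] + (cum1[n] - cum1[k]) for k in range(n + 1)])
    logpost -= logpost.max()                            # stabilize before exponentiating
    post = np.exp(logpost)
    return post / post.sum()

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 30), rng.normal(1.5, 1.0, 10)])   # change after obs 30
post = change_point_posterior(y, mu0=0.0, mu1=1.5, sigma=1.0)
print(int(np.argmax(post)), round(post[len(y)], 3))     # MAP change location, P(no change yet)
```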
