11

MONITORING AUTOCORRELATED PROCESSES

Tang, Weiping (has links)
This thesis was submitted by Weiping Tang on August 2, 2011. / Several control schemes for monitoring process mean shifts, including the cumulative sum (CUSUM), weighted cumulative sum (WCUSUM), adaptive cumulative sum (ACUSUM) and exponentially weighted moving average (EWMA) schemes, perform well in detecting constant process mean shifts. However, a variety of dynamic mean shifts occur in practice, and few control schemes work efficiently in these situations because of the limited window for catching shifts, particularly when the mean decreases rapidly. This is precisely the case when residuals from autocorrelated data are used to monitor the process mean, a feature often referred to as forecast recovery. This thesis focuses on detecting a shift in the mean of a time series when a forecast-recovery dynamic pattern is observed in the mean of the residuals. Specifically, we examine in detail several particular cases of the Autoregressive Integrated Moving Average (ARIMA) time series models. We introduce a new upper-sided control chart based on the EWMA scheme combined with the Fast Initial Response (FIR) feature. To assess chart performance we use the well-established Average Run Length (ARL) criterion, and a non-homogeneous Markov chain method is developed to compute the ARL of the proposed chart. We show numerically that the proposed procedure performs as well as or better than the WCUSUM chart introduced by Shu, Jiang and Tsui (2008), and better than the conventional CUSUM, ACUSUM and Generalized Likelihood Ratio Test (GLRT) charts. The methods are illustrated on molecular weight data from a polymer manufacturing process. / Master of Science (MSc)
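A rough sketch of the kind of chart described above is given below: residuals from a crudely fitted AR(1) model are monitored with an upper-sided EWMA that starts from a head-start value in the spirit of the FIR feature. The smoothing constant lam, control limit h, head-start fraction, and the toy data are illustrative placeholders, not the thesis's FIR adjustment or its Markov-chain-calibrated limits.

```python
import numpy as np

def upper_ewma_fir(residuals, lam=0.1, h=0.75, headstart=0.5):
    """Upper-sided EWMA chart on residuals with a FIR-style head-start.

    lam: EWMA smoothing constant; h: upper control limit;
    headstart: initial chart value as a fraction of h (FIR feature).
    Returns the index of the first signal, or None if no signal occurs.
    """
    z = headstart * h                               # start part-way to the limit
    for t, e in enumerate(residuals):
        z = max(0.0, (1 - lam) * z + lam * e)       # reflect at zero (upper-sided)
        if z > h:
            return t
    return None

# Toy example: AR(1) data with a level shift of size 2.0 after time 100.
rng = np.random.default_rng(0)
phi, n, shift_at, delta = 0.5, 200, 100, 2.0
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()
x[shift_at:] += delta                               # step shift in the level

x0 = x[:shift_at]                                   # in-control stretch
phi_hat = np.dot(x0[1:], x0[:-1]) / np.dot(x0[:-1], x0[:-1])   # crude AR(1) fit
resid = x[1:] - phi_hat * x[:-1]                    # one-step-ahead residuals
print(upper_ewma_fir(resid))                        # signals shortly after the shift
```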
12

STATISTICAL AND METHODOLOGICAL ISSUES IN EVALUATION OF INTEGRATED CARE PROGRAMS

Ye, Chenglin January 2014 (has links)
Background
Integrated care programs are collaborations to improve the delivery of health services for patients with multiple conditions.

Objectives
This thesis investigated three issues in the evaluation of integrated care programs: (1) quantifying integration for integrated care programs, (2) analyzing integrated care programs with substantial non-compliance, and (3) assessing bias when evaluating integrated care programs under different non-compliance scenarios.

Methods
Project 1: We developed a method to quantify integration through service providers' perceptions and expectations. For each provider, four integration scores were calculated, and the properties of the scores were assessed.
Project 2: A randomized controlled trial (RCT) compared the Children's Treatment Network (CTN) with usual care in managing children with complex conditions. To handle non-compliance, we employed intention-to-treat (ITT), as-treated (AT), per-protocol (PP), and instrumental variable (IV) analyses. We also investigated propensity score (PS) methods to control for potential confounding.
Project 3: Based on the CTN study, we simulated trials under different non-compliance scenarios. We then compared the ITT, AT, PP, IV, and complier average causal effect methods in analyzing the data. The results were compared in terms of bias of the estimate, mean squared error, and 95% coverage.

Results and conclusions
Project 1: We demonstrated the proposed method for measuring integration and some of its properties. Bootstrap analyses showed that the global integration score was robust. Our method extends existing measures of integration and possesses a good degree of validity.
Project 2: The CTN intervention was not significantly different from usual care in improving patients' outcomes. The study highlighted some methodological challenges in evaluating integrated care programs in an RCT setting.
Project 3: When an intervention had a moderate or large effect, the ITT analysis was considerably biased under non-compliance, and alternative analyses could provide unbiased results. To minimize this bias, we make recommendations on the choice of analysis under different scenarios. / Doctor of Philosophy (PhD)
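As a generic illustration of how the ITT and IV analyses named above differ under non-compliance (this is not the thesis's analysis, which also involved propensity scores), the sketch below computes the intention-to-treat difference in means and the Wald-type IV estimate, which corresponds to the complier average causal effect under the standard IV assumptions. The toy data and variable names are assumptions for illustration only.

```python
import numpy as np

def itt_and_iv(z, d, y):
    """ITT and Wald-type IV estimates in a trial with non-compliance.

    z: randomized assignment (0/1), d: treatment actually received (0/1),
    y: outcome. The IV (Wald) estimate divides the ITT effect on the
    outcome by the ITT effect on treatment receipt; under standard IV
    assumptions it estimates the complier average causal effect (CACE).
    """
    z, d, y = map(np.asarray, (z, d, y))
    itt = y[z == 1].mean() - y[z == 0].mean()
    uptake = d[z == 1].mean() - d[z == 0].mean()
    return itt, itt / uptake

# Toy data: 40% of the intervention arm never takes up the program.
rng = np.random.default_rng(3)
n = 2000
z = rng.integers(0, 2, n)
complier = rng.random(n) < 0.6
d = z * complier                      # one-sided non-compliance
y = 1.0 * d + rng.normal(size=n)      # true effect of receipt = 1.0
print(itt_and_iv(z, d, y))            # ITT ~ 0.6, IV ~ 1.0
```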
13

Statistical Methods for Handling Intentional Inaccurate Responders

McQuerry, Kristen J. 01 January 2016 (has links)
In self-report data, participants who provide incorrect responses are known as intentional inaccurate responders. This dissertation provides statistical analyses for addressing intentional inaccurate responses in the data. Previous work with adolescent self-report data labeled survey participants who intentionally provide inaccurate answers as mischievous responders. This phenomenon also occurs in clinical research; for example, pregnant women who smoke may report that they are nonsmokers. Our advantage is that we do not rely solely on self-reported answers and can verify responses with lab values. Currently, there is no clear method for handling these intentional inaccurate responders when making statistical inferences. We propose using an EM algorithm to account for the intentional behavior while retaining all responses in the data. The performance of this model is evaluated using simulated and real data, and the strengths and weaknesses of the EM algorithm approach are demonstrated.
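The dissertation's model is not reproduced here, but the flavor of a latent-class EM approach can be sketched with a generic two-component Gaussian mixture: a verified measurement (for example, a lab value among self-reported nonsmokers) is modeled as coming either from an accurate-reporter component or from an intentional-inaccurate-responder component, and EM alternates posterior membership probabilities with parameter updates. All distributions, starting values, and names below are illustrative assumptions.

```python
import numpy as np

def norm_pdf(y, mu, sd):
    """Normal density, used by the E-step below."""
    return np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def em_two_component(y, n_iter=200, tol=1e-8):
    """EM for a two-component Gaussian mixture (illustrative only).

    y could be, say, a biomarker measured on self-reported nonsmokers,
    with one component for accurate reporters and one for intentional
    inaccurate responders. Returns (pi, means, sds), where pi is the
    estimated proportion of the second ("inaccurate") component.
    """
    y = np.asarray(y, dtype=float)
    mu = np.array([y.min(), y.max()])            # crude starting values
    sd = np.full(2, y.std() + 1e-6)
    pi = 0.5
    loglik_old = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each subject is "inaccurate"
        d0 = (1 - pi) * norm_pdf(y, mu[0], sd[0])
        d1 = pi * norm_pdf(y, mu[1], sd[1])
        w = d1 / (d0 + d1)
        # M-step: update mixing weight, means, standard deviations
        pi = w.mean()
        mu = np.array([np.average(y, weights=1 - w), np.average(y, weights=w)])
        sd = np.sqrt(np.array([
            np.average((y - mu[0]) ** 2, weights=1 - w),
            np.average((y - mu[1]) ** 2, weights=w),
        ])) + 1e-6
        loglik = np.log(d0 + d1).sum()
        if abs(loglik - loglik_old) < tol:       # converged
            break
        loglik_old = loglik
    return pi, mu, sd
```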
14

EMPIRICAL LIKELIHOOD AND DIFFERENTIABLE FUNCTIONALS

Shen, Zhiyuan 01 January 2016 (has links)
Empirical likelihood (EL) is a recently developed nonparametric method of statistical inference. Owen (1988, 1990) and many others have shown that the empirical likelihood ratio (ELR) method can be used to produce well-behaved confidence intervals and regions. Owen (1988) shows that -2 log ELR converges to a chi-square distribution with one degree of freedom under a linear statistical functional constraint expressed in terms of distribution functions. However, generalizing Owen's result to the right-censored data setting is difficult, since no explicit maximization can be obtained under a constraint in terms of distribution functions. Pan and Zhou (2002) instead study EL with right-censored data using a linear statistical functional constraint expressed in terms of cumulative hazard functions. In this dissertation, we extend Owen's (1988) and Pan and Zhou's (2002) results to non-linear but Hadamard differentiable statistical functional constraints. For this purpose, a study of differentiable functionals with respect to hazard functions is carried out. We also generalize our results to two-sample problems. Stochastic process and martingale theory is applied to prove the theorems. Confidence intervals based on the EL method are compared with other available methods, and real data analysis and simulations are used to illustrate the proposed theorems, with an application to Gini's absolute mean difference.
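For context, the Owen (1988, 1990) result referenced above can be written in its standard form (not quoted from the dissertation): for i.i.d. observations X_1, ..., X_n and the mean functional, the empirical likelihood ratio and its limiting distribution are

\[
\mathcal{R}(\mu) = \sup\Big\{\prod_{i=1}^{n} n p_i \;:\; p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i X_i = \mu\Big\},
\qquad
-2\log \mathcal{R}(\mu_0) \;\xrightarrow{\,d\,}\; \chi^2_1,
\]

where \mu_0 is the true mean. Pan and Zhou (2002) work with the analogous linear constraint \int g \, d\Lambda = \theta on the cumulative hazard \Lambda for right-censored data; the dissertation replaces such linear constraints with Hadamard differentiable functionals.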
15

Reweighting methods in high dimensional regression

Fang, Zhou January 2012 (has links)
In this thesis, we focus on the application of covariate reweighting with Lasso-style methods for regression in high dimensions, particularly where p ≥ n, with a particular focus on sparse regression under a priori grouping structures. In such problems, even in the linear case, accurate estimation is difficult. Various authors have suggested ideas such as the Group Lasso and the Sparse Group Lasso, based on convex penalties, or alternatively methods like the Group Bridge, which rely on convergence under repetition to some local minimum of a concave penalised likelihood. We propose in this thesis a methodology that uses concave penalties to inspire a procedure in which we compute weights from an initial estimate and then perform a single reweighted Lasso fit. This procedure -- the Co-adaptive Lasso -- obtains excellent results in empirical experiments, and we present some theoretical prediction and estimation error bounds. Further, several extensions and variants of the procedure are discussed and studied. In particular, we propose a Lasso-style method for additive isotonic regression in high dimensions, the Liso algorithm, and enhance it using the Co-adaptive methodology. We also propose a method for producing rule-based regression estimates for high-dimensional non-parametric regression that often outperforms the current leading method, the RuleFit algorithm. We also discuss extensions involving robust statistics applied to weight computation, repetition of the algorithm, and online computation.
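The two-stage reweighting idea can be sketched as follows: fit an initial Lasso, convert the initial coefficients into covariate weights, and refit a single weighted Lasso by rescaling the columns. The inverse-absolute-coefficient weights and penalty levels below are adaptive-Lasso-style placeholders; the Co-adaptive Lasso's actual weights come from concave penalties and handle a priori groups, as described in the thesis.

```python
import numpy as np
from sklearn.linear_model import Lasso

def reweighted_lasso(X, y, alpha1=0.1, alpha2=0.1, eps=1e-3):
    """Two-stage reweighted Lasso sketch (adaptive-Lasso-style weights).

    Stage 1: ordinary Lasso fit.
    Stage 2: rescale each column by 1/weight and refit, which is
    equivalent to a Lasso with per-coefficient penalty weights.
    """
    stage1 = Lasso(alpha=alpha1).fit(X, y)
    w = 1.0 / (np.abs(stage1.coef_) + eps)      # heavier penalty for small coefs
    X_scaled = X / w                            # column-wise rescaling
    stage2 = Lasso(alpha=alpha2).fit(X_scaled, y)
    beta = stage2.coef_ / w                     # map back to the original scale
    return beta

# Toy high-dimensional example (p > n) with a sparse true signal.
rng = np.random.default_rng(1)
n, p = 50, 200
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0
y = X @ beta_true + rng.normal(size=n)
print(np.nonzero(reweighted_lasso(X, y))[0][:10])
```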
16

Application of Multivariate Statistical Methodology to Model Factors Influencing Fate and Transport of Fecal Pollution in Surface Waters

Hall, Kimberlee K., Evanshen, Brian G., Maier, Kurt J., Scheuerman, Phillip R. 01 January 2014 (has links)
The increasing number of polluted watersheds and water bodies with total maximum daily loads (TMDLs) has spurred research into methods that effectively and universally identify fecal pollution sources. A fundamental requirement for identifying such methods is understanding the microbial and chemical processes that influence the fate and transport of fecal indicators from various sources to receiving streams. Using the Watauga River watershed in northeast Tennessee as a model to better understand these processes, multivariate statistical analyses were conducted on data collected from four creeks that have, or are expected to have, pathogen TMDLs. Canonical correlation and discriminant analyses revealed spatial and temporal variability in the microbial and chemical parameters influencing water quality, suggesting that these creeks differ in the nature and extent of fecal pollution. Identifying creeks within a watershed that share similar sources of fecal pollution using this data-analysis approach could change how the selection and placement of best management practices are prioritized. Furthermore, this suggests that, in large and complex watersheds, TMDL development may require multiyear and multisite data collected with a targeted sampling approach rather than a 30-day geometric mean. This technique may facilitate the choice between watershed TMDLs and single-segment or single-stream TMDLs.
17

INFERENCE USING BHATTACHARYYA DISTANCE TO MODEL INTERACTION EFFECTS WHEN THE NUMBER OF PREDICTORS FAR EXCEEDS THE SAMPLE SIZE

Janse, Sarah A. 01 January 2017 (has links)
In recent years, statistical analyses, algorithms, and modeling of big data have been constrained by computational complexity. Further, the added complexity of relationships among response and explanatory variables, such as higher-order interaction effects, makes identifying predictors using standard statistical techniques difficult. These difficulties are only exacerbated when sample sizes are small. Recent analyses have targeted the identification of interaction effects in big data, but the development of methods to identify higher-order interaction effects has been limited by computational concerns. One recently studied method is the Feasible Solutions Algorithm (FSA), a fast, flexible method that aims to find a set of statistically optimal models via a stochastic search algorithm. Although FSA has shown promise, one of its current limits is that the user must choose the number of times to run the algorithm. Here, statistical guidance is provided for this number of iterations by deriving a lower bound on the probability of obtaining the statistically optimal model in a given number of iterations of FSA. Moreover, logistic regression is severely limited when two predictors can perfectly separate the two outcomes. With small sample sizes, this occurs quite often by chance, especially when the number of predictors is large. Bhattacharyya distance is proposed as an alternative method to address this limitation. However, little is known about the theoretical properties or distribution of B-distance, so properties and the distribution of this distance measure are derived here. A hypothesis test and confidence interval are developed and tested on both simulated and real data.
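One standard way to reason about the number of runs, assuming each FSA run independently reaches the statistically optimal model with probability at least p, is the bound 1 - (1 - p)^k for the probability that at least one of k runs succeeds; inverting it gives a minimum k for a target probability. This is the generic form of such a bound, not necessarily the dissertation's derivation, which concerns the per-run probability itself.

```python
import math

def min_runs(p_success, target=0.95):
    """Smallest k with 1 - (1 - p_success)**k >= target, assuming
    independent runs that each find the optimal model w.p. p_success."""
    if not 0 < p_success < 1:
        raise ValueError("p_success must be in (0, 1)")
    return math.ceil(math.log(1 - target) / math.log(1 - p_success))

# e.g. if a single run succeeds 10% of the time, 29 runs give >= 95% coverage
print(min_runs(0.10, 0.95))
```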
18

High Dimensional Multivariate Inference Under General Conditions

Kong, Xiaoli 01 January 2018 (has links)
In this dissertation, we investigate four distinct but interrelated problems in high-dimensional inference on mean vectors across multiple groups. The first problem concerns the profile analysis of high-dimensional repeated measures. We introduce new test statistics and derive their asymptotic distributions under normality for both the equal and unequal covariance cases. Our derivations of the asymptotic distributions mimic that of the Central Limit Theorem, with some important peculiarities addressed with sufficient rigor. We also derive consistent and unbiased estimators of the asymptotic variances for the equal and unequal covariance cases, respectively. The second problem considered is accurate inference for high-dimensional repeated measures in factorial designs, as well as any comparisons among the cell means. We derive an asymptotic expansion for the null distribution and the quantiles of a suitable test statistic under normality, and we derive estimators of the parameters of the approximate distribution with second-order consistency. The most important contribution is the high accuracy of the methods, in the sense that p-values are accurate up to second order in the sample size as well as in the dimension. The third problem pertains to high-dimensional inference under non-normality. We relax the commonly imposed dependence conditions, which have become a standard assumption in high-dimensional inference; with the relaxed conditions, the scope of applicability of the results broadens. The fourth problem pertains to a fully nonparametric rank-based comparison of high-dimensional populations. To develop the theory in this context, we prove a novel result on the asymptotic behavior of quadratic forms in ranks. Simulation studies provide evidence that our methods perform reasonably well in high-dimensional situations, and real data from an electroencephalogram (EEG) study of alcoholic and control subjects are analyzed to illustrate the application of the results.
19

UNSUPERVISED LEARNING IN PHYLOGENOMIC ANALYSIS OVER THE SPACE OF PHYLOGENETIC TREES

Kang, Qiwen 01 January 2019 (has links)
A phylogenetic tree is a tree that represents the evolutionary history of species or other entities. Phylogenomics is a new field at the intersection of phylogenetics and genomics, and statistical learning methods are needed to handle and analyze the large amounts of data that new technologies can generate relatively cheaply. Based on existing Markov models, we introduce a new method, CURatio, to identify outliers in a given gene data set. This method, intrinsically an unsupervised method, can find outliers among thousands or even more genes. This ability to analyze large numbers of genes (even with missing information) makes it unique among many parametric methods. At the same time, the exploration of statistical analysis in the high-dimensional space of phylogenetic trees has never stopped, and many tree metrics have been proposed for statistical methodology; the tropical metric is one of them. We implement an MCMC sampling method to estimate the principal components in tree space under the tropical metric, achieving dimension reduction and visualizing the result in a two-dimensional tropical triangle.
20

COMPOSITE NONPARAMETRIC TESTS IN HIGH DIMENSION

Villasante Tezanos, Alejandro G. 01 January 2019 (has links)
This dissertation focuses on the problem of making high-dimensional inference for two or more groups. High-dimensional means that both the sample size (n) and the dimension (p) tend to infinity, possibly at different rates. Classical approaches to group comparisons fail in the high-dimensional situation, in the sense that they have incorrect sizes and low power. Much has been done in recent years to overcome these problems, but these recent works make restrictive assumptions about the number of treatments to be compared and/or the distribution of the data. This research aims to (1) propose and investigate refined small-sample approaches for high-dimensional data in the multi-group setting, (2) propose and study a fully nonparametric approach, and (3) conduct an extensive simulation comparison of the proposed methods with some existing ones.

When treatment effects can meaningfully be formulated in terms of means, a semiparametric approach under equal and unequal covariance assumptions is investigated. Composites of F-type statistics are used to construct two tests: one is a moderate-p version, in which the test statistic is centered by its asymptotic mean, and the other is a large-p version that applies an asymptotic-expansion-based finite-sample correction to the mean of the test statistic. These tests make no distributional assumptions and are therefore nonparametric in that sense. The theory for the tests requires only mild assumptions to regulate the dependence. Simulation results show that, for moderately small samples, the large-p version yields a substantial gain in size accuracy with a small power trade-off.

In some situations mean-based inference is not appropriate, for example for data on an ordinal scale or with heavy tails. For these situations, a high-dimensional fully nonparametric test is proposed. In the two-sample situation, a composite of a Wilcoxon-Mann-Whitney type test is investigated. The assumptions needed are weaker than those of the semiparametric approach. Numerical comparisons with the moderate-p version of the semiparametric approach show that the nonparametric test has very similar size but achieves superior power, especially for skewed data with some amount of dependence between variables.

Finally, we conduct an extensive simulation to compare our proposed methods with other nonparametric tests and rank-transformation methods. A wide spectrum of simulation settings is considered, including a variety of heavy-tailed and skewed data distributions, homoscedastic and heteroscedastic covariance structures, various amounts of dependence, and choices of the tuning (smoothing window) parameter for the asymptotic variance estimators. The fully nonparametric and rank-transformation methods behave similarly in terms of type I and type II errors; however, the two approaches differ fundamentally in their hypotheses. Although there are no formal mathematical proofs for the rank transformations, they tend to provide immunity against the effects of outliers. From a theoretical standpoint, our nonparametric method essentially uses variable-by-variable ranking, which arises naturally from estimating the nonparametric effect of interest; as a result, our method is invariant to monotone marginal transformations. For a more practical comparison, real data from an electroencephalogram (EEG) experiment are analyzed.
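A minimal sketch of a variable-by-variable Wilcoxon-Mann-Whitney composite in the two-sample case is given below: for each variable the relative effect P(X < Y) + 0.5 P(X = Y) is estimated from midranks of the pooled sample, and the centered effects are averaged over variables. The standardization and calibration of the composite, which is where the high-dimensional theory enters, are omitted here and are specific to the dissertation.

```python
import numpy as np
from scipy.stats import rankdata

def wmw_composite(X, Y):
    """Average of centered variable-wise Wilcoxon-Mann-Whitney effects.

    X: (n1, p) sample 1, Y: (n2, p) sample 2. For each variable the
    relative effect p_j = P(X_j < Y_j) + 0.5 * P(X_j = Y_j) is estimated
    from midranks of the pooled sample; the composite is the average of
    (p_j - 0.5). Standardization of the composite is omitted.
    """
    n1, p = X.shape
    n2 = Y.shape[0]
    effects = np.empty(p)
    for j in range(p):
        pooled = rankdata(np.concatenate([X[:, j], Y[:, j]]))   # midranks
        r2_mean = pooled[n1:].mean()                            # mean rank of sample 2
        effects[j] = (r2_mean - (n2 + 1) / 2) / n1              # estimate of p_j
    return (effects - 0.5).mean()

# Toy check: a location shift in sample 2 pushes the composite above 0.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 100))
Y = rng.normal(loc=0.3, size=(25, 100))
print(wmw_composite(X, Y))
```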
