91

The comparison of the sensitivities of experiments using different scales of measurement

Schumann, D. E. W. January 1956 (has links)
Ph. D.
92

Asymptotic theory for decentralized sequential hypothesis testing problems and sequential minimum energy design algorithm

Wang, Yan 19 May 2011 (has links)
The dissertation investigates the asymptotic theory of decentralized sequential hypothesis testing problems as well as the asymptotic behavior of the Sequential Minimum Energy Design (SMED). The main results are summarized as follows. 1. We develop the first-order asymptotic optimality theory for decentralized sequential multi-hypothesis testing under a Bayes framework. Asymptotically optimal tests are obtained from the class of "two-stage" procedures, and the optimal local quantizers are shown to be the "maximin" quantizers, characterized as a randomization of at most M-1 Unambiguous Likelihood Quantizers (ULQ) when testing M >= 2 hypotheses. 2. We generalize the classical Kullback-Leibler inequality to investigate the quantization effects on the second-order and other general-order moments of log-likelihood ratios. It is shown that quantization may increase these quantities, but such an increase is bounded by a universal constant that depends on the order of the moment. This result provides a simpler sufficient condition for the asymptotic theory of decentralized sequential detection. 3. We propose a class of multi-stage tests for decentralized sequential multi-hypothesis testing problems and show that, with suitably chosen thresholds at different stages, they retain second-order asymptotic optimality properties when the hypothesis testing problem is "asymmetric." 4. We characterize the asymptotic behavior of the SMED algorithm, particularly the denseness and distributions of the design points. In addition, we propose a simplified version of SMED that is computationally more efficient.
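For intuition only, here is a minimal Python sketch of the decentralized setting the abstract describes: each sensor sends a 1-bit quantized message to a fusion center, which runs a sequential probability ratio test on the messages. The binary sign quantizer, Gaussian model, and thresholds are illustrative assumptions, not the maximin quantizers or two-stage procedures developed in the dissertation.

```python
import numpy as np
from scipy.stats import norm

# Illustrative setup (not the thesis's procedure): n_sensors sensors observe N(theta, 1)
# data, each sends the 1-bit message u = 1{x > 0}, and a fusion center runs an SPRT
# between H0: theta = 0 and H1: theta = 1 on the quantized messages.
rng = np.random.default_rng(0)
theta0, theta1, n_sensors = 0.0, 1.0, 3
# Probability that a quantized message equals 1 under each hypothesis.
p0, p1 = 1 - norm.cdf(0, loc=theta0), 1 - norm.cdf(0, loc=theta1)
a, b = np.log(99), np.log(1 / 99)          # SPRT thresholds for roughly 1% error rates

def fusion_center_sprt(true_theta, max_steps=10_000):
    """Accumulate the log-likelihood ratio of the 1-bit messages until a threshold is crossed."""
    llr = 0.0
    for t in range(1, max_steps + 1):
        x = rng.normal(true_theta, 1.0, size=n_sensors)   # local observations
        u = (x > 0).astype(int)                            # 1-bit local quantizers
        llr += np.sum(u * np.log(p1 / p0) + (1 - u) * np.log((1 - p1) / (1 - p0)))
        if llr >= a:
            return "accept H1", t
        if llr <= b:
            return "accept H0", t
    return "undecided", max_steps

print(fusion_center_sprt(true_theta=1.0))
```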
93

Hypothesis testing and community detection on networks with missingness and block structure

Guilherme Maia Rodrigues Gomes (8086652) 06 December 2019 (has links)
Statistical analysis of networks has grown rapidly over the last few years with an increasing number of applications. Graph-valued data carry additional dependency information, which opens the possibility of modeling highly complex objects in a vast number of fields such as biology (e.g., brain networks, fungal networks, gene co-expression), chemistry (e.g., molecular fingerprints), psychology (e.g., social networks) and many others (e.g., citation networks, word co-occurrences, financial systems, anomaly detection). While the inclusion of graph structure in the analysis can further help inference, even simple statistical tasks on a network are very complex. For instance, the assumption of exchangeability of the nodes or the edges is quite strong, and it brings issues such as sparsity, size bias and poor characterization of the generative process of the data. Solutions to these issues include adding specific constraints and assumptions on the data generation process. In this work, we approach this problem by assuming graphs are globally sparse but locally dense, which allows the exchangeability assumption to hold in local regions of the graph. We consider problems with two types of locality structure: block structure (also framed as multiple graphs or a population of networks) and unstructured sparsity, which can be seen as missing data. For the former, we developed a hypothesis testing framework for weighted aligned graphs and a spectral clustering method for community detection on populations of non-aligned networks. For the latter, we derive an efficient spectral clustering approach to learn the parameters of the zero-inflated stochastic blockmodel. Overall, we found that incorporating multiple local dense structures leads to more precise and powerful local and global inference. This result indicates that this general modeling scheme allows the exchangeability assumption on the edges to hold while generating more realistic graphs. We give theoretical conditions for our proposed algorithms and evaluate them on synthetic and real-world datasets, showing that our models outperform the baselines in a number of settings.
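As a rough illustration of the community-detection setting, the sketch below runs plain adjacency spectral clustering on a two-block stochastic blockmodel. It is a generic baseline under assumed block probabilities, not the zero-inflated or population-of-networks estimators developed in this thesis.

```python
import numpy as np
from numpy.linalg import eigh
from sklearn.cluster import KMeans

# Two-block stochastic blockmodel with assumed within/between-block edge probabilities.
rng = np.random.default_rng(1)
n, k = 200, 2
labels_true = np.repeat([0, 1], n // 2)
P = np.where(labels_true[:, None] == labels_true[None, :], 0.15, 0.03)
A = rng.binomial(1, P)
A = np.triu(A, 1)
A = A + A.T                                        # symmetric adjacency, no self-loops

vals, vecs = eigh(A)                               # adjacency spectral embedding
X = vecs[:, np.argsort(np.abs(vals))[-k:]]         # top-k eigenvectors by |eigenvalue|
labels_hat = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
# Agreement up to label switching.
agreement = max(np.mean(labels_hat == labels_true), np.mean(labels_hat != labels_true))
print(f"community recovery agreement: {agreement:.2f}")
```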
94

Nonparametric tests to detect relationship between variables in the presence of heteroscedastic treatment effects

Tolos, Siti January 1900 (has links)
Doctor of Philosophy / Department of Statistics / Haiyan Wang / Statistical tools to detect nonlinear relationships between variables are commonly needed in practice. The first part of the dissertation presents a test of independence between a response variable, either discrete or continuous, and a continuous covariate after adjusting for heteroscedastic treatment effects. The method first augments each pair of the data for all treatments with a fixed number of nearest neighbors as pseudo-replicates. A test statistic is then constructed by taking the difference of two quadratic forms. Using such differences eliminates the need to estimate any nonlinear regression function, reducing the computational time. Although using a fixed number of nearest neighbors poses significant difficulty in the inference compared to letting the number of nearest neighbors go to infinity, the parametric standardizing rate is obtained for the asymptotic distribution of the proposed test statistics. Numerical studies show that the new test procedure maintains the intended type I error rate and has robust power to detect nonlinear dependency in the presence of outliers. The second part of the dissertation discusses the theory and numerical studies for testing the nonparametric effects of no covariate-treatment interaction and no main covariate effect, based on the decomposition of the conditional mean of a potentially nonlinear regression function. A similar test was discussed in Wang and Akritas (2006) for effects defined through the decomposition of the conditional distribution function, but with the number of pseudo-replicates going to infinity. Consequently, their test statistics have slow convergence rates and computational speeds. Both limitations are overcome with the new model and tests. The last part of the dissertation develops theory and numerical studies to test for no covariate-treatment interaction, no simple covariate effect, and no main covariate effect when the number of factor levels and the number of covariate values are large.
95

Mathematical Methods for Enhanced Information Security in Treaty Verification

MacGahan, Christopher January 2016 (has links)
Mathematical methods have been developed to perform arms-control-treaty verification tasks for enhanced information security. The purpose of these methods is to verify and classify inspected items while shielding the monitoring party from confidential aspects of the objects that the host country does not wish to reveal. Advanced medical-imaging methods used for detection and classification tasks have been adapted for list-mode processing, useful for discriminating projection data without aggregating sensitive information. These models make decisions based on varying amounts of stored information, and their task performance scales with that information. Development has focused on the Bayesian ideal observer, which assumes complete probabilistic knowledge of the detector data, and the Hotelling observer, which assumes a multivariate Gaussian distribution on the detector data. The models can effectively discriminate sources in the presence of nuisance parameters. The channelized Hotelling observer has proven particularly useful in that quality performance can be achieved while reducing the size of the projection data set. The inclusion of additional penalty terms in the channelizing-matrix optimization offers a great benefit for treaty-verification tasks. Penalty terms can be used to generate non-sensitive channels or to penalize the model's ability to discriminate objects based on confidential information. The end result is a mathematical model that could be shared openly with the monitor. Similarly, observers based on the likelihood probabilities have been developed to perform null-hypothesis tasks. To test these models, neutron and gamma-ray data were simulated with the GEANT4 toolkit. Tasks were performed on various uranium and plutonium inspection objects. A fast-neutron coded-aperture detector was simulated to image the particles.
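For readers unfamiliar with the Hotelling observer mentioned above, here is a hedged sketch of its linear discriminant on synthetic detector data. The dimensions, class means, and Gaussian model are illustrative assumptions, a plain (non-channelized) observer rather than the GEANT4-simulated, channelized observers used in the dissertation.

```python
import numpy as np

# Hedged sketch of a plain Hotelling observer on synthetic detector data.
rng = np.random.default_rng(2)
d, n = 50, 500                                  # detector-data dimension, training samples per class
mu0, mu1 = np.zeros(d), np.full(d, 0.3)         # assumed class means (two inspection objects)
g0 = rng.normal(mu0, 1.0, size=(n, d))
g1 = rng.normal(mu1, 1.0, size=(n, d))

# Hotelling template: w = K^{-1} (mean1 - mean0), with K the pooled covariance.
K = 0.5 * (np.cov(g0, rowvar=False) + np.cov(g1, rowvar=False))
w = np.linalg.solve(K, g1.mean(axis=0) - g0.mean(axis=0))

# Discriminate a new measurement by thresholding the linear test statistic t = w^T g.
g_new = rng.normal(mu1, 1.0, size=d)
t = w @ g_new
threshold = 0.5 * w @ (g0.mean(axis=0) + g1.mean(axis=0))
print("classified as class", int(t > threshold))
```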
96

Robustness of the One-Sample Kolmogorov Test to Sampling from a Finite Discrete Population

Tucker, Joanne M. (Joanne Morris) 12 1900 (has links)
One of the most useful and best-known goodness-of-fit tests is the Kolmogorov one-sample test. The assumptions for the Kolmogorov one-sample test are: 1. A random sample; 2. A continuous random variable; 3. F(x) is a completely specified hypothesized cumulative distribution function. The Kolmogorov one-sample test has a wide range of applications. Knowing the effect of using the test when an assumption is not met is of practical importance. The purpose of this research is to analyze the robustness of the Kolmogorov one-sample test to sampling from a finite discrete distribution. The standard tables for the Kolmogorov test are derived based on sampling from a theoretical continuous distribution; as such, the theoretical distribution is infinite. The standard tables do not include a method or adjustment factor to estimate the effect on table values for statistical experiments where the sample stems from a finite discrete distribution sampled without replacement. This research provides an extension of the Kolmogorov test to the case where the hypothesized distribution function is finite and discrete and the sampling distribution is based on sampling without replacement. An investigative study was conducted to explore possible tendencies and relationships in the distribution of Dn when sampling with and without replacement for various parameter settings. In all, 96 sampling distributions were derived. Results show the standard Kolmogorov table values are conservative, particularly when the sample sizes are small or the sample represents 10% or more of the population.
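A small simulation sketch of the question studied here: compare the empirical behavior of the Kolmogorov statistic Dn when sampling a finite discrete population with and without replacement, against the standard table's critical value. The population, sample size, and the alpha = 0.05 critical value (0.294 for n = 20 from the usual Kolmogorov table) are illustrative choices, not the 96 settings derived in the thesis.

```python
import numpy as np

rng = np.random.default_rng(3)
population = np.repeat(np.arange(1, 11), 10)                 # finite population, N = 100, values 1..10
values = np.arange(1, 11)
F0 = np.array([(population <= v).mean() for v in values])    # hypothesized (population) CDF

def d_n(sample):
    """Sup distance between the sample's empirical CDF and F0; both jump only at the support values."""
    ecdf = np.array([(sample <= v).mean() for v in values])
    return np.max(np.abs(ecdf - F0))

n, reps = 20, 5000
d_with = [d_n(rng.choice(population, size=n, replace=True)) for _ in range(reps)]
d_without = [d_n(rng.choice(population, size=n, replace=False)) for _ in range(reps)]

crit = 0.294   # standard (continuous-case) Kolmogorov critical value, n = 20, alpha = 0.05
print("rejection rate with replacement:   ", np.mean(np.array(d_with) > crit))
print("rejection rate without replacement:", np.mean(np.array(d_without) > crit))
```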
97

Computational and Statistical Advances in Testing and Learning

Ramdas, Aaditya Kumar 01 July 2015 (has links)
This thesis makes fundamental computational and statistical advances in testing and estimation, making critical progress in the theory and application of classical statistical methods like classification, regression and hypothesis testing, and in understanding the relationships between them. Our work connects multiple fields in often counter-intuitive and surprising ways, leading to new theory, new algorithms, and new insights, and ultimately to a cross-fertilization of varied fields like optimization, statistics and machine learning. The first of three thrusts has to do with active learning, a form of sequential learning from feedback-driven queries that often has a provable statistical advantage over passive learning. We unify concepts from two seemingly different areas: active learning and stochastic first-order optimization. We use this unified view to develop new lower bounds for stochastic optimization using tools from active learning and new algorithms for active learning using ideas from optimization. We also study the effect of feature noise, or errors-in-variables, on the ability to actively learn. The second thrust deals with the development and analysis of new convex optimization algorithms for classification and regression problems. We provide geometrical and convex analytical insights into the role of the margin in margin-based classification, and develop new greedy primal-dual algorithms for non-linear classification. We also develop a unified proof for convergence rates of randomized algorithms for the ordinary least squares and ridge regression problems in a variety of settings, with the purpose of investigating which algorithm should be utilized in different settings. Lastly, we develop fast, state-of-the-art, numerically stable algorithms for an important univariate regression problem called trend filtering, with a wide variety of practical extensions. The last thrust involves a series of practical and theoretical advances in nonparametric hypothesis testing. We show that a smoothed Wasserstein distance allows us to connect many vast families of univariate and multivariate two-sample tests. We clearly demonstrate the decreasing power of the families of kernel-based and distance-based two-sample tests and independence tests with increasing dimensionality, challenging existing folklore that they work well in high dimensions. Surprisingly, we show that these tests are automatically adaptive to simple alternatives and achieve the same power as other direct tests for detecting mean differences. We discover a computation-statistics tradeoff, where computationally more expensive two-sample tests have a provable statistical advantage over cheaper tests. We also demonstrate the practical advantage of using Stein shrinkage for kernel independence testing at small sample sizes. Lastly, we develop a novel algorithmic scheme for performing sequential multivariate nonparametric hypothesis testing using the martingale law of the iterated logarithm to near-optimally control both type-1 and type-2 errors. One perspective connecting everything in this thesis involves the closely related and fundamental problems of linear regression and classification. Every contribution in this thesis, from active learning to optimization algorithms, to the role of the margin, to nonparametric testing, fits in this picture. An underlying theme that repeats itself in this thesis is the computational and/or statistical advantages of sequential schemes with feedback.
This arises in our work through comparing active with passive learning, through iterative algorithms for solving linear systems instead of direct matrix inversions, and through comparing the power of sequential and batch hypothesis tests.
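As an aside for context, the kernel-based two-sample tests referred to above include the maximum mean discrepancy (MMD); a minimal sketch with a Gaussian kernel and permutation calibration is below. The bandwidth, sample sizes, and permutation count are illustrative assumptions, not choices made in the thesis.

```python
import numpy as np

# Biased estimate of the squared MMD with a Gaussian kernel, calibrated by permutation.
rng = np.random.default_rng(4)
X = rng.normal(0.0, 1.0, size=(100, 5))
Y = rng.normal(0.5, 1.0, size=(100, 5))          # mean-shifted alternative

def gaussian_kernel(A, B, bandwidth=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * bandwidth**2))

def mmd2(X, Y):
    """Biased estimate of the squared maximum mean discrepancy."""
    return gaussian_kernel(X, X).mean() + gaussian_kernel(Y, Y).mean() - 2 * gaussian_kernel(X, Y).mean()

obs = mmd2(X, Y)
Z = np.vstack([X, Y])
perm_stats = []
for _ in range(200):                              # permutation null distribution
    idx = rng.permutation(len(Z))
    perm_stats.append(mmd2(Z[idx[:100]], Z[idx[100:]]))
p_value = np.mean(np.array(perm_stats) >= obs)
print(f"MMD^2 = {obs:.4f}, permutation p-value = {p_value:.3f}")
```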
98

Permutační testy statistických hypotéz / Permutation Tests of Statistical Hypotheses

Veselý, Zdeněk January 2015 (has links)
Title: Permutation Tests of Statistical Hypotheses Author: Zdeněk Veselý Department: Department of Probability and Mathematical Statistics Supervisor: prof. RNDr. Jana Jurečková, DrSc., Department of Probability and Mathematical Statistics Abstract: This thesis presents the concept of permutation tests. The permutation test is introduced as a response to testing problems in which it is inconvenient to make stronger assumptions about the probability distribution of the data; for some of these problems it is even the only exact solution. The thesis describes the construction of permutation tests as well as an approach to finding the most powerful tests against specific alternatives. The second part of the thesis compares the powers of parametric, permutation, and rank tests using simulations. The results show that the powers of the parametric and permutation tests are very similar in most cases, which confirms that permutation tests are a useful tool in practice. Keywords: Permutation tests, Exact tests, Hypothesis testing, Power of tests
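To make the idea concrete, a minimal sketch of a two-sample permutation test for a difference in means follows; the data, sample sizes, and number of permutations are illustrative and not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, size=30)
y = rng.normal(0.6, 1.0, size=30)

observed = x.mean() - y.mean()
pooled = np.concatenate([x, y])
n_perm, count = 10_000, 0
for _ in range(n_perm):
    idx = rng.permutation(len(pooled))            # relabel observations at random
    diff = pooled[idx[:len(x)]].mean() - pooled[idx[len(x):]].mean()
    if abs(diff) >= abs(observed):                # two-sided comparison
        count += 1
p_value = (count + 1) / (n_perm + 1)              # add-one convention so the p-value is never zero
print(f"permutation p-value: {p_value:.4f}")
```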
99

Uma análise sobre duas medidas de evidência: p-valor e s-valor / An analysis on two measures of evidence: p-value and s-value

Santos, Eriton Barros dos 04 August 2016 (has links)
This work studies two measures of evidence: the p-value and the s-value. The likelihood ratio statistic is used to calculate both measures. Informally, the p-value is the probability of an extreme event occurring under the conditions imposed by the null hypothesis, while the s-value is the greatest confidence level of the confidence region such that the parameter space under the null hypothesis and the confidence region have at least one element in common. For both measures, the smaller the value, the greater the degree of inconsistency between the observed data and the null hypothesis. The study is restricted to simple and composite null hypotheses and to independent, normally distributed data. The main results are: 1) analytical formulas for the p-value, obtained using conditional probabilities, and for the s-value; and 2) a comparison of the p-value and the s-value in different scenarios, namely: known and unknown variance, and simple and composite null hypotheses. For simple null hypotheses the s-value coincides with the p-value, while for composite null hypotheses the relationship between the two is more complex. In the case of known variance, if the null hypothesis is a half-line the p-value is bounded above by the s-value; if the null hypothesis is a closed interval, the difference between the two measures of evidence decreases with the width of the interval specified in the null hypothesis. In the case of unknown variance and composite null hypotheses, the s-value is bounded above by the p-value for small values of the s-value, e.g., when the s-value is less than 0.05.
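As a small illustration of the simple-null case discussed above (where, per the abstract, the s-value coincides with the p-value), the sketch below computes the likelihood-ratio p-value for H0: mu = mu0 with known variance; the data are synthetic and the setup is an assumption for illustration.

```python
import numpy as np
from scipy.stats import chi2, norm

# Likelihood-ratio p-value for the simple null H0: mu = mu0, i.i.d. N(mu, sigma^2), known sigma.
rng = np.random.default_rng(6)
sigma, mu0 = 1.0, 0.0
x = rng.normal(0.3, sigma, size=25)
n, xbar = len(x), x.mean()

# -2 log(likelihood ratio) = n * (xbar - mu0)^2 / sigma^2  ~  chi^2 with 1 df under H0.
lr_stat = n * (xbar - mu0) ** 2 / sigma**2
p_value = chi2.sf(lr_stat, df=1)

# Equivalent two-sided z-test form of the same p-value.
z = np.sqrt(n) * (xbar - mu0) / sigma
assert np.isclose(p_value, 2 * norm.sf(abs(z)))
print(f"likelihood-ratio p-value (= s-value for this simple null, per the abstract): {p_value:.4f}")
```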
100

Statistical methods for certain large, complex data challenges

Li, Jun 15 November 2018 (has links)
Big data concerns large-volume, complex, growing data sets, and it presents opportunities as well as challenges. This thesis focuses on statistical methods for several specific large, complex data challenges - each involving representation of data with a complex format, utilization of complicated information, and/or intensive computational cost. The first problem we work on is hypothesis testing for multilayer network data, motivated by an example in computational biology. We show how to represent the complex structure of a multilayer network as a single data point within the space of supra-Laplacians and then develop a central limit theorem and hypothesis testing theories for multilayer networks in that space. We develop both global and local testing strategies for mean comparison and investigate sample size requirements. The methods were applied to the motivating computational biology example and compared with the classic Gene Set Enrichment Analysis (GSEA). More biological insights are found in this comparison. The second problem is the source detection problem in epidemiology, which is one of the most important issues for control of epidemics. Ideally, we want to locate the sources based on all history data. However, this is often infeasible, because the history data is complex, high-dimensional and cannot be fully observed. Epidemiologists have recognized the crucial role of human mobility as an important proxy for a complete history, but little in the literature to date uses this information for source detection. We recast the source detection problem as identifying a relevant mixture component in a multivariate Gaussian mixture model. Human mobility within a stochastic PDE model is used to calibrate the parameters. The capability of our method is demonstrated in the context of the 2000-2002 cholera outbreak in the KwaZulu-Natal province. The third problem is multivariate time series imputation, a classic problem in statistics. To address the common problem of low signal-to-noise ratio in high-dimensional multivariate time series, we propose state-space models that provide more precise inference of missing values by clustering multivariate time series components in a nonparametric way. The models are suitable for large-scale time series due to their efficient parameter estimation.
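To illustrate the "identify the relevant mixture component" formulation of source detection in a hedged way, the sketch below scores candidate sources by their posterior probability under a multivariate Gaussian mixture. The means, covariance, and priors are illustrative stand-ins for the mobility-calibrated parameters described in the abstract.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Each candidate source k induces a multivariate Gaussian over an observed summary vector;
# we score sources by posterior probability and report the most likely component.
rng = np.random.default_rng(7)
d, K = 4, 3
means = rng.normal(0, 2, size=(K, d))               # assumed per-source mean summaries
cov = np.eye(d)                                      # shared covariance, for simplicity
priors = np.full(K, 1.0 / K)

observed = rng.multivariate_normal(means[1], cov)    # synthetic data generated by source 1

log_post = np.array([
    np.log(priors[k]) + multivariate_normal.logpdf(observed, means[k], cov)
    for k in range(K)
])
post = np.exp(log_post - log_post.max())
post /= post.sum()                                    # normalized posterior responsibilities
print("posterior over candidate sources:", np.round(post, 3), "-> most likely:", int(post.argmax()))
```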
