Statistical hypothesis testing is one of the most powerful and interpretable tools for drawing real-world conclusions from empirical observations. The classical set-up goes as follows: the practitioner is given a sequence of $n$ independent and identically distributed observations and wishes to test the null hypothesis that they are drawn from a particular family of distributions, say $F$, against the alternative that they are not. This is achieved by constructing a test statistic, say $T_n$ (a function of the observations), and rejecting the null hypothesis if $T_n$ exceeds a threshold that is typically resampling/permutation-based or merely asymptotically valid. In this thesis, we will deviate from this standard framework in the following two ways:
1. Often, in real-world applications, observations are not expected to be independent and identically distributed. This is particularly relevant for network data, where the dependence between observations is governed by an underlying graph. In Chapters 1 and 2, the focus is on a widely used network-based model for binary outcome data, namely the Ising model (see the schematic display after this list), which has also attracted significant attention from the Statistical Physics community. We obtain precise estimates for the intractable normalizing constants in this model, which in turn enable us to establish new weak laws and fluctuation results that exhibit a certain \emph{sharp phase-transition} behavior. From a testing viewpoint, we address a structured signal detection problem in the context of Ising models. Our findings illustrate that the presence of network dependence can indeed be a \emph{blessing} for inference. In particular, we show that at the sharp phase-transition point, it is possible to detect much weaker signals than when the data are drawn independently of one another.
2. When accepting or rejecting hypotheses, relying on resampling-based or asymptotic thresholds can be unsatisfactory: the former requires recomputing the test statistic for every set of resampled observations, while the latter only guarantees asymptotic validity of the type I error. In Chapters 3 and 4, the goal is to do away with these shortcomings. We propose a general strategy to construct exactly distribution-free tests for two celebrated nonparametric multivariate testing problems: (a) two-sample testing and (b) independence testing. Distribution-freeness ensures that the rejection thresholds do not rely on resampling and still yield exact finite-sample type I error guarantees. Our proposal relies on a notion of multivariate ranks constructed using the theory of optimal transport (see the sketch after this list). The resulting tests require no moment assumptions (making them attractive for heavy-tailed data) and are more robust to outliers. Under some structural assumptions, we also prove that these tests can be more efficient, against a broad class of alternatives, than popular tests that are not distribution-free.
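As a point of reference for item 1 above, the Ising model places a joint distribution on binary outcomes $\sigma = (\sigma_1, \ldots, \sigma_n) \in \{-1, +1\}^n$ that are coupled through a network. A schematic form (the notation here is illustrative and may differ from that used in the chapters) is
\[
\mathbb{P}_{\beta, B}(\sigma) \;=\; \frac{1}{Z_n(\beta, B)} \exp\Bigg( \frac{\beta}{2} \sum_{i \neq j} A_{ij}\, \sigma_i \sigma_j \;+\; B \sum_{i=1}^{n} \sigma_i \Bigg),
\]
where $A$ is a (suitably scaled) coupling matrix encoding the underlying graph, $\beta$ and $B$ are inverse-temperature and external-field parameters, and $Z_n(\beta, B)$ is the intractable normalizing constant (partition function) whose precise estimation underlies the weak laws, fluctuation results, and phase-transition behavior described above.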
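For item 2, the construction of multivariate ranks via optimal transport can be sketched as follows: each observation is matched to a point of a fixed reference grid by solving an optimal assignment problem, and the matched grid point serves as that observation's rank. The minimal Python sketch below illustrates this idea under assumed choices (a Halton grid on $[0,1]^d$, squared-Euclidean cost, and the hypothetical helper name ot_multivariate_ranks); the exact reference distribution and solver used in the thesis may differ.

import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import qmc

def ot_multivariate_ranks(X, seed=0):
    """Empirical multivariate ranks via optimal assignment (illustrative sketch).

    X is an (n, d) array of observations.  Returns an (n, d) array whose i-th
    row is the reference (rank) point matched to X[i].
    """
    n, d = X.shape
    # Fixed reference grid approximating the uniform distribution on [0, 1]^d.
    grid = qmc.Halton(d=d, seed=seed).random(n)
    # Squared-Euclidean cost between every observation and every grid point.
    cost = ((X[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    # Optimal assignment: the empirical optimal transport map for this cost.
    rows, cols = linear_sum_assignment(cost)
    ranks = np.empty_like(grid)
    ranks[rows] = grid[cols]
    return ranks

Because the joint law of these empirical ranks under the null does not depend on the underlying (continuous) data-generating distribution, test statistics built from them admit exact finite-sample rejection thresholds without resampling.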
From a mathematical standpoint, the proofs rely on Stein's method of exchangeable pairs for concentration inequalities and (non-)normal approximations, large deviation and correlation-decay type arguments, convex analysis, Le Cam's regularity theory, and changes of measure via contiguity, to name a few.
Identifier | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/s8fx-pd79
Date | January 2022 |
Creators | Deb, Nabarun |
Source Sets | Columbia University |
Language | English |
Detected Language | English |
Type | Theses |