Higher Criticism Testing for Signal Detection in Rare And Weak Models

examples - we need models for selecting a small subset of useful features from high-dimensional data, where the useful features are both rare and weak, which is crucial for, e.g., supervised classification of sparse high-dimensional data. A preceding step is signal detection: detecting the presence of useful features. This problem is related to testing a very large number of hypotheses, where the proportion of false null hypotheses is assumed to be very small. However, reliable signal detection is only possible in certain regions of the two-dimensional sparsity-strength parameter space, the phase space. In this report, we focus on two families of distributions, N and χ². In the former case, features are assumed to be independent and normally distributed. In the latter, in search of a more sophisticated model, we assume that features are dependent within blocks, whose empirical separation strength asymptotically follows the non-central χ²_ν-distribution. Our search for informative features explores Tukey's higher criticism (HC), a second-level significance testing procedure that compares the fraction of observed significances to the expected fraction under the global null. Throughout the phase space we investigate the estimated error rate, Err = (#falsely rejected H0 + #falsely rejected H1) / #simulations, where H0 denotes absence of informative signals and H1 denotes presence of informative signals, in both the N-case and the χ²_ν-case, for ν = 2, 10, 30. In particular, using a feature vector of approximately the same size as in genomic applications, we find that the analytically derived detection boundary is too optimistic: close to it, signal detection still fails, and we need to move far from the boundary into the success region to ensure reliable detection. We demonstrate that Err grows fast and irregularly as we approach the detection boundary from the success region.
In the χ²_ν-case, ν > 2, no analytical detection boundary has been derived, but we show that the empirical success region there is smaller than in the N-case, especially as ν increases.
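
The N-case simulation described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation. It assumes the common rare/weak parametrization (a fraction n^(-beta) of features carry a mean shift sqrt(2 r log n)), which the abstract does not spell out. The function names, the calibration of the rejection threshold by an empirical null quantile, and the use of equally many H0 and H1 runs are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.stats import norm


def hc_statistic(z, alpha0=0.5):
    """Higher criticism statistic computed from one-sided p-values of z-scores."""
    n = len(z)
    # Sorted one-sided p-values; clipped to avoid division by zero below.
    p = np.clip(np.sort(norm.sf(z)), 1e-15, 1 - 1e-15)
    i = np.arange(1, n + 1)
    # Standardized gap between observed and expected fraction of significances.
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    k = max(1, int(alpha0 * n))  # maximize over the smallest alpha0 * n p-values
    return hc[:k].max()


def simulate_hc(n, beta, r, n_sim, rng, signal):
    """HC values for n_sim simulated feature vectors, with or without signals."""
    eps = n ** (-beta)                 # assumed sparsity: fraction of useful features
    mu = np.sqrt(2 * r * np.log(n))    # assumed strength: mean shift of useful features
    out = np.empty(n_sim)
    for s in range(n_sim):
        z = rng.standard_normal(n)
        if signal:
            z[rng.random(n) < eps] += mu
        out[s] = hc_statistic(z)
    return out


def estimate_err(n, beta, r, n_sim=200, level=0.05, seed=0):
    """Err = (#falsely rejected H0 + #falsely rejected H1) / #simulations."""
    rng = np.random.default_rng(seed)
    # Assumption: threshold calibrated as an empirical quantile of null HC values.
    null_hc = simulate_hc(n, beta, r, n_sim, rng, signal=False)
    threshold = np.quantile(null_hc, 1 - level)
    hc0 = simulate_hc(n, beta, r, n_sim, rng, signal=False)
    hc1 = simulate_hc(n, beta, r, n_sim, rng, signal=True)
    false_rej_h0 = np.sum(hc0 > threshold)    # signal declared although absent
    false_rej_h1 = np.sum(hc1 <= threshold)   # signal missed although present
    return (false_rej_h0 + false_rej_h1) / (2 * n_sim)


if __name__ == "__main__":
    print(estimate_err(n=2000, beta=0.6, r=0.9))   # relatively dense, strong signals
    print(estimate_err(n=2000, beta=0.9, r=0.01))  # very rare, very weak signals
```

Evaluating `estimate_err` over a grid of (beta, r) values gives a rough empirical picture of the phase space. The values of n and n_sim here are kept small for speed and are not those used in the thesis.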

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-103284

Probability Theory and Statistics


Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-103284
Date	January 2012
Creators	Blomberg, Niclas
Publisher	KTH, Matematisk statistik
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	Trita-MAT, 1401-2286 ; 25
