
Permutation Tests for Classification

We introduce and explore an approach to estimating the statistical significance of classification accuracy. The approach is particularly useful in scientific applications of machine learning, where the high dimensionality of the data and the small number of training examples render most standard convergence bounds too loose to yield a meaningful guarantee of the classifier's generalization ability. Instead, we estimate the statistical significance of the observed classification accuracy, that is, the likelihood of observing such accuracy by chance due to spurious correlations of the high-dimensional data patterns with the class labels in the given training set. We adopt permutation testing, a non-parametric technique previously developed in classical statistics for hypothesis testing in the generative setting (i.e., comparing two probability distributions). We demonstrate the method on real examples from neuroimaging studies and DNA microarray analysis and suggest a theoretical analysis of the procedure that relates the asymptotic behavior of the test to the existing convergence bounds.
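
The general idea of a permutation test for classification accuracy can be illustrated with a minimal Python sketch. It is not the authors' implementation. The nearest-centroid classifier, the leave-one-out accuracy estimate, the function names, and the "+1" correction in the p-value are all assumptions chosen for illustration; the abstract does not specify the classifier or the test statistic used in the report.

```python
import random


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Assumption: this simple classifier is only a stand-in for whatever
    classifier is actually being evaluated.
    """
    n = len(X)
    correct = 0
    for i in range(n):
        # Accumulate per-class feature sums over all samples except i.
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            label = y[j]
            if label not in sums:
                sums[label] = [0.0] * len(X[j])
                counts[label] = 0
            sums[label] = [s + v for s, v in zip(sums[label], X[j])]
            counts[label] += 1
        # Predict the class whose centroid is closest to the held-out sample.
        best_label, best_dist = None, float("inf")
        for label, total in sums.items():
            centroid = [v / counts[label] for v in total]
            dist = sum((a - b) ** 2 for a, b in zip(X[i], centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        if best_label == y[i]:
            correct += 1
    return correct / n


def permutation_p_value(X, y, n_permutations=200, seed=0):
    """Estimate how often shuffled labels reach the observed accuracy."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    labels = list(y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        # Shuffling breaks any real link between features and labels.
        rng.shuffle(labels)
        if loo_accuracy(X, labels) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labeling itself (a common convention,
    # assumed here), so the estimate is never exactly zero.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value
```

A small p-value suggests that the observed accuracy would be unlikely if the labels were unrelated to the data. The number of permutations bounds the smallest p-value the sketch can report, so 200 permutations cannot yield a value below 1/201.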

Identifier: oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/6723
Date: 28 August 2003
Creators: Mukherjee, Sayan; Golland, Polina; Panchenko, Dmitry
Source Sets: M.I.T. Theses and Dissertation
Language: en_US
Detected Language: English
Format: 22 p., 1135156 bytes, 662639 bytes, application/postscript, application/pdf
Relation: AIM-2003-019
