We introduce and explore an approach to estimating the statistical significance of classification accuracy, which is particularly useful in scientific applications of machine learning where the high dimensionality of the data and the small number of training examples render most standard convergence bounds too loose to yield a meaningful guarantee of the generalization ability of the classifier. Instead, we estimate the statistical significance of the observed classification accuracy, i.e., the likelihood of observing such accuracy by chance due to spurious correlations of the high-dimensional data patterns with the class labels in the given training set. We adopt permutation testing, a non-parametric technique previously developed in classical statistics for hypothesis testing in the generative setting (i.e., comparing two probability distributions). We demonstrate the method on real examples from neuroimaging studies and DNA microarray analysis and suggest a theoretical analysis of the procedure that relates the asymptotic behavior of the test to the existing convergence bounds.
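The permutation-test idea can be sketched in a few lines: repeatedly permute the class labels to destroy any true association with the data, re-estimate classification accuracy under each permutation, and report the fraction of permutations whose accuracy matches or exceeds the observed one. Below is a minimal illustrative sketch, assuming synthetic data and a scikit-learn linear SVM with cross-validation; the classifier, validation scheme, and data sizes here are assumptions for illustration, not the paper's exact experimental setup.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic high-dimensional data: few examples, many features (illustrative only).
X = rng.standard_normal((40, 1000))
y = rng.integers(0, 2, size=40)

def mean_cv_accuracy(X, y):
    """Cross-validated accuracy of a linear SVM (one of many possible choices)."""
    return cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()

observed = mean_cv_accuracy(X, y)

# Null distribution: accuracy after randomly permuting the labels, which
# removes any genuine pattern-label association while keeping the data fixed.
n_permutations = 200
null_acc = np.array(
    [mean_cv_accuracy(X, rng.permutation(y)) for _ in range(n_permutations)]
)

# p-value: fraction of null accuracies at or above the observed accuracy
# (the "+1" terms give the standard conservative permutation estimate).
p_value = (1 + np.sum(null_acc >= observed)) / (1 + n_permutations)
print(f"observed accuracy = {observed:.3f}, permutation p-value = {p_value:.3f}")
```

With random labels as above, the p-value should be large; a small p-value would indicate that accuracy this high is unlikely to arise from chance correlations alone.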
Identifier | oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/6723 |
Date | 28 August 2003 |
Creators | Mukherjee, Sayan, Golland, Polina, Panchenko, Dmitry |
Source Sets | M.I.T. Theses and Dissertations |
Language | en_US |
Detected Language | English |
Format | 22 p., 1135156 bytes, 662639 bytes, application/postscript, application/pdf |
Relation | AIM-2003-019 |