Permutation Tests for Classification

We introduce and explore an approach to estimating statistical significance of classification accuracy, which is particularly useful in scientific applications of machine learning where high dimensionality of the data and the small number of training examples render most standard convergence bounds too loose to yield a meaningful guarantee of the generalization ability of the classifier. Instead, we estimate statistical significance of the observed classification accuracy, or the likelihood of observing such accuracy by chance due to spurious correlations of the high-dimensional data patterns with the class labels in the given training set. We adopt permutation testing, a non-parametric technique previously developed in classical statistics for hypothesis testing in the generative setting (i.e., comparing two probability distributions). We demonstrate the method on real examples from neuroimaging studies and DNA microarray analysis and suggest a theoretical analysis of the procedure that relates the asymptotic behavior of the test to the existing convergence bounds.
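The general idea of a permutation test for classification accuracy can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the classifier, the accuracy estimate, or the exact p-value formula, so the nearest-centroid classifier, the leave-one-out accuracy estimate, the add-one p-value correction, the toy data, and all function names below are assumptions made for the example.

```python
import random


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (labels 0/1).

    The classifier is an assumption for illustration only.
    """
    n, dim = len(X), len(X[0])
    correct = 0
    for i in range(n):
        # Class centroids computed without the held-out example i.
        sums = {0: [0.0] * dim, 1: [0.0] * dim}
        counts = {0: 0, 1: 0}
        for j in range(n):
            if j != i:
                counts[y[j]] += 1
                for k in range(dim):
                    sums[y[j]][k] += X[j][k]
        best_label, best_dist = None, float("inf")
        for label in (0, 1):
            if counts[label] == 0:
                continue  # skip a class with no remaining examples
            dist = sum(
                (X[i][k] - sums[label][k] / counts[label]) ** 2
                for k in range(dim)
            )
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += int(best_label == y[i])
    return correct / n


def permutation_p_value(X, y, n_permutations=200, seed=0):
    """Estimate how often shuffled labels reach the observed accuracy."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    labels = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # break any real link between data and labels
        if loo_accuracy(X, labels) >= observed:
            hits += 1
    # Add-one correction (an assumption) keeps the estimate above zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Toy data (hypothetical): two well-separated clusters of 10 points each.
X = [[0.1 * (i % 3), 0.1 * (i % 2)] for i in range(10)]
X += [[5 + 0.1 * (i % 3), 5 + 0.1 * (i % 2)] for i in range(10)]
y = [0] * 10 + [1] * 10

accuracy, p_value = permutation_p_value(X, y)
print(f"observed accuracy = {accuracy:.2f}, permutation p-value = {p_value:.4f}")
```

A small p-value here indicates that accuracy as high as the observed one is rarely reached when the labels are shuffled, which is the sense of "statistical significance of the observed classification accuracy" used in the abstract. The smallest p-value this sketch can report is 1 / (n_permutations + 1), so the number of permutations bounds the resolution of the estimate.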

Classification

Permutation testing

Statistical significance

Identifier	oai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/30408
Date	28 August 2003
Creators	Mukherjee, Sayan, Golland, Polina, Panchenko, Dmitry
Source Sets	M.I.T. Theses and Dissertation
Language	en_US
Detected Language	English
Format	22 p., 22876548 bytes, 882217 bytes, application/postscript, application/pdf
Relation	Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory
