With the advent of high-throughput genomic and proteomic technologies, in conjunction with the difficulty in obtaining even moderately sized samples, small-sample classifier design has become a major issue in the biological and medical communities. Training-data error estimation becomes mandatory, yet none of the popular error estimation techniques have been rigorously designed via statistical inference or optimization. In this investigation, we place classifier error estimation in a framework of minimum mean-square error (MMSE) signal estimation in the presence of uncertainty, where uncertainty is relative to a prior over a family of distributions. This results in a Bayesian approach to error estimation that is optimal and unbiased relative to the model. The prior addresses a trade-off between estimator robustness (modeling assumptions) and accuracy.
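For concreteness, and with notation assumed here rather than fixed by the abstract ($\theta$ indexing the uncertainty class of distributions, $\pi$ its prior, $S_n$ the observed sample, and $\varepsilon_n(\theta)$ the true error of the designed classifier under the distribution indexed by $\theta$), standard MMSE theory gives the estimate as the posterior expectation of the true error:

$$\hat{\varepsilon}(S_n) \;=\; \mathrm{E}_{\theta}\!\left[\varepsilon_n(\theta) \mid S_n\right] \;=\; \int_{\Theta} \varepsilon_n(\theta)\,\pi(\theta \mid S_n)\,d\theta,$$

which minimizes $\mathrm{E}[(\varepsilon_n - \hat{\varepsilon})^2]$ among all estimators that are functions of the sample, and is unbiased when the expectation is taken over both $\theta$ and $S_n$.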
Closed-form representations for Bayesian error estimators are provided for two important models: discrete classification with Dirichlet priors (the discrete model) and linear classification of Gaussian distributions with fixed, scaled-identity, or arbitrary covariances and conjugate priors (the Gaussian model). We examine robustness to false modeling assumptions and demonstrate that Bayesian error estimators perform especially well for moderate true errors.
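The discrete model admits a particularly short closed form. A minimal sketch, assuming a fixed classifier and a known class-0 prior probability c (the function name and interface are illustrative, not from the thesis): since the true error is linear in the class-conditional bin probabilities, its posterior mean is obtained by plugging in the Dirichlet posterior means.

```python
import numpy as np

def bayesian_error_discrete(psi, U, V, alpha0, alpha1, c):
    """Bayesian MMSE error estimate for a fixed discrete classifier.

    psi            : 0/1 label assigned by the classifier to each bin (array)
    U, V           : per-bin sample counts for class 0 and class 1
    alpha0, alpha1 : Dirichlet hyperparameters for the class-conditional pmfs
    c              : known prior probability of class 0 (an assumption here)
    """
    # Dirichlet-multinomial conjugacy: posterior means of the bin probabilities
    p_post = (alpha0 + U) / (alpha0 + U).sum()
    q_post = (alpha1 + V) / (alpha1 + V).sum()
    # Expected error: class-0 mass sent to label 1 plus class-1 mass sent to label 0
    return c * p_post[psi == 1].sum() + (1 - c) * q_post[psi == 0].sum()

# Example: 4 bins, uniform Dirichlet priors, equally likely classes
psi = np.array([0, 0, 1, 1])
U = np.array([5, 3, 1, 0])
V = np.array([0, 1, 4, 6])
print(bayesian_error_discrete(psi, U, V, np.ones(4), np.ones(4), c=0.5))
```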
The Bayesian modeling framework facilitates both optimization and analysis. It naturally gives rise to a practical expected measure of performance for arbitrary error estimators: the sample-conditioned mean-square error (MSE). Closed-form expressions are provided for both Bayesian models. We examine the consistency of Bayesian error estimation and illustrate a salient application in censored sampling, where sample points are collected one at a time until the conditional MSE reaches a stopping criterion.
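The discrete case can illustrate censored sampling concretely. For a fixed classifier and known class prior c, the Dirichlet aggregation property makes each class's misclassified probability mass Beta-distributed under the posterior, so the posterior variance of the true error (the sample-conditioned MSE of the Bayesian estimate) is available in closed form. A minimal sketch, in which `draw_point`, the fixed classifier, and the stopping tolerance are all assumptions of this illustration:

```python
import numpy as np

def conditional_mse(psi, U, V, alpha0, alpha1, c):
    """Sample-conditioned MSE (posterior variance of the true error) for the
    discrete model with a fixed classifier and known class-0 prior c."""
    a0 = (alpha0 + U)[psi == 1].sum()
    b0 = (alpha0 + U)[psi == 0].sum()
    a1 = (alpha1 + V)[psi == 0].sum()
    b1 = (alpha1 + V)[psi == 1].sum()
    var_beta = lambda a, b: a * b / ((a + b) ** 2 * (a + b + 1.0))
    return c**2 * var_beta(a0, b0) + (1 - c)**2 * var_beta(a1, b1)

def censored_sampling(draw_point, psi, n_bins, alpha0, alpha1, c,
                      tol=1e-4, max_n=5000):
    """Collect sample points one at a time until the conditional MSE of the
    Bayesian error estimate falls below tol. `draw_point` is a user-supplied
    callback returning (bin_index, class_label)."""
    U = np.zeros(n_bins)
    V = np.zeros(n_bins)
    for n in range(1, max_n + 1):
        i, y = draw_point()
        (U if y == 0 else V)[i] += 1
        if conditional_mse(psi, U, V, alpha0, alpha1, c) < tol:
            break
    return U, V, n
```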
We address practical considerations for gene-expression microarray data, including the suitability of the Gaussian model, a methodology for calibrating normal-inverse-Wishart priors from unused data, and an approximation method for non-linear classification. We observe superior performance on synthetic high-dimensional data and real data, especially for moderate to high expected true errors and small feature sizes.
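The abstract does not spell out the calibration procedure; the following is a hedged moment-matching sketch of how unused data (e.g., features not selected for classification) might set normal-inverse-Wishart hyperparameters, with `kappa` and `nu_extra` being illustrative choices rather than values from the thesis:

```python
import numpy as np

def calibrate_niw(unused_blocks, kappa=1.0, nu_extra=2):
    """Moment-matching calibration of a normal-inverse-Wishart prior from
    held-out ("unused") data blocks; a sketch only, not necessarily the
    exact procedure developed in the thesis.

    unused_blocks : list of (n_i x d) arrays, one per held-out feature set
    Returns (m, kappa, S, nu) for the prior
        mu | Sigma ~ N(m, Sigma / kappa),  Sigma ~ Inverse-Wishart(S, nu).
    """
    d = unused_blocks[0].shape[1]
    means = np.array([X.mean(axis=0) for X in unused_blocks])
    covs = np.array([np.cov(X, rowvar=False) for X in unused_blocks])
    m = means.mean(axis=0)                # prior location from held-out means
    nu = d + 1 + nu_extra                 # nu > d + 1 keeps E[Sigma] finite
    S = (nu - d - 1) * covs.mean(axis=0)  # match E[Sigma] = S / (nu - d - 1)
    return m, kappa, S, nu
```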
Finally, arbitrary error estimators may be optimally calibrated assuming a fixed Bayesian model, sample size, classification rule, and error estimation rule. By constructing off-line a calibration function that maps error estimates to their optimally calibrated values, error estimates may be calibrated on the fly whenever the assumptions apply.
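As one concrete realization (assumed here, not quoted from the thesis): under MMSE, the optimally calibrated value of an estimate e is the conditional expectation of the true error given that the estimation rule produced e, which can be approximated off-line by Monte Carlo simulation from the fixed Bayesian model and then applied on the fly. `simulate_pair` below is a placeholder for the model / classification-rule / estimation-rule pipeline.

```python
import numpy as np

def fit_calibration(simulate_pair, n_sims=10000, n_bins=20):
    """Approximate the optimal calibration map e -> E[true error | estimate = e]
    by binned Monte Carlo averages under the fixed Bayesian model.

    `simulate_pair` is assumed to: draw a distribution from the prior, draw a
    sample of the fixed size, apply the classification and error estimation
    rules, and return (error_estimate, true_error).
    """
    pairs = np.array([simulate_pair() for _ in range(n_sims)])
    est, true_err = pairs[:, 0], pairs[:, 1]
    # Quantile bin edges keep bin occupancy roughly equal
    edges = np.quantile(est, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, est, side="right") - 1, 0, n_bins - 1)
    centers = np.array([est[idx == k].mean() for k in range(n_bins)])
    cal = np.array([true_err[idx == k].mean() for k in range(n_bins)])
    # On-the-fly use: interpolate the fitted map at a fresh error estimate
    return lambda e: np.interp(e, centers, cal)
```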
Identifier | oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/ETD-TAMU-2012-05-10873
Date | 2012 May
Creators | Dalton, Lori Anne |
Contributors | Dougherty, Edward R. |
Source Sets | Texas A&M University
Language | en_US |
Detected Language | English |
Type | thesis, text |
Format | application/pdf |