Global ETD Search

A framework for estimating risk

Thesis (PhD (Statistics and Actuarial Sciences))--Stellenbosch University, 2008. / We consider the problem of model assessment by risk estimation. Various
approaches to risk estimation are considered in a unified framework. This
framework is an extension of a decision-theoretic framework proposed by
David Haussler. Point and interval estimation based on test samples and
training samples is discussed, with interval estimators being classified based
on the measure of deviation they attempt to bound.
The main contribution of this thesis is in the realm of training sample interval
estimators, particularly covering number-based and PAC-Bayesian
interval estimators. The thesis discusses a number of approaches to obtaining
such estimators. The first type of training sample interval estimator
to receive attention is estimators based on classical covering number arguments.
A number of these estimators were generalized in various directions.
Typical generalizations included: extension of results from misclassification
loss to other loss functions; extending results to allow arbitrary ghost sample
size; extending results to allow arbitrary scale in the relevant covering
numbers; and extending results to allow arbitrary choice of in the use of
symmetrization lemmas.
These extensions were applied to covering number-based estimators for various
measures of deviation, as well as for the special cases of misclassification
loss estimators, realizable case estimators, and margin bounds. Extended
results were also provided for stratification by (algorithm- and data-dependent)
complexity of the decision class.
In order to facilitate application of these covering number-based bounds,
a discussion of various complexity dimensions and approaches to obtaining
bounds on covering numbers is also presented.
The second type of training sample interval estimator discussed in the thesis
is Rademacher bounds. These bounds use advanced concentration inequalities,
so a chapter discussing such inequalities is provided. Our discussion
of Rademacher bounds leads to the presentation of an alternative, slightly
stronger, form of the core result used for deriving local Rademacher bounds,
by avoiding a few unnecessary relaxations.
Next, we turn to a discussion of PAC-Bayesian bounds. Using an approach
developed by Olivier Catoni, we develop new PAC-Bayesian bounds based
on results underlying Hoeffding's inequality. By utilizing Catoni's concept
of "exchangeable priors", these results allowed the extension of a covering
number-based result to averaging classifiers, as well as its corresponding
algorithm- and data-dependent result.
The last contribution of the thesis is the development of a more flexible
shell decomposition bound: by using Hoeffding's tail inequality rather than
Hoeffding's relative entropy inequality, we extended the bound to general
loss functions, allowed the use of an arbitrary number of bins, and introduced
between-bin and within-bin "priors".
Finally, to illustrate the calculation of these bounds, we applied some of them
to the UCI spam classification problem, using decision trees and boosted
stumps.
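
As a rough illustration of what a test sample interval estimator can look like, the sketch below applies the standard two-sided Hoeffding inequality to held-out losses. It assumes an i.i.d. test sample and losses bounded in [0, 1]. The function name hoeffding_test_interval and the default delta are illustrative choices and are not taken from the thesis, whose own estimators may differ.

```python
import math


def hoeffding_test_interval(losses, delta=0.05):
    """Two-sided Hoeffding interval for the risk, from test-sample losses in [0, 1].

    With probability at least 1 - delta over an i.i.d. test sample, the true
    risk lies within sqrt(ln(2 / delta) / (2 n)) of the empirical risk.
    """
    n = len(losses)
    if n == 0:
        raise ValueError("need at least one test-sample loss")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if not all(0.0 <= x <= 1.0 for x in losses):
        raise ValueError("losses must lie in [0, 1]")

    empirical_risk = sum(losses) / n
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))

    # Clip to [0, 1], since the risk of a [0, 1]-valued loss cannot leave that range.
    lower = max(0.0, empirical_risk - half_width)
    upper = min(1.0, empirical_risk + half_width)
    return lower, upper


# Hypothetical example: 100 misclassifications on 1000 test points.
example_losses = [1.0] * 100 + [0.0] * 900
lower, upper = hoeffding_test_interval(example_losses, delta=0.05)
print(f"empirical risk 0.100, 95% interval [{lower:.4f}, {upper:.4f}]")
```

The half-width shrinks at rate 1/sqrt(n) and does not depend on the decision class, which is why a bound of this kind needs a sample that was not used for training.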

http://hdl.handle.net/10019.1/1104

Risk estimation

Concentration inequalities

Training sample bounds

Covering numbers

Risk assessment -- Mathematical models

Estimation theory

Bayesian statistical decision theory

Sampling (Statistics)

Identifier	oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:sun/oai:scholar.sun.ac.za:10019.1/1104
Date	03 1900
Creators	Kroon, Rodney Stephen
Contributors	Steel, S. J., Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
Publisher	Stellenbosch : Stellenbosch University
Source Sets	South African National ETD Portal
Language	English
Detected Language	English
Type	Thesis
Rights	Stellenbosch University
