The technical advances in genomics, functional magnetic resonance imaging,
and other areas of scientific research over the last two decades
have led to a burst of interest in multiple testing procedures.
A driving factor for innovations in the field of multiple testing has been the problem of
large-scale simultaneous testing. There, the goal is to uncover lower-dimensional signals
from high-dimensional data. Mathematically speaking, this means that the dimension d
is usually in the thousands while the sample size n is relatively small (generally at most 100,
often due to cost constraints), a characteristic commonly abbreviated as d >> n.
In my thesis I look at several multiple testing problems and corresponding
procedures from a false discovery rate (FDR) perspective, a methodology originally introduced in a seminal paper by Benjamini and Hochberg (1995).
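For orientation, the step-up procedure from the Benjamini and Hochberg paper can be sketched as follows. This is a minimal illustration and not code from the thesis; the function name `benjamini_hochberg`, the parameter `q`, and the example p-values are assumptions made here.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking the hypotheses rejected at FDR level q.

    Illustrative sketch of the Benjamini-Hochberg step-up rule; the name
    and signature are hypothetical, not taken from the thesis.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Made-up p-values, for illustration only.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(example, q=0.05)
```

With these made-up inputs only the two smallest p-values fall below their rank-dependent thresholds, so only those two hypotheses are rejected.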
FDR analysis starts by fitting a two-component mixture model to the observed test statistics. This mixture consists of a null model density and an alternative component density from which the interesting cases are assumed to be drawn.
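In symbols, the two-component mixture is commonly written as below. The notation is an assumption made here for illustration (z for a test statistic, \eta_0 for the proportion of null cases, f_0 and f_A for the null and alternative densities) and may differ from the notation used in the thesis.

```latex
\begin{align*}
f(z) &= \eta_0 \, f_0(z) + (1 - \eta_0) \, f_A(z), \\
\mathrm{fdr}(z) &= \frac{\eta_0 \, f_0(z)}{f(z)} .
\end{align*}
```

Here fdr(z) is the local false discovery rate: the posterior probability, under the mixture, that a case with statistic z comes from the null component.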
In the thesis I propose a new approach to the estimation of
false discovery rates, called log-FDR. Specifically,
my new approach to truncated maximum likelihood estimation yields accurate
null model estimates. This is complemented by constrained maximum
likelihood estimation for the alternative density using log-concave
density estimation.
A recent competitor to the FDR is the method of "Higher
Criticism". It has been strongly advocated
in the context of variable selection in classification,
which is deeply linked to multiple comparisons.
Hence, I also look at variable selection in class prediction, which can be viewed as
a special signal identification problem. Both FDR methods and Higher Criticism
can be highly useful for signal identification. This is discussed in the context of
variable selection in linear discriminant analysis (LDA),
a popular classification method.
FDR methods are not only useful for multiple testing situations in the strict sense;
they are also applicable to related problems. I look at several kinds of applications of FDR in linear classification. I present and extend statistical techniques related to effect size estimation using false discovery rates and show how to use these for variable selection. The resulting fdr-effect
method proposed for effect size estimation is shown to work as well as competing
approaches while being conceptually simple and computationally inexpensive.
Additionally, I apply the fdr-effect method to variable selection by minimizing
the misclassification rate and show that it works very well and leads to compact
and interpretable feature sets.
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:11818 |
Date | 09 January 2013 |
Creators | Klaus, Bernd |
Contributors | Boulesteix, Anne-Laure; Kirstein, Bernd; Universität Leipzig |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |