1 |
False Discovery Rates, Higher Criticism and Related Methods in High-Dimensional Multiple Testing. Klaus, Bernd. 16 January 2013.
The technical advancements in genomics, functional magnetic resonance imaging
and other areas of scientific research seen in the last two decades
have led to a burst of interest in multiple testing procedures.
A driving factor for innovations in the field of multiple testing has been the problem of
large-scale simultaneous testing, where the goal is to uncover low-dimensional signals
in high-dimensional data. Mathematically speaking, this means that the dimension d
is usually in the thousands while the sample size n is relatively small (at most 100 in general,
often due to cost constraints), a characteristic commonly abbreviated as d >> n.
In my thesis I look at several multiple testing problems and corresponding
procedures from a false discovery rate (FDR) perspective, a methodology originally introduced in a seminal paper by Benjamini and Hochberg (1995).
FDR analysis starts by fitting a two-component mixture model to the observed test statistics. This mixture consists of a null model density and an alternative component density from which the interesting cases are assumed to be drawn.
In the thesis I propose a new approach, called log-FDR,
to the estimation of false discovery rates. Specifically,
my new approach to truncated maximum likelihood estimation yields accurate
null model estimates. This is complemented by constrained maximum
likelihood estimation for the alternative density using log-concave
density estimation.
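The two-groups model behind this analysis can be illustrated with a much cruder stand-in than the truncated-ML and log-concave machinery described above: a theoretical N(0,1) null, a kernel estimate of the marginal density, and a simple null-proportion estimate. A minimal Python sketch (not the thesis's log-FDR procedure):

```python
import numpy as np
from scipy import stats

def local_fdr(z, pi0=None):
    # two-groups model: f(z) = pi0*f0(z) + (1 - pi0)*f1(z), lfdr(z) = pi0*f0(z)/f(z)
    z = np.asarray(z, dtype=float)
    f0 = stats.norm.pdf(z)                  # theoretical N(0,1) null density
    f = stats.gaussian_kde(z)(z)            # kernel estimate of the marginal density
    if pi0 is None:
        # crude null-proportion estimate from the centre of the z-score distribution
        centre = stats.norm.cdf(1) - stats.norm.cdf(-1)
        pi0 = min(1.0, np.mean(np.abs(z) < 1) / centre)
    return np.clip(pi0 * f0 / f, 0.0, 1.0)
```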
A recent competitor to the FDR is the method of "Higher
Criticism". It has been strongly advocated
in the context of variable selection in classification,
which is deeply linked to multiple comparisons.
Hence, I also look at variable selection in class prediction, which can be viewed as
a special signal identification problem. Both FDR methods and Higher Criticism
can be highly useful for signal identification. This is discussed in the context of
variable selection in linear discriminant analysis (LDA),
a popular classification method.
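Higher Criticism itself reduces to a simple functional of the sorted p-values. A minimal sketch of the Donoho-Jin HC statistic (the variant used in the thesis may differ in details):

```python
import numpy as np

def higher_criticism(pvals, alpha0=0.5):
    # HC statistic: standardized excess of small p-values over the uniform expectation
    p = np.sort(np.clip(np.asarray(pvals, dtype=float), 1e-12, 1 - 1e-12))
    n = len(p)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    k = max(1, int(alpha0 * n))             # maximize over the smallest alpha0-fraction
    return hc[:k].max()
```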
FDR methods are not only useful for multiple testing situations in the strict sense;
they are also applicable to related problems. I look at several applications of FDR in linear classification: I present and extend statistical techniques for effect size estimation based on false discovery rates and show how to use them for variable selection. The resulting fdr-effect
method proposed for effect size estimation is shown to work as well as competing
approaches while being conceptually simple and computationally inexpensive.
Additionally, I apply the fdr-effect method to variable selection by minimizing
the misclassification rate and show that it works very well and leads to compact
and interpretable feature sets.
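As a generic illustration of FDR-driven variable selection for LDA (a plain Benjamini-Hochberg filter, not the fdr-effect estimator described above), one can test each feature, adjust the p-values, and fit the classifier on the surviving features; SciPy and scikit-learn are assumed available:

```python
import numpy as np
from scipy import stats
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fdr_select_and_fit(X, y, q=0.05):
    # binary labels assumed coded 0/1; at least one feature is assumed to pass
    _, p = stats.ttest_ind(X[y == 0], X[y == 1], axis=0)
    m = len(p)
    order = np.argsort(p)
    raw = p[order] * m / np.arange(1, m + 1)              # Benjamini-Hochberg step-up
    adj_sorted = np.minimum.accumulate(raw[::-1])[::-1]   # monotone adjusted p-values
    adjusted = np.empty_like(p)
    adjusted[order] = np.minimum(adj_sorted, 1.0)
    selected = np.flatnonzero(adjusted <= q)
    model = LinearDiscriminantAnalysis().fit(X[:, selected], y)
    return selected, model
```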
|
2 |
Visual Analysis of High-Dimensional Point Clouds using Topological Abstraction. Oesterling, Patrick. 17 May 2016.
This thesis is about visualizing a kind of data that is trivial to process by computers but difficult to imagine by humans because nature does not allow for intuition with this type of information: high-dimensional data. Such data often result from representing observations of objects under various aspects or with different properties. In many applications, a typical, laborious task is to find related objects or to group those that are similar to each other. One classic solution for this task is to imagine the data as vectors in a Euclidean space with object variables as dimensions. Utilizing Euclidean distance as a measure of similarity, objects with similar properties and values accumulate into groups, so-called clusters, that are exposed by cluster analysis on the high-dimensional point cloud. Because similar vectors can be thought of as objects that are alike in terms of their attributes, the point cloud's structure and individual cluster properties, like their size or compactness, summarize data categories and their relative importance.

The contribution of this thesis is a novel analysis approach for visual exploration of high-dimensional point clouds without suffering from structural occlusion. The work is based on two key concepts. The first idea is to discard those geometric properties that cannot be preserved and, thus, lead to the typical artifacts. Topological concepts are used instead to shift the focus away from a point-centered view of the data to a more structure-centered perspective. The advantage is that topology-driven clustering information can be extracted in the data's original domain and be preserved without loss in low dimensions. The second idea is to split the analysis into a topology-based global overview and a subsequent geometric local refinement. The occlusion-free overview enables the analyst to identify features and to link them to other visualizations that permit analysis of those properties not captured by the topological abstraction, e.g. cluster shape or value distributions in particular dimensions or subspaces. The advantage of separating structure from data point analysis is that restricting local analysis to data subsets significantly reduces artifacts and the visual complexity of standard techniques. That is, the additional topological layer enables the analyst to identify structure that was hidden before and to focus on particular features by suppressing irrelevant points during local feature analysis.

This thesis addresses the topology-based visual analysis of high-dimensional point clouds for both the time-invariant and the time-varying case. Time-invariant means that the points do not change in their number or positions; that is, the analyst explores the clustering of a fixed and constant set of points. The extension to the time-varying case implies the analysis of a varying clustering, where clusters appear, merge or split, or vanish. Especially for high-dimensional data, both tracking, i.e. relating features over time, and visualizing the changing structure are difficult problems to solve.
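As a much simpler stand-in for the topological abstraction described above, a single-linkage merge hierarchy already summarizes how connected components of a point cloud appear and merge across distance scales, independently of the ambient dimension. A minimal sketch with SciPy (not the visualization pipeline developed in the thesis):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def merge_hierarchy_overview(points, n_clusters=5):
    # Single-linkage hierarchy: records at which distance scale connected
    # components of the point cloud appear and merge; a crude structure-centered
    # (0-dimensional topological) summary that works in any dimension.
    Z = linkage(points, method="single")        # points: (n, d) array, any d
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    merge_scales = Z[:, 2]                      # distances at which merges happen
    return labels, merge_scales

# usage on a random high-dimensional cloud
labels, scales = merge_hierarchy_overview(np.random.rand(500, 50), n_clusters=4)
```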
|
3 |
Analysis of high dimensional repeated measures designs: The one- and two-sample test statistics / Entwicklung von Verfahren zur Analyse von hochdimensionalen Daten mit Messwiederholungen. Ahmad, Muhammad Rauf. 07 July 2008.
No description available.
|
4 |
Efficient multivariate approximation with transformed rank-1 lattices. Nasdala, Robert. 17 May 2022.
We study the approximation of functions defined on different domains by trigonometric and transformed trigonometric functions. We investigate which of the many results known from approximation theory on the d-dimensional torus can be transferred to other domains. We define invertible parameterized transformations and prove conditions under which functions from a weighted Sobolev space can be transformed into functions defined on the torus that still have a certain degree of Sobolev smoothness and for which worst-case upper error bounds are known. By reverting the initial change of variables we transfer the fast rank-1 lattice algorithms, used to approximate functions on the torus efficiently, over to other domains and obtain adapted FFT algorithms.

1 Introduction
2 Preliminaries and notations
3 Fourier approximation on the torus
4 Torus-to-R^d transformation mappings
5 Torus-to-cube transformation mappings
6 Conclusion
Alphabetical Index
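The computational core of rank-1 lattice methods can be sketched in a few lines: on a lattice with generating vector z and size M, the inner products <k, x_j> depend only on <k, z> mod M, so evaluating a multivariate trigonometric polynomial at all nodes is a single one-dimensional FFT. A minimal Python sketch for the plain torus case, without the change of variables developed in the thesis:

```python
import numpy as np

def rank1_lattice_nodes(z, M):
    # nodes x_j = (j * z mod M) / M on the d-dimensional torus, j = 0..M-1
    j = np.arange(M)[:, None]
    return (j * np.asarray(z)[None, :] % M) / M

def eval_trig_poly_on_lattice(freqs, coeffs, z, M):
    # p(x) = sum_{k in I} c_k exp(2*pi*i <k, x>).  On a rank-1 lattice
    # <k, x_j> = j * (<k, z> mod M) / M (mod 1), so evaluation at all M
    # nodes is one length-M FFT of aggregated coefficients.
    ghat = np.zeros(M, dtype=complex)
    addresses = (np.asarray(freqs) @ np.asarray(z)) % M   # freqs: (|I|, d) integers
    np.add.at(ghat, addresses, coeffs)
    return M * np.fft.ifft(ghat)                          # p(x_0), ..., p(x_{M-1})
```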
|
5 |
Tensor product methods in numerical simulation of high-dimensional dynamical problems. Dolgov, Sergey. 08 September 2014.
Quantification of stochastic or quantum systems by a joint probability density or wave function is a notoriously difficult computational problem, since the solution depends on all possible states (or realizations) of the system.
Due to this combinatorial flavor, even a system containing as few as ten particles may yield as many as $10^{10}$ discretized states.
Not even modern supercomputers are capable of coping with this curse of dimensionality straightforwardly when the number of quantum particles, for example, grows to the more or less interesting order of hundreds.
For a long time, the traditional approach was to avoid models formulated in terms of probabilistic functions
and to simulate particular system realizations in a randomized process instead.
At different times and in different communities, data-sparse methods came into play.
Generally, they aim to define all data points indirectly, by a map from a small number of representers,
and to recast all operations (e.g. the solution of linear systems) from the initial data to these effective parameters.
The most advanced techniques can be applied (or at least tried) to any given array, and do not rely explicitly on its origin.
The current work contributes further progress to this area in one particular direction: tensor product methods for the separation of variables.
The separation of variables has a long history, and is based on the following elementary concept: a function of many variables may be expanded as a product of univariate functions.
On the discrete level, a function is encoded by an array of its values, or a tensor.
Therefore, instead of a huge initial array, the separation of variables allows one to work with univariate factors with much less effort.
The dissertation contains a short overview of existing tensor representations: canonical PARAFAC, Hierarchical Tucker, Tensor Train (TT) formats, as well as the artificial tensorisation, resulting in the Quantized Tensor Train (QTT) approximation method.
The contribution of the dissertation consists in both theoretical constructions and practical numerical algorithms for high-dimensional models, illustrated on the examples of the Fokker-Planck and the chemical master equations.
Both arise from stochastic dynamical processes in multiconfigurational systems, and govern the evolution of the probability function in time.
A special focus is put on time propagation schemes and their properties related to tensor product methods.
We show that these applications yield large-scale systems of linear equations,
and prove analytical separable representations of the involved functions and operators.
We propose a new combined tensor format (QTT-Tucker), which descends from the TT format (hence TT algorithms may be generalized smoothly), but provides complexity reduction by an order of magnitude.
We develop a robust iterative solution algorithm that combines the most advantageous properties of classical iterative methods from numerical analysis and of alternating density matrix renormalization group (DMRG) techniques from quantum physics.
Numerical experiments confirm that the new method is preferable to DMRG algorithms.
It is as fast as the simplest alternating schemes, but as reliable and accurate as the Krylov methods in linear algebra.
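The separation-of-variables idea can be made concrete with the TT-SVD, which compresses a full array into a train of three-way cores by successive truncated SVDs. A minimal sketch (it assumes the full tensor fits in memory, which is precisely what the large-scale algorithms of the thesis avoid):

```python
import numpy as np

def tt_svd(tensor, eps=1e-10):
    # TT-SVD sketch: unfold, truncate with an SVD, and repeat mode by mode.
    dims = tensor.shape
    d = len(dims)
    delta = eps * np.linalg.norm(tensor) / max(np.sqrt(d - 1), 1.0)
    cores, r_prev = [], 1
    C = tensor.copy()
    for k in range(d - 1):
        C = C.reshape(r_prev * dims[k], -1)
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        tail = np.cumsum(s[::-1] ** 2)[::-1]        # tail[i] = sum_{j>=i} s_j^2
        r = max(1, int(np.sum(tail > delta ** 2)))  # smallest rank within tolerance
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        C = np.diag(s[:r]) @ Vt[:r]
        r_prev = r
    cores.append(C.reshape(r_prev, dims[-1], 1))
    return cores

# usage: decompose and reconstruct a small random tensor
X = np.random.rand(4, 5, 6, 7)
cores = tt_svd(X, eps=1e-12)
Y = cores[0]
for G in cores[1:]:
    Y = np.tensordot(Y, G, axes=(-1, 0))
print(np.allclose(X, Y.reshape(X.shape)))           # True up to the tolerance
```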
|
6 |
Numerische Methoden zur Analyse hochdimensionaler Daten / Numerical Methods for Analyzing High-Dimensional Data. Heinen, Dennis. 01 July 2014.
This dissertation addresses two of the main challenges that arise when working with large data sets: dimensionality reduction and data denoising. The first part of the dissertation provides a survey of dimensionality reduction. The goal of dimensionality reduction is a meaningful low-dimensional representation of a given high-dimensional data set. In particular, we discuss and compare established manifold-learning methods. The central assumption of manifold learning is that the high-dimensional data set lies (approximately) on a low-dimensional manifold. Noise in the data set is a hindrance to all dimensionality reduction methods.
The second part of the dissertation presents a new denoising method for high-dimensional data: a wavelet-shrinkage method for smoothing noisy samples of an underlying multivariate piecewise continuous function, where the sampling points may be scattered. The method is a generalization and further development of the "Easy Path Wavelet Transform" (EPWT) introduced for image compression. It is based on a one-dimensional wavelet transform along (adaptively) constructed paths through the sampling points. Suitable adaptive path constructions are essential for the success of the method. The dissertation also contains a brief discussion of the theoretical properties of wavelets along paths as well as numerical results, and concludes with possible modifications of the denoising method.
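A heavily simplified sketch of the path-wavelet idea: order the scattered samples along a greedy nearest-neighbour path (a crude substitute for the adaptive path constructions of the thesis) and apply one-dimensional wavelet shrinkage along that path; PyWavelets is assumed available:

```python
import numpy as np
import pywt  # PyWavelets, assumed available

def greedy_path(points):
    # crude nearest-neighbour ordering of scattered sample points
    n = len(points)
    unvisited = set(range(1, n))
    path = [0]
    while unvisited:
        last = points[path[-1]]
        nxt = min(unvisited, key=lambda i: np.sum((points[i] - last) ** 2))
        path.append(nxt)
        unvisited.remove(nxt)
    return path

def denoise_along_path(points, noisy_values, wavelet="db4", threshold=0.1):
    order = greedy_path(points)
    signal = noisy_values[order]                       # 1D signal along the path
    coeffs = pywt.wavedec(signal, wavelet, mode="periodization")
    coeffs = [coeffs[0]] + [pywt.threshold(c, threshold, mode="soft")
                            for c in coeffs[1:]]
    denoised = pywt.waverec(coeffs, wavelet, mode="periodization")[:len(signal)]
    out = np.empty_like(noisy_values)
    out[order] = denoised                              # back to the original ordering
    return out
```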
|
7 |
Adaptive risk management. Chen, Ying. 13 February 2007.
In recent years, the study of risk management has been driven by the Basel Committee's requirements for regular banking supervision. Many risk management methods, however, have limitations: 1) covariance estimation relies on a time-invariant form, 2) the models are based on unrealistic distributional assumptions, and 3) numerical problems appear when they are applied to high-dimensional portfolios. The primary aim of this dissertation is to propose adaptive methods that overcome these limitations and can measure the risk exposure of multivariate portfolios accurately and fast. The basic idea is to first retrieve stochastically independent components (ICs) from the high-dimensional time series and then identify the distributional behavior of every resulting IC in univariate space. More specifically, two local parametric approaches, the local moving window average (MWA) method and the local exponential smoothing (ES) method, are used to estimate the volatility process of every IC under a heavy-tailed distributional assumption, namely that the ICs are generalized hyperbolic (GH) distributed. This speeds up the computation of risk measures and achieves much better accuracy than many popular risk management methods.
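A rough sketch of the two-step idea, with FastICA from scikit-learn for the independent components, an EWMA recursion as the exponential-smoothing step, and a Student t distribution standing in for the generalized hyperbolic family (scikit-learn and SciPy assumed available; not the estimators developed in the dissertation):

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import FastICA   # assumed available

def portfolio_var(returns, weights, lam=0.94, alpha=0.01, n_sim=100_000, seed=0):
    """One-day Value-at-Risk sketch: ICA -> EWMA volatility per IC -> heavy tails."""
    rng = np.random.default_rng(seed)
    ica = FastICA(random_state=0)
    ics = ica.fit_transform(returns)                 # (T, d) independent components
    # exponential smoothing (EWMA) of each component's variance
    var = np.empty_like(ics)
    var[0] = ics.var(axis=0)
    for t in range(1, len(ics)):
        var[t] = lam * var[t - 1] + (1 - lam) * ics[t - 1] ** 2
    z = ics / np.sqrt(var)
    # Student t marginals stand in for the generalized hyperbolic family here
    dfs = [stats.t.fit(z[:, j], floc=0, fscale=1)[0] for j in range(z.shape[1])]
    sims = np.column_stack([
        stats.t.rvs(df, size=n_sim, random_state=rng) * np.sqrt(var[-1, j])
        for j, df in enumerate(dfs)
    ])
    assets = sims @ ica.mixing_.T + ica.mean_        # map ICs back to asset returns
    losses = -(assets @ weights)
    return np.quantile(losses, 1 - alpha)            # e.g. 99% one-day VaR
```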
|
8 |
High Dimensional Fast Fourier Transform Based on Rank-1 Lattice Sampling / Hochdimensionale schnelle Fourier-Transformation basierend auf Rang-1 Gittern als Ortsdiskretisierungen. Kämmerer, Lutz. 24 February 2015.
We consider multivariate trigonometric polynomials with frequencies supported on a fixed but arbitrary frequency index set I, which is a finite set of integer vectors of length d. Naturally, one is interested in spatial
discretizations in the d-dimensional torus such that
- the sampling values of the trigonometric polynomial at the nodes of this spatial discretization uniquely determine the trigonometric polynomial,
- the corresponding discrete Fourier transform is fast realizable, and
- the corresponding fast Fourier transform is stable.
An algorithm that computes the discrete Fourier transform with a computational complexity that is bounded from above, up to logarithmic factors, by terms linear in the maximum of the number of input and output data is called a fast Fourier transform. We call the fast Fourier transform stable if the Fourier matrix of the discrete Fourier transform has a condition number near one and the fast algorithm does not corrupt this theoretical stability.
We suggest using rank-1 lattices and a generalization thereof as spatial discretizations for sampling multivariate trigonometric polynomials, and we develop construction methods to determine reconstructing sampling sets, i.e., sets of sampling nodes that allow for the unique, fast, and stable reconstruction of trigonometric polynomials. The methods for determining reconstructing rank-1 lattices are component-by-component constructions, similar to the seminal methods developed in the field of numerical integration. In this thesis we identify a component-by-component construction of reconstructing rank-1 lattices that allows for an estimate of the number of sampling nodes M,
$$|I| \le M \le \max\left(\tfrac{2}{3}|I|^2,\; \max\{3\|\mathbf{k}\|_\infty \colon \mathbf{k}\in I\}\right),$$
that is sufficient to uniquely reconstruct each multivariate trigonometric polynomial with frequencies supported on the frequency index set I. We observe that the bounds on the number M depend only on the number of frequency indices contained in I and on the expansion of I, but not on the spatial dimension d. Hence, rank-1 lattices are suitable spatial discretizations for arbitrarily high-dimensional problems.
Furthermore, we consider a generalization of the concept of rank-1 lattices, which we call generated sets. We use a quite different approach in order to determine suitable reconstructing generated sets. The corresponding construction method is based on a continuous optimization method.
Besides the theoretical considerations, we focus on the practicability of the presented algorithms and illustrate the theoretical findings by means of several examples.
In addition, we investigate the approximation properties of the considered sampling schemes. We apply the results to the most important structures of frequency indices in higher dimensions, so-called hyperbolic crosses, and demonstrate the approximation properties by means of several examples, including the solution of Poisson's equation as one representative of partial differential equations.
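The reconstruction direction can be sketched analogously: on a reconstructing rank-1 lattice the map k -> <k, z> mod M is injective on the index set I, so all Fourier coefficients of the sampled polynomial are read off from a single length-M FFT. A minimal sketch, assuming such a lattice and its generating vector z are already given:

```python
import numpy as np

def reconstruct_coeffs(samples, freqs, z):
    # samples: values p(x_j) at the M nodes x_j = (j * z mod M) / M
    # freqs:   (|I|, d) integer frequency index set I
    # requires a reconstructing lattice: k -> <k, z> mod M injective on I
    M = len(samples)
    addresses = (np.asarray(freqs) @ np.asarray(z)) % M
    assert len(np.unique(addresses)) == len(freqs), "lattice does not reconstruct I"
    fhat = np.fft.fft(samples) / M           # one length-M FFT
    return fhat[addresses]                   # Fourier coefficients c_k, k in I
```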
|