1 |
Scalable Nonparametric L1 Density Estimation via Sparse Subtree Partitioning
Sandstedt, Axel, January 2023
We consider the construction of multivariate histogram estimators of an arbitrary density f that seek to minimize their L1 distance to the true underlying density, using arbitrarily large sample sizes. Theory for such estimators exists, and early distributed implementations are available. Our main contributions are new algorithms that use sparse binary tree arithmetic to eliminate unnecessary network communication in the distributed stages of constructing such estimators.
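To make the communication-saving idea concrete, here is a minimal Python sketch under stated assumptions. It is not the algorithm from the thesis. The sketch assumes the data lie in the unit cube $[0,1)^d$, that every worker bins its points at the same fixed depth of a bisection tree that splits coordinates cyclically (as a regular paving does on a cube), and that cells are identified by a hypothetical heap-style integer label (root 1, children 2k and 2k+1). Each worker keeps only its non-empty leaves, so merging reduces to a sparse key-wise sum and empty cells never cross the network.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence

def leaf_label(x: Sequence[float], depth: int) -> int:
    """Heap-style label (root = 1, children 2k and 2k + 1) of the depth-`depth`
    cell of the unit cube [0, 1)^d that contains x. On a cube, bisecting the
    first widest side cycles through coordinates 0, 1, ..., d - 1, 0, ..."""
    d = len(x)
    lo, hi = [0.0] * d, [1.0] * d
    label = 1
    for level in range(depth):
        j = level % d
        mid = 0.5 * (lo[j] + hi[j])
        if x[j] < mid:
            hi[j], label = mid, 2 * label         # go to the left child
        else:
            lo[j], label = mid, 2 * label + 1     # go to the right child
    return label

def local_counts(points: Iterable[Sequence[float]], depth: int) -> Dict[int, int]:
    """One worker's sparse histogram: only non-empty leaves appear as keys."""
    return dict(Counter(leaf_label(p, depth) for p in points))

def merge_counts(parts: Iterable[Dict[int, int]]) -> Dict[int, int]:
    """Combine the workers' sparse histograms by adding counts leaf by leaf."""
    total: Counter = Counter()
    for part in parts:
        total.update(part)   # Counter.update adds counts rather than replacing them
    return dict(total)
```

For example, `local_counts([(0.1, 0.2), (0.15, 0.3)], depth=2)` gives `{4: 2}`. Merging it with `local_counts([(0.9, 0.8)], depth=2)` gives `{4: 2, 7: 1}`. Only two keys are transmitted instead of a dense array over all $2^{\text{depth}}$ cells, and the savings grow when most cells are empty.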
|
2 |
Statistical Regular Pavings and their Applications
Teng, Gloria Ai Hui, January 2013
We propose statistical regular pavings (SRPs) as an efficient and adaptive statistical data structure for processing massive, multidimensional data. A regular paving (RP) is an ordered binary tree that recursively bisects a box in $\mathbb{R}^{d}$ along its first widest side. An SRP extends an RP with mutable caches of recursively computable statistics of the data. In this study we use SRPs for two major applications: estimating histogram densities and summarising large spatio-temporal datasets.
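The two defining rules, bisecting on the first widest side and caching recursively computable statistics, can be sketched in a few lines of Python. The cached statistics chosen here (count and coordinate-wise sum) and the half-open boundary convention are illustrative assumptions, and the class name `SRPNode` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = List[Tuple[float, float]]  # one (lower, upper) interval per coordinate
Point = Tuple[float, ...]

def first_widest_side(box: Box) -> int:
    """Index of the first coordinate whose side length is maximal."""
    widths = [hi - lo for lo, hi in box]
    return widths.index(max(widths))  # list.index returns the first match

@dataclass
class SRPNode:
    box: Box
    count: int = 0                                     # cached: number of points in the box
    total: List[float] = field(default_factory=list)   # cached: coordinate-wise sum of points
    points: List[Point] = field(default_factory=list)  # raw points, kept only at leaves
    left: Optional["SRPNode"] = None
    right: Optional["SRPNode"] = None

    def is_leaf(self) -> bool:
        return self.left is None

    def mean(self) -> List[float]:
        return [t / self.count for t in self.total]

    def _child_for(self, x: Point) -> "SRPNode":
        # Children split the box at the midpoint of its first widest side,
        # using half-open intervals [lo, mid) and [mid, hi).
        j = first_widest_side(self.box)
        mid = 0.5 * (self.box[j][0] + self.box[j][1])
        return self.left if x[j] < mid else self.right

    def insert(self, x: Point) -> None:
        # Update this node's caches, then pass the point down to a leaf.
        self.count += 1
        if not self.total:
            self.total = [0.0] * len(x)
        self.total = [t + xi for t, xi in zip(self.total, x)]
        if self.is_leaf():
            self.points.append(x)
        else:
            self._child_for(x).insert(x)

    def split(self) -> None:
        # Bisect a leaf on its first widest side and push its points down.
        if not self.is_leaf():
            raise ValueError("only leaves can be split")
        j = first_widest_side(self.box)
        lo, hi = self.box[j]
        mid = 0.5 * (lo + hi)
        left_box, right_box = list(self.box), list(self.box)
        left_box[j], right_box[j] = (lo, mid), (mid, hi)
        self.left, self.right = SRPNode(left_box), SRPNode(right_box)
        pts, self.points = self.points, []
        for p in pts:
            self._child_for(p).insert(p)
```

On the box $[0,1]\times[0,2]$, the first split is along the second coordinate because it is the widest. The resulting right child $[0,1]\times[1,2]$ has two equally wide sides, so its split uses the first one.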
The resulting SRP histograms are $L_1$-consistent density estimators. They are produced by a randomised priority queue that adaptively grows the SRP tree, and this process is formalised as a Markov chain over the space of SRPs. One way to select an estimate is to run a Markov chain over the space of SRP trees, again initialised by the randomised priority queue, in which the SRP tree adaptively shrinks or grows through pruning or splitting operations. The stationary distribution of this chain is the posterior distribution over the space of all possible histograms. Because SRPs are recursive, arithmetic averages of histograms can be computed efficiently. Averaging the states sampled from the stationary distribution then gives the posterior mean histogram estimate.
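In a regular paving, every split is determined by the box alone (the midpoint of its first widest side), so a node is identified by its path from the root. The following Python sketch assumes that representation. A histogram is a dictionary from a hypothetical heap-style node label (root 1, children 2k and 2k+1) to the constant density on that leaf, and the leaves of each histogram partition the same root box. Averaging sampled histograms, as in a posterior-mean estimate, then amounts to moving to the common refinement of their partitions and averaging heights leaf by leaf. This illustrates SRP-style arithmetic and is not the thesis's implementation.

```python
from typing import Dict, List, Sequence

Histogram = Dict[int, float]  # heap-style leaf label -> constant density on that leaf

def is_proper_ancestor(a: int, b: int) -> bool:
    """True if node a lies strictly above node b in the binary tree."""
    shift = b.bit_length() - a.bit_length()
    return shift > 0 and (b >> shift) == a

def common_refinement(hists: Sequence[Histogram]) -> List[int]:
    """Leaves of the coarsest partition that refines every histogram's partition."""
    labels = set().union(*(h.keys() for h in hists))
    return sorted(b for b in labels
                  if not any(is_proper_ancestor(b, c) for c in labels))

def height_at(h: Histogram, label: int) -> float:
    """Density of h on cell `label`, found by walking up to the enclosing leaf of h."""
    while label >= 1:
        if label in h:
            return h[label]
        label //= 2
    raise KeyError("the leaves of h do not cover this cell")

def average(hists: Sequence[Histogram]) -> Histogram:
    """Pointwise mean of piecewise-constant histograms on a common root box."""
    leaves = common_refinement(hists)
    return {b: sum(height_at(h, b) for h in hists) / len(hists) for b in leaves}
```

On $[0,1)$, averaging the two-leaf histogram `{2: 0.5, 3: 1.5}` with the uniform histogram `{1: 1.0}` gives `{2: 0.75, 3: 1.25}`. This result is still a density, and its partition is the finer of the two.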
Using a dataset of high-frequency aircraft position information, we also show that SRPs can summarise large datasets. Recursively computable statistics can be stored for variable-sized regions of airspace. The regions themselves can be created automatically to reflect the varying density of aircraft observations, so that more computational resources are dedicated to, and more detailed information is provided for, areas with more air traffic. In particular, SRPs can very quickly aggregate or separate data with different characteristics. Data describing individual aircraft, or data collected using different technologies (with different levels of precision), can therefore be stored separately and yet quickly combined using standard arithmetic operations.
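In this sense, a statistic is recursively computable when its value for a union of disjoint data sets follows from the values for the parts. A parent's cache is then just a combination of its children's caches. The minimal Python sketch below assumes count, sum and sum of squares as the cached statistics, which is an illustrative choice since the thesis may cache others. It shows how summaries stored separately, for instance per aircraft or per sensor technology, can be combined by addition.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Summary:
    n: int = 0        # number of observations
    s: float = 0.0    # sum of values
    ss: float = 0.0   # sum of squared values

    @staticmethod
    def of(values: Iterable[float]) -> "Summary":
        vals = list(values)
        return Summary(len(vals), sum(vals), sum(v * v for v in vals))

    def __add__(self, other: "Summary") -> "Summary":
        # The summary of a disjoint union is the component-wise sum.
        return Summary(self.n + other.n, self.s + other.s, self.ss + other.ss)

    def mean(self) -> float:
        return self.s / self.n

    def variance(self) -> float:
        # Population variance from the cached moments (numerically naive, fine for a sketch).
        return self.ss / self.n - self.mean() ** 2
```

For example, altitude readings `[1000.0, 1200.0]` from one aircraft and `[1100.0]` from another can be summarised separately. Adding the two summaries yields the combined count 3 and mean 1100.0 without revisiting the raw data.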
|
3 |
Fisher Information Test of Normality
Lee, Yew-Haur Jr., 21 September 1998
An extremal property of normal distributions is that they have the smallest Fisher information for location among all distributions with the same variance. A new test of normality proposed by Terrell (1995) exploits this property. It finds the maximum-likelihood density subject to the constraint that its Fisher information equals the value expected under normality, given the sample variance. The test statistic is then the ratio of the resulting likelihood to the likelihood under normality.
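The extremal property can be checked in a few lines. Assume $f$ is a smooth density with mean $\mu$ and variance $\sigma^2$, and define the Fisher information for location as $I(f)=\int f'(x)^2/f(x)\,dx$. Also assume $(x-\mu)f(x)\to 0$ as $|x|\to\infty$, so that integration by parts applies:

$$\int (x-\mu)\, f'(x)\,dx = -\int f(x)\,dx = -1,$$

$$1 = \left(\int (x-\mu)\,\frac{f'(x)}{f(x)}\, f(x)\,dx\right)^{2} \le \int (x-\mu)^2 f(x)\,dx \int \left(\frac{f'(x)}{f(x)}\right)^{2} f(x)\,dx = \sigma^2 I(f).$$

Hence $I(f)\ge 1/\sigma^2$. Equality holds in the Cauchy-Schwarz step only when $f'(x)/f(x) = c\,(x-\mu)$. The first identity then forces $c=-1/\sigma^2$, so $f$ is the $N(\mu,\sigma^2)$ density.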
Since the asymptotic distribution of this test statistic is not available, critical values for n = 3 to 200 were obtained by simulation and smoothed using polynomials. An extensive power study shows that the test has superior power against symmetric, leptokurtic (long-tailed) distributions. Another advantage over existing tests is that the test depicts any deviation from normality directly in the form of a density estimate, as is evident when it is applied to several real data sets.
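The simulate-then-smooth procedure for critical values can be sketched in NumPy. Terrell's statistic is not reproduced here. Sample excess kurtosis is swapped in purely as a placeholder so that the procedure runs, and it is a natural stand-in when the alternatives of interest are long-tailed. Smoothing with a quadratic polynomial in $1/n$ is also an assumption rather than the thesis's choice.

```python
import numpy as np

def excess_kurtosis(x: np.ndarray) -> float:
    # Placeholder statistic only; NOT Terrell's Fisher-information statistic.
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

def simulated_critical_value(stat, n: int, alpha: float = 0.05,
                             reps: int = 20000, seed: int = 0) -> float:
    """Upper-tail critical value of `stat` under normality for sample size n,
    estimated as the (1 - alpha) quantile over `reps` simulated normal samples."""
    rng = np.random.default_rng(seed)
    sims = np.array([stat(rng.standard_normal(n)) for _ in range(reps)])
    return float(np.quantile(sims, 1.0 - alpha))

def smooth_critical_values(ns: np.ndarray, crit: np.ndarray, deg: int = 2) -> np.ndarray:
    """Least-squares polynomial fit of critical values against 1/n."""
    coef = np.polyfit(1.0 / ns, crit, deg=deg)
    return np.polyval(coef, 1.0 / ns)
```

A typical run evaluates `simulated_critical_value` on a grid such as `n = 10, 20, ..., 100` and then passes the results to `smooth_critical_values`. Smoothing reduces the Monte Carlo noise between neighbouring sample sizes.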
Testing residuals for normality is also investigated. Approaches that account for residuals being possibly heteroscedastic and correlated suffer from a loss of power. The approach with the fewest undesirable features is to use the ordinary least squares (OLS) residuals in place of independent observations. Simulations show that care is needed regarding the significance levels of normality tests and when generalizing the results. / Ph. D.
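The hat matrix $H = X(X^\top X)^{-1}X^\top$ shows why OLS residuals can only stand in for independent observations. Even when the errors are i.i.d. with variance $s^2$, the residuals satisfy $\operatorname{Cov}(e) = s^2 (I - H)$. They are therefore correlated and have unequal variances $s^2(1-h_{ii})$. The short NumPy sketch below computes both quantities, and its data and names are hypothetical.

```python
import numpy as np

def ols_residuals_and_leverage(X: np.ndarray, y: np.ndarray):
    """Return OLS residuals e = y - X b and leverages h_ii (diagonal of the hat
    matrix). Assumes X has full column rank."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    Q, _ = np.linalg.qr(X)          # reduced QR, so H = Q Q^T
    h = np.sum(Q ** 2, axis=1)      # diag(Q Q^T) = row sums of squared entries
    return e, h
```

The residuals `e` could then be passed to a normality test. However, the leverages `h` are a reminder that, even under the null hypothesis, the residuals are neither independent nor identically distributed.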
|
4 |
Data-Adaptive Multivariate Density Estimation Using Regular Pavings, With Applications to Simulation-Intensive Inference
Harlow, Jennifer, January 2013
A regular paving (RP) is a finite succession of bisections that partitions a multidimensional box into sub-boxes using a binary tree-based data structure. The bisections are restricted so that an existing sub-box in the partition may only be bisected on its first widest side. Mapping a real value to each element of the partition gives a real-mapped regular paving (RMRP), which can represent a piecewise-constant density estimate on a multidimensional domain. The RP structure allows real arithmetic to be extended to density estimates represented as RMRPs. Other operations, such as computing marginal and conditional functions, can also be carried out very efficiently by exploiting these arithmetical properties and the binary tree structure.
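To illustrate the marginalisation described above, the sketch below integrates a piecewise-constant density over all coordinates but one. For simplicity, it assumes the RMRP has been flattened to a list of (box, height) leaf pairs. An implementation that works on the tree itself, as described in the thesis, would traverse the nodes instead. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]  # one (lower, upper) interval per coordinate

def marginal(leaves: Sequence[Tuple[Box, float]], j: int):
    """Marginal density along coordinate j of a piecewise-constant density
    given as (box, height) leaves. Returns (cuts, values), where values[k] is
    the marginal density on [cuts[k], cuts[k + 1])."""
    cuts = sorted({b[j][0] for b, _ in leaves} | {b[j][1] for b, _ in leaves})
    values = []
    for a, c in zip(cuts[:-1], cuts[1:]):
        v = 0.0
        for box, height in leaves:
            lo, hi = box[j]
            if lo <= a and c <= hi:     # elementary interval lies inside this leaf's j-side
                other = 1.0
                for k, (l2, h2) in enumerate(box):
                    if k != j:
                        other *= h2 - l2
                v += height * other     # integrate the remaining coordinates out
        values.append(v)
    return cuts, values
```

Every box endpoint along coordinate j is a cut point. Each elementary interval therefore lies either entirely inside or entirely outside each leaf's j-side, which keeps the inner test exact.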
The purpose of this thesis is to explore the potential of RPs for density estimation. The thesis is structured in three parts. The first part formalises the operational properties of RP-structured density estimates. The second part considers methods for creating a suitable RP partition for an RMRP-structured density estimate. The advantages and disadvantages of an existing Markov chain Monte Carlo algorithm are investigated, and the algorithm is extended with a semi-automatic method for heuristically diagnosing convergence of the chain. An alternative method is also proposed that uses an RMRP to approximate a kernel density estimate. RMRP density estimates are not differentiable and converge more slowly than good multivariate kernel density estimators; their advantages lie instead in their operational properties. The final part of this thesis describes a new approach to Bayesian inference for complex models with intractable likelihood functions that exploits these operational properties.
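The abstract does not spell out the semi-automatic convergence heuristic, so it is not reproduced here. For comparison only, the widely used Gelman-Rubin potential scale reduction factor is sketched below. It assumes that several chains have been run from dispersed starting states. It also assumes that each state has been reduced to a scalar summary, for example the number of leaves in the partition, which is a hypothetical choice.

```python
import numpy as np

def gelman_rubin(chains) -> float:
    """Potential scale reduction factor R-hat for m chains of length n of a
    scalar summary (rows = chains). Values near 1 suggest, but do not prove,
    convergence."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * means.var(ddof=1)               # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled estimate of the target variance
    return float(np.sqrt(var_hat / W))
```

Chains that explore the same distribution give values close to 1. Chains stuck in different regions give values well above 1.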
|
5 |
Asymptotische Äquivalenz für ein Modell unabhängiger nicht identisch verteilter Daten / Asymptotic Equivalence for a Model of Independent, Nonidentically Distributed Data
Jähnisch, Michael, 01 January 1999
The thesis "Asymptotic Equivalence of Experiments for a Model with Independent and Nonidentically Distributed Observations" deals with the theory of statistical experiments developed by Le Cam. Le Cam defined the so-called $\Delta$-distance between experiments.
If this distance is small for two given models, their statistical properties are similar. Two sequences of experiments are called asymptotically equivalent if their $\Delta$-distance converges to zero. In this thesis we prove asymptotic equivalence between a model with independent, nonidentically distributed observations and a Gaussian shift model. The $i$-th observation in the first model is distributed according to a density $h(i/n,\cdot)$, where $h$ is a family of densities on the unit interval. In other words, we approximate a complicated statistical experiment by a simpler one, namely a Gaussian shift model. The densities $h$ belong to a Hölder ball, so the problem is nonparametric. Our result can also be viewed as a nonparametric version of the local asymptotic normality (LAN) property, which was also introduced by Le Cam. An important tool in the proof is the coupling of stochastic processes, i.e. the construction of processes on a common probability space such that they are close in a strong sense. In the second part of the thesis we use the Hungarian construction to prove a functional version of such a coupling result for the sequential empirical process and the Kiefer-Müller process.
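For reference, one standard way to write the $\Delta$-distance uses Le Cam's randomization criterion. The form below holds under the usual regularity conditions (e.g. Polish sample spaces), and normalizations of the total variation norm vary between authors. For experiments $\mathcal{E}=(P_\theta)_{\theta\in\Theta}$ and $\mathcal{F}=(Q_\theta)_{\theta\in\Theta}$ on a common parameter set:

$$\delta(\mathcal{E},\mathcal{F}) = \inf_{K}\,\sup_{\theta\in\Theta}\,\bigl\|K P_\theta - Q_\theta\bigr\|_{TV}, \qquad \Delta(\mathcal{E},\mathcal{F}) = \max\{\delta(\mathcal{E},\mathcal{F}),\,\delta(\mathcal{F},\mathcal{E})\},$$

where the infimum runs over Markov kernels $K$ from the sample space of $\mathcal{E}$ to that of $\mathcal{F}$. Sequences $\mathcal{E}_n$ and $\mathcal{F}_n$ are asymptotically equivalent when $\Delta(\mathcal{E}_n,\mathcal{F}_n)\to 0$.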
|
6 |
Simple Solutions to hard Problems in the Estimation and Prediction of Welfare Distributions / Einfache Lösungen für schwierige Probleme in der Schätzung und Vorhersage der Wohlfahrtsverteilung
Dai, Jing, 08 April 2011
No description available.
|
7 |
Estimation Bayésienne non Paramétrique de Systèmes Dynamiques en Présence de Bruits Alpha-Stables / Nonparametric Bayesian Estimation of Dynamical Systems in the Presence of Alpha-Stable Noise
Jaoua, Nouha, 06 June 2013
In the signal processing literature, noise sources are often assumed to be Gaussian or mixtures of Gaussians. However, in many fields this conventional assumption is inadequate and can lead to a loss of resolution and/or accuracy. This is particularly true of noise with an impulsive nature, which arises in several areas, especially telecommunications. $\alpha$-stable distributions are well suited to modeling this type of noise.
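Symmetric $\alpha$-stable noise has no closed-form density for most $\alpha$, but it is easy to simulate. The sketch below uses the standard Chambers-Mallows-Stuck transformation for the symmetric case ($\beta = 0$). It is offered as a generic tool for generating impulsive test noise, not as code from the thesis, and the function name is hypothetical. With $\alpha = 2$ it returns Gaussian draws with variance $2\,\text{scale}^2$, and smaller $\alpha$ gives heavier tails.

```python
import numpy as np

def symmetric_alpha_stable(alpha: float, size: int, scale: float = 1.0, rng=None) -> np.ndarray:
    """Draw symmetric alpha-stable samples (beta = 0, location 0) with the
    Chambers-Mallows-Stuck method; valid for 0 < alpha <= 2."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    rng = np.random.default_rng() if rng is None else rng
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    W = rng.exponential(1.0, size)                 # unit-mean exponential
    if alpha == 1.0:
        X = np.tan(V)                              # Cauchy special case
    else:
        X = (np.sin(alpha * V) / np.cos(V) ** (1.0 / alpha)
             * (np.cos((1.0 - alpha) * V) / W) ** ((1.0 - alpha) / alpha))
    return scale * X
```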
In this context, the main focus of this thesis is to propose novel robust methods for the joint estimation of the state and the noise in impulsive environments. Inference is performed within a Bayesian framework using sequential Monte Carlo methods. First, the problem is addressed for an OFDM transmission link, assuming a symmetric $\alpha$-stable model for the channel distortions, and a particle filter is proposed for the joint estimation of the transmitted OFDM symbols and the noise parameters. The problem is then tackled in the more general context of nonlinear dynamic systems. A flexible Bayesian nonparametric model based on Dirichlet process mixtures is introduced to model the $\alpha$-stable noise. Sequential Monte Carlo filters based on efficient importance densities are then implemented to jointly estimate the state and the unknown measurement noise density.
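The thesis learns the unknown noise density with Dirichlet process mixtures, and that machinery is beyond a short example. The simpler sketch below shows only the sequential Monte Carlo skeleton, under loudly stated assumptions. The state is a hypothetical scalar random walk, and the measurement noise is fixed to the Cauchy law (the symmetric $\alpha$-stable case $\alpha=1$, one of the few with a closed-form density). The sketch also uses a bootstrap proposal rather than the efficient importance densities of the thesis, and it applies multinomial resampling at every step.

```python
import numpy as np

def cauchy_loglik(r: np.ndarray, gamma: float) -> np.ndarray:
    # Log density of a Cauchy(0, gamma) variable evaluated at residuals r.
    return -np.log(np.pi * gamma * (1.0 + (r / gamma) ** 2))

def bootstrap_pf(y, n_particles: int = 1000, q: float = 0.1,
                 gamma: float = 0.5, seed: int = 0) -> np.ndarray:
    """Bootstrap particle filter for the assumed model
    x_t = x_{t-1} + N(0, q),  y_t = x_t + Cauchy(0, gamma).
    Returns the filtered means E[x_t | y_1, ..., y_t]."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)                    # assumed N(0, 1) prior
    means = []
    for yt in y:
        x = x + rng.normal(0.0, np.sqrt(q), n_particles)     # propagate through the dynamics
        logw = cauchy_loglik(yt - x, gamma)                   # weight by the heavy-tailed likelihood
        w = np.exp(logw - logw.max())                         # stabilise before normalising
        w /= w.sum()
        means.append(float(np.sum(w * x)))
        idx = rng.choice(n_particles, size=n_particles, p=w)  # multinomial resampling
        x = x[idx]
    return np.array(means)
```

Because the Cauchy likelihood decays only polynomially, a single wild observation barely shifts the particle weights. This is the robustness that a Gaussian-noise filter lacks in impulsive environments.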
|