About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Algebraic Methods for Log-Linear Models

Pribadi, Aaron 31 May 2012 (has links)
Techniques from representation theory (Diaconis, 1988) and algebraic geometry (Drton et al., 2008) have been applied to the statistical analysis of discrete data with log-linear models. With these ideas in mind, we discuss the selection of sparse log-linear models, especially for binary data and data on other structured sample spaces. When a sample space and its symmetry group satisfy certain conditions, we construct a natural spanning set for the space of functions on the sample space which respects the isotypic decomposition; these vectors may be used in algorithms for model selection. The construction is explicitly carried out for the case of binary data.
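The character-theoretic construction for binary data can be illustrated with a generic sketch: functions on the sample space {0,1}^n are spanned by parity (Walsh) vectors, which group naturally by the size of the index set they depend on, a grouping compatible with the isotypic decomposition under coordinate permutations. The code below is an illustrative stand-in, not the thesis's specific spanning set:

```python
from itertools import product, combinations

def walsh_basis(n):
    """Parity (character) vectors for the sample space {0,1}^n,
    grouped by the size of the index set S; distinct characters
    are orthogonal over the 2^n sample points."""
    points = list(product([0, 1], repeat=n))
    basis = {}
    for k in range(n + 1):
        for S in combinations(range(n), k):
            vec = [(-1) ** sum(x[i] for i in S) for x in points]
            basis.setdefault(k, []).append(vec)
    return basis

b = walsh_basis(3)
```

For n = 3 this yields 1 + 3 + 3 + 1 = 8 vectors, one per subset of coordinates, and any two vectors from different degrees are orthogonal.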
12

Bootstrapping in a high dimensional but very low sample size problem

Song, Juhee 16 August 2006 (has links)
High Dimension, Low Sample Size (HDLSS) problems have received much attention recently in many areas of science. Analysis of microarray experiments is one such area. Numerous studies are on-going to investigate the behavior of genes by measuring the abundance of mRNA (messenger RiboNucleic Acid), gene expression. HDLSS data investigated in this dissertation consist of a large number of data sets each of which has only a few observations. We assume a statistical model in which measurements from the same subject have the same expected value and variance. All subjects have the same distribution up to location and scale. Information from all subjects is shared in estimating this common distribution. Our interest is in testing the hypothesis that the mean of measurements from a given subject is 0. Commonly used tests of this hypothesis, the t-test, sign test and traditional bootstrapping, do not necessarily provide reliable results since there are only a few observations for each data set. We motivate a mixture model having C clusters and 3C parameters to overcome the small sample size problem. Standardized data are pooled after assigning each data set to one of the mixture components. To get reasonable initial parameter estimates when density estimation methods are applied, we apply clustering methods including agglomerative and K-means. Bayes Information Criterion (BIC) and a new criterion, WMCV (Weighted Mean of within Cluster Variance estimates), are used to choose an optimal number of clusters. Density estimation methods including a maximum likelihood unimodal density estimator and kernel density estimation are used to estimate the unknown density. Once the density is estimated, a bootstrapping algorithm that selects samples from the estimated density is used to approximate the distribution of test statistics. 
The t-statistic and an empirical likelihood ratio statistic are used, since their distributions are completely determined by the distribution common to all subjects. A method to control the false discovery rate is used to perform simultaneous tests on all small data sets. Simulated data sets and a set of cDNA (complementary DNA) microarray experiment data are analyzed by the proposed methods.
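The pool-then-resample idea can be sketched in a few lines. Everything here (the simulated subjects, resampling directly from the pooled standardized values rather than from a fitted density, the constants) is an illustrative simplification of the dissertation's procedure:

```python
import random
import statistics

random.seed(0)
# hypothetical HDLSS setting: many subjects, each with only 3 observations
subjects = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]

# standardize each subject's data and pool, as in the common-shape model
pooled = []
for obs in subjects:
    m, s = statistics.mean(obs), statistics.stdev(obs)
    pooled.extend((x - m) / s for x in obs)

def boot_t(data, pooled, B=500):
    """Approximate the null distribution of the t-statistic by
    resampling from the pooled standardized values (an unsmoothed
    stand-in for sampling from an estimated density)."""
    n = len(data)
    t_obs = statistics.mean(data) / (statistics.stdev(data) / n ** 0.5)
    t_star = []
    for _ in range(B):
        samp = [random.choice(pooled) for _ in range(n)]
        s = statistics.stdev(samp)
        if s == 0:          # degenerate resample; skip
            continue
        t_star.append(statistics.mean(samp) / (s / n ** 0.5))
    p = sum(abs(t) >= abs(t_obs) for t in t_star) / len(t_star)
    return t_obs, p

t, p = boot_t(subjects[0], pooled)
```

The resulting per-subject p-values would then feed into a false discovery rate procedure for the simultaneous tests.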
13

Essays on wage dispersion

Davies, Stuart January 1999 (has links)
No description available.
14

Mean Hellinger Distance as an Error Criterion in Univariate and Multivariate Kernel Density Estimation

Anver, Haneef Mohamed 01 December 2010 (has links)
Ever since the pioneering work of Parzen, the mean square error (MSE) and its integrated form (MISE) have been used as the error criteria in choosing the bandwidth matrix for multivariate kernel density estimation. More recently, other criteria have been advocated as competitors to the MISE, such as the mean absolute error. In this study we define a weighted version of the Hellinger distance for multivariate densities and show that it has an asymptotic form which is one-fourth the asymptotic MISE under weak smoothness conditions on the multivariate density f. In addition, the proposed criterion gives rise to a new data-dependent bandwidth matrix selector. The performance of the new data-dependent bandwidth matrix selector is compared through simulation with other well-known bandwidth matrix selectors such as least squares cross-validation (LSCV) and the plug-in (HPI). We derive a closed-form formula for the mean Hellinger distance (MHD) in the univariate case. We also compare via simulation the mean weighted Hellinger distance (MWHD) and the asymptotic MWHD, and the MISE and the asymptotic MISE, for both univariate and bivariate cases for various densities and sample sizes.
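As a numerical illustration of the Hellinger distance as an error criterion, the sketch below approximates the squared Hellinger distance between a univariate kernel density estimate and the true density on a grid. The bandwidth rule, grid, and test density are assumptions; the thesis's closed-form MHD formula is not reproduced here:

```python
import math
import random

random.seed(1)
data = [random.gauss(0, 1) for _ in range(200)]
# normal-reference bandwidth, assuming unit standard deviation
h = 1.06 * len(data) ** -0.2

def kde(x):
    """Gaussian kernel density estimate at x."""
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2)
               for xi in data) / (len(data) * h * math.sqrt(2 * math.pi))

def true_f(x):
    """Standard normal density, the assumed truth."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

# squared Hellinger distance H^2 = (1/2) * integral (sqrt(fhat) - sqrt(f))^2 dx,
# approximated by a Riemann sum on [-6, 6]
dx = 0.05
grid = [-6 + i * dx for i in range(241)]
h2 = 0.5 * sum((math.sqrt(kde(x)) - math.sqrt(true_f(x))) ** 2
               for x in grid) * dx
```

Since the squared Hellinger distance is bounded by 1, it offers a scale-free alternative to the MISE when comparing bandwidth choices.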
15

Connectionist multivariate density-estimation and its application to speech synthesis

Uria, Benigno January 2016 (has links)
Autoregressive models factorize a multivariate joint probability distribution into a product of one-dimensional conditional distributions. The variables are assigned an ordering, and the conditional distribution of each variable is modelled using all variables preceding it in that ordering as predictors. Calculating normalized probabilities and sampling both have polynomial computational complexity under autoregressive models. Moreover, binary autoregressive models based on neural networks obtain statistical performance similar to that of some intractable models, like restricted Boltzmann machines, on several datasets. The use of autoregressive probability density estimators based on neural networks to model real-valued data, while proposed before, has never been properly investigated and reported. In this thesis we extend the formulation of neural autoregressive distribution estimators (NADE) to real-valued data, a model we call the real-valued neural autoregressive density estimator (RNADE). Its statistical performance on several datasets, including visual and auditory data, is reported and compared to that of other models. RNADE obtained higher test likelihoods than other tractable models, while retaining all the attractive computational properties of autoregressive models. However, autoregressive models are limited by the ordering of the variables inherent in their formulation. Marginalization and imputation tasks can only be solved analytically if the missing variables are at the end of the ordering. We present a new training technique that obtains a set of parameters usable with any ordering of the variables. By choosing a model with a convenient ordering of the dimensions at test time, it is possible to solve any marginalization and imputation task analytically. The same training procedure also makes it practical to train NADEs and RNADEs with several hidden layers. 
The resulting deep and tractable models display higher test likelihoods than the equivalent one-hidden-layer models on all the datasets tested. Ensembles of NADEs or RNADEs can be created inexpensively by combining models that share their parameters but differ in the ordering of the variables. These ensembles of autoregressive models obtain state-of-the-art statistical performance on several datasets. Finally, we demonstrate the application of RNADE to speech synthesis, and confirm that capturing the phone-conditional dependencies of acoustic features improves the quality of synthetic speech. Our model generates synthetic speech that was judged by naive listeners as being of higher quality than that generated by mixture density networks, which are considered a state-of-the-art synthesis technique.
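The autoregressive factorization described above can be sketched with a toy model in which each Gaussian conditional has a mean that is a fixed linear function of the preceding variables; a real (R)NADE replaces these linear maps with neural networks, so this is only a structural illustration:

```python
import math
import random

random.seed(0)

def logpdf_normal(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))

def ar_logdensity(x, weights, sigma=1.0):
    """Autoregressive factorization: log p(x) = sum_d log p(x_d | x_<d).
    Each conditional is Gaussian with a mean linear in the preceding
    variables (a toy stand-in for RNADE's neural conditionals)."""
    lp = 0.0
    for d in range(len(x)):
        mu = sum(weights[d][j] * x[j] for j in range(d))
        lp += logpdf_normal(x[d], mu, sigma)
    return lp

def ar_sample(weights, sigma=1.0):
    """Ancestral sampling: draw each variable given those before it."""
    x = []
    for d in range(len(weights)):
        mu = sum(weights[d][j] * x[j] for j in range(d))
        x.append(random.gauss(mu, sigma))
    return x

W = [[], [0.5], [0.3, -0.2]]   # hypothetical linear weights for 3 dimensions
x = ar_sample(W)
lp = ar_logdensity(x, W)
```

Note how both the normalized log density and sampling cost only one pass over the ordering, the tractability property the abstract emphasizes.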
16

Scalable Nonparametric L1 Density Estimation via Sparse Subtree Partitioning

Sandstedt, Axel January 2023 (has links)
We consider the construction of multivariate histogram estimators for any density f, seeking to minimize its L1 distance to the true underlying density using arbitrarily large sample sizes. Theory for such estimators exists, and the early stages of distributed implementations are available. Our main contributions are new algorithms which seek to optimise away unnecessary network communication taking place in the distributed stages of the construction of such estimators, using sparse binary tree arithmetic.
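A minimal sketch of the underlying estimation target: build a fixed-width histogram from a large sample and approximate its L1 distance to the true density on a grid. The bin width, grid, and test density are assumptions, and the thesis's distributed, tree-based construction is not shown:

```python
import math
import random
from collections import Counter

random.seed(2)
n, width = 5000, 0.25
data = [random.gauss(0, 1) for _ in range(n)]

# histogram estimator: counts over fixed-width bins indexed by floor(x/width)
counts = Counter(math.floor(x / width) for x in data)

def hist_f(x):
    """Histogram density estimate at x (0 outside observed bins)."""
    return counts[math.floor(x / width)] / (n * width)

def true_f(x):
    """Standard normal density, the assumed truth."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

# Riemann-sum approximation of the L1 error  integral |f_n - f| dx  on [-5, 5]
dx = 0.01
l1 = sum(abs(hist_f(-5 + i * dx) - true_f(-5 + i * dx))
         for i in range(1001)) * dx
```

The L1 distance between two densities is at most 2, which is what makes it a convenient universal error criterion for histogram selection.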
17

Density estimates of monarch butterflies overwintering in central Mexico

Thogmartin, Wayne E., Diffendorfer, Jay E., López-Hoffman, Laura, Oberhauser, Karen, Pleasants, John, Semmens, Brice X., Semmens, Darius, Taylor, Orley R., Wiederholt, Ruscena 26 April 2017 (has links)
Given the rapid population decline and recent petition for listing of the monarch butterfly (Danaus plexippus L.) under the Endangered Species Act, an accurate estimate of the Eastern, migratory population size is needed. Because of the difficulty in counting individual monarchs, the number of hectares occupied by monarchs in the overwintering area is commonly used as a proxy for population size, which is then multiplied by the density of individuals per hectare to estimate population size. There is, however, considerable variation in published estimates of overwintering density, ranging from 6.9 to 60.9 million ha⁻¹. We develop a probability distribution for overwinter density of monarch butterflies from six published density estimates. The mean density among the mixture of the six published estimates was approximately 27.9 million butterflies ha⁻¹ (95% CI [2.4, 80.7] million ha⁻¹); the mixture distribution is approximately log-normal, and as such is better represented by the median (21.1 million butterflies ha⁻¹). Based upon assumptions regarding the number of milkweed stems needed to support monarchs, the amount of milkweed (Asclepias spp.) lost in the northern US (0.86 billion stems) plus the amount of milkweed remaining (1.34 billion stems), we estimate that >1.8 billion stems are needed to return monarchs to an average population size of 6 ha. Considerable uncertainty exists in this required amount of milkweed because of the considerable uncertainty in overwinter density estimates. Nevertheless, the estimate is on the same order as other published estimates. The studies included in our synthesis differ substantially by year, location, method, and measures of precision. A better understanding of the factors influencing overwintering density across space and time would be valuable for increasing the precision of conservation recommendations.
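The mixture-of-estimates idea can be sketched with Monte Carlo draws from an equal-weight mixture of lognormal components centered on published point estimates. The numbers and the log-scale spread below are illustrative placeholders, not the paper's data:

```python
import math
import random
import statistics

random.seed(3)
# six hypothetical published density estimates (millions per ha) with an
# assumed common lognormal spread; illustrative values only
estimates = [10.3, 21.0, 28.1, 33.5, 50.0, 60.9]
sd_log = 0.4   # assumed log-scale standard deviation for each study

draws = []
for _ in range(20000):
    mu = math.log(random.choice(estimates))         # pick a study at random
    draws.append(math.exp(random.gauss(mu, sd_log)))  # lognormal draw around it

mixture_mean = statistics.mean(draws)
mixture_median = statistics.median(draws)
```

Because the mixture is right-skewed, the mean sits above the median, which is why the paper reports the median as the better summary of overwinter density.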
18

Spectral methods for probabilistic grammatical inference of rational stochastic languages

Bailly, Raphael 12 December 2011 (has links)
Our framework is probabilistic grammatical inference: given an unknown distribution p on a set of strings S∗, the goal is to infer a probabilistic model for p from a finite sample S of observations assumed i.i.d. according to p. Grammatical inference focuses primarily on the structure of the probabilistic model and on the convergence of parameter estimates. The probabilistic models considered here are weighted automata (WA); the functions they model are called rational series. We first study the possibility of finding an absolute convergence criterion for such series. We then introduce an algorithm, based on spectral methods, for the inference of rational distributions (i.e., distributions modeled by a WA), and show how to adapt this algorithm to the closely related domain of distributions on trees. Finally, we investigate the use of this spectral algorithm in a more statistical setting, as a density estimation method.
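Spectral methods for weighted automata revolve around the Hankel matrix H[i][j] = p(uv) of string probabilities, whose rank equals the minimal number of WA states. The following is a self-contained sketch over a one-letter alphabet (a generic illustration, not the thesis's algorithm):

```python
def rank(mat, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in mat]
    r = 0
    for col in range(len(m[0])):
        piv = max(range(r, len(m)), key=lambda i: abs(m[i][col]), default=None)
        if piv is None or abs(m[piv][col]) < tol:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# rational series over {a}: p(a^k) = (1 - q) q^k, realized by a
# 1-state weighted automaton
q = 0.5
p = lambda k: (1 - q) * q ** k

# Hankel matrix H[i][j] = p(a^(i+j)); its rank is the minimal number of
# states, the quantity spectral (SVD-based) methods exploit
H = [[p(i + j) for j in range(5)] for i in range(5)]

# mixing two geometric series should give a rank-2 Hankel matrix
p2 = lambda k: 0.5 * (0.5 * 0.5 ** k) + 0.5 * (0.8 * 0.2 ** k)
H2 = [[p2(i + j) for j in range(5)] for i in range(5)]
```

In the actual spectral algorithm the empirical Hankel matrix is built from sample frequencies and a truncated SVD replaces the exact rank computation.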
19

Prediction, selective inference and some related problems

Yadegari, Iraj January 2017 (has links)
We study the problem of point estimation and predictive density estimation of the mean of a selected population, obtaining novel developments which include bias analysis, decomposition of risk, and problems with restricted parameters (Chapter 2). We propose predictive density estimators that are efficient in terms of Kullback-Leibler and Hellinger losses (Chapter 3), improving on plug-in procedures via a dual loss and via a variance expansion scheme. Finally (Chapter 4), we present findings on improving on the maximum likelihood estimator (MLE) of a bounded normal mean under a class of loss functions, including reflected normal loss, with implications for predictive density estimation. Namely, we give conditions on the loss and the width of the parameter space under which the Bayes estimator with respect to the boundary uniform prior dominates the MLE.
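The variance expansion idea mentioned above has a classical one-dimensional instance: for observations from N(theta, 1), the plug-in predictive density N(xbar, 1) is dominated under Kullback-Leibler loss by the variance-expanded predictive N(xbar, 1 + 1/n). A Monte Carlo sketch, with sample size and trial count chosen arbitrarily:

```python
import math
import random

def kl_normal(theta, mu, var):
    """KL divergence  KL( N(theta, 1) || N(mu, var) )."""
    return 0.5 * math.log(var) + (1 + (theta - mu) ** 2) / (2 * var) - 0.5

def kl_risk(n, var, trials=20000, theta=0.0):
    """Monte Carlo KL risk of the predictive density N(xbar, var),
    with xbar the mean of n observations from N(theta, 1)."""
    random.seed(4)                       # same draws for a paired comparison
    total = 0.0
    for _ in range(trials):
        xbar = random.gauss(theta, 1 / math.sqrt(n))
        total += kl_normal(theta, xbar, var)
    return total / trials

n = 5
plug_in = kl_risk(n, 1.0)              # plug-in: substitute xbar, keep var = 1
expanded = kl_risk(n, 1.0 + 1.0 / n)   # variance-expanded predictive
```

The exact risks here are 1/(2n) for the plug-in and (1/2)log(1 + 1/n) for the expanded version, so the expansion always helps; the simulation should reproduce that ordering.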
20

Assessing the Impacts of Anthropogenic Drainage Structures on Hydrologic Connectivity Using High-Resolution Digital Elevation Models

Bhadra, Sourav 01 August 2019 (has links)
Stream flowline delineation from high-resolution digital elevation models (HRDEMs) can be problematic due to the fine representation of terrain features as well as anthropogenic drainage structures (e.g., bridges, culverts) within the grid surface. Anthropogenic drainage structures (ADS) may create digital dams when stream flowlines are delineated from HRDEMs. This study assessed the effects of ADS locations, spatial resolution (ranging from 1 m to 10 m), depression processing methods, and flow direction algorithms (D8, D-Infinity, and MFD-md) on hydrologic connectivity through digital dams using HRDEMs in Nebraska. The assessment was conducted based on the offset distances between modeled stream flowlines and original ADS locations, using kernel density estimation (KDE) and the calculated frequency of ADS samples within offset distances. Three major depression processing techniques (depression filling, stream breaching, and stream burning) were considered. Finally, an automated method, constrained burning, was proposed for HRDEMs; it utilizes ancillary datasets to create underneath stream crossings at possible ADS locations and performs DEM reconditioning. The results suggest that coarser-resolution DEMs with depression filling and breaching can produce better hydrologic connectivity through ADS compared with finer-resolution DEMs with different flow direction algorithms. It was also found that stream burning with known stream crossings at ADS locations outperformed depression filling and breaching techniques for HRDEMs in terms of hydrologic connectivity. The flow direction algorithms combined with depression filling and breaching techniques do not have significant effects on the hydrologic connectivity of modeled stream flowlines. However, for stream burning methods, D8 was found to be the best-performing flow direction algorithm in HRDEMs, with statistical significance. 
The stream flowlines delineated from the HRDEM using the proposed constrained burning method were found to be better than those produced by depression filling and breaching techniques. This method has an overall accuracy of 78.82% in detecting possible ADS locations within the study area.
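The interaction between a digital dam and stream burning can be sketched with the standard D8 rule on a toy DEM: each cell drains to the neighbor with the steepest downhill slope. The elevations below, including the "burned" low cell in the embankment row, are invented for illustration:

```python
import math

# a tiny synthetic DEM (elevations); the high row acts as a digital dam,
# with one artificially lowered ("burned") cell standing in for a culvert
dem = [
    [5.0, 5.0, 5.0, 5.0],
    [4.0, 4.0, 4.0, 4.0],
    [9.0, 9.0, 1.5, 9.0],   # embankment with a burned stream crossing
    [1.0, 1.0, 1.0, 1.0],
]

def d8_direction(dem, r, c):
    """D8 flow direction: return the (dr, dc) step toward the neighbor
    with the steepest downhill slope (diagonal distance sqrt(2)),
    or None if the cell is a pit or flat."""
    best, best_drop = None, 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(dem) and 0 <= cc < len(dem[0]):
                dist = math.sqrt(2) if dr and dc else 1.0
                drop = (dem[r][c] - dem[rr][cc]) / dist
                if drop > best_drop:
                    best, best_drop = (dr, dc), drop
    return best

# water at (1, 2) should route straight down through the burned cell
# instead of being trapped behind the embankment
step1 = d8_direction(dem, 1, 2)
```

Without the burned cell, every cell in row 1 would be walled off by the 9.0 m embankment, which is exactly the digital-dam artifact the study's depression processing methods address.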
