1

Non parametric density estimation via regularization

Lin, Mu 11 1900 (has links)
The thesis presents important methods, theory, and applications of non-parametric density estimation via regularization in the univariate setting. It gives a brief introduction to non-parametric density estimation and discusses several well-known methods, for example histogram and kernel methods. Regularized methods with penalization and shape constraints are the focus of the thesis. Maximum entropy density estimation is introduced, and the relationship between the taut string and maximum entropy density estimation is explored. Furthermore, the dual and primal theories are discussed, and some theoretical proofs concerning quasi-concave density estimation are presented. The different numerical methods of non-parametric density estimation with regularization are classified and compared. Finally, a real-data experiment is discussed in the last part of the thesis. / Statistics
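As a point of reference for the baseline methods this abstract mentions, the following minimal sketch (not taken from the thesis; the data, bin count, and bandwidth rule are illustrative) contrasts a histogram estimate with a Gaussian kernel density estimate on the same univariate sample:

```python
# Minimal sketch: histogram vs. kernel density estimate on one sample.
# Illustrative only -- the thesis's regularized estimators are not shown.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=500)  # toy data

# Histogram estimator: piecewise-constant density over 20 fixed bins.
density_hist, edges = np.histogram(sample, bins=20, density=True)

# Kernel estimator: a smooth density; scipy chooses the bandwidth by
# Scott's rule unless told otherwise.
kde = gaussian_kde(sample)
grid = np.linspace(-4.0, 4.0, 200)
density_kde = kde(grid)
```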
2

Non parametric density estimation via regularization

Lin, Mu Unknown Date
No description available.
3

Méthodes spectrales pour l'inférence grammaticale probabiliste de langages stochastiques rationnels / Spectral methods for probabilistic grammatical inference of rational stochastic languages

Bailly, Raphael 12 December 2011 (has links)
Our framework is probabilistic grammatical inference: given an unknown distribution p over a set of strings S∗, the task is to infer a probabilistic model for p from a finite sample S of observations assumed to be i.i.d. according to p. Grammatical inference focuses primarily on the structure of the model and on the convergence of the parameter estimates. The probabilistic models considered here are weighted automata (WA); the functions they model are called rational series. We first study the possibility of finding an absolute convergence criterion for such series. We then introduce an algorithm for the inference of rational distributions (i.e. distributions modelled by a WA), based on spectral methods, and show how to adapt it to the closely related domain of rational distributions on trees. Finally, we investigate how to use this spectral algorithm in a more statistical setting, for a density estimation task.
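The core spectral step can be pictured as follows. This is a hedged sketch, not the thesis's algorithm: the toy sample, the prefix/suffix basis, and the target rank are all illustrative choices.

```python
# Sketch of spectral inference for a rational distribution: estimate a
# Hankel matrix of string probabilities and factorize it by SVD.
import numpy as np
from collections import Counter

sample = ["ab", "a", "ab", "abb", "a", "b", "ab", "a"]  # toy i.i.d. strings
counts = Counter(sample)
n = len(sample)

def p_hat(s):
    """Empirical probability of the full string s."""
    return counts[s] / n

# Illustrative prefix/suffix bases; H[u, v] estimates p(uv).
prefixes = ["", "a", "b", "ab"]
suffixes = ["", "a", "b", "bb"]
H = np.array([[p_hat(u + v) for v in suffixes] for u in prefixes])

# For a rational series, the rank of the full Hankel matrix equals the
# minimal number of states of a WA computing it; a finite sub-block gives
# a lower bound. Truncating the SVD at that rank yields the low-rank
# factorization from which WA parameters are recovered.
U, s, Vt = np.linalg.svd(H)
rank = 2  # illustrative number of states
U_k, s_k, Vt_k = U[:, :rank], s[:rank], Vt[:rank, :]
```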
4

Computational Challenges in Non-parametric Prediction of Bradycardia in Preterm Infants

January 2020 (has links)
Infants born before 37 weeks of pregnancy are considered preterm. Preterm infants typically have to be strictly monitored, since they are highly susceptible to health problems such as hypoxemia (low blood oxygen level), apnea, respiratory issues, cardiac problems, and neurological problems, as well as an increased chance of long-term health issues such as cerebral palsy, asthma, and sudden infant death syndrome. One of the leading health complications in preterm infants is bradycardia, defined as a slower-than-expected heart rate, generally below 60 beats per minute. Bradycardia is often accompanied by low oxygen levels and can cause additional long-term health problems in the premature infant. The implementation of a non-parametric method to predict the onset of bradycardia is presented. The method assumes no prior knowledge of the data and uses kernel density estimation to predict future bradycardia events. The data is preprocessed and analyzed to detect the peaks in the ECG signals, after which different kernels are implemented to estimate the shared underlying distribution of the data. The performance of the algorithm is evaluated using various metrics, and the computational challenges and methods to overcome them are discussed. The performance of the algorithm with regard to the kernels used is observed to be consistent with the theoretical performance of the kernels as presented in previous work. The theoretical approach has also been automated in this work, and the various implementation challenges have been addressed. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2020
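A hedged sketch of the pipeline this abstract outlines: peak detection on an ECG trace, inter-beat intervals, and a kernel density fit. The synthetic signal, sampling rate, and thresholds are assumptions for illustration; the thesis's actual preprocessing and prediction logic are not reproduced.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde

fs = 250.0                                   # assumed sampling rate (Hz)
rng = np.random.default_rng(1)

# Synthetic stand-in for an ECG: Gaussian spikes at jittered beat times
# (~90 bpm). The thesis works with real preterm-infant recordings.
beat_times = np.cumsum(rng.normal(0.67, 0.05, size=90))
t = np.arange(0.0, beat_times[-1] + 1.0, 1.0 / fs)
ecg = sum(np.exp(-0.5 * ((t - bt) / 0.01) ** 2) for bt in beat_times)

# Peak detection, inter-beat intervals, and a kernel density estimate
# of their distribution.
peaks, _ = find_peaks(ecg, height=0.5, distance=int(0.3 * fs))
ibi = np.diff(peaks) / fs                    # inter-beat intervals (s)
ibi_density = gaussian_kde(ibi)              # estimated IBI distribution

# Bradycardia flag using the 60 bpm threshold quoted in the abstract.
heart_rate = 60.0 / ibi                      # instantaneous bpm
bradycardia = heart_rate < 60.0
```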
5

Bickel-Rosenblatt Test Based on Tilted Estimation for Autoregressive Models & Deep Merged Survival Analysis on Cancer Study Using Multiple Types of Bioinformatic Data

Su, Yan January 2021 (has links)
No description available.
6

Maximum-likelihood kernel density estimation in high-dimensional feature spaces / C.M. van der Walt

Van der Walt, Christiaan Maarten January 2014 (has links)
With the advent of the internet and advances in computing power, the collection of very large high-dimensional datasets has become feasible; understanding and modelling high-dimensional data has thus become a crucial activity, especially in the field of pattern recognition. Since non-parametric density estimators are data-driven and do not require or impose a pre-defined probability density function on data, they are very powerful tools for probabilistic data modelling and analysis. Conventional non-parametric density estimation methods, however, originated from the field of statistics and were not originally intended to perform density estimation in high-dimensional feature spaces, as are often encountered in real-world pattern recognition tasks. We therefore address the fundamental problem of non-parametric density estimation in high-dimensional feature spaces in this study. Recent advances in maximum-likelihood (ML) kernel density estimation have shown that kernel density estimators hold much promise for estimating non-parametric probability density functions in high-dimensional feature spaces. We therefore derive two new iterative kernel bandwidth estimators from the ML leave-one-out objective function and also introduce a new non-iterative kernel bandwidth estimator (based on the theoretical bounds of the ML bandwidths) for the purpose of bandwidth initialisation. We name the iterative kernel bandwidth estimators the minimum leave-one-out entropy (MLE) and global MLE estimators, and name the non-iterative kernel bandwidth estimator the MLE rule-of-thumb estimator. We compare the performance of the MLE rule-of-thumb estimator and conventional kernel density estimators on artificial data with data properties that are varied in a controlled fashion, and on a number of representative real-world pattern recognition tasks, to gain a better understanding of the behaviour of these estimators in high-dimensional spaces and to determine whether they are suitable for initialising the bandwidths of iterative ML bandwidth estimators in high dimensions. We find that there are several regularities in the relative performance of conventional kernel density estimators across different tasks and dimensionalities, and that the Silverman rule-of-thumb bandwidth estimator performs reliably across most tasks and dimensionalities of the pattern recognition datasets considered, even in high-dimensional feature spaces. Based on this empirical evidence and the intuitive theoretical motivation that the Silverman estimator optimises the asymptotic mean integrated squared error (assuming a Gaussian reference distribution), we select this estimator to initialise the bandwidths of the iterative ML kernel bandwidth estimators compared in our simulation studies. We then perform a comparative simulation study of the newly introduced iterative MLE estimators and other state-of-the-art iterative ML estimators on a number of artificial and real-world high-dimensional pattern recognition tasks. We illustrate with artificial data (guided by theoretical motivations) under what conditions certain estimators should be preferred, and we empirically confirm on real-world data that no estimator performs optimally on all tasks and that the optimal estimator depends on the properties of the underlying density function being estimated.
We also observe an interesting case of the bias-variance trade-off, where ML estimators with fewer parameters than the MLE estimator perform exceptionally well on a wide variety of tasks; however, in the cases where these estimators do not perform well, the MLE estimator generally does. The newly introduced MLE kernel bandwidth estimators prove to be a useful contribution to the field of pattern recognition, since they perform optimally on a number of the real-world pattern recognition tasks investigated and provide researchers and practitioners with two alternative estimators to employ for the task of kernel density estimation. / PhD (Information Technology), North-West University, Vaal Triangle Campus, 2014
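To make the two ingredients concrete, here is a hedged sketch (not the thesis's code) of a Silverman rule-of-thumb bandwidth used to initialise a leave-one-out ML search. A grid scan stands in for the thesis's iterative updates, and the single isotropic bandwidth is a simplifying assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))                  # toy multivariate data

# Silverman rule-of-thumb bandwidth for a Gaussian kernel; averaging the
# per-dimension standard deviations into one isotropic bandwidth is a
# simplification of the usual per-dimension rule.
sigma = X.std(axis=0, ddof=1).mean()
h_init = sigma * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))

def loo_log_likelihood(X, h):
    """Leave-one-out log-likelihood of an isotropic Gaussian-kernel KDE."""
    n, d = X.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-0.5 * sq / h**2) / ((2.0 * np.pi) ** (d / 2) * h**d)
    np.fill_diagonal(K, 0.0)                 # exclude each point itself
    f_loo = K.sum(axis=1) / (n - 1)          # LOO density at each point
    return np.log(f_loo).sum()

# Grid scan around the rule-of-thumb initialisation; the thesis's MLE
# estimators instead update the bandwidth iteratively.
grid = h_init * np.linspace(0.5, 2.0, 31)
h_ml = grid[int(np.argmax([loo_log_likelihood(X, h) for h in grid]))]
```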
7

Efficient Image Retrieval with Statistical Color Descriptors

Viet Tran, Linh January 2003 (has links)
Color has been widely used in content-based image retrieval (CBIR) applications. In such applications the color properties of an image are usually characterized by the probability distribution of the colors in the image. A distance measure is then used to measure the (dis-)similarity between images, based on the descriptions of their color distributions, in order to quickly find relevant images. The development and investigation of statistical methods for robust representations of such distributions, the construction of distance measures between them, and their applications in efficient retrieval, browsing, and structuring of very large image databases are the main contributions of the thesis. In particular we have addressed the following problems in CBIR. Firstly, different non-parametric density estimators are used to describe color information for CBIR applications. Kernel-based methods using nonorthogonal bases together with a Gram-Schmidt procedure and the application of the Fourier transform are introduced and compared to previously used histogram-based methods. Our experiments show that efficient use of kernel density estimators improves the retrieval performance of CBIR. The practical problem of how to choose an optimal smoothing parameter for such density estimators, as well as the selection of the histogram bin-width for CBIR applications, are also discussed. Distance measures between color distributions are then described in a differential geometry-based framework. This allows the incorporation of geometrical features of the underlying color space into the distance measure between the probability distributions. The general framework is illustrated with two examples: normal distributions and linear representations of distributions. The linear representation of color distributions is then used to derive new compact descriptors for color-based image retrieval. These descriptors are based on the combination of two ideas: incorporating information from the structure of the color space together with information from the images, and applying projection methods in the space of color distributions and in the space of differences between neighboring color distributions. In our experiments we used several image databases containing more than 1,300,000 images. The experiments show that the method developed in this thesis is very fast and that the retrieval performance achieved compares favorably with existing methods. A CBIR system has been developed and is currently available at http://www.media.itn.liu.se/cse. We also describe color invariant descriptors that can be used to retrieve images of objects independent of geometrical factors and the illumination conditions under which these images were taken. Both statistics- and physics-based methods are proposed and examined. We investigated the interaction between light and material using different physical models and applied the theory of transformation groups to derive geometric color invariants. Using the proposed framework, we are able to construct all independent invariants for a given physical model. The dichromatic reflection model and the Kubelka-Munk model are used as examples for the framework. The proposed color invariant descriptors are then applied to CBIR, color image segmentation, and color correction applications. In the last chapter of the thesis we describe an industrial application where different color correction methods are used to optimize the layout of a newspaper page.
/ A search engine based on the methods described in this thesis can be found at http://pub.ep.liu.se/cse/db/?. Note that the question mark must be included in the address.
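As a baseline for the descriptor-plus-distance pattern this entry builds on, the following hedged sketch computes a joint RGB histogram for two synthetic images and compares them with a chi-squared distance. The bin count, distance choice, and data are illustrative; the thesis's kernel-based descriptors and geometry-aware distances are more refined:

```python
import numpy as np

rng = np.random.default_rng(3)
img_a = rng.integers(0, 256, size=(64, 64, 3))   # synthetic stand-in images
img_b = rng.integers(0, 256, size=(64, 64, 3))

def color_histogram(img, bins=8):
    """Normalized joint RGB histogram, flattened into a descriptor vector."""
    h, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins,) * 3,
                          range=[(0, 256)] * 3)
    h = h.ravel()
    return h / h.sum()

def chi2_distance(p, q, eps=1e-12):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

d_ab = chi2_distance(color_histogram(img_a), color_histogram(img_b))
```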
8

Regiões de confiança para a localização do ponto estacionário em superfícies de resposta, usando o método "bootstrap" Bayesiano / Confidence region on the location of the stationary point in response surfaces, a Bayesian bootstrap approach

Miquelluti, David José 18 April 2008 (has links)
Experiments in which one or more response variables are influenced by several quantitative factors are very common in agriculture, chemistry, biology, and other areas. The research question then consists in studying this relation, for which response surface methodology (RSM) is of great utility. In this context, determining the levels of the factors that optimize the response starts with finding the coordinates of the stationary point of the fitted model. However, as the true model is unknown, it is of interest to obtain a confidence region for the true coordinates in order to assess the precision of the estimate. Procedures for constructing confidence regions for the coordinates of the stationary point were studied in different situations, considering the shape of the analyzed surfaces and the distribution and magnitude of the error variance. The following approaches were compared: the methodology of Box and Hunter (1954) (BH); bootstrap and Bayesian bootstrap combined with the Mahalanobis distance between the stationary-point coordinates of the observed sample and those obtained from the bootstrap estimates (BM and BBM); and bootstrap and Bayesian bootstrap combined with non-parametric methods for density estimation (BNP and BBNP). The evaluation was carried out by simulation, and the methodology was applied to a peanut-yield data set. In the simulation study, the BH methodology, which is based on normally distributed errors, performed well in all situations analyzed, with agreement between the nominal and actual confidence regions even when the error distribution is fairly asymmetric. The same behavior was observed for the BM and BBM methods. The BNP and BBNP methods, however, did not perform satisfactorily, yielding an actual significance level below the nominal one for the eigenvalues of smaller absolute value, and therefore larger confidence regions; the opposite was observed for the eigenvalues of larger absolute value. In the analysis of the peanut-yield data set, the BH, BM, and BNP methods produced wider confidence regions than the BBM and BBNP methods. The Bayesian bootstrap estimates are closer to the least-squares estimates and show less dispersion, which explains the smaller area of the confidence region.
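One plausible reading of the BM-style procedure, as a hedged sketch: fit a second-order surface, bootstrap the fit, and rank the bootstrap stationary points by Mahalanobis distance from the original estimate. The design, data, and pairs-bootstrap choice are illustrative, not the peanut-yield experiment:

```python
import numpy as np

rng = np.random.default_rng(2)

def design(x):
    """Second-order model in two factors: 1, x1, x2, x1^2, x2^2, x1*x2."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

def stationary_point(beta):
    """x_s = -B^{-1} b / 2 for the fitted quadratic surface."""
    b = beta[1:3]
    B = np.array([[beta[3], beta[5] / 2.0], [beta[5] / 2.0, beta[4]]])
    return -0.5 * np.linalg.solve(B, b)

# Illustrative data: a quadratic surface with optimum at (0.3, -0.2).
x = rng.uniform(-1.0, 1.0, size=(30, 2))
y = 5.0 - (x[:, 0] - 0.3) ** 2 - (x[:, 1] + 0.2) ** 2 + rng.normal(0.0, 0.2, 30)

beta_hat, *_ = np.linalg.lstsq(design(x), y, rcond=None)
xs_hat = stationary_point(beta_hat)

# Pairs bootstrap of the stationary point.
boot = np.array([
    stationary_point(np.linalg.lstsq(design(x[idx]), y[idx], rcond=None)[0])
    for idx in (rng.integers(0, 30, 30) for _ in range(1000))
])

# Mahalanobis distances of the bootstrap points from the original
# estimate; the 95% quantile delimits the confidence region.
diff = boot - xs_hat
S_inv = np.linalg.inv(np.cov(boot.T))
d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
radius2 = np.quantile(d2, 0.95)
```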
