1 |
ON RECONSTRUCTING GAUSSIAN MIXTURES FROM THE DISTANCE BETWEEN TWO SAMPLES: AN ALGEBRAIC PERSPECTIVEKindyl Lu Zhao King (15347239) 25 April 2023 (has links)
<p>This thesis is concerned with the problem of characterizing the orbits of certain probability density functions under the action of the Euclidean group. Our motivating application is the recognition of a point configuration where the coordinates of the points are measured under noisy conditions. Consider a random variable X in R<sup>d</sup> with probability density function ρ(x). Let x<sub>1</sub> and x<sub>2</sub> be independent random samples following ρ(x). Define ∆ as the squared Euclidean distance between x<sub>1</sub> and x<sub>2</sub>. It has previously been shown that two distributions ρ(x) and ρ(x) consisting of Dirac delta distributions in generic positions that have the same respective distributions of ∆ are necessarily related by a rigid motion. That is, there exists some rigid motion g in the Euclidean group E(d) such that ρ(x) = ρ(g · x) for all x ∈ R<sup>d</sup> . To account for noise in the measurements, we assume X is a random variable in R<sup>d</sup> whose density is a k-component mixture of Gaussian distributions with means in generic position. We further assume that the covariance matrices of the Gaussian components are equal and of the form Σ = σ<sup>2</sup>1<sub>d</sub> with 0 ≤ σ<sup>2</sup> ∈ R. In Theorem 3.1.1 and Theorem 3.2.1, we prove that, when σ<sup>2</sup> is known, generic k-component Gaussian mixtures are uniquely reconstructible up to a rigid motion from the density of ∆. A more general formulation is proven in Theorem 3.2.3. Similarly, when σ<sup>2</sup> is unknown, we prove in Theorem 4.1.1 and Theorem 4.1.2 that generic equally-weighted k-component Gaussian mixtures with k = 1 and k = 2 are uniquely reconstructible up to a rigid motion from the distribution of ∆. There are at most three non-equivalent equally weighted 3-component Gaussian mixtures up to a rigid motion having the same distribution of ∆, as proven in Theorem 4.1.3. In Theorem 4.1.4, we present a test to check if, for a given k and d, the number of non-equivalent equally-weighted k-component Gaussian mixtures in R<sup>d</sup> having the same distribution of ∆ is at most (k choose 2) + 1. Numerical computations showed that distributions with k = 4, 5, 6, 7 such that d ≤ k −2 and (k, d) = (8, 1) pass the test, and thus have a finite number of reconstructions up to a rigid motion. When σ<sup>2</sup> is unknown and the mixture weights are also unknown, we prove in Theorem 4.2.1 that there are at most four non-equivalent 2-component Gaussian mixtures up to a rigid motion having the same distribution of ∆. </p>
|
2 |
Unsupervised Gaussian mixture models for the classification of outdoor environments using 3D terrestrial lidar data / Modèles de mélange gaussien sans surveillance pour la classification des environnements extérieurs en utilisant des données 3D de lidar terrestreFernandes maligo, Artur otavio 28 January 2016 (has links)
Le traitement de nuages de points 3D de lidars permet aux robots mobiles autonomes terrestres de construire des modèles sémantiques de l'environnement extérieur dans lequel ils évoluent. Ces modèles sont intéressants car ils représentent des informations qualitatives, et ainsi donnent à un robot la capacité de raisonner à un niveau plus élevé d'abstraction. Le coeur d'un système de modélisation sémantique est la capacité de classifier les observations venant du capteur. Nous proposons un système de classification centré sur l'apprentissage non-supervisé. La prémière couche, la couche intermédiaire, consiste en un modèle de mélange gaussien. Ce modèle est déterminé de manière non-supervisée lors d'une étape de training. Il definit un ensemble de classes intermédiaires qui correspond à une partition fine des classes présentes dans l'environnement. La deuxième couche, la couche finale, consiste en un regroupement des classes intermédiaires dans un ensemble de classes finales qui, elles, sont interprétables dans le contexte de la tâche ciblée. Le regroupement est déterminé par un expert lors de l'étape de training, de manière supervisée, mais guidée par les classes intermédiaires. L'évaluation est basée sur deux jeux de données acquis avec de différents lidars et possédant différentes caractéristiques. L'évaluation est quantitative pour l'un des jeux de données, et qualitative pour l'autre. La concéption du système utilise la procédure standard de l'apprentissage, basée sur les étapes de training, validation et test. L'opération suit la pipeline standard de classification. Le système est simple, et ne requiert aucun pré-traitement ou post-traitement. / The processing of 3D lidar point clouds enable terrestrial autonomous mobile robots to build semantic models of the outdoor environments in which they operate. Such models are interesting because they encode qualitative information, and thus provide to a robot the ability to reason at a higher level of abstraction. At the core of a semantic modelling system, lies the capacity to classify the sensor observations. We propose a two-layer classi- fication model which strongly relies on unsupervised learning. The first, intermediary layer consists of a Gaussian mixture model. This model is determined in a training step in an unsupervised manner, and defines a set of intermediary classes which is a fine-partitioned representation of the environment. The second, final layer consists of a grouping of the intermediary classes into final classes that are interpretable in a considered target task. This grouping is determined by an expert during the training step, in a process which is supervised, yet guided by the intermediary classes. The evaluation is done for two datasets acquired with different lidars and possessing different characteristics. It is done quantitatively using one of the datasets, and qualitatively using another. The system is designed following the standard learning procedure, based on a training, a validation and a test steps. The operation follows a standard classification pipeline. The system is simple, with no requirement of pre-processing or post-processing stages.
|
3 |
STUDY ON INFORMATION THEORY: CONNECTION TO CONTROL THEORY, APPROACH AND ANALYSIS FOR COMPUTATIONTheeranaew, Wanchat 09 February 2015 (has links)
No description available.
|
4 |
A sensor fusion method for detection of surface laid land minesWestberg, Daniel January 2007 (has links)
<p>Landminor är ett stort problem både under och efter krigstid. De metoder som används för att detektera minor har inte ändrats mycket sedan 1940-talet. Forskning med mål att utvärdera olika elektro-optiska sensorer och metoder som skulle kunna användas för att skapa mer effektiv min-detektion genomförs på FOI. Försök som har gjorts med data från bland annat laser-radar och IR-sensorer har gett intressanta resultat.</p><p>I det här examensarbetet utvärderades olika fenomen och egenskaper i laser-radar- och IR-data. De testade egenskaperna var intensitet, IR, ytlikhet och höjd.</p><p>En metod som segmenterar intressanta objekt och bakgrundsdata utformades och implementerades. Metoden använde sig av expectation-maximization-skattning och ett minimum message length-kriterium. Ett scatter separability-kriterium användes för att bestämma kvalitén på de olika egenskaperna och på den resulterande segmenteringen.</p><p>Data insamlad under en mätkampanj av FOI användes för att testa metoden. Resultatet visade bland annat att ytlikhetsmåttet gav en bra segmentering för stora objekt med släta ytor, men var sämre för små objekt med skrovliga ytor. Vid jämförelse med en manuellt skapad mål-mask visade det sig att metoden klarade av att välja ut egenskaper som i många fall gav en godkänd segmentering.</p> / <p>Land mines are a huge problem in conflict time and after. Methods used to detect mines have not changed much since the 1940's. Research aiming to evaluate output from different electro-optical sensors and develop methods for more efficient mine detection is performed at FOI. Early experiments with laser radar sensors show promising results, as do analysis of data from infrared sensors.</p><p>In this thesis, an evaluation is made of features found in laser radar- and in infrared -sensor data. The tested features are intensity, infrared, a surfaceness feature extracted from the laser radar data and height above an estimated ground plane.</p><p>A method for segmenting interesting objects from background data using theexpectation-maximization algorithm and a minimum message length criterion is designed and implemented. A scatter separability criterion is utilized to determine the quality of the features and the resulting segmentation.</p><p>The method is tested on real data from a field trial performed by FOI. The results show that the surfaceness feature supports the segmentation of larger object with smooth surfaces but gives no contribution to small object with irregular surfaces. The method produces a decent result of selecting contributing features for different neighbourhoods of a scene. A comparison with a manually created target mask of the neighbourhood and the segmented components show that in most cases a high percentage separation of mine data and background data is possible.</p>
|
5 |
A sensor fusion method for detection of surface laid land minesWestberg, Daniel January 2007 (has links)
Landminor är ett stort problem både under och efter krigstid. De metoder som används för att detektera minor har inte ändrats mycket sedan 1940-talet. Forskning med mål att utvärdera olika elektro-optiska sensorer och metoder som skulle kunna användas för att skapa mer effektiv min-detektion genomförs på FOI. Försök som har gjorts med data från bland annat laser-radar och IR-sensorer har gett intressanta resultat. I det här examensarbetet utvärderades olika fenomen och egenskaper i laser-radar- och IR-data. De testade egenskaperna var intensitet, IR, ytlikhet och höjd. En metod som segmenterar intressanta objekt och bakgrundsdata utformades och implementerades. Metoden använde sig av expectation-maximization-skattning och ett minimum message length-kriterium. Ett scatter separability-kriterium användes för att bestämma kvalitén på de olika egenskaperna och på den resulterande segmenteringen. Data insamlad under en mätkampanj av FOI användes för att testa metoden. Resultatet visade bland annat att ytlikhetsmåttet gav en bra segmentering för stora objekt med släta ytor, men var sämre för små objekt med skrovliga ytor. Vid jämförelse med en manuellt skapad mål-mask visade det sig att metoden klarade av att välja ut egenskaper som i många fall gav en godkänd segmentering. / Land mines are a huge problem in conflict time and after. Methods used to detect mines have not changed much since the 1940's. Research aiming to evaluate output from different electro-optical sensors and develop methods for more efficient mine detection is performed at FOI. Early experiments with laser radar sensors show promising results, as do analysis of data from infrared sensors. In this thesis, an evaluation is made of features found in laser radar- and in infrared -sensor data. The tested features are intensity, infrared, a surfaceness feature extracted from the laser radar data and height above an estimated ground plane. A method for segmenting interesting objects from background data using theexpectation-maximization algorithm and a minimum message length criterion is designed and implemented. A scatter separability criterion is utilized to determine the quality of the features and the resulting segmentation. The method is tested on real data from a field trial performed by FOI. The results show that the surfaceness feature supports the segmentation of larger object with smooth surfaces but gives no contribution to small object with irregular surfaces. The method produces a decent result of selecting contributing features for different neighbourhoods of a scene. A comparison with a manually created target mask of the neighbourhood and the segmented components show that in most cases a high percentage separation of mine data and background data is possible.
|
6 |
Distributed parameter and state estimation for wireless sensor networksYu, Jia January 2017 (has links)
The research in distributed algorithms is linked with the developments of statistical inference in wireless sensor networks (WSNs) applications. Typically, distributed approaches process the collected signals from networked sensor nodes. That is to say, the sensors receive local observations and transmit information between each other. Each sensor is capable of combining the collected information with its own observations to improve performance. In this thesis, we propose novel distributed methods for the inference applications using wireless sensor networks. In particular, the efficient algorithms which are not computationally intensive are investigated. Moreover, we present a number of novel algorithms for processing asynchronous network events and robust state estimation. In the first part of the thesis, a distributed adaptive algorithm based on the component-wise EM method for decentralized sensor networks is investigated. The distributed component-wise Expectation-Maximization (EM) algorithm has been designed for application in a Gaussian density estimation. The proposed algorithm operates a component-wise EM procedure for local parameter estimation and exploit an incremental strategy for network updating, which can provide an improved convergence rate. Numerical simulation results have illustrated the advantages of the proposed distributed component-wise EM algorithm for both well-separated and overlapped mixture densities. The distributed component-wise EM algorithm can outperform other EM-based distributed algorithms in estimating overlapping Gaussian mixtures. In the second part of the thesis, a diffusion based EM gradient algorithm for density estimation in asynchronous wireless sensor networks has been proposed. Specifically, based on the asynchronous adapt-then-combine diffusion strategy, a distributed EM gradient algorithm that can deal with asynchronous network events has been considered. The Bernoulli model has been exploited to approximate the asynchronous behaviour of the network. Compared with existing distributed EM based estimation methods using a consensus strategy, the proposed algorithm can provide more accurate estimates in the presence of asynchronous networks uncertainties, such as random link failures, random data arrival times, and turning on or off sensor nodes for energy conservation. Simulation experiments have been demonstrated that the proposed algorithm significantly outperforms the consensus based strategies in terms of Mean-Square- Deviation (MSD) performance in an asynchronous network setting. Finally, the challenge of distributed state estimation in power systems which requires low complexity and high stability in the presence of bad data for a large scale network is addressed. A gossip based quasi-Newton algorithm has been proposed for solving the power system state estimation problem. In particular, we have applied the quasi-Newton method for distributed state estimation under the gossip protocol. The proposed algorithm exploits the Broyden- Fletcher-Goldfarb-Shanno (BFGS) formula to approximate the Hessian matrix, thus avoiding the computation of inverse Hessian matrices for each control area. The simulation results for IEEE 14 bus system and a large scale 4200 bus system have shown that the distributed quasi-Newton scheme outperforms existing algorithms in terms of Mean-Square-Error (MSE) performance with bad data.
|
7 |
Online incremental one-shot learning of temporal sequencesPinto, Rafael Coimbra January 2011 (has links)
Este trabalho introduz novos algoritmos de redes neurais para o processamento online de padrões espaço-temporais, estendendo o algoritmo Incremental Gaussian Mixture Network (IGMN). O algoritmo IGMN é uma rede neural online incremental que aprende a partir de uma única passada através de dados por meio de uma versão incremental do algoritmo Expectation-Maximization (EM) combinado com regressão localmente ponderada (Locally Weighted Regression, LWR). Quatro abordagens diferentes são usadas para dar capacidade de processamento temporal para o algoritmo IGMN: linhas de atraso (Time-Delay IGMN), uma camada de reservoir (Echo-State IGMN), média móvel exponencial do vetor de entrada reconstruído (Merge IGMN) e auto-referência (Recursive IGMN). Isso resulta em algoritmos que são online, incrementais, agressivos e têm capacidades temporais e, portanto, são adequados para tarefas com memória ou estados internos desconhecidos, caracterizados por fluxo contínuo ininterrupto de dados, e que exigem operação perpétua provendo previsões sem etapas separadas para aprendizado e execução. Os algoritmos propostos são comparados a outras redes neurais espaço-temporais em 8 tarefas de previsão de séries temporais. Dois deles mostram desempenhos satisfatórios, em geral, superando as abordagens existentes. Uma melhoria geral para o algoritmo IGMN também é descrita, eliminando um dos parâmetros ajustáveis manualmente e provendo melhores resultados. / This work introduces novel neural networks algorithms for online spatio-temporal pattern processing by extending the Incremental Gaussian Mixture Network (IGMN). The IGMN algorithm is an online incremental neural network that learns from a single scan through data by means of an incremental version of the Expectation-Maximization (EM) algorithm combined with locally weighted regression (LWR). Four different approaches are used to give temporal processing capabilities to the IGMN algorithm: time-delay lines (Time-Delay IGMN), a reservoir layer (Echo-State IGMN), exponential moving average of reconstructed input vector (Merge IGMN) and self-referencing (Recursive IGMN). This results in algorithms that are online, incremental, aggressive and have temporal capabilities, and therefore are suitable for tasks with memory or unknown internal states, characterized by continuous non-stopping data-flows, and that require life-long learning while operating and giving predictions without separated stages. The proposed algorithms are compared to other spatio-temporal neural networks in 8 time-series prediction tasks. Two of them show satisfactory performances, generally improving upon existing approaches. A general enhancement for the IGMN algorithm is also described, eliminating one of the algorithm’s manually tunable parameters and giving better results.
|
8 |
Online incremental one-shot learning of temporal sequencesPinto, Rafael Coimbra January 2011 (has links)
Este trabalho introduz novos algoritmos de redes neurais para o processamento online de padrões espaço-temporais, estendendo o algoritmo Incremental Gaussian Mixture Network (IGMN). O algoritmo IGMN é uma rede neural online incremental que aprende a partir de uma única passada através de dados por meio de uma versão incremental do algoritmo Expectation-Maximization (EM) combinado com regressão localmente ponderada (Locally Weighted Regression, LWR). Quatro abordagens diferentes são usadas para dar capacidade de processamento temporal para o algoritmo IGMN: linhas de atraso (Time-Delay IGMN), uma camada de reservoir (Echo-State IGMN), média móvel exponencial do vetor de entrada reconstruído (Merge IGMN) e auto-referência (Recursive IGMN). Isso resulta em algoritmos que são online, incrementais, agressivos e têm capacidades temporais e, portanto, são adequados para tarefas com memória ou estados internos desconhecidos, caracterizados por fluxo contínuo ininterrupto de dados, e que exigem operação perpétua provendo previsões sem etapas separadas para aprendizado e execução. Os algoritmos propostos são comparados a outras redes neurais espaço-temporais em 8 tarefas de previsão de séries temporais. Dois deles mostram desempenhos satisfatórios, em geral, superando as abordagens existentes. Uma melhoria geral para o algoritmo IGMN também é descrita, eliminando um dos parâmetros ajustáveis manualmente e provendo melhores resultados. / This work introduces novel neural networks algorithms for online spatio-temporal pattern processing by extending the Incremental Gaussian Mixture Network (IGMN). The IGMN algorithm is an online incremental neural network that learns from a single scan through data by means of an incremental version of the Expectation-Maximization (EM) algorithm combined with locally weighted regression (LWR). Four different approaches are used to give temporal processing capabilities to the IGMN algorithm: time-delay lines (Time-Delay IGMN), a reservoir layer (Echo-State IGMN), exponential moving average of reconstructed input vector (Merge IGMN) and self-referencing (Recursive IGMN). This results in algorithms that are online, incremental, aggressive and have temporal capabilities, and therefore are suitable for tasks with memory or unknown internal states, characterized by continuous non-stopping data-flows, and that require life-long learning while operating and giving predictions without separated stages. The proposed algorithms are compared to other spatio-temporal neural networks in 8 time-series prediction tasks. Two of them show satisfactory performances, generally improving upon existing approaches. A general enhancement for the IGMN algorithm is also described, eliminating one of the algorithm’s manually tunable parameters and giving better results.
|
9 |
Online incremental one-shot learning of temporal sequencesPinto, Rafael Coimbra January 2011 (has links)
Este trabalho introduz novos algoritmos de redes neurais para o processamento online de padrões espaço-temporais, estendendo o algoritmo Incremental Gaussian Mixture Network (IGMN). O algoritmo IGMN é uma rede neural online incremental que aprende a partir de uma única passada através de dados por meio de uma versão incremental do algoritmo Expectation-Maximization (EM) combinado com regressão localmente ponderada (Locally Weighted Regression, LWR). Quatro abordagens diferentes são usadas para dar capacidade de processamento temporal para o algoritmo IGMN: linhas de atraso (Time-Delay IGMN), uma camada de reservoir (Echo-State IGMN), média móvel exponencial do vetor de entrada reconstruído (Merge IGMN) e auto-referência (Recursive IGMN). Isso resulta em algoritmos que são online, incrementais, agressivos e têm capacidades temporais e, portanto, são adequados para tarefas com memória ou estados internos desconhecidos, caracterizados por fluxo contínuo ininterrupto de dados, e que exigem operação perpétua provendo previsões sem etapas separadas para aprendizado e execução. Os algoritmos propostos são comparados a outras redes neurais espaço-temporais em 8 tarefas de previsão de séries temporais. Dois deles mostram desempenhos satisfatórios, em geral, superando as abordagens existentes. Uma melhoria geral para o algoritmo IGMN também é descrita, eliminando um dos parâmetros ajustáveis manualmente e provendo melhores resultados. / This work introduces novel neural networks algorithms for online spatio-temporal pattern processing by extending the Incremental Gaussian Mixture Network (IGMN). The IGMN algorithm is an online incremental neural network that learns from a single scan through data by means of an incremental version of the Expectation-Maximization (EM) algorithm combined with locally weighted regression (LWR). Four different approaches are used to give temporal processing capabilities to the IGMN algorithm: time-delay lines (Time-Delay IGMN), a reservoir layer (Echo-State IGMN), exponential moving average of reconstructed input vector (Merge IGMN) and self-referencing (Recursive IGMN). This results in algorithms that are online, incremental, aggressive and have temporal capabilities, and therefore are suitable for tasks with memory or unknown internal states, characterized by continuous non-stopping data-flows, and that require life-long learning while operating and giving predictions without separated stages. The proposed algorithms are compared to other spatio-temporal neural networks in 8 time-series prediction tasks. Two of them show satisfactory performances, generally improving upon existing approaches. A general enhancement for the IGMN algorithm is also described, eliminating one of the algorithm’s manually tunable parameters and giving better results.
|
10 |
On unsupervised learning in high dimension / Sur l'apprentissage non supervisé en haute dimensionSebbar, Mehdi 12 December 2017 (has links)
Dans ce mémoire de thèse, nous abordons deux thèmes, le clustering en haute dimension d'une part et l'estimation de densités de mélange d'autre part. Le premier chapitre est une introduction au clustering. Nous y présentons différentes méthodes répandues et nous nous concentrons sur un des principaux modèles de notre travail qui est le mélange de Gaussiennes. Nous abordons aussi les problèmes inhérents à l'estimation en haute dimension et la difficulté d'estimer le nombre de clusters. Nous exposons brièvement ici les notions abordées dans ce manuscrit. Considérons une loi mélange de K Gaussiennes dans R^p. Une des approches courantes pour estimer les paramètres du mélange est d'utiliser l'estimateur du maximum de vraisemblance. Ce problème n'étant pas convexe, on ne peut garantir la convergence des méthodes classiques. Cependant, en exploitant la biconvexité de la log-vraisemblance négative, on peut utiliser la procédure itérative 'Expectation-Maximization' (EM). Malheureusement, cette méthode n'est pas bien adaptée pour relever les défis posés par la grande dimension. Par ailleurs, cette méthode requiert de connaître le nombre de clusters. Le Chapitre 2 présente trois méthodes que nous avons développées pour tenter de résoudre les problèmes décrits précédemment. Les travaux qui y sont exposés n'ont pas fait l'objet de recherches approfondies pour diverses raisons. La première méthode, 'lasso graphique sur des mélanges de Gaussiennes', consiste à estimer les matrices inverses des matrices de covariance dans l'hypothèse où celles-ci sont parcimonieuses. Nous adaptons la méthode du lasso graphique de [Friedman et al., 2007] sur une composante dans le cas d'un mélange et nous évaluons expérimentalement cette méthode. Les deux autres méthodes abordent le problème d'estimation du nombre de clusters dans le mélange. La première est une estimation pénalisée de la matrice des probabilités postérieures dont la composante (i,j) est la probabilité que la i-ème observation soit dans le j-ème cluster. Malheureusement, cette méthode s'est avérée trop coûteuse en complexité. Enfin, la deuxième méthode considérée consiste à pénaliser le vecteur de poids afin de le rendre parcimonieux. Cette méthode montre des résultats prometteurs. Dans le Chapitre 3, nous étudions l'estimateur du maximum de vraisemblance d'une densité de n observations i.i.d. sous l’hypothèse qu'elle est bien approximée par un mélange de plusieurs densités données. Nous nous intéressons aux performances de l'estimateur par rapport à la perte de Kullback-Leibler. Nous établissons des bornes de risque sous la forme d'inégalités d'oracle exactes, que ce soit en probabilité ou en espérance. Nous démontrons à travers ces bornes que, dans le cas du problème d’agrégation convexe, l'estimateur du maximum de vraisemblance atteint la vitesse (log K)/n)^{1/2}, qui est optimale à un terme logarithmique près, lorsque le nombre de composant est plus grand que n^{1/2}. Plus important, sous l’hypothèse supplémentaire que la matrice de Gram des composantes du dictionnaire satisfait la condition de compatibilité, les inégalités d'oracles obtenues donnent la vitesse optimale dans le scénario parcimonieux. En d'autres termes, si le vecteur de poids est (presque) D-parcimonieux, nous obtenons une vitesse (Dlog K)/n. En complément de ces inégalités d'oracle, nous introduisons la notion d’agrégation (presque)-D-parcimonieuse et établissons pour ce type d’agrégation les bornes inférieures correspondantes. Enfin, dans le Chapitre 4, nous proposons un algorithme qui réalise l'agrégation en Kullback-Leibler de composantes d'un dictionnaire telle qu'étudiée dans le Chapitre 3. Nous comparons sa performance avec différentes méthodes. Nous proposons ensuite une méthode pour construire le dictionnaire de densités et l’étudions de manière numérique. Cette thèse a été effectué dans le cadre d’une convention CIFRE avec l’entreprise ARTEFACT. / In this thesis, we discuss two topics, high-dimensional clustering on the one hand and estimation of mixing densities on the other. The first chapter is an introduction to clustering. We present various popular methods and we focus on one of the main models of our work which is the mixture of Gaussians. We also discuss the problems with high-dimensional estimation (Section 1.3) and the difficulty of estimating the number of clusters (Section 1.1.4). In what follows, we present briefly the concepts discussed in this manuscript. Consider a mixture of $K$ Gaussians in $RR^p$. One of the common approaches to estimate the parameters is to use the maximum likelihood estimator. Since this problem is not convex, we can not guarantee the convergence of classical methods such as gradient descent or Newton's algorithm. However, by exploiting the biconvexity of the negative log-likelihood, the iterative 'Expectation-Maximization' (EM) procedure described in Section 1.2.1 can be used. Unfortunately, this method is not well suited to meet the challenges posed by the high dimension. In addition, it is necessary to know the number of clusters in order to use it. Chapter 2 presents three methods that we have developed to try to solve the problems described above. The works presented there have not been thoroughly researched for various reasons. The first method that could be called 'graphical lasso on Gaussian mixtures' consists in estimating the inverse matrices of covariance matrices $Sigma$ (Section 2.1) in the hypothesis that they are parsimonious. We adapt the graphic lasso method of [Friedman et al., 2007] to a component in the case of a mixture and experimentally evaluate this method. The other two methods address the problem of estimating the number of clusters in the mixture. The first is a penalized estimate of the matrix of posterior probabilities $ Tau in RR ^ {n times K} $ whose component $ (i, j) $ is the probability that the $i$-th observation is in the $j$-th cluster. Unfortunately, this method proved to be too expensive in complexity (Section 2.2.1). Finally, the second method considered is to penalize the weight vector $ pi $ in order to make it parsimonious. This method shows promising results (Section 2.2.2). In Chapter 3, we study the maximum likelihood estimator of density of $n$ i.i.d observations, under the assumption that it is well approximated by a mixture with a large number of components. The main focus is on statistical properties with respect to the Kullback-Leibler loss. We establish risk bounds taking the form of sharp oracle inequalities both in deviation and in expectation. A simple consequence of these bounds is that the maximum likelihood estimator attains the optimal rate $((log K)/n)^{1/2}$, up to a possible logarithmic correction, in the problem of convex aggregation when the number $K$ of components is larger than $n^{1/2}$. More importantly, under the additional assumption that the Gram matrix of the components satisfies the compatibility condition, the obtained oracle inequalities yield the optimal rate in the sparsity scenario. That is, if the weight vector is (nearly) $D$-sparse, we get the rate $(Dlog K)/n$. As a natural complement to our oracle inequalities, we introduce the notion of nearly-$D$-sparse aggregation and establish matching lower bounds for this type of aggregation. Finally, in Chapter 4, we propose an algorithm that performs the Kullback-Leibler aggregation of components of a dictionary as discussed in Chapter 3. We compare its performance with different methods: the kernel density estimator , the 'Adaptive Danzig' estimator, the SPADES and EM estimator with the BIC criterion. We then propose a method to build the dictionary of densities and study it numerically. This thesis was carried out within the framework of a CIFRE agreement with the company ARTEFACT.
|
Page generated in 0.0723 seconds