• About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations, provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world.
101

Échantillonnage dynamique de champs markoviens / Dynamic sampling of Markov random fields

Breuleux, Olivier 11 1900
One of the most active topics of research in unsupervised learning is the Boltzmann machine, in particular the Restricted Boltzmann Machine (RBM). In order to train, evaluate, or exploit such a model, one has to draw samples from it. Two recent algorithms, Fast Persistent Contrastive Divergence (FPCD) and herding, aim to improve sampling during training. In particular, herding gives up on obtaining a point estimate of the RBM's parameters, instead defining the model's distribution through a dynamical system guided by training samples. We generalize these ideas in order to obtain algorithms capable of exploiting the probability distribution defined by a pre-trained RBM, by drawing representative samples from it, without needing the training set. We present three methods: sample penalization, based on a theoretical argument, as well as FPCD and herding using constant statistics for their positive phases. These methods define dynamical systems that produce samples with the desired statistics, and we evaluate them using non-parametric density estimation. We show that these methods mix substantially better than Gibbs sampling, the conventional sampling method for RBMs.
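For context on the conventional baseline this abstract compares against: block Gibbs sampling in a binary RBM alternates between sampling the hidden layer given the visible layer and vice versa. A minimal sketch follows; the layer sizes, weights, and seed are illustrative placeholders, not parameters from the thesis experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny RBM (sizes, weights, and biases are placeholders).
n_vis, n_hid = 6, 4
W = rng.normal(0.0, 0.1, size=(n_vis, n_hid))  # visible-hidden weights
b = np.zeros(n_vis)                            # visible biases
c = np.zeros(n_hid)                            # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v):
    """One step of block Gibbs sampling: v -> h -> v'."""
    p_h = sigmoid(c + v @ W)                      # P(h_j = 1 | v)
    h = (rng.random(n_hid) < p_h).astype(float)
    p_v = sigmoid(b + h @ W.T)                    # P(v_i = 1 | h)
    return (rng.random(n_vis) < p_v).astype(float)

v = rng.integers(0, 2, n_vis).astype(float)       # random binary start
for _ in range(100):                              # such chains often mix slowly
    v = gibbs_step(v)
print(v)                                          # one binary sample
```

The slow mixing of exactly this kind of chain is what the sampling methods in the thesis aim to improve on.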
102

Uncovering Structure in High-Dimensions: Networks and Multi-task Learning Problems

Kolar, Mladen 01 July 2013
Extracting knowledge from, and providing insight into, the complex mechanisms underlying noisy high-dimensional data sets is of utmost importance in many scientific domains. Statistical modeling has become ubiquitous in the analysis of high-dimensional functional data in search of a better understanding of cognition mechanisms, in the exploration of large-scale gene regulatory networks in the hope of developing drugs for lethal diseases, and in the prediction of stock-market volatility in the hope of beating the market. Statistical analysis of such high-dimensional data sets is possible only if the estimation procedure exploits hidden structure underlying the data. This thesis develops flexible estimation procedures, with provable theoretical guarantees, for uncovering unknown hidden structure in the data-generating process. Of particular interest are procedures that can be used on high-dimensional data sets where the number of samples n is much smaller than the ambient dimension p. Learning in high dimensions is difficult due to the curse of dimensionality; however, special problem structure makes inference possible. Due to its importance for scientific discovery, emphasis is put on consistent structure recovery throughout the thesis. Particular focus is given to two important problems: semi-parametric estimation of networks and feature selection in multi-task learning.
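To illustrate why estimation with n ≪ p is possible when the underlying structure is sparse, the sketch below runs a plain coordinate-descent lasso on a toy problem with 20 samples and 50 features, only 3 of which are active. The data, penalty level, and detection threshold are illustrative assumptions, not experiments from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy n << p setting: only the first 3 of 50 features carry signal.
n, p = 20, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.01 * rng.normal(size=n)

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for min (1/2n)||y - Xb||^2 + lam ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                       # residual y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]     # put coordinate j back into residual
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * beta[j]
    return beta

beta_hat = lasso_cd(X, y, lam=0.1)
support = np.flatnonzero(np.abs(beta_hat) > 0.05)
print(support)                         # indices of detected features
```

With strong signals and tiny noise, the l1 penalty recovers the true support despite n = 20 < p = 50, which is the kind of structure exploitation the abstract refers to.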
103

Robust estimation for spatial models and the skill test for disease diagnosis

Lin, Shu-Chuan 25 August 2008
This thesis focuses on (1) statistical methodologies for the estimation of spatial data with outliers and (2) the classification accuracy of disease diagnosis. Chapter I, Robust Estimation for Spatial Markov Random Field Models: Markov Random Field (MRF) models are useful in analyzing spatial lattice data collected from semiconductor device fabrication and printed circuit board manufacturing processes, or from agricultural field trials. When outliers are present in the data, classical parameter estimation techniques (e.g., least squares) can be inefficient and can mislead the analyst. This chapter extends the MRF model to accommodate outliers and proposes robust parameter estimation methods such as the robust M- and RA-estimates. Asymptotic distributions of the estimates with differentiable and non-differentiable robustifying functions are derived. Extensive simulation studies explore the robustness properties of the proposed methods under various amounts of outliers in different patterns. Also provided are analyses of grid data with and without edge information. Three data sets taken from the literature illustrate the advantages of the methods. Chapter II, Extending the Skill Test for Disease Diagnosis: For diagnostic tests, we present an extension of the skill plot introduced by Mozer and Briggs (2003), motivated by diagnostic measures for osteoporosis in a clinical study. By restricting the area under the ROC curve (AUC) according to the skill statistic, we obtain a diagnostic test better suited to practical applications because misclassification costs are taken into account. We also construct relationships between the diseased and healthy groups, using the Koziol-Green model and the mean-shift model, to improve the skill statistic. Asymptotic properties of the skill statistic are provided. Simulation studies compare the theoretical results and the estimates under various disease rates and misclassification costs. We apply the proposed method to the classification of osteoporosis data.
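For context, the AUC referred to here can be computed empirically as the Mann-Whitney statistic: the probability that a randomly chosen diseased score exceeds a randomly chosen healthy one. The sketch below uses an assumed mean-shift setting with synthetic Gaussian scores, not the osteoporosis data from the thesis.

```python
import numpy as np

def empirical_auc(scores_diseased, scores_healthy):
    """Empirical AUC = P(d > h) + 0.5 * P(d == h), the Mann-Whitney form."""
    d = np.asarray(scores_diseased)[:, None]
    h = np.asarray(scores_healthy)[None, :]
    return float(np.mean(d > h) + 0.5 * np.mean(d == h))

# Illustrative mean-shift model: diseased scores shifted upward by 1.5.
rng = np.random.default_rng(2)
healthy = rng.normal(0.0, 1.0, 500)
diseased = rng.normal(1.5, 1.0, 500)
auc = empirical_auc(diseased, healthy)
print(round(auc, 3))
```

Under this shift the theoretical AUC is Phi(1.5 / sqrt(2)) ≈ 0.86, so the empirical value lands well above the chance level of 0.5.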
104

Pokročilé algoritmy fúze 3D medicínských dat pro specifické lékařské problémy / Advanced Algorithms for 3D Medical Image Data Fusion in Specific Medical Problems

Malínský, Miloš January 2013
Image fusion is today one of the most common, yet still widely discussed, areas of medical imaging, and it plays an important role in all areas of medical care, including diagnosis, treatment, and surgery. This dissertation presents three projects closely tied to the field of medical data fusion. The first project deals with 3D CT subtraction angiography of the lower limbs, combining contrast-enhanced and non-contrast data to obtain the complete vascular tree. The second project addresses the fusion of DTI and T1-weighted MRI brain data; its goal is to combine structural and functional information in order to improve knowledge of connectivity in brain tissue. The third project deals with metastases in temporal CT data of the spine, studying the development of metastases within vertebrae in a fused time series of images; the dissertation introduces a new methodology for classifying these metastases. All projects mentioned in this dissertation were carried out within a medical data analysis working group led by Prof. Jiří Jan. This dissertation covers the registration part of the first project and the classification part of the third project; the second project is presented in full. The remaining parts of the first and third projects, covering problem-specific data preprocessing, are contained in the dissertation of my colleague Ing. Roman Peter.
105

Modèles de classification hiérarchiques d'images satellitaires multi-résolutions, multi-temporelles et multi-capteurs. Application aux désastres naturels / Hierarchical joint classification models for multi-resolution, multi-temporal and multi-sensor remote sensing images. Application to natural disasters

Hedhli, Ihsen 18 March 2016
The capabilities to monitor the Earth's surface, notably in urban and built-up areas, for example in the framework of protection from environmental disasters such as floods or earthquakes, play important roles from social, economic, and human viewpoints. In this framework, accurate and time-efficient classification methods are important tools to support the rapid and reliable assessment of ground changes and damage caused by a disaster, in particular when an extensive area has been affected. Given the substantial amount and variety of data currently available from last-generation very-high-resolution (VHR) satellite missions such as Pléiades, COSMO-SkyMed, or RadarSat-2, the main methodological difficulty is to develop classifiers that are powerful and flexible enough to exploit the benefits of multiband, multiresolution, multi-date, and possibly multi-sensor input imagery. In the proposed approaches, multi-date/multi-sensor and multi-resolution fusion are based on explicit statistical modeling. The method combines a joint statistical model of multi-sensor and multi-temporal images with hierarchical Markov random field (MRF) modeling, leading to supervised statistical classification approaches. We have developed novel hierarchical MRF models, based on the marginal posterior modes (MPM) criterion, that support information extraction from multi-temporal and/or multi-sensor data and allow the joint supervised classification of multiple images taken over the same area at different times, from different sensors, and/or at different spatial resolutions. The developed methods have been experimentally validated with complex optical multispectral (Pléiades), X-band SAR (COSMO-SkyMed), and C-band SAR (RadarSat-2) imagery of a site in Haiti.
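As a minimal illustration of the per-pixel MPM decision rule, stripped of the hierarchical and multi-temporal machinery developed in the thesis, the sketch below labels each pixel of a toy two-class image by maximizing its marginal posterior. The Gaussian class models, priors, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy 2-class image: each pixel's intensity is drawn from a Gaussian
# whose mean depends on its (hidden) class label.
H, W, K = 8, 8, 2
means = np.array([0.0, 1.0])
sigma = 0.4
labels_true = (rng.random((H, W)) < 0.5).astype(int)
img = rng.normal(means[labels_true], sigma)

prior = np.array([0.5, 0.5])
# Class-conditional likelihoods p(x | k), then posterior marginals via Bayes.
lik = np.stack([np.exp(-0.5 * ((img - m) / sigma) ** 2) for m in means], axis=-1)
post = lik * prior
post /= post.sum(axis=-1, keepdims=True)

# MPM criterion: assign each pixel the label with maximum marginal posterior.
labels_mpm = post.argmax(axis=-1)
accuracy = float((labels_mpm == labels_true).mean())
print(round(accuracy, 2))
```

In the thesis, the posterior marginals come from a hierarchical MRF over scales, sensors, and dates rather than from an independent per-pixel model, but the final decision rule, maximizing the marginal posterior at each pixel, is the same.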
106

Časový snímek z obrazu stacionární kamery / Time Lapse from Stationary Camera Image

Turek, Lukáš January 2015
The topic of this master's thesis is creating time-lapse sequences from stationary camera footage. Unwanted phenomena that arise in time-lapse sequences were analyzed, and algorithms to overcome these limitations were designed. The algorithms were implemented and compared on a captured dataset. The resulting application creates a time-lapse sequence from video input and allows users to choose the processing technique, including the setting of appropriate parameters.
107

Efficient and Scalable Subgraph Statistics using Regenerative Markov Chain Monte Carlo

Mayank Kakodkar (12463929) 26 April 2022
In recent years there has been growing interest in data mining and graph machine learning in techniques that can obtain the frequencies of k-node Connected Induced Subgraphs (k-CIS) contained in large real-world graphs. While recent work has shown that 5-CISs can be counted exactly, no exact polynomial-time algorithms are known that solve this task for k > 5. In the past, sampling-based algorithms have been proposed that work well in moderately sized graphs for k ≤ 8. In this thesis I push this boundary up to k ≤ 16 for graphs containing up to 120M edges, and to k ≤ 25 for smaller graphs containing between one million and 20M edges. I do so by re-imagining two older but elegant and memory-efficient algorithms, FANMOD and PSRW, which have large estimation errors by modern standards: FANMOD produces highly correlated k-CIS samples, and the cost of sampling the PSRW Markov chain becomes prohibitively expensive for k-CISs larger than k = 8.

In this thesis, I introduce:

(a) RTS: a novel regenerative Markov chain Monte Carlo (MCMC) sampling procedure on the tree generated on the fly by the FANMOD algorithm. RTS can run on multiple cores and multiple machines (it is embarrassingly parallel) and can compute confidence intervals for its estimates, all while preserving the memory-efficient nature of FANMOD. RTS is thus able to estimate subgraph statistics for k ≤ 16 on larger graphs containing up to 120M edges, and for k ≤ 25 on smaller graphs containing between one million and 20M edges.

(b) R-PSRW: which scales the PSRW algorithm to larger CIS sizes using a rejection sampling procedure to efficiently sample transitions from the PSRW Markov chain. R-PSRW matches RTS in terms of scaling to larger CIS sizes.

(c) Ripple: which achieves unprecedented scalability by stratifying the R-PSRW Markov chain state space into ordered strata via a new technique that I call sequential stratified regeneration. I show that the Ripple estimator is consistent, highly parallelizable, and scales well. Ripple is able to count CISs of size up to k ≤ 12 in real-world graphs containing up to 120M edges.

My empirical results show that the proposed methods offer a considerable improvement over the state of the art. Moreover, my methods are able to run at a scale that has been considered unreachable until now, not only by prior MCMC-based methods but also by other sampling approaches.

Optimization of Restricted Boltzmann Machines. In addition, I propose a regenerative transformation of MCMC samplers of Restricted Boltzmann Machines (RBMs). My approach, Markov Chain Las Vegas (MCLV), gives statistical guarantees in exchange for random running times. MCLV uses a stopping set built from the training data and a maximum Markov chain step count K (the variant is referred to as MCLV-K). I present an MCLV-K gradient estimator (LVS-K) for RBMs and explore the correspondences and differences between LVS-K and Contrastive Divergence (CD-K). LVS-K significantly outperforms CD-K in the task of training RBMs on the MNIST dataset, indicating that MCLV is a promising direction for learning generative models.
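The core idea behind regenerative MCMC, splitting a chain into i.i.d. tours at visits to an atom state so that estimates come with honest confidence intervals, can be sketched on a toy chain. The lazy random walk below stands in for the far more elaborate subgraph samplers introduced in the thesis; states, atom, and tour counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Lazy random walk on {0, ..., 4} with reflecting boundaries; every visit
# to the atom state 0 starts a fresh, statistically independent tour.
states = 5

def step(s):
    move = int(rng.choice([-1, 0, 1]))
    return min(max(s + move, 0), states - 1)

def run_tours(n_tours, f):
    sums, lens = [], []
    for _ in range(n_tours):
        s, tot, length = 0, 0.0, 0
        while True:                     # one tour: leave the atom, return to it
            tot += f(s)
            length += 1
            s = step(s)
            if s == 0:
                break
        sums.append(tot)
        lens.append(length)
    return np.array(sums), np.array(lens)

f = lambda s: float(s)                  # estimate the stationary mean E_pi[s]
sums, lens = run_tours(4000, f)
est = sums.sum() / lens.sum()           # ratio (renewal-reward) estimator
# Tours are i.i.d., so a CLT over tours yields a confidence interval.
z = sums - est * lens
se = z.std(ddof=1) / np.sqrt(len(z)) / lens.mean()
print(round(est, 2))                    # stationary mean is 2.0 for this walk
```

Because tours are independent, the estimator parallelizes trivially (tours can be farmed out to cores or machines), which is the property RTS and Ripple exploit at scale.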
108

Duas abordagens para casamento de padrões de pontos usando relações espaciais e casamento entre grafos / Two approaches for point set matching using spatial relations and graph matching

Noma, Alexandre 07 July 2010
Point set matching is a fundamental problem in pattern recognition. The goal is to match two sets of points, associated with relevant features of objects or entities, by finding a mapping, or correspondence, from one set of points to the other. This issue arises in many applications, e.g., model-based object recognition, stereo matching, image registration, and biometrics. In order to find a mapping, the objects are encoded by abstract representations carrying the relevant features that are taken into account when comparing pairs of objects. In this work, graphs are adopted to represent the objects, encoding both their 'local' features and the spatial relations between these features. The comparison of two given objects is guided by a quadratic assignment formulation, which is NP-hard. To approximate the optimal solution, two graph matching techniques are proposed: one based on auxiliary graphs, called deformed graphs; the other based on 'sparse' representations, Markov random fields, and belief propagation. Due to their respective limitations, each approach is more suitable to specific situations, as shown in this document. The quality of the two approaches is illustrated on four important applications: 2D electrophoresis gel matching, interactive natural image segmentation, shape matching, and computer-assisted colorization.
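As a minimal illustration of the quadratic assignment formulation (not the thesis's actual energy), the sketch below scores a candidate correspondence with unary terms that reward similar point positions plus pairwise terms that reward preserved spatial relations, here pairwise distances. All names, weights, and point sets are hypothetical.

```python
import numpy as np

def qap_score(P1, P2, assign, alpha=1.0):
    """Score a correspondence assign[i] = index in P2 matched to point i of P1.

    Higher is better: unary terms penalize displaced points, pairwise terms
    penalize distorted spatial relations (changed inter-point distances).
    """
    P1 = np.asarray(P1, dtype=float)
    P2 = np.asarray(P2, dtype=float)
    m = P2[assign]                                         # matched points
    unary = -np.linalg.norm(P1 - m, axis=1).sum()
    d1 = np.linalg.norm(P1[:, None] - P1[None, :], axis=-1)
    d2 = np.linalg.norm(m[:, None] - m[None, :], axis=-1)
    pairwise = -np.abs(d1 - d2).sum()
    return unary + alpha * pairwise

P1 = [[0, 0], [1, 0], [0, 1]]
P2 = [[0.1, 0.0], [1.0, 0.1], [0.0, 1.1]]  # noisy copy of P1
good = qap_score(P1, P2, np.array([0, 1, 2]))  # identity correspondence
bad = qap_score(P1, P2, np.array([2, 0, 1]))   # a wrong permutation
print(good > bad)  # True: the correct correspondence scores higher
```

Maximizing such a score over all permutations is the NP-hard part; the deformed-graph and belief-propagation methods in the thesis are two ways of approximating that search.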
