211

Inférence topologique

Prévost, Noémie 02 1900 (has links)
Data coming from a fine sampling of a continuous process (random field) can be represented as images. A statistical test aiming to detect a difference between two images can be seen as a family of tests in which each pixel is compared to the corresponding pixel in the other image. A method is then needed to control the type I error over the whole family of tests, such as the Bonferroni correction or control of the false discovery rate (FDR). Methods of data analysis have been developed in the field of medical imaging, mainly by Keith Worsley, that use the geometry of random fields to build a global statistical test over the entire image. The expected Euler characteristic of the excursion set of the random field underlying the sample above a given threshold is used to determine the probability that the random field exceeds that same threshold under the null hypothesis (topological inference). We present some notions relevant to random fields, in particular isotropy (the covariance function between two points of the field depends only on the distance between them). We discuss two methods for the analysis of anisotropic random fields. The first consists in deforming the field and then using the intrinsic volumes and the Euler characteristic densities. The second uses the Lipschitz-Killing curvatures. We then study the level and power of topological inference in comparison with the Bonferroni correction. Finally, we use topological inference to describe the evolution of climate change over Quebec territory between 1991 and 2100, using temperature data simulated and published by the Climate Simulation Team at Ouranos with the Canadian Regional Climate Model CRCM4.2.
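As a toy illustration of the comparison discussed in this abstract (not code from the thesis), the following sketch contrasts a per-pixel Bonferroni bound with the expected-Euler-characteristic approximation for a smooth 2D Gaussian field. The image size, FWHM, and resel counts are illustrative assumptions.

```python
import math

def gaussian_tail(t):
    """P(Z > t) for a standard normal variable."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def expected_euler_characteristic(t, resels):
    """E[EC] of the excursion set above threshold t for a smooth,
    unit-variance 2D Gaussian random field, via the Gaussian EC densities.
    resels = (R0, R1, R2): resel counts in dimensions 0, 1 and 2."""
    ln4 = 4.0 * math.log(2.0)
    rho0 = gaussian_tail(t)
    rho1 = math.sqrt(ln4) / (2.0 * math.pi) * math.exp(-t * t / 2.0)
    rho2 = ln4 / (2.0 * math.pi) ** 1.5 * t * math.exp(-t * t / 2.0)
    R0, R1, R2 = resels
    return R0 * rho0 + R1 * rho1 + R2 * rho2

# Illustrative assumption: a 100x100-pixel image smoothed with FWHM = 10
# pixels, so R2 = area / FWHM^2 = 100, R1 = half-perimeter / FWHM = 20,
# and R0 = 1 (Euler characteristic of the square domain).
resels = (1.0, 20.0, 100.0)
n_pixels = 100 * 100
t = 4.0
p_topological = expected_euler_characteristic(t, resels)  # approx. P(max Z > t)
p_bonferroni = min(1.0, n_pixels * gaussian_tail(t))      # per-pixel union bound
```

Here the topological approximation (about 0.025) is roughly an order of magnitude below the Bonferroni bound (about 0.32) at the same threshold, which is the motivation for the approach: for smooth fields a much lower threshold achieves the same family-wise error rate.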
212

Robust estimation for spatial models and the skill test for disease diagnosis

Lin, Shu-Chuan 25 August 2008 (has links)
This thesis focuses on (1) statistical methodologies for the estimation of spatial data with outliers and (2) the classification accuracy of disease diagnosis. Chapter I, Robust Estimation for Spatial Markov Random Field Models: Markov Random Field (MRF) models are useful in analyzing spatial lattice data collected from semiconductor device fabrication and printed circuit board manufacturing processes or agricultural field trials. When outliers are present in the data, classical parameter estimation techniques (e.g., least squares) can be inefficient and can potentially mislead the analyst. This chapter extends the MRF model to accommodate outliers and proposes robust parameter estimation methods such as the robust M- and RA-estimates. Asymptotic distributions of the estimates with differentiable and non-differentiable robustifying functions are derived. Extensive simulation studies explore the robustness properties of the proposed methods in situations with various amounts of outliers in different patterns. Studies of the analysis of grid data with and without edge information are also provided. Three data sets taken from the literature illustrate the advantages of the methods. Chapter II, Extending the Skill Test for Disease Diagnosis: For diagnostic tests, we present an extension to the skill plot introduced by Mozer and Briggs (2003). The method is motivated by a study of diagnostic measures for osteoporosis. By restricting the area under the ROC curve (AUC) according to the skill statistic, we obtain a diagnostic test better suited to practical applications because misclassification costs are taken into account. We also construct relationships between the diseased group and the healthy group, using the Koziol-Green model and the mean-shift model, to improve the skill statistic. Asymptotic properties of the skill statistic are provided. Simulation studies compare the theoretical results and the estimates under various disease rates and misclassification costs. We apply the proposed method to the classification of osteoporosis data.
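To illustrate the kind of robust estimation discussed in Chapter I (in a deliberately simplified, non-spatial setting; the thesis works with spatial MRF models), here is a sketch of a Huber M-estimate of location computed by iteratively reweighted averaging:

```python
def huber_m_estimate(data, k=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted averaging.
    Observations far from the current estimate (relative to a robust
    scale) are down-weighted, so gross outliers barely move the result."""
    mu = sorted(data)[len(data) // 2]  # start from the median
    dev = sorted(abs(x - mu) for x in data)
    mad = dev[len(data) // 2]          # median absolute deviation
    scale = 1.4826 * mad if mad > 0 else 1.0
    for _ in range(max_iter):
        w = [1.0 if abs(x - mu) <= k * scale else k * scale / abs(x - mu)
             for x in data]
        new_mu = sum(wi * x for wi, x in zip(w, data)) / sum(w)
        if abs(new_mu - mu) < tol:
            break
        mu = new_mu
    return mu

clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
contaminated = clean + [50.0, 60.0]  # two gross outliers
```

On the contaminated sample the least-squares estimate (the plain mean) is dragged to about 19, while the M-estimate stays near 10, which is the inefficiency-and-misleading behavior the abstract refers to.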
213

Pokročilé algoritmy fúze 3D medicínských dat pro specifické lékařské problémy / Advanced Algorithms for 3D Medical Image Data Fusion in Specific Medical Problems

Malínský, Miloš January 2013 (has links)
Image fusion is nowadays one of the most common, yet still widely discussed, areas of medical imaging, and it plays an important role in all areas of medical care such as diagnosis, treatment, and surgery. This dissertation presents three projects closely related to the field of medical data fusion. The first project deals with 3D CT subtraction angiography of the lower limbs; it combines contrast-enhanced and non-contrast data to obtain the complete vascular tree. The second project concerns the fusion of DTI and T1-weighted MRI data of the brain; its goal is to combine structural and functional information in order to improve knowledge of connectivity in brain tissue. The third project deals with metastases in temporal CT data of the spine; it studies the development of metastases inside the vertebrae in a fused time series of images, and the dissertation introduces a new methodology for classifying these metastases. All projects mentioned in this dissertation were carried out within the medical data analysis research group led by Prof. Jiří Jan. This dissertation contains the registration part of the first project and the classification part of the third; the second project is presented in full. The remaining parts of the first and third projects, covering specific data preprocessing, are contained in the dissertation of my colleague Ing. Roman Peter.
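As background for the registration step that precedes fusion (a sketch of one standard multimodal similarity measure, not the dissertation's algorithm), mutual information between two images can be estimated from a joint histogram:

```python
import math
import random
from collections import Counter

def mutual_information(img_a, img_b, bins=8):
    """Plug-in estimate of the mutual information (in nats) between two
    equally sized grayscale images, given as flat lists of values in
    [0, 256). High MI indicates the images are statistically dependent,
    i.e. well aligned."""
    assert len(img_a) == len(img_b)
    n = len(img_a)
    qa = [min(v * bins // 256, bins - 1) for v in img_a]  # quantize to bins
    qb = [min(v * bins // 256, bins - 1) for v in img_b]
    joint = Counter(zip(qa, qb))
    pa, pb = Counter(qa), Counter(qb)
    mi = 0.0
    for (a, b), c in joint.items():
        mi += (c / n) * math.log(c * n / (pa[a] * pb[b]))
    return mi

# Synthetic image and a shuffled (misaligned) copy
img = [(i * 37) % 256 for i in range(400)]
shuffled = img[:]
random.Random(0).shuffle(shuffled)
mi_self = mutual_information(img, img)       # high: perfect alignment
mi_shuf = mutual_information(img, shuffled)  # near zero: no dependence
```

A registration algorithm searches over geometric transformations for the one maximizing such a measure, after which the aligned volumes can be fused.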
214

Effets des hétérogénéités du béton sur le comportement mécanique des structures à plusieurs échelles / Effects of heterogeneity of concrete on the mechanical behavior of structures at different scales

Ghannoum, Maria 18 September 2017 (has links)
This thesis contributes to the modeling of the spatial variability of the tensile strength of concrete structures, at different scales, and of its influence on concrete cracking. In particular, a size-effect law and random fields are used through two approaches. On the one hand, an analytical probabilistic approach to the Weakest Link and Localization (WL2) method is proposed. This method estimates the distribution of the tensile strength, at different scales, accounting for the stress redistributions around the weakest point. It depends on a scale length, whose identification is discussed; this scale length accounts for the spatial randomness of the concrete tensile strength. On the other hand, another contribution of this thesis is the development of a Stochastic Finite Element (SFE) method, used to model both the size effect and the spatial variability of the tensile strength. The method consists first in defining a random field, using the reduced tensile strength estimated from the analytical WL2 approach. Then, discretized autocorrelated random-field realizations are generated; the choice of the autocorrelation parameters used to define the random fields is discussed. The applicability of both methods is evaluated using various experimental series exhibiting, in particular, statistical size effects. Furthermore, the SFE method is used to complete the simplified FE model of the VeRCoRs 1/3-scale mock-up of a double-wall containment building. The uncertainties on the tensile strength, at this scale, are modeled using an independent autocorrelated random field for each lift. Uncertainty propagation, at the initial state, shows its relevance in estimating crack positions.
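The statistical size effect that the WL2 approach models can be illustrated with a toy weakest-link simulation. The Weibull parameters below are illustrative assumptions, not values from the thesis:

```python
import random

def specimen_strength(n_elements, shape=12.0, scale=4.0, rng=random):
    """Weakest-link model: a specimen made of n_elements volume elements
    fails at the strength of its weakest element; element strengths are
    drawn from a Weibull distribution (parameters are illustrative)."""
    return min(rng.weibullvariate(scale, shape) for _ in range(n_elements))

def mean_strength(n_elements, trials=2000, seed=0):
    """Monte Carlo estimate of the mean strength of a specimen built
    from n_elements volume elements."""
    rng = random.Random(seed)
    total = sum(specimen_strength(n_elements, rng=rng) for _ in range(trials))
    return total / trials
```

For a Weibull shape modulus m, the mean strength of the minimum over n elements scales as n^(-1/m): a specimen 100 times larger is about 100^(-1/12) ≈ 0.68 times as strong on average, which is the statistical size effect.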
215

Efficient Algorithms for Structured Output Learning

Balamurugan, P January 2014 (has links) (PDF)
Structured output learning is the machine learning task of building a classifier to predict structured outputs. Structured outputs arise in several contexts in diverse applications like natural language processing, computer vision, bioinformatics and social networks. Unlike simple two-class (or multi-class) outputs, which belong to a set of distinct or univariate categories, structured outputs are composed of multiple components with complex interdependencies amongst them. As an illustrative example, consider the natural language processing task of tagging a sentence with its corresponding part-of-speech tags. The part-of-speech tag sequence is an example of a structured output, as it is made up of multiple components whose interactions are governed by the underlying properties of the language. This thesis provides efficient solutions for different problems pertaining to structured output learning. The classifier for structured outputs is generally built by learning a suitable model from a set of training examples labeled with their associated structured outputs. Discriminative techniques like Structural Support Vector Machines (Structural SVMs) and Conditional Random Fields (CRFs) are popular alternatives developed for structured output learning. The thesis contributes towards developing efficient training strategies for structural SVMs. In particular, an efficient sequential optimization method is proposed for structural SVMs, which is faster than several competing methods. An extension of the sequential method to CRFs is also developed. The sequential method is further adapted to a variant of the structural SVM with linear cumulative loss. The thesis also presents a systematic empirical evaluation of the various training methods available for structured output learning, which will be useful to the practitioner. To train structural SVMs when a large number of training examples lack labels, the thesis develops a simple semi-supervised technique based on switching the labels of the components of the structured output. The proposed technique is general, and its efficacy is demonstrated using experiments on different benchmark applications. Another contribution of the thesis is the design of fast algorithms for sparse structured output learning. Efficient alternating optimization algorithms are developed for sparse classifier design, and these are shown to achieve sparse models faster than existing methods.
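The part-of-speech example above relies on an argmax over exponentially many tag sequences. A minimal Viterbi decoder, the inference step that both structural SVMs and CRFs rely on, can be sketched as follows; the scores are hand-set for illustration, not learned weights:

```python
def viterbi_decode(tokens, tags, score_emit, score_trans):
    """Find the highest-scoring tag sequence by dynamic programming.
    score_emit(tag, token) and score_trans(prev_tag, tag) are additive
    scores (e.g. w . phi in a structural SVM, log-potentials in a CRF)."""
    n = len(tokens)
    best = [{t: score_emit(t, tokens[0]) for t in tags}]
    back = [{}]
    for i in range(1, n):
        cur, bp = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + score_trans(p, t))
            cur[t] = best[-1][prev] + score_trans(prev, t) + score_emit(t, tokens[i])
            bp[t] = prev
        best.append(cur)
        back.append(bp)
    # follow back-pointers from the best final tag
    t = max(tags, key=lambda t: best[-1][t])
    path = [t]
    for i in range(n - 1, 0, -1):
        t = back[i][t]
        path.append(t)
    return path[::-1]

# Illustrative hand-set scores (assumed for this sketch, not from the thesis)
TAGS = ["DET", "NOUN", "VERB"]
EMIT = {("DET", "the"): 2.0, ("NOUN", "dog"): 2.0, ("VERB", "barks"): 2.0}
TRANS = {("DET", "NOUN"): 1.0, ("NOUN", "VERB"): 1.0}

def score_emit(tag, tok): return EMIT.get((tag, tok), 0.0)
def score_trans(p, t): return TRANS.get((p, t), 0.0)
```

Training methods such as the sequential dual optimization studied in the thesis repeatedly call this kind of decoder inside the learning loop, which is why its efficiency matters.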
216

Modèles de classification hiérarchiques d'images satellitaires multi-résolutions, multi-temporelles et multi-capteurs. Application aux désastres naturels / Hierarchical joint classification models for multi-resolution, multi-temporal and multi-sensor remote sensing images. Application to natural disasters

Hedhli, Ihsen 18 March 2016 (has links)
The capability to monitor the Earth's surface, notably in urban and built-up areas, for example in the framework of protection from environmental disasters such as floods or earthquakes, plays an important role from social, economic, and human viewpoints. In this framework, accurate and time-efficient classification methods are important tools to support the rapid and reliable assessment of ground changes and damage caused by a disaster, in particular when an extensive area has been affected. Given the substantial amount and variety of data currently available from last-generation very-high-resolution (VHR) satellite missions such as Pléiades, COSMO-SkyMed, or RadarSat-2, the main methodological difficulty is to develop classifiers powerful and flexible enough to exploit multiband, multiresolution, multi-date, and possibly multi-sensor input imagery while keeping computation time acceptable. With the proposed approaches, multi-date/multi-sensor and multi-resolution fusion are based on explicit statistical modeling: a joint statistical model of multi-sensor and multi-temporal images is combined with hierarchical Markov random field (MRF) modeling, leading to supervised statistical classification approaches. We develop novel hierarchical MRF models, based on the marginal posterior modes (MPM) criterion, which recursively assigns to each pixel the label maximizing its marginal posterior probability given the whole set of multi-temporal or multi-sensor observations. These models support information extraction from multi-temporal and/or multi-sensor data and allow the joint supervised classification of multiple images taken over the same area at different times, from different sensors, and/or at different spatial resolutions. The methods were experimentally validated with complex optical multispectral (Pléiades), X-band SAR (COSMO-SkyMed), and C-band SAR (RadarSat-2) imagery of the Haiti site.
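The MPM criterion can be illustrated on a 1-D chain, a toy stand-in for the hierarchical models in the thesis: compute posterior marginals by the forward-backward recursions, then label each site with the mode of its marginal. All model parameters below are illustrative assumptions.

```python
def mpm_labels(obs, states, init, trans, emit):
    """Marginal Posterior Mode labeling on a chain model:
    forward-backward posterior marginals, then a per-site argmax.
    trans[i][j] = P(state j | state i); emit[i][o] = P(obs o | state i)."""
    n, m = len(obs), len(states)
    # forward pass (normalized per step for numerical stability)
    alpha = [[init[i] * emit[i][obs[0]] for i in range(m)]]
    for t in range(1, n):
        row = [sum(alpha[-1][i] * trans[i][j] for i in range(m)) * emit[j][obs[t]]
               for j in range(m)]
        z = sum(row)
        alpha.append([x / z for x in row])
    # backward pass (scaling per step does not change the per-site argmax)
    beta = [[1.0] * m for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for i in range(m):
            beta[t][i] = sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(m))
        z = sum(beta[t])
        beta[t] = [x / z for x in beta[t]]
    labels = []
    for t in range(n):
        post = [alpha[t][i] * beta[t][i] for i in range(m)]
        labels.append(states[max(range(m), key=lambda i: post[i])])
    return labels

# Toy two-class model with persistent transitions and noisy observations
states = ["water", "land"]
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [{0: 0.8, 1: 0.2}, {0: 0.2, 1: 0.8}]
obs = [0, 0, 0, 1, 1, 1]
```

In the hierarchical multi-resolution setting of the thesis, analogous recursions run across the scales of a quadtree rather than along a chain, but the per-site argmax of the posterior marginal is the same criterion.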
217

Nové metody nadvzorkování obrazu / New methods for super-resolution imaging

Kučera, Ondřej January 2012 (has links)
This master's thesis deals with methods for increasing image resolution. It describes the theoretical principles and the well-established calculations commonly used to increase image resolution, as well as newer methods used in this area of image processing, including a method that I propose myself. It also describes methods for evaluating image similarity and compares the results of all the methods presented in the thesis. The thesis includes implementations of selected methods in the MATLAB programming language: an application was created that realizes several image-upscaling methods and evaluates their results against the original image using PSNR and the SSIM index.
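As a sketch of the evaluation step mentioned above (PSNR is defined the same way regardless of implementation; the thesis uses MATLAB, Python is used here for illustration):

```python
import math

def psnr(original, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images,
    given as flat lists of pixel values. Higher means the reconstruction
    is closer to the original."""
    assert len(original) == len(reconstructed)
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

An upscaling experiment typically downsamples a reference image, re-enlarges it with each method, and ranks the methods by PSNR (and by SSIM, which additionally models perceived structural similarity).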
218

Časový snímek z obrazu stacionární kamery / Time Lapse from Stationary Camera Image

Turek, Lukáš January 2015 (has links)
This master's thesis deals with creating time-lapse sequences from stationary camera images. Unwanted phenomena that arise in time lapses were analyzed, and algorithms to overcome these limitations were designed, implemented, and compared using a captured dataset. The resulting application creates a time lapse from video input and lets users choose the processing technique, including the setting of appropriate parameters.
219

Modèles exponentiels et contraintes sur les espaces de recherche en traduction automatique et pour le transfert cross-lingue / Log-linear Models and Search Space Constraints in Statistical Machine Translation and Cross-lingual Transfer

Pécheux, Nicolas 27 September 2016 (has links)
Most natural language processing (NLP) tasks can be modeled as prediction problems in which one aims at finding the best-scoring hypothesis from a very large pool of possible outputs. Even when algorithms are designed to leverage some kind of structure, the output space is often too large to be searched exhaustively. This work aims at understanding the importance of search space design and the possible use of constraints to reduce its size and complexity. We report three case studies, on part-of-speech tagging, cross-lingual transfer, and reordering in machine translation, which highlight the risks, benefits, and stakes of manipulating the search space in learning and inference. When information about the possible outputs of a sequence labeling task is available a priori, it may seem natural to include this knowledge in the model so as to reduce the search space and speed up learning and inference. A case study on type constraints for CRF-style part-of-speech taggers, however, shows paradoxically that using such constraints at training time can drastically degrade performance, even when the constraints are both correct and useful at decoding time. In parallel, we consider using this type of constraint to generalize supervised learning to the case where only partial and incomplete information is available at training time, as arises for instance in the cross-lingual transfer of annotations; we study two weakly supervised learning methods, which we formalize within the framework of ambiguous learning, applied to part-of-speech tagging for under-resourced languages, together with cross-lingual transfer and dictionary-crawling techniques. Finally, we turn to search space design in machine translation. Word-order differences between languages pose a hard combinatorial problem: the full factorial space of permutations cannot be explored, so constraints on reorderings are necessary, and they have a great impact on the set of potential translations considered during search. We compare different sets of reordering constraints and explore the influence of the reordering space on overall translation performance. Although a better-designed space yields better results, we show that the main margin for improvement lies in the scoring of reorderings, i.e. in model and search errors, rather than in the quality of the search space itself.
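A toy illustration of constraining the reordering space (the distortion-limit constraint below is a common textbook choice, not necessarily one of the constraint sets compared in the thesis): counting the permutations of n source positions in which each word moves at most d positions shows how quickly constraints shrink the factorial space.

```python
from math import factorial

def count_limited_permutations(n, d):
    """Count permutations of n source positions in which position i may
    only move to a target position j with |i - j| <= d (a simple
    distortion-limit constraint on word reorderings)."""
    count = 0
    used = set()
    def extend(i):
        nonlocal count
        if i == n:
            count += 1
            return
        for j in range(max(0, i - d), min(n, i + d + 1)):
            if j not in used:
                used.add(j)
                extend(i + 1)
                used.remove(j)
    extend(0)
    return count
```

With d = 1 the counts follow the Fibonacci sequence, e.g. 34 permutations for n = 8 instead of 8! = 40320: even a very modest limit collapses the factorial space, which is exactly the trade-off between search space size and translation quality studied here.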
220

Efficient and Scalable Subgraph Statistics using Regenerative Markov Chain Monte Carlo

Mayank Kakodkar (12463929) 26 April 2022 (has links)
<p>In recent years there has been a growing interest in data mining and graph machine learning for techniques that can obtain frequencies of <em>k</em>-node Connected Induced Subgraphs (<em>k</em>-CIS) contained in large real-world graphs. While recent work has shown that 5-CISs can be counted exactly, no exact polynomial-time algorithms are known that solve this task for <em>k</em> > 5. In the past, sampling-based algorithms that work well in moderately sized graphs for <em>k</em> ≤ 8 have been proposed. In this thesis I push this boundary up to <em>k</em> ≤ 16 for graphs containing up to 120M edges, and to <em>k</em> ≤ 25 for smaller graphs containing between 1M and 20M edges. I do so by re-imagining two older, but elegant and memory-efficient algorithms -- FANMOD and PSRW -- which have large estimation errors by modern standards: FANMOD produces highly correlated <em>k</em>-CIS samples, and sampling the PSRW Markov chain becomes prohibitively expensive for <em>k</em> > 8.</p> <p>In this thesis, I introduce:</p> <p>(a) <strong>RTS:</strong> a novel regenerative Markov chain Monte Carlo (MCMC) sampling procedure on the tree generated on-the-fly by the FANMOD algorithm. RTS is able to run on multiple cores and multiple machines (embarrassingly parallel) and to compute confidence intervals for its estimates, all while preserving the memory-efficient nature of FANMOD. RTS is thus able to estimate subgraph statistics for <em>k</em> ≤ 16 in larger graphs containing up to 120M edges, and for <em>k</em> ≤ 25 in smaller graphs containing between 1M and 20M edges.</p> <p>(b) <strong>R-PSRW:</strong> which scales the PSRW algorithm to larger CIS sizes using a rejection sampling procedure to efficiently sample transitions from the PSRW Markov chain.
R-PSRW matches RTS in terms of scaling to larger CIS sizes.</p> <p>(c) <strong>Ripple:</strong> which achieves unprecedented scalability by stratifying the R-PSRW Markov chain state-space into ordered strata via a new technique that I call <em>sequential stratified regeneration</em>. I show that the Ripple estimator is consistent, highly parallelizable, and scales well. Ripple is able to <em>count</em> CISs of size <em>k</em> ≤ 12 in real-world graphs containing up to 120M edges.</p> <p>My empirical results show that the proposed methods offer a considerable improvement over the state of the art. Moreover, my methods are able to run at a scale that had been considered unreachable until now, not only by prior MCMC-based methods but also by other sampling approaches.</p> <p><strong>Optimization of Restricted Boltzmann Machines. </strong>In addition, I propose a regenerative transformation of MCMC samplers of Restricted Boltzmann Machines (RBMs). My approach, Markov Chain Las Vegas (MCLV), gives statistical guarantees in exchange for random running times. MCLV uses a stopping set built from the training data and caps the number of Markov chain steps at <em>K</em> (referred to as MCLV-<em>K</em>). I present an MCLV-<em>K</em> gradient estimator (LVS-<em>K</em>) for RBMs and explore the correspondence and differences between LVS-<em>K</em> and Contrastive Divergence (CD-<em>K</em>). LVS-<em>K</em> significantly outperforms CD-<em>K</em> in the task of training RBMs over the MNIST dataset, indicating MCLV to be a promising direction in learning generative models.</p>
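The regenerative idea behind these estimators can be sketched on a toy chain (this illustrates tour-based estimation in general, not the thesis's algorithms): a random walk is split into i.i.d. tours at successive returns to a fixed regeneration state, and stationary expectations are estimated as a ratio of tour totals.

```python
import random

def regenerative_estimate(neighbors, f, atom, n_tours, seed=0):
    """Tour-based estimate of E_pi[f] for the simple random walk on an
    undirected graph (stationary pi(v) proportional to degree(v)).
    Tours between successive visits to `atom` are i.i.d., so by the
    renewal-reward theorem
        E_pi[f] = E[sum of f over a tour] / E[tour length]."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n_tours):
        v = atom
        while True:
            num += f(v)
            den += 1.0
            v = rng.choice(neighbors[v])
            if v == atom:
                break  # regeneration: the next tour is independent
    return num / den

# Toy graph: path 0-1-2-3, degrees (1, 2, 2, 1), so pi = (1, 2, 2, 1)/6
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

Because tours are independent, they can be farmed out to many cores or machines and their empirical variability yields honest confidence intervals, the two properties that RTS and Ripple exploit at scale.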
