Global ETD Search

41	Methods for modelling human functional brain networks with MEG and fMRI Colclough, Giles January 2016 (has links) MEG and fMRI offer complementary insights into connected human brain function. Evidence from the use of both techniques in the study of networked activity indicates that functional connectivity reflects almost every measurable aspect of human reality, being indicative of ability and deteriorating with disease. Functional network analyses may offer improved prediction of dysfunction and characterisation of cognition. Three factors holding back progress are the difficulty in synthesising information from multiple imaging modalities; a need for accurate modelling of connectivity in individual subjects, not just average effects; and a lack of scalable solutions to these problems that are applicable in a big-data setting. I propose two methodological advances that tackle these issues. A confound to network analysis in MEG, the artificial correlations induced across the brain by the process of source reconstruction, prevents the transfer of connectivity models from fMRI to MEG. The first advance is a fast correction for this confound, allowing comparable analyses to be performed in both modalities. A comparative study demonstrates that this new approach for MEG shows better repeatability for connectivity estimation, both within and between subjects, than a wide range of alternative models in popular use. A case-study analysis uses both fMRI and MEG recordings from a large dataset to determine the genetic basis for functional connectivity in the human brain. Genes account for 20% - 65% of the variation in connectivity, and outweigh the influence of the developmental environment. The second advance is a Bayesian hierarchical model for sparse functional networks that is applicable to both modalities. By sharing information over a group of subjects, more accurate estimates can be constructed for individuals' connectivity patterns. 
The approach scales to large datasets, outperforms state-of-the-art methods, and can provide a 50% noise reduction in MEG resting-state networks.
42	On regularized estimation methods for precision and covariance matrix and statistical network inference Kuismin, M. (Markku) 14 November 2018 (has links) Abstract Estimation of the covariance matrix is an important problem in statistics in general because the covariance matrix is an essential part of principal component analysis, statistical pattern recognition, multivariate regression and network exploration, to mention just a few applications. Penalized likelihood methods are used when standard estimates cannot be computed. This is common when the number of explanatory variables is much larger than the sample size (the high-dimensional case). An alternative ridge-type estimator for precision matrix estimation is introduced in Article I. This estimate is derived using a penalized likelihood estimation method. Undirected networks, which are connected to penalized covariance and precision matrix estimation, and some applications related to networks are also explored in this dissertation. In Article II, novel statistical methods are used to infer population networks from discrete measurements of genetic data. More precisely, the Least Absolute Shrinkage and Selection Operator (LASSO) is applied in neighborhood selection. This inferred network is used for more detailed inference of population structures. We illustrate how community detection can be a promising tool in population structure and admixture exploration of genetic data. In addition, Article IV shows how the precision matrix estimator introduced in Article I can be used in graphical model selection via a multiple hypothesis testing procedure. Article III in this dissertation contains a review of current tools for practical graphical model selection and precision/covariance matrix estimation. The other three publications have detailed descriptions of the fundamental computational and mathematical results which create a basis for the methods presented in these articles. 
Each publication contains a collection of practical research questions where the novel methods can be applied. We hope that these applications will help readers to better understand the possible applications of the methods presented in this dissertation. / Tiivistelmä Kovarianssimatriisin estimointi on yleisesti ottaen tärkeä tilastotieteen ongelma, koska kovarianssimatriisi on oleellinen osa pääkomponenttianalyysia, tilastollista hahmontunnistusta, monimuuttujaregressiota ja verkkojen tutkimista, vain muutamia sovellutuksia mainitakseni. Sakotettuja suurimman uskottavuuden menetelmiä käytetään sellaisissa tilanteissa, joissa tavanomaisia estimaatteja ei voida laskea. Tämä on tyypillistä tilanteessa, jossa selittävien muuttujien lukumäärä on hyvin suuri verrattuna otoskokoon (englanninkielisessä kirjallisuudessa tämä tunnetaan nimellä ”high dimensional case”). Ensimmäisessä artikkelissa esitellään vaihtoehtoinen harjanne (ridge)-tyyppinen estimaattori tarkkuusmatriisin estimointiin. Tämä estimaatti on johdettu käyttäen sakotettua suurimman uskottavuuden estimointimenetelmää. Tässä väitöskirjassa käsitellään myös suuntaamattomia verkkoja, jotka liittyvät läheisesti sakotettuun kovarianssi- ja tarkkuusmatriisin estimointiin, sekä joitakin verkkoihin liittyviä sovelluksia. Toisessa artikkelissa käytetään uusia tilastotieteen menetelmiä populaatioverkon päättelyyn epäjatkuvista mittauksista. Tarkemmin sanottuna Lassoa (Least Absolute Shrinkage and Selection Operator) sovelletaan naapuruston valinnassa. Näin muodostettua verkkoa hyödynnetään tarkemmassa populaatiorakenteen tarkastelussa. Havainnollistamme, kuinka verkon kommuunien (communities) tunnistaminen saattaa olla lupaava tapa tutkia populaatiorakennetta ja populaation sekoittumista (admixture) geneettisestä datasta. 
Lisäksi neljännessä artikkelissa näytetään, kuinka ensimmäisessä artikkelissa esiteltyä tarkkuusmatriisin estimaattoria voidaan käyttää graafisessa mallinvalinnassa usean hypoteesin testauksen avulla. Tämän väitöskirjan kolmas artikkeli sisältää yleiskatsauksen tämänhetkisistä työkaluista, joiden avulla voidaan valita graafinen malli ja estimoida tarkkuus- sekä kovarianssimatriiseja. Muissa kolmessa julkaisussa on kuvailtu yksityiskohtaisesti olennaisia laskennallisista ja matemaattisista tuloksista, joihin artikkeleissa esitellyt estimointimenetelmät perustuvat. Jokaisessa julkaisussa on kokoelma käytännöllisiä tutkimuskysymyksiä, joihin voidaan soveltaa uusia estimointimenetelmiä. Toivomme, että nämä sovellukset auttavat lukijaa ymmärtämään paremmin tässä väitöskirjassa esiteltyjen menetelmien käyttömahdollisuuksia. LASSO covariance matrix graphical model network estimation precision matrix ridge Lasso graafinen malli kovarianssimatriisi ridge tarkkuusmatriisi verkkojen estimointi high-dimensional setting
43	Réseau bayésien dynamique hybride : application à la modélisation de la fiabilité de systèmes à espaces d'états discrets / hybrid dynamic bayesian network : application to reliability modeling of discrete state spaces systems Petiet, Florence 01 July 2019 (has links) L'analyse de fiabilité fait partie intégrante de la conception et du fonctionnement du système, en particulier pour les systèmes exécutant des applications critiques. Des travaux récents ont montré l'intérêt d'utiliser les réseaux bayésiens dans le domaine de la fiabilité, pour modéliser la dégradation d'un système. Les modèles graphiques de durée sont un cas particulier des réseaux bayésiens, qui permettent de s'affranchir de la propriété markovienne des réseaux bayésiens dynamiques. Ils s'adaptent aux systèmes dont le temps de séjour dans chaque état n'est pas nécessairement distribué exponentiellement, comme c'est le cas dans la plupart des applications industrielles. Des travaux antérieurs ont toutefois montré des limitations à ces modèles en termes de capacité de stockage et de temps de calcul, en raison du caractère discret de la variable temps de séjour. Une solution pourrait consister à considérer une variable de durée continue. Selon les avis d'experts, les variables de temps de séjour suivent une distribution de Weibull dans de nombreux systèmes. L'objectif de la thèse est d'intégrer des variables de temps de séjour suivant une distribution de Weibull dans un modèle de durée graphique en proposant une nouvelle approche. Après une présentation des réseaux bayésiens, et plus particulièrement des modèles graphiques de durée et leur limitation, ce rapport s'attache à présenter le nouveau modèle permettant la modélisation du processus de dégradation. Ce nouveau modèle est appelé modèle graphique de durée hybride Weibull. Un algorithme original permettant l'inférence dans un tel réseau a été mis en place. L'étape suivante a été la validation de l'approche. 
Ne disposant pas de données, il a été nécessaire de simuler des séquences d'états du système. Différentes bases de données ainsi construites ont permis d'apprendre d'une part un modèle graphique de durée, et d'autre part un modèle graphique de durée hybride-Weibull, afin de les comparer, que ce soit en termes de qualité d’apprentissage, de qualité d’inférence, de temps de calcul, et de capacité de stockage. / Reliability analysis is an integral part of system design and operation, especially for systems running critical applications. Recent work has shown the value of using Bayesian networks in the field of reliability for modeling the degradation of a system. Graphical duration models are a specific case of Bayesian networks that make it possible to overcome the Markov property of dynamic Bayesian networks. They suit systems whose sojourn time in each state is not necessarily exponentially distributed, which is the case in most industrial applications. Previous work, however, has shown limitations of these models in terms of storage capacity and computing time, due to the discrete nature of the sojourn time variable. A solution might be to allow the sojourn time variable to be continuous. According to expert opinion, sojourn time variables follow a Weibull distribution in many systems. The goal of this thesis is to integrate sojourn time variables following a Weibull distribution into a graphical duration model by proposing a new approach. After a presentation of Bayesian networks, and more particularly of graphical duration models and their limitations, this report focuses on presenting the new model for the degradation process. This new model is called the Weibull hybrid graphical duration model. An original algorithm allowing inference in such a network has been implemented. The next step was the validation of the approach. As no data were available, it was necessary to simulate sequences of system states. 
The databases built in this way were used to learn, on the one hand, a graphical duration model and, on the other hand, a Weibull hybrid graphical duration model, in order to compare them in terms of learning quality, inference quality, computing time, and storage capacity. Modèle graphique probabiliste Inférence statistique Fiabilité Réseau bayésiens dynamiques Approche hybride Modèle stochastique Probabilistic Graphical Model Statistical inference Reliability Dynamic Bayesian Network Hybrid approach Stochastic modelisation
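The abstract of record 43 describes simulating sequences of system states whose sojourn times follow a Weibull distribution. The following is a minimal sketch of that idea only, not the thesis's model or inference algorithm: the function name `simulate_degradation`, the strictly sequential state order, and all shape and scale values are assumptions made for illustration.

```python
import random


def simulate_degradation(shapes, scales, seed=0):
    """Simulate one pass through successive degradation states.

    State i is held for a Weibull-distributed sojourn time with
    shape shapes[i] and scale scales[i] (hypothetical parameters).
    Returns a list of (state_index, sojourn_time) pairs.
    """
    rng = random.Random(seed)
    path = []
    for state, (shape, scale) in enumerate(zip(shapes, scales)):
        # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
        sojourn = rng.weibullvariate(scale, shape)
        path.append((state, sojourn))
    return path


# Three illustrative states; a shape of 1.0 would reduce to the exponential case.
path = simulate_degradation(shapes=[1.5, 2.0, 0.8], scales=[100.0, 50.0, 20.0])
total_time = sum(t for _, t in path)
```

Setting every shape parameter to 1 recovers exponentially distributed sojourn times, which is the memoryless case the abstract says graphical duration models are meant to generalise.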
44	Classification et inférence de réseaux pour les données RNA-seq / Clustering and network inference for RNA-seq data Gallopin, Mélina 09 December 2015 (has links) Cette thèse regroupe des contributions méthodologiques à l'analyse statistique des données issues des technologies de séquençage du transcriptome (RNA-seq). Les difficultés de modélisation des données de comptage RNA-seq sont liées à leur caractère discret et au faible nombre d'échantillons disponibles, limité par le coût financier du séquençage. Une première partie de travaux de cette thèse porte sur la classification à l'aide de modèle de mélange. L'objectif de la classification est la détection de modules de gènes co-exprimés. Un choix naturel de modélisation des données RNA-seq est un modèle de mélange de lois de Poisson. Mais des transformations simples des données permettent de se ramener à un modèle de mélange de lois gaussiennes. Nous proposons de comparer, pour chaque jeu de données RNA-seq, les différentes modélisations à l'aide d'un critère objectif permettant de sélectionner la modélisation la plus adaptée aux données. Par ailleurs, nous présentons un critère de sélection de modèle prenant en compte des informations biologiques externes sur les gènes. Ce critère facilite l'obtention de classes biologiquement interprétables. Il n'est pas spécifique aux données RNA-seq. Il est utile à toute analyse de co-expression à l'aide de modèles de mélange visant à enrichir les bases de données d'annotations fonctionnelles des gènes. Une seconde partie de travaux de cette thèse porte sur l'inférence de réseau à l'aide d'un modèle graphique. L'objectif de l'inférence de réseau est la détection des relations de dépendance entre les niveaux d'expression des gènes. Nous proposons un modèle d'inférence de réseau basé sur des lois de Poisson, prenant en compte le caractère discret et la grande variabilité inter-échantillons des données RNA-seq. 
Cependant, les méthodes d'inférence de réseau nécessitent un nombre d'échantillons élevé. Dans le cadre du modèle graphique gaussien, modèle concurrent au précédent, nous présentons une approche non-asymptotique pour sélectionner des sous-ensembles de gènes pertinents, en décomposant la matrice variance en blocs diagonaux. Cette méthode n'est pas spécifique aux données RNA-seq et permet de réduire la dimension de tout problème d'inférence de réseau basé sur le modèle graphique gaussien. / This thesis gathers methodological contributions to the statistical analysis of next-generation high-throughput transcriptome sequencing data (RNA-seq). RNA-seq data are discrete and the number of samples sequenced is usually small due to the cost of the technology. These two points are the main statistical challenges for modelling RNA-seq data. The first part of the thesis is dedicated to the co-expression analysis of RNA-seq data using model-based clustering. A natural model for discrete RNA-seq data is a Poisson mixture model. However, a Gaussian mixture model in conjunction with a simple transformation applied to the data is a reasonable alternative. We propose to compare the two alternatives using a data-driven criterion to select the model that best fits each dataset. In addition, we present a model selection criterion that takes into account external gene annotations. This model selection criterion is not specific to RNA-seq data. It is useful in any co-expression analysis using model-based clustering designed to enrich functional annotation databases. The second part of the thesis is dedicated to network inference using graphical models. The aim of network inference is to detect relationships among genes based on their expression. We propose a network inference model based on a Poisson distribution, taking into account the discrete nature and high inter-sample variability of RNA-seq data. However, network inference methods require a large number of samples. 
For Gaussian graphical models, we propose a non-asymptotic approach to detect relevant subsets of genes based on a block-diagonal decomposition of the covariance matrix. This method is not specific to RNA-seq data and reduces the dimension of any network inference problem based on the Gaussian graphical model. Modèle de mélange Modèle graphique RNA-Seq data Classification Inférence de réseau Sélection de modèle Mixture model Graphical model selection RNA-Seq data Clustering Network inference Model selection
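The abstract of record 44 reduces the dimension of a network inference problem by splitting the covariance matrix into diagonal blocks. The sketch below illustrates only the general idea of finding such blocks, using plain thresholding followed by connected components. That is a swapped-in simplification, not the non-asymptotic selection procedure of the thesis, and the function name `covariance_blocks`, the threshold, and the toy matrix are assumptions.

```python
def covariance_blocks(cov, threshold):
    """Group variable indices into blocks of a (near) block-diagonal matrix.

    Two variables end up in the same block when they are linked by a chain
    of entries with |cov[i][j]| > threshold (connected components of the
    thresholded matrix, found by depth-first search).
    """
    p = len(cov)
    seen = [False] * p
    blocks = []
    for start in range(p):
        if seen[start]:
            continue
        seen[start] = True
        stack, block = [start], []
        while stack:
            i = stack.pop()
            block.append(i)
            for j in range(p):
                if not seen[j] and abs(cov[i][j]) > threshold:
                    seen[j] = True
                    stack.append(j)
        blocks.append(sorted(block))
    return blocks


# Toy 4x4 covariance: variables {0, 1} and {2, 3} are strongly linked,
# while the cross-block entries are close to zero.
cov = [
    [1.0, 0.6, 0.0, 0.02],
    [0.6, 1.0, 0.01, 0.0],
    [0.0, 0.01, 1.0, 0.5],
    [0.02, 0.0, 0.5, 1.0],
]
blocks = covariance_blocks(cov, threshold=0.1)
```

Once the blocks are known, a Gaussian graphical model can be fitted to each block separately, so each subproblem involves fewer variables than the full problem.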
45	Hybridation GPS/Vision monoculaire pour la navigation autonome d'un robot en milieu extérieur / Outdoor robotic navigation by GPS and monocular vision sensors fusion Codol, Jean-Marie 15 February 2012 (has links) On assiste aujourd'hui à l'importation des NTIC (Nouvelles Technologies de l'Information et de la Télécommunication) dans la robotique. L'union de ces technologies donnera naissance, dans les années à venir, à la robotique de service grand-public. Cet avenir, s'il se réalise, sera le fruit d'un travail de recherche, amont, dans de nombreux domaines : la mécatronique, les télécommunications, l'automatique, le traitement du signal et des images, l'intelligence artificielle ... Un des aspects particulièrement intéressants en robotique mobile est alors le problème de la localisation et de la cartographie simultanée. En effet, dans de nombreux cas, un robot mobile, pour accéder à une intelligence, doit nécessairement se localiser dans son environnement. La question est alors : quelle précision pouvons-nous espérer en terme de localisation? Et à quel coût? Dans ce contexte, un des objectifs de tous les laboratoires de recherche en robotique, objectif dont les résultats sont particulièrement attendus dans les milieux industriels, est un positionnement et une cartographie de l'environnement, qui soient à la fois précis, tous-lieux, intègre, bas-coût et temps-réel. Les capteurs de prédilection sont les capteurs peu onéreux tels qu'un GPS standard (de précision métrique), et un ensemble de capteurs embarquables en charge utile (comme les caméras-vidéo). Ce type de capteurs constituera donc notre support privilégié, dans notre travail de recherche. Dans cette thèse, nous aborderons le problème de la localisation d'un robot mobile, et nous choisirons de traiter notre problème par l'approche probabiliste. La démarche est la suivante, nous définissons nos 'variables d'intérêt' : un ensemble de variables aléatoires. 
Nous décrivons ensuite leurs lois de distribution, et leurs modèles d'évolution, enfin nous déterminons une fonction de coût, de manière à construire un observateur (une classe d'algorithme dont l'objectif est de déterminer le minimum de notre fonction de coût). Notre contribution consistera en l'utilisation de mesures GPS brutes (les mesures brutes - ou raw-datas - sont les mesures issues des boucles de corrélation de code et de phase, respectivement appelées mesures de pseudo-distances de code et de phase) pour une navigation bas-coût précise en milieu extérieur suburbain. En utilisant la propriété dite 'entière' des ambiguïtés de phase GPS, nous étendrons notre navigation pour réaliser un système GPS-RTK (Real Time Kinematic) en mode différentiel local précise et bas-coût. Nos propositions sont validées par des expérimentations réalisées sur notre démonstrateur robotique. / We are now witnessing the adoption of ICT (Information and Communications Technology) in robotics. In the coming years, the union of these technologies will give rise to consumer service robotics. This future, if it comes about, will be the result of upstream research in many fields: mechatronics, telecommunications, control engineering, signal and image processing, artificial intelligence ... One particularly interesting aspect of mobile robotics is the simultaneous localisation and mapping problem. Indeed, in many cases a mobile robot must localise itself in its environment in order to behave intelligently. The question is then: what precision can we expect in terms of localisation? And at what cost? In this context, one of the objectives of all robotics research laboratories, whose results are eagerly awaited by industry, is positioning and mapping of the environment that are at once precise, available everywhere, reliable, low-cost and real-time. 
The preferred sensors are inexpensive ones, such as a standard GPS receiver (with metre-level precision) and a set of sensors that can be carried as payload (e.g. video cameras). Sensors of this type are therefore the main basis of our research work. In this thesis, we address the localisation problem of a mobile robot, which we choose to handle with a probabilistic approach. The procedure is as follows: we first define our "variables of interest", a set of random variables; we then describe their distributions and their evolution models; finally, we determine a cost function so as to build an observer (a class of algorithms whose objective is to find the minimum of our cost function). Our contribution consists of using raw GPS measurements (raw measurements, or raw data, are the measurements produced by the code and phase correlation loops, called code and phase pseudo-range measurements, respectively) for precise low-cost navigation in a suburban outdoor environment. By exploiting the so-called "integer" property of GPS phase ambiguities, we extend our navigation to build a precise and low-cost GPS-RTK (Real-Time Kinematic) system in local differential mode. Our proposals have been validated through experiments carried out on our robotic demonstrator. Robotique Localisation Optimisation GPS RTK Bundle adjustment Smoothing and mapping Graphe SLAM Navigation Robotic Navigation Optimisation GPS RTK Bundle Adjustment Smoothing and mapping Graphical model Localisation SLAM 621.382
46	Objective Bayesian Analysis of Kullback-Liebler Divergence of two Multivariate Normal Distributions with Common Covariance Matrix and Star-shape Gaussian Graphical Model Li, Zhonggai 22 July 2008 (has links) This dissertation consists of four independent but related parts, each in its own chapter. The first part is introductory: it provides background and preparation for the later parts. The second part discusses two multivariate normal populations with a common covariance matrix. The goal of this part is to derive objective/non-informative priors for the parameterizations and to use these priors to build constructive random posteriors of the Kullback-Leibler (KL) divergence of the two multivariate normal populations, which is proportional to the distance between the two means, weighted by the common precision matrix. We use the Cholesky decomposition for re-parameterization of the precision matrix. The KL divergence is a true distance measurement for divergence between the two multivariate normal populations with common covariance matrix. Frequentist properties of the Bayesian procedure using these objective priors are studied through analytical and numerical tools. The third part considers the star-shape Gaussian graphical model, which is a special case of undirected Gaussian graphical models. It is a multivariate normal distribution in which the variables are grouped into one "global" variable set and several "local" variable sets. When conditioned on the global variable set, the local variable sets are independent of each other. We adopt the Cholesky decomposition for re-parameterization of the precision matrix and derive Jeffreys' prior, the reference prior, and invariant priors for the new parameterizations. The frequentist properties of the Bayesian procedure using these objective priors are also studied. 
The last part concentrates on objective Bayesian analysis for the partial correlation coefficient and its application to multivariate Gaussian models. / Ph. D. Multivariate Normal Distributions Monte Carlo Star-shape Gaussian Graphical Model Objective Priors Jeffreys' Priors Reference Priors Invariant Haar Prior Fisher Information Matrix Frequentist Matching Kullback-Liebler Divergence
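For reference, the quantity described in the abstract of record 46 can be written out explicitly. This is the standard closed form for two p-variate normal distributions sharing a covariance matrix. The symbols below (means μ₁ and μ₂, covariance Σ, precision Ω, Cholesky factor Ψ) are notation chosen here, and the factorisation convention Ω = ΨΨᵀ is an assumption that may differ from the one used in the dissertation.

```latex
\[
\mathrm{KL}\bigl(N_p(\mu_1,\Sigma)\,\|\,N_p(\mu_2,\Sigma)\bigr)
  = \tfrac{1}{2}\,(\mu_1-\mu_2)^{\top}\Sigma^{-1}(\mu_1-\mu_2)
  = \tfrac{1}{2}\,(\mu_1-\mu_2)^{\top}\Omega\,(\mu_1-\mu_2),
\qquad \Omega = \Sigma^{-1}.
\]
% With a Cholesky-type factorisation \Omega = \Psi\Psi^{\top} (assumed convention):
\[
\mathrm{KL} = \tfrac{1}{2}\,\bigl\lVert \Psi^{\top}(\mu_1-\mu_2) \bigr\rVert_2^{2}.
\]
```

The trace and log-determinant terms of the general Gaussian KL formula cancel when the covariances are equal, which is why the expression is symmetric in the two means.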
47	Statistical Methods for Genetic Pathway-Based Data Analysis Cheng, Lulu 13 November 2013 (has links) The wide application of genomic microarray technology has created a tremendous need for the development of high-dimensional genetic data analysis. Many statistical methods for microarray data analysis consider one gene at a time, but they may miss subtle changes at the single-gene level. This limitation may be overcome by considering a set of genes simultaneously, where the gene sets are derived from prior biological knowledge and are called "pathways". We have made contributions on two specific research topics related to high-dimensional genetic pathway data. One is to propose a semiparametric model for identifying pathways related to zero-inflated clinical outcomes; the other is to propose a multilevel Gaussian graphical model for exploring both pathway- and gene-level network structures. For the first problem, we develop a semiparametric model via a Bayesian hierarchical framework. We model the pathway effect nonparametrically in a zero-inflated Poisson hierarchical regression model with an unknown link function. The nonparametric pathway effect is estimated via the kernel machine, and the unknown link function is estimated by transforming a mixture of beta cumulative distribution functions. Our approach provides flexible semiparametric settings to describe the complicated association between gene microarray expressions and the clinical outcomes. The Metropolis-within-Gibbs sampling algorithm and the Bayes factor are used to make the statistical inferences. Our simulation results indicate that the semiparametric approach is more accurate and flexible than zero-inflated Poisson regression with the canonical link function, especially when the number of genes is large. The usefulness of our approach is demonstrated through its application to a canine gene expression data set (Enerson et al., 2006). 
Our approach can also be applied to other settings where a large number of highly correlated predictors are present. Unlike the first problem, the second takes into account that pathways are not independent of each other because of shared genes and interactions among pathways. Multi-pathway analysis has been a challenging problem because of the complex dependence structure among pathways. By considering the dependency among pathways as well as among genes within each pathway, we propose a multi-level Gaussian graphical model (MGGM): one level is for the pathway network and the second is for the gene network. We develop a multilevel L1-penalized likelihood approach to achieve sparseness on both levels. We also provide an iterative weighted graphical LASSO algorithm (Guo et al., 2011) for MGGM. Some asymptotic properties of the estimator are also illustrated. Our simulation results support the advantages of our approach: our method estimates the network more accurately on the pathway level and more sparsely on the gene level. We also demonstrate the usefulness of our approach using the canine genes-pathways data set. / Ph. D. Adaptive GLASSO Gaussian Random Process Gene Expression Data GLASSO Marginal Likelihood Multi-Level Gaussian Graphical Model Pathway-Based Analysis Unknown Link Estimation Zero Inflated Poisson.
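The first model in record 47 builds on the zero-inflated Poisson distribution. As a small illustration of that building block only (the thesis embeds it in a Bayesian hierarchical regression with a kernel-machine pathway effect and an unknown link, none of which is reproduced here), the sketch below evaluates the standard zero-inflated Poisson probability mass function. The names `zip_pmf`, `lam`, and `pi`, and the parameter values, are illustrative assumptions.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson distribution.

    With probability pi the outcome is a structural zero; otherwise it is
    drawn from Poisson(lam). Hence
        P(Y = 0) = pi + (1 - pi) * exp(-lam)
        P(Y = k) = (1 - pi) * exp(-lam) * lam**k / k!   for k >= 1
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return (pi if k == 0 else 0.0) + (1.0 - pi) * poisson


# Illustrative parameters: 30% structural zeros, Poisson mean 2.
probs = [zip_pmf(k, lam=2.0, pi=0.3) for k in range(60)]
mean = sum(k * p for k, p in enumerate(probs))
```

The extra mass at zero lowers the mean to (1 - pi) * lam, which is one reason an ordinary Poisson regression can fit such outcomes poorly.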
48	Semiparametric and Nonparametric Methods for Complex Data Kim, Byung-Jun 26 June 2020 (has links) Complex data of many kinds have become common in research fields such as epidemiology, genomics, and analytical chemistry with the development of science, technology, and study designs over the past few decades. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary disease outcomes and a covariate measured with error within a certain period by stratifying on subjects' conditions. In genomics, highly correlated and high-dimensional (HCHD) data are required to identify important genes and their interaction effects on diseases. In analytical chemistry, multiple time series data are generated to recognize complex patterns among multiple classes. Because of this diversity, this dissertation addresses three problems in analyzing such complex data. We contribute semiparametric and nonparametric methods for the following problems: the first is to propose a method for testing the significance of a functional association under the matched study; the second is to develop a method to simultaneously identify important variables and build a network in HCHD data; the third is to propose a multi-class dynamic model for recognizing a pattern in time-trend analysis. For the first topic, we propose a semiparametric omnibus test for the significance of a functional association between the clustered binary outcomes and covariates with measurement error, taking into account the effect modification of matching covariates. We develop a flexible omnibus test that does not require a specific form for the alternative hypothesis. The advantages of our omnibus test are demonstrated through simulation studies and 1-4 bidirectional matched data analyses from an epidemiology study. 
For the second topic, we propose a joint semiparametric kernel machine network approach to provide a connection between variable selection and network estimation. Our approach is a unified and integrated method that can simultaneously identify important variables and build a network among them. We develop our approach under a semiparametric kernel machine regression framework, which allows each variable to have a nonlinear effect and to interact with the others in a complicated way. We demonstrate our approach using simulation studies and a real application to genetic pathway analysis. Lastly, for the third project, we propose a Bayesian focal-area detection method for a multi-class dynamic model under a Bayesian hierarchical framework. Two-step Bayesian sequential procedures are developed to estimate patterns and detect focal intervals, which can be used for gas chromatography. We demonstrate the performance of our proposed method using a simulation study and a real application to gas chromatography on the Fast Odor Chromatographic Sniffer (FOX) system. / Doctor of Philosophy / Complex data of many kinds have become common in research fields such as epidemiology, genomics, and analytical chemistry with the development of science, technology, and study designs over the past few decades. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary disease outcomes and a covariate measured with error within a certain period by stratifying on subjects' conditions. In genomics, highly correlated and high-dimensional (HCHD) data are required to identify important genes and their interaction effects on diseases. In analytical chemistry, multiple time series data are generated to recognize complex patterns among multiple classes. 
Because of this diversity, we encounter three problems in analyzing the following three types of data: (1) matched case-crossover data, (2) HCHD data, and (3) time-series data. We contribute to the development of statistical methods to deal with such complex data. First, under the matched study, we discuss a hypothesis testing idea for effectively determining the association between observed factors and the risk of the disease of interest. Because, in practice, we do not know the specific form of the association, it can be challenging to set a specific alternative hypothesis. To reflect reality, we consider the possibility that some observations are measured with error. Taking these measurement errors into account, we develop a testing procedure under the matched case-crossover framework. This testing procedure has the flexibility to make inferences under various hypothesis settings. Second, we consider data where the number of variables is very large compared to the sample size and the variables are correlated with each other. In this case, our goal is to identify the variables that are important for the outcome among a large number of variables and to build their network. For example, identifying a few genes associated with diabetes among the whole genome can be used to develop biomarkers. With the approach proposed in the second project, we can identify differentially expressed and important genes and their network structure while taking the outcome into consideration. Lastly, we consider the scenario of patterns of interest that change over time, with application to gas chromatography. We propose an efficient detection method to distinguish the patterns of multi-level subjects in time-trend analysis. We suggest that our proposed method can give valuable information for an efficient search for distinguishable patterns, reducing the burden of examining all observations in the data. 
Bayesian Hierarchical Model; Fused Lasso; Gaussian graphical model; High-dimensional regression; Kernel machine learning based regression; Matched case-control study; Measurement error in covariates; Multivariate analysis; Semiparametric regression
49	[en] END-TO-END CONVOLUTIONAL NEURAL NETWORK COMBINED WITH CONDITIONAL RANDOM FIELDS FOR CROP MAPPING FROM MULTITEMPORAL SAR IMAGERY / [pt] TREINAMENTO PONTA A PONTA DE REDES NEURAIS CONVOLUCIONAIS COMBINADAS COM CAMPOS ALEATÓRIOS CONDICIONAIS PARA O MAPEAMENTO DE CULTURAS A PARTIR DE IMAGENS SAR MULTITEMPORAIS LAURA ELENA CUE LA ROSA 21 May 2024 (has links) [pt] Imagens de sensoriamento remoto permitem o monitoramento e mapeamento de culturas de maneira precisa, apoiando práticas de agricultura eficientes e sustentáveis com o objetivo de garantir a segurança alimentar. No entanto, a identificação do tipo de cultura a partir de dados de sensoriamento remoto em regiões tropicais ainda são consideradas tarefas com alto grau de dificuldade. As favoráveis condições climáticas permitem o uso, planejamento e o manejo da terra com maior flexibilidade, o que implica em culturas com dinâmicas mais complexas. Além disso, a presença constante de nuvens dificulta o uso de imagens ópticas, tornando as imagens de radar uma alternativa interessante para o mapeamento de culturas em regiões tropicais. Os modelos de campos aleatórios condicionais (CRFs) têm sido usados satisfatoriamente para explorar o contexto temporal e espacial na classificação de imagens de sensoriamento remoto. Estes modelos oferecem uma alta precisão na classificação, no entanto, dependem de atributos extraídos manualmente com base em conhecimento especializado do domínio. Neste contexto, os métodos de aprendizado profundo, tais como as redes neurais convolucionais (CNNs), provaram ser uma alternativa robusta para a classificação de imagens de sensoriamento, pois podem aprender atributos ótimos diretamente dos dados. Este trabalho apresenta um modelo híbrido baseado em aprendizado profundo e CRF para o reconhecimento de culturas em áreas de regiões tropicais caracterizadas por ter uma dinâmica espaço–temporal complexa. 
O framework proposto consiste em dois módulos: uma CNN que modela o contexto espacial e temporal dos dados de entrada, e o CRF que modela a dinâmica temporal considerando a dependência entre rótulos para datas adjacentes. Estas dependências podem ser aprendidas ou desenhadas por um especialista nas práticas de agricultura local. Comparações entre diferentes variantes de como modelar as transições temporais são apresentadas usando sequências de imagens SAR de duas municipalidades no Brasil. Os experimentos mostraram melhorias significativas atingindo até 30 por cento no F1 score por classe e até 12 por cento no F1 score médio em relação ao modelo de base que não inclui dependências temporais durante o processo de aprendizagem. / [en] Remote sensing imagery enables accurate crop mapping and monitoring, supporting efficient and sustainable agricultural practices to ensure food security. However, accurate crop type identification and crop area estimation from remote sensing data in tropical regions are still challenging tasks. Compared to the characteristic conditions of temperate regions, the more favorable weather conditions in tropical regions permit higher flexibility in land use, planning, and management, which implies complex crop dynamics. Moreover, frequent cloud cover prevents the use of optical data during large periods of the year, making SAR data an attractive alternative for crop mapping in tropical regions. To exploit both spatial and temporal context, conditional random field (CRF) models have been used successfully in the classification of remote sensing imagery. These approaches deliver high accuracies; however, they rely on manually engineered features designed using domain-specific knowledge. In this context, deep learning methods such as convolutional neural networks (CNNs) have proved to be a robust alternative for remote sensing image classification, as they can learn optimal features and classification parameters directly from raw data. 
This work introduces a novel end-to-end hybrid model based on deep learning and conditional random fields for crop recognition in areas characterized by complex spatio-temporal dynamics typical of tropical regions. The proposed framework consists of two modules: a CNN that models spatial and temporal contexts from the input data and a CRF that models temporal dynamics considering label dependencies between adjacent epochs. These dependencies can be learned or designed by an expert in local agricultural practices. Comparisons between data-driven and prior-knowledge temporal constraints are presented for two municipalities in Brazil, using multi-temporal SAR image sequences. The experiments showed significant improvements of up to 30 percent in per-class F1 score and up to 12 percent in average F1 score against a baseline model that does not include temporal dependencies during the learning process. [pt] SENSORIAMENTO REMOTO [pt] MODELO GRAFICO PROBABILISTICO [pt] SENTINEL-1 [pt] APRENDIZADO PROFUNDO [pt] RECONHECIMENTO DE CULTURAS [en] REMOTE SENSING [en] PROBABILISTIC GRAPHICAL MODEL [en] SENTINEL-1 [en] DEEP LEARNING [en] CROP RECOGNITION
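The role of temporal label dependencies described in the abstract above can be illustrated with a minimal decoding sketch. It assumes a linear-chain model over acquisition dates, per-date class probabilities standing in for CNN outputs, and an expert-designed transition matrix. The function name `viterbi`, the three classes, and all numbers are hypothetical and are not taken from the thesis, which trains both modules end to end rather than decoding fixed scores.

```python
import math


def viterbi(posteriors, transition):
    """Return the most probable label sequence.

    posteriors: one list of class probabilities per date (hypothetical
                stand-in for CNN outputs).
    transition: transition[i][j] is the probability of moving from label i
                to label j between adjacent dates; 0 forbids the move.
    """
    n_classes = len(transition)

    def log(p):
        # Work in log space; a zero probability becomes -inf (forbidden).
        return math.log(p) if p > 0 else float("-inf")

    score = [log(p) for p in posteriors[0]]
    back = []
    for probs in posteriors[1:]:
        new_score, pointers = [], []
        for j in range(n_classes):
            best_i = max(range(n_classes),
                         key=lambda i: score[i] + log(transition[i][j]))
            new_score.append(score[best_i] + log(transition[best_i][j]) + log(probs[j]))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final label.
    label = max(range(n_classes), key=lambda j: score[j])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return path[::-1]


# Hypothetical classes: 0 = bare soil, 1 = crop A, 2 = crop B.
# The expert rule encoded here: crop A never turns directly into crop B.
transition = [
    [0.4, 0.3, 0.3],
    [0.3, 0.7, 0.0],
    [0.3, 0.0, 0.7],
]
posteriors = [
    [0.10, 0.80, 0.10],
    [0.20, 0.35, 0.45],  # noisy date: the per-date argmax says crop B
    [0.10, 0.80, 0.10],
]

independent = [max(range(3), key=lambda k: p[k]) for p in posteriors]
decoded = viterbi(posteriors, transition)
print(independent, decoded)
```

Classifying each date on its own yields the sequence crop A, crop B, crop A, which the transition rule forbids. Decoding with the transition matrix keeps crop A throughout, which is the kind of correction a temporal constraint is meant to provide.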
50	A statistical modeling framework for analyzing tree-indexed data : application to plant development on microscopic and macroscopic scales / Un cadre de modélisation statistique pour l'analyse de données indexées par des arborescences Fernique, Pierre 10 December 2014 (has links) Nous nous intéressons à des modèles statistiques pour les données indexées par des arborescences. Dans le contexte de l'équipe Virtual Plants, équipe hôte de cette thèse, les applications d'intérêt portent sur le développement de la plante et sa modulation par des facteurs environnementaux et génétiques. Nous nous restreignons donc à des applications issues du développement de la plante, à la fois au niveau microscopique avec l'étude de la lignée cellulaire du tissu biologique servant à la croissance des plantes, et au niveau macroscopique avec le mécanisme de production de branches. Le catalogue de modèles disponibles pour les données indexées par des arborescences est beaucoup moins important que celui disponible pour les données indexées par des chemins. Cette thèse vise donc à proposer un cadre de modélisation statistique pour l'étude de patterns pour données indexées par des arborescences. À cette fin, deux classes différentes de modèles statistiques, les modèles de Markov et de détection de ruptures, sont étudiées. / We address statistical models for tree-indexed data. Tree-indexed data can be seen as a generalization of path-indexed data since directed path graphs are directed tree graphs where there is at most one child per vertex. In the context of the Virtual Plants team, the host team of this thesis, applications of interest focus on plant development and its modulation by environmental and genetic factors. We thus focus on plant developmental applications, both at the microscopic level with the study of the cell lineage in the biological tissue responsible for plant growth, and at the macroscopic level with the mechanism of production of branches. 
The catalog of models available for tree-indexed data is far smaller than the one available for path-indexed data. This thesis therefore aims at proposing a statistical modeling framework for studying patterns in tree-indexed data. To this end, two different classes of statistical models, Markov and change-point models, are investigated. Données indexées par des arborescences; Modèle graphique; Modèle de markov; Modèle de détection de ruptures; Architecture des plantes; Lignage cellulaire; Tree-Indexed data; Graphical model; Markov model; Change-Point model; Plant architecture; Cell lineage
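The abstract's remark that path-indexed data are a special case of tree-indexed data can be made concrete with a small sketch. It assumes a rooted tree stored as a mapping from each vertex to its list of children; the function name `is_path_indexed` and the example vertices are hypothetical and do not come from the thesis.

```python
def is_path_indexed(children):
    """True if every vertex of the rooted tree has at most one child.

    children: dict mapping each vertex to the list of its child vertices
              (assumed to describe a valid rooted tree).
    """
    return all(len(kids) <= 1 for kids in children.values())


# A directed path a -> b -> c: each vertex has at most one child.
path = {"a": ["b"], "b": ["c"], "c": []}

# A branching tree: vertex "a" has two children, so it is not a path.
tree = {"a": ["b", "c"], "b": [], "c": ["d"], "d": []}

print(is_path_indexed(path), is_path_indexed(tree))
```

Any model defined for the branching case therefore also applies to sequences, while the converse does not hold.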

Search results