Global ETD Search

1	Understanding responses to external stimuli using network-based approaches / Vers une meilleur compréhension des réponses cellulaires aux stimuli externes en utilisant des approches informatiques dit réseaux Gwinner, Konrad Frederik 15 May 2014 In the course of my Ph.D. work, I have developed and applied methods making use of network information to advance the analysis of high-throughput biological data. My thesis comprises three projects: (1) The identification of additional proteins in differential proteomics using protein interaction networks. In this study, we developed a novel computational approach based on protein-protein interaction networks to identify a list of proteins that might have remained undetected in differential proteomic profiling experiments. (2) The transcriptional regulatory networks underlying responses to environmental stresses. Based on publicly available data measuring the response of A. thaliana to a set of abiotic stresses in a time-resolved manner, we applied two complementary approaches to derive gene regulatory networks underlying the plant's response to the perceived stresses. (3) The analysis of transcriptional host immune response signatures specific for distinct stages of infection by Shigella flexneri. During their host invasion process, Shigella localize to different subcellular niches.
Network inference
2	Inferring Gene Regulatory Networks from Expression Data using Ensemble Methods Slawek, Janusz 01 May 2014 High-throughput technologies for measuring gene expression have made the inference of genome-wide Gene Regulatory Networks an active field of research. Reverse-engineering of systems of transcriptional regulation has become an important challenge in molecular and computational biology. Because such systems model dependencies between genes, they are important for understanding cell behavior, and can potentially turn observed expression data into new biological knowledge and practical applications. In this dissertation we introduce a set of algorithms that infer networks of transcriptional regulation from a variety of expression profiles with superior accuracy compared to state-of-the-art techniques. The proposed methods make use of ensembles of trees, which have become popular in many scientific fields, including genetics and bioinformatics, although they were originally motivated from the perspective of classification, regression, and feature selection theory. In this study we exploit their relative variable importance measure as an indication of the presence or absence of a regulatory interaction between genes. We further analyze their predictions on a set of universally recognized benchmark expression data sets, and achieve favorable results in comparison with state-of-the-art algorithms. Bioinformatics Gene Regulatory Networks Network Inference Ensemble Learning Boosting Engineering
3	Information-Theoretic Variable Selection and Network Inference from Microarray Data Meyer, Patrick E 16 December 2008 Statisticians are used to modeling interactions between variables on the basis of observed data. In many emerging fields, such as bioinformatics, they are confronted with datasets having thousands of variables, a lot of noise, non-linear dependencies and only tens of samples. The detection of functional relationships, when such uncertainty is contained in data, constitutes a major challenge. Our work focuses on variable selection and network inference from datasets having many variables and few samples (high variable-to-sample ratio), such as microarray data. Variable selection is a topic of machine learning whose objective is to select, among a set of input variables, those that lead to the best predictive model. The application of variable selection methods to gene expression data makes it possible, for example, to improve cancer diagnosis and prognosis by identifying a new molecular signature of the disease. Network inference consists of representing the dependencies between the variables of a dataset by a graph. Hence, when applied to microarray data, network inference can reverse-engineer the transcriptional regulatory network of a cell with a view to discovering new drug targets to cure diseases. In this work, two original tools are proposed: MASSIVE (Matrix of Average Sub-Subset Information for Variable Elimination), a new method of feature selection, and MRNET (Minimum Redundancy NETwork), a new algorithm of network inference. Both tools rely on the computation of mutual information, an information-theoretic measure of dependency. More precisely, MASSIVE and MRNET use approximations of the mutual information between a subset of variables and a target variable based on combinations of the mutual information between sub-subsets of variables and the target.
The approximations used make it possible to estimate a series of low-variate densities instead of one large multivariate density. Low-variate densities are well suited to datasets with a high variable-to-sample ratio, since they are rather cheap in terms of computational cost and do not require a large number of samples in order to be estimated accurately. Numerous experimental results show the competitiveness of these new approaches. Finally, our thesis has led to freely available source code for MASSIVE and an open-source R and Bioconductor package for network inference. microarray analysis information theory variable selection network inference
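As a rough illustration of the dependency measure that both tools in the abstract above rely on, the following sketch computes a plug-in estimate of the mutual information between two discrete variables. It is not MASSIVE or MRNET, and it assumes the expression levels have already been discretized; the function name `mutual_information` and the sample values are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y), in bits, from paired discrete samples.

    Assumption for illustration: xs and ys are equal-length sequences of
    already-discretized values (e.g. binned expression levels).
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)            # marginal counts of X
    count_y = Counter(ys)            # marginal counts of Y
    count_xy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    # I(X;Y) = sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


# Two perfectly dependent binary variables share one bit of information;
# two empirically independent ones share none.
dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

With only tens of samples, as in the microarray setting described above, a plug-in estimate like this one is noisy, which is one reason to prefer combinations of low-variate estimates over a single large multivariate one.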
4	Machine learning approach to reconstructing signalling pathways and interaction networks in biology Dondelinger, Frank January 2013 In this doctoral thesis, I present my research into applying machine learning techniques for reconstructing species interaction networks in ecology, reconstructing molecular signalling pathways and gene regulatory networks in systems biology, and inferring parameters in ordinary differential equation (ODE) models of signalling pathways. Together, the methods I have developed for these applications demonstrate the usefulness of machine learning for reconstructing networks and inferring network parameters from data. The thesis consists of three parts. The first part is a detailed comparison of applying static Bayesian networks, relevance vector machines, and linear regression with L1 regularisation (LASSO) to the problem of reconstructing species interaction networks from species absence/presence data in ecology (Faisal et al., 2010). I describe how I generated data from a stochastic population model to test the different methods and how the simulation study led us to introduce spatial autocorrelation as an important covariate. I also show how we used the results of the simulation study to apply the methods to presence/absence data of bird species from the European Bird Atlas. The second part of the thesis describes a time-varying, non-homogeneous dynamic Bayesian network model for reconstructing signalling pathways and gene regulatory networks, based on Lèbre et al. (2010). I show how my work has extended this model to incorporate different types of hierarchical Bayesian information sharing priors and different coupling strategies among nodes in the network. The introduction of these priors reduces the inference uncertainty by putting a penalty on the number of structure changes among network segments separated by inferred changepoints (Dondelinger et al., 2010; Husmeier et al., 2010; Dondelinger et al., 2012b).
Using both synthetic and real data, I demonstrate that using information sharing priors leads to a better reconstruction accuracy of the underlying gene regulatory networks, and I compare the different priors and coupling strategies. I show the results of applying the model to gene expression datasets from Drosophila melanogaster and Arabidopsis thaliana, as well as to a synthetic biology gene expression dataset from Saccharomyces cerevisiae. In each case, the underlying network is time-varying; for Drosophila melanogaster, as a consequence of measuring gene expression during different developmental stages; for Arabidopsis thaliana, as a consequence of measuring gene expression for circadian clock genes under different conditions; and for the synthetic biology dataset, as a consequence of changing the growth environment. I show that in addition to inferring sensible network structures, the model also successfully predicts the locations of changepoints. The third and final part of this thesis is concerned with parameter inference in ODE models of biological systems. This problem is of interest to systems biology researchers, as kinetic reaction parameters can often not be measured, or can only be estimated imprecisely from experimental data. Due to the cost of numerically solving the ODE system after each parameter adaptation, this is a computationally challenging problem. Gradient matching techniques circumvent this problem by directly fitting the derivatives of the ODE to the slope of an interpolant. I present an inference procedure for a model using nonparametric Bayesian statistics with Gaussian processes, based on Calderhead et al. (2008). I show that the new inference procedure improves on the original formulation in Calderhead et al. (2008) and I present the result of applying it to ODE models of predator-prey interactions, a circadian clock gene, a signal transduction pathway, and the JAK/STAT pathway. 006.3
5	Learning COVID-19 network from literature databases using core decomposition Guo, Yang 22 July 2021 The SARS-CoV-2 coronavirus is responsible for millions of deaths around the world. To contribute to the understanding of SARS-CoV-2 and human protein interactions and to generate new hypotheses relevant to them, we make use of the information-rich Biomine probabilistic database and extend the experimentally identified SARS-CoV-2-human protein-protein interaction (PPI) network in silico. We generate an extended network by integrating information from the Biomine database and the PPI network. To generate novel hypotheses, we focus on the high-connectivity sub-communities in the extended network that overlap most with the PPI network. We therefore propose a new data analysis pipeline that can efficiently compute core decomposition on the extended network and identify dense subgraphs. We then evaluate the identified dense subgraph and the generated hypotheses in three contexts: literature validation for uncovered virus-targeting genes and proteins, gene function enrichment analysis on subgraphs, and literature support on drug repurposing for identified tissues and diseases related to COVID-19. Most of the generated hypotheses are proteins and their encoding genes, which we rank by their connections to known PPI network nodes. In addition, we compile a comprehensive list of novel genes and proteins potentially related to COVID-19, as well as novel diseases which might be comorbidities. Together with the generated hypotheses, our results provide novel knowledge relevant to COVID-19 for further validation. / Graduate COVID-19 Core Decomposition Network Inference Data Mining Graph Theory
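The core decomposition named in the abstract above can be sketched, for an ordinary (deterministic) undirected graph, with the classic peeling procedure below. This is not the thesis pipeline: it ignores the edge probabilities of a probabilistic database such as Biomine, favours clarity over efficiency, and uses a hypothetical function name `core_numbers` and made-up node labels.

```python
def core_numbers(adjacency):
    """Return {node: core number} for an undirected graph.

    Assumption for illustration: `adjacency` maps every node to the set of
    its neighbours, and the relation is symmetric. The peeling procedure
    repeatedly removes a node of minimum remaining degree; a node's core
    number is the largest minimum degree seen up to its removal.
    """
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in adjacency[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1  # the removed node no longer counts
    return core


# A triangle (a, b, c) with one pendant node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cores = core_numbers(graph)
# Nodes with the highest core number form the densest "core" of the graph.
densest = {node for node, k in cores.items() if k == max(cores.values())}
```

Selecting the nodes with the highest core number is one simple way to obtain a dense subgraph of the kind the abstract refers to; the linear scan for the minimum-degree node makes this version quadratic, whereas bucket-based implementations run in time linear in the number of edges.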
6	A Machine Learning Approach to Predict Gene Regulatory Networks in Seed Development in Arabidopsis Using Time Series Gene Expression Data Ni, Ying 08 July 2016 Gene regulatory networks (GRNs) provide a natural representation of relationships between regulators and target genes. Though inferring GRNs is a challenging task, many methods, including unsupervised and supervised approaches, have been developed in the literature. However, most of these methods target non-context-specific GRNs. Because regulatory relationships are consistently reprogrammed across different tissues or biological processes, non-context-specific GRNs may not fit some specific conditions. In addition, a detailed investigation of the prediction results has remained elusive. In this study, I propose to use a machine learning approach to predict GRNs that occur in developmental stage-specific networks and to show how it improves our understanding of the GRN in seed development. I developed a Beacon GRN inference tool to predict a GRN in seed development in Arabidopsis based on a support vector machine (SVM) local model. Using the time series gene expression levels in seed development and prior known regulatory relationships, I evaluated and predicted the GRN for this specific biological process. The prediction results show that one gene may be controlled by multiple regulators. The targets that are strongly positively correlated with their regulators are mostly expressed at the beginning of seed development. The direct targets were detected when I found a match between the promoter regions of the targets and the regulator's binding sequence. Our prediction provides novel testable hypotheses of a GRN in seed development in Arabidopsis, and the Beacon GRN inference tool provides a valuable model system for context-specific GRN inference. / Master of Science Network inference signal transduction pathways gene expression support vector machines
7	Towards Machine Learning Inference in the Data Plane Langlet, Jonatan January 2019 Recently, machine learning has been considered an important tool for various networking-related use cases such as intrusion detection and flow classification. Traditionally, machine-learning-based classification algorithms run on dedicated machines that are outside of the fast path, e.g. on Deep Packet Inspection boxes. This imposes additional latency in order to detect threats or classify the flows. With the recent advance of programmable data planes, implementing advanced functionality directly in the fast path is now a possibility. In this thesis, we propose to implement Artificial Neural Network inference together with flow metadata extraction directly in the data plane of P4 programmable switches, routers, or Network Interface Cards (NICs). We design a P4 pipeline and optimize the memory and computational operations for our data plane target, a programmable NIC with Micro-C external support. The results show that neural networks of a reasonable size (i.e. 3 hidden layers with 30 neurons each) can process flows totaling over a million packets per second, while the packet latency impact from extracting a total of 46 features is 1.85 μs. Computer Sciences
8	Network inference using independence criteria Verbyla, Petras January 2018 Biological systems are driven by complex regulatory processes. Graphical models play a crucial role in the analysis and reconstruction of such processes. It is possible to derive regulatory models using network inference algorithms from high-throughput data, for example from gene or protein expression data. A wide variety of network inference algorithms have been designed and implemented. Our aim is to explore the possibilities of using statistical independence criteria for biological network inference. The contributions of our work can be categorized into four sections. First, we provide a detailed overview of some of the most popular general independence criteria: distance covariance (dCov), kernel canonical correlation (KCC), kernel generalized variance (KGV) and the Hilbert-Schmidt Independence Criterion (HSIC). We provide easy-to-understand geometrical interpretations for these criteria. We also explicitly show the equivalence of dCov, KGV and HSIC. Second, we introduce a new criterion for measuring dependence based on the signal-to-noise ratio (SNRIC). SNRIC is significantly faster to compute than other popular independence criteria. SNRIC is an approximate criterion but becomes exact under many popular modelling assumptions, for example for data from an additive noise model. Third, we compare the performance of the independence criteria on biological experimental data within the framework of the PC algorithm. Since not all criteria are available in a version that allows for testing conditional independence, we propose and test an approach which relies on residuals and requires only an unconditional version of an independence criterion. Finally, we propose a novel method to infer networks with feedback loops. We use an MCMC sampler, which samples using a loss function based on an independence criterion.
This allows us to find networks under very general assumptions, such as non-linear relationships, non-Gaussian noise distributions and feedback loops.
9	Novel methods for biological network inference : an application to circadian Ca2+ signaling network Jin, Junyang January 2018 Biological processes involve complex biochemical interactions among a large number of species such as cells, RNA, proteins and metabolites. Learning these interactions is essential to interfering artificially with biological processes in order to, for example, improve crop yield, develop new therapies, and predict new cell or organism behaviors in response to genetic or environmental perturbations. For a biological process, two pieces of information are of most interest. For a particular species, the first step is to learn which other species are regulating it. This reveals topology and causality. The second step involves learning the precise mechanisms of how this regulation occurs. This step reveals the dynamics of the system. Applying this process to all species leads to the complete dynamical network. Systems biology is making considerable efforts to learn biological networks at low experimental costs. The main goal of this thesis is to develop advanced methods to build models for biological networks, taking the circadian system of Arabidopsis thaliana as a case study. A variety of network inference approaches have been proposed in the literature to study dynamic biological networks. However, many successful methods either require prior knowledge of the system or focus more on topology. This thesis presents novel methods that identify both network topology and dynamics, and do not depend on prior knowledge. Hence, the proposed methods are applicable to general biological networks. These methods are initially developed for linear systems, and, at the cost of higher computational complexity, can also be applied to nonlinear systems.
Overall, we propose four methods with increasing computational complexity: the one-to-one method, combined group and element sparse Bayesian learning (GESBL), the kernel method, and the reversible jump Markov chain Monte Carlo (RJMCMC) method. All methods are tested with challenging dynamical network simulations (including feedback, random networks, different levels of noise and number of samples), and realistic models of the circadian system of Arabidopsis thaliana. These simulations show that, while the one-to-one method scales to the whole genome, the kernel method and the RJMCMC method are superior for smaller networks. They are robust to tuning variables and able to provide stable performance. The simulations also indicate the advantage of GESBL and RJMCMC over the state-of-the-art method. We envision that the estimated models can benefit a wide range of research. For example, they can locate biological compounds responsible for human disease through mathematical analysis and help predict the effectiveness of new treatments.
10	A Mathematical Modeling And Approximation Of Gene Expression Patterns By Linear And Quadratic Regulatory Relations And Analysis Of Gene Networks Yilmaz, Fatma Bilge 01 September 2004 This thesis mainly concerns modeling, approximation and inference of gene regulatory dynamics on the basis of gene expression patterns. The dynamical behavior of gene expressions is represented by a system of ordinary differential equations. We introduce a gene-interaction matrix with some nonlinear entries, in particular, quadratic polynomials of the expression levels, to keep the system solvable. The model parameters are determined by using optimization. Then, we provide the time-discrete approximation of our time-continuous model. We analyze the approximating model under the aspect of stability. Finally, from the considered models we derive gene regulatory networks, discuss the qualitative features of the networks and provide a basis for analyzing networks with nonlinear connections. QA General 15707
