11 |
Inférence de réseaux de régulation de gènes à partir de données dynamiques multi-échelles / Gene regulatory network inference from dynamic multi-scale data
Bonnaffoux, Arnaud 12 October 2018
Inferring gene regulatory networks (GRNs) from expression data is a major challenge in biology. The arrival of single-cell transcriptomic measurement technologies raised high hopes, but paradoxically these technologies reveal a new complexity in the GRN inference problem that still limits existing approaches. We began by showing, from single-cell expression data acquired on an avian model of erythroid differentiation, that GRNs are stochastic systems at the single-cell scale and that this stochasticity evolves dynamically during the differentiation process (Richard et al., PLOS Comp. Biol., 2016). We therefore went on to develop a mechanistic GRN model that includes this stochasticity, in order to make the most of the information in single-cell experimental data (Herbach et al., BMC Sys. Biol., 2017). This model describes the interactions between genes as a coupling of piecewise-deterministic Markov processes. In the stationary regime, an explicit formula for the joint distribution is derived from the model and can be used to infer simple networks. To exploit dynamic information and to integrate other experimental data (proteomics, RNA half-lives), I developed from the previous model an iterative, integrative and parallel approach, named WASABI, which is based on the concept of expression waves (Bonnaffoux et al., in revision, 2018). This original approach was validated on in silico GRN models, then on our in vitro data. The inferred GRNs display a network structure that is original with respect to the literature, with a central role for the stimulus and a highly distributed and limited topology. The results show that WASABI overcomes some limitations of existing approaches and will certainly be useful in helping biologists analyze and integrate their data. / Inference of gene regulatory networks from gene expression data has been a long-standing and notoriously difficult task in systems biology. Recently, single-cell transcriptomic data have been massively used for gene regulatory network inference, with both successes and limitations. In the present work we propose an iterative algorithm called WASABI, dedicated to inferring a causal dynamical network from time-stamped single-cell data, which tackles some of the limitations associated with current approaches. We first introduce the concept of waves, which posits that the information provided by an external stimulus will affect genes one by one through a cascade, like waves spreading through a network. This concept allows us to infer the network one gene at a time, after genes have been ordered by their time of regulation. We then demonstrate the ability of WASABI to correctly infer small networks, which have been simulated in silico using a mechanistic model consisting of coupled piecewise-deterministic Markov processes for the proper description of gene expression at the single-cell level. We finally apply WASABI to data generated in vitro on an avian model of erythroid differentiation. The structure of the resulting gene regulatory network sheds a fascinating new light on the molecular mechanisms controlling this process. In particular, we find no evidence for hub genes and a much more distributed network structure than expected. Interestingly, we find that a majority of genes are under the direct control of the differentiation-inducing stimulus. Together, these results demonstrate WASABI's versatility and its ability to tackle some general gene regulatory network inference issues.
It is our hope that WASABI will prove useful in helping biologists to fully exploit the power of time-stamped single-cell data.
|
12 |
Inference of gene networks from time series expression data and application to type 1 Diabetes
Lopes, Miguel 04 September 2015
The inference of gene regulatory networks (GRNs) is of great importance to medical research, as it unravels the causal mechanisms responsible for phenotypes and identifies potential therapeutic targets. In type 1 diabetes, insulin-producing pancreatic beta cells are the target of an autoimmune attack leading to apoptosis (cell suicide). Although key genes and regulations have been identified, a precise characterization of the process leading to beta-cell apoptosis has not yet been achieved. The inference of relevant molecular pathways in type 1 diabetes is therefore a crucial research topic. GRN inference from gene expression data (obtained from microarrays and RNA-seq technology) is a causal inference problem which may be tackled with well-established statistical and machine learning concepts. In particular, the use of time series facilitates the identification of the causal direction in cause-effect gene pairs. However, inference from gene expression data is a very challenging problem due to the large number of existing genes (in humans, over twenty thousand) and the typically low number of samples in gene expression datasets. In this context, it is important to correctly assess the accuracy of network inference methods. This thesis contributes on three distinct aspects. The first is inference assessment using precision-recall curves, in particular the area under the curve (AUPRC). The typical approach to assessing AUPRC significance is Monte Carlo simulation, and a parametric alternative is proposed. It consists in deriving the mean and variance of the null AUPRC and then using these parameters to fit a beta distribution approximating the true distribution. The second contribution is an investigation of network inference from time series. Several state-of-the-art strategies are experimentally assessed and novel heuristics are proposed. One is a fast approximation of first-order Granger causality scores, suited for GRN inference in the large-variable case.
Another identifies co-regulated genes (i.e., genes regulated by the same genes). Both are experimentally validated using microarray and simulated time series. The third contribution of this thesis is in the context of type 1 diabetes and is a study of beta-cell gene expression after exposure to cytokines, emulating the mechanisms leading to apoptosis. Eight datasets of beta-cell gene expression were used to identify genes differentially expressed before and after 24 h, which were functionally characterized using bioinformatics tools. The two most differentially expressed genes previously unknown in the type 1 diabetes literature (RIPK2 and ELF3) were found to modulate cytokine-induced apoptosis. A regulatory network was then inferred using a dynamic adaptation of a state-of-the-art network inference method. Three out of four predicted regulations (involving RIPK2 and ELF3) were experimentally confirmed, providing a proof of concept for the adopted approach. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
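The AUPRC significance idea in this abstract can be illustrated with a minimal sketch. It is an assumption-laden toy, not the thesis's method: the thesis derives the null mean and variance analytically, whereas here they are estimated by shuffling, and the function names (`auprc`, `null_auprc_moments`, `beta_from_moments`) are hypothetical. The last function shows the moment-matching step that turns a mean and variance into beta distribution parameters.

```python
import random


def auprc(scores, labels):
    """Average precision: a step-wise area under the precision-recall curve."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, is_edge) in enumerate(ranked, start=1):
        if is_edge:
            hits += 1
            total += hits / rank  # precision at each recalled true edge
    return total / n_pos


def null_auprc_moments(labels, n_draws=2000, seed=0):
    """Empirical mean and variance of AUPRC under random rankings (assumption:
    stands in for the analytical derivation mentioned in the abstract)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    scores = list(range(len(shuffled), 0, -1))  # fixed descending scores
    values = []
    for _ in range(n_draws):
        rng.shuffle(shuffled)
        values.append(auprc(scores, shuffled))
    mean = sum(values) / n_draws
    var = sum((v - mean) ** 2 for v in values) / (n_draws - 1)
    return mean, var


def beta_from_moments(mean, var):
    """Method-of-moments Beta(a, b) parameters matching a given mean and variance."""
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common
```

With the fitted parameters, a p-value for an observed AUPRC would be the upper tail of the beta distribution, which requires an incomplete beta function and is left out of this sketch.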
|
13 |
Information, causality, and observability approaches to understand complex systems
Bianco-Martinez, Ezequiel Julian January 2015
The objective of this thesis is to propose fundamental concepts, analytical and numerical tools, and approaches to characterize, understand, and better observe complex systems. The scientific contribution of this thesis can be separated into three topics. In the first, we show how to theoretically estimate the Mutual Information Rate (MIR), the amount of mutual information transmitted per unit of time between two time series. We then show how a quantity derived from it can be successfully used to infer the network structure of a complex system. The proposed inference methodology proves robust to additive noise, different time-series lengths, and heterogeneous node dynamics and coupling strengths. It also outperforms inference methods based on mutual information (MI) for networks formed by nodes possessing different time scales. In the second topic, we perform a deep analysis of causality based on the space-time properties of the observed probabilistic space. We show the existence of special regions in the state space which indicate the variable ranges responsible for most of the information exchanged between two variables. We define a new causality measure named CaMI that exploits the following property: to detect whether there is a flow of information from X to Y, one only needs to check the positiveness of the MI between trajectories in X and Y, assuming, however, that the observational resolution in Y is larger than in X. Moreover, we show how causality can be assessed when we consider partitions with arbitrary but equal rectangular cells in the probabilistic space, which naturally facilitates the calculation of CaMI.
In the third topic, we develop a symbolic coefficient of observability that allows us to identify the reduced set of accessible variables needed to observe a complex system, such that the system can be fully reconstructed from the set of observed variables, regardless of its dimension. Using this symbolic coefficient, we explain how it is possible to compare different complex systems from the point of view of observability and how to construct systems of any dimensionality that can be fully observed through only one variable.
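The MI-based baseline that this abstract compares against can be sketched as a plug-in estimate on symbolized time series. This is only an illustration under stated assumptions: the series are assumed to be already discretized into symbols (for example by a partition of the state space), the function name `mutual_information` is hypothetical, and neither MIR nor CaMI is implemented here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two equally long symbol sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed symbol pairs.
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )
```

In an MI-based inference scheme, a pair of nodes would typically be linked when this value exceeds a chosen threshold; the threshold itself is a modelling choice not specified in the abstract.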
|
14 |
Derivation and Use of Gene Network Models to Make Quantitative Predictions of Genetic Interaction Data
Phenix, Hilary January 2017
This thesis investigates how pairwise combinatorial gene and stimulus perturbation experiments are conducted and interpreted. In particular, I investigate gene perturbation in the form of knockout, which can be achieved in a pairwise manner by SGA or CRISPR/Cas9 methods. In the present literature, I distinguish two approaches to interpretation: the calculation of stimulus and gene interactions, and the identification of equality among phenotypes measured for distinct perturbation conditions. I describe how each approach has been applied to derive hypotheses about gene regulatory networks. I identify conflicts and uncertainties in the assumptions allowing these derivations, and explore, both theoretically and experimentally, approaches to improve the interpretation of genetic interaction data. I apply the approaches to a well-studied gene regulatory branch of the DNA damage checkpoint (DDC) pathway of Saccharomyces cerevisiae, and confirm the known order of genes within this pathway. I also describe observations that seem inconsistent with this pathway structure. I explore this inconsistency experimentally and discover that high concentrations of the DNA alkylating drug methyl methanesulfonate cause a cell division arrest program that is distinct from a G1 or G2/M checkpoint or from DNA damage adaptation and that resembles an endocycle.
|
15 |
Graph algorithms : network inference and planar graph optimization / Algorithmes des graphes : inférence des réseaux et optimisation dans les graphes planaires
Zhou, Hang 06 July 2015
This thesis addresses two topics in graph algorithms. The first topic is network inference. What is the complexity of determining an unknown graph from shortest-path queries between its vertices? We assume that the graph has bounded degree. In the reconstruction problem, the goal is to reconstruct the graph, whereas in the verification problem, the goal is to verify that a given graph is correct. We develop randomized algorithms using a decomposition into Voronoi cells. We then analyze greedy-type algorithms and show that they are near-optimal. We also study these problems on particular families of graphs, prove lower bounds, and study approximate reconstruction. The second topic is the study of two optimization problems on planar graphs. In the correlation clustering problem, the input is a weighted graph in which every edge has a label ⟨+⟩ or ⟨−⟩, indicating whether or not its endpoints are in the same category. The goal is to find a partition of the vertices into categories that respects the labels as well as possible. In the 2-edge-connected augmentation problem, the input is a weighted graph and a subset R of the edges. The goal is to find a minimum-weight subset S of the edges such that, for every edge of R, its endpoints lie in a 2-edge-connected component of the union of R and S. For planar graphs, we reduce the first problem to the second and show that both problems, although NP-hard, admit a polynomial-time approximation scheme. We use the recent technique of brick decomposition. / This thesis focuses on two topics in graph algorithms. The first topic is network inference. How efficiently can we find an unknown graph using shortest-path queries between its vertices? We assume that the graph has bounded degree.
In the reconstruction problem, the goal is to find the graph; in the verification problem, the goal is to check whether a given graph is correct. We provide randomized algorithms based on a Voronoi cell decomposition. Next, we analyze greedy algorithms and show that they are near-optimal. We also study the problems on special graph classes, prove lower bounds, and study approximate reconstruction. The second topic is optimization in planar graphs. We study two problems. In the correlation clustering problem, the input is a weighted graph in which every edge has a label of ⟨+⟩ or ⟨−⟩, indicating whether its endpoints are in the same category or in different categories. The goal is to find a partition of the vertices into categories that respects the labels as well as possible. In the two-edge-connected augmentation problem, the input is a weighted graph and a subset R of edges. The goal is to produce a minimum-weight subset S of edges such that, for every edge in R, its endpoints are two-edge-connected in the union of R and S. For planar graphs, we reduce correlation clustering to two-edge-connected augmentation, and show that both problems, although NP-hard, have a polynomial-time approximation scheme. We build on the recently developed brick decomposition technique.
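To make the query model concrete, here is a minimal sketch of the naive baseline for reconstruction and verification with a shortest-path (distance) oracle: two vertices are adjacent exactly when their distance is 1, so querying every pair recovers the graph with a quadratic number of queries. This is not one of the thesis's algorithms (which aim to use far fewer queries); the helper names and the BFS-backed oracle are assumptions for illustration.

```python
from collections import deque
from itertools import combinations


def make_distance_oracle(adjacency):
    """Wrap a hidden graph so that callers can only ask for shortest-path distances."""
    def distance(source, target):
        seen = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if node == target:
                return seen[node]
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen[neighbor] = seen[node] + 1
                    queue.append(neighbor)
        return float("inf")  # target is unreachable from source
    return distance


def reconstruct_naive(vertices, distance):
    """Recover every edge with one query per vertex pair: {u, v} is an edge iff d(u, v) == 1."""
    return {
        frozenset((u, v))
        for u, v in combinations(vertices, 2)
        if distance(u, v) == 1
    }


def verify_naive(vertices, candidate_edges, distance):
    """Check a proposed edge set against the hidden graph, again using all pairs."""
    return reconstruct_naive(vertices, distance) == {frozenset(e) for e in candidate_edges}
```

The baseline is useful mainly as a reference point: any smarter strategy has to beat its all-pairs query count.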
|
16 |
Parameter optimization of linear ordinary differential equations with application in gene regulatory network inference problems / Parameteroptimering av linjära ordinära differentialekvationer med tillämpningar inom inferensproblem i regulatoriska gennätverk
Deng, Yue January 2014
In this thesis we analyze parameter optimization problems governed by linear ordinary differential equations (ODEs) and develop computationally efficient numerical methods for their solution. In addition, a series of noise-robust finite difference formulas are given for the estimation of the derivatives in the ODEs. The suggested methods have been employed to identify Gene Regulatory Networks (GRNs). GRNs are responsible for the expression of thousands of genes in any given developmental process. Network inference deals with deciphering the complex interplay of genes in order to characterize the cellular state directly from experimental data. Even though a plethora of methods using diverse conceptual ideas has been developed, reliable network reconstruction remains challenging. This is due to several reasons, including the huge number of possible topologies, the high level of noise, and the complexity of gene regulation at different levels. A promising approach is dynamic modeling using differential equations. In this thesis we present such an approach to infer quantitative dynamic models from biological data, which addresses inherent weaknesses in the current state-of-the-art methods for data-driven reconstruction of GRNs. The method is computationally cheap, so the size of the network (model complexity) is no longer a main concern with respect to computational cost; the remaining limitation comes from the data, since the number of possible topologies is huge. Therefore we embed a filtration step into the method to reduce the number of free parameters before simulating dynamical behavior. The latter is used to produce more information about the network’s structure. We evaluate our method on simulated data, and study its performance with respect to data set size and levels of noise on a 1565-gene E. coli gene regulatory network. We show the computation time over various network sizes and estimate the order of computational complexity.
Results on five networks in the benchmark collection DREAM4 Challenge are also presented and show that our method outperforms the current state-of-the-art methods on synthetic data and allows the reconstruction of biophysically accurate dynamic models from noisy data. / In this thesis we analyze parameter optimization problems described by ordinary differential equations (ODEs) and develop computationally efficient numerical methods for computing their solution. In addition, we derive noise-robust finite-difference approximations for estimating the derivatives in the ODE. The proposed methods have been applied to gene regulatory networks (GRNs). GRNs are responsible for the expression of thousands of genes. Network inference is about identifying the complicated interaction between genes in order to characterize the state of cells directly from experimental data. Reliable network reconstruction is a challenging problem, even though many methods using many different kinds of conceptual ideas have been developed. This is due to several factors, including the enormous number of topologies, high noise, and the complexity of gene regulation at different levels. A promising approach is dynamic modeling from biological data, which addresses an underlying weakness in the currently leading method for data-driven reconstruction. The method is computationally cheap, so the size of the network is no longer the main problem for the computation; the limitation lies instead in the data. The challenge is an enormous number of topologies. We therefore build a filtering step into the method to reduce the number of free parameters and then simulate the dynamic behavior. The purpose is to produce more information about the network's structure.
We evaluate the method on simulated data, and study its performance with respect to data size and noise level by applying it to a 1565-gene E. coli gene regulatory network. We illustrate the computation time over different network sizes and estimate the computational complexity. Results on five networks from DREAM4 are also presented and show that our method performs better than current methods when applied to synthetic data and allows the reconstruction of biophysically accurate dynamic models from noisy data.
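The general recipe described in this abstract (estimate derivatives by finite differences, then fit the parameters of a linear ODE) can be sketched on a scalar toy model dx/dt = a·x. The abstract does not give the thesis's noise-robust formulas, so a plain second-order central difference is used here as a stand-in, and the function names are hypothetical. A network model would replace the scalar `a` with an interaction matrix and solve one least-squares problem per gene.

```python
def central_differences(values, step):
    """Second-order central difference estimates at the interior sample points."""
    return [
        (values[i + 1] - values[i - 1]) / (2 * step)
        for i in range(1, len(values) - 1)
    ]


def fit_linear_rate(values, step):
    """Least-squares estimate of a in dx/dt = a * x from equally spaced samples.

    Minimizes sum_i (dx_i - a * x_i)^2, whose closed-form solution is
    a = sum(x_i * dx_i) / sum(x_i^2).
    """
    derivatives = central_differences(values, step)
    interior = values[1:-1]  # samples aligned with the derivative estimates
    numerator = sum(x * d for x, d in zip(interior, derivatives))
    denominator = sum(x * x for x in interior)
    return numerator / denominator
```

Because the fit is linear in the unknown parameter, no iterative optimizer or repeated ODE simulation is needed, which is what keeps this kind of approach cheap as the network grows.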
|
17 |
Evaluation of network inference algorithms and their effects on network analysis for the study of small metabolomic data sets
Greenyer, Haley 24 May 2022
Motivation: Alzheimer’s Disease (AD) is a highly prevalent neurodegenerative disease which causes gradual cognitive decline. As documented in the literature, evidence has recently mounted for the role of metabolic dysfunction in AD. Metabolomic data has therefore been increasingly used in AD studies. Metabolomic disease studies often suffer from small sample sizes and inflated false discovery rates. It is therefore of great importance to identify algorithms best suited for the inference of metabolic networks from small cohort disease studies. For future benchmarking, and for the development of new metabolic network inference methods, it is similarly important to identify appropriate performance measures for small sample sizes.
Results: The performances of 13 different network inference algorithms, including correlation-based, regression-based, information theoretic, and hybrid methods, were assessed through benchmarking and structural network analyses. Benchmarking was performed on simulated data with known structures across six sample sizes using three different summative performance measures: area under the Receiver Operating Characteristic Curve, area under the Precision Recall Curve, and Matthews Correlation Coefficient. Structural analyses (commonly applied in disease studies), including betweenness, closeness, and eigenvector centrality, were applied to simulated data. Differential network analysis was additionally applied to experimental AD data. Based on the performance measure benchmarking and network analysis results, I identified Probabilistic Context Likelihood Relatedness of Correlation with Biweight Midcorrelation (PCLRCb) (a novel variation of the PCLRC algorithm) to be best suited for the prediction of metabolic networks from small-cohort disease studies. Additionally, I identified Matthews Correlation Coefficient as the best measure with which to evaluate the performance of metabolic network inference methods across small sample sizes. / Graduate
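For readers unfamiliar with the measure singled out above, a minimal sketch of the Matthews Correlation Coefficient for edge prediction follows. It assumes an inferred network is compared with a known structure over a fixed list of candidate node pairs; the function names and the convention of returning 0.0 when the denominator vanishes are choices made here, not details taken from the thesis.

```python
import math


def confusion_counts(true_edges, predicted_edges, all_pairs):
    """Count true/false positives and negatives over every candidate node pair."""
    tp = fp = fn = tn = 0
    for pair in all_pairs:
        actual, predicted = pair in true_edges, pair in predicted_edges
        if actual and predicted:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denominator
```

Unlike accuracy, the coefficient uses all four cells of the confusion table, which matters for sparse networks where true negatives dominate.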
|
18 |
Network inference from sparse single-cell transcriptomics data: Exploring, exploiting, and evaluating the single-cell toolbox
Steinheuer, Lisa Maria 04 April 2022
Large-scale transcriptomics studies have revolutionised the fields of systems biology and medicine, making it possible to generate deeper mechanistic insights into biological pathways and molecular functions. However, conventional bulk RNA-sequencing results in the analysis of an averaged signal of many input cells, which are homogenised during the experimental procedure.
Hence, those insights represent only a coarse-grained picture, potentially missing information from rare or unidentified cell types. Allowing for an unprecedented level of resolution, single-cell transcriptomics may help to identify and characterise new cell types, unravel developmental trajectories, and facilitate inference of cell type-specific networks. Despite all these tempting promises, there is one main limitation that currently hampers many downstream tasks: single-cell RNA-sequencing data is characterised by a high degree of sparsity.
Due to this limitation, no reliable network inference tool has so far been able to disentangle the information hidden in single-cell data.
Single-cell correlation networks likely hold previously masked information and could allow new insights into cell type-specific networks to be inferred. To harness the potential of single-cell transcriptomics data, this dissertation sought to evaluate the influence of data dropout on network inference and how this might be alleviated. However, two premises must be met to fulfil the promise of cell type-specific networks: (I) cell type annotation and (II) reliable network inference. Since any experimentally generated scRNA-seq data is associated with an unknown degree of dropout, a benchmarking framework was set up using a synthetic gold-standard data set, which was subsequently subjected to different defined degrees of dropout. Aiming to desparsify the dropout-afflicted data, the influence of various imputation tools on the network structure was further evaluated. The results highlighted that for moderate dropout levels, a deep count autoencoder (DCA) was able to outperform the other tools and the unimputed data. To fulfil the premise of cell type annotation, the impact of data imputation on cell-cell correlations was investigated using a human retina organoid data set. The results highlighted that no imputation tool interfered with cell cluster annotation.
Based on the encouraging results of the benchmarking analysis, a window of opportunity was identified, which allowed for meaningful network inference from imputed single-cell RNA-seq data. Therefore, the inference of cell type-specific networks subsequent to DCA imputation was evaluated in a human retina organoid data set. To understand the differences and commonalities of cell type-specific networks, these were analysed for cones and rods, two closely related photoreceptor cell types of the retina. Comparing the importance of marker genes for rods and cones between their respective cell type-specific networks showed that these genes were of high importance, i.e. had hub-gene-like properties, in one module of the corresponding network but were of less importance in the opposing network. Furthermore, it was analysed how many hub genes in general preserved their status across cell type-specific networks and whether they associate with similar or diverging sub-networks. While a set of preserved hub genes was identified, a few were linked to completely different network structures. One candidate was EIF4EBP1, a eukaryotic translation initiation factor binding protein, which is associated with a retinal pathology called age-related macular degeneration (AMD). These results suggest that, given well-defined prerequisites, data imputation via DCA can indeed facilitate cell type-specific network inference, delivering promising biological insights.
For AMD, a major cause of the loss of central vision in patients older than 65, neither well-defined mechanisms of pathogenesis nor treatment options are at hand. However, light can be shed on this disease through the use of organoid model systems, since they resemble the in vivo organ composition while reducing its complexity and ethical concerns. Therefore, a recently developed human retina organoid system (HRO) was investigated using the single-cell toolbox to evaluate whether it provides a useful base to study the defined effects on the onset and progression of AMD in the future. In particular, different workflows for a robust and in-depth annotation of cell types were used, including literature-based and transfer learning approaches. These made it possible to state that the organoid system may reproduce hallmarks of a more central retina, which is an important determinant of AMD pathogenesis. Also, using trajectory analysis, it could be detected that the organoids in part reproduce major developmental hallmarks of the retina, but that different HRO samples exhibited developmental differences that point to different degrees of maturation. Altogether, this analysis made it possible to characterise a human retinal organoid system in depth, revealing in vivo-like outcomes and features as well as discrepancies. These results could be used to refine culture conditions during organoid differentiation to optimise its utility as a disease model.
In summary, this dissertation describes a workflow that, in contrast to the current state of the art in the literature, enables the inference of cell type-specific gene regulatory networks.
The thesis illustrates that such networks indeed differ even between closely related cells.
Thus, single-cell transcriptomics can yield unprecedented insights into cell regulatory principles that are not yet understood, particularly in rare cell types that are so far hardly reflected in bulk-derived RNA-seq data.
|
19 |
Complexity penalized methods for structured and unstructured data
Goeva, Aleksandrina 08 November 2017
A fundamental goal of statisticians is to make inferences from the sample about characteristics of the underlying population. This is an inverse problem, since we are trying to recover a feature of the input with the availability of observations on an output. Towards this end, we consider complexity penalized methods, because they balance goodness of fit and generalizability of the solution. The data from the underlying population may come in diverse formats, structured or unstructured, such as probability distributions, text tokens, or graph characteristics. Depending on the defining features of the problem we can choose the appropriate complexity penalized approach, and assess the quality of the estimate produced by it. Favorable characteristics are strong theoretical guarantees of closeness to the true value and interpretability. Our work fits within this framework and spans the areas of simulation optimization, text mining and network inference. The first problem we consider is model calibration under the assumption that given a hypothesized input model, we can use stochastic simulation to obtain its corresponding output observations. We formulate it as a stochastic program by maximizing the entropy of the input distribution subject to moment matching. We then propose an iterative scheme via simulation to approximately solve it. We prove convergence of the proposed algorithm under appropriate conditions and demonstrate the performance via numerical studies. The second problem we consider is summarizing text documents through an inferred set of topics. We propose a frequentist reformulation of a Bayesian regularization scheme. Through our complexity-penalized perspective we lend further insight into the nature of the loss function and the regularization achieved through the priors in the Bayesian formulation. The third problem is concerned with the impact of sampling on the degree distribution of a network.
Under many sampling designs, we have a linear inverse problem characterized by an ill-conditioned matrix. We investigate the theoretical properties of an approximate solution for the degree distribution found by regularizing the solution of the ill-conditioned least squares objective. Particularly, we study the rate at which the penalized solution tends to the true value as a function of network size and sampling rate.
|
20 |
Supervised Inference of Gene Regulatory Networks
Sen, Malabika Ashit 09 September 2021
A gene regulatory network (GRN) records the interactions among transcription factors and their target genes. GRNs are useful to study how transcription factors (TFs) control gene expression as cells transition between states during differentiation and development. Scientists usually construct GRNs by careful examination and study of the literature. This process is slow and painstaking and does not scale to large networks. In this thesis, we study the problem of inferring GRNs automatically from gene expression data. Recent data-driven approaches to infer GRNs increasingly rely on single-cell level RNA-sequencing (scRNA-seq) data. Most of these methods rely on unsupervised or association-based strategies, which cannot leverage known regulatory interactions by design. To facilitate supervised learning, we propose a novel graph convolutional neural network (GCN) based autoencoder to infer new regulatory edges from a known GRN and scRNA-seq data. As the name suggests, a GCN-based autoencoder consists of an encoder that learns a low-dimensional embedding of the nodes (genes) in the input graph (the GRN) through a series of graph convolution operations and a decoder that aims to reconstruct the original graph as accurately as possible. We investigate several GCN-based architectures to determine the ideal encoder-decoder combination for GRN reconstruction. We systematically study the performance of these and other supervised learning methods on different mouse and human scRNA-seq datasets for two types of evaluation. We demonstrate that our GCN-based approach substantially outperforms traditional machine learning approaches. / Master of Science / In multi-cellular living organisms, stem cells differentiate into multiple cell types. Proteins called transcription factors (TFs) control the activity of genes to effect these transitions. It is possible to represent these interactions abstractly using a gene regulatory network (GRN). In a GRN, each node is a TF or a gene and each edge connects a TF to a gene or TF that it controls. New high-throughput technologies that can measure gene expression (activity) in individual cells provide rich data that can be used to construct GRNs. In this thesis, we take advantage of recent advances in the field of machine learning to develop a new computational method for constructing GRNs. The distinguishing property of our technique is that it is supervised, i.e., it uses experimentally known interactions to infer new regulatory connections. We investigate several variations of this approach to reconstruct a GRN as close to the original network as possible. We analyze and provide a rationale for the decisions made in designing, evaluating, and choosing the characteristics of our predictor. We show that our predictor has a reconstruction accuracy that is superior to other supervised-learning approaches.
|