A complex systems approach to important biological problems.

Berryman, Matthew John January 2007
Complex systems are those which exhibit one or more of the following inter-related behaviours:
1. Nonlinear behaviour: the component parts do not act in linear ways, that is, the superposition of the actions of the parts is not the output of the system.
2. Emergent behaviour: the output of the system may be inexpressible in terms of the rules or equations of the component parts.
3. Self-organisation: order appears from the chaotic interactions of individuals and the rules they obey.
4. Layers of description: in which a rule may apply at some higher levels of description but not at lower layers.
5. Adaptation: in which the environment becomes encoded in the rules governing the structure and/or behaviour of the parts (in this case strictly agents) that undergo selection in which those that are by some measure better become more numerous than those that are not as “fit”.
A single cell is a complex system: we cannot explain all of its behaviour as simply the sum of its parts. Similarly, DNA structures, social networks, cancers, the brain, and living beings are intricate complex systems. This thesis tackles all of these topics from a complex systems approach. I have skirted some of the philosophical issues of complex systems and mainly focussed on appropriate tools to analyse these systems, addressing important questions such as:
• What is the best way to extract information from DNA?
• How can we model and analyse mutations in DNA?
• Can we determine the likely spread of both viruses and ideas in social networks?
• How can we model the growth of cancer?
• How can we model and analyse interactions between genes in such living systems as the fruit fly, cancers, and humans?
• Can complex systems techniques give us some insight into the human brain?
/ http://proxy.library.adelaide.edu.au/login?url= http://library.adelaide.edu.au/cgi-bin/Pwebrecon.cgi?BBID=1290759 / Thesis (Ph.D.)-- School of Electrical and Electronic Engineering, 2007

Robust inference of gene regulatory networks : System properties, variable selection, subnetworks, and design of experiments

Nordling, Torbjörn E. M. January 2013
In this thesis, inference of biological networks from in vivo data generated by perturbation experiments is considered, i.e. deduction of causal interactions that exist among the observed variables. Knowledge of such regulatory influences is essential in biology. A system property–interampatteness–is introduced that explains why the variation in existing gene expression data is concentrated in a few “characteristic modes” or “eigengenes”, and why previously inferred models have a large number of false positive and false negative links. An interampatte system is characterized by strong INTERactions enabling simultaneous AMPlification and ATTEnuation of different signals, and we show that perturbation of individual state variables, e.g. genes, typically leads to ill-conditioned data with both characteristic and weak modes. The weak modes are typically dominated by measurement noise due to poor excitation, and their existence hampers network reconstruction. The excitation problem is solved by iterative design of correlated multi-gene perturbation experiments that counteract the intrinsic signal attenuation of the system. The next perturbation should be designed such that the expected response practically spans an additional dimension of the state space. The proposed design is numerically demonstrated for the Snf1 signalling pathway in S. cerevisiae. The impact of unperturbed and unobserved latent state variables, which exist in any real biological system, on the inferred network and on the required set-up of the experiments for network inference is analysed. Their existence implies that, in general, a subnetwork of pseudo-direct causal regulatory influences, accounting for all environmental effects, is inferred. In principle, the number of latent states and different paths between the nodes of the network can be estimated, but their identity cannot be determined unless they are observed or perturbed directly.
Network inference is recognized as a variable/model selection problem and solved by considering all possible models of a specified class that can explain the data at a desired significance level, and by classifying only the links present in all of these models as existing. As shown, these links can be determined without any parameter estimation by reformulating the variable selection problem as a robust rank problem. Solution of the rank problem enables assignment of confidence to individual interactions, without resorting to any approximation or asymptotic results. This is demonstrated by reverse engineering of the synthetic IRMA gene regulatory network from published data. A previously unknown activation of transcription of SWI5 by CBF1 in the IRMA strain of S. cerevisiae is proven to exist, which serves to illustrate that even the accumulated knowledge of well studied genes is incomplete. / QC 20130508
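The conditioning argument in this abstract can be illustrated with a small numerical sketch. It assumes a linear two-gene model dx/dt = A x + p with steady state x = -A^{-1} p; the matrix A and its values are hypothetical and are not taken from the thesis. Perturbing each gene individually gives the response matrix X = -A^{-1}, and its singular values show one strong (characteristic) mode and one weak mode.

```python
import math


def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def singular_values_2x2(m):
    """Closed-form singular values (largest first) of a 2x2 matrix."""
    (a, b), (c, d) = m
    t = a * a + b * b + c * c + d * d   # squared Frobenius norm = s1^2 + s2^2
    det = a * d - b * c                 # |det| = s1 * s2
    root = math.sqrt(max(t * t - 4.0 * det * det, 0.0))
    return math.sqrt((t + root) / 2.0), math.sqrt(max((t - root) / 2.0, 0.0))


# Hypothetical interaction matrix with strong mutual activation (assumption).
A = [[-1.0, 0.9], [0.9, -1.0]]

# Single-gene perturbations p = e_1 and p = e_2 give the responses X = -A^{-1}.
X = [[-v for v in row] for row in inverse_2x2(A)]

s_max, s_min = singular_values_2x2(X)
condition_number = s_max / s_min  # large ratio = ill-conditioned data

# A correlated two-gene perturbation along the weak direction (1, -1) is
# strongly attenuated by the system, so it must be applied with a larger
# amplitude to be visible above measurement noise.
weak_response = [X[0][0] - X[0][1], X[1][0] - X[1][1]]
```

For this toy matrix the ratio between the strong and the weak mode is about 19, so a noise level that is harmless for the strong mode can dominate the weak one. The thesis treats the general case and the design of the perturbations; this sketch only shows where the ill-conditioning comes from.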

Efficient Partially Observable Markov Decision Process Based Formulation Of Gene Regulatory Network Control Problem

Erdogdu, Utku 01 April 2012
The need to analyze and closely study gene-related mechanisms has motivated research on the modeling and control of gene regulatory networks (GRNs). Different approaches exist to model GRNs; they are mostly simulated as mathematical models that represent relationships between genes. Though it turns into a more challenging problem, we argue that partial observability would be a more natural and realistic method for handling the control of GRNs. Partial observability is a fundamental aspect of the problem; it is mostly ignored and substituted by the assumption that the states of the GRN are known precisely, referred to as full observability. On the other hand, current works addressing partial observability focus on formulating algorithms for the finite horizon GRN control problem. So, in this work we explore the feasibility of realizing the problem in a partially observable setting, mainly with Partially Observable Markov Decision Processes (POMDP). We proposed a POMDP formulation for the infinite horizon version of the problem. Because POMDP problems suffer from the curse of dimensionality, we also proposed a POMDP solution method that automatically decomposes the problem by isolating different unrelated parts of the problem, and then solves the reduced subproblems. We also proposed a method to enrich the gene expression data sets given as input to the POMDP control task, because available data sets contain thousands of genes but only tens or rarely hundreds of samples. The method is based on the idea of generating more than one model using the available data sets, then sampling data from each of the models, and finally filtering the generated samples with the help of metrics that measure compatibility, diversity and coverage of the newly generated samples.
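What partial observability means in practice can be sketched with the standard POMDP belief update: the controller never sees the true network state, only a noisy measurement, and it maintains a probability distribution (belief) over states. The single-gene model below, its two actions, and all probabilities are hypothetical values chosen for illustration; they are not taken from the thesis.

```python
def belief_update(belief, action, observation, transition, observe):
    """Standard POMDP belief update.

    b'(s') is proportional to observe[a][s'][o] * sum_s transition[a][s][s'] * b(s).
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [
        sum(transition[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    # Correction step: weight by the likelihood of the observation.
    unnormalised = [observe[action][s2][observation] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalised]


# States: 0 = gene OFF, 1 = gene ON. Actions: 0 = no intervention, 1 = suppress.
# All numbers below are assumptions made for this sketch.
TRANSITION = [
    [[0.9, 0.1], [0.2, 0.8]],    # action 0
    [[0.95, 0.05], [0.7, 0.3]],  # action 1
]
# OBSERVE[a][s][o]: probability of reading o (0 = low, 1 = high) in state s.
OBSERVE = [
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.8, 0.2], [0.3, 0.7]],
]

# Start fully uncertain, apply the suppressing action, then read "low".
new_belief = belief_update([0.5, 0.5], 1, 0, TRANSITION, OBSERVE)
```

The belief vector has one entry per network state, so it grows exponentially with the number of genes. This is the curse of dimensionality mentioned in the abstract and the motivation for decomposing the problem into smaller subproblems.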

A computational approach to discovering p53 binding sites in the human genome

Lim, Ji-Hyun January 2013
The tumour suppressor p53 protein plays a central role in the DNA damage response/checkpoint pathways leading to DNA repair, cell cycle arrest, apoptosis and senescence. The activation of p53-mediated pathways is primarily facilitated by the binding of tetrameric p53 to two 'half-sites', each consisting of a decameric p53 response element (RE). Functional REs are directly adjacent or separated by a small number (1-13) of 'spacer' base pairs (bp). The p53 RE is detected by exact or inexact matches to the palindromic sequence represented by the regular expression [AG][AG][AG]C[AT][TA]G[TC][TC][TC] or by a position weight matrix (PWM). The use of matrix-based and regular expression pattern-matching techniques, however, leads to an overwhelming number of false positives. A more specific model, which combines multiple factors known to influence p53-dependent transcription, is required for accurate detection of the binding sites. In this thesis, we present a logistic-regression-based model which integrates sequence information and epigenetic information to predict human p53 binding sites. Sequence information includes the PWM score and the spacer length between the two half-sites of the observed binding site. To integrate epigenetic information, we analyzed the surrounding region of the binding site for the presence of mono- and trimethylation patterns of histone H3 lysine 4 (H3K4). Our model showed a high level of performance on both a high-resolution data set of functional p53 binding sites from the experimental literature (ChIP data) and the whole human genome. Comparing our model with a simpler sequence-only model, we demonstrated that the prediction accuracy of the sequence-only model could be improved by incorporating epigenetic information, such as the two histone modification marks H3K4me1 and H3K4me3.
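The exact-match part of the detection step can be sketched directly from the regular expression quoted in the abstract. The sketch below scans the forward strand only, requires exact matches, and reports for each start position the nearest second half-site within a spacer of 0 to 13 bp. The function names, the example sequence, and the logistic weights are hypothetical; the fitted coefficients of the thesis model are not given in the abstract.

```python
import math
import re

# Half-site pattern quoted in the abstract.
HALF_SITE = "[AG][AG][AG]C[AT][TA]G[TC][TC][TC]"
# Two half-sites that are adjacent or separated by up to 13 spacer bases.
# The lookahead lets matches that start at overlapping positions be reported.
FULL_SITE = re.compile(rf"(?=({HALF_SITE})([ACGT]{{0,13}}?)({HALF_SITE}))")


def find_candidate_sites(sequence):
    """Return (start, first half-site, spacer length, second half-site) tuples."""
    sequence = sequence.upper()
    return [
        (m.start(), m.group(1), len(m.group(2)), m.group(3))
        for m in FULL_SITE.finditer(sequence)
    ]


def binding_probability(pwm_score, spacer_length, h3k4me1, h3k4me3, weights, bias):
    """Logistic model over the four feature types named in the abstract.

    The weights and bias are placeholders (assumption), not fitted values.
    """
    z = (bias
         + weights[0] * pwm_score
         + weights[1] * spacer_length
         + weights[2] * h3k4me1
         + weights[3] * h3k4me3)
    return 1.0 / (1.0 + math.exp(-z))


# Made-up example sequence with two half-sites separated by a 3 bp spacer.
example = "TTTGGGCATGTCCACGAAACTAGCTTTT"
sites = find_candidate_sites(example)

# Illustrative scores for the same site without and with histone marks.
p_sequence_only = binding_probability(8.0, 3, 0, 0, (0.5, -0.1, 1.0, 1.5), -5.0)
p_with_marks = binding_probability(8.0, 3, 1, 1, (0.5, -0.1, 1.0, 1.5), -5.0)
```

Because the pattern is short and degenerate, exact matching on a whole genome returns very many hits, which is the false-positive problem the abstract describes. The logistic function shows only the form in which sequence and epigenetic features could be combined into a single score.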

Inferring Genetic Regulatory Networks Using Cost-based Abduction and Its Relation to Bayesian Inference

Andrews, Emad Abdel-Thalooth 16 July 2014
Inferring Genetic Regulatory Networks (GRN) from multiple data sources is a fundamental problem in computational biology. Computational models for GRN range from simple Boolean networks to stochastic differential equations. To successfully model GRN, a computational method has to be scalable and capable of integrating different biological data sources effectively and homogeneously. In this thesis, we introduce a novel method to model GRN using Cost-Based Abduction (CBA) and study the relation between CBA and Bayesian inference. CBA is an important AI formalism for reasoning under uncertainty that can integrate different biological data sources effectively. We use three different yeast genome data sources (protein-DNA, protein-protein, and knock-out data) to build a skeleton (unannotated) graph which acts as a theory to build a CBA system. The Least Cost Proof (LCP) for the CBA system fully annotates the skeleton graph to represent the learned GRN. Our results show that CBA is a promising tool in computational biology in general and in GRN modeling in particular because CBA knowledge representation can intrinsically implement the AND/OR logic in GRN while enforcing cis-regulatory logic constraints effectively, allowing the method to operate on a genome-wide scale. Besides allowing us to successfully learn yeast pathways such as the pheromone pathway, our method is scalable enough to analyze the full yeast genome in a single CBA instance, without sub-networking. The scalability of our method comes from the fact that our CBA model size grows quadratically, rather than exponentially, with respect to data size and path length. We also introduce a new algorithm to convert CBA into an equivalent binary linear program that computes the exact LCP for the CBA system, thus reaching the optimal solution.
Our work establishes a framework to solve Bayesian networks using integer linear programming and high order recurrent neural networks through CBA as an intermediate representation.
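The notion of a least cost proof can be sketched on a toy rule base. In cost-based abduction, each assumable hypothesis has a cost, rules derive new facts from known ones, and the least cost proof is the cheapest set of hypotheses from which the goal can be derived. The sketch below finds it by brute-force enumeration, which is exponential in the number of hypotheses and is a stand-in for, not a reproduction of, the binary linear program described in the abstract. All names, rules, and costs are hypothetical.

```python
from itertools import combinations


def provable(goal, assumed, rules):
    """Forward-chain over (head, body) rules; each body is an AND of atoms."""
    known = set(assumed)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and all(atom in known for atom in body):
                known.add(head)
                changed = True
    return goal in known


def least_cost_proof(goal, rules, costs):
    """Return (cost, hypotheses) of the cheapest assumption set proving goal."""
    best_cost, best_set = float("inf"), None
    hypotheses = sorted(costs)
    for size in range(len(hypotheses) + 1):
        for subset in combinations(hypotheses, size):
            cost = sum(costs[h] for h in subset)
            if cost < best_cost and provable(goal, subset, rules):
                best_cost, best_set = cost, set(subset)
    return best_cost, best_set


# Toy theory (assumption): several rules with the same head act as an OR,
# and several atoms in one body act as an AND.
RULES = [
    ("gene_expressed", ["tf_a_bound", "tf_b_bound"]),
    ("gene_expressed", ["tf_c_bound"]),
    ("tf_a_bound", ["tf_a_active"]),
]
COSTS = {"tf_a_active": 2, "tf_b_bound": 3, "tf_c_bound": 6}

best = least_cost_proof("gene_expressed", RULES, COSTS)
```

Here the AND branch (cost 2 + 3 = 5) is cheaper than the single-hypothesis OR branch (cost 6), so the least cost proof assumes tf_a_active and tf_b_bound. A binary linear program encodes the same choice with one 0/1 variable per hypothesis, which is what makes an exact solution feasible on larger instances.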

