  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Reconstructing regulatory networks from high-throughput post-genomic data using MCMC methods

Sharma, Sapna January 2013
Modern biological research aims to understand when genes are expressed and how certain genes influence the expression of other genes. Gene regulatory networks are used to organize and visualize gene expression activity. The architecture of these networks holds great importance, as they enable us to identify inconsistencies between hypotheses and observations, and to predict the behavior of biological processes in yet untested conditions. Data from gene expression measurements are used to construct gene regulatory networks. Along with the advance of high-throughput technologies for measuring gene expression, statistical methods to predict regulatory networks have also been evolving. This thesis presents a computational framework based on a Bayesian modeling technique using state space models (SSM) for the inference of gene regulatory networks from time-series measurements. A linear SSM consists of observation and hidden state equations. The hidden variables can unfold effects that cannot be directly measured in an experiment, such as missing gene expression. We have used a Bayesian MCMC approach based on Gibbs sampling for the inference of parameters. However, the task of determining the dimension of the hidden state space variables remains crucial for the accuracy of network inference. For this we have used the Bayesian evidence (or marginal likelihood) as a yardstick. In addition, the Bayesian approach also provides the possibility of incorporating prior information, based on literature knowledge. We compare marginal likelihoods calculated from the Gibbs sampler output to the lower bound calculated by a variational approximation. Before using the algorithm for the analysis of real biological experimental datasets, we perform validation tests using numerical experiments based on simulated time series datasets generated by in-silico networks.
The robustness of our algorithm can be measured by its ability to recapture the input data and the generating networks using the inferred parameters. Our developed algorithm, GBSSM, was used to infer a gene network using E. coli data sets from the different stress conditions of temperature shift and acid stress. The resulting model for the gene expression response under temperature shift captures the effects of global transcription factors, such as fnr, that control the regulation of hundreds of other genes. Interestingly, we also observe the stress-inducible membrane protein OsmC regulating transcriptional activity involved in the adaptation mechanism under both temperature shift and acid stress conditions. In the case of acid stress, integration of metabolomic and transcriptome data suggests that the observed rapid decrease in the concentration of glycine betaine is the result of the activation of osmoregulators which may play a key role in acid stress adaptation.
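For orientation, the two equations of a linear state space model mentioned in the abstract above are commonly written in the textbook form below. This is a generic sketch, not the notation of the thesis: the symbols x_t (hidden state), y_t (observed expression), A, C, Q and R are assumptions introduced here, and the thesis model may include further terms.

```latex
% Generic linear-Gaussian state space model (assumed notation)
\begin{align}
  x_t &= A\,x_{t-1} + w_t, & w_t &\sim \mathcal{N}(0, Q) && \text{(hidden state equation)} \\
  y_t &= C\,x_t + v_t,     & v_t &\sim \mathcal{N}(0, R) && \text{(observation equation)}
\end{align}
```

In this form, choosing the dimension of x_t is the model-selection problem that the abstract addresses with the marginal likelihood.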
2

Bayesian clustering of curves and the search of the partition space

Liverani, Silvia January 2009
This thesis is concerned with the study of a Bayesian clustering algorithm, proposed by Heard et al. (2006), used successfully for microarray experiments over time. It focuses not only on the development of new ways of setting hyperparameters so that inferences both reflect the scientific needs and contribute to the inferential stability of the search, but also on the design of new fast algorithms for the search over the partition space. First we use the explicit forms of the associated Bayes factors to demonstrate that such methods can be unstable under common settings of the associated hyperparameters. We then prove that the regions of instability can be removed by setting the hyperparameters in an unconventional way. Moreover, we demonstrate that MAP (maximum a posteriori) search is satisfied when a utility function is defined according to the scientific interest of the clusters. We then focus on the search over the partition space. In model-based clustering a comprehensive search for the highest scoring partition is usually impossible, due to the huge number of partitions of even a moderately sized dataset. We propose two methods for the partition search. One method encodes the clustering as a weighted MAX-SAT problem, while the other views clusterings as elements of the lattice of partitions. Finally, this thesis includes the full analysis of two microarray experiments for identifying circadian genes.
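To give a sense of why an exhaustive search over partitions is usually impossible, the number of partitions of a set of n items is the Bell number B(n). The short sketch below computes it with the Bell triangle; the function name `bell_number` is chosen here for illustration and does not come from the thesis.

```python
def bell_number(n):
    """Return the number of partitions of a set with n elements (Bell number)."""
    row = [1]  # first row of the Bell triangle; row[0] is B(0)
    for _ in range(n):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            # Each further entry adds the previous-row entry to the one before it.
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


if __name__ == "__main__":
    for size in (5, 10, 20, 50):
        print(size, bell_number(size))
```

Even a dataset of a few dozen curves already has far more partitions than can be scored one by one, which is the motivation for the heuristic search methods described in the abstract.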
3

Modelling and analysis of a genetic oscillator in E. coli

Janus, Ulrich January 2008
This thesis presents the modelling and analysis of an engineered genetic oscillator in E. coli. Genetic oscillators composed of transcriptional feedback loops are the central components of circadian clocks [16]. Thus understanding small genetic oscillators is key for understanding the complex regulatory networks of circadian clocks. In order to monitor clock function, a new colony based imaging assay was set up, based on luminescent transcriptional reporter constructs, that allows for automated data collection over long time spans and for the screening of clock mutants. Clock runs produced damped oscillatory behaviour after starting the clock by removal of the lac inducer IPTG or by giving a metabolic stimulus by transferring cells onto fresh agar plates. A detailed mathematical model of the clock was constructed, taking into account discrete and stochastic regulatory binding events at the promoter sites. From this model, using the theory of heterogeneous systems [69, 66], deterministic equations were derived and analysed to yield conditions for the occurrence of stable oscillations based on the system's nullclines. To facilitate the modelling, an algorithm was devised and implemented that allows for automated construction of Markov chain models of gene activity states based on DNA binding events. In sum, the work constitutes the establishment and analysis of an integrated experimental and modelling system, which opens possibilities for further investigation in order to yield insight into the properties of genetic oscillators.
4

Accelerated estimation and inference for heritability of fMRI data

Chen, Xu January 2014
In this thesis, we develop some novel methods for univariate and multivariate analyses of additive genetic factors including heritability and genetic correlation. For the univariate heritability analysis, we present 3 newly proposed estimation methods: Frequentist ReML, LR-SD and LR-SD ReML. A comparison of these novel methods with currently available approaches demonstrates that the non-iterative LR-SD method is extremely fast and free of any convergence issues. The properties of this LR-SD method motivate the use of the non-parametric permutation and bootstrapping inference approaches. The permutation framework also allows the utilization of spatial statistics, which we find increases the statistical sensitivity of the test. For the bivariate genetic analysis, we generalize the univariate LR-SD method to the bivariate case, where the integration of univariate and bivariate LR-SD provides a new estimation method for genetic correlation. Although simulation studies show that our measure of genetic correlation is not ideal, we propose a closely related test statistic based on the ERV, which we show to be a valid hypothesis test for zero genetic correlation. The rapid implementation of this ERV estimator makes it feasible to use with permutation as well. Finally, we consider a method for high-dimensional multivariate genetic analysis based on pair-wise correlations of different subject pairs. While traditional genetic analysis models the correlation over subjects to produce an estimate of heritability, this approach estimates correlation over a (high-dimensional) phenotype for pairs of subjects, and then estimates heritability based on the difference in MZ-pair and DZ-pair correlations. A significant two-sample t-test comparing MZ and DZ correlations implies the existence of heritable elements.
The resulting summary measure of aggregate heritability, defined as twice the difference of MZ and DZ mean correlations, can be treated as a quick screening estimate of whole-phenotype heritability that is closely related to the average of traditional heritability.
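The summary measure described above, twice the difference of the MZ and DZ mean correlations, can be sketched in a few lines. The function name `aggregate_heritability` and the example values are hypothetical and only illustrate the arithmetic; this is not the thesis implementation, and it omits the t-test and permutation inference.

```python
from statistics import fmean


def aggregate_heritability(mz_correlations, dz_correlations):
    """Twice the difference between the mean MZ-pair and mean DZ-pair correlations.

    Each input is a sequence holding one phenotype correlation per twin pair
    (assumed to have been computed beforehand).
    """
    return 2.0 * (fmean(mz_correlations) - fmean(dz_correlations))


if __name__ == "__main__":
    # Made-up correlations, for illustration only.
    mz = [0.8, 0.6]
    dz = [0.4, 0.2]
    print(aggregate_heritability(mz, dz))
```

With the made-up values above, the mean MZ correlation is 0.7 and the mean DZ correlation is 0.3, so the screening estimate is 2 × 0.4 = 0.8.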
5

New mathematical methods for the study of stem cell differentiation

Camacho Aguilar, Elena January 2018
The question of how the fertilized egg develops into an adult organism is one of the most fundamental ones in Biology. A very important stage in the development of the embryo is cell differentiation, in which unspecialised cells, called stem cells, become specialised ones, such as skin or nerve cells, depending on the signals that they receive. This is controlled by a very large network of genes that interact with each other, the state of which defines the characteristics of the cell. With the recent development of experimental techniques that allow us to obtain very detailed information about the changes in cells, new data analysis methods and mathematical models are required for the understanding of stem cell differentiation. A common approach to the mathematical modelling of stem cell differentiation is by means of gene regulatory network (GRN) models describing the gene regulation behind the process. However, the number of variables and parameters in these models rapidly scales up as one tries to study more genes in the network, which makes their analysis difficult. This thesis aims to assess these problems and it is structured into two main parts. In the first one, which comprises Chapters 3 and 4, we will develop a phenotypic quasi-potential landscape model for vulval development in C. elegans to illustrate how catastrophe theory can be a powerful tool to construct and understand these recently emerging types of models. Moreover, we will use advanced statistical techniques to fit the built model to the experimental data. The second part, in Chapter 5, will be devoted to developing a methodology to understand protein expression data in order to reverse engineer the gene regulatory network from it and create a mathematical model that explains such experimental data.
6

Bayesian inference of causal gene networks

Morrissey, Edward R. January 2012
Genes do not act alone; rather, they form part of large interacting networks with certain genes regulating the activity of others. The structure of these networks is of great importance as it can produce emergent behaviour, for instance, oscillations in the expression of network genes or robustness to fluctuations. While some networks have been studied in detail, most networks underpinning biological processes have not been fully characterised. Elucidating the structure of these networks is of paramount importance to understand these biological processes. With the advent of whole-genome gene expression measurement technology, a number of statistical methods have been put forward to predict the structure of gene networks from the individual gene measurements. This thesis focuses on the development of Bayesian statistical models for the inference of gene regulatory networks using time-series data. Most models used for network inference rely on the assumption that regulation is linear. This assumption is known to be incorrect and, when the interactions are highly non-linear, can affect the accuracy of the retrieved network. In order to address this problem we developed an inference model that allows for non-linear interactions and benchmarked the model against a linear interaction model. Next we addressed the problem of how to infer a network when replicate measurements are available. To analyse data with replicates we proposed two models that account for measurement error. The models were compared to the standard way of analysing replicate data, that is, calculating the mean/median of the data and treating it as a noise-free time-series. Following the development of the models we implemented GRENITS, an R/Bioconductor package that integrates the models into a single free package. The package is faster than the previous implementations and is also easier to use.
Finally GRENITS was used to fit a network to a whole-genome time-series for the bacterium Streptomyces coelicolor. The accuracy of a sub-network of the inferred network was assessed by comparing gene expression dynamics across datasets collected under different experimental conditions.
7

Dynamic DNA and human disease : mathematical modelling and statistical inference for myotonic dystrophy type 1 and Huntington disease

Higham, Catherine F. January 2013
Several human genetic diseases, including myotonic dystrophy type 1 (DM1) and Huntington disease (HD), are associated with inheriting an abnormally large unstable DNA simple sequence tandem repeat. These sequences mutate, by changing the number of repeats, many times during the lifetime of those affected, with a bias towards expansion. High repeat numbers are associated with early onset and disease severity. The presence of somatic instability compromises attempts to measure intergenerational repeat dynamics and infer genotype-phenotype relationships. Modelling the progression of repeat length throughout the lifetime of individuals has potential for improving prognostic information as well as providing a deeper understanding of the underlying biological process. Dr Fernando Morales, Dr Anneli Cooper and others from the Monckton lab have characterised more than 25,000 de novo somatic mutations from a large cohort of DM1 patients using single-molecule polymerase chain reaction (SM-PCR). This rich dataset enables us to fully quantify levels of somatic instability across a representative DM1 population for the first time. We establish the relationship between inherited or progenitor allele length, age at sampling and levels of somatic instability using linear regression analysis. We show that the estimated progenitor allele length genotype is significantly better than modal repeat length (the current clinical standard) at predicting age of onset and this novel genotype is the major modifier of the age of onset phenotype. Further we show that somatic variation (adjusted for estimated progenitor allele length and age at sampling) is also a modifier of the age of onset phenotype. Several families form the large cohort, and we find that the level of somatic instability is highly heritable, implying a role for individual-specific trans-acting genetic modifiers. 
We develop new mathematical models, the main focus of this thesis, by modifying a previously proposed stochastic birth process to incorporate possible contraction. A Bayesian likelihood approach is used as the basis for inference and parameter estimation. We use model comparison analysis to reveal, for the first time, that the expansion bias observed in the distributions of repeat lengths is likely to be the cumulative effect of many expansion and contraction events. We predict that mutation events can occur as frequently as every other day, which matches the timing of regular cell activities such as DNA repair and transcription, but not DNA replication. Mutation rates estimated under the models described above are lower than expected among individuals with inherited repeat lengths less than 100 CTGs, suggesting that these rates may be suppressed at the lower end of the disease causing range. We propose that a length-specific effect may be operating within this range and test this hypothesis by introducing such an effect into the model. To calibrate this extended model, we use blood DNA data from DM1 individuals with small alleles (inherited repeat lengths less than 100 CTGs) and buccal DNA from HD individuals who almost always have inherited repeat lengths less than 100 CAGs. These datasets comprise single DNA molecules sized using SM-PCR. We find statistical support for a general length-specific effect which suppresses mutational rates among the smaller alleles and gives rise to a distinctive pattern in the repeat length distributions. In a novel application of this new model, fitted to a large cohort of DM1 individuals, we also show that this distinctive pattern may help identify individuals whose effective repeat length, with regards to somatic instability, is less than their actual repeat length. A plausible explanation for this distinction is that the expanded repeat tract is compromised by interruptions or other unusual features. 
For these individuals, we estimate the effective repeat length of their expanded repeat tracts and contribute to the on-going discussion about the effect of interruptions on phenotype. The interpretation of the levels of somatic instability in many of the affected tissues in the triplet repeat diseases is hindered by complex cell compositions. We extend our model to two cell populations whose repeat lengths have different rates of mutation (fast and slow). Swami et al. have recently characterised repeat length distributions in end stage HD brain. Applying our model, we infer for each frontal cortex HD dataset the likely relative weight of these cell populations and their corresponding contribution towards somatic variation. By comparison with data from laser captured single cells we conclude that the neuronal repeat lengths most likely mutate at a higher rate than glial repeat lengths, explaining the characteristic skewed distributions observed in mixed cell tissue from the brain. We confirm that individual-specific mutation rates in neurons are, in addition to the inherited repeat length, a modifier of age of onset. Our results support a model of disease progression where individuals with the same inherited repeat length may reach age of onset as much as 30 years earlier because of greater somatic expansions underpinned by higher mutational rates. Therapies aimed at reducing somatic expansions would therefore have considerable benefits with regard to extending the age of onset. Currently clinical diagnosis of DM1 is based on a measure of repeat length from blood cells, but variance in modal length only accounts for between 20 and 40% of the variance in age of onset and, therefore, is not an accurate predictive tool. We show that in principle progenitor allele length improves the inverse correlation with age of onset over the traditional modal length measure. We make use of second blood samples that are now available from 40 DM1 individuals.
We show that inherited repeat length and the mutation rates underlying repeat length instability in blood, inferred from samples at two time points rather than one, are better predictors of age of onset than the traditional modal length measure. Our results are a step towards providing better prognostic information for DM1 individuals and their families. They should also lead to better predictions for drug/therapy response, which is emerging as key to successful clinical trials. Microsatellites are another type of tandem repeat found in the genome with high levels of intergenerational and somatic mutation. Differences between individuals make microsatellites very useful biomarkers and they have many applications in forensics and medicine. As well as a general application to other expanded repeat diseases, the mathematical models developed here could be used to better understand instability at other mutational hotspots such as microsatellites.
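The idea of a stochastic birth process extended with contraction events can be illustrated with a small simulation. The sketch below is a generic linear birth-death process simulated with the Gillespie algorithm. It is not the model from the thesis: the function name `simulate_repeat_length`, the assumption that both rates scale with the current repeat length, and all parameter values are hypothetical.

```python
import random


def simulate_repeat_length(n0, expansion_rate, contraction_rate, t_end, rng=None):
    """Simulate repeat length under a linear birth-death process (Gillespie algorithm).

    ASSUMPTION: each repeat unit expands at `expansion_rate` and contracts at
    `contraction_rate` per unit time, so total event rates scale with length.
    """
    rng = rng or random.Random()
    n, t = n0, 0.0
    while n > 0:
        per_unit_rate = expansion_rate + contraction_rate
        total_rate = n * per_unit_rate
        if total_rate <= 0:
            break  # no events can occur
        t += rng.expovariate(total_rate)  # waiting time to the next mutation event
        if t > t_end:
            break
        # Choose expansion (+1 repeat) or contraction (-1 repeat).
        if rng.random() < expansion_rate / per_unit_rate:
            n += 1
        else:
            n -= 1
    return n


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical parameters: 150 inherited repeats, slight expansion bias.
    lengths = [simulate_repeat_length(150, 0.02, 0.015, 40.0, rng) for _ in range(5)]
    print(lengths)
```

Repeating the simulation many times gives a distribution of repeat lengths. In this toy setting, a net expansion bias arises whenever the expansion rate exceeds the contraction rate, even though individual events go in both directions.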
