Global ETD Search

1	Application of bioinformatics on gene regulation studies and regulatory network construction with omics data Qin, Jing, 覃静 January 2013 Gene expression is a multi-step process that involves various regulators. From whole-genome sequences to the complex gene regulatory system, high-throughput technologies have generated a large amount of omics data, but information on such a large scale is hard to interpret manually. Bioinformatics can help to process this huge volume of biological information and to infer biological insights using mathematical, statistical and computational techniques. In this study, we applied various bioinformatic techniques to several aspects of gene regulation. Multiple primary transcripts of a gene can be initiated at different promoters, termed alternative promoters (APs). Most human genes have multiple APs. However, whether APs are used independently is still controversial. In this study, we analyze the roles of APs in gene regulation using various bioinformatics approaches. Chromosomal interactions between APs are found to be more frequent than interactions between different genes. By comparing the APs at the two ends of genes, we find that they are significantly different in terms of sequence content, conservation and motif frequency. The position and distance of two APs are important for their combined effects, which shows that their regulation is not independent and that one AP can affect transcription from the other. To understand the multi-level gene regulatory system in various biological processes, a mass of high-throughput omics data has been generated. However, each omics technology, measuring molecular abundance or behavior at a single level, has a limited ability to depict the multi-level system. Integrating omics data can help to capture the multi-level gene regulatory system and reduce false positives. 
In this study, two web servers, ChIP-Array and ProteoMirExpress, have been built to construct transcriptional and post-transcriptional regulatory networks by integrating omics data. ChIP-Array is a web server that lets biologists construct a TF-centered network from their own data. A network library is further constructed by ChIP-Array from publicly available data. Given a series of mRNA expression profiles in a biological process, master regulators can be identified by matching the profiles with the networks in the library. To explore gene regulatory networks controlled by multiple TFs, least absolute shrinkage and selection operator (LASSO)-type regularization models are applied to multiple integrative data sets. Gold-standard-based evaluations demonstrate that the L0 and L1/2 regularization models are efficient and applicable to gene regulatory network inference in large genomes with a small number of samples. ProteoMirExpress integrates transcriptomic and proteomic data to infer miRNA-centered networks. It successfully infers the perturbed miRNA and those that are co-expressed with it. The resulting network reports miRNA targets with uncorrelated mRNA and protein levels, which are usually ignored by tools that consider only mRNA abundance, even though some of them may be important downstream regulators. In summary, in this study we analyze gene regulation at multiple levels and develop several tools for gene network construction and regulator analysis with multiple omics data. These tools help researchers to process high-throughput raw data efficiently and to draw biological hypotheses and interpretations. / published_or_final_version / Biochemistry / Doctoral / Doctor of Philosophy Genetic regulation - Data processing Bioinformatics
2	In silico prediction of cis-regulatory elements of genes involved in hypoxic-ischaemic insult Fu, Wai, 符慧 January 2006 published_or_final_version / abstract / Paediatrics and Adolescent Medicine / Master / Master of Philosophy Cerebral anoxia. Cerebral ischemia. Brain - Wounds and injuries. Genetic regulation - Data processing. Gene expression - Data processing. Transcription factors.
3	Confounding effects in gene expression and their impact on downstream analysis Lachmann, Alexander January 2016 The reconstruction of gene regulatory networks is one of the milestones of computational systems biology. We introduce a new implementation of ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) to reverse engineer transcriptional regulatory networks, with improved mutual information estimators and a significant improvement in performance. In the context of data-driven network inference, we identify two major confounding biases and introduce methods to remove some of them. First, we identify prevalent spatial biases in gene expression studies derived from plate-based designs. We investigate the gene expression profiles of a million samples from the LINCS dataset and find that the vast majority (96%) of the tested plates are affected by significant spatial bias. We show that our proposed method to correct these biases results in a significant improvement in similarity between biological replicates assayed on different plates. Lastly, we discuss the effect of copy number variation (CNV) on gene expression and its confounding effect on the correlation landscape of genes in cancer samples. We propose a method that removes the variance in gene expression explained by CNV and show that TF target predictions can be significantly improved. Genetic regulation Genetic regulation--Data processing Gene expression Gene expression--Data processing Bioinformatics Gene regulatory networks Genetics
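The idea of removing the expression variance explained by CNV (entry 3) can be illustrated with a minimal sketch. The abstract does not say which model the thesis uses, so the per-gene ordinary least-squares fit and the function name `remove_cnv_effect` below are assumptions made purely for illustration.

```python
import numpy as np

def remove_cnv_effect(expr, cnv):
    """Return expression residuals after a per-gene linear fit on copy number.

    expr, cnv: arrays of shape (n_genes, n_samples).
    Assumption (not from the thesis): expr = a + b * cnv + residual, per gene.
    """
    expr = np.asarray(expr, dtype=float)
    cnv = np.asarray(cnv, dtype=float)
    # Center each gene across samples so the intercept drops out.
    expr_c = expr - expr.mean(axis=1, keepdims=True)
    cnv_c = cnv - cnv.mean(axis=1, keepdims=True)
    # Per-gene OLS slope; genes with constant copy number get slope 0.
    var = (cnv_c ** 2).sum(axis=1)
    cov = (expr_c * cnv_c).sum(axis=1)
    slope = np.divide(cov, var, out=np.zeros_like(cov), where=var > 0)
    # Residual = centered expression minus the part explained by CNV.
    return expr_c - slope[:, None] * cnv_c

# Synthetic data: expression driven mostly by copy number plus small noise.
rng = np.random.default_rng(0)
cnv = rng.normal(size=(3, 50))
expr = 2.0 * cnv + rng.normal(scale=0.1, size=(3, 50))
resid = remove_cnv_effect(expr, cnv)
```

Correlations between genes would then be computed on `resid` rather than on `expr`, so that a copy-number profile shared by two genes no longer inflates their apparent co-expression.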
4	Gaussian graphical model selection for gene regulatory network reverse engineering and function prediction Kontos, Kevin 02 July 2009 One of the most important and challenging "knowledge extraction" tasks in bioinformatics is the reverse engineering of gene regulatory networks (GRNs) from DNA microarray gene expression data. Indeed, as a result of the development of high-throughput data-collection techniques, biology is experiencing a data flood phenomenon that pushes biologists toward a new view of biology--systems biology--that aims at system-level understanding of biological systems. Unfortunately, even for small model organisms such as the yeast Saccharomyces cerevisiae, the number p of genes is much larger than the number n of expression data samples. The dimensionality issue induced by this "small n, large p" data setting renders standard statistical learning methods inadequate. Restricting the complexity of the models makes it possible to deal with this serious impediment. Indeed, by introducing (a priori undesirable) bias in the model selection procedure, one reduces the variance of the selected model, thereby increasing its accuracy. Gaussian graphical models (GGMs) have proven to be a very powerful formalism to infer GRNs from expression data. Unfortunately, standard GGM selection techniques cannot be used in the "small n, large p" data setting. One way to overcome this issue is to resort to regularization. In particular, shrinkage estimators of the covariance matrix--required to infer GGMs--have proven to be very effective. Our first contribution consists of a new shrinkage estimator that improves upon existing ones through the use of a Monte Carlo (parametric bootstrap) procedure. Another approach to GGM selection in the "small n, large p" data setting consists of reverse engineering limited-order partial correlation graphs (q-partial correlation graphs) to approximate GGMs. 
Our second contribution consists of an inference algorithm, the q-nested procedure, that builds a sequence of nested q-partial correlation graphs to take advantage of the smaller order graphs' topology to infer higher order graphs. This allows us to significantly speed up the inference of such graphs and to avoid problems related to multiple testing. Consequently, we are able to consider higher order graphs, thereby increasing the accuracy of the inferred graphs. Another important challenge in bioinformatics is the prediction of gene function. An example of such a prediction task is the identification of genes that are targets of the nitrogen catabolite repression (NCR) selection mechanism in the yeast Saccharomyces cerevisiae. The study of model organisms such as Saccharomyces cerevisiae is indispensable for the understanding of more complex organisms. Our third contribution consists of extending the standard two-class classification approach by enriching the set of variables and comparing several feature selection techniques and classification algorithms. Finally, our fourth contribution formulates the prediction of NCR target genes as a network inference task. We use GGM selection to infer multivariate dependencies between genes, and, starting from a set of genes known to be sensitive to NCR, we classify the remaining genes. We hence avoid problems related to the choice of a negative training set and take advantage of the robustness of GGM selection techniques in the "small n, large p" data setting. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished Informatique générale Sciences exactes et naturelles Bioinformatics DNA microarrays Genetic regulation -- Data processing Bio-informatique Puces à ADN Régulation génétique -- Informatique machine learning bioinformatics large p, small n Gaussian graphical model (GGM)
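To make the role of covariance shrinkage in GGM inference (entry 4) concrete, here is a minimal sketch under stated assumptions. It shrinks the empirical covariance toward its own diagonal with a fixed intensity `lam`, then reads partial correlations off the inverse. The fixed intensity, the diagonal target, and the function name `shrunk_partial_correlations` are illustrative assumptions; the thesis's own estimator, which uses a Monte Carlo (parametric bootstrap) procedure, is not reproduced here.

```python
import numpy as np

def shrunk_partial_correlations(X, lam=0.2):
    """Partial correlations from a shrunk covariance matrix.

    X: array of shape (n_samples, p_genes); lam: shrinkage intensity in (0, 1].
    Assumption: fixed lam and a diagonal target, chosen only for illustration.
    """
    X = np.asarray(X, dtype=float)
    S = np.cov(X, rowvar=False)            # empirical covariance; singular when n <= p
    T = np.diag(np.diag(S))                # shrinkage target: variances only
    S_shrunk = lam * T + (1.0 - lam) * S   # invertible if lam > 0 and all variances > 0
    P = np.linalg.inv(S_shrunk)            # precision (inverse covariance) matrix
    d = np.sqrt(np.diag(P))
    pcor = -P / np.outer(d, d)             # standard precision-to-partial-correlation map
    np.fill_diagonal(pcor, 1.0)
    return pcor

# "Small n, large p" toy data: 10 samples, 30 genes.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 30))
pcor = shrunk_partial_correlations(X, lam=0.2)
```

Without shrinkage, the 30 x 30 empirical covariance from 10 samples has rank at most 9 and cannot be inverted; the shrunk matrix can, which is what makes the partial correlations computable in this setting. Edges of a GGM would then be selected by testing or thresholding the entries of `pcor`.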
