Global ETD Search

1	Development and Application of Network Algorithms for Prediction of Gene Function and Response to Viral Infection and Chemicals Law, Jeffrey Norman 09 December 2020 (has links) The complex molecular machinery of the cell controls its response to various signals and environmental conditions. A natural approach to study these molecular mechanisms and cellular processes is with protein interaction networks. Due to the complexity of these networks, sophisticated computational techniques are required to extract biological insights from them. In this thesis, I develop and apply network-based algorithms for three different challenges. 1. I develop a novel, highly-scalable algorithm for network-based label prediction methods that enables the integration of functional annotations and interaction networks across many species in order to predict the functions of genes in newly-sequenced bacteria. 2. To overcome the limitations of experimental approaches to find human proteins and processes that are hijacked by SARS-CoV-2, I adapt network propagation approaches for predicting human interactors of the virus. 3. Large-scale experimental techniques to screen chemicals for toxicity have tested their effects on many individual proteins. I integrate human protein-protein interactions with this data to gain insights into the molecular networks those chemicals affect. For each of these research problems, I perform comprehensive evaluations and downstream analyses to demonstrate both the accuracy of our approaches and their utility in obtaining a broader understanding of the molecular systems in question. / Doctor of Philosophy / The functions of all living cells are governed by complex networks of molecular interactions. A major goal of systems biology is to understand the components of this machinery and how they regulate each other to control the cell's response to various conditions and signals. 
Advances in experimental techniques to understand these systems over the past couple of decades have led to an explosion of data that probe various aspects of a cell such as genome sequencing, which reads the DNA blueprint, gene expression, which measures the amount of each gene's products in the cell, and the interactions between those products (i.e., proteins). To extract biological insights from these datasets, increasingly sophisticated computational methods are required. A powerful approach is to model the datasets as networks where the individual molecules are the nodes and the interactions between them are the edges. In this thesis, I develop and apply network-based algorithms to utilize molecular systems data for three related problems: (i) predicting the functions of genes in bacterial species, (ii) predicting human proteins and processes that are hijacked by the SARS-CoV-2 virus, and (iii) suggesting cellular signaling pathways affected by exposure to a chemical. Developments such as those presented in these three projects are critical to obtaining a broader understanding of the functions of genes in the cell. Therefore, I make the methods and results for each project easily accessible to aid other researchers in their efforts. Network Propagation Gene Function Prediction SARS-CoV-2 Cellular Signaling
2	A piRNA regulation landscape in C. elegans and a computational model to predict gene functions Chen, Hao 28 October 2020 (has links) Investigating the mechanisms that regulate genes, and the functions of those genes, is essential to understanding a biological system. This dissertation consists of two research projects under these aims: understanding the regulatory mechanism of piRNAs and predicting gene function computationally. The first project shows a piRNA regulation landscape in C. elegans. piRNAs (Piwi-interacting small RNAs) form a complex with Piwi Argonautes to maintain fertility and silence transposons in animal germlines. In C. elegans, previous studies have suggested that piRNAs tolerate mismatched pairing and in principle could target all transcripts. In this project, by computationally analyzing the chimeric reads directly captured by cross-linking piRNAs and their targets in vivo, piRNAs are found to target all germline mRNAs with microRNA-like pairing rules. The number of targeting chimeric reads correlates better with binding energy than with piRNA abundance, suggesting that piRNA concentration does not limit targeting. Furthermore, in mRNAs silenced by piRNAs, secondary small RNAs are found to accumulate at the center and ends of piRNA binding sites, whereas in germline-expressed mRNAs, reduced piRNA binding density and suppression of piRNA-associated secondary small RNA targeting correlate with the presence of the CSR-1 Argonaute. These findings reveal physiologically important and nuanced regulation of piRNA targets and provide evidence for a comprehensive post-transcriptional regulatory step in germline gene expression. The second project elaborates a computational model to predict gene function. Predicting genes involved in a biological function facilitates many kinds of research, such as prioritizing candidates in a screening project. 
Following the “Guilt By Association” principle, multiple datasets are treated as biological networks and integrated under a multi-label learning framework for predicting gene functions. Specifically, the functional labels are propagated and smoothed using a label propagation method on the networks and then integrated using an “Error correction of code” multi-label learning framework, where a “codeword” defines all the labels annotated to a specific gene. The model is then trained by finding the optimal projections between the code matrix and the biological datasets using canonical correlation analysis. Its performance is benchmarked against a state-of-the-art algorithm and against the results of a large-scale screen for piRNA pathway genes in D. melanogaster. Finally, the roles of piRNA targeting in epigenetics and physiology and its cross-talk with the CSR-1 pathway are discussed, together with a survey of additional biological datasets and a discussion of benchmarking methods for gene function prediction. Bioinformatics Biological networks Error correction of code Gene function prediction Multi-label learning piRNA Regulation
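The abstract above mentions propagating and smoothing functional labels over a network. As a rough illustration of that general idea, and not the thesis's actual method, the following minimal sketch runs a standard iterative label propagation on a toy path graph. The gene names, the edge list, the `alpha` value, and the function name `propagate_labels` are all hypothetical.

```python
# Illustrative sketch only: generic label propagation on a tiny undirected network.
# The graph, the seed gene, and alpha are made-up assumptions, not data from the thesis.

def propagate_labels(edges, seeds, alpha=0.8, iterations=50):
    """Iterate f <- alpha * (mean of neighbor scores) + (1 - alpha) * y."""
    # Build an adjacency list for the undirected graph.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)
    # y holds the known labels: 1.0 for annotated (seed) genes, 0.0 otherwise.
    y = {n: 1.0 if n in seeds else 0.0 for n in nodes}
    scores = dict(y)
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Row-normalized smoothing: average the neighbors' current scores.
            neighbor_mean = sum(scores[m] for m in neighbors[n]) / len(neighbors[n])
            updated[n] = alpha * neighbor_mean + (1 - alpha) * y[n]
        scores = updated
    return scores

# Toy path graph g1 - g2 - g3 - g4 - g5 with a single annotated gene, g1.
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]
scores = propagate_labels(edges, seeds={"g1"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy case the scores decay with distance from the annotated gene, which is the "guilt by association" intuition: genes closer to annotated genes in the network rank higher as candidates for the same function.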
3	Applications of Machine Learning in Source Attribution and Gene Function Prediction Chinnareddy, Sandeep 07 June 2024 (has links) This research investigates the application of machine learning techniques in computational genomics across two distinct domains: (1) predicting the source of bacterial pathogens using whole genome sequencing data, and (2) the functional annotation of genes using single-cell RNA sequencing data. This work proposes the development of a bioinformatics pipeline tailored for identifying genomic variants, including gene presence/absence and single nucleotide polymorphisms. This methodology is applied to specific strains such as Salmonella enterica serovar Typhimurium and the Ralstonia solanacearum species complex. Phylogenetic analyses along with pan-genome and positive selection studies show that genomic variants and evolutionary patterns of S. Typhimurium vary across sources, which suggests that sources can be accurately attributed based on genomic variants empowered by machine learning. We benchmarked seven traditional machine learning algorithms, achieving a notable accuracy of 94.6% in host prediction for S. Typhimurium using the Random Forest model, underscored by SHAP value analyses which elucidated key predictive features. Next, the focus is shifted to the prediction of Gene Ontology terms for Arabidopsis genes using single-cell RNA-seq data. This analysis offers a detailed comparison of gene expression in root versus shoot tissues, juxtaposed with insights from bulk RNA-seq data. The integration of regulatory network data from DAP-seq significantly enhances the prediction accuracy of gene functions. / Master of Science / This work applies machine learning techniques to two areas in computational biology: predicting the hosts of bacterial pathogens based on their genome data, and predicting the functions of plant genes using single-cell gene expression data. 
The first part develops a method to analyze genome sequences from bacterial pathogens like Salmonella enterica serovar Typhimurium and the Ralstonia solanacearum species complex, identifying genomic variants, including gene presence/absence and single nucleotide polymorphisms, which are variations in genetic code. By studying the evolutionary relationships and genetic diversity among different strains, the motivation for using machine learning models to predict the sources (e.g., poultry, swine) of the pathogen genomes is established. Several machine learning models are then trained on these datasets, and the most important factors contributing to the predictions are identified. The second part focuses on predicting the functions of genes in the model plant species Arabidopsis thaliana using the gene expression data measured at the single-cell level to train machine learning models for identifying standardized gene function descriptions called Gene Ontology (GO) terms. By comparing results from single-cell and bulk tissue data, the study evaluates whether the higher resolution of single-cell data improves gene function prediction accuracy. Additionally, by incorporating information about gene regulation from a specialized experiment, the role of gene expression control in determining gene functions is explored. Machine Learning Source Attribution Whole genome sequencing Gene function prediction single-cell RNAseq
4	Knowledge management and discovery for genotype/phenotype data Groth, Philip 02 December 2009 (has links) Die Untersuchung des Phänotyps bringt z.B. bei genetischen Krankheiten ein Verständnis der zugrunde liegenden Mechanismen mit sich. Aufgrund dessen wurden neue Technologien wie RNA-Interferenz (RNAi) entwickelt, die Genfunktionen entschlüsseln und mehr phänotypische Daten erzeugen. Interpretation der Ergebnisse solcher Versuche ist insbesondere bei heterogenen Daten eine große Herausforderung. Wenige Ansätze haben bisher Daten über die direkte Verknüpfung von Genotyp und Phänotyp hinaus interpretiert. Diese Dissertation zeigt neue Methoden, die Entdeckungen in Phänotypen über Spezies und Methodik hinweg ermöglichen. Es erfolgt eine Erfassung der verfügbaren Datenbanken und der Ansätze zur Analyse ihres Inhalts. Die Grenzen und Hürden, die noch bewältigt werden müssen, z.B. fehlende Datenintegration, lückenhafte Ontologien und der Mangel an Methoden zur Datenanalyse, werden diskutiert. Der Ansatz zur Integration von Genotyp- und Phänotypdaten, PhenomicDB 2, wird präsentiert. Diese Datenbank assoziiert Gene mit Phänotypen durch Orthologie über Spezies hinweg. Im Fokus sind die Integration von RNAi-Daten und die Einbindung von Ontologien für Phänotypen, Experimentiermethoden und Zelllinien. Ferner wird eine Studie präsentiert, in der Phänotypendaten aus PhenomicDB genutzt werden, um Genfunktionen vorherzusagen. Dazu werden Gene aufgrund ihrer Phänotypen mit Textclustering gruppiert. Die Gruppen zeigen hohe biologische Kohärenz, da sich viele gemeinsame Annotationen aus der Gen-Ontologie und viele Protein-Protein-Interaktionen innerhalb der Gruppen finden, was zur Vorhersage von Genfunktionen durch Übertragung von Annotationen von gut annotierten Genen zu Genen mit weniger Annotationen genutzt wird. 
Zuletzt wird der Prototyp PhenoMIX präsentiert, in dem Genotypen und Phänotypen mit geclusterten Phänotypen, PPi, Orthologien und weiteren Ähnlichkeitsmaßen integriert und deren Gruppierungen zur Vorhersage von Genfunktionen, sowie von phänotypischen Wörtern genutzt. / In diseases with a genetic component, examination of the phenotype can aid in understanding the underlying genetics. Technologies to generate high-throughput phenotypes, such as RNA interference (RNAi), have been developed to decipher functions for genes. This large-scale characterization of genes strongly increases phenotypic information. It is a challenge to interpret the results of such functional screens, especially with heterogeneous data sets. Thus, there have been only a few efforts to make use of phenotype data beyond the single genotype-phenotype relationship. Here, methods are presented for knowledge discovery in phenotypes across species and screening methods. The available databases and various approaches to analyzing their content are reviewed, including a discussion of hurdles to be overcome, e.g. lack of data integration, inadequate ontologies and shortage of analytical tools. PhenomicDB 2 is an approach to integrate genotype and phenotype data on a large scale, using orthologies for cross-species phenotypes. The focus lies on the uptake of quantitative and descriptive RNAi data and ontologies of phenotypes, assays and cell-lines. Then, the results of a study are presented in which the large set of phenotype data from PhenomicDB is used to predict gene annotations. Text clustering is utilized to group genes based on their phenotype descriptions. It is shown that these clusters correlate well with indicators for biological coherence in gene groups, such as functional annotations from the Gene Ontology (GO) and protein-protein interactions. The clusters are then used to predict gene function by carrying over annotations from well-annotated genes to less well-characterized genes. 
Finally, the prototype PhenoMIX is presented, integrating genotype and phenotype data with clustered phenotypes, orthologies, interaction data and other similarity measures. Data grouped by these measures are evaluated for their predictiveness in gene functions and phenotype terms. Phänotypen Genotypen comparative phenomics Genfunktionsvorhersage PhenomicDB text-clustering phenotypes genotypes comparative phenomics gene function prediction PhenomicDB text-clustering 004 Informatik ddc:004
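The abstract above describes predicting gene function by carrying annotations over from well-annotated genes to less well-characterized genes in the same cluster. A minimal sketch of that idea follows, assuming a simple support threshold. The function name `transfer_annotations`, the `min_support` parameter, the gene names, and the cluster assignments are hypothetical and are not taken from PhenomicDB or the dissertation; the GO identifiers serve only as example labels.

```python
from collections import Counter

# Illustrative sketch only: transfer annotations shared within a cluster
# to the cluster's unannotated members. All data below are made up.

def transfer_annotations(clusters, annotations, min_support=0.6):
    """Predict terms for unannotated genes from their annotated cluster mates."""
    predictions = {}
    for members in clusters:
        annotated = [g for g in members if annotations.get(g)]
        if not annotated:
            continue  # nothing to carry over in this cluster
        # Count how many annotated members carry each term.
        counts = Counter(term for g in annotated for term in annotations[g])
        # Keep terms carried by at least min_support of the annotated members.
        shared = sorted(t for t, c in counts.items()
                        if c / len(annotated) >= min_support)
        for g in members:
            if not annotations.get(g):
                predictions[g] = shared
    return predictions

clusters = [["geneA", "geneB", "geneC"], ["geneD", "geneE"]]
annotations = {
    "geneA": {"GO:0006915"},
    "geneB": {"GO:0006915", "GO:0008283"},
    "geneD": {"GO:0007049"},
}
predictions = transfer_annotations(clusters, annotations)
```

Here `geneC` receives only the term carried by both of its annotated cluster mates, while the term carried by just one of them falls below the threshold. Raising or lowering `min_support` trades precision against coverage.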
5	Searching for novel gene functions in yeast : identification of thousands of novel molecular interactions by protein-fragment complementation assay followed by automated gene function prediction and high-throughput lipidomics Tarasov, Kirill 09 1900 (has links) No description available. Interaction protéine-protéine Protéine membranaire Métabolisme des lipides Apprentissage automatique Prédiction de la fonction d’un gène Visualisation analytique Criblage à haut débit Protein-protein interactions Protein-fragment complementation assays High-throughput screen Membrane proteins Lipid metabolism Lipidomics Machine learning Gene function prediction Visual analytics
