Global ETD Search

1	Applications of Machine Learning in Source Attribution and Gene Function Prediction
Chinnareddy, Sandeep, 07 June 2024
This research investigates the application of machine learning techniques in computational genomics across two distinct domains: (1) predicting the source of bacterial pathogens using whole genome sequencing data, and (2) the functional annotation of genes using single-cell RNA sequencing data. This work proposes a bioinformatics pipeline tailored for identifying genomic variants, including gene presence/absence and single nucleotide polymorphisms. This methodology is applied to specific strains such as Salmonella enterica serovar Typhimurium and the Ralstonia solanacearum species complex. Phylogenetic analyses, along with pan-genome and positive selection studies, show that genomic variants and evolutionary patterns of S. Typhimurium vary across sources, which suggests that sources can be accurately attributed from genomic variants using machine learning. We benchmarked seven traditional machine learning algorithms, achieving an accuracy of 94.6% in host prediction for S. Typhimurium with the Random Forest model, supported by SHAP value analyses that identified key predictive features. The focus then shifts to the prediction of Gene Ontology terms for Arabidopsis genes using single-cell RNA-seq data. This analysis offers a detailed comparison of gene expression in root versus shoot tissues, set against insights from bulk RNA-seq data. The integration of regulatory network data from DAP-seq significantly enhances the prediction accuracy of gene functions.
/ Master of Science /
This work applies machine learning techniques to two areas in computational biology: predicting the hosts of bacterial pathogens based on their genome data, and predicting the functions of plant genes using single-cell gene expression data. The first part develops a method to analyze genome sequences from bacterial pathogens like Salmonella enterica serovar Typhimurium and the Ralstonia solanacearum species complex, identifying genomic variants, including gene presence/absence and single nucleotide polymorphisms, which are variations in the genetic code. By studying the evolutionary relationships and genetic diversity among different strains, the motivation for using machine learning models to predict the sources (e.g., poultry, swine) of the pathogen genomes is established. Several machine learning models are then trained on these datasets, and the most important factors contributing to the predictions are identified. The second part focuses on predicting the functions of genes in the model plant species Arabidopsis thaliana, using gene expression data measured at the single-cell level to train machine learning models that identify standardized gene function descriptions called Gene Ontology (GO) terms. By comparing results from single-cell and bulk tissue data, the study evaluates whether the higher resolution of single-cell data improves gene function prediction accuracy. Additionally, by incorporating information about gene regulation from a specialized experiment, the role of gene expression control in determining gene functions is explored.
Keywords: Machine Learning; Source Attribution; Whole genome sequencing; Gene function prediction; single-cell RNAseq
2	Understanding transcriptional regulation through computational analysis of single-cell transcriptomics
Lim, Chee Yee, January 2017
Gene expression is tightly regulated by complex transcriptional regulatory mechanisms to achieve specific expression patterns, which are essential for important biological processes such as embryonic development. Dysregulation of gene expression can lead to diseases such as cancers. A better understanding of transcriptional regulation will therefore not only advance the understanding of fundamental biological processes, but also provide mechanistic insights into diseases. Earlier high-throughput expression profiling techniques were limited to measuring average gene expression across large pools of cells. In contrast, recent technological improvements have made it possible to perform expression profiling in single cells. Single-cell expression profiling can capture heterogeneity among single cells, which is not possible in conventional bulk expression profiling. In my PhD, I focus on developing new algorithms, as well as benchmarking and utilising existing algorithms, to study the transcriptomes of various biological systems using single-cell expression data. I have developed two single-cell specific network inference algorithms, BTR and SPVAR, which are based on two different formalisms: Boolean and autoregression frameworks, respectively. BTR was shown to be useful for improving existing Boolean models with single-cell expression data, while SPVAR was shown to be a conservative predictor of gene interactions using pseudotime-ordered single-cell expression data. In addition, I have obtained novel biological insights by analysing single-cell RNAseq data from the epiblast stem cell reprogramming and leukaemia systems. Three different driver genes, namely Esrrb, Klf2 and GY118F, were shown to drive reprogramming of epiblast stem cells via different reprogramming routes. As for the leukaemia system, FLT3-ITD and IDH1-R132H mutations were shown to interact with each other and potentially predispose some cells to developing acute myeloid leukaemia.
616.99
