61 |
RNA sequencing differential expression and small RNA analyses of obesity and BMI with post-mortem human brain
Wake, Christian (29 September 2019)
Obesity, the excess accumulation of body fat, may cause serious negative health effects including increased risk of heart disease, type 2 diabetes, stroke and certain cancers. RNA sequencing studies in the human brain related to obesity have not been previously undertaken. I conducted both large and small RNA sequencing of hypothalamus (207 samples) and nucleus accumbens (276 samples) from individuals defined as consistently obese (124 samples), consistently normal weight as controls (148 samples) or selected without respect to BMI and falling within neither case nor control definition (211 samples), based on longitudinal BMI measures. The samples were provided by three cohort studies with brain donation programs: the Framingham Heart Study, the Religious Orders Study and the Memory and Aging Project. For each brain region and large/small RNA sequencing set, differential expression analysis was performed with respect to obesity, BMI, brain region and sex. Sixteen mRNAs and five microRNAs are differentially expressed (adjusted p < 0.05) by obesity or BMI in these tissues. Some genes, such as APOBR and CES1, and some gene sets, such as Reactome’s “opioid signaling”, yielded findings with interesting implications.
The small RNA sequencing data were used for novel analyses of microRNAs (miRNAs): discovering novel miRNAs and characterizing post-transcriptionally edited miRNAs (isomiRs). A custom miRNA identification pipeline was built, which uses miRDeep* miRNA identification and result filtering based on false positive rate estimates. With this analysis I discovered over 300 novel miRNAs. The isomiR analysis included isomiR-specific read filtering based on genome alignment, and generated a set of isomiR reads whose editing patterns are non-random with respect to the position and nucleotide of the edit. Specifically, purine substitution, pyrimidine substitution and 3’ polyadenylation and polyuridylation are commonly observed. The patterns of editing revealed that some miRNAs are almost always edited while others are very rarely edited. I developed a novel statistical test to determine differences in the isomiR profiles of individual miRNAs between two sets of samples. This method revealed 58 miRNAs with differentially edited isomiRs between the two brain regions, but none when comparing obese with control samples or male with female samples.
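The editing categories named in this abstract can be illustrated with a minimal Python sketch. It is not the author's pipeline: the function name `classify_isomir`, the toy sequences, and the reading of "purine substitution" as a purine-to-purine change (and likewise for pyrimidines) are assumptions made here for illustration, and the sketch compares a read to its canonical miRNA position by position without any alignment step.

```python
from __future__ import annotations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def classify_isomir(canonical: str, read: str) -> list[str]:
    """Label differences between a canonical miRNA and an observed read.

    Hypothetical helper: assumes both sequences are RNA (A, C, G, U) and
    share the same 5' start, so positions can be compared directly.
    """
    labels = []
    # Compare overlapping positions for single-nucleotide substitutions.
    for pos, (ref, obs) in enumerate(zip(canonical, read), start=1):
        if ref == obs:
            continue
        if ref in PURINES and obs in PURINES:
            labels.append(f"purine substitution at {pos} ({ref}>{obs})")
        elif ref in PYRIMIDINES and obs in PYRIMIDINES:
            labels.append(f"pyrimidine substitution at {pos} ({ref}>{obs})")
        else:
            labels.append(f"other substitution at {pos} ({ref}>{obs})")
    # Bases beyond the canonical 3' end are treated as a non-templated tail.
    tail = read[len(canonical):]
    if tail and set(tail) == {"A"}:
        labels.append(f"3' polyadenylation (+{len(tail)})")
    elif tail and set(tail) == {"U"}:
        labels.append(f"3' polyuridylation (+{len(tail)})")
    elif tail:
        labels.append(f"other 3' addition ({tail})")
    return labels


if __name__ == "__main__":
    # Toy sequences, not real miRNAs.
    print(classify_isomir("UGAGGUAG", "UGAGGUGGAA"))
```

Counting such labels per miRNA across samples would give the kind of isomiR profile that the abstract's statistical test compares between sample sets; the test itself is not described in enough detail here to sketch.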
|
62 |
Genomic biomarker development to impact clinical management of patients at risk for lung cancer
Zhang, Jiarui (20 June 2020)
Lung cancer is the leading cause of cancer mortality in the US and the world, largely due to the challenges of early detection and precision management of aggressive cancer. We previously derived and validated bronchial and nasal epithelial gene expression biomarkers to detect lung cancer among individuals undergoing clinical workup for suspected lung cancer. However, there remain challenges and needs in understanding lung cancer airway biology and ultimately impacting clinical management: 1. Whether airway genomic classifiers can be developed to detect cancer among patients with indeterminate pulmonary nodules; 2. What the airway cellular and molecular subtypes are, and whether they can improve lung cancer diagnosis; 3. Whether molecular and histological subtype profiling based on lung adenocarcinoma gene expression can inform pre-/post-surgical management by predicting indolence and aggressiveness.
To address these goals, I first developed a cancer biomarker based on nasal airway gene expression alterations, and improved clinical model prediction among patients with indeterminate pulmonary nodules. Next, I leveraged both bulk and single cell bronchial airway gene expression data from patients with different lung cancer subtypes, and identified the molecular and cellular changes associated with adenocarcinoma vs. squamous cell carcinoma. This finding facilitated the development of a lung cancer subtype biomarker that improved the diagnostic accuracy of the previous lung cancer classifier. Finally, I leveraged tumor gene expression data from clinical stage I lung adenocarcinomas from a screening population, and identified solid-, micropapillary- and cribriform-specific gene signatures. A classifier predictive of aggressive histologic features was developed with potential to predict histologic aggressiveness from pre-surgical tumor biopsies where all histologic patterns may not be represented. Such a biomarker may be useful in guiding clinical decision making, including the extent of surgical resection.
This dissertation discusses the potential for these biomarkers to have clinical utility in patients with or at risk for lung cancer. / 2022-06-19T00:00:00Z
|
63 |
The evolutionary impacts of secondary structures within genomes of eukaryote-infecting single-stranded DNA viruses
Muhire, Brejnev Muhizi (January 2015)
Includes bibliographical references / Secondary structures forming through base-pairing in virus genomes have been proven to regulate several processes during viral replication cycles, including genome replication, transcription, post-transcriptional activities, protein synthesis, genome packaging, generation of viral sub-genomes and evasion of host-cell immune responses. Although computational DNA/RNA folding methods based on free energy minimisation approaches are capable of predicting structures that form within virus genomes, these methods are not entirely accurate. Notably, many of the structures that are accurately predicted will likely have no biological importance within the genomes in which they reside because even randomly generated single-stranded RNA/DNA sequences will form stable secondary structures. Nevertheless, with additional genome evolution analyses involving the detection of natural selection, sequence co-evolution, and genetic recombination, it is possible to both validate the existence of, and infer the biological importance of, computationally predicted structures. Here I implement and deploy free bioinformatics tools to (1) automate the classification of nucleotide and protein sequences into datasets useful for downstream molecular evolution analyses; (2) improve the accuracy of computational virus-genome-scale secondary structure prediction; (3) enable the identification of biologically relevant secondary structures using signals of purifying selection, coevolution and recombination within aligned sequence datasets; and (4) enable efficient visualisation of structural and selection data for better characterisation of individual secondary structural elements.
Using these tools I carried out large-scale studies that predicted and characterised novel functional secondary structures that potentially regulate transcription, translation, gene splicing, and replication within the genomes of eukaryote-infecting ssDNA viruses (Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae). I show that purifying selection tends to be stronger at base-paired sites than it is at unpaired sites and, wherever mutations are tolerable within paired regions, I demonstrate that there exist strong associations between base-pairing and complementary coevolution. Finally, I show that the recombinant genomes of some, but not all, eukaryote-infecting ssDNA virus groups display weak evidence of both homologous and non-homologous recombination breakpoints preferentially occurring at genome sites that minimally disrupt secondary structures. Altogether, these results suggest that natural selection acting to maintain important biologically functional secondary structural elements has been a major process during the evolution of eukaryote-infecting ssDNA viruses.
|
64 |
Statistical and computational methods for addressing heterogeneity in genomic data
Zhang, Yuqing (16 July 2020)
Heterogeneity describes any variability across different datasets. In genomic studies which profile gene expression levels, heterogeneity is ubiquitous, and may bring challenges to the integrative analysis of multiple datasets. Thus, considerable effort is needed to understand and address the impact of heterogeneity. In this dissertation, I have developed novel statistical models and computational software for this purpose. I derived reference-batch ComBat and ComBat-Seq, two improved models based on the state-of-the-art method, ComBat, for addressing one particular type of heterogeneity known as “batch effects”. I showed their benefits compared to the existing methods in several data types and situations, and implemented these models in publicly available software. Then, I created systematic simulations to explore the impact of common study heterogeneity on the independent validation of genomic prediction models, showing that the most identifiable sources of heterogeneity are not the primary ones affecting the validation of genomic predictors. Finally, I adapted a solution using cross-study ensemble learning to train predictors with generalizable independent performance, to address the unwanted impact of batch effects on prediction. I compared this new framework with the traditional approach for batch correction, showing that cross-study learning may provide a more robust-performing model in independent validation. Results in this dissertation provide insights and guidelines for working with heterogeneous gene expression profiling datasets in practice, and encourage further investigation on understanding and addressing heterogeneity in genomic studies.
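To make the idea of a batch effect concrete, the following Python sketch removes a per-batch mean shift from one gene's expression values. It is deliberately much simpler than ComBat, which fits an empirical Bayes location-and-scale model across genes, and it is not the reference-batch ComBat or ComBat-Seq method described above. The function name `center_by_batch` and the toy data are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Shift each value so its batch mean matches the overall mean.

    Hypothetical helper: a location-only adjustment for a single gene.
    It ignores scale differences, covariates, and information sharing
    across genes, all of which a real batch-correction model handles.
    """
    overall = mean(values)
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    # Subtract the batch-specific mean, then restore the overall level.
    return [v - batch_mean[b] + overall for v, b in zip(values, batches)]


if __name__ == "__main__":
    # Toy data: batch "b" is shifted upward by 10 units relative to "a".
    expression = [1.0, 3.0, 11.0, 13.0]
    batch_labels = ["a", "a", "b", "b"]
    print(center_by_batch(expression, batch_labels))
```

A naive adjustment like this can also remove real biological signal when batch and condition are confounded, which is one reason model-based approaches are used in practice.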
|
65 |
Integration and visualisation of data in bioinformatics
Salazar, Gustavo A (January 2015)
Includes bibliographical references / The most recent advances in laboratory techniques aimed at observing and measuring biological processes are characterised by their ability to generate large amounts of data. The more data we gather, the greater the chance of finding clues to understand the systems of life. This, however, is only true if the methods that analyse the generated data are efficient, effective, and robust enough to overcome the challenges intrinsic to the management of big data. The computational tools designed to overcome these challenges should also take into account the requirements of current research. Science demands specialised knowledge for understanding the particularities of each study; in addition, it is seldom possible to describe a single observation without considering its relationship with other processes, entities or systems. This thesis explores two closely related fields: the integration and visualisation of biological data. We believe that these two branches of study are fundamental in the creation of scientific software tools that respond to the ever-increasing needs of researchers. The distributed annotation system (DAS) is a community project that supports the integration of data from federated sources and its visualisation on web and stand-alone clients. We have extended the DAS protocol to improve its search capabilities and also to support feature annotation by the community. We have also collaborated on the implementation of MyDAS, a server to facilitate the publication of biological data following the DAS protocol, and contributed to the design of the protein DAS client called DASty. Furthermore, we have developed a tool called probeSearcher, which uses the DAS technology to facilitate the identification of microarray chips that include probes for regions on proteins of interest. Another community project in which we participated is BioJS, an open source library of visualisation components for biological data.
This thesis includes a description of the project, our contributions to it and some of the components we developed for it. Finally, and most importantly, we combined several BioJS components over a modular architecture to create PINV, a web-based visualiser of protein-protein interaction (PPI) networks that takes advantage of the features of modern web technologies in order to explore PPI datasets on an almost ubiquitous platform (the web) and facilitates collaboration between scientific peers. This thesis includes a description of the design and development processes of PINV, as well as current use cases that have benefited from the tool and whose feedback has been the source of several improvements to PINV. Collectively, this thesis describes novel software tools that, by using modern web technologies, facilitate the integration, exploration and visualisation of biological data, and that have the potential to contribute to our understanding of the systems of life.
|
66 |
Functional interpretation of high-resolution multi-omic data using molecular interaction networks
Blum, Benjamin Coburn (16 June 2021)
Advances in instrumentation and sample preparation techniques enable ever more in-depth molecular profiling to catalyze exciting research into complex biological processes. Current platforms survey biomolecular classes with varying depth. While sequencing is near comprehensive, and even enabled at single cell resolution, challenges remain in global metabolite surveys primarily due to the increased chemical diversity relative to other “omics” data types. At the same time, metabolism and the interaction of diverse biomolecules are increasingly recognized as vitally important components of many disease processes. Presented here is work describing the development and use of molecular interaction subnetworks for the functional interpretation of multi-omic data. Metabolic pathway-centric subnetworks for functional inference with protein or gene derived global profiling data were created from the integration of disparate network models: protein-protein interaction (PPI) networks and metabolic models. The subnetworks were shown to increase mapping between metabolic pathways and the proteome, and the subnetwork-derived analysis shows dramatic improvement over primary enzymes alone, with direct metabolomic experimental measurements for validation of pathway findings. We illustrate the functional utility of integrating PPI data with metabolic models by finding network modules previously but independently implicated in disease. Specifically, the analysis reveals abundance increases in known oncogenes in response to changes in breast cancer metabolism. Additionally, we reveal cellular mechanisms related to metabolic stress observed in patient sera following viral SARS-CoV-2 infection, and metabolic changes in a model of heart disease, where the characteristic muscle fibers make in-depth proteomic profiling difficult. Functional network models were additionally used to compare the responses of different cell lines to viral infection, showing significant context-specific differences. All of these findings demonstrate the importance of functional models to help interpret multi-omic data. The implications of revealing the connections between metabolism and protein subnetwork rewiring may be profound; for example, suggesting metabolic pathway activity may be as important a biomarker as mutation status in cancer. This research points to a means of practically inferring metabolic state from proteomic data. We further describe the release of our open-source software to accelerate integrative multi-omic analysis in the broader research community. / 2023-06-16T00:00:00Z
|
67 |
Single cell analysis and methods to characterize peripheral blood immune cell types in disease and aging
Karagiannis, Tanya Theodora (18 February 2022)
In the past decade, RNA-sequencing (RNA-seq)-based genome-wide expression studies have contributed to major advances in understanding human biology and disease. However, for heterogeneous tissues such as peripheral blood, RNA-sequencing masks the expression of different populations of cells that may be important in understanding different conditions and disease progression. With the advent of single cell RNA-sequencing (scRNA-seq), it has become possible to study the gene expression of each single cell and to explore cellular heterogeneity in the context of disease and under the influence of medications or other substances. In this dissertation, I will present three projects that demonstrate how single cell sequencing methods can be used to characterize novel changes in the peripheral immune system in human disease and aging. I will also describe novel methodological approaches I created to analyze cell type composition and gene expression level changes.
First, I investigated the cell type specific changes due to opioid use in human peripheral blood. Utilizing single cell transcriptomic methods, I identified a genome-wide suppression of antiviral gene expression across immune cell types of chronic opioid users, and similarly under acute exposure to morphine.
Second, I investigated the immune cell type specific changes of gene expression and composition in the context of human aging and longevity. I developed novel approaches to measure and compare overall cell type composition between samples, and identified significant overall differences in immune cell type composition, including pro-inflammatory cell populations, between extreme longevity and younger ages. In addition, I generated cell type-specific signatures associated with longevity after accounting for age-related changes. These signatures demonstrate an upregulation in immune response and metabolic processes important in the activation of immune cells in extremely long-lived individuals compared to normally aging individuals.
Finally, I investigated whether aging of the immune system is accelerated in opioid-dependent individuals. I utilized the unique aging signatures generated in the aging project and discovered higher expression of aging signatures in specific cell types of opioid-dependent individuals, suggesting chronic opioid use causes premature aging of the immune system that may contribute to the increased susceptibility to infections in these individuals. / 2023-02-18T00:00:00Z
|
68 |
Development of methods for Omics Network inference and analysis and their application to disease modeling
Federico, Anthony N. (18 March 2022)
With the advent of Next Generation Sequencing (NGS) technologies and the emergence of large publicly available genomics data comes an unprecedented opportunity to model biological networks through a holistic lens using a systems-based approach. Networks provide a mathematical framework for representing biological phenomena that goes beyond standard one-gene-at-a-time analyses. Networks can model system-level patterns and the molecular rewiring (i.e. changes in connectivity) occurring in response to perturbations or between distinct phenotypic groups or cell types. This in turn supports the identification of putative mechanisms of action of the biological processes under study, and thus has the potential to advance prevention and therapy. However, there are major challenges faced by researchers. Inference of biological network structures is often performed on high-dimensional data, yet is hindered by the limited sample size of high throughput omics data. Furthermore, modeling biological networks involves complex analyses capable of integrating multiple sources of omics layers and summarizing large amounts of information.
My dissertation aims to address these challenges by presenting new approaches for high-dimensional network inference with limited sample sizes as well as methods and tools for integrated network analysis applied to multiple research domains in cancer genomics. First, I introduce a novel method for reconstructing gene regulatory networks called SHINE (Structure Learning for Hierarchical Networks) and present an evaluation on simulated and real datasets including a Pan-Cancer analysis using The Cancer Genome Atlas (TCGA) data. Next, I summarize the challenges with executing and managing data processing workflows for large omics datasets on high performance computing environments and present multiple strategies for using Nextflow for reproducible scientific workflows, including shine-nf, a collection of Nextflow modules for structure learning. Lastly, I introduce the methods, objects, and tools developed for the analysis of biological networks used throughout my dissertation work. Together, these contributions were used in focused analyses aimed at understanding the molecular mechanisms of tumor maintenance and progression in subtype networks of Breast Cancer and Head and Neck Squamous Cell Carcinoma.
|
69 |
An African Genome Variation Database and its applications in human diversity and health
Todt, Davis (22 March 2022)
African genomes exhibit the highest levels of sequence and haplotype diversity of all extant human populations. A combination of historical and geographical factors has contributed toward the high level of genetic diversity in ancestral populations in Africa. Additionally, a series of concomitant migration events out of Africa, with founder populations harbouring only a subset of this genetic variation, have contributed to the relatively lower genetic diversity observed in non-Africans. Population genetic studies have refined our understanding of human evolutionary history and clinical genomic studies have resulted in improved patient outcomes. However, despite the increased throughput and decreased cost afforded by next-generation sequencing (NGS) and despite the relatively higher genetic variation in Africans, relatively little of the genomic data currently available is representative of diverse African populations. This may result in adverse outcomes in the context of minority populations with little representation in clinical databases. Given the under-representation of African genetic variation and the importance of highlighting and further characterizing it, the objectives of this project were to design, develop and deploy a proof of concept database and web application for the storage, analysis and visualization of African genetic variant data: the African Genome Variation Database (AGVD). The AGVD was developed according to software industry design standards. The project also explored available genomic tools and databases in order to leverage existing software solutions where suitable. Additionally, relevant data sets were identified for use during testing and validation of the pilot phase of the project. To this end, the open access 1000 Genomes Project phase 3 dataset was selected and the genotypes for several chromosomes were loaded into the AGVD.
The AGVD leverages the scalable, performant, and open source genomics engine OpenCGA for data storage and analysis. A custom front-end web application was developed by applying a novel approach to render and serve static Vue JS assets from the Python Flask microframework. The web application supports rich data search and filtering operations of loaded variants and allows end-users to visualize annotations of genomic loci and allele change, variant type, associated gene and transcript consequences, clinical significance, and allele frequency information for all annotated cohorts in a highly interactive manner. A bespoke REST API also supports future analytical functionality. The AGVD has demonstrated proof of concept in the secure and scalable storage and visualization of African genomic data, providing a viable solution for H3ABioNet to further extend in future iterations of the project and a valuable resource for researchers to explore African genetic variation.
|
70 |
Revisiting and re-computing the X-score scoring function
Mambo, Hilaire Mobele (January 2014)
Includes bibliographical references. / Scoring functions seek, in different ways, to compute protein-ligand binding energies by summing together the individual pairwise atomic interaction energies observed in crystal structures between the protein and the bound ligand. To date, though, accurate prediction remains a big challenge since existing scoring functions fail to reproduce known binding energies with a sufficient degree of accuracy and robustness. To overcome this problem, we assign a discrete weighting to the individual atomic interactions to account for entropic desolvation factors on ligand binding. We thereafter re-compute the revised scoring function and test the output against multiple sets of data to examine the robustness of the heuristic weightings used.
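The general shape of such a weighted sum can be sketched in a few lines of Python. This is an illustration of the idea only, not the X-score function or the revised version from this thesis: the interaction labels, energies, weights and the name `weighted_score` are all hypothetical.

```python
def weighted_score(interactions, weights, default_weight=1.0):
    """Sum pairwise interaction energies, each scaled by a per-type weight.

    Hypothetical helper: `interactions` is a list of (kind, energy) pairs,
    one per protein-ligand atom pair, and `weights` maps an interaction
    kind to a discrete weight. Kinds without an entry use `default_weight`.
    """
    total = 0.0
    for kind, energy in interactions:
        total += weights.get(kind, default_weight) * energy
    return total


if __name__ == "__main__":
    # Made-up interaction list and weights, in arbitrary energy units.
    pairs = [("hbond", -2.0), ("hydrophobic", -1.0), ("vdw", -0.5)]
    discrete_weights = {"hbond": 2.0, "hydrophobic": 0.5}
    print(weighted_score(pairs, discrete_weights))
```

Testing robustness, as the abstract describes, would then amount to comparing such scores against known binding energies across several independent datasets.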
|