About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Next-generation information systems for genomics

Mungall, Christopher January 2011 (has links)
The advent of next-generation sequencing technologies is transforming biology by enabling individual researchers to sequence the genomes of individual organisms or cells on a massive scale. In order to realize the translational potential of this technology we will need advanced information systems to integrate and interpret this deluge of data. These systems must be capable of extracting the location and function of genes and biological features from genomic data, requiring the coordinated parallel execution of multiple bioinformatics analyses and intelligent synthesis of the results. The resulting databases must be structured to allow complex biological knowledge to be recorded in a computable way, which requires the development of logic-based knowledge structures called ontologies. To visualise and manipulate the results, new graphical interfaces and knowledge acquisition tools are required. Finally, to help understand complex disease processes, these information systems must be equipped with the capability to integrate and make inferences over multiple data sets derived from numerous sources. RESULTS: Here I describe research, design and implementation of some of the components of such a next-generation information system. I first describe the automated pipeline system used for the annotation of the Drosophila genome, and the application of this system in genomic research. This was succeeded by the development of a flexible graph-oriented database system called Chado, which relies on the use of ontologies for structuring data and knowledge. I also describe research to develop, restructure and enhance a number of biological ontologies, adding a layer of logical semantics that increases the computability of these key knowledge sources. The resulting database and ontology collection can be accessed through a suite of tools. Finally I describe how genome analysis, ontology-based database representation and powerful tools can be combined to make inferences about genotype-phenotype relationships within and across species. CONCLUSION: The large volumes of complex data generated by high-throughput genomic and systems biology technology threaten to overwhelm us unless we can devise better computing tools to assist us with their analysis. Ontologies are key technologies, but many existing ontologies are not interoperable or lack features that make them computable. Here I have shown how concerted ontology, tool and database development can be applied to make inferences of value to translational research.
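The Chado design this abstract mentions — a graph-oriented store in which every feature and every relationship between features is typed by an ontology term — can be illustrated with a minimal sketch. This is not code from the thesis: the identifiers are invented and the real Chado schema is a relational database, but the shape of the data is the same.

```python
from dataclasses import dataclass

# In-memory stand-ins for Chado-style feature and feature_relationship tables:
# every feature and relationship carries an ontology term, so genes,
# transcripts and exons all live in the same two structures.

@dataclass
class Feature:
    uniquename: str
    type_term: str      # Sequence Ontology term, e.g. "gene", "mRNA", "exon"

@dataclass
class FeatureRelationship:
    subject: str        # uniquename of the child feature
    object: str         # uniquename of the parent feature
    type_term: str      # relationship term, e.g. "part_of"

features = [
    Feature("gene0001", "gene"),
    Feature("gene0001-RA", "mRNA"),
    Feature("gene0001-RA-exon1", "exon"),
]

relationships = [
    FeatureRelationship("gene0001-RA", "gene0001", "part_of"),
    FeatureRelationship("gene0001-RA-exon1", "gene0001-RA", "part_of"),
]

def children_of(parent: str, rel_type: str = "part_of"):
    """Walk the feature graph: features related to `parent` by `rel_type`."""
    return [r.subject for r in relationships
            if r.object == parent and r.type_term == rel_type]

print(children_of("gene0001"))  # ['gene0001-RA']
```

Because the types are ontology terms rather than hard-coded table columns, the same walk works for any feature kind the ontology defines, which is what makes this representation computable.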
2

Sibios as a Framework for Biomarker Discovery Using Microarray Data

Choudhury, Bhavna 26 July 2006 (has links)
Submitted to the Faculty of the School of Informatics in partial fulfillment of the requirements for the degree of Master of Science in Bioinformatics, Indiana University, August 2006 / Decoding the human genome resulted in large amounts of data that need to be analyzed and given a biological meaning. The field of Life Sciences is highly information-driven. The genomic data are mainly gene expression data obtained from measurement of mRNA levels in an organism. Efficiently processing large amounts of gene expression data has been made possible with the help of high-throughput technology. Research studies working on microarray data have led to the possibility of finding disease biomarkers. Carrying out biomarker discovery experiments has been greatly facilitated by the emergence of various analytical and visualization tools as well as annotation databases. These tools and databases are often termed 'bioinformatics services'. The main purpose of this research was to develop SIBIOS (System for Integration of Bioinformatics Services) as a platform to carry out microarray experiments for the purpose of biomarker discovery. Such experiments require an understanding of the current procedures adopted by researchers to extract biologically significant genes. In the course of this study, sample protocols were built for the purpose of biomarker discovery. A case study on the BCR-ABL subtype of ALL was selected to validate the results. Different approaches for biomarker discovery were explored, and both statistical and mining techniques were considered. Biological annotation of the results was also carried out. The final task was to incorporate the new proposed sample protocols into SIBIOS by providing workflow capabilities, thereby enhancing the system's ability to support biomarker discovery workflows.
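As a hedged illustration of the statistical side of such a protocol (this is not the SIBIOS workflow itself, and the expression matrix below is synthetic), a simple differential-expression filter might rank genes by a two-sample t-test between disease subtypes:

```python
import numpy as np
from scipy import stats

# Toy sketch of one statistical step a biomarker-discovery protocol might use:
# rank genes by a two-sample t-test between two sample groups (e.g. BCR-ABL
# vs. other ALL subtypes). All data here are synthetic.

rng = np.random.default_rng(0)
n_genes = 1000
group_a = rng.normal(size=(n_genes, 20))   # expression matrix, genes x samples
group_b = rng.normal(size=(n_genes, 20))
group_b[:10] += 2.0                        # spike in 10 "differential" genes

t_stat, p_val = stats.ttest_ind(group_a, group_b, axis=1)

# Simple Bonferroni correction and ranking; a real protocol would typically
# add fold-change thresholds and FDR control before biological annotation.
significant = np.where(p_val * n_genes < 0.05)[0]
ranked = significant[np.argsort(p_val[significant])]
print("candidate biomarker gene indices:", ranked[:10])
```

The ranked gene list would then be passed to annotation services (the 'bioinformatics services' the abstract refers to) for biological interpretation.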
3

Genomic Data Augmentation with Variational Autoencoder

Thyrum, Emily 12 1900 (has links)
In order to treat cancer effectively, medical practitioners must predict pathological stages accurately, and machine learning methods can be employed to make such predictions. However, biomedical datasets, including genomic datasets, often have disproportionately more samples from people of European ancestry than from people of other ethnic or racial groups, which can cause machine learning methods to perform better on the European samples than on samples from the under-represented groups. Data augmentation can be employed as a potential solution to artificially increase the number of samples from people of under-represented racial groups, which can in turn improve pathological stage predictions for future patients from such groups. Genomic data augmentation has been explored previously, for example using a Generative Adversarial Network, but to the best of our knowledge, the use of the variational autoencoder for the purpose of genomic data augmentation remains largely unexplored. Here we utilize a geometry-based variational autoencoder that models the latent space as a Riemannian manifold, so that samples can be generated without the use of a prior distribution, and show that the variational autoencoder can indeed be used to reliably augment genomic data. Using TCGA prostate cancer genotype data, we show that our VAE-generated data can improve pathological stage predictions on a test set of European samples. Because we only had European samples that were labeled in terms of pathological stage, we were not able to validate the generated African samples in this way, but we still attempt to show how such samples may be realistic. / Computer and Information Science
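A minimal sketch of the augmentation idea follows, assuming PyTorch and a plain Gaussian-prior VAE. The thesis itself uses a geometry-aware VAE that samples from a learned Riemannian latent space rather than a fixed prior, so this only shows the general pattern, not the method described above.

```python
import torch
import torch.nn as nn

# Sketch of VAE-based augmentation for tabular genotype data (e.g. 0/1/2
# allele counts per SNP). Gaussian-prior VAE for illustration only.

class VAE(nn.Module):
    def __init__(self, n_snps: int, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_snps, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, n_snps)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.decoder(z), mu, logvar

def loss_fn(recon, x, mu, logvar):
    # Reconstruction error plus KL divergence to the standard-normal prior.
    recon_loss = nn.functional.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kld

# After training on the under-represented group, synthetic genotype vectors
# are decoded from latent draws and added to the training set before fitting
# the pathological-stage classifier.
model = VAE(n_snps=500)
synthetic = model.decoder(torch.randn(32, 16))  # 32 augmented (untrained, illustrative) samples
```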
4

Establishing a framework for an African Genome Archive

Southgate, Jamie January 2019 (has links)
Magister Scientiae - MSc / The generation of biomedical research data on the African continent is growing, with numerous studies realizing the importance of African genetic diversity in discoveries of human origins and disease susceptibility. The decrease in costs to purchase and utilize such tools has enabled research groups to produce datasets of significant scientific value. However, this success story has resulted in a new challenge for African researchers and institutions. An increase in data scale and complexity has led to an imbalance of infrastructure and skills to manage, store and analyse this data.
5

Establishing a Framework for an African Genome Archive

Southgate, Jamie January 2021 (has links)
Magister Scientiae - MSc / The generation of biomedical research data on the African continent is growing, with numerous studies realizing the importance of African genetic diversity in discoveries of human origins and disease susceptibility. The decrease in costs to purchase and utilize such tools has enabled research groups to produce datasets of significant scientific value. However, this success story has resulted in a new challenge for African researchers and institutions. An increase in data scale and complexity has led to an imbalance of infrastructure and skills to manage, store and analyse this data. The lack of physical infrastructure has left genomic research on the continent lagging behind its counterparts abroad, drastically limiting the sharing of data and posing challenges for researchers wishing to explore secondary analysis, study verification and amalgamation. The scope of this project entailed the design and implementation of a prototype genome archive to support the effective use of data resources amongst researchers. The prototype consists of a web interface and storage backend for users to upload and browse projects, datasets and metadata stored in the archive. The server, middleware, database and server-side framework are components of the genome archive and form the software stack. The server component provides shared resources such as network connectivity, file storage, security and the metadata database. The database implemented for storing the metadata relating to the sample files is a NoSQL database. This database is interfaced with the iRODS middleware component, which controls data being sent between the server, database and the Flask framework. The Flask framework, which is based on the Python programming language, is the development platform of the archive web application. The Cognitive Walkthrough methodology was used to evaluate the suitability of the software for its users. Results showed that the core conceptual model adopted by the prototype software is consistent and that actions available to the user are visible. Issues were raised pertaining to user feedback when performing tasks and to the meaning of metadata terms. The development of a continent-wide genome archive for Africa is feasible by utilizing open source software and metadata standards to improve data discovery and reuse.
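A minimal sketch of the kind of Flask endpoint such an archive prototype might expose for registering dataset metadata is shown below. The route names and metadata fields are assumptions for illustration; in the prototype described above, file transfer would pass through the iRODS middleware and metadata would be stored in the NoSQL database rather than an in-memory dict.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
metadata_store = {}  # stand-in for the archive's NoSQL metadata database

@app.route("/datasets", methods=["POST"])
def register_dataset():
    # Accept a JSON metadata record describing an uploaded sample file.
    record = request.get_json()
    required = {"project", "sample_id", "file_name", "consent_code"}
    if not record or not required.issubset(record):
        return jsonify(error="missing required metadata fields"), 400
    metadata_store[record["sample_id"]] = record
    return jsonify(status="registered", sample_id=record["sample_id"]), 201

@app.route("/datasets/<sample_id>", methods=["GET"])
def get_dataset(sample_id):
    # Browse a single dataset's metadata by its sample identifier.
    record = metadata_store.get(sample_id)
    return (jsonify(record), 200) if record else (jsonify(error="not found"), 404)

if __name__ == "__main__":
    app.run(debug=True)
```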
6

Privacy Preserving Kin Genomic Data Publishing

Shang, Hui 16 July 2020 (has links)
No description available.
7

An Efficient Algorithm for Clustering Genomic Data

Zhou, Xuan January 2014 (has links)
No description available.
8

Functional characterization of C/D snoRNA-derived microRNAs

Lemus Diaz, Gustavo Nicolas 08 December 2017 (has links)
No description available.
9

Computational Inference of Genome-Wide Protein-DNA Interactions Using High-Throughput Genomic Data

Zhong, Jianling January 2015 (has links)
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.

We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.

We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.

Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.

This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration. / Dissertation
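As an illustration of the first framework's general idea — fitting protein-DNA interaction parameters by maximizing a correlation-based objective with a Monte Carlo search — here is a small sketch. It is not the dissertation's thermodynamic model: the coverage profile, the per-position features, and the acceptance rule are all toy stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
observed = rng.random(200)        # toy stand-in for an MNase-seq coverage profile
features = rng.random((200, 5))   # toy per-position sequence/affinity features

def predicted_occupancy(theta):
    energy = features @ theta                # linear "binding energy" per position
    return 1.0 / (1.0 + np.exp(-energy))     # squashed into an occupancy-like value

def objective(theta):
    # Correlation-based objective between predicted occupancy and observed coverage.
    return np.corrcoef(predicted_occupancy(theta), observed)[0, 1]

theta = rng.normal(scale=0.1, size=5)        # random start so occupancy is non-constant
current = objective(theta)
for _ in range(5000):
    proposal = theta + rng.normal(scale=0.05, size=5)
    score = objective(proposal)
    # Metropolis-style rule: always accept improvements, occasionally accept
    # small decreases so the search can escape local optima.
    if score > current or rng.random() < np.exp((score - current) * 50):
        theta, current = proposal, score

print(f"final correlation between model and observed profile: {current:.3f}")
```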
10

LivelyViz: an approach to develop interactive collaborative web visualizations

Bazurto Blacio, Voltaire 03 January 2017 (has links)
We investigate the development of collaborative data dashboards composed of web visualization components. For this, we explore the use of Lively Web as a development platform and provide a framework for developing collaborative scientific web visualizations. We use a modern thin-client approach that moves most of the application-specific processing logic from the client side to the server side, leveraging the implementation of reusable web services. As a web application, it provides users with multi-platform and multi-device compatibility along with enhanced concurrent access from remote locations. Our platform focuses on providing reusable, interactive, extensible and tightly integrated web visualization components. Such visualization components are designed to be readily usable in distributed-synchronous collaborative environments. As a use case, we consider the development of a dashboard for researchers working with bioinformatics datasets, in particular Poxvirus data. We argue that our thin-client approach for developing collaborative web visualizations can greatly benefit researchers in different geographic locations in their mission of analyzing datasets as a team. / Graduate
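A small sketch of the thin-client pattern this abstract describes — pushing dataset-specific processing into a server-side web service so the browser component only renders chart-ready JSON. The route, dataset and field names are hypothetical, and the sketch uses Flask rather than Lively Web's own server.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Toy stand-in for a bioinformatics dataset loaded and processed on the server.
records = [
    {"virus": "virus_A", "genome_length_kb": 190},
    {"virus": "virus_A", "genome_length_kb": 195},
    {"virus": "virus_B", "genome_length_kb": 200},
]

@app.route("/api/genome-length-summary")
def genome_length_summary():
    # Aggregate server-side; every connected dashboard client receives the same
    # summary and only has to render it, keeping the client thin.
    totals, counts = {}, {}
    for r in records:
        totals[r["virus"]] = totals.get(r["virus"], 0) + r["genome_length_kb"]
        counts[r["virus"]] = counts.get(r["virus"], 0) + 1
    summary = [{"virus": v, "mean_length_kb": totals[v] / counts[v]} for v in totals]
    return jsonify(summary)

if __name__ == "__main__":
    app.run(port=5001)
```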
