Global ETD Search

Biological and clinical data integration and its applications in healthcare

Answers to the most complex biological questions are rarely determined solely from experimental evidence. They require subsequent analysis of many data sources that are often heterogeneous. Most biological data repositories focus on providing only one particular type of data, such as sequences, molecular interactions, protein structure, or gene expression. In many cases, researchers must visit several different databases to answer a single scientific question. It is therefore essential to develop efficient and seamless strategies for integrating disparate biological data sources, both to facilitate the discovery of novel associations and to validate existing hypotheses.

This thesis presents the design and development of different strategies for integrating biological and clinical systems. The BioSPIDA system is a data warehousing solution that integrates many NCBI databases and other biological sources on protein sequences, protein domains, and biological pathways. It uses a universal parser that facilitates integration without requiring separate source code for each data site. This enables users to execute fine-grained queries that filter genes by their protein interactions, gene expression, functional annotation, and protein domain representation. Relational databases can quickly generate and return filtered results for research questions, but they are not the most suitable solution in all cases. Clinical patients and genes are typically annotated with concepts from hierarchical ontologies, and the performance of relational databases degrades considerably when traversing and representing graph structures. This thesis illustrates when relational databases are most suitable and compares the performance of semantic web technologies and graph databases when comparing ontological concepts.
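As a rough illustration of what comparing ontological concepts involves, the sketch below walks a small hierarchy upward and scores two concepts by the overlap of their ancestor sets. It is not the method used in the thesis: the toy ontology, the names `PARENTS`, `ancestors`, and `similarity`, and the choice of a Jaccard score are all assumptions made for this example.

```python
from collections import deque

# Hypothetical toy ontology (not from the thesis): each concept maps to
# its direct parent concepts.
PARENTS = {
    "disease": [],
    "cardiovascular disease": ["disease"],
    "metabolic disease": ["disease"],
    "heart failure": ["cardiovascular disease"],
    "hypertension": ["cardiovascular disease"],
    "type 2 diabetes": ["metabolic disease"],
}


def ancestors(concept, parents=PARENTS):
    """Return the concept itself plus every concept reachable via parent links."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def similarity(a, b, parents=PARENTS):
    """Jaccard overlap of the two ancestor sets (an assumed, simple measure)."""
    anc_a = ancestors(a, parents)
    anc_b = ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


if __name__ == "__main__":
    # Siblings under the same parent share more ancestors than distant concepts.
    print(similarity("heart failure", "hypertension"))
    print(similarity("heart failure", "type 2 diabetes"))
```

Each call follows parent links to an arbitrary depth. In a relational schema the same walk typically needs a recursive query or repeated self-joins over a parent-child table, which is the kind of graph traversal cost the abstract refers to.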

Several approaches to analyzing integrated data are discussed to demonstrate the advantages of a local repository over dependence on remote data centers. Intensive care patients are prioritized by their length of stay, and their severity class is estimated from their diagnosis, to help minimize wait time and preferentially treat patients according to their condition. In a separate study, semantic clustering of patients is conducted by integrating a clinical database and a medical ontology to help identify multi-morbidity patterns.
In the biological area, gene pathways, protein interaction networks, and functional annotation are integrated to help predict and prioritize candidate disease genes. This thesis presents the results generated from each project by using a local repository of genes, functional annotations, protein interactions, clinical patients, and medical ontologies.

http://hdl.handle.net/1853/54267

Biological database integration

Clinical data warehouse

Candidate gene prioritization

Hospital prioritization

Patient

Machine learning

Identifier	oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/54267
Date	07 January 2016
Creators	Hagen, Matthew
Contributors	Lee, Eva K.
Publisher	Georgia Institute of Technology
Source Sets	Georgia Tech Electronic Thesis and Dissertation Archive
Language	en_US
Detected Language	English
Type	Dissertation
Format	application/pdf
