
An Application for Downloading and Integrating Molecular Biology Data

Submitted to the faculty of the University Graduate School in partial fulfillment of the requirements for the degree Master of Science
in the School of Informatics, Indiana University
July 2004
Integrating large volumes of data from diverse sources is a formidable challenge for many investigators in the field of molecular biology. Developing efficient methods for accessing and integrating this data is a major focus of investigation in the field of bioinformatics.
In early 2003, the Hereditary Genomics division of the Department of Medical and Molecular Genetics at IUPUI recognized the need for a software application that would automate many of the manual processes then used to obtain data for its research. The project had two primary objectives: 1) an application that would provide large-scale, integrated output tables to help answer questions that frequently arose in the course of the division's research, and 2) a graphical user interface (GUI) that would minimize or eliminate the need for technical expertise in computer programming or database operations on the part of the end users.
In early 2003, Indiana University (IU), IBM, and the Indiana Genomics Initiative (INGEN) introduced a new resource called Centralized Life Sciences Data Services (CLSD). CLSD is a centralized data repository that provides programmatic access to biological data that is collected and integrated from multiple public, online databases.
METHODS
1. An in-depth analysis was conducted to assess the department's data requirements and map these requirements to the data available at CLSD.
2. CLSD incorporated new data as necessary.
3. SQL was written to generate tables that would replace the targeted manual processes.
4. A DB2 client was installed in Medical and Molecular Genetics to establish remote access to CLSD.
5. A graphical user interface (GUI) was designed and implemented in HTML/CGI.
6. A Perl program was written to accept parameters from the web input form, submit queries to CLSD, and generate HTML-based output tables.
7. Validation, updates, and maintenance procedures were conducted after early prototype implementation.
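The flow described in the methods (form parameter in, SQL query, HTML table out) can be illustrated with a minimal sketch. This is not the thesis code: the original was written in Perl against a remote DB2 database at CLSD, whereas the sketch uses Python with an in-memory SQLite database purely as a stand-in. The table names, column names, and sample rows are all hypothetical placeholders and do not reflect the CLSD schema.

```python
import html
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in database (hypothetical schema, not CLSD's)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT, chromosome TEXT);
        CREATE TABLE annotation (gene_id INTEGER, source TEXT, description TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?)",
        [(1, "GENE_A", "4"), (2, "GENE_B", "4"), (3, "GENE_C", "7")],
    )
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?, ?)",
        [
            (1, "source_x", "demo <note>"),
            (2, "source_y", "demo text"),
            (3, "source_x", "other"),
        ],
    )
    return conn


def query_genes(conn, chromosome):
    """Run a parameterized join; the form value is bound, never pasted into the SQL."""
    sql = """
        SELECT g.symbol AS symbol, g.chromosome AS chromosome,
               a.source AS source, a.description AS description
        FROM gene g
        JOIN annotation a ON a.gene_id = g.gene_id
        WHERE g.chromosome = ?
        ORDER BY g.symbol, a.source
    """
    cur = conn.execute(sql, (chromosome,))
    headers = [col[0] for col in cur.description]
    return headers, cur.fetchall()


def render_html_table(headers, rows):
    """Render query results as an HTML table, escaping every cell value."""
    lines = ["<table>"]
    lines.append("<tr>" + "".join(f"<th>{html.escape(h)}</th>" for h in headers) + "</tr>")
    for row in rows:
        cells = "".join(f"<td>{html.escape(str(v))}</td>" for v in row)
        lines.append("<tr>" + cells + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = build_demo_db()
    # "4" stands in for a value submitted through the web input form.
    headers, rows = query_genes(conn, "4")
    print(render_html_table(headers, rows))
```

Binding the form value as a query parameter and escaping each cell on output are choices made for this sketch; the abstract does not say how the original program handled either.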
RESULTS AND CONCLUSIONS
This application resulted in a substantial increase in efficiency over the manual methods that were previously used for data collection. The application also allows research teams to update their data much more frequently. A high level of accuracy in the output tables was confirmed by a thorough validation process.

Identifier: oai:union.ndltd.org:IUPUI/oai:scholarworks.iupui.edu:1805/369
Date: 24 August 2005
Creators: Fontaine, Burr R.
Contributors: Foroud, Tatiana
Source Sets: Indiana University-Purdue University Indianapolis
Language: en_US
Detected Language: English
Type: Thesis
Format: 328729 bytes, application/pdf
