The term "toxico-chemogenomics" is used to convey the extension of toxicogenomics to a broader survey of gene expression changes across chemical space. Moving towards an improved, publicly available toxico-chemogenomics capability requires not only common data standards and protocols across public resources, but also broad data coverage within the chemical, genomics and toxicological information domains, and transparent and functional linkages of Internet data resources. The first goal of this project was to assess the current extent of standardization, interoperability, and chemical indexing of public genomics resources with respect to toxico-chemogenomics utility. The second goal, focusing on the two largest of these public data resources, Gene Expression Omnibus (GEO) and ArrayExpress, was to chemically index the full experimental content of these repositories in order to assess the current coverage of chemical exposure-related microarray experiments in relation to chemical space and toxicology, and to make these data accessible in relation to other publicly available, chemically indexed toxicological information. Current standards for chemical annotation within ArrayExpress and GEO are inadequate to this task, so new methodologies for mining the author-submitted content had to be developed. A series of automated Perl programs was used, along with extensive manual review, to transform the raw experiment/study descriptions and text files into a standardized, chemically indexed inventory of microarray experiments in both resources. These files and top-level experiment annotations allowed for identification of all current chemical-associated experimental content, as well as the subset of chemical exposure-related (or "Treatment") content deemed most relevant to toxicogenomics, in the GEO Series and ArrayExpress Repository experiment inventories.
With chemical exposure experiments suitably indexed by chemical structure, it is possible for the first time to assess the breadth of chemical study space represented in these databases, as well as the overlapping chemical content, and to begin to assess the sufficiency of data for making chemical similarity inferences. Chemical indexing of public genomics databases is also the first step towards integrating chemical, toxicological and genomics data into predictive toxicology by providing linkages across public resources. The main products of this effort include the following: (1) published, downloadable and structure-searchable DSSTox Structure-Index (Locator) files for both the GEO Series (GEOGDS) and ArrayExpress Repository (ARYEXP), containing standard chemical fields for the unique chemical "Treatment" subset, accompanied by URLs to AccessionID experiment pages in GEO and ArrayExpress; (2) published, downloadable DSSTox Aux data files for GEOGDS and ARYEXP, providing a chemical-experiment pair index to all chemical-associated content in each resource and containing 14 standard genomics fields (e.g., Experiment_Title, Experiment_Description, Experiment_ArrayType, Species, Number_Samples) and source-specific fields extracted from each resource (e.g., MIAME_Protocol and MIAME_Factors for ArrayExpress); and (3) incorporation of the "Treatment" chemical-experiment pair index, with URLs linked directly to AccessionID pages for GEO and ArrayExpress, into the National Center for Biotechnology Information (NCBI) PubChem resource. The secondary product of this effort is a methodological discussion of the proper use of public microarray data, with a demonstrative analysis of how the newly identified public microarray data might be used.
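As a rough illustration of what a chemical-experiment pair index involves, the sketch below scans free-text experiment descriptions for known chemical names and emits one (accession, standardized name) pair per match. It is a minimal Python sketch under stated assumptions, not the Perl pipeline used in the thesis: the synonym table, the `index_experiments` function, and the `EXP-000x` accession IDs are all hypothetical placeholders, and the manual review step described above is not modeled.

```python
import re

# Hypothetical synonym table: lower-case name as it may appear in a
# description -> standardized chemical name used in the index.
SYNONYMS = {
    "acetaminophen": "acetaminophen",
    "paracetamol": "acetaminophen",
    "benzene": "benzene",
}

# One alternation pattern, longest names first, matched on word boundaries
# so that "benzene" does not match inside "benzenesulfonate".
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(n) for n in sorted(SYNONYMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def index_experiments(experiments):
    """Return sorted, unique (accession_id, standardized_name) pairs.

    `experiments` maps an accession ID to its free-text description.
    """
    pairs = set()
    for accession_id, description in experiments.items():
        for match in _PATTERN.finditer(description):
            pairs.add((accession_id, SYNONYMS[match.group(1).lower()]))
    return sorted(pairs)


if __name__ == "__main__":
    # Placeholder records, not real GEO or ArrayExpress entries.
    demo = {
        "EXP-0001": "Rat liver after Paracetamol exposure, 24 h.",
        "EXP-0002": "Time course of benzene and acetaminophen treatment.",
        "EXP-0003": "Untreated wild-type control series.",
    }
    for pair in index_experiments(demo):
        print(pair)
```

Name matching of this kind only finds chemicals already present in the synonym table and cannot tell a treatment from an incidental mention, which is one reason an automated pass would still need manual review.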
|Date||16 December 2008|
|Creators||Williams-DeVane, ClarLynda Raynell|
|Contributors||David Muddiman, Dahlia Nielsen, Ann Richard, Jose Alonso, Steffen Heber|
|Source Sets||North Carolina State University|
|Rights||unrestricted, I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to NC State University or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.|