Cheap electronic storage and Internet bandwidth have increased the amount of online data. Large quantities of metadata are created to manage this wealth of information. Methods to organize and structure metadata have led to the development of ontologies: data organized to describe the relations between elements. The creation of large ontologies has brought forth the need for ontology management strategies. Ontology alignment and merging are standard operations for ontology management. Accurate ontology alignment methods are typically semi-automatic, meaning they require periodic user input. This becomes infeasible on large ontologies, and accuracy and efficiency drop significantly when these algorithms are forced to align without human interaction. Bioinformatics, for example, has seen an influx of large ontologies, such as signal pathway sets with thousands of elements or protein-protein interaction (PPI) databases with hundreds of thousands of elements. This drives the need for a reliable method of large-scale ontology alignment.
Many bioinformatics ontologies contain references to domain ontologies: manually curated ontologies describing additional, general information about the terms in the ontologies. For example, more than two-thirds of proteins in PPI data sets carry at least one annotation to the Gene Ontology, a domain ontology. We use the domain ontology references as features to compute similarity between elements. However, there are few efficient ways to compute similarity from structured features. We present a novel, automatic method for aligning ontologies based on such domain ontology features.
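To make the idea of annotation-based features concrete, the following minimal sketch scores two elements by the Jaccard overlap of their domain ontology annotation sets. This is an assumed, simplified measure chosen for illustration, not necessarily the similarity used in this work: it treats annotations as a flat set and ignores the structure of the domain ontology. The function name and the term labels are hypothetical.

```python
def jaccard_similarity(annotations_a, annotations_b):
    """Jaccard overlap of two sets of domain ontology term identifiers.

    Assumption: annotations are compared as flat sets; the hierarchy
    of the domain ontology is not taken into account.
    """
    a, b = set(annotations_a), set(annotations_b)
    if not a and not b:
        # Two unannotated elements share no evidence of similarity.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical term labels standing in for domain ontology identifiers.
protein_1 = {"term_1", "term_2", "term_3"}
protein_2 = {"term_2", "term_3", "term_4"}
score = jaccard_similarity(protein_1, protein_2)  # 2 shared terms out of 4 distinct
```

A flat set measure like this is cheap to compute, but it cannot tell that two different terms are close relatives in the domain ontology, which is one reason structured features are harder to compare.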
Specifically, we use simulated annealing to reduce the complexity of the domain ontology's structure by finding approximate relevant clusters of elements. An intermediate step performs hierarchical clustering based on the similarity between elements of the ontology. A mapping is then built between clusters across the ontologies being aligned. The final step builds an alignment between matched clusters.
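The last three steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of this work: the simulated annealing step is omitted, similarity is a flat Jaccard overlap of annotation sets, clustering is average-linkage with a fixed threshold, and cluster matching is greedy. All function names, the threshold value, and the example data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two annotation sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(elements, threshold):
    """Average-linkage agglomerative clustering of element names.

    Merging stops when no pair of clusters reaches the threshold.
    """
    clusters = [[name] for name in sorted(elements)]

    def linkage(c1, c2):
        total = sum(jaccard(elements[x], elements[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # combinations yields i < j, so deleting index j keeps index i valid.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def profile(members, elements):
    """Union of the annotations of all members of a cluster."""
    return set().union(*(elements[m] for m in members))


def align(source, target, threshold=0.5):
    """Cluster both ontologies, match clusters, then align matched members."""
    alignment = {}
    target_clusters = cluster(target, threshold)
    for src_cluster in cluster(source, threshold):
        src_profile = profile(src_cluster, source)
        # Greedily match the source cluster to the most similar target cluster.
        best = max(target_clusters,
                   key=lambda c: jaccard(src_profile, profile(c, target)))
        # Pair each source element with its most similar element in that cluster.
        for s in src_cluster:
            t = max(best, key=lambda m: jaccard(source[s], target[m]))
            if jaccard(source[s], target[t]) > 0:
                alignment[s] = t
    return alignment


# Hypothetical elements annotated with hypothetical domain ontology terms.
human = {"H1": {"a", "b"}, "H2": {"a", "b", "c"}, "H3": {"x", "y"}}
yeast = {"Y1": {"a", "b"}, "Y2": {"x", "y", "z"}, "Y3": {"x", "y"}}
mapping = align(human, yeast)
```

Restricting the element-level search to matched clusters is what keeps the final step cheap: each source element is compared only with the members of one target cluster rather than with every target element. Note that this sketch allows several source elements to map to the same target element.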
To evaluate our methods, we perform an alignment between human (Homo sapiens) and yeast (Saccharomyces cerevisiae) signal pathways provided by the Reactome database. The results were compared against reliable homology studies of proteins. The final mapping produces alignments that are significantly more accurate than those of traditional ontology alignment methods, without any human involvement.
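One common way to compare an alignment against a reference set of homologous pairs is precision and recall. The sketch below assumes that scoring scheme for illustration; the abstract does not state which accuracy measure was used, and the function name and the pairs shown are hypothetical.

```python
def precision_recall(predicted_pairs, reference_pairs):
    """Precision and recall of predicted pairs against reference pairs.

    Assumption: both inputs are collections of (source, target) tuples,
    and a prediction counts as correct only on an exact pair match.
    """
    predicted, reference = set(predicted_pairs), set(reference_pairs)
    correct = predicted & reference
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical predicted alignment and hypothetical reference homologs.
predicted = [("H1", "Y1"), ("H2", "Y2"), ("H3", "Y9"), ("H4", "Y8")]
reference = [("H1", "Y1"), ("H2", "Y2"), ("H5", "Y5"), ("H6", "Y6")]
precision, recall = precision_recall(predicted, reference)
```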
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:BVAU.2429/821 |
Date | 11 1900 |
Creators | Carbonetto, Andrew August |
Publisher | University of British Columbia |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | English |
Detected Language | English |
Type | Electronic Thesis or Dissertation |