Global ETD Search

1	The Protein Binding Potential of C2H2 Zinc Finger Domains Brayer, Kathryn Jo January 2008 (has links) Cys2-His2 (C2H2) zinc finger domains were originally identified as DNA-binding domains, and uncharacterized domains are typically assumed to bind DNA. However, a growing body of evidence suggests an important and widespread role for these domains in protein binding: over 100 C2H2 zinc finger-protein interactions have been described. This study used common bioinformatics tools to try to identify sequence features that predict a DNA- or protein-binding function, but several issues, including uncertainty about the full functional capabilities of zinc fingers, complicated these efforts. Therefore, an unbiased approach that directly examined the potential of zinc fingers to mediate DNA or protein interactions was used to determine the full functional capabilities of the C2H2 domains in two model proteins: human OLF-1/EBF-associated zinc finger (OAZ) protein and Zif268. OAZ contains 30 zinc fingers in six clusters, some of which have previously been implicated in DNA or protein interactions. Zif268 is a well-known DNA-binding protein with three C2H2 domains. DNA binding was assessed using a target site selection (CAST) assay, and protein binding using a yeast two-hybrid assay. The results indicate that clusters known to bind DNA could also mediate specific protein interactions, whereas clusters known to bind protein did not mediate specific DNA interactions, suggesting that DNA binding is a more restricted function of zinc fingers than previously recognized. These results also suggest that the role of C2H2 zinc finger domains in protein interactions has probably been underestimated. The implications of these findings for predicting zinc finger function are discussed. transcription factors protein-DNA interactions protein-protein interactions protein chemistry structural biology functional annotations
2	A Method for Integrating Heterogeneous Datasets based on GO Term Similarity Thanthiriwatte, Chamali Lankara 11 December 2009 (has links) This thesis presents a method for integrating heterogeneous gene/protein datasets at the functional level based on Gene Ontology (GO) term similarity. Biologists often want to integrate heterogeneous datasets obtained from different biological samples. A major challenge in this process is how to link the datasets. Currently, the most common approach is to link them through shared reference database identifiers, which tends to yield only a small number of matching identifiers because there is no standard accession scheme. As a result, biologists may miss biological phenomena that become apparent only when the data are combined, seeing only what each dataset reveals individually. We discuss an approach for integrating heterogeneous datasets by computing the similarity among their genes and proteins based on the similarity of their GO annotations. We then group the genes and/or proteins with similar annotations by applying a hierarchical clustering algorithm. The results provide a more comprehensive understanding of the biological processes involved. Functional Annotations Similarity Matrix Transcriptomics Hierarchical Clustering Gene Ontology Proteomics Semantic Similarity Gene Expression Protein Expression
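A minimal Python sketch of the idea in entry 2: annotation-based similarity between genes and proteins, followed by agglomerative (average-linkage) hierarchical clustering. Assumptions: similarity is approximated here by the Jaccard index of each item's GO term set, a simple stand-in for the semantic similarity measures the thesis refers to; the function names and the stopping `threshold` are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two GO term sets: |A & B| / |A | B|, or 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_linkage_clusters(annotations: dict[str, set[str]], threshold: float) -> list[set[str]]:
    """Agglomerative clustering of annotated items (genes or proteins).

    Starts with one cluster per item and repeatedly merges the pair of clusters
    with the highest average pairwise Jaccard similarity, stopping once no pair
    reaches `threshold`.
    """
    clusters = [{item} for item in sorted(annotations)]

    def average_similarity(c1: set[str], c2: set[str]) -> float:
        sims = [jaccard(annotations[x], annotations[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Find the most similar pair of current clusters (i < j).
        (i, j), best = max(
            (((i, j), average_similarity(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair_and_sim: pair_and_sim[1],
        )
        if best < threshold:
            break
        merged = clusters.pop(j)  # j > i, so index i still points at the same cluster
        clusters[i] |= merged
    return clusters
```

Items from different datasets (for example a transcript identifier and a protein accession) land in the same cluster when their GO annotations overlap, even though no shared database identifier links them.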
3	Phaeodactylum tricornutum genome and epigenome: characterization of natural variants Rastogi, Achal 27 October 2016 (has links) Since the discovery of Phaeodactylum tricornutum by Bohlin in 1897, its classification within the tree of life has been controversial. In 1958, Lewin, using oval and fusiform morphotypes, described multiple characteristic features of this species that resemble diatom structure, ending the debate over whether P. tricornutum should be classified as a member of the Bacillariophyceae. To date, three morphotypes (oval, fusiform and triradiate) of Phaeodactylum tricornutum have been observed. Over approximately 100 years, from 1908 to 2000, 10 strains of Phaeodactylum tricornutum (referred to as ecotypes) have been collected and stored in algal culture collections, either axenically or together with their natural bacterial populations, and cryopreserved where possible. Various cellular and molecular tools have been established to dissect and understand the physiology and evolution of P. tricornutum and of diatoms in general. Thanks to decades of research and the efforts of many laboratories, P. tricornutum is now considered a model diatom species. My thesis focuses mainly on the genetic and epigenetic makeup of the P. tricornutum genome and on the underlying morphological and physiological diversity within natural ecotype populations sampled at different locations around the globe. To do so, I established the epigenetic landscape of the P. tricornutum genome using various histone post-translational modification (PTM) marks (chapters 1 and 2) and compared the natural variation in the distribution of some key histone PTMs between two ecotype populations (chapter 4). We also generated a genome-wide genetic diversity map across 10 ecotypes of P. tricornutum, revealing the presence of a species complex within the genus Phaeodactylum as a consequence of ancient hybridization (chapter 3). Based on evidence from many previous reports and similar observations within P. tricornutum, we propose natural hybridization as a plausible basis for explaining the species diversity within the diatom clade. Moreover, we updated the functional and structural annotations of the P. tricornutum genome (Phatr3, chapter 2) and developed a user-friendly software tool to fetch CRISPR/Cas9 targets, the basis for knockout studies using the CRISPR/Cas9 genome-editing protocol, in 13 phytoplankton genomes including P. tricornutum (chapter 5). To accomplish all this, I used various state-of-the-art technologies such as mass spectrometry, ChIP sequencing, whole-genome sequencing, RNA sequencing, CRISPR genome-editing protocols, and several computational software tools and pipelines. In brief, the thesis provides a comprehensive platform for future epigenetic, genetic and functional molecular studies in diatoms using Phaeodactylum tricornutum as a model. The work adds value to current diatom research by addressing questions not previously asked and opens new directions for research, particularly in epigenetics, which plays an important but still underappreciated role in the ecological success of diatoms in the modern ocean. Phaeodactylum tricornutum Diatoms Epigenetics Population genomics Morphogenesis Software Functional annotations H3K27me3 CRISPR/Cas9 570 004
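To illustrate what fetching CRISPR/Cas9 targets involves, here is a minimal Python sketch that scans both strands of a sequence for 20-nt protospacers immediately followed by an NGG PAM. It assumes the SpCas9 PAM and a 20-nt spacer; it is not the thesis's software, and the function names are hypothetical. A practical tool would also score off-target sites across the whole genome, which this sketch does not attempt.

```python
import re

# Complement table for DNA; N (any base) stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cas9_targets(seq: str, spacer_len: int = 20) -> list[tuple[str, int, str]]:
    """List candidate SpCas9 sites as (strand, start, protospacer + PAM).

    `start` is 0-based within the strand's own 5'->3' sequence, i.e. within
    the reverse complement for '-' hits. Only exact NGG PAMs are considered.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping sites are all reported.
    site = re.compile(rf"(?=([ACGT]{{{spacer_len}}}[ACGT]GG))")
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in site.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1)))
    return hits
```

Scanning the reverse complement as well matters because a PAM on the opposite strand appears as CCN upstream of the protospacer when reading the forward strand.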
4	Evolution von ontologiebasierten Mappings in den Lebenswissenschaften / Evolution of ontology-based mappings in the life sciences Groß, Anika 19 March 2014 (has links) (PDF) In the life sciences, an increasing number of heterogeneous data sources need to be integrated and combined in comprehensive analysis tasks. Ontologies and other structured vocabularies are often used to provide a formal representation of knowledge and to facilitate data exchange between different applications. Ontologies are used in domains such as molecular biology and chemistry; one of their most important applications is the annotation of real-world objects such as genes or publications. Since different ontologies can contain overlapping knowledge, it is necessary to determine mappings between them (ontology mappings). Creating mappings manually can be very time-consuming or even infeasible for large ontologies, so (semi-)automatic ontology matching methods are typically applied. Ontologies are not static; they are continuously modified in response to new research insights and changing user requirements. The evolution of ontologies can affect dependent data such as annotation mappings and ontology mappings, which then need to be updated. This thesis presents novel methods and algorithms to deal with the evolution of ontology-based mappings, using and extending the generic infrastructure GOMMA to manage and analyze the evolution of ontologies and mappings. First, a comparative evolution analysis of ontologies and mappings from three life science domains shows substantial changes in ontologies and mappings, with an intensity that depends on the domain, as well as a clear impact of ontology changes on the mappings. Hence, existing ontology mappings can become invalid and need to be migrated to current ontology versions, ideally without an expensive recomputation of the mappings. This thesis introduces two generic algorithms to (semi-)automatically adapt ontology mappings: (1) a composition-based adaptation that relies on the principle of mapping composition, and (2) a diff-based adaptation algorithm that handles individual change operations to update mappings. Both approaches reuse unaffected, already confirmed mapping parts and adapt only the parts affected by changes. An evaluation on very large biomedical ontologies and mappings shows that both approaches produce ontology mappings of high quality. Similarly, ontology changes may also affect ontology-based annotation mappings. The thesis introduces a generic approach to assess the quality of annotation mappings based on their evolution. Different quality measures allow for the identification of reliable annotations, e.g., based on their stability or provenance information. A comprehensive analysis of large annotation data sources shows numerous instabilities, e.g., due to the temporary deletion of annotations. Such modifications may influence the results of dependent applications such as functional enrichment analyses, which describe experimental data in terms of ontological groupings. The question arises to what degree ontology and annotation changes may affect such analyses. Using different stability measures, the evaluation assesses the change intensity of application results and indicates whether users should expect significant changes in their analysis results. Moreover, GOMMA is extended with large-scale ontology matching techniques. Such techniques are useful, among other things, to match new concepts during ontology mapping adaptation. Many existing match systems do not scale to aligning very large ontologies such as those from the life science domain. One efficient composition-based approach computes ontology mappings indirectly by reusing and combining existing mappings to intermediate (mediator) ontologies. Intermediate ontologies can contain useful background knowledge, so the mapping quality can improve compared to a direct match approach. Moreover, the thesis introduces general strategies for matching ontologies in parallel using several computing nodes. A size-based partitioning of the input ontologies enables good load balancing and scalability, since smaller match tasks can be processed in parallel. The evaluation within the Ontology Alignment Evaluation Initiative (OAEI) compares GOMMA with other systems for matching ontologies from different domains. Using parallel and composition-based matching, GOMMA can achieve very good results with respect to efficiency and effectiveness, especially for ontologies from the life science domain. ontology ontologies ontology mapping ontology alignment ontology evolution mapping migration mapping change mapping composition ontology matching ontology change ontology development mapping adaptation mediator ontology biomedical ontologies UMLS FMA SNOMED CT Adult Mouse Anatomy Gene Ontology functional annotations term enrichment analysis ddc:570
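Mapping composition, which underlies both the composition-based adaptation and the mediator-based matching described in entry 4, can be sketched as a join over a shared intermediate concept. This Python sketch is a generic illustration, not GOMMA's implementation: mappings are assumed to be dicts from (source, target) concept pairs to a similarity in [0, 1], path scores are multiplied, and the best path is kept when several intermediates connect the same pair. The type alias and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical representation: (source concept, target concept) -> similarity in [0, 1]
OntologyMapping = dict[tuple[str, str], float]


def compose(m1: OntologyMapping, m2: OntologyMapping) -> OntologyMapping:
    """Compose m1 (A -> B) with m2 (B -> C) into a mapping A -> C."""
    # Index m2 by its source concept so each m1 correspondence is joined in one lookup.
    by_source = defaultdict(list)
    for (b, c), sim in m2.items():
        by_source[b].append((c, sim))

    result: OntologyMapping = {}
    for (a, b), sim1 in m1.items():
        for c, sim2 in by_source.get(b, []):
            score = sim1 * sim2  # assumed combination function for a path a -> b -> c
            # Keep the best-scoring path when several intermediates link a and c.
            if (a, c) not in result or score > result[(a, c)]:
                result[(a, c)] = score
    return result
```

For adaptation, `m1` would be a (hypothetical) evolution mapping from a new ontology version to the old one and `m2` the existing, already confirmed mapping; for indirect matching, `m1` maps the source ontology to a mediator ontology and `m2` maps the mediator to the target.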
5	Evolution von ontologiebasierten Mappings in den Lebenswissenschaften / Evolution of ontology-based mappings in the life sciences Groß, Anika 05 March 2014 (has links) info:eu-repo/classification/ddc/570 ddc:570
