11

Standardizing our perinatal language to facilitate data sharing

Massey, Kiran Angelina 05 1900 (has links)
Our ultimate goal as obstetric and neonatal care providers is to improve care for mothers and their babies. Continuous quality improvement (CQI) involves iterative cycles of practice change and audit of ongoing clinical care, identifying practices that are associated with good outcomes. A vital prerequisite to this evidence-based medicine is data collection. In Canada, much of the country is covered by separate, fragmented silos known as regional reproductive care databases or perinatal health programs. A more centralized system that includes collaborative efforts is required. Moving in this direction would serve many purposes: efficiency, economy in a setting of limited resources and shrinking budgets, and interaction among data collection agencies. This interaction may facilitate the translation and transfer of knowledge to caregivers and patients. There are, however, many barriers to such collaborative efforts, including privacy, ownership, and the standardization of both digital technologies and semantics. After thoroughly examining existing perinatal data collection among Perinatal Health Programs (PHPs) and the Canadian Perinatal Network (CPN) database, it was evident that there is little standardization of definitions. This is one of the most important barriers to data sharing. To communicate effectively and share data, researchers and clinicians alike must construct a common perinatal language. Communicative tools and programs such as SNOMED CT® offer a potential solution, but still require much work because of their infancy. A standardized perinatal language would not only lay the definitional foundation in women’s health and obstetrics but also serve as a major contribution towards a universal electronic health record. / Medicine, Faculty of / Obstetrics and Gynaecology, Department of / Graduate
13

Evaluation of Terminology servers for use with SNOMED CT

Wassing, Daniel January 2020 (has links)
Electronic healthcare still suffers from a lack of unified semantics and classification of patient data. Electronic health records may not be processed properly across systems and countries, and when they are, data is often lost in conversion. One way to tackle this problem is proper data classification using terminologies, together with terminology servers that map between them. This thesis focuses on evaluating a set of open-source terminology servers with respect to usability and performance, examining how the SNOMED CT terminology can be used and worked with through each server. A heuristic evaluation measures a set of criteria based on defined properties of the servers and related artifacts. In this way, a candidate server is identified for integration with COSMIC, CAMBIO's electronic health record system, which aims to conform to the idea of a global electronic health record standard.
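To make this kind of usability check concrete, the sketch below queries a FHIR-based terminology server for SNOMED CT content using the standard CodeSystem/$lookup and ValueSet/$expand operations (the latter with an ECL-defined implicit value set). It is illustrative only: the base URL is hypothetical, and the specific servers and evaluation criteria used in the thesis may differ.

```python
# Minimal sketch: querying a FHIR terminology server for SNOMED CT content.
# The base URL is hypothetical; the thesis's actual servers and criteria may differ.
import requests

BASE = "https://terminology.example.org/fhir"  # hypothetical server endpoint

# Look up a single SNOMED CT concept (22298006 |Myocardial infarction|).
lookup = requests.get(
    f"{BASE}/CodeSystem/$lookup",
    params={"system": "http://snomed.info/sct", "code": "22298006"},
    headers={"Accept": "application/fhir+json"},
    timeout=10,
)
print(lookup.json().get("parameter", []))

# Expand an implicit value set defined by an ECL expression
# (all descendants of 404684003 |Clinical finding|), returning the first 10 matches.
expand = requests.get(
    f"{BASE}/ValueSet/$expand",
    params={"url": "http://snomed.info/sct?fhir_vs=ecl/<<404684003", "count": 10},
    headers={"Accept": "application/fhir+json"},
    timeout=10,
)
for item in expand.json().get("expansion", {}).get("contains", []):
    print(item["code"], item["display"])
```

Timing such calls (e.g., with time.perf_counter around each request) would give one crude performance signal of the kind such an evaluation might record alongside its usability criteria.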
14

Extracting Structured Data from Free-Text Clinical Notes : The impact of hierarchies in model training / Utvinna strukturerad data från fri-text läkaranteckningar : Påverkan av hierarkier i modelträning

Omer, Mohammad January 2021 (has links)
Diagnosis code assignment is the task of automatically assigning diagnosis codes to free-text clinical notes. Assigning diagnosis codes manually requires expertise and time; doing it automatically makes it easier to extract structured data from free-text clinical notes in Electronic Health Records. It can also serve as decision support for clinicians, who can input their notes and receive diagnosis codes as a second opinion. This project investigates the effect of using the hierarchies in which diagnosis codes are structured when training diagnosis code assignment models, compared with models trained with a standard loss function, binary cross-entropy. This was done using the hierarchies of two coding systems, ICD-9 and SNOMED CT, where one hierarchy is more detailed than the other. The results showed that hierarchical training increased the recall of the models regardless of which hierarchy was used. The more detailed hierarchy, SNOMED CT, increased recall more than the less detailed ICD-9 hierarchy did. However, with the SNOMED CT hierarchy the precision of the models decreased, while the differences in precision with the ICD-9 hierarchy were not statistically significant. The increase in recall did not make up for the decrease in precision when training with the SNOMED CT hierarchy, as measured by the F1-score, the harmonic mean of the two metrics. The conclusion is that a more detailed hierarchy increases recall more than a less detailed one, but overall performance measured by F1-score decreases because precision drops by more than recall gains. A less detailed hierarchy maintains precision, giving an increase in overall performance.
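As a rough illustration of how such a hierarchy can enter training, the sketch below expands each note's label set to include ancestor codes before computing binary cross-entropy, and compares it with plain binary cross-entropy on the original labels. It is a minimal sketch under stated assumptions: the four-code toy hierarchy is invented, and the thesis's actual hierarchical loss may be formulated differently.

```python
# Illustrative sketch only: one common way to inject a code hierarchy into training
# is to propagate each assigned code to its ancestors before computing binary
# cross-entropy. The tiny hierarchy below is made up for demonstration.
import torch

# Toy hierarchy: code index -> parent code index (None for roots).
PARENT = {0: None, 1: 0, 2: 0, 3: 2}   # e.g., code 3 is a leaf under 2 under 0

def expand_to_ancestors(labels: torch.Tensor) -> torch.Tensor:
    """Return a copy of the multi-hot label matrix with ancestor codes switched on."""
    expanded = labels.clone()
    for child in PARENT:
        node = child
        while PARENT[node] is not None:
            node = PARENT[node]
            expanded[:, node] = torch.maximum(expanded[:, node], labels[:, child])
    return expanded

loss_fn = torch.nn.BCEWithLogitsLoss()

def hierarchical_bce(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """BCE against labels expanded along the hierarchy instead of the leaf labels only."""
    return loss_fn(logits, expand_to_ancestors(labels))

# Example: 2 notes, 4 codes; note 0 has code 3, note 1 has code 1.
logits = torch.randn(2, 4)
labels = torch.tensor([[0., 0., 0., 1.], [0., 1., 0., 0.]])
print(hierarchical_bce(logits, labels), loss_fn(logits, labels))  # hierarchical vs. plain BCE
```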
15

Formalizing biomedical concepts from textual definitions

Petrova, Alina, Ma, Yue, Tsatsaronis, George, Kissa, Maria, Distel, Felix, Baader, Franz, Schroeder, Michael 07 January 2016 (has links) (PDF)
BACKGROUND: Ontologies play a major role in life sciences, enabling a number of applications, from new data integration to knowledge verification. SNOMED CT is a large medical ontology that is formally defined so that it ensures global consistency and supports complex reasoning tasks. Most biomedical ontologies and taxonomies, on the other hand, define concepts only textually, without the use of logic. Here, we investigate how to automatically generate formal concept definitions from textual ones. We develop a method that uses machine learning in combination with several types of lexical and semantic features and outputs formal definitions that follow the structure of SNOMED CT concept definitions. RESULTS: We evaluate our method on three benchmarks and test both the underlying relation extraction component and the overall quality of the output concept definitions. In addition, we provide an analysis of the following aspects: (1) How do definitions mined from the Web and literature differ from those mined from manually created definitions, e.g., MeSH? (2) How do different feature representations, e.g., the restrictions of relations' domain and range, affect the quality of the generated definitions? (3) How do different machine learning algorithms compare for the task of formal definition generation? (4) What is the influence of training data size on the task? We discuss all of these settings in detail and show that the suggested approach can achieve success rates of over 90%. In addition, the results show that the choice of corpora, lexical features, learning algorithm and data size do not impact performance as strongly as semantic types do. Semantic types limit the domain and range of a predicted relation, and as long as relations' domain and range pairs do not overlap, this information is most valuable in formalizing textual definitions. CONCLUSIONS: The analysis presented in this manuscript implies that automated methods can provide a valuable contribution to the formalization of biomedical knowledge, thus paving the way for future applications that go beyond retrieval and into complex reasoning. The method is implemented and accessible to the public from: https://github.com/alifahsyamsiyah/learningDL.
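The sketch below illustrates only the final assembly step described above: relations extracted from a textual definition are filtered by semantic-type domain/range constraints and combined into an EL-style definition resembling SNOMED CT's. The relation names, semantic types and the example concept are illustrative assumptions, not data or code from the paper, and the machine learning component is not reproduced.

```python
# Minimal sketch of the assembly step: turning relations extracted from a textual
# definition into a SNOMED CT-style formal definition, keeping only relations whose
# domain/range semantic types are compatible (the constraint the paper found most
# valuable). All names below are illustrative, not taken from the paper's data.
from dataclasses import dataclass

# Allowed (relation, domain type, range type) combinations.
SEMANTIC_CONSTRAINTS = {
    ("finding_site", "disorder", "body structure"),
    ("causative_agent", "disorder", "organism"),
}

@dataclass
class ExtractedRelation:
    relation: str
    filler: str        # the related concept mentioned in the text
    filler_type: str   # predicted semantic type of the filler

def formalize(concept: str, concept_type: str, parent: str,
              extracted: list[ExtractedRelation]) -> str:
    """Build an EL-style definition: Parent AND (rel SOME filler) for each valid relation."""
    kept = [r for r in extracted
            if (r.relation, concept_type, r.filler_type) in SEMANTIC_CONSTRAINTS]
    conjuncts = [parent] + [f"({r.relation} SOME {r.filler})" for r in kept]
    return f"{concept} EquivalentTo: " + " AND ".join(conjuncts)

# "Viral pneumonia: infection of the lung caused by a virus."
relations = [
    ExtractedRelation("finding_site", "Lung structure", "body structure"),
    ExtractedRelation("causative_agent", "Virus", "organism"),
    ExtractedRelation("causative_agent", "Lung structure", "body structure"),  # filtered out
]
print(formalize("Viral pneumonia", "disorder", "Infectious pneumonia", relations))
```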
19

Evolution von ontologiebasierten Mappings in den Lebenswissenschaften / Evolution of ontology-based mappings in the life sciences

Groß, Anika 19 March 2014 (has links) (PDF)
In the life sciences, there is an increasing number of heterogeneous data sources that need to be integrated and combined in comprehensive analysis tasks. Often, ontologies and other structured vocabularies are used to provide a formal representation of knowledge and to facilitate data exchange between different applications. Ontologies are used in different domains like molecular biology or chemistry; one of their most important applications is the annotation of real-world objects like genes or publications. Since different ontologies can contain overlapping knowledge, it is necessary to determine mappings between them (ontology mappings). Manual mapping creation can be very time-consuming or even infeasible, so (semi-) automatic ontology matching methods are typically applied. Ontologies are not static but undergo continuous modification due to new research insights and changing user requirements. The evolution of ontologies can have an impact on dependent data like annotation or ontology mappings. This thesis presents novel methods and algorithms to deal with the evolution of ontology-based mappings. The generic infrastructure GOMMA is used and extended to manage and analyze the evolution of ontologies and mappings. First, a comparative evolution analysis for ontologies and mappings from three life science domains shows heavy changes in ontologies and mappings as well as an impact of ontology changes on the mappings. Hence, existing ontology mappings can become invalid and need to be migrated to current ontology versions, while an expensive redetermination of the mappings should be avoided. The thesis introduces two generic algorithms to (semi-) automatically adapt ontology mappings: (1) a composition-based adaptation that relies on the principle of mapping composition, and (2) a diff-based adaptation algorithm that handles change operations individually when updating mappings. Both approaches reuse unaffected mapping parts and adapt only the affected parts of the mappings. An evaluation on very large biomedical ontologies and mappings shows that both approaches produce ontology mappings of high quality. Similarly, ontology changes may also affect ontology-based annotation mappings. The thesis introduces a generic evaluation approach to assess the quality of annotation mappings based on their evolution. Different quality measures allow for the identification of reliable annotations, e.g., based on their stability or provenance information. A comprehensive analysis of large annotation data sources shows numerous instabilities, e.g., due to the temporary absence of annotations. Such modifications may influence the results of dependent applications such as functional enrichment analyses, which describe experimental data in terms of ontological groupings. The question arises to what degree ontology and annotation changes may affect such analyses. Based on different stability measures, the evaluation assesses the change intensity of application results and gives insight into whether users need to expect significant changes in their analysis results. Moreover, GOMMA is extended by large-scale ontology matching techniques. Such techniques are useful, among other things, to match new concepts during ontology mapping adaptation. Many existing match systems do not scale to aligning very large ontologies, e.g., from the life science domain. An efficient composition-based approach indirectly computes ontology mappings by reusing and combining existing mappings to intermediate ontologies. Intermediate ontologies can contain useful background knowledge, so that the mapping quality can be improved compared to a direct match approach. The thesis also introduces general strategies for matching ontologies in parallel using several computing nodes. A size-based partitioning of the input ontologies enables good load balancing and scalability since smaller match tasks can be processed in parallel. The evaluation within the Ontology Alignment Evaluation Initiative (OAEI) compares GOMMA and other systems for matching ontologies from different domains. Using parallel and composition-based matching, GOMMA achieves very good results with respect to the efficiency and effectiveness of matching, especially for ontologies from the life science domain.
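As a rough sketch of the composition principle mentioned above, the snippet below migrates an existing ontology mapping to a new ontology version by composing it with a version mapping derived from the ontology diff, so that unaffected correspondences are simply carried over. Concept identifiers and scores are invented for illustration; GOMMA's actual adaptation algorithms handle far more change types and are not reproduced here.

```python
# Sketch of composition-based mapping adaptation: an existing mapping from an old
# ontology version to a target ontology is composed with the version mapping
# (old -> new concepts), reusing unaffected correspondences. All identifiers and
# scores below are illustrative.

# Existing mapping: (source concept in version v1, target concept) -> similarity score
mapping_v1 = {("A:0001", "B:0100"): 1.0, ("A:0002", "B:0200"): 0.9}

# Version mapping derived from the ontology diff: v1 concept -> list of (v2 concept, score).
# A:0001 is unchanged; A:0002 was split into A:0005 and A:0006.
version_map = {"A:0001": [("A:0001", 1.0)], "A:0002": [("A:0005", 0.8), ("A:0006", 0.8)]}

def compose(mapping, evolution):
    """Migrate a mapping to the new ontology version by composing correspondences."""
    adapted = {}
    for (src_old, tgt), score in mapping.items():
        for src_new, evo_score in evolution.get(src_old, []):
            key = (src_new, tgt)
            adapted[key] = max(adapted.get(key, 0.0), score * evo_score)
    return adapted

adapted = compose(mapping_v1, version_map)
for (src, tgt), score in sorted(adapted.items()):
    print(f"{src} -> {tgt}  (score {score:.2f})")
```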
