11 |
Using Background Knowledge to Enhance Biomedical Ontology Matching / Utilisation des ressources de connaissances externes pour améliorer l'alignement d'ontologies biomédicales
Annane, Amina 29 October 2018
Life sciences produce huge amounts of data (e.g., clinical trials and scientific articles). Integrating and analyzing the datasets related to a given research question, such as the correlation between phenotypes and genotypes, is a key element of knowledge discovery. The life sciences community has adopted Semantic Web technologies to achieve data integration and interoperability, especially ontologies, which are the key technology for representing and sharing the increasing amount of data on the Web. Ontologies provide a common domain vocabulary for humans and formal entity definitions for machines.

A large number of biomedical ontologies and terminologies have been developed to represent and annotate various datasets. However, datasets represented with different overlapping ontologies (that is, ontologies with parts in common) are not interoperable. It is therefore crucial to establish correspondences between the ontologies used, an active area of research known as ontology matching.

Early ontology matching methods mainly exploited the lexical and structural content of the ontologies to align. These methods are less effective when the ontologies to align are lexically heterogeneous, i.e., when equivalent concepts are described with different labels. To overcome this issue, the ontology matching community has turned to external knowledge resources as a semantic bridge between the ontologies to align. This approach raises several new research questions, mainly: (1) the selection of these background resources and (2) the exploitation of the selected resources to enhance the matching results. Several works have dealt with these issues jointly or separately. In our thesis, we made a systematic review and comparison of state-of-the-art approaches, then addressed both questions.

Ontologies other than the ones to align are the most used background knowledge (BK) resources. Related works often select a set of complete ontologies as background knowledge, even if only fragments of the selected ontologies are actually effective for discovering new mappings. We propose a novel BK-based ontology matching approach that selects and builds a knowledge resource with just the right concepts chosen from a set of ontologies. The conducted experiments showed that our BK selection approach improves efficiency without loss of effectiveness.

Exploiting background knowledge resources in ontology matching is a double-edged sword: while it may increase recall (i.e., retrieve more correct mappings), it may lower precision (i.e., produce more incorrect mappings). We propose two methods to select the most relevant mappings from the candidate ones: (1) based on a set of rules and (2) with supervised machine learning. We experimented with and evaluated our approach in the biomedical domain, thanks to the profusion of knowledge resources in biomedicine (ontologies, terminologies and existing alignments).

We evaluated our approach with extensive experiments on two Ontology Alignment Evaluation Initiative (OAEI) benchmarks. Our results confirm the effectiveness and efficiency of our approach and outperform or compete with state-of-the-art matchers exploiting background knowledge resources.
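The recall/precision trade-off mentioned in this abstract can be illustrated with a minimal sketch. The concept identifiers and mappings below are hypothetical and invented for illustration; they are not taken from the thesis or from OAEI data.

```python
def evaluate_alignment(candidate, reference):
    """Return (precision, recall, f_measure) of a candidate alignment.

    Both arguments are sets of (source_concept, target_concept) pairs.
    """
    correct = candidate & reference
    precision = len(correct) / len(candidate) if candidate else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical reference alignment between two ontologies A and B.
reference = {("A:heart", "B:cardiac_organ"),
             ("A:kidney", "B:renal_organ"),
             ("A:liver", "B:hepatic_organ"),
             ("A:lung", "B:pulmonary_organ")}

# A purely lexical matcher finds few mappings, all of them correct.
direct = {("A:heart", "B:cardiac_organ")}

# Adding background knowledge finds more correct mappings, plus a wrong one.
with_bk = direct | {("A:kidney", "B:renal_organ"),
                    ("A:liver", "B:hepatic_organ"),
                    ("A:lung", "B:hepatic_organ")}

p_direct, r_direct, f_direct = evaluate_alignment(direct, reference)
p_bk, r_bk, f_bk = evaluate_alignment(with_bk, reference)
```

In this toy case recall rises from 0.25 to 0.75 while precision falls from 1.0 to 0.75, which is the situation a mapping selection step is meant to address.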
|
12 |
Knowledge Integration and Representation for Biomedical Analysis
Alachram, Halima 04 February 2021
No description available.
|
13 |
Formalizing biomedical concepts from textual definitions
Petrova, Alina, Ma, Yue, Tsatsaronis, George, Kissa, Maria, Distel, Felix, Baader, Franz, Schroeder, Michael 07 January 2016
BACKGROUND:
Ontologies play a major role in life sciences, enabling a number of applications, from new data integration to knowledge verification. SNOMED CT is a large medical ontology that is formally defined so that it ensures global consistency and supports complex reasoning tasks. Most biomedical ontologies and taxonomies, on the other hand, define concepts only textually, without the use of logic. Here, we investigate how to automatically generate formal concept definitions from textual ones. We develop a method that uses machine learning in combination with several types of lexical and semantic features and outputs formal definitions that follow the structure of SNOMED CT concept definitions.
RESULTS:
We evaluate our method on three benchmarks and test both the underlying relation extraction component and the overall quality of the output concept definitions. In addition, we analyze the following aspects: (1) How do definitions mined from the Web and literature differ from those mined from manually created definitions, e.g., MeSH? (2) How do different feature representations, e.g., the restrictions of relations' domain and range, affect the quality of the generated definitions? (3) How do different machine learning algorithms compare with each other for the task of formal definition generation? (4) How does the size of the learning data influence the task? We discuss all of these settings in detail and show that the suggested approach can achieve success rates of over 90%. In addition, the results show that the choice of corpora, lexical features, learning algorithm and data size does not affect performance as strongly as semantic types do. Semantic types limit the domain and range of a predicted relation, and as long as relations' domain and range pairs do not overlap, this information is most valuable in formalizing textual definitions.
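The role of semantic types can be sketched as a simple filter over candidate relations. This is an illustration only, not the authors' implementation: the relation names, type names and the `admissible_relations` helper are hypothetical.

```python
# Hypothetical relation signatures: relation -> (domain type, range type).
RELATION_SIGNATURES = {
    "finding_site": ("disorder", "body_structure"),
    "causative_agent": ("disorder", "organism"),
    "active_ingredient": ("drug", "substance"),
}


def admissible_relations(subject_type, object_type, signatures=RELATION_SIGNATURES):
    """Return the relations whose domain and range match the given semantic types."""
    return sorted(
        relation
        for relation, (domain, range_) in signatures.items()
        if domain == subject_type and range_ == object_type
    )


# Because no two signatures share the same (domain, range) pair here,
# the semantic types alone single out at most one relation.
agent_relations = admissible_relations("disorder", "organism")
no_relations = admissible_relations("drug", "organism")
```

When two relations share the same domain and range pair, the filter returns both, and a classifier would still have to choose between them using other features.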
CONCLUSIONS:
The analysis presented in this manuscript implies that automated methods can provide a valuable contribution to the formalization of biomedical knowledge, thus paving the way for future applications that go beyond retrieval and into complex reasoning. The method is implemented and accessible to the public from: https://github.com/alifahsyamsiyah/learningDL.
|
14 |
Towards the French Biomedical Ontology Enrichment / Vers l'enrichissement d'ontologies biomédicales françaises
Lossio-Ventura, Juan Antonio 09 November 2015
In biomedicine, the field of Big Data (information overload) raises a major issue: the analysis of large volumes of heterogeneous data (e.g., video, audio, text, image). Biomedical ontologies, conceptual models of reality, can play a crucial role in automating data processing, querying, and the matching of heterogeneous data. Various English resources exist, but considerably fewer are available in French, and there is a strong lack of related tools and services to exploit them. Initially, ontologies were built manually. In recent years, a few semi-automatic methodologies have been proposed. These semi-automatic techniques for ontology construction and enrichment mostly induce ontologies from texts using natural language processing (NLP) techniques. NLP methods have to take into account the lexical and semantic complexity of biomedical data: (1) lexical complexity refers to the complex biomedical phrases to be considered, and (2) semantic complexity refers to the induction of the sense and context of the terminology.

In this thesis, in order to tackle these challenges, we propose methodologies for the enrichment and construction of biomedical ontologies based on two main contributions. The first contribution is the automatic extraction of specialized biomedical terms (lexical complexity) from corpora. New measures for extracting and ranking single- and multi-word terms have been proposed and evaluated. The BioTex software implements the proposed measures.

The second contribution concerns concept extraction and the semantic linkage of the extracted terminology (semantic complexity). This work seeks to induce concepts for new candidate terms and to determine their semantic links, i.e., the most relevant positions for them in an existing biomedical ontology. We proposed a concept extraction approach that integrates new terms into the MeSH ontology. Quantitative and qualitative evaluations, conducted by experts and non-experts on real data, highlight the relevance of these contributions.
|
15 |
Formalizing biomedical concepts from textual definitions
Tsatsaronis, George, Ma, Yue, Petrova, Alina, Kissa, Maria, Distel, Felix, Baader, Franz, Schroeder, Michael 04 January 2016
Background
Ontologies play a major role in life sciences, enabling a number of applications, from new data integration to knowledge verification. SNOMED CT is a large medical ontology that is formally defined so that it ensures global consistency and supports complex reasoning tasks. Most biomedical ontologies and taxonomies, on the other hand, define concepts only textually, without the use of logic. Here, we investigate how to automatically generate formal concept definitions from textual ones. We develop a method that uses machine learning in combination with several types of lexical and semantic features and outputs formal definitions that follow the structure of SNOMED CT concept definitions.
Results
We evaluate our method on three benchmarks and test both the underlying relation extraction component and the overall quality of the output concept definitions. In addition, we analyze the following aspects: (1) How do definitions mined from the Web and literature differ from those mined from manually created definitions, e.g., MeSH? (2) How do different feature representations, e.g., the restrictions of relations’ domain and range, affect the quality of the generated definitions? (3) How do different machine learning algorithms compare with each other for the task of formal definition generation? (4) How does the size of the learning data influence the task? We discuss all of these settings in detail and show that the suggested approach can achieve success rates of over 90%. In addition, the results show that the choice of corpora, lexical features, learning algorithm and data size does not affect performance as strongly as semantic types do. Semantic types limit the domain and range of a predicted relation, and as long as relations’ domain and range pairs do not overlap, this information is most valuable in formalizing textual definitions.
Conclusions
The analysis presented in this manuscript implies that automated methods can provide a valuable contribution to the formalization of biomedical knowledge, thus paving the way for future applications that go beyond retrieval and into complex reasoning. The method is implemented and accessible to the public from: https://github.com/alifahsyamsiyah/learningDL.
|
18 |
Investigation and application of artificial intelligence algorithms for complexity metrics based classification of semantic web ontologies
Koech, Gideon Kiprotich 11 1900
M. Tech. (Department of Information Technology, Faculty of Applied and Computer Sciences), Vaal University of Technology.
The increasing demand for knowledge representation and exchange on the semantic web has resulted in an increase in both the number and size of ontologies. These increases have made ontologies more complex and, in turn, more difficult to select, reuse and maintain. Several ontology evaluation and ranking tools have been proposed recently. Such evaluation tools provide a metrics suite that evaluates the content of an ontology by analysing its schema and instances. The availability of ontology metric suites may enable classification techniques to place ontologies in various categories or classes. Machine learning algorithms, which are mostly based on statistical methods for classifying data, are well suited to classifying ontologies.
In this study, popular machine learning algorithms including K-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forest, Naïve Bayes, Linear Regression and Logistic Regression were used to classify ontologies based on their complexity metrics. A total of 200 biomedical ontologies were downloaded from the BioPortal repository. Ontology metrics were then generated using the OntoMetrics tool, an online ontology evaluation platform. These metrics constituted the dataset used in the implementation of the machine learning algorithms.
The results obtained were evaluated with performance evaluation techniques, namely precision, recall, F-measure and Receiver Operating Characteristic (ROC) curves. The overall accuracy scores for the K-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forest, Naïve Bayes, Logistic Regression and Linear Regression algorithms were 66.67%, 65%, 98%, 99.29%, 74%, 64.67%, and 57%, respectively. According to these scores, the Decision Trees and Random Forest algorithms performed best, which can be attributed to their ability to handle multiclass classification.
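As an illustration of classifying ontologies from metric vectors, the following is a minimal K-Nearest Neighbors sketch using only the Python standard library. The two features, the class labels and all values are hypothetical; they are not the study's dataset, and no feature scaling is applied.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test points whose predicted label equals the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# Hypothetical (average depth, average breadth) metrics with made-up labels.
train = [((2.0, 3.0), "simple"), ((2.5, 2.5), "simple"), ((3.0, 3.5), "simple"),
         ((8.0, 9.0), "complex"), ((9.0, 8.5), "complex"), ((8.5, 10.0), "complex")]
test = [((2.2, 3.1), "simple"), ((9.1, 9.2), "complex")]

score = accuracy(train, test)
```

On real metric suites the features have very different ranges, so distance-based methods such as this one would normally need normalised inputs.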
|
19 |
Towards an Ontology-Based Phenotypic Query Model
Beger, Christoph, Matthies, Franz, Schäfermeier, Ralph, Kirsten, Toralf, Herre, Heinrich, Uciteli, Alexandr 10 October 2023
Clinical research based on data from patient or study data management systems plays an
important role in transferring basic findings into the daily practices of physicians. To support study
recruitment, diagnostic processes, and risk factor evaluation, search queries for such management
systems can be used. Typically, the query syntax as well as the underlying data structure vary
greatly between different data management systems. This makes it difficult for domain experts (e.g.,
clinicians) to build and execute search queries. In this work, the Core Ontology of Phenotypes is used
as a general model for phenotypic knowledge. This knowledge is required to create search queries
that determine and classify individuals (e.g., patients or study participants) whose morphology,
function, behaviour, or biochemical and physiological properties meet specific phenotype classes. A
specific model describing a set of particular phenotype classes is called a Phenotype Specification
Ontology. Such an ontology can be automatically converted to search queries on data management
systems. The methods described have already been used successfully in several projects. Using
ontologies to model phenotypic knowledge on patient or study data management systems is a viable
approach. It allows clinicians to model from a domain perspective without knowing the actual data
structure or query language.
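The idea of converting a phenotype specification into a search query can be sketched as follows. This is a simplified assumption of how such a conversion might look, not the authors' model: the `RangeRestriction` class, the `to_sql` function, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RangeRestriction:
    """A numeric restriction on one observed property (hypothetical model)."""
    column: str
    minimum: Optional[float] = None  # inclusive lower bound, if any
    maximum: Optional[float] = None  # exclusive upper bound, if any


def to_sql(table, restrictions):
    """Build a parameterized SQL query selecting rows that meet every restriction.

    Table and column names are assumed to be trusted identifiers; only the
    numeric bounds are passed as query parameters.
    """
    clauses, params = [], []
    for r in restrictions:
        if r.minimum is not None:
            clauses.append(f"{r.column} >= ?")
            params.append(r.minimum)
        if r.maximum is not None:
            clauses.append(f"{r.column} < ?")
            params.append(r.maximum)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT patient_id FROM {table} WHERE {where}", params


# Hypothetical phenotype class: body mass index from 30 (inclusive) to 35 (exclusive).
query, params = to_sql("observations", [RangeRestriction("bmi", 30.0, 35.0)])
```

The clinician only describes the phenotype class; the target system's table layout and query syntax stay inside the conversion step.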
|
20 |
Polynomial-Time Reasoning Support for Design and Maintenance of Large-Scale Biomedical Ontologies
Suntisrivaraporn, Boontawee 05 February 2009
Description Logics (DLs) are a successful family of knowledge representation formalisms with two key assets: formally well-defined semantics, which allow knowledge to be represented unambiguously, and automated reasoning, which allows implicit knowledge to be inferred from the knowledge given explicitly. This thesis investigates various reasoning techniques for tractable DLs in the EL family, which have been implemented in the CEL system. It suggests that the use of lightweight DLs, in which reasoning is tractable, is beneficial for ontology design and maintenance in terms of both expressivity and scalability. The claim is supported by a case study on the renowned medical ontology SNOMED CT and an extensive empirical evaluation on several large-scale biomedical ontologies.
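To give a feel for why reasoning in the EL family is tractable, the sketch below saturates subsumer sets for a small normalised EL TBox with a naive fixpoint loop. It is a simplified illustration of completion-rule style classification, not the CEL implementation; it omits the top and bottom concepts, and the axioms and concept names are a hypothetical toy example.

```python
def classify(concepts, sub, conj, exists_rhs, exists_lhs):
    """Saturate subsumer sets for a normalised EL TBox.

    concepts:   all concept names occurring in the axioms
    sub:        set of (A, B)        meaning A ⊑ B
    conj:       set of (A1, A2, B)   meaning A1 ⊓ A2 ⊑ B
    exists_rhs: set of (A, r, B)     meaning A ⊑ ∃r.B
    exists_lhs: set of (r, A, B)     meaning ∃r.A ⊑ B
    Returns a dict mapping each concept name to the set of its subsumers.
    """
    subsumers = {x: {x} for x in concepts}
    edges = set()  # (X, r, Y) records that X is subsumed by ∃r.Y
    changed = True
    while changed:
        changed = False
        for x in concepts:
            s = subsumers[x]
            new = {b for a, b in sub if a in s}
            new |= {b for a1, a2, b in conj if a1 in s and a2 in s}
            new |= {b for (x2, r, y) in edges if x2 == x
                    for (r2, a, b) in exists_lhs if r2 == r and a in subsumers[y]}
            new_edges = {(x, r, b) for a, r, b in exists_rhs if a in s}
            if not new <= s or not new_edges <= edges:
                s |= new
                edges |= new_edges
                changed = True
    return subsumers


# Hypothetical toy TBox.
concepts = ["Endocarditis", "Inflammation", "Endocardium",
            "HeartTissue", "AffectsHeart", "HeartDisease"]
result = classify(
    concepts,
    sub={("Endocarditis", "Inflammation"), ("Endocardium", "HeartTissue")},
    conj={("Inflammation", "AffectsHeart", "HeartDisease")},
    exists_rhs={("Endocarditis", "hasLocation", "Endocardium")},
    exists_lhs={("hasLocation", "HeartTissue", "AffectsHeart")},
)
```

Each pass either adds at least one subsumer or edge or terminates, and the number of possible subsumers and edges is polynomial in the size of the TBox, so the loop stops after polynomially many passes. Here the implicit subsumption of `Endocarditis` by `HeartDisease` is derived although no axiom states it directly.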
|