Global ETD Search

1	Vers un système intelligent de capitalisation de connaissances pour l'agriculture durable : construction d'ontologies agricoles par transformation de sources existantes / Towards an intelligent knowledge capitalization for sustainable agriculture: building agricultural ontologies by transforming existing sources
Amarger, Fabien (18 December 2015)
Data available on the Web are generally of two kinds: (1) unstructured or semi-structured data, which are difficult to exploit automatically; or (2) structured data dedicated to a specific use, which are difficult to reuse in other applications. The Web of Data is a Semantic Web application that facilitates access to, sharing, and alignment of data. Many data sets are available on the Web but are not published according to Linked Data principles, so they need to be transformed into knowledge bases. This work proposes an innovative methodology that transforms several sources simultaneously rather than sequentially. The methodology merges several data sources guided by domain design patterns, and it specifies the expected domain model by defining the upper part of an ontological module. A process chain then enriches this module with elements from the sources: syntactic transformation of the sources, alignment, identification of equivalent elements to build candidates, computation of the candidates' trust scores, and candidate filtering. The work rests on the following hypothesis: if an element appears in several sources, the likelihood that it belongs to the studied domain increases. Several functions were defined to compute the consensual trust score of a candidate, highlighting characteristics such as the consensus between sources or the connectivity between the elements of a given candidate. A second hypothesis is put forward: to obtain a correct model, an element must belong to one candidate only. This hypothesis leads to the notion of incompatibility between candidates. Sets of candidates that share no elements can then be extracted, which makes the experts' validation task easier.
To evaluate the proposals, three experiments were conducted. The first dealt with the taxonomic classification of wheat; with the help of three domain experts, it analyzed the quality of the generated candidates. The second, in the same domain, measured the time an expert saved by taking incompatibilities into account while validating candidates. The third used data from an evaluation campaign of alignment systems, adapted to evaluate candidate generation and the consensual trust score on a large data set. The proposals were implemented in a reusable and configurable tool, Muskca, which performs multi-source fusion to generate a consensual knowledge base. Applying the methodology to agriculture produced a knowledge base on plant taxonomy. This knowledge base will be used to represent observations of pest attacks on crops along with pest treatment techniques. It will help publish the available data and will also make it possible to annotate the many documents that can be used to improve agricultural practices.
Keywords: Knowledge engineering; Ontology construction; Knowledge fusion; Ontology enrichment; Ontological module
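The two hypotheses above can be illustrated with a minimal Python sketch: agreement across sources raises confidence, and an element may belong to only one candidate. This is not Muskca's implementation. Several choices here are assumptions made for illustration: the representation of a candidate as a set of `(source, element)` pairs, the score (the share of sources contributing to a candidate), and the greedy selection of mutually compatible candidates. The connectivity-based scores mentioned in the abstract are not modeled.

```python
# Minimal sketch (assumed representation, not Muskca's code): a candidate is a
# frozenset of (source, element) pairs that the alignment step judged equivalent.

def consensus_score(candidate, n_sources):
    """Share of the sources that contribute at least one element to the candidate."""
    sources = {source for source, _ in candidate}
    return len(sources) / n_sources


def incompatible(a, b):
    """Two candidates are incompatible when they share a source element."""
    return not a.isdisjoint(b)


def select_compatible(candidates, n_sources):
    """Greedily keep the best-scored candidates that share no element."""
    ranked = sorted(candidates, key=lambda c: consensus_score(c, n_sources), reverse=True)
    chosen = []
    for candidate in ranked:
        if not any(incompatible(candidate, kept) for kept in chosen):
            chosen.append(candidate)
    return chosen


# Three hypothetical sources A, B, C describing wheat taxa.
c1 = frozenset({("A", "Triticum"), ("B", "Triticum"), ("C", "wheat")})
c2 = frozenset({("A", "Triticum"), ("B", "Triticum aestivum")})
c3 = frozenset({("B", "durum"), ("C", "durum wheat")})
selected = select_compatible([c1, c2, c3], n_sources=3)
# selected == [c1, c3]: c2 is dropped because it shares ("A", "Triticum") with c1
```

A greedy pass keeps the sketch short. Choosing the best-scoring set of mutually compatible candidates is in general a maximum-weight independent set problem, so an exact search may be preferable when the candidate set is small.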
2	Améliorer l'interopérabilité sémantique : applicabilité et utilité de l'alignement d'ontologies / Enhancing the semantic interoperability: applicability and utility of the ontology alignment
Hamdi, Fayçal (02 December 2011)
In this thesis, we present approaches for adapting an alignment process to the characteristics of the ontologies being aligned. These may be quantitative characteristics, such as their size, or particular characteristics related, for example, to the way concept labels are built. For quantitative characteristics, we propose two ontology partitioning methods that make it possible to align very large ontologies. Both methods generate reasonably sized subsets of the two ontologies as input to the alignment process, taking the alignment objective into account from the start of partitioning. For the particular characteristics of the aligned ontologies, we present the TaxoMap Framework environment, which allows refinement treatments to be specified from predefined primitives. We propose a pattern language, MPL (the Mapping Pattern Language), which we use to specify these refinement treatments. In addition to these adaptation approaches, we present approaches for reusing alignment results in ontology engineering, focusing specifically on the use of alignment for ontology enrichment. We study the contribution of alignment techniques to enrichment and the impact of the characteristics of the external resource used as the enrichment source. Finally, we describe how the TaxoMap Framework environment was implemented and the experiments performed. These include tests on the TaxoMap alignment module, on the mapping refinement approach, on the partitioning methods for very large ontologies, and on the ontology enrichment approach.
Keywords: Semantic Web; Ontology alignment; Mapping refinement; Ontology partitioning; Ontology enrichment
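Partitioning before alignment avoids comparing every concept of one large ontology with every concept of the other. The sketch below illustrates that general idea with plain token blocking on concept labels, followed by a string-similarity score from Python's standard `difflib`. It is not TaxoMap's partitioning method. The `tokens`, `block`, and `align` helpers and the 0.8 threshold are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def tokens(label):
    """Lower-cased label words longer than two characters (hypothetical blocking key)."""
    return {t for t in label.lower().split() if len(t) > 2}


def block(labels_a, labels_b):
    """Pair only labels that share at least one token, instead of all pairs."""
    index = defaultdict(set)
    for lb in labels_b:
        for t in tokens(lb):
            index[t].add(lb)
    pairs = set()
    for la in labels_a:
        for t in tokens(la):
            for lb in index.get(t, ()):
                pairs.add((la, lb))
    return pairs


def align(labels_a, labels_b, threshold=0.8):
    """Score blocked pairs with a string similarity and keep those above threshold."""
    mappings = []
    for la, lb in sorted(block(labels_a, labels_b)):
        score = SequenceMatcher(None, la.lower(), lb.lower()).ratio()
        if score >= threshold:
            mappings.append((la, lb, round(score, 2)))
    return mappings


source = ["Winter wheat", "Durum wheat", "Maize"]
target = ["winter wheat", "Corn", "durum wheats"]
print(align(source, target))
```

Here only the label pairs that share a token are scored, not all nine combinations. Real partitioning methods group concepts structurally so that the blocks stay meaningful for very large ontologies.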
3	Integration of Genome Scale Data for Identifying New Biomarkers in Colon Cancer: Integrated Analysis of Transcriptomics and Epigenomics Data from High Throughput Technologies in Order to Identifying New Biomarkers Genes for Personalised Targeted Therapies for Patients Suffering from Colon Cancer
Hassan, Aamir Ul (January 2017)
Colorectal cancer is the third most common cancer and the leading cause of cancer deaths in Western industrialised countries. Despite recent advances in the screening, diagnosis, and treatment of colorectal cancer, an estimated 608,000 people die from it every year. Current knowledge of colorectal carcinogenesis indicates a multifactorial, multi-step process that involves various genetic alterations and several biological pathways. Because of tumour heterogeneity, identifying molecular markers that allow early diagnosis and precise prediction of clinical outcome in colon cancer is challenging. This Ph.D. thesis presents the molecular and cellular mechanisms leading to colorectal cancer. It includes a systematic literature review covering microarray gene expression profiling, gene ontology enrichment analysis, microRNAs, systems biology, and various bioinformatics tools. The study aimed to use microarray expression datasets for three purposes: stratifying colon tumours into molecularly distinct subtypes, identifying novel diagnostic targets, and predicting reliable prognostic signatures for clinical practice. We performed an integrated analysis of gene expression data based on genetic, epigenetic, and extensive clinical information, using unsupervised learning, correlation, and functional network analysis. As a result, we identified 267-gene and 124-gene signatures that can distinguish normal, primary, and metastatic tissues. These signatures are also involved in important regulatory functions such as immune response, lipid metabolism, and peroxisome proliferator-activated receptor (PPAR) signalling pathways. For the first time, we also identified miRNAs that can differentiate primary from metastatic colon tumours, together with a prognostic signature of grade and stage levels. These may be major contributors to the complex transcriptional phenotypes of colon tumours.
Keywords: Colon cancer; Microarray gene expression profiling; Gene ontology enrichment analysis; MicroRNA; Systems biology; Bioinformatics; Gene signature; Cross-validation; Diagnostic; Prognostic
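The abstract does not state how the 267-gene and 124-gene signatures were selected. The following sketch gives a generic illustration of ranking genes that separate two tissue groups, such as normal versus metastatic samples. It scores each gene with Welch's t-statistic and keeps the top `k`. The data layout, the `top_signature` helper, and the gene names are assumptions. A real analysis would add multiple-testing correction and cross-validation.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t-statistic for two groups with possibly unequal variances."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


def top_signature(expr, group_a, group_b, k):
    """Rank genes by |t| between two sample groups and keep the top k.

    expr maps gene -> {sample: expression value} (hypothetical layout).
    """
    scores = {
        gene: welch_t([values[s] for s in group_a], [values[s] for s in group_b])
        for gene, values in expr.items()
    }
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:k]


# Hypothetical expression values for three genes in three normal and three metastatic samples.
normal, metastatic = ["n1", "n2", "n3"], ["m1", "m2", "m3"]
expr = {
    "GENE_UP": dict(zip(normal + metastatic, [1.0, 1.2, 0.9, 5.0, 5.3, 4.8])),
    "GENE_FLAT": dict(zip(normal + metastatic, [2.0, 2.1, 1.9, 2.0, 2.2, 1.8])),
    "GENE_DOWN": dict(zip(normal + metastatic, [4.0, 4.2, 3.9, 1.0, 1.1, 0.8])),
}
print(top_signature(expr, normal, metastatic, k=2))
```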
4	Knowledge Discovery Considering Domain Literature and Ontologies: Application to Rare Diseases / Découverte de connaissances considérant la littérature et les ontologies de domaine : application aux maladies rares
Hassan, Mohsen (11 July 2017)
Although each is uncommon, rare diseases (RDs) are numerous and generally severe, which makes their study important from a health-care point of view. A few reference databases, such as Orphanet and Orphadata, provide information about RDs. Despite their laudable efforts, these databases are incomplete and usually not up to date with respect to the literature. There are millions of scientific publications about these diseases, and their number keeps growing. Extracting this information manually and exhaustively is therefore tedious and time-consuming. This motivates semi-automatic approaches that extract information from texts and represent it in a format suitable for further applications. This thesis aims to extract knowledge from texts and use the results to enrich an existing domain ontology. We studied three research directions: (1) extracting relationships from text, namely disease-phenotype (D-P) relationships; (2) identifying new complex named entities, namely RD phenotypes; and (3) enriching an existing RD ontology on the basis of the previously extracted knowledge.
First, we mined a collection of scientific-article abstracts, represented as graphs, to discover relevant pieces of biomedical knowledge about RDs. We focused on completing RD descriptions by extracting D-P relationships, which could help automate the updating of RD databases such as Orphanet. Accordingly, we developed an approach named SPARE* that extracts D-P relationships from PubMed abstracts. In these abstracts, phenotypes and RDs are annotated beforehand by a named entity recognizer. SPARE* is a hybrid approach that combines a pattern-based method, called SPARE, with a machine learning method (support vector machines, SVM). It benefits both from the relatively good precision of SPARE and from the good recall of the SVM.
Second, SPARE* was used to identify phenotype candidates in texts. We selected high-quality syntactic patterns that are specific to D-P relationships only. These patterns were then relaxed on the phenotype constraint, so that phenotype candidates not yet referenced in databases or ontologies could be extracted. The candidates are verified and validated by comparing them with the phenotype classes of a well-known phenotype ontology such as HPO. This comparison relies on a compositional semantic model and on a set of manually defined mapping rules that map an extracted phenotype candidate to a class of the ontology. Our experiments show that SPARE* can identify both known and potentially new RD phenotypes. We applied SPARE* to PubMed abstracts to extract RD phenotypes. We then either mapped these phenotypes to the content of the Orphanet encyclopedia and Orphadata, or suggested them to experts as novel phenotypes for completing these two resources.
Finally, we applied pattern structures to classify RDs and enrich an existing ontology. First, we used SPARE* to complete the phenotype descriptions of the RDs available in Orphadata. We then propose comparing and grouping RDs according to their phenotypic descriptions using pattern structures. Pattern structures make it possible to take into account both domain knowledge and D-P relationships from various origins. In this work, the domain knowledge consists of an RD ontology and a phenotype ontology. The lattice generated from these pattern structures suggests a new classification of RDs, which in turn suggests new RD classes that do not exist in the original RD ontology. Because these classes are numerous, we proposed several selection methods that retain a reduced set of interesting RD classes to suggest to experts for further analysis.
Keywords: Natural Language Processing; Information Extraction; Formal Concept Analysis; Pattern Structures; Ontology Enrichment
Classification: 006.332, 025.04
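Pattern structures generalize formal concept analysis (FCA) to richer descriptions, such as phenotype sets ordered by an ontology. The minimal sketch below shows only the plain FCA case. It enumerates the formal concepts of a binary disease-by-phenotype context. Each concept groups the diseases that share a phenotype set and could suggest a candidate RD class. The sketch ignores the ontology hierarchies that the thesis's pattern structures take into account. The disease and phenotype names are hypothetical.

```python
def formal_concepts(context):
    """Enumerate the formal concepts (extent, intent) of a binary context.

    context maps each object (here a disease) to its attribute set (phenotypes).
    Every concept intent is an intersection of object intents, or the full
    attribute set, so the intents are built by closing under intersection.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        intents |= {frozenset(attrs) & intent for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts


# Hypothetical disease -> phenotype annotations.
context = {
    "RD1": {"seizures", "ataxia"},
    "RD2": {"seizures", "ataxia", "myopia"},
    "RD3": {"myopia"},
}
for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

The number of concepts can grow exponentially with the size of the context. This is one reason a selection step is needed before classes are suggested to experts.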
5	Méthode d’extraction d’informations géographiques à des fins d’enrichissement d’une ontologie de domaine / Geographical information extraction method in order to enrich a domain ontology
Nguyen, Van Tien (15 November 2012)
This thesis is part of the ANR GEONTO project, which covers the construction, alignment, comparison, and exploitation of heterogeneous geographic ontologies. In this context, our goal is to automatically extract topographic terms from travel narratives in order to enrich a geographic ontology originally designed by IGN. The proposed method identifies and extracts terms with a topographic connotation contained in a text. It relies on the automatic detection of certain linguistic relations in order to annotate these terms. Its implementation is based on n-ary relations and uses natural language processing (NLP) methods and techniques. These n-ary relations link the terms to be extracted with other elements of the text. Those elements can be identified using predefined external resources, including specific lexicons, generic thesauri, and domain ontologies. The lexicons cover travel-narrative verbs (verbs of movement, verbs of perception, and topographic verbs), pre-positions (prepositions of place, adverbs, adjectives), and toponyms. The domain ontology used here is the geographic ontology originally designed by IGN. Once marked by linguistic patterns, the proposed relations allow us to annotate and extract terms automatically. The clues attached to these terms make it possible to infer whether they evoke topographic concepts. The reasoning rules that support these inferences rely on intrinsic knowledge (the evocation of space in language), on external knowledge contained in the resources mentioned above, or on a combination of both. The strength of our approach is that it extracts not only terms directly attached to toponyms but also terms embedded in sentence structures where other terms intervene. An experiment on a corpus of 12 travel narratives (2,419 pages, provided by the Pau media library) showed that the method is robust. It extracted 2,173 distinct terms, of which 1,191 were valid, a precision of 0.55. This shows that the proposed relations are more effective than (term, toponym) pairs, which yield 733 valid distinct terms with a precision of 0.38. The method can also be used for other applications, such as geographic named entity recognition and spatial indexing of textual documents.
Keywords: Term extraction; Geographic ontology enrichment; Spatial expressions in text; N-ary relations; Syntactic-semantic patterns; Named entities; String metrics
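A toy version of the n-ary relation described above links a travel verb, a candidate term, and a toponym that follows it. The Python sketch below uses tiny hypothetical lexicons in place of the thesis's resources. It keeps intervening words, such as adjectives, inside the extracted term. It also computes precision as valid terms divided by distinct extracted terms, which reproduces the reported 1191 / 2173 ≈ 0.55.

```python
import re

# Toy lexicons standing in for the thesis's external resources (hypothetical).
TRAVEL_VERBS = {"traversons", "longeons", "gravissons"}   # movement verbs
TOPONYMS = {"Somport", "Gave", "Ossau"}                    # toponym gazetteer
FUNCTION_WORDS = {"le", "la", "les", "l", "de", "du", "des", "d", "vers"}


def extract_terms(sentence):
    """Extract the words between a travel verb and the next toponym.

    Simplified relation VERB ... TERM ... TOPONYM: function words are dropped,
    so intervening modifiers (e.g. "haute") stay inside the extracted term.
    """
    tokens = re.findall(r"\w+", sentence)
    terms = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in TRAVEL_VERBS:
            continue
        for j in range(i + 1, len(tokens)):
            if tokens[j] in TOPONYMS:
                words = [w for w in tokens[i + 1:j] if w.lower() not in FUNCTION_WORDS]
                if words:
                    terms.append(" ".join(words))
                break
    return terms


def precision(valid, extracted):
    """Share of extracted distinct terms that were judged valid."""
    return valid / extracted


print(extract_terms("Puis nous longeons la haute vallée d'Ossau."))  # ['haute vallée']
print(round(precision(1191, 2173), 2))  # 0.55
```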
6	Fouille de connaissances en diagnostic mammographique par ontologie et règles d'association / Ontologies and association rules knowledge mining, case study: Mammographic domain
Idoudi, Rihab (24 January 2017)
Given the considerable complexity of the mammography domain and the massive growth of its data, experts increasingly need to contextualize knowledge within a formal and comprehensive model. Our research therefore focuses on unifying different sources of domain knowledge within a target ontological model. On the one hand, several mammographic ontologies have been proposed in the literature, each offering a distinct perspective on the domain. On the other hand, mammography acquisition systems make a large volume of information from past cases available, and reusing it has become a major challenge. These fragments of knowledge provide different kinds of evidence that are useful for understanding the domain. However, they are not interoperable and require knowledge management methodologies to unify them. Our thesis therefore addresses the enrichment of an existing domain ontology through the extraction and management of new knowledge (concepts and relations). This knowledge comes from two kinds of sources: ontological resources and databases of past cases. Our approach couples conceptual enrichment and relational enrichment of an existing mammographic ontology.
The first module, conceptual enrichment, has three stages. The first stage, ontology pre-alignment, builds a hierarchy of fuzzy conceptual clusters for each input ontology. The goal is to reduce the alignment of two entire ontologies to the alignment of two groups of smaller concept clusters. The second stage aligns the two cluster hierarchies of the target and source ontologies. In the third stage, the validated alignments are used to enrich the reference ontology with new concepts, which increases the granularity of the knowledge base.
The second module addresses the relational enrichment of the target mammographic ontology with relations deduced from a domain database. This database contains textual records of mammograms collected in radiology departments. The module has four steps: (i) preprocessing of the textual data; (ii) application of data mining (knowledge extraction) techniques to extract new associations from past cases in the form of rules; (iii) post-processing of the generated rules, which filters and ranks them to ease their interpretation and validation by the expert; and (iv) enrichment of the ontology with new associations between concepts. The approach was implemented and validated on real mammographic ontologies and on patient data provided by the Taher Sfar and Ben Arous hospitals. This work addresses the use and merging of knowledge from heterogeneous sources in order to improve the knowledge management process.
Keywords: Knowledge mining; Ontology conceptual enrichment; Ontology alignment; Fuzzy conceptual hierarchical clustering; Relational ontology enrichment; Association rules extraction; Post-processing of association rules
Classification: 004
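Steps (ii) and (iii), rule mining and post-processing, can be sketched with support and confidence, the usual measures for association rules. This minimal Python version mines only single-item rules `A -> B` from sets of normalized findings. It filters them by hypothetical thresholds and ranks them by confidence, then support. The finding labels and thresholds are assumptions. The thesis abstract does not say which mining algorithm or interestingness measures were used.

```python
from itertools import combinations


def mine_rules(records, min_support=0.3, min_confidence=0.7):
    """Mine single-item association rules A -> B and rank them.

    records: list of sets of normalized findings (hypothetical preprocessing).
    Returns (antecedent, consequent, support, confidence) tuples, filtered by
    the two thresholds and sorted by confidence, then support.
    """
    n = len(records)

    def support(itemset):
        return sum(1 for record in records if itemset <= record) / n

    items = sorted(set().union(*records))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support == 0 or pair_support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = pair_support / support({antecedent})
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, pair_support, confidence))
    rules.sort(key=lambda rule: (rule[3], rule[2]), reverse=True)
    return rules


# Hypothetical preprocessed mammogram reports.
reports = [
    {"spiculated_margin", "birads_5"},
    {"spiculated_margin", "birads_5", "calcifications"},
    {"spiculated_margin", "birads_5"},
    {"oval_mass", "birads_2"},
]
for rule in mine_rules(reports):
    print(rule)
```

In practice, an expert would review the ranked rules before they become new relations in the ontology, as step (iii) describes.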
