21 |
Excom‑2 : plateforme d’annotation automatique de catégories sémantiques : conception, modélisation et réalisation informatique : applications à la catégorisation des citations en arabe et en français / Excom-2 : a cross-language platform for automatic annotations according to semantic points of view : example of treatment : quotations categorization in Arabic and French
Alrahabi, Al Moatasem, 29 January 2010
We propose a semantic annotation platform called “EXCOM-2”. Based on the “Contextual Exploration” method, it performs automatic annotation of textual segments, across a variety of languages, by analyzing surface forms in their context. Texts are processed according to discursive “points of view” whose values are organized in a “semantic map”. Annotation relies on a set of linguistic rules, written manually by an analyst, that identify the textual representations underlying the different categories of the map. Through two types of interface (developer or end user), the system offers an automatic text processing pipeline comprising segmentation, annotation and other post-processing functionalities. Annotated documents can be used, for instance, in information retrieval, monitoring, classification or automatic summarization systems. As an example application, we propose a system for the automatic identification and categorization of reported speech in Arabic and French, based on an analysis of the linguistic markers of enunciative modalities in direct reported speech.
|
22 |
Sémantická anotace a dotazování nad RDF daty / Semantic annotation and querying RDF data
Kýpeť, Jakub, January 2015
Title: Semantic annotation and querying RDF data
Author: Jakub Kýpeť
Department: Department of Software Engineering
Supervisor: Prof. RNDr. Peter Vojtáš, DrSc.
Abstract: This thesis describes in detail the design and implementation of a self-contained server application that allows users to create and manage semantic annotations for various web pages. The first part describes manual annotation and the user interface built for it. The second part describes the implementation of a web crawler and an automatic annotation system that uses this crawler. The last part analyzes the testing of this automated system, which was performed on several e-commerce websites from different domains.
Keywords: semantic annotation, querying RDF data, user interface, web crawling, automation
|
23 |
Un système interactif et itératif extraction de connaissances exploitant l'analyse formelle de concepts / An Interactive and Iterative Knowledge Extraction Process Using Formal Concept Analysis
Tang, My Thao, 30 June 2016
In this thesis, we present a methodology for interactive and iterative knowledge extraction from texts: the KESAM system, a tool for Knowledge Extraction and Semantic Annotation Management. KESAM is based on Formal Concept Analysis (FCA) for extracting knowledge from textual resources, and it supports expert interaction. In the KESAM system, knowledge extraction and semantic annotation are unified into a single process so that each benefits the other. Semantic annotations are used to formalize the source of knowledge in texts and to keep the traceability between the knowledge model and the source of knowledge. The knowledge model is, in return, used to improve the semantic annotations. The KESAM process has been designed to permanently preserve the link between the resources (texts and semantic annotations) and the knowledge model. The core of the process is Formal Concept Analysis, which builds the knowledge model, i.e. the concept lattice, and ensures the link between the knowledge model and the annotations. To bring the resulting lattice as close as possible to the domain experts' requirements, we introduce an iterative process that enables expert interaction on the lattice. Experts are invited to evaluate and refine the lattice; they can make changes to it until they reach an agreement between the model and their own knowledge or the application's needs. Thanks to the link between the knowledge model and the semantic annotations, both can co-evolve and improve in quality with respect to the domain experts' requirements. Moreover, because FCA builds concepts defined by sets of objects and sets of attributes, the KESAM system is able to take into account both atomic and defined concepts, i.e. concepts that are defined by a set of attributes. To bridge the possible gap between the representation model based on a concept lattice and the representation model of a domain expert, we then introduce a formal method for integrating expert knowledge into concept lattices in such a way that the lattice structure is maintained. The expert knowledge is encoded as a set of attribute dependencies, which is aligned with the set of implications provided by the concept lattice, leading to modifications of the original lattice. The method also allows the experts to keep a trace of the changes between the original lattice and the final constrained version, and to access how concepts in practice are related to concepts automatically derived from data. The method uses extensional projections to build the constrained lattices without changing the original data and to provide the trace of changes. From an original lattice, two different projections produce two different constrained lattices; thus, the gap between the representation model based on a concept lattice and the representation model of a domain expert is filled with projections.
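As an illustration of the concept lattice mentioned in this abstract, the following is a minimal sketch of how formal concepts can be enumerated from a small object/attribute table. It is not the KESAM implementation: the function name `formal_concepts`, the toy context and its text and attribute names are all hypothetical, and the brute-force enumeration is only practical for very small contexts.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small formal context.

    `context` maps each object to the set of attributes it has.
    Brute force over all object subsets: exponential, for illustration only.
    """
    objects = list(context)
    all_attrs = set().union(*context.values()) if context else set()

    def intent(objs):
        # Attributes shared by every object in objs (all attributes if objs is empty).
        shared = set(all_attrs)
        for obj in objs:
            shared &= context[obj]
        return frozenset(shared)

    def extent(attrs):
        # Objects that have every attribute in attrs.
        return frozenset(obj for obj in objects if attrs <= context[obj])

    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            shared_attrs = intent(subset)
            # Closing the subset gives a pair (extent, intent) that is a formal concept.
            concepts.add((extent(shared_attrs), shared_attrs))
    return concepts


# Hypothetical toy context: three annotated texts and the attributes found in them.
toy_context = {
    "text1": {"gene", "protein"},
    "text2": {"protein"},
    "text3": {"gene", "disease"},
}
concepts = formal_concepts(toy_context)
```

Ordering the resulting pairs by inclusion of their extents gives the concept lattice; each pair groups the texts that share exactly a given set of attributes.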
|
24 |
Serendipity prospecção semântica de dados qualitativos em Educação Especial / Serendipity: semantic exploration of qualitative data in Special Education
Fernandes, Woquiton Lima, 22 August 2016
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / In recent decades, there has been a revolution in the way science is conducted. The current context increasingly demands collaborative work, such as studies in large-scale research networks. One of the essential changes in this new way of doing science has been the intense use of Information and Communication Technologies (ICT), known as “eScience”, which today plays a fundamental role in the methodology adopted by many research groups around the world. This prompted a reflection on deepening the analysis of qualitative data, particularly in Special Education research. The main challenge is to advance the analysis of qualitative data using information technologies without losing the subjectivity involved in the research, and to broaden the capacity to scrutinize the data without losing the freedom to come and go, to critique and to develop one's own reflections, respecting subjective positions and, above all, maintaining scientific rigor. Accordingly, the main objective of this study is to evaluate the proposed technological architecture for qualitative data analysis, which is based on text mining theories, computational ontology and semantic annotation techniques, in Special Education research, in order to analyze the limits and possibilities of this methodological approach. The methodology was based on building a prototype, named Serendipity, grounded in a software engineering perspective, from which we extracted the main techniques that could define a safe method for the design, implementation and deployment of the solution. The methodology was cyclical: it allowed us to modify requirements and make improvements, feeding new analyses back into the process. Text mining was used to obtain knowledge from textual databases that have little or no data structure. The computational ontology was the element able to reconstruct the syntactic representation and give it meaning. The words (data) are related to one another and placed within a context of formal knowledge, which gives them semantic and cognitive capacity and builds concepts open to interpretation, comprehension and common understanding; to this end, we built a specific ontology for Special Education. Semantic annotation helped attach content to the text to describe its semantics, allowing software agents to retrieve information more precisely by associating the document with the ontology, in a conception of semantic fields. We also built a customized dictionary for Special Education to relate terms to synonyms and expressions associated with the ontology. For visualization, in addition to the semantic classes, we used automatic concept maps to establish relationships between concepts arranged in a hierarchical structure of propositions. Finally, to assess the proposal, we used part of the data collected by the National Observatory of Special Education: transcribed texts on training in five cities, one from each region of Brazil. The results show limits already recognized in the proposal, which did not aim to establish a subjective and detailed analysis yielding extremely precise results. They highlight that the researcher is and will always be the one who drives the process and, whether or not relying on computational tools, may make mistakes. The Serendipity proposal takes a step forward in the automatic process of data analysis and can be used on big data, such as national-level research, without losing the subjectivity of the researcher. To this end, new human and technological resources must be added to contribute to its improvement, and other areas should be encouraged to develop domain ontologies with their experts and to develop specific dictionaries. Therefore, despite its limitations, the approach shows significant advances in the semantic exploration of qualitative data in Special Education and can be adapted to other areas of knowledge.
|
25 |
Extraction des relations de causalité dans les textes économiques par la méthode de l’exploration contextuelle / Extraction of causal relations in economic texts by the contextual exploration method
Singh, Dory, 21 October 2017
The thesis describes a process for extracting causal information from economic texts which, unlike econometrics, relies essentially on linguistic resources. Econometrics approaches causality through mathematical and statistical models, which are now subject to controversy. Our approach therefore aims to complement or support econometric models. It automatically annotates textual segments using the Contextual Exploration (CE) method. CE is a linguistic and computational strategy that aims to extract knowledge according to a point of view. Accordingly, this contribution adopts the discursive point of view of causality, in which the categories are structured in a semantic map. These categories make it possible to write abductive rules, which are implemented in the EXCOM2 and SEMANTAS systems.
|
26 |
Construction de fiches de synthèse par annotation sémantique automatique des publications scientifiques : application aux articles en biologie / Thematic sheets construction of scientific publications using semantic annotation of scientific publications : Application to biomedical papers
Makkaoui, Olfa, 17 January 2014
Multi-document thematic sheets are considered an organized and structured textual representation of textual segments. Their construction is based on the semantic annotation of scientific publications according to a set of discursive categories called search viewpoints (such as plausible hypotheses, results or conclusions). The semantic annotation is performed automatically by the Contextual Exploration method, a computational linguistic method, implemented by a semantic annotation engine, that relies on a set of linguistic markers associated with search viewpoints. To assess the relevance of our system's results, we evaluated the automatic annotations on biology texts. The notion of speculation (plausible hypothesis), described in particular detail in this work, was evaluated on the BioScope corpus, which is manually annotated for speculation and negation. We propose a software application that allows users to obtain thematic sheets organized according to user-configurable semantic criteria.
|
27 |
Semantic modeling of a histopathology image exploration and analysis tool / Modélisation sémantique d'un outil d'analyse et d'exploration d'images histopathologiques
Traore, Lamine, 08 December 2017
The formalization of clinical data has been carried out and adopted in several areas of healthcare, such as the prevention of medical errors, standardization, and good-practice guidelines and recommendations. However, the community has not yet managed to take full advantage of the value of these data. The major problem remains the difficulty of integrating these data and the associated semantic services to benefit the quality of care. Objective: The methodological objective of this work is to formalize, process and integrate histopathology and imaging knowledge based on standardized protocols and reference resources, using semantic web languages. The application objective is to exploit this knowledge in a platform that facilitates the exploration of virtual slides, improves collaboration between pathologists and makes decision support systems more reliable in the specific context of breast cancer diagnosis. It is important to note that our goal is not to replace the clinician but to support them and ease their heavy daily tasks: the final word remains with the pathologists. Approach: We adopted a cross-cutting approach to the formal representation of histopathology and imaging knowledge in the cancer grading process. This formalization relies on semantic web technologies. / Semantic modelling of a histopathology image exploration and analysis tool. Recently, anatomic pathology (AP) has seen the introduction of several tools such as high-resolution histopathological slide scanners, efficient software viewers for large-scale histopathological images and virtual slide technologies. These initiatives created the conditions for a broader adoption of computer-aided diagnosis based on whole slide images (WSI), with the hope of helping to decrease inter-observer variability. Besides this, automatic image analysis algorithms represent a very promising solution to support pathologists' laborious tasks during the diagnosis process. Similarly, in order to reduce inter-observer variability between AP reports of malignant tumours, the College of American Pathologists edited 67 organ-specific Cancer Checklists and associated Protocols (CAP-CC&P). Each checklist includes a set of AP observations that are relevant in the context of a given organ-specific cancer and have to be reported by the pathologist. The associated protocol includes interpretation guidelines for most of the required observations. All these changes and initiatives raise a number of scientific challenges, such as the sustainable management of the available semantic resources associated with the diagnostic interpretation of AP images by both humans and computers. In this context, reference vocabularies and a formalization of the associated knowledge are especially needed to annotate histopathology images with labels that comply with semantic standards. In this research work, we present our contribution in this direction. We propose a sustainable way to bridge the content, features, performance and usability gaps between histopathology and WSI analysis.
|
28 |
Numbers, winds and stars: representing the ancient geographical language in the digital environment
Palladino, Chiara, January 2016
No description available.
|
29 |
MAISA - Maintenance of semantic annotations / MAISA - Maintenance des annotations sémantiques
Cardoso, Silvio Domingos, 07 December 2018
Semantic annotations are used in many fields, such as healthcare, and in a wide range of applications, from information retrieval and sharing to decision support. Annotations are produced by associating concept labels from a Knowledge Organization System (KOS), e.g. an ontology, thesaurus or dictionary, with pieces of digital information, e.g. images or texts. Annotations enable machines to interpret, link and use a vast amount of data automatically. However, the dynamic nature of knowledge means that annotations may be affected each time a new version of a KOS is released. New concepts can be added, obsolete ones removed, and the definition of existing concepts refined through the modification of their labels or properties. As a result, many annotations can lose their relevance, which hinders the intended use and exploitation of the annotated data. Methods for keeping annotations up to date are therefore required, and the large number of affected annotations makes manual adaptation impossible. In this thesis we propose a framework called MAISA to tackle the problem of adapting outdated annotations when the KOS used to create them changes. We distinguish two cases. In the first, we consider that the annotations are directly modifiable. For this case, we propose a rule-based approach that combines information derived from the evolution of the KOS with external knowledge extracted from the Web. In the second case, we consider that the annotations are not modifiable, as is often the case for annotations associated with patient data. The goal is then to keep the annotated documents searchable even when the annotations were produced with one version of the KOS and the user queries them with the vocabulary of another version. For this case, we designed a knowledge graph that represents a KOS and its successive evolution, together with a query enrichment method that extracts the history of a concept from this graph and adds the labels obtained to the initial query. We experimentally evaluated MAISA on realistic case studies built from four well-known biomedical KOS: ICD-9-CM, MeSH, NCIt and SNOMED CT. Using standard metrics, we show that the proposed approach improves the maintenance of semantic annotations in both cases.
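The query enrichment idea described for the second case can be sketched as follows. This is only an illustration under stated assumptions, not MAISA's actual method or data model: the function `enrich_query`, the dictionary layout, the concept identifiers, the version names and the labels are all hypothetical, and a plain dictionary stands in for the knowledge graph.

```python
def enrich_query(query_label, concept_history):
    """Expand a query label with the labels the same concept had in other KOS versions.

    `concept_history` is a hypothetical stand-in for the knowledge graph:
    it maps a concept identifier to a {version: label} dictionary.
    """
    expanded = {query_label}
    for labels_by_version in concept_history.values():
        labels = set(labels_by_version.values())
        # If the query term names this concept in any version, add all its labels.
        if query_label in labels:
            expanded |= labels
    return sorted(expanded)


# Made-up history for two concepts across two KOS versions.
toy_history = {
    "C1": {"v1": "heart attack", "v2": "myocardial infarction"},
    "C2": {"v1": "fever", "v2": "fever"},
}
expanded_terms = enrich_query("myocardial infarction", toy_history)
```

Searching with every term in `expanded_terms` would also reach documents annotated with the older label, without modifying the stored annotations.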
|
30 |
Adaptive Semantic Annotation of Entity and Concept Mentions in Text
Mendes, Pablo N., 05 June 2014
No description available.
|