About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Uma solução para qualidade de contexto baseada em ontologia e lógica nebulosa com aplicação em monitoramento de sinais vitais em UTI / An approach for quality of context based on ontology and fuzzy logic: a case study on vital sign monitoring at ICU

Sena, Márcio Vinícius Oliveira 03 May 2016 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / More than a decade after the first formulation of the Context-Aware Computing paradigm, the term Quality of Context (QoC) emerged in a scenario full of services that provide seamless user-application interactions, highlighting the need for context information managers. These infrastructures that handle context information, called Context Management Systems (CMS), receive and emit streams of data that may or may not be relevant. The relevance of information is studied by many branches of context-aware computing, including QoC. Although a large number of QoC metrics exist, the literature still lacks general values to characterize the relevance of a piece of context information, that is, a way to produce from the metrics an overall value representing the importance of the information. The objective of this work is to propose, using fuzzy logic, a combination of metrics that produces a QoC value for each piece of information received, allowing its disposal when it does not meet the expected relevance, described in this work by QoC policies. In order to disseminate the produced information to context consumers, this research also proposes, based on a domain-independent QoC ontology, to semantically annotate the data describing the quality of the information, thus avoiding the need for new measurements in the short term. Finally, this work presents the application of the proposed solutions in a case study on monitoring the vital signs of ICU patients.
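As a rough illustration of the kind of metric combination this abstract describes, the sketch below fuses two QoC metrics into one overall score with a tiny Mamdani-style fuzzy rule base and applies a disposal policy. The metric names (freshness, accuracy), membership shapes and the 0.5 threshold are illustrative assumptions, not the dissertation's actual implementation:

```python
# Hypothetical sketch: fusing two QoC metrics into one overall QoC score with
# a minimal fuzzy rule base, then applying a disposal policy. Metric names,
# membership functions and threshold are illustrative assumptions.

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def low(x):  return tri(x, -0.5, 0.0, 0.5)
def high(x): return tri(x, 0.5, 1.0, 1.5)

def qoc_score(freshness: float, accuracy: float) -> float:
    # Rule 1: IF freshness high AND accuracy high THEN QoC high (centroid 1.0)
    # Rule 2: IF freshness low OR accuracy low THEN QoC low (centroid 0.0)
    w_high = min(high(freshness), high(accuracy))
    w_low = max(low(freshness), low(accuracy))
    if w_high + w_low == 0:
        return 0.5
    # Weighted-average defuzzification over the two rule consequents.
    return (w_high * 1.0 + w_low * 0.0) / (w_high + w_low)

def accept(reading: dict, threshold: float = 0.5) -> bool:
    """QoC policy: discard readings whose overall QoC falls below threshold."""
    return qoc_score(reading["freshness"], reading["accuracy"]) >= threshold

print(accept({"freshness": 0.9, "accuracy": 0.8}))  # True: relevant reading
print(accept({"freshness": 0.2, "accuracy": 0.9}))  # False: stale reading
```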
12

Face Identification, Gender And Age Groups Classifications For Semantic Annotation Of Videos

Yaprakkaya, Gokhan 01 December 2010 (has links) (PDF)
This thesis presents a robust face recognition method together with a combination of methods for gender identification and age group classification, for the semantic annotation of videos. For gender classification, a 256-bin local binary pattern (LBP) histogram and pixel intensity differences are used as facial features: a Random Trees classifier is trained with the LBP features and an AdaBoost classifier is trained with the pixel intensity differences. For age group classification, DCT Mod2 features and edge detection results around facial landmarks are used: one Random Trees classifier is trained with the DCT Mod2 features and another with LBP features around the facial landmark points. In the face identification module, DCT Mod2 features of the detected faces, morphed by a two-dimensional face morphing method based on the Active Appearance Model and barycentric coordinates, are used as input to a nearest neighbor classifier whose weights are obtained from the trained Random Forest classifier. Different feature extraction methods were tried and compared, and the best-performing one was chosen for the face recognition module. We compared our classification results with those of successful earlier works in experiments performed on the same datasets and obtained satisfactory results.
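A minimal sketch of the gender-classification feature pipeline described above, assuming grayscale face crops as numpy arrays; the 3x3-neighborhood LBP variant and the classifier hyperparameters are illustrative choices, not the thesis's exact configuration:

```python
# Hypothetical sketch: 256-bin LBP histogram features feeding a random forest,
# as in the gender-classification module described above. Neighborhood layout
# and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def lbp_histogram(img: np.ndarray) -> np.ndarray:
    """256-bin histogram of basic 3x3 local binary patterns."""
    center = img[1:-1, 1:-1]
    codes = np.zeros_like(center, dtype=np.uint8)
    # Eight neighbors, each contributing one bit of the LBP code.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy: img.shape[0] - 1 + dy,
                       1 + dx: img.shape[1] - 1 + dx]
        codes |= ((neighbor >= center).astype(np.uint8) << bit)
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()  # normalize so image size does not matter

# Toy training run on random "faces"; real input would be aligned face crops.
rng = np.random.default_rng(0)
faces = rng.integers(0, 256, size=(40, 64, 64), dtype=np.uint8)
labels = rng.integers(0, 2, size=40)  # 0 = female, 1 = male (toy labels)
X = np.stack([lbp_histogram(f) for f in faces])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X[:5]))
```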
13

Μεθοδολογία αυτόματου σημασιολογικού σχολιασμού στο περιεχόμενο ιστοσελίδων / A methodology for automatic semantic annotation of web page content

Σπύρος, Γεώργιος 14 December 2009 (has links)
Nowadays, World Wide Web usage has evolved into a social phenomenon. Its spread is constant and growing exponentially. In the years that have passed since its first appearance, users have gained a certain level of experience and have come to a set of understandings based on that experience. They have understood that the web pages they interact with in their everyday web activities are the creations of other users. It has also become clear that every user can create their own web page and include in it references to other pages of their liking. These references do not exist simply as hyperlinks: most of the time they are accompanied by text which provides useful information about the referenced page's content. In this diploma thesis we describe a methodology for the automatic semantic annotation of a web page's contents. The tools and techniques described are based on two main hypotheses. First, people who create and maintain web pages describe other web pages inside them. Second, people connect their web pages with any web page they describe via an anchor link which is clearly marked with a tag in each page's HTML code. The automatic semantic annotation that we attempt here for a web page amounts to finding a tag able to describe the page's contents. Finding this tag is a process based on a methodology consisting of a number of steps; each step is implemented using various tools and techniques, and its output feeds the next step's input. The basic idea behind our methodology is to collect as many anchor texts as possible, along with a window of words around them, for each web page. This collection is the result of a procedure which involves processing many web pages that contain hyperlinks to the web page we want to annotate. The semantic tag for a web page is derived by applying natural language processing techniques to the collection of documents that refer to it. Thus the final conclusion for the annotation of the web page's contents is extracted.
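The anchor-text harvesting step lends itself to a short sketch. The snippet below is a simplified reading of the methodology: the window size and the frequency-based tag choice are assumptions, not the thesis's exact procedure. It collects anchor texts pointing at a target URL from a set of HTML documents and proposes the most frequent content word as the page's tag:

```python
# Hypothetical sketch: harvest anchor texts (plus a small window of
# surrounding words) that link to a target page, then pick the most frequent
# word as a candidate semantic tag. Window size and tag selection heuristic
# are illustrative assumptions.
from collections import Counter
from bs4 import BeautifulSoup

def harvest_anchor_texts(html_pages, target_url, window=5):
    snippets = []
    for html in html_pages:
        soup = BeautifulSoup(html, "html.parser")
        for a in soup.find_all("a", href=target_url):
            anchor = a.get_text(" ", strip=True)
            # Crude context window: neighboring words from the parent element.
            context = a.parent.get_text(" ", strip=True).split()
            snippets.append(" ".join(context[:window]) + " " + anchor)
    return snippets

def propose_tag(snippets, stopwords={"the", "a", "of", "and", "to"}):
    words = Counter(
        w.lower() for s in snippets for w in s.split()
        if w.lower() not in stopwords and w.isalpha()
    )
    return words.most_common(1)[0][0] if words else None

pages = ['<p>A nice <a href="http://ex.org/t">semantic annotation tool</a></p>',
         '<p>Try this <a href="http://ex.org/t">annotation engine</a> today</p>']
print(propose_tag(harvest_anchor_texts(pages, "http://ex.org/t")))  # annotation
```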
14

Proposition d’une ontologie de domaine dédiée à l’annotation d’images spatialisées pour le suivi de la conservation du patrimoine culturel bâti / Proposal of a domain ontology dedicated to spatialized image annotation for monitoring the conservation of built cultural heritage

Messaoudi, Tommy 12 July 2017 (has links)
The conservation and restoration of historical monuments requires a diagnosis carried out by a multidisciplinary team. Elaborating the diagnosis of a cultural heritage object requires direct observations, the examination of documentary sources, and diverse types of analytic data. The great advances in digital data collection, processing and management have opened unprecedented opportunities for integrating results, coming from both observations and derived data, within innovative representation systems for heritage knowledge and documentation. However, while a range of new tools and data is today available to the scientific community and heritage experts, their correlation and integration with internal/external heterogeneous information is an issue that remains largely unexplored. While these innovative tools allow different experts to record and analyze their observations in diverse formats, the results are generally not spatialized and referenced together. Indeed, even though all these data refer to a common physical object, the links between them rest only on file organization strategies or keyword-based indexing methods. In parallel, several 3D digitization techniques have been used in recent years to generate dense and accurate geometric representations, but the processing and structuring methods for these 3D data do not yet include an operational framework for retrieving information relevant to the conservation state and its interpretative analysis. Positioned at the intersection of the 3D spatialized acquisition and information management domains, this research work aims at the creation of a digital framework for recording conservation state descriptions of historical monuments through the introduction of a functional domain ontology for the semantic annotation of 3D representations of heritage objects. The proposed ontology comprises both qualitative descriptors (related to a formalization of domain knowledge) and quantitative ones, constituting the conceptual scaffold needed to structure a multidimensional information system dedicated to the correlation of spatial, geometric and semantic multi-actor annotations in relation to multiple levels of observation.
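To make the idea of semantically annotating a 3D representation concrete, here is a minimal sketch with rdflib; the vocabulary (namespace, class and property names) is invented for illustration and is not the ontology proposed in the thesis:

```python
# Hypothetical sketch: attaching a conservation-state annotation to a region
# of a 3D model as RDF triples. The "ex:" vocabulary below is invented for
# illustration; it is not the thesis's actual ontology.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("http://example.org/heritage#")
g = Graph()
g.bind("ex", EX)

region = EX["wall-north/region-042"]      # a segmented patch of the 3D model
annotation = EX["annotation-001"]

g.add((annotation, RDF.type, EX.ConservationAnnotation))
g.add((annotation, EX.annotates, region))
g.add((annotation, EX.degradationType, EX.Efflorescence))   # qualitative
g.add((annotation, EX.affectedAreaM2, Literal(0.75, datatype=XSD.decimal)))
g.add((annotation, EX.observedBy, Literal("conservator-A")))
g.add((annotation, EX.observationDate, Literal("2017-03-02", datatype=XSD.date)))

print(g.serialize(format="turtle"))
```

Anchoring qualitative degradation labels and quantitative measurements to the same spatial region is what lets later queries correlate observations from multiple actors.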
15

Um esquema de anotação semântica para mapas conceituais / A semantic annotation scheme for concept maps

Silva, Viviane Gomes da 31 August 2012 (has links)
The goal of this work is to broaden the reusability of concept maps (CMs) and promote their interpretation according to context. In order to adapt a learning object (LO) repository to the characteristics of CMs, a metadata schema describing a CM was defined, named MOAF-MC, a Portuguese acronym for Functional Learning Object Metadata for Concept Maps. The repository adapted with MOAF-MC allows CMs and their information to be shared through the semantic annotation of the concepts present in the map.
16

Anotação semântica de objetos de aprendizagem funcionais / Semantic annotation of functional learning objects

Gomes, Sionise Rocha 26 August 2010 (has links)
Fundação de Amparo à Pesquisa do Estado do Amazonas / Media resources can be used and reused to support teaching and learning processes. Some of these resources are non-interactive, such as images, texts and videos; others enable interaction among entities, digital or not, such as simulations, games and other applications. The latter are considered Functional Learning Objects (FLOs). To be reused, FLOs must be described by metadata representing both the technical and the educational aspects of the object. However, current metadata schemas are limited in describing FLOs, especially with respect to sharing teaching experience in the use of the object. This limitation is caused by the absence of elements and by ambiguity in the vocabulary used for annotation, making it difficult to carry out context-oriented searches and for professionals from the same domain to exchange information. In this context, this work presents a metadata schema for the description of FLOs that emphasizes different user experiences through the integration of different domain ontologies.
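The sketch below suggests what such an ontology-anchored FLO record might look like. All field and concept names are invented for illustration; this is not the actual schema proposed in the dissertation (nor the MOAF-MC schema of the previous entry):

```python
# Hypothetical sketch: a metadata record for a Functional Learning Object
# whose free-text keywords are replaced by ontology-anchored annotations, so
# that searches can be contextualized. All field and concept names are
# invented for illustration; they are not the actual schema.
from dataclasses import dataclass, field

@dataclass
class SemanticAnnotation:
    ontology_iri: str   # which domain ontology the concept comes from
    concept_iri: str    # unambiguous concept, instead of a free-text keyword

@dataclass
class FLORecord:
    title: str
    format: str                                  # technical aspect
    interactivity: str                           # e.g. "simulation", "game"
    educational_use: list[SemanticAnnotation] = field(default_factory=list)
    teaching_notes: list[str] = field(default_factory=list)  # shared experience

record = FLORecord(
    title="Planetary motion simulator",
    format="text/html",
    interactivity="simulation",
    educational_use=[SemanticAnnotation(
        ontology_iri="http://example.org/onto/physics",
        concept_iri="http://example.org/onto/physics#KeplerLaws",
    )],
    teaching_notes=["Worked well as a lab warm-up for first-year students."],
)
print(record.educational_use[0].concept_iri)
```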
17

Exploitation informatique des annotations sémantiques automatiques d'Excom pour la recherche d'informations et la navigation / Information Retrieval and Text Navigation through the Exploitation of the Automatic Semantic Annotation of the Excom Engine

Atanassova, Iana 14 January 2012 (has links)
Using the Excom engine for semantic annotation, we have constructed an Information Retrieval system based on semantic categories from automatic linguistic analyses, in order to propose a new approach to text search. The annotations are obtained by the Contextual Exploration method, a knowledge-based linguistic approach using markers and disambiguation rules. Queries are formulated according to search viewpoints, which are at the heart of the Information Retrieval strategy. Our approach uses annotation categories organized in linguistic ontologies structured as graphs. In order to provide relevant results to the user, we have designed algorithms for ranking answers and managing redundancy. These algorithms exploit principally the structure of the linguistic ontologies used for the annotation. We have carried out an evaluation of the relevance of the system's results, taking into account the specificity of our approach. We have developed user interfaces allowing the construction of new information products, such as structured text syntheses, using information extraction according to semantic criteria. This approach also aims to offer tools for strategic monitoring and economic intelligence.
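The Contextual Exploration method named above pairs indicator markers with contextual disambiguation rules. The toy annotator below is a guess at the general shape of such a rule system: the markers, clues and category names are invented, and this is not Excom's actual rule format:

```python
# Hypothetical sketch of a Contextual Exploration-style annotator: an
# indicator marker proposes a semantic category, and contextual clues in the
# same sentence confirm or reject it. Markers, clues and categories are
# invented for illustration.
import re
from dataclasses import dataclass

@dataclass
class Rule:
    indicator: str        # regex that triggers the rule
    positive_clues: list  # at least one must appear in the sentence
    category: str         # semantic label assigned on success

RULES = [
    Rule(r"\bshows?\b", [r"\bresults?\b", r"\bfigure\b"], "presentation"),
    Rule(r"\bconclude[sd]?\b", [r"\bwe\b", r"\bstudy\b"], "conclusion"),
]

def annotate(sentence: str):
    labels = []
    for rule in RULES:
        if re.search(rule.indicator, sentence, re.I):
            # The indicator alone is ambiguous; a clue must confirm it.
            if any(re.search(c, sentence, re.I) for c in rule.positive_clues):
                labels.append(rule.category)
    return labels

print(annotate("Figure 2 shows the results of the second experiment."))
# ['presentation']
print(annotate("We conclude that the study supports the hypothesis."))
# ['conclusion']
```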
18

Bibliosémantique : une technique linguistique et informatique par exploration contextuelle / Bibliosemantics: a linguistic and computational method based on contextual exploration

Bertin, Marc 21 January 2011 (has links)
We have defined bibliosemantics as belonging to both the fields of computing and linguistics. Its objectives are essentially the same as those of scientometrics, informetrics and bibliometrics, i.e., to classify, organize and evaluate. The core of our implementation is based on the use of corpora semantically annotated by the EXCOM platform. The application of the Contextual Exploration method has led to a computational implementation of bibliosemantics based on discourse semantics rather than on purely metric techniques, in the context of this study of bibliographic references. The recognition of indexed or abbreviated references within a corpus of scientific papers makes it possible to identify the textual segments that are candidates for annotation. The thesis also presents discourse markers, organized in a semantic map, which constitute the necessary linguistic resources and make the entire semantic processing chain automatic. The system has been developed as a web service, with the aim of providing a navigation interface that is user-friendly and adapted to our problem. New documentary products, such as an enriched bibliographic record, have been implemented in order to facilitate the exploitation of the annotations by the user. Finally, we propose an evaluation of the system and describe the protocol used. This work ends with the presentation of a number of recommendations, notably the setting up of a monitoring unit.
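The step of recognizing indexed or abbreviated references and extracting the surrounding textual segments can be sketched as follows; the two citation patterns are a deliberate simplification invented for the example, not the thesis's marker resources:

```python
# Hypothetical sketch: spot indexed ([12]) and abbreviated ((Author, 2009))
# citation forms in scientific text and return each containing sentence as a
# candidate segment for semantic annotation.
import re

CITATION = re.compile(
    r"\[\d+(?:,\s*\d+)*\]"                        # indexed: [3] or [3, 7]
    r"|\([A-Z][a-zA-Z]+(?: et al\.)?,\s*\d{4}\)"  # abbreviated: (Smith, 2009)
)

def candidate_segments(text: str):
    # Naive sentence split; a real system would use a proper tokenizer.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if CITATION.search(s)]

text = ("Semantic annotation has been widely studied [3, 7]. "
        "Our method differs from (Bertin et al., 2009) in its use of markers. "
        "We evaluate it on a new corpus.")
for seg in candidate_segments(text):
    print("CANDIDATE:", seg)
```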
19

Stratégie domaine par domaine pour la création d'un FrameNet du français : annotations en corpus de cadres et rôles sémantiques / Domain by domain strategy for creating a French FrameNet: corpus annotations of semantic frames and roles

Djemaa, Marianne 14 June 2017 (has links)
Dans cette thèse, nous décrivons la création du French FrameNet (FFN), une ressource de type FrameNet pour le français créée à partir du FrameNet de l’anglais (Baker et al., 1998) et de deux corpus arborés : le French Treebank (Abeillé et al., 2003) et le Sequoia Treebank (Candito et Seddah, 2012). La ressource séminale, le FrameNet de l’anglais, constitue un modèle d’annotation sémantique de situations prototypiques et de leurs participants. Elle propose à la fois :a) un ensemble structuré de situations prototypiques, appelées cadres, associées à des caractérisations sémantiques des participants impliqués (les rôles);b) un lexique de déclencheurs, les lexèmes évoquant ces cadres;c) un ensemble d’annotations en cadres pour l’anglais. Pour créer le FFN, nous avons suivi une approche «par domaine notionnel» : nous avons défini quatre «domaines» centrés chacun autour d’une notion (cause, communication langagière, position cognitive ou transaction commerciale), que nous avons travaillé à couvrir exhaustivement à la fois pour la définition des cadres sémantiques, la définition du lexique, et l’annotation en corpus. Cette stratégie permet de garantir une plus grande cohérence dans la structuration en cadres sémantiques, tout en abordant la polysémie au sein d’un domaine et entre les domaines. De plus, nous avons annoté les cadres de nos domaines sur du texte continu, sans sélection d’occurrences : nous préservons ainsi la distribution des caractéristiques lexicales et syntaxiques de l’évocation des cadres dans notre corpus. à l’heure actuelle, le FFN comporte 105 cadres et 873 déclencheurs distincts, qui donnent lieu à 1109 paires déclencheur-cadre distinctes, c’est-à-dire 1109 sens. Le corpus annoté compte au total 16167 annotations de cadres de nos domaines et de leurs rôles. La thèse commence par resituer le modèle FrameNet dans un contexte théorique plus large. Nous justifions ensuite le choix de nous appuyer sur cette ressource et motivons notre méthodologie en domaines notionnels. Nous explicitons pour le FFN certaines notions définies pour le FrameNet de l’anglais que nous avons jugées trop floues pour être appliquées de manière cohérente. Nous introduisons en particulier des critères plus directement syntaxiques pour la définition du périmètre lexical d’un cadre, ainsi que pour la distinction entre rôles noyaux et non-noyaux.Nous décrivons ensuite la création du FFN : d’abord, la délimitation de la structure de cadres utilisée pour le FFN, et la création de leur lexique. Nous présentons alors de manière approfondie le domaine notionnel des positions cognitives, qui englobe les cadres portant sur le degré de certitude d’un être doué de conscience sur une proposition. Puis, nous présentons notre méthodologie d’annotation du corpus en cadres et en rôles. à cette occasion, nous passons en revue certains phénomènes linguistiques qu’il nous a fallu traiter pour obtenir une annotation cohérente ; c’est par exemple le cas des constructions à attribut de l’objet.Enfin, nous présentons des données quantitatives sur le FFN tel qu’il est à ce jour et sur son évaluation. Nous terminons sur des perspectives de travaux d’amélioration et d’exploitation de la ressource créée. / This thesis describes the creation of the French FrameNet (FFN), a French language FrameNet type resource made using both the Berkeley FrameNet (Baker et al., 1998) and two morphosyntactic treebanks: the French Treebank (Abeillé et al., 2003) and the Sequoia Treebank (Candito et Seddah, 2012). 
The Berkeley FrameNet allows for semantic annotation of prototypical situations and their participants. It consists of:a) a structured set of prototypical situations, called frames. These frames incorporate semantic characterizations of the situations’ participants (Frame Elements, or FEs);b) a lexicon of lexical units (LUs) which can evoke those frames;c) a set of English language frame annotations. In order to create the FFN, we designed a “domain by domain” methodology: we defined four “domains”, each centered on a specific notion (cause, verbal communication, cognitive stance, or commercial transaction). We then sought to obtain full frame and lexical coverage for these domains, and annotated the first 100 corpus occurrences of each LU in our domains. This strategy guarantees a greater consistency in terms of frame structuring than other approaches and is conducive to work on both intra-domain and inter-domains frame polysemy. Our annotating frames on continuous text without selecting particular LU occurrences preserves the natural distribution of lexical and syntactic characteristics of frame-evoking elements in our corpus. At the present time, the FFNincludes 105 distinct frames and 873 distinct LUs, which combine into 1,109 LU-frame pairs (i.e. 1,109 senses). 16,167 frame occurrences, as well as their FEs, have been annotated in our corpus. In this thesis, I first situate the FrameNet model in a larger theoretical background. I then justify our using the Berkeley FrameNet as our resource base and explain why we used a domain-by- domain methodology. I next try to clarify some specific BFN notions that we found too vague to be coherently used to make the FFN. Specifically, I introduce more directly syntactic criteria both for defining a frame’s lexical perimeter and for differentiating core FEs from non-core ones.Then, I describe the FFN creation itself first by delimitating a structure of frames that will be used in the resource and by creating a lexicon for these frames. I then introduce in detail the Cognitive Stances notional domain, which includes frames having to do with a cognizer’s degree of certainty about some particular content. Next, I describe our methodology for annotating a corpus with frames and FEs, and analyze our treatment of several specific linguistic phenomena that required additional consideration (such as object complement constructions).Finally, I give quantified information about the current status of the FFN and its evaluation. I conclude with some perspectives on improving and exploiting the FFN.
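As a concrete picture of what a FrameNet-style annotation records, here is a minimal sketch of the data structure; the frame, LU and FE names are a generic FrameNet-style example, not actual FFN data:

```python
# Hypothetical sketch: the shape of a FrameNet-style annotation, with a
# frame-evoking lexical unit and its frame elements anchored to character
# spans in the sentence. Frame/FE names are a generic example, not FFN data.
from dataclasses import dataclass

@dataclass
class FrameElement:
    name: str         # e.g. "Cognizer" (core) or "Time" (non-core)
    span: tuple       # (start, end) character offsets in the sentence
    core: bool

@dataclass
class FrameAnnotation:
    sentence: str
    frame: str        # the prototypical situation evoked
    lu: str           # the frame-evoking lexical unit ("lemma.pos")
    target_span: tuple
    frame_elements: list

sent = "Marie doubts that the meeting will take place."
ann = FrameAnnotation(
    sentence=sent,
    frame="Cognitive_stance",
    lu="doubt.v",
    target_span=(6, 12),
    frame_elements=[
        FrameElement("Cognizer", (0, 5), core=True),
        FrameElement("Content", (13, 45), core=True),
    ],
)
for fe in ann.frame_elements:
    print(fe.name, "->", ann.sentence[fe.span[0]:fe.span[1]])
```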
20

Knowledge Extraction for Hybrid Question Answering

Usbeck, Ricardo 22 May 2017 (has links) (PDF)
Since Tim Berners-Lee's proposal of hypertext to his employer CERN on March 12, 1989, the World Wide Web has grown to more than one billion web pages and is still growing. With the later proposed Semantic Web vision, Berners-Lee et al. suggested an extension of the existing (Document) Web to allow better reuse, sharing and understanding of data. Both the Document Web and the Web of Data (the current implementation of the Semantic Web) grow continuously. This is a mixed blessing, as the two forms of the Web grow concurrently and most commonly contain different pieces of information. Modern information systems must thus bridge a Semantic Gap to allow holistic and unified access to information, independent of the representation of the data. One way to bridge the gap between the two forms of the Web is the extraction of structured data, i.e., RDF, from the growing amount of unstructured and semi-structured information (e.g., tables and XML) on the Document Web. Note that unstructured data stands for any type of textual information, such as news, blogs or tweets. While extracting structured data from unstructured data allows the development of powerful information systems, it requires high-quality and scalable knowledge extraction frameworks to lead to useful results. The dire need for such approaches has led to the development of a multitude of annotation frameworks and tools. However, most of these approaches are not evaluated on the same datasets or using the same measures. The resulting Evaluation Gap needs to be tackled by a concise evaluation framework to foster fine-grained and uniform evaluations of annotation tools and frameworks over any knowledge base. Moreover, with the constant growth of data and the ongoing decentralization of knowledge, intuitive ways for non-experts to access the generated data are required. Humans have adapted their search behavior to current Web data through access paradigms such as keyword search so as to retrieve high-quality results; hence, most Web users only expect Web documents in return. However, humans think and most commonly express their information needs in natural language rather than in keyword phrases. Answering complex information needs often requires the combination of knowledge from various, differently structured data sources. Thus, we observe an Information Gap between natural-language questions and current keyword-based search paradigms, which in addition do not make use of the available structured and unstructured data sources. Question Answering (QA) systems provide an easy and efficient way to bridge this gap by allowing data to be queried via natural language, thus reducing (1) a possible loss of precision and (2) a potential loss of time while reformulating the search intention into a machine-readable form. Furthermore, QA systems enable answering natural language queries with concise results instead of links to verbose Web documents. Additionally, they allow and encourage access to, and the combination of, knowledge from heterogeneous knowledge bases (KBs) within one answer. Consequently, three main research gaps are considered and addressed in this work. First, addressing the Semantic Gap between the unstructured Document Web and the Web of Data requires the development of scalable and accurate approaches for the extraction of structured data in RDF. This research challenge is addressed by several approaches within this thesis.
This thesis presents CETUS, an approach for recognizing entity types to populate RDF KBs. Furthermore, our knowledge-base-agnostic disambiguation framework AGDISTIS can efficiently detect the correct URIs for a given set of named entities. Additionally, we introduce REX, a Web-scale framework for RDF extraction from semi-structured (i.e., templated) websites, which makes use of the semantics of the reference knowledge base to check the extracted data. The ongoing research on closing the Semantic Gap has already yielded a large number of annotation tools and frameworks. However, these approaches are currently still hard to compare, since the published evaluation results are calculated on diverse datasets and evaluated based on different measures. On the other hand, the issue of comparability of results is not to be regarded as intrinsic to the annotation task. Indeed, it is now well established that scientists spend between 60% and 80% of their time preparing data for experiments. Data preparation being such a tedious problem in the annotation domain is mostly due to the different formats of the gold standards as well as the different data representations across reference datasets. We tackle the resulting Evaluation Gap in two ways. First, we introduce a collection of three novel datasets, dubbed N3, to leverage the possibility of optimizing NER and NED algorithms via Linked Data and to ensure maximal interoperability, overcoming the need for corpus-specific parsers. Second, we present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools and frameworks on multiple datasets. The decentralized architecture behind the Web has led to pieces of information being distributed across data sources with varying structure. Moreover, the increasing demand for natural-language interfaces, as evidenced by current mobile applications, requires systems to deeply understand the underlying user information need. In conclusion, a natural language interface for asking questions requires a hybrid approach to data usage, i.e., simultaneously searching full texts and semantic knowledge bases. To close the Information Gap, this thesis presents HAWK, a novel entity search approach developed for hybrid QA based on combining structured RDF and unstructured full-text data sources.
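The knowledge-base-agnostic disambiguation step (finding candidate URIs for surface forms) can be illustrated with a minimal label lookup against a SPARQL endpoint. This sketch is a generic candidate-generation query against DBpedia, not AGDISTIS's actual algorithm, which additionally exploits the graph structure around the candidates:

```python
# Hypothetical sketch: candidate URI generation for a named entity by label
# lookup against a public SPARQL endpoint. This is only the first step of a
# disambiguation pipeline, not AGDISTIS's actual graph-based algorithm.
from SPARQLWrapper import SPARQLWrapper, JSON

def candidate_uris(surface_form: str, limit: int = 5):
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        SELECT DISTINCT ?uri WHERE {{
            ?uri rdfs:label "{surface_form}"@en .
        }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [b["uri"]["value"] for b in results["results"]["bindings"]]

# A real disambiguator would now rank these candidates, e.g. by how densely
# they are connected to the candidates of the other entities in the text.
print(candidate_uris("Berlin"))
```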
