Global ETD Search

1	Détection et analyse d’événement dans les messages courts / Event detection and analysis on short text messages Edouard, Amosse 02 October 2017 (has links) Les réseaux sociaux ont transformé le Web d'un mode lecture, où les utilisateurs pouvaient seulement consommer les informations, à un mode interactif leur permettant de les créer, partager et commenter. Un défi majeur du traitement d'information dans les médias sociaux est lié à la taille réduite des contenus, leur nature informelle et le manque d'informations contextuelles. D'un autre côté, le web contient des bases de connaissances structurées à partir de concepts d'ontologies, utilisables pour enrichir ces contenus. Cette thèse explore le potentiel d'utiliser les bases de connaissances du Web de données, afin de détecter, classifier et suivre des événements dans les médias sociaux, particulièrement Twitter. On a abordé 3 questions de recherche : i) Comment extraire et classifier les messages qui rapportent des événements ? ii) Comment identifier des événements précis ? iii) Étant donné un événement, comment construire un fil d'actualité représentant les différents sous-événements ? Les travaux de la thèse ont contribué à élaborer des méthodes pour la généralisation des entités nommées par des concepts d'ontologies pour mitiger le sur-apprentissage dans les modèles supervisés ; une adaptation de la théorie des graphes pour modéliser les relations entre les entités et les autres termes et ainsi caractériser des événements pertinents ; l'utilisation des ontologies de domaines et les bases de connaissances dédiées, pour modéliser les relations entre les caractéristiques et les acteurs des événements. Nous démontrons que l'enrichissement sémantique des entités par des informations du Web de données améliore la performance des modèles d'apprentissages supervisés et non supervisés. / In the latest years, the Web has shifted from a read-only medium where most users could only consume information to an interactive medium allowing every user to create, share and comment information. The downside of social media as an information source is that often the texts are short, informal and lack contextual information. On the other hand, the Web also contains structured Knowledge Bases (KBs) that could be used to enrich the user-generated content. This dissertation investigates the potential of exploiting information from the Linked Open Data KBs to detect, classify and track events on social media, in particular Twitter. More specifically, we address 3 research questions: i) How to extract and classify messages related to events? ii) How to cluster events into fine-grained categories? and 3) Given an event, to what extent user-generated contents on social medias can contribute in the creation of a timeline of sub-events? We provide methods that rely on Linked Open Data KBs to enrich the context of social media content; we show that supervised models can achieve good generalisation capabilities through semantic linking, thus mitigating overfitting; we rely on graph theory to model the relationships between NEs and the other terms in tweets in order to cluster fine-grained events. Finally, we use in-domain ontologies and local gazetteers to identify relationships between actors involved in the same event, to create a timeline of sub-events. We show that enriching the NEs in the text with information provided by LOD KBs improves the performance of both supervised and unsupervised machine learning models. Événement Réseaux sociaux Apprentissage Enrichissement sémantique Event detection Social media Machine learning Semantic enrichment
2	A Knowledge-based system framework for semantic enrichment and automated detailed design in the AEC projects Aram, Shiva 08 June 2015 (has links) Adoption of a streamlined BIM workflow throughout the AEC projects’ lifecycle will provide the project stakeholders with the rich information embedded in the parametric design models. Users can incorporate this rich information in various activities, improving efficiency and productivity of project activities and potentially enhancing accuracy and reducing errors and reworks. Two main challenges for such a streamlined information flow throughout the AEC projects that haven’t been sufficiently addressed by previous research efforts include lack of semantic interoperability and a large gap and misalignment of information between available BIM information provided by design activities and the required information for performing preconstruction and construction activities. This research effort proposes a framework for a knowledge-based system (KBS) that encapsulates domain experts’ knowledge and represents it through modularized rule set libraries as well as connected design automation and optimization solutions. The research attempts to provide a methodology for automatic semantic enrichment of design models as well as automated detailed design to fill the information gap between design and preconstruction project activities, streamlining BIM workflow and enhancing its value in the AEC projects. Knowledge-based systems KBS Semantic enrichment Automated design BIM IFC QTO Quantity take-off Cost estimation Precast concrete Reasoning
3	Semantic Enrichment of Ontology Mappings Arnold, Patrick 04 January 2016 (has links) (PDF) Schema and ontology matching play an important part in the field of data integration and semantic web. Given two heterogeneous data sources, meta data matching usually constitutes the first step in the data integration workflow, which refers to the analysis and comparison of two input resources like schemas or ontologies. The result is a list of correspondences between the two schemas or ontologies, which is often called mapping or alignment. Many tools and research approaches have been proposed to automatically determine those correspondences. However, most match tools do not provide any information about the relation type that holds between matching concepts, for the simple but important reason that most common match strategies are too simple and heuristic to allow any sophisticated relation type determination. Knowing the specific type holding between two concepts, e.g., whether they are in an equality, subsumption (is-a) or part-of relation, is very important for advanced data integration tasks, such as ontology merging or ontology evolution. It is also very important for mappings in the biological or biomedical domain, where is-a and part-of relations may exceed the number of equality correspondences by far. Such more expressive mappings allow much better integration results and have scarcely been in the focus of research so far. In this doctoral thesis, the determination of the correspondence types in a given mapping is the focus of interest, which is referred to as semantic mapping enrichment. We introduce and present the mapping enrichment tool STROMA, which obtains a pre-calculated schema or ontology mapping and for each correspondence determines a semantic relation type. In contrast to previous approaches, we will strongly focus on linguistic laws and linguistic insights. By and large, linguistics is the key for precise matching and for the determination of relation types. We will introduce various strategies that make use of these linguistic laws and are able to calculate the semantic type between two matching concepts. The observations and insights gained from this research go far beyond the field of mapping enrichment and can be also applied to schema and ontology matching in general. Since generic strategies have certain limits and may not be able to determine the relation type between more complex concepts, like a laptop and a personal computer, background knowledge plays an important role in this research as well. For example, a thesaurus can help to recognize that these two concepts are in an is-a relation. We will show how background knowledge can be effectively used in this instance, how it is possible to draw conclusions even if a concept is not contained in it, how the relation types in complex paths can be resolved and how time complexity can be reduced by a so-called bidirectional search. The developed techniques go far beyond the background knowledge exploitation of previous approaches, and are now part of the semantic repository SemRep, a flexible and extendable system that combines different lexicographic resources. Further on, we will show how additional lexicographic resources can be developed automatically by parsing Wikipedia articles. The proposed Wikipedia relation extraction approach yields some millions of additional relations, which constitute significant additional knowledge for mapping enrichment. The extracted relations were also added to SemRep, which thus became a comprehensive background knowledge resource. To augment the quality of the repository, different techniques were used to discover and delete irrelevant semantic relations. We could show in several experiments that STROMA obtains very good results w.r.t. relation type detection. In a comparative evaluation, it was able to achieve considerably better results than related applications. This corroborates the overall usefulness and strengths of the implemented strategies, which were developed with particular emphasis on the principles and laws of linguistics. ontology mapping ontology matching semantische erweiterung datenintegration hintergrundwissen ontology mapping ontology matching semantic enrichment data integration background knowledge ddc:500
4	Connecting GOMMA with STROMA: an approach for semantic ontology mapping in the biomedical domain Möller, Maximilian 13 February 2018 (has links) This thesis establishes a connection between GOMMA and STROMA – both are tools of ontology processing. Consequently, a new workflow of denoting a set of correspondences with five semantic relation types has been implemented. Such a rich denotation is scarcely discussed within the literature. The evaluation of the denotation shows that trivial correspondences are easy to recognize (tF > 90). The challenge is the denotation of non-trivial types ( 30 < ntF < 70). A prerequisite of the implemented workflow is the extraction of semantic relations between concepts. These relations represent additional background knowledge for the enrichment tool STROMA and are integrated to the repository SemRep which is accessed by this tool. Thus, STROMA is able to calculate a semantic type more precisely. UMLS was chosen as a biomedical knowledge source because it subsumes many different ontologies of this domain and thus, it represents a rich resource. Nevertheless, only a small set of relations met the requirements which are imposed to SemRep relations. Further studies may analyze whether there is an appropriate way to integrate the missing relations as well. The connection of GOMMA with STROMA allows the semantic enrichment of a biomedical mapping. As a consequence, this thesis enlightens two subjects of research. First, STROMA had been tested with general ontologies, which models common sense knowledge. Within this thesis, STROMA was applied to domain ontologies. Studies have shown that overall, STROMA was able to treat such ontologies as well. However, some strategies for the enrichment process are based on assumption which are misleading in the biomedical domain. Consequently, further strategies are suggested in this thesis which might improve the type denotation. These strategies may lead to an optimization of STROMA for biomedical data sets. A more thorough analysis will review their scope, also beyond the biomedical domain. Second, the established connection may lead to deeper investigations about advantages of semantic enrichment in the biomedical domain as an enriched mapping is returned. Despite heterogeneity of source and target ontology, such a mapping results in an improved interoperability at a finer level of granularity. The utilization of semantically rich correspondences in the biomedical domain is a worthwhile focus for future research. info:eu-repo/classification/ddc/000 ddc:000
5	Semantic Enrichment of Ontology Mappings Arnold, Patrick 15 December 2015 (has links) Schema and ontology matching play an important part in the field of data integration and semantic web. Given two heterogeneous data sources, meta data matching usually constitutes the first step in the data integration workflow, which refers to the analysis and comparison of two input resources like schemas or ontologies. The result is a list of correspondences between the two schemas or ontologies, which is often called mapping or alignment. Many tools and research approaches have been proposed to automatically determine those correspondences. However, most match tools do not provide any information about the relation type that holds between matching concepts, for the simple but important reason that most common match strategies are too simple and heuristic to allow any sophisticated relation type determination. Knowing the specific type holding between two concepts, e.g., whether they are in an equality, subsumption (is-a) or part-of relation, is very important for advanced data integration tasks, such as ontology merging or ontology evolution. It is also very important for mappings in the biological or biomedical domain, where is-a and part-of relations may exceed the number of equality correspondences by far. Such more expressive mappings allow much better integration results and have scarcely been in the focus of research so far. In this doctoral thesis, the determination of the correspondence types in a given mapping is the focus of interest, which is referred to as semantic mapping enrichment. We introduce and present the mapping enrichment tool STROMA, which obtains a pre-calculated schema or ontology mapping and for each correspondence determines a semantic relation type. In contrast to previous approaches, we will strongly focus on linguistic laws and linguistic insights. By and large, linguistics is the key for precise matching and for the determination of relation types. We will introduce various strategies that make use of these linguistic laws and are able to calculate the semantic type between two matching concepts. The observations and insights gained from this research go far beyond the field of mapping enrichment and can be also applied to schema and ontology matching in general. Since generic strategies have certain limits and may not be able to determine the relation type between more complex concepts, like a laptop and a personal computer, background knowledge plays an important role in this research as well. For example, a thesaurus can help to recognize that these two concepts are in an is-a relation. We will show how background knowledge can be effectively used in this instance, how it is possible to draw conclusions even if a concept is not contained in it, how the relation types in complex paths can be resolved and how time complexity can be reduced by a so-called bidirectional search. The developed techniques go far beyond the background knowledge exploitation of previous approaches, and are now part of the semantic repository SemRep, a flexible and extendable system that combines different lexicographic resources. Further on, we will show how additional lexicographic resources can be developed automatically by parsing Wikipedia articles. The proposed Wikipedia relation extraction approach yields some millions of additional relations, which constitute significant additional knowledge for mapping enrichment. The extracted relations were also added to SemRep, which thus became a comprehensive background knowledge resource. To augment the quality of the repository, different techniques were used to discover and delete irrelevant semantic relations. We could show in several experiments that STROMA obtains very good results w.r.t. relation type detection. In a comparative evaluation, it was able to achieve considerably better results than related applications. This corroborates the overall usefulness and strengths of the implemented strategies, which were developed with particular emphasis on the principles and laws of linguistics. info:eu-repo/classification/ddc/500 ddc:500
6	Newsminer: um sistema de data warehouse baseado em texto de notícias / Newsminer: a data warehouse system based on news websites Nogueira, Rodrigo Ramos 12 May 2017 (has links) Submitted by Milena Rubi (milenarubi@ufscar.br) on 2017-10-09T14:12:56Z No. of bitstreams: 1 NOGUEIRA_Rodrigo_2017.pdf: 5427774 bytes, checksum: db8155583bf1bffe3ceb4c01bf26f66f (MD5) / Approved for entry into archive by Milena Rubi (milenarubi@ufscar.br) on 2017-10-09T14:14:04Z (GMT) No. of bitstreams: 1 NOGUEIRA_Rodrigo_2017.pdf: 5427774 bytes, checksum: db8155583bf1bffe3ceb4c01bf26f66f (MD5) / Approved for entry into archive by Milena Rubi (milenarubi@ufscar.br) on 2017-10-09T14:14:13Z (GMT) No. of bitstreams: 1 NOGUEIRA_Rodrigo_2017.pdf: 5427774 bytes, checksum: db8155583bf1bffe3ceb4c01bf26f66f (MD5) / Made available in DSpace on 2017-10-09T14:14:24Z (GMT). No. of bitstreams: 1 NOGUEIRA_Rodrigo_2017.pdf: 5427774 bytes, checksum: db8155583bf1bffe3ceb4c01bf26f66f (MD5) Previous issue date: 2017-05-12 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Data and text mining applications managing Web data have been the subject of recent research. In every case, data mining tasks need to work on clean, consistent, and integrated data for obtaining the best results. Thus, Data Warehouse environments are a valuable source of clean, integrated data for data mining applications. Data Warehouse technology has evolved to retrieve and process data from the Web. In particular, news websites are rich sources that can compose a linguistic corpus. By inserting corpus into a Data Warehousing environment, applications can take advantage of the flexibility that a multidimensional model and OLAP operations provide. Among the benefits are the navigation through the data, the selection of the part of the data considered relevant, data analysis at different levels of abstraction, and aggregation, disaggregation, rotation and filtering over any set of data. This paper presents Newsminer, a data warehouse environment, which provides a consistent and clean set of texts in the form of a multidimensional corpus for consumption by external applications and users. The proposal includes an architecture that integrates the gathering of news in real time, a semantic enrichment module as part of the ETL stage, which adds semantic properties to the data such as news category and POS-tagging annotation and the access to data cubes for consumption by applications and users. Two experiments were performed. The first experiment selects the best news classifier for the semantic enrichment module. The statistical analysis of the results indicated that the Perceptron classifier achieved the best results of F-measure, with a good result of computational time. The second experiment collected data to evaluate real-time news preprocessing. For the data set collected, the results indicated that it is possible to achieve online processing time. / As aplicações de mineração de dados e textos oriundos da Internet têm sido alvo de recentes pesquisas. E, em todos os casos, as tarefas de mineração de dados necessitam trabalhar sobre dados limpos, consistentes e integrados para obter os melhores resultados. Sendo assim, ambientes de Data Warehouse são uma valiosa fonte de dados limpos e integrados para as aplicações de mineração. A tecnologia de Data Warehouse tem evoluído no sentido de recuperar e tratar dados provenientes da Web. Em particular, os sites de notícias são fontes ricas em textos, que podem compor um corpus linguístico. Inserindo o corpus em um ambiente de Data Warehouse, as aplicações poderão tirar proveito da flexibilidade que um modelo multidimensional e as operações OLAP fornecem. Dentre as vantagens estão a navegação pelos dados, a seleção da parte dos dados considerados relevantes, a análise dos dados em diferentes níveis de abstração, e a agregação, desagregação, rotação e filtragem sobre qualquer conjunto de dados. Este trabalho apresenta o ambiente de Data Warehouse Newsminer, que fornece um conjunto de textos consistente e limpo, na forma de um corpus multidimensional para consumo por aplicações externas e usuários. A proposta inclui uma arquitetura que integra a coleta textos de notícias em tempo próximo do tempo real, um módulo de enriquecimento semântico como parte da etapa de ETL, que acrescenta propriedades semânticas aos dados coletados tais como a categoria da notícia e a anotação POS-tagging, e a disponibilização de cubos de dados para consumo por aplicações e usuários. Foram executados dois experimentos. O primeiro experimento é relacionado à escolha do melhor classificador de categorias das notícias do módulo de enriquecimento semântico. A análise estatística dos resultados indicou que o classificador Perceptron atingiu os melhores resultados de F-medida, com resultado bom de tempo de processamento. O segundo experimento coletou dados para avaliar o pré-processamento de notícias em tempo real. Para o conjunto de dados coletados, os resultados indicaram que é possível atingir tempo de processamento online. / OB800972 Mineração de dados (Computação) Sites da Web Corpora multidimensional Enriquecimento semântico Categorização de notícias OLAP Multidimensional corpora Data mining Web sites Data Warehouse News websites Semantic enrichment News categorization
7	De l'usage de la sémantique dans la classification supervisée de textes : application au domaine médical / On the use of semantics in supervised text classification : application in the medical domain Albitar, Shereen 12 December 2013 (has links) Cette thèse porte sur l’impact de l’usage de la sémantique dans le processus de la classification supervisée de textes. Cet impact est évalué au travers d’une étude expérimentale sur des documents issus du domaine médical et en utilisant UMLS (Unified Medical Language System) en tant que ressource sémantique. Cette évaluation est faite selon quatre scénarii expérimentaux d’ajout de sémantique à plusieurs niveaux du processus de classification. Le premier scénario correspond à la conceptualisation où le texte est enrichi avant indexation par des concepts correspondant dans UMLS ; le deuxième et le troisième scénario concernent l’enrichissement des vecteurs représentant les textes après indexation dans un sac de concepts (BOC – bag of concepts) par des concepts similaires. Enfin le dernier scénario utilise la sémantique au niveau de la prédiction des classes, où les concepts ainsi que les relations entre eux, sont impliqués dans la prise de décision. Le premier scénario est testé en utilisant trois des méthodes de classification: Rocchio, NB et SVM. Les trois autres scénarii sont uniquement testés en utilisant Rocchio qui est le mieux à même d’accueillir les modifications nécessaires. Au travers de ces différentes expérimentations nous avons tout d’abord montré que des améliorations significatives pouvaient être obtenues avec la conceptualisation du texte avant l’indexation. Ensuite, à partir de représentations vectorielles conceptualisées, nous avons constaté des améliorations plus modérées avec d’une part l’enrichissement sémantique de cette représentation vectorielle après indexation, et d’autre part l’usage de mesures de similarité sémantique en prédiction. / The main interest of this research is the effect of using semantics in the process of supervised text classification. This effect is evaluated through an experimental study on documents related to the medical domain using the UMLS (Unified Medical Language System) as a semantic resource. This evaluation follows four scenarios involving semantics at different steps of the classification process: the first scenario incorporates the conceptualization step where text is enriched with corresponding concepts from UMLS; both the second and the third scenarios concern enriching vectors that represent text as Bag of Concepts (BOC) with similar concepts; the last scenario considers using semantics during class prediction, where concepts as well as the relations between them are involved in decision making. We test the first scenario using three popular classification techniques: Rocchio, NB and SVM. We choose Rocchio for the other scenarios for its extendibility with semantics. According to experiment, results demonstrated significant improvement in classification performance using conceptualization before indexing. Moderate improvements are reported using conceptualized text representation with semantic enrichment after indexing or with semantic text-to-text semantic similarity measures for prediction. Classification supervisée de texte Sémantique Conceptualisation Enrichissement sémantique Mesures de similarité sémantique Domaine médical UMLS Rocchio NB SVM Supervised text classification Semantics Conceptualization Semantic enrichment Semantic similarity measures Medical domain UMLS Rocchio NB SVM
8	Komponent pro sémantické obohacení / Semantic Enrichment Component Doležal, Jan January 2018 (has links) This master's thesis describes Semantic Enrichment Component (SEC), that searches entities (e.g., persons or places) in the input text document and returns information about them. The goals of this component are to create a single interface for named entity recognition tools, to enable parallel document processing, to save memory while using the knowledge base, and to speed up access to its content. To achieve these goals, the output of the named entity recognition tools in the text was specified, the tool for storing the preprocessed knowledge base into the shared memory was implemented, and the client-server scheme was used to create the component.

Search results