41

Leveraging Schema Information For Improved Knowledge Graph Navigation

Chittella, Rama Someswar 02 August 2019 (has links)
No description available.
42

GraphQL2RDF : A proof-of-concept method to expose GraphQL data to the Semantic Web

Nilsson, Anton January 2021 (has links)
The Semantic Web was introduced to bring structure to the Web. The goal is to allow computer agents to traverse the Web and carry out tasks for human users. In the Semantic Web, data is stored using the RDF data model. The purpose of this study is to explore the possibility of exposing GraphQL data to the Semantic Web using a data-to-data translation inspired by Ontology Based Data Access (OBDA). This was done by introducing GraphQL2RDF, a proof-of-concept method to materialize GraphQL data as RDF triples. GraphQL2RDF uses two mapping schemas: a GraphQL-mapping schema annotated with directives to filter and select GraphQL data, and an RDF-mapping schema to specify which RDF triples to create. GraphQL2RDF supports filtering directives modelled on the SQL WHERE clause. The approach is demonstrated in a library use-case, in which library data exposed through a GraphQL endpoint is mapped into RDF. GraphQL2RDF demonstrates a method for exposing GraphQL data as RDF while imposing a minimal set of requirements on the GraphQL endpoint. Future work includes improving the model and exploring extensions of this translation method towards an OBDA approach that does not require full materialization of the RDF data.
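To make the materialization idea concrete, here is a minimal sketch in Python with rdflib — not the thesis's mapping-schema syntax or directives — that walks a GraphQL JSON response and emits RDF triples. The endpoint URL, GraphQL fields and vocabulary IRIs are assumptions for illustration only.

```python
# Sketch only: materializing a GraphQL response as RDF triples.
# The endpoint, field names and vocabulary below are assumed, not GraphQL2RDF's.
import requests
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/library/")      # assumed vocabulary
ENDPOINT = "http://localhost:4000/graphql"         # assumed GraphQL endpoint
QUERY = "{ books { id title author { name } } }"   # assumed GraphQL schema

def graphql_to_rdf():
    data = requests.post(ENDPOINT, json={"query": QUERY}).json()["data"]
    g = Graph()
    for book in data["books"]:
        s = EX[f"book/{book['id']}"]               # mint a subject IRI per book
        g.add((s, RDF.type, EX.Book))
        g.add((s, EX.title, Literal(book["title"])))
        g.add((s, EX.authorName, Literal(book["author"]["name"])))
    return g

if __name__ == "__main__":
    print(graphql_to_rdf().serialize(format="turtle"))
```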
43

Traitement et raisonnement distribués des flux RDF / Distributed RDF stream processing and reasoning

Ren, Xiangnan 19 November 2018 (has links)
Le traitement en temps réel des flux de données émanant des capteurs est devenu une tâche courante dans de nombreux scénarios industriels. Dans le contexte de l'Internet des objets (IoT), les données sont émises par des sources de flux hétérogènes, c'est-à-dire provenant de domaines et de modèles de données différents. Cela impose aux applications de l'IoT de gérer efficacement l'intégration de données à partir de ressources diverses. Le traitement des flux RDF est dès lors devenu un domaine de recherche important. Cette démarche basée sur des technologies du Web Sémantique supporte actuellement de nombreuses applications innovantes où les notions de temps réel et de raisonnement sont prépondérantes. La recherche présentée dans ce manuscrit s'attaque à ce type d'application. En particulier, elle a pour objectif de gérer efficacement les flux de données massifs entrants et d'offrir des services avancés d'analyse de données, e.g., la détection d'anomalies. Cependant, un moteur de RDF Stream Processing (RSP) moderne doit prendre en compte les caractéristiques de volume et de vitesse rencontrées à l'ère du Big Data. Dans un projet industriel d'envergure, nous avons découvert qu'un moteur de traitement de flux disponible 24/7 est généralement confronté à un volume de données massives, avec des changements dynamiques de la structure des données et des caractéristiques de la charge du système. Pour résoudre ces problèmes, nous proposons Strider, un moteur de traitement de flux RDF distribué, hybride et adaptatif qui optimise le plan de requête logique selon l'état des flux de données. Strider a été conçu pour garantir d'importantes propriétés industrielles telles que l'évolutivité, la haute disponibilité, la tolérance aux pannes, le haut débit et une latence acceptable. Ces garanties sont obtenues en concevant l'architecture du moteur avec des composants actuellement incontournables du Big Data: Apache Spark et Apache Kafka. De plus, un nombre croissant de traitements exécutés sur des moteurs RSP nécessitent des mécanismes de raisonnement. Ils se traduisent généralement par un compromis entre le débit de données, la latence et le coût computationnel des inférences. Par conséquent, nous avons étendu Strider pour prendre en charge la capacité de raisonnement en temps réel avec un support d'expressivité d'ontologies en RDFS+ (i.e., RDFS + owl:sameAs). Nous combinons Strider avec une approche de réécriture de requêtes pour SPARQL qui bénéficie d'un encodage intelligent pour les bases de connaissances. Le système est évalué selon différentes dimensions et sur plusieurs jeux de données, pour mettre en évidence ses performances. Enfin, nous avons exploré le raisonnement sur les flux RDF dans un contexte d'ontologies exprimées avec un fragment d'ASP (Answer Set Programming). La considération de cette problématique de recherche est principalement motivée par le fait que de plus en plus d'applications de streaming nécessitent des tâches de raisonnement plus expressives et complexes. Le défi principal consiste à gérer les dimensions de débit et de latence avec des méthodologies efficaces. Les efforts récents dans ce domaine ne considèrent pas l'aspect de passage à l'échelle du système pour le raisonnement des flux. Ainsi, nous visons à explorer la capacité des systèmes distribués modernes à traiter des requêtes d'inférence hautement expressives sur des flux de données volumineux.
Nous considérons les requêtes exprimées dans un fragment positif de LARS (un cadre logique temporel basé sur Answer Set Programming) et proposons des solutions pour traiter ces requêtes, basées sur les deux principaux modèles d'exécution adoptés par les principaux systèmes distribués: Bulk Synchronous Parallel (BSP) et Record-at-A-Time (RAT). Nous mettons en œuvre notre solution nommée BigSR et effectuons une série d'évaluations. Nos expériences montrent que BigSR atteint un débit élevé au-delà du million de triplets par seconde en utilisant un petit groupe de machines. / Real-time processing of data streams emanating from sensors is becoming a common task in industrial scenarios. In an Internet of Things (IoT) context, data are emitted from heterogeneous stream sources, i.e., coming from different domains and data models. This requires that IoT applications efficiently handle data integration mechanisms. The processing of RDF data streams hence became an important research field. This trend enables a wide range of innovative applications where the real-time and reasoning aspects are pervasive. The key implementation goal of such applications consists in efficiently handling massive incoming data streams and supporting advanced data analytics services like anomaly detection. However, a modern RDF Stream Processing (RSP) engine has to address the volume and velocity characteristics encountered in the Big Data era. In an ongoing industrial project, we found out that a 24/7 available stream processing engine usually faces massive data volumes, dynamically changing data structures and workload characteristics. These facts impact the engine's performance and reliability. To address these issues, we propose Strider, a hybrid adaptive distributed RDF Stream Processing engine that optimizes the logical query plan according to the state of the data streams. Strider has been designed to guarantee important industrial properties such as scalability, high availability, fault tolerance, high throughput and acceptable latency. These guarantees are obtained by designing the engine's architecture with state-of-the-art Apache components such as Spark and Kafka. Moreover, an increasing number of processing jobs executed over RSP engines require reasoning mechanisms. This usually comes at the cost of finding a trade-off between data throughput, latency and the computational cost of expressive inferences. Therefore, we extend Strider to support real-time RDFS+ (i.e., RDFS + owl:sameAs) reasoning capabilities. We combine Strider with a query rewriting approach for SPARQL that benefits from an intelligent encoding of the knowledge base. The system is evaluated along different dimensions and over multiple datasets to emphasize its performance. Finally, we go a step further and explore RDF stream reasoning with a fragment of Answer Set Programming. This part of our research work is mainly motivated by the fact that more and more streaming applications require more expressive and complex reasoning tasks. The main challenge is to cope with the large-volume and high-velocity dimensions in a scalable and inference-enabled manner. Recent efforts in this area still miss the aspect of system scalability for stream reasoning. Thus, we aim to explore the ability of modern distributed computing frameworks to process highly expressive knowledge inference queries over Big Data streams.
To do so, we consider queries expressed in a positive fragment of LARS (a temporal logic framework based on Answer Set Programming) and propose solutions to process such queries, based on the two main execution models adopted by major parallel and distributed execution frameworks: Bulk Synchronous Parallel (BSP) and Record-at-A-Time (RAT). We implement our solution, named BigSR, and conduct a series of evaluations. Our experiments show that BigSR achieves a high throughput beyond a million triples per second using a rather small cluster of machines.
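As a single-machine illustration of the per-window (BSP-style) reasoning idea described in this abstract — not Strider/BigSR's distributed Spark/Kafka implementation — the sketch below closes one micro-batch of stream triples under rdfs:subClassOf before answering a query. The IoT vocabulary is an assumption.

```python
# Sketch: BSP-style micro-batch RDFS reasoning over an RDF stream window.
# Vocabulary and windowing are illustrative; the real engines run this distributed.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/iot#")   # assumed ontology

ontology = Graph()
ontology.add((EX.TemperatureSensor, RDFS.subClassOf, EX.Sensor))

def reason_over_window(window_triples):
    """Close one window of stream triples under rdfs:subClassOf (rule rdfs9)."""
    g = Graph()
    for t in ontology:
        g.add(t)
    for t in window_triples:
        g.add(t)
    changed = True
    while changed:                           # naive fixpoint, fine for a sketch
        changed = False
        for s, _, cls in list(g.triples((None, RDF.type, None))):
            for _, _, parent in list(g.triples((cls, RDFS.subClassOf, None))):
                if (s, RDF.type, parent) not in g:
                    g.add((s, RDF.type, parent))
                    changed = True
    return g

window = [(EX.sensor1, RDF.type, EX.TemperatureSensor)]
enriched = reason_over_window(window)
print((EX.sensor1, RDF.type, EX.Sensor) in enriched)   # True: inferred by rdfs9
```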
44

Extração e consulta de informações do Currículo Lattes baseada em ontologias / Ontology-based Queries and Information Extraction from the Lattes CV

Galego, Eduardo Ferreira 06 November 2013 (has links)
A Plataforma Lattes é uma excelente base de dados de pesquisadores para a sociedade brasileira, adotada pela maioria das instituições de fomento, universidades e institutos de pesquisa do País. Entretanto, é limitada quanto à exibição de dados sumarizados de um grupo de pessoas, como por exemplo um departamento de pesquisa ou os orientandos de um ou mais professores. Diversos projetos já foram desenvolvidos propondo soluções para este problema, alguns inclusive desenvolvendo ontologias a partir do domínio de pesquisa. Este trabalho tem por objetivo integrar todas as funcionalidades destas ferramentas em uma única solução, a SOS Lattes. Serão apresentados os resultados obtidos no desenvolvimento desta solução e como o uso de ontologias auxilia nas atividades de identificação de inconsistências de dados, consultas para construção de relatórios consolidados e regras de inferência para correlacionar múltiplas bases de dados. Além disto, procura-se por meio deste trabalho contribuir com a expansão e disseminação da área de Web Semântica, por meio da criação de uma ferramenta capaz de extrair dados de páginas Web e disponibilizar sua estrutura semântica. Os conhecimentos adquiridos durante a pesquisa poderão ser úteis ao desenvolvimento de novas ferramentas atuando em diferentes ambientes. / The Lattes Platform is an excellent database of researchers for Brazilian society, adopted by most Brazilian funding agencies, universities and research institutes. However, it is limited in its ability to display summarized data for a group of people, such as a research department or the students supervised by one or more professors. Several projects have already been developed to propose solutions to this problem, including some that develop ontologies for the research domain. This work aims to integrate all the functionality of these tools into a single solution, SOS Lattes. The results obtained in the development of this solution are presented, as well as how ontologies help to identify data inconsistencies, support queries for building consolidated reports, and provide inference rules for correlating multiple databases. This work also intends to contribute to the expansion and dissemination of the Semantic Web by creating a tool that can extract data from Web pages and make their semantic structure available. The knowledge gained during the study may be useful for the development of new tools operating in different environments.
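A hedged sketch of the kind of consolidated-report query such an ontology enables, using Python and rdflib. The file name and the vocabulary (ex:Researcher, ex:authorOf) are assumptions, not the actual SOS Lattes ontology.

```python
# Sketch: a consolidated report over a populated ontology — publications per
# researcher via SPARQL. Vocabulary and file are assumed for illustration.
from rdflib import Graph

g = Graph()
g.parse("lattes.ttl", format="turtle")   # assumed export of the populated ontology

REPORT = """
PREFIX ex: <http://example.org/lattes#>
SELECT ?researcher (COUNT(?pub) AS ?publications)
WHERE {
  ?researcher a ex:Researcher ;
              ex:authorOf ?pub .
}
GROUP BY ?researcher
ORDER BY DESC(?publications)
"""

for row in g.query(REPORT):
    print(row.researcher, row.publications)
```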
45

Populando ontologias através de informações em HTML - o caso do currículo lattes / Populating ontologies using HTML information - the currículo lattes case

Castaño, André Casado 06 May 2008 (has links)
A Plataforma Lattes é, hoje, a principal base de currículos dos pesquisadores brasileiros. Os currículos da Plataforma Lattes armazenam de forma padronizada dados profissionais, acadêmicos, de produções bibliográficas e outras informações dos pesquisadores. Através de uma base de Currículos Lattes, podem ser gerados vários tipos de relatórios consolidados. As ferramentas existentes da Plataforma Lattes não são capazes de detectar alguns problemas que aparecem na geração dos relatórios consolidados, como duplicidades de citações ou produções bibliográficas classificadas de maneiras distintas por cada autor, gerando um número total de publicações errado. Esse problema faz com que os relatórios gerados necessitem ser revistos pelos pesquisadores e essas falhas deste processo são a principal inspiração deste projeto. Neste trabalho, utilizamos como fonte de informações currículos da Plataforma Lattes para popular uma ontologia e utilizá-la principalmente como uma base de dados a ser consultada para geração de relatórios. Analisamos todo o processo de extração de informações a partir de arquivos HTML e seu posterior processamento para inseri-las corretamente dentro da ontologia, de acordo com sua semântica. Com a ontologia corretamente populada, mostramos também algumas consultas que podem ser realizadas e fazemos uma análise dos métodos e abordagens utilizadas em todo processo, comentando seus pontos fracos e fortes, visando detalhar todas as dificuldades existentes no processo de população (instanciação) automática de uma ontologia. / Today, the Lattes Platform is the main database of Brazilian researchers' resumés. It stores, in a standardized form, professional and academic data, bibliographic production records and other information about these researchers. From a database of Lattes resumés, several types of consolidated reports can be generated. The tools available for the Lattes Platform are unable to detect some of the problems that emerge when generating consolidated reports, such as duplicate citations or bibliographic productions classified differently by each author, which yields an incorrect total number of publications. This problem means the generated reports need to be revised by the researchers, and the flaws of this process are the main inspiration for this project. In this work we use resumés from the Lattes Platform as the source of information for populating an ontology, which is used mainly as a database to be queried for report generation. We analyze the whole process of extracting information from HTML files and the subsequent processing needed to insert it correctly into the ontology, according to its semantics. With the ontology correctly populated, we also show some queries that can be run and analyze the methods and approaches used throughout the process, discussing their strengths and weaknesses, in order to detail the difficulties involved in automatically populating (instantiating) an ontology.
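A minimal sketch of the extraction-and-population step described above, using BeautifulSoup and rdflib. The CSS class, the vocabulary and the page structure are assumptions for illustration; real Lattes HTML and the thesis's ontology are richer.

```python
# Sketch: scrape publication entries from a saved resumé HTML page and add them
# as ontology individuals. Selectors and vocabulary are assumed, not Lattes's.
from bs4 import BeautifulSoup
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/lattes#")

def populate_from_html(path, researcher_id):
    with open(path, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")
    g = Graph()
    researcher = EX[researcher_id]
    g.add((researcher, RDF.type, EX.Researcher))
    # "artigo-completo" is a hypothetical CSS class marking publication entries
    for i, item in enumerate(soup.find_all("div", class_="artigo-completo")):
        pub = EX[f"{researcher_id}/pub{i}"]
        g.add((pub, RDF.type, EX.BibliographicProduction))
        g.add((pub, EX.title, Literal(item.get_text(strip=True))))
        g.add((researcher, EX.authorOf, pub))
    return g
```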
46

What's in a query : analyzing, predicting, and managing linked data access

Lorey, Johannes January 2014 (has links)
The term Linked Data refers to connected information sources comprising structured data about a wide range of topics and for a multitude of applications. In recent years, the conceptual and technical foundations of Linked Data have been formalized and refined. To this end, well-known technologies have been established, such as the Resource Description Framework (RDF) as a Linked Data model or the SPARQL Protocol and RDF Query Language (SPARQL) for retrieving this information. Whereas most research has been conducted in the area of generating and publishing Linked Data, this thesis presents novel approaches for improved management. In particular, we illustrate new methods for analyzing and processing SPARQL queries. Here, we present two algorithms suitable for identifying structural relationships between these queries. Both algorithms are applied to a large number of real-world requests to evaluate the performance of the approaches and the quality of their results. Based on this, we introduce different strategies enabling optimized access to Linked Data sources. We demonstrate how the presented approach facilitates effective utilization of SPARQL endpoints by prefetching results relevant for multiple subsequent requests. Furthermore, we contribute a set of metrics for determining technical characteristics of such knowledge bases. To this end, we devise practical heuristics and validate them through thorough analysis of real-world data sources. We discuss the findings and evaluate their impact on utilizing the endpoints. Moreover, we detail the adoption of a scalable infrastructure for improving Linked Data discovery and consumption. As we outline in an exemplary use case, this platform is suitable both for processing and for provisioning the corresponding information. / Unter dem Begriff Linked Data werden untereinander vernetzte Datenbestände verstanden, die große Mengen an strukturierten Informationen für verschiedene Anwendungsgebiete enthalten. In den letzten Jahren wurden die konzeptionellen und technischen Grundlagen für die Veröffentlichung von Linked Data gelegt und verfeinert. Zu diesem Zweck wurden eine Reihe von Technologien eingeführt, darunter das Resource Description Framework (RDF) als Datenmodell für Linked Data und das SPARQL Protocol and RDF Query Language (SPARQL) zum Abfragen dieser Informationen. Während bisher hauptsächlich die Erzeugung und Bereitstellung von Linked Data Forschungsgegenstand war, präsentiert die vorliegende Arbeit neuartige Verfahren zur besseren Nutzbarmachung. Insbesondere werden dafür Methoden zur Analyse und Verarbeitung von SPARQL-Anfragen entwickelt. Zunächst werden daher zwei Algorithmen vorgestellt, die die strukturelle Ähnlichkeit solcher Anfragen bestimmen. Beide Algorithmen werden auf eine große Anzahl von authentischen Anfragen angewandt, um sowohl die Güte der Ansätze als auch die ihrer Resultate zu untersuchen. Darauf aufbauend werden verschiedene Strategien erläutert, mittels derer optimiert auf Quellen von Linked Data zugegriffen werden kann. Es wird gezeigt, wie die dabei entwickelte Methode zur effektiven Verwendung von SPARQL-Endpunkten beiträgt, indem relevante Ergebnisse für mehrere nachfolgende Anfragen vorgeladen werden. Weiterhin werden in dieser Arbeit eine Reihe von Metriken eingeführt, die eine Einschätzung der technischen Eigenschaften solcher Endpunkte erlauben. Hierfür werden praxisrelevante Heuristiken entwickelt, die anschließend ausführlich mit Hilfe von konkreten Datenquellen analysiert werden. 
Die dabei gewonnenen Erkenntnisse werden erörtert und in Hinblick auf die Verwendung der Endpunkte interpretiert. Des Weiteren wird der Einsatz einer skalierbaren Plattform vorgestellt, die die Entdeckung und Nutzung von Beständen an Linked Data erleichtert. Diese Plattform dient dabei sowohl zur Verarbeitung als auch zur Verfügbarstellung der zugehörigen Information, wie in einem exemplarischen Anwendungsfall erläutert wird.
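A very rough illustration of the "structural relationship between SPARQL queries" idea — not the thesis's two algorithms — is a Jaccard similarity over the IRIs mentioned in two queries. Everything below is a simplified proxy under that assumption.

```python
# Sketch only: a crude structural-similarity proxy for SPARQL queries, comparing
# the sets of IRIs they mention. The thesis's algorithms are more sophisticated.
import re

IRI = re.compile(r"<([^>]+)>")

def iri_set(query: str) -> set:
    """Collect every IRI mentioned in the query text (a rough fingerprint)."""
    return set(IRI.findall(query))

def similarity(q1: str, q2: str) -> float:
    """Jaccard index of the two queries' IRI sets."""
    a, b = iri_set(q1), iri_set(q2)
    return len(a & b) / len(a | b) if a | b else 0.0

q1 = "SELECT ?s WHERE { ?s <http://xmlns.com/foaf/0.1/name> ?n }"
q2 = ("SELECT ?s WHERE { ?s <http://xmlns.com/foaf/0.1/name> ?n ; "
      "<http://xmlns.com/foaf/0.1/mbox> ?m }")
print(similarity(q1, q2))   # 0.5
```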
47

TopFed: TCGA tailored federated query processing and linking to LOD

Saleem, Muhammad, Padmanabhuni, Shanmukha S., Ngonga Ngomo, Axel-Cyrille, Iqbal, Aftab, Almeida, Jonas S., Decker, Stefan, Deus, Helena F. 12 January 2015 (has links) (PDF)
Methods: We address these issues by transforming the TCGA data into the Semantic Web standard Resource Description Framework (RDF), linking it to relevant datasets in the Linked Open Data (LOD) cloud, and further proposing an efficient data distribution strategy to host the resulting 20.4 billion triples via several SPARQL endpoints. With the TCGA data distributed across multiple SPARQL endpoints, we enable biomedical scientists to query and retrieve this information through TopFed, a TCGA-tailored federated SPARQL query processing engine. Results: We compare TopFed with FedX, a well-established federation engine, in terms of source selection and query execution time by using 10 different federated SPARQL queries with varying requirements. Our evaluation results show that TopFed selects on average less than half of the sources (with 100% recall), with a query execution time equal to one third of that of FedX. Conclusion: With TopFed, we aim to offer biomedical scientists a single point of access through which distributed TCGA data can be accessed in unison. We believe the proposed system can greatly help researchers in the biomedical domain to carry out their research effectively with TCGA, as the amount and diversity of the data exceed the ability of local resources to handle their retrieval and parsing.
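For readers unfamiliar with federated SPARQL, the sketch below issues a SPARQL 1.1 query with a SERVICE clause through SPARQLWrapper; TopFed adds source selection and a TCGA-specific distribution strategy on top of this basic mechanism. The endpoint URLs and predicates are illustrative assumptions, not the actual TopFed deployment.

```python
# Sketch of plain SPARQL 1.1 federation (SERVICE). Endpoints and predicates are
# assumed; TopFed's contribution is smarter source selection over such setups.
from SPARQLWrapper import SPARQLWrapper, JSON

FEDERATED_QUERY = """
PREFIX tcga: <http://example.org/tcga/>
SELECT ?patient ?gene ?expression
WHERE {
  ?patient tcga:bcr_patient_barcode ?barcode .
  SERVICE <http://endpoint2.example.org/sparql> {
    ?result tcga:patient ?barcode ;
            tcga:gene ?gene ;
            tcga:expression ?expression .
  }
}
LIMIT 10
"""

sparql = SPARQLWrapper("http://endpoint1.example.org/sparql")  # assumed endpoint
sparql.setQuery(FEDERATED_QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding)
```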
48

Uma infraestrutura de suporte a aplicações cientes de contexto com enfoque no usuário final. / A context-aware application support infrastructure that focuses on the end user.

FIGUEIRÊDO, Hugo Feitosa de. 03 August 2018 (has links)
As aplicações cientes de contexto estão se tornando populares, como consequência de avanços tecnológicos em dispositivos móveis, sensores e comunicação de redes sem fio. Entretanto, desenvolver um sistema ciente de contexto envolve vários desafios. Por exemplo, quais serão as informações contextuais, como representar, adquirir e processar essas informações e como estas serão utilizadas pelo sistema. Alguns frameworks e middlewares foram propostos na literatura para auxiliar programadores a superar esses desafios, porém ainda faltam mecanismos que auxiliem usuários finais na personalização dessas aplicações. Além disso, a maioria das soluções propostas não possui um modelo de contexto extensível baseado em ontologias ou não utiliza uma comunicação que permita aproveitar as potencialidades dos modelos que seguem esta abordagem. Este trabalho propõe uma infraestrutura de suporte a aplicações cientes de contexto que possui um modelo de contexto extensível baseado em ontologias e comunicação entre os elementos utilizando SPARQL e SPARQL Update. Também são propostas ferramentas para usuários finais criarem e validarem visualmente regras contextuais. / Context-aware applications have become popular as a consequence of the technological advances in mobile devices, sensors, and wireless network communication. However, there are many challenges in the development of these applications. For instance, deciding which contextual information will be used, and how to represent, capture, process and use this context in the system, are some of these development challenges. Frameworks and middlewares to improve context-aware application development have been proposed, but they still offer little help to end users in customizing their applications. Furthermore, most proposed solutions do not have an extensible ontology-based context model or a communication mechanism that exploits the main features of such an approach. This work proposes an infrastructure to support context-aware applications, which uses an extensible ontology-based context model and communication through SPARQL and SPARQL Update. It also provides visual tools aimed at helping end users create and validate context rules.
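A small sketch of how context facts could flow into an ontology-based context model via SPARQL Update, run here against an in-memory rdflib graph rather than the infrastructure's actual components. The vocabulary (ctx:User, ctx:locatedIn) is an illustrative assumption.

```python
# Sketch: pushing a context fact with SPARQL Update, then querying the model.
# Vocabulary is assumed; the real infrastructure talks to a context server.
from rdflib import Graph

g = Graph()

# a sensor/adapter pushes a new context fact
g.update("""
PREFIX ctx: <http://example.org/context#>
INSERT DATA {
  ctx:alice a ctx:User ;
            ctx:locatedIn ctx:MeetingRoom1 .
}
""")

# a context rule (here just a query) reacts to the updated model
for row in g.query("""
PREFIX ctx: <http://example.org/context#>
SELECT ?user WHERE { ?user ctx:locatedIn ctx:MeetingRoom1 }
"""):
    print(row.user)
```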
49

La vérification de patrons de workflow métier basés sur les flux de contrôle : une approche utilisant les systèmes à base de connaissances / Control flow-based business workflow templates checking : an approach using the knowledge-based systems

Nguyen, Thi Hoa Hue 23 June 2015 (has links)
Cette thèse traite le problème de la modélisation des patrons de workflow sémantiquement riches et propose un processus pour développer des patrons de workflow. L'objectif est de transformer un processus métier en un patron de workflow métier basé sur les flux de contrôle qui garantit la vérification syntaxique et sémantique. Les défis majeurs sont : (i) de définir un formalisme permettant de représenter les processus métiers; (ii) d'établir des mécanismes de contrôle automatiques pour assurer la conformité des patrons de workflow métier basés sur un modèle formel et un ensemble de contraintes sémantiques; et (iii) d'organiser la base de patrons de workflow métier pour le développement de patrons de workflow. Nous proposons un formalisme qui combine les flux de contrôle (basés sur les Réseaux de Petri Colorés (CPNs)) avec des contraintes sémantiques pour représenter les processus métiers. L'avantage de ce formalisme est qu'il permet de vérifier non seulement la conformité syntaxique basée sur le modèle de CPNs mais aussi la conformité sémantique basée sur les technologies du Web sémantique. Nous commençons par une phase de conception d'une ontologie OWL appelée l'ontologie CPN pour représenter les concepts de patrons de workflow métier basés sur CPN. La phase de conception est suivie par une étude approfondie des propriétés de ces patrons pour les transformer en un ensemble d'axiomes pour l'ontologie. Ainsi, dans ce formalisme, un processus métier est syntaxiquement transformé en une instance de l'ontologie. / This thesis tackles the problem of modelling semantically rich business workflow templates and proposes a process for developing workflow templates. The objective of the thesis is to transform a business process into a control flow-based business workflow template that guarantees syntactic and semantic validity. The main challenges are: (i) to define a formalism for representing business processes; (ii) to establish automatic control mechanisms to ensure the correctness of a business workflow template based on a formal model and a set of semantic constraints; and (iii) to organize the knowledge base of workflow templates for a workflow development process. We propose a formalism which combines control flow (based on Coloured Petri Nets (CPNs)) with semantic constraints to represent business processes. The advantage of this formalism is that it allows not only syntactic checks based on the model of CPNs, but also semantic checks based on Semantic Web technologies. We start by designing an OWL ontology called the CPN ontology to represent the concepts of CPN-based business workflow templates. The design phase is followed by a thorough study of the properties of these templates in order to transform them into a set of axioms for the CPN ontology. In this formalism, a business process is syntactically transformed into an instance of the CPN ontology. Syntactic checking of a business process thus reduces to verification by inference, using the concepts and axioms of the CPN ontology, over the corresponding instance.
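To give a flavour of checking a workflow instance against the ontology, the sketch below encodes a tiny CPN-style instance in rdflib and uses a SPARQL ASK query as a much-simplified stand-in for one constraint; the class and property names are assumptions, not the thesis's CPN ontology or its axioms.

```python
# Sketch: represent a small workflow as ontology instances and check one
# constraint ("every transition has an output place") with SPARQL ASK.
# Vocabulary is illustrative only.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

CPN = Namespace("http://example.org/cpn#")

g = Graph()
g.add((CPN.ReceiveOrder, RDF.type, CPN.Transition))
g.add((CPN.OrderReceived, RDF.type, CPN.Place))
g.add((CPN.ReceiveOrder, CPN.hasOutputPlace, CPN.OrderReceived))

CHECK = """
PREFIX cpn: <http://example.org/cpn#>
ASK {
  ?t a cpn:Transition .
  FILTER NOT EXISTS { ?t cpn:hasOutputPlace ?p }
}
"""
violated = g.query(CHECK).askAnswer
print("constraint violated" if violated else "constraint satisfied")
```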