Global ETD Search

11	Schema Matching and Data Extraction over HTML Tables. Tao, Cui. 16 September 2003. Data on the Web in HTML tables is mostly structured, but we usually do not know the structure in advance. Thus, we cannot directly query for data of interest. We propose a solution to this problem for the case of mostly structured data in the form of HTML tables, based on document-independent extraction ontologies. The solution entails elements of table location and table understanding, data integration, and wrapper creation. Table location and understanding allow us to locate the table of interest, recognize attributes and values, pair attributes with values, and form records. Data-integration techniques allow us to match source records with a target schema. Ontologically specified wrappers allow us to extract data from source records into a target schema. Experimental results show that we can successfully map data of interest from source HTML tables with unknown structure to a given target database schema. We can thus "directly" query source data with unknown structure through a known target schema. Keywords: HTML table; ontology; data extraction; schema matching; document-independent extraction; data-integration techniques; Computer Sciences
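The record-forming and schema-mapping steps named in entry 11 can be illustrated with a small Python sketch. It assumes the table has already been located and parsed into rows, and it uses a hand-written `attribute_map` (hypothetical) where the thesis relies on extraction ontologies, so it shows only the shape of the problem, not the author's method.

```python
def form_records(rows):
    """Pair each value with the attribute heading its column."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def map_to_target(record, attribute_map):
    """Rename matched source attributes; drop attributes with no match."""
    return {attribute_map[a]: v for a, v in record.items() if a in attribute_map}


# Hypothetical source table: first row holds the attributes, the rest hold values.
rows = [
    ["Make", "Yr", "Asking", "Notes"],
    ["Honda", "1999", "4500", "one owner"],
    ["Ford", "2001", "6200", ""],
]

# Hypothetical source-to-target mapping; "Notes" has no target attribute.
attribute_map = {"Make": "make", "Yr": "year", "Asking": "price"}

records = [map_to_target(r, attribute_map) for r in form_records(rows)]
print(records)
```

Once every source record is expressed in target attribute names, a query written against the target schema can run over data from tables whose layout was not known in advance.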
12	Top-k Entity Augmentation using Consistent Set Covering. Eberius, Julian; Thiele, Maik; Braunschweig, Katrin; Lehner, Wolfgang. 19 September 2022. Entity augmentation is a query type in which, given a set of entities and a large corpus of possible data sources, the values of a missing attribute are to be retrieved. State-of-the-art methods return a single result that, to cover all queried entities, is fused from a potentially large set of data sources. We argue that queries on large corpora of heterogeneous sources using information retrieval and automatic schema matching methods cannot easily return a single result that the user can trust, especially if the result is composed from a large number of sources that the user has to verify manually. We therefore propose to process these queries in a Top-k fashion, in which the system produces multiple minimal consistent solutions from which the user can choose to resolve the uncertainty of the data sources and methods used. In this paper, we introduce and formalize the problem of consistent, multi-solution set covering, and present algorithms based on a greedy and a genetic optimization approach. We then apply these algorithms to Web table-based entity augmentation. The publication further includes a Web table corpus with 100M tables, and a Web table retrieval and matching system in which these algorithms are implemented. Our experiments show that the consistency and minimality of the augmentation results can be improved using our set covering approach, without loss of precision or coverage and while producing multiple alternative query results. Classification: ddc:004
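To make the set-covering view in entry 12 concrete, here is a minimal Python sketch of the classic greedy set cover heuristic. It is a textbook baseline, not the authors' algorithm: it returns a single cover and ignores consistency between sources, and the source names and entity sets are invented for illustration.

```python
def greedy_cover(entities, sources):
    """Repeatedly pick the source that covers the most uncovered entities."""
    uncovered = set(entities)
    chosen = []
    while uncovered:
        # Source with the largest overlap with what is still uncovered.
        best = max(sources, key=lambda name: len(uncovered & sources[name]))
        gained = uncovered & sources[best]
        if not gained:
            break  # no source covers any of the remaining entities
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Hypothetical sources: each Web table supplies the attribute for some entities.
sources = {
    "table_a": {"Berlin", "Paris"},
    "table_b": {"Paris", "Rome", "Madrid"},
    "table_c": {"Berlin"},
}
chosen, missing = greedy_cover(["Berlin", "Paris", "Rome", "Madrid"], sources)
print(chosen, missing)
```

A Top-k variant in the spirit of the abstract would have to produce several such covers and prefer sources that agree with one another, which this single-pass version does not attempt.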
13	Formalisation, acquisition et mise en œuvre de connaissances pour l’intégration virtuelle de bases de données géographiques : les spécifications au cœur du processus d’intégration / Formalisation, acquisition and implementation of specifications knowledge for geographic databases integration. Abadie, Nathalie. 20 November 2012. This PhD thesis deals with the integration of topographic databases, which consists of making the correspondence relationships between heterogeneous databases explicit so that they can be used together. Automating this integration requires automatically detecting the various kinds of heterogeneity that can occur between the databases, which in turn requires knowledge about the contents of each database. The goal of the thesis is therefore to formalise, acquire and exploit the knowledge needed to implement a virtual integration process for vector geographic databases. The work focuses on matching the conceptual schemas of the databases, which is the first step of the integration process. To this end, we propose to rely on a specific knowledge source, namely the database specifications, which describe the rules used to capture the data. The specifications are first used as the main resource for building a domain ontology of topography through an ontology learning process. In a first approach to schema matching, the domain ontology created from the texts of IGN's database specifications serves as background knowledge for matching based on terminological and structural techniques. In a second approach, inspired by semantic-based matching techniques, the ontology supports the representation, in the OWL 2 language, of the selection and geometry capture rules for topographic entities described in the specifications; a reasoner then uses this knowledge to match the schemas. Keywords: topographic databases integration; schema matching; specifications; knowledge representation; ontology
14	Casamento de esquemas de banco de dados aplicando aprendizado ativo [Database schema matching using active learning]. Rodrigues, Diego de Azevedo. 12 March 2013. Funding: FAPEAM - Fundação de Amparo à Pesquisa do Estado do Amazonas. Given two database schemas within the same domain, the schema matching problem is the task of finding pairs of schema elements that have the same semantics for that domain. Traditionally, this task was performed manually by a specialist, which made it tedious and costly because the specialist had to know the schemas and their domain well. Currently this process is assisted by semi-automatic schema matching methods. Current methods use various heuristics to generate matchings, and many of them share a common modeling: they build a similarity matrix between the elements using functions called matchers and, based on the matrix values, decide according to some criterion which of the matchings are correct. This thesis presents an active-learning-based method that uses the similarity matrix generated by the matchers, a machine learning algorithm, and specialist interventions to generate matchings. The presented method differs from others because it has no fixed heuristic and uses the specialist's expertise only when necessary. In our experiments, we evaluate the proposed method against a baseline on two datasets: the first is the one originally used by the baseline, and the second contains schemas from a benchmark for schema integration. We show that the baseline achieves good results on its original dataset, but that its fixed strategy is not as effective for other schemas. The proposed active-learning method, in contrast, is consistent on both datasets, achieving an average F-measure of 0.64. Keywords: schema matching; data integration; active learning
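The shared modeling described in entry 14, a similarity matrix built by matchers followed by a decision criterion, can be sketched in Python as follows. This is only the fixed-threshold style of decision that the abstract contrasts with active learning; the single name-based matcher, the 0.6 threshold and the attribute names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """A simple matcher: string similarity of two attribute names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_matrix(source, target, matcher=name_similarity):
    """matrix[i][j] is the matcher score for source[i] against target[j]."""
    return [[matcher(s, t) for t in target] for s in source]


def select_matches(source, target, matrix, threshold=0.6):
    """Fixed criterion: keep each row's best-scoring pair if it clears the threshold."""
    matches = []
    for i, row in enumerate(matrix):
        j = max(range(len(row)), key=row.__getitem__)
        if row[j] >= threshold:
            matches.append((source[i], target[j]))
    return matches


# Hypothetical schemas from the same domain.
source = ["cust_name", "phone"]
target = ["customer_name", "telephone", "zip"]

matrix = similarity_matrix(source, target)
matches = select_matches(source, target, matrix)
print(matches)
```

An active-learning approach would replace the fixed threshold with a classifier trained on matrix values and would ask the specialist to label only the pairs the classifier is least sure about.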
15	Application Of Schema Matching Methods To Semantic Web Service Discovery. Karagoz, Funda. 01 September 2006. The Web is becoming a collection of services that interoperate through the Internet. As the number of services increases, it is becoming more and more difficult for users to find, filter and integrate these services according to their requirements. Automatic techniques are being developed to fulfill these tasks. The first step toward automatic composition is the discovery of the services needed. UDDI, one of the accepted web standards, provides a registry of web services. However, the representation capabilities of UDDI are insufficient to search for services on the basis of what they provide. Semantic web initiatives like OWL and OWL-S are promising for locating exact services based on their capabilities. In this thesis, a new semantic service discovery mechanism is implemented based on OWL-S service profiles. The service profiles of an advertisement and a request are matched based on the OWL ontologies describing them. In contrast to previous work on the subject, the ontologies of the advertisement and the request are not assumed to be the same. If they differ, schema matching algorithms are applied to find the mappings between the given schema models. A hybrid combination of semantic, syntactic and structural schema matching algorithms is applied to match the ontologies. Classification: QA Computer Software 76.75-76.765
16	Pareamento privado de atributos no contexto da resolução de entidades com preservação de privacidade [Private attribute pairing in the context of privacy-preserving entity resolution]. NÓBREGA, Thiago Pereira da. 10 September 2018. Privacy-Preserving Record Linkage (PPRL) aims to identify entities (e.g., patients) stored in different databases that correspond to the same real-world object. Because the entities contain private data that cannot be disclosed, it is crucial that the PPRL task is executed without revealing any entity information between the participants (the database owners), so that the privacy of the original data is preserved. At the end of a PPRL task, each participant identifies which entities in its database are present in the databases of the other participants. Before starting the PPRL task, the participants must agree on the common entity to be considered and on the attributes to be used to compare entities. In general, this agreement requires participants to expose their schemas, sharing (meta-)information that can be used to break the privacy of the data. This work proposes a semiautomatic approach to identify similar attributes (attribute pairing) to be used for comparing entities during PPRL. The approach is inserted as a preliminary step of PPRL (Handshake), and its result (the similar attributes) can be used by the subsequent steps (Blocking and Comparison). In the proposed approach, each participant generates a privacy-preserving representation (Data Signatures) of its attribute values, which is sent to a trusted third party that identifies similar attributes from the different data sources. This eliminates the need to share information about the schemas and thus improves the security and privacy of the PPRL task. The evaluation indicates that the quality of the attribute pairing is equivalent to that of a solution that does not consider data privacy, and that the approach is capable of preserving data privacy. Keywords: computer science; privacy preservation; security and privacy; entity resolution; data integration; schema matching
17	Linked Open Data Alignment & Querying. Jain, Prateek. 27 August 2012. No description available. Keywords: Computer Science; Linked Open Data; Semantic Web; relationship identification; Web of Data; LOD; schema matching; federated querying
18	Serviceorientiertes Text Mining am Beispiel von Entitätsextrahierenden Diensten [Service-oriented text mining, using entity extraction services as an example]. Pfeifer, Katja. 08 September 2014. Most business-relevant knowledge today exists as unstructured information in the form of text on web pages, in office documents, or in forum posts. A large number of text mining solutions have been developed to extract and exploit this unstructured information. Many of these systems have recently been made available as web services to simplify their use and integration. Combining several such text mining services to solve concrete extraction tasks appears promising, because it exploits existing strengths, minimizes the weaknesses of individual systems, and simplifies the use of text mining solutions. This thesis addresses the flexible combination of text mining services in a service-oriented system and extends the state of the art with targeted methods for selecting text mining services, aggregating their results, and mapping the classification schemas they use. First, the existing service landscape is analysed and, building on this, an ontology for the functional description of the services is provided, which enables function-driven selection and combination of text mining services. Furthermore, using entity extraction services as an example, algorithms for the quality-improving combination of extraction results are developed and extensively evaluated. The work is complemented by additional mapping and integration processes that ensure applicability in heterogeneous service landscapes where different classification schemas are used. Finally, the transferability of the approach to other text mining methods is discussed. Keywords: text mining; information extraction; services; NER; named entity extraction; entity recognition; schema matching; service combination. Classification: ddc:004; rvk:ST 302; rvk:ST 515
19	UMA ABORDAGEM BASEADA NA ENGENHARIA DIRIGIDA POR MODELOS PARA SUPORTAR MERGING DE BASE DE DADOS HETEROGÊNEAS / AN APPROACH BASED IN MODEL DRIVEN ENGINEERING TO SUPPORT MERGING OF HETEROGENEOUS DATABASE. CARVALHO, Marcus Vinícius Ribeiro de. 24 February 2014. Model Driven Engineering (MDE) helps manage the complexity of developing, maintaining and evolving software systems by focusing on models and model transformations. This approach can also be applied in other complex domains, such as database schema integration. In this research work, we propose a framework and methodology to integrate database schemas in the MDE context. Metamodels for defining the database model, database model matching, database model merging, and the integrated database model are proposed to support the framework. An algorithm for database model matching and an algorithm for database model merging are presented. We also present a prototype that adapts and extends the MT4MDE and SAMT4MDE tools in order to demonstrate the implementation of the proposed framework, methodology, and algorithms. An illustrative example helps to explain the proposed framework, metamodels and algorithms. A brief evaluation of the framework and directions for future work are also presented. Keywords: Model Driven Engineering; schema matching; database integration; metamodel and model matching
20	Serviceorientiertes Text Mining am Beispiel von Entitätsextrahierenden Diensten [Service-oriented text mining, using entity extraction services as an example]. Pfeifer, Katja. 16 June 2014. Most business-relevant knowledge today exists as unstructured information in the form of text on web pages, in office documents, or in forum posts. A large number of text mining solutions have been developed to extract and exploit this unstructured information. Many of these systems have recently been made available as web services to simplify their use and integration. Combining several such text mining services to solve concrete extraction tasks appears promising, because it exploits existing strengths, minimizes the weaknesses of individual systems, and simplifies the use of text mining solutions. This thesis addresses the flexible combination of text mining services in a service-oriented system and extends the state of the art with targeted methods for selecting text mining services, aggregating their results, and mapping the classification schemas they use. First, the existing service landscape is analysed and, building on this, an ontology for the functional description of the services is provided, which enables function-driven selection and combination of text mining services. Furthermore, using entity extraction services as an example, algorithms for the quality-improving combination of extraction results are developed and extensively evaluated. The work is complemented by additional mapping and integration processes that ensure applicability in heterogeneous service landscapes where different classification schemas are used. Finally, the transferability of the approach to other text mining methods is discussed. Classification: ddc:004
