Spelling suggestions: "subject:"achema matching"" "subject:"dechema matching""
1 |
Mapování PMML a BKEF dokumentů v projektu SEWEBAR-CMS / Mapping of PMML and BKEF documents using PHP in the SEWEBAR CMSVojíř, Stanislav January 2010 (has links)
In the data mining process, it is necessary to prepare the source dataset - for example, to select the cutting or grouping of continuous data attributes etc. and use the knowledge from the problem area. Such a preparation process can be guided by background (domain) knowledge obtained from experts. In the SEWEBAR project, we collect the knowledge from experts in a rich XML-based representation language, called BKEF, using a dedicated editor, and save into the database of our custom-tailored (Joomla!-based) CMS system. Data mining tools are then able to generate, from this dataset, mining models represented in the standardized PMML format. It is then necessary to map a particular column (attribute) from the dataset (in PMML) to a relevant 'metaattribute' of the BKEF representation. This specific type of schema mapping problem is addressed in my thesis in terms of algorithms for automatic suggestion of mapping of columns to metaattributes and from values of these columns to BKEF 'metafields'. Manual corrections of this mapping by the user are also supported. The implementation is based on the PHP language and then it was tested on datasets with information about courses taught in 5 universities in the U.S.A. from Illinois Semantic Integration Archive. On this datasets, the auto-mapping suggestion process archieved the precision about 70% and recall about 77% on unknown columns, but when mapping the previously user-mapped data (using implemented learning module), the recall is between 90% and 100%.
|
2 |
QuickMig: automatic schema matching for data migration projectsDrumm, Christian, Schmitt, Matthias, Do, Hong-Hai, Rahm, Erhard 14 December 2018 (has links)
A common task in many database applications is the migration of legacy data from multiple sources into a new one. This requires identifying semantically related elements of the source and target systems and the creation of mapping expressions to transform instances of those elements from the source format to the target format. Currently, data migration is typically done manually, a tedious and timeconsuming process, which is difficult to scale to a high number of data sources. In this paper, we describe QuickMig, a new semi-automatic approach to determining semantic correspondences between schema elements for data migration applications. QuickMig advances the state of the art with a set of new techniques exploiting sample instances, domain ontologies, and reuse of existing mappings to detect not only element correspondences but also their mapping expressions. QuickMig further includes new mechanisms to effectively incorporate domain knowledge of users into the matching process. The results from a comprehensive evaluation using real-world schemas and data indicate the high quality and practicability of the overall approach.
|
3 |
Information Extraction from dataSottovia, Paolo 22 October 2019 (has links)
Data analysis is the process of inspecting, cleaning, extract, and modeling data with the intention of extracting useful information in order to support users in their decisions. With the advent of Big Data, data analysis was becoming more complicated due to the volume and variety of data. This process begins with the acquisition of the data and the selection of the data that is useful for the desiderata analysis. With such amount of data, also expert users are not able to inspect the data and understand if a dataset is suitable or not for their purposes. In this dissertation, we focus on five problems in the broad data analysis process to help users find insights from the data when they do not have enough knowledge about its data. First, we analyze the data description problem, where the user is looking for a description of the input dataset. We introduce data descriptions: a compact, readable and insightful formula of boolean predicates that represents a set of data records. Finding the best description for a dataset is computationally expensive and task-specific; we, therefore, introduce a set of metrics and heuristics for generating meaningful descriptions at an interactive performance. Secondly, we look at the problem of order dependency discovery, which discovers another kind of metadata that may help the user in the understanding of characteristics of a dataset. Our approach leverages the observation that discovering order dependencies can be guided by the discovery of a more specific form of dependencies called order compatibility dependencies. Thirdly, textual data encodes much hidden information. To allow this data to reach its full potential, there has been an increasing interest in extracting structural information from it. In this regard, we propose a novel approach for extracting events that are based on temporal co-reference among entities. We consider an event to be a set of entities that collectively experience relationships between them in a specific period of time. We developed a distributed strategy that is able to scale with the largest on-line encyclopedia available, Wikipedia. Then, we deal with the evolving nature of the data by focusing on the problem of finding synonymous attributes in evolving Wikipedia Infoboxes. Over time, several attributes have been used to indicate the same characteristic of an entity. This provides several issues when we are trying to analyze the content of different time periods. To solve it, we propose a clustering strategy that combines two contrasting distance metrics. We developed an approximate solution that we assess over 13 years of Wikipedia history by proving its flexibility and accuracy. Finally, we tackle the problem of identifying movements of attributes in evolving datasets. In an evolving environment, entities not only change their characteristics, but they sometimes exchange them over time. We proposed a strategy where we are able to discover those cases, and we also test our strategy on real datasets. We formally present the five problems that we validate both in terms of theoretical results and experimental evaluation, and we demonstrate that the proposed approaches efficiently scale with a large amount of data.
|
4 |
Extraction and integration of Web query interfacesKabisch, Thomas 20 October 2011 (has links)
Diese Arbeit fokussiert auf die Integration von Web Anfrageschnittstellen (Web Formularen). Wir identifizieren mehrere Schritte für den Integrationsprozess: Im ersten Schritt werden unbekannte Anfrageschnittstellen auf ihre Anwendungsdomäne hin analysiert. Im zweiten Schritt werden die Anfrageschnittstellen in ein maschinenlesbares Format transformiert (Extraktion). Im dritten Schritt werden Paare semantisch gleicher Elemente zwischen den verschiedenen zu integrierenden Anfragesschnittstellen identifiziert (Matching). Diese Schritte bilden die Grundlage, um Systeme, die eine integrierte Sicht auf die verschiedenen Datenquellen bieten, aufsetzen zu können. Diese Arbeit beschreibt neuartige Lösungen für alle drei der genannten Schritte. Der erste zentrale Beitrag ist ein Exktraktionsalgorithmus, der eine kleine Zahl von Designregeln dazu benutzt, um Schemabäume abzuleiten. Gegenüber früheren Lösungen, welche in der Regel lediglich eine flache Schemarepräsentation anbieten, ist der Schemabaum semantisch reichhaltiger, da er zusätzlich zu den Elementen auch Strukturinformationen abbildet. Der Extraktionsalgorithmus erreicht eine verbesserte Qualität der Element-Extraktion verglichen mit Vergängermethoden. Der zweite Beitrag der Arbeit ist die Entwicklung einer neuen Matching-Methode. Hierbei ermöglicht die Repräsentation der Schnittstellen als Schemabäume eine Verbesserung vorheriger Methoden, indem auch strukturelle Aspekte in den Matching-Algorithmus einfließen. Zusätzlich wird eine globale Optimierung durchgeführt, welche auf der Theorie der bipartiten Graphen aufbaut. Als dritten Beitrag entwickelt die Arbeit einen Algorithms für eine Klassifikation von Schnittstellen nach Anwendungsdomänen auf Basis der Schemabäume und den abgeleiteten Matches. Zusätzlich wird das System VisQI vorgestellt, welches die entwickelten Algorithmen implementiert und eine komfortable graphische Oberfläche für die Unterstützung des Integrationsprozesses bietet. / This thesis focuses on the integration of Web query interfaces. We model the integration process in several steps: First, unknown interfaces have to be classified with respect to their application domain (classification); only then a domain-wise treatment is possible. Second, interfaces must be transformed into a machine readable format (extraction) to allow their automated analysis. Third, as a pre-requisite to integration across databases, pairs of semantically similar elements among multiple interfaces need to be identified (matching). Only if all these tasks have been solved, systems that provide an integrated view to several data sources can be set up. This thesis presents new algorithms for each of these steps. We developed a novel extraction algorithm that exploits a small set of commonsense design rules to derive a hierarchical schema for query interfaces. In contrast to prior solutions that use mainly flat schema representations, the hierarchical schema better represents the structure of the interfaces, leading to better accuracy of the integration step. Next, we describe a multi-step matching method for query interfaces which builds on the hierarchical schema representation. It uses methods from the theory of bipartite graphs to globally optimize the matching result. As a third contribution, we present a new method for the domain classification problem of unknown interfaces that, for the first time, combines lexical and structural properties of schemas. All our new methods have been evaluated on real-life datasets and perform superior to previous works in their respective fields. Additionally, we present the system VisQI that implements all introduced algorithmic steps and provides a comfortable graphical user interface to support the integration process.
|
5 |
[en] CONCEPTUAL SCHEMA MATCHING BASED ON SIMILARITY HEURISTICS / [pt] ALINHAMENTO DE ESQUEMAS CONCEITUAIS BASEADO EM HEURÍSTICAS DE SIMILARIDADELUIZ ANDRE PORTES PAES LEME 07 January 2016 (has links)
[pt] Alinhamento de esquema é uma questão fundamental em aplicações de banco de dados, tais como mediação de consultas, integração de banco de dados e armazéns de dados. Nesta tese, abordamos inicialmente o alinhamento de catálogos. Um catálogo é um banco de dados simples que contém informações sobre conjuntos de objetos, tipicamente classificados usando-se termos de um dado tesauro. Inicialmente apresentamos uma técnica de alinhamento baseada na noção de similaridade, que se aplica a pares de tesauros e de listas de propriedades. Descrevemos, então, o alinhamento baseado na noção de informação mútua e introduzimos variações que exploram certas heurísticas. Ao final, discutimos resultados experimentais que avaliam a precisão do método e comparam a influência das heurísticas. Após as técnicas para alinhamento de catálogos, nos concentramos no problema mais complexo de alinhamento de dois esquemas descritos em um subconjunto de OWL. Adotamos uma técnica baseada em instâncias e, por isso, assumimos que conjuntos de instâncias de cada esquema estão disponíveis. Decompomos este problema nos subproblemas de alinhamento de vocabulário e de alinhamento de conceitos. Introduzimos também condições suficientes para garantir que o alinhamento de vocabulário induz um alinhamento de conceitos correto. Em seguida, descrevemos uma técnica de alinhamento de esquemas OWL baseada no conceito de similaridade. Finalmente, avaliamos a precisão da técnica usando dados disponíveis na Web. De forma diferente de outras técnicas anteriores baseadas em instâncias, o processo de alinhamento que descrevemos usa funções de similaridade para induzir alinhamento de vocabulários de uma forma não trivial. Ilustramos, também, que a estrutura de esquemas OWL pode nos levar a mapeamentos de conceitos errados e indicamos como evitar tais problemas. / [en] Schema matching is a fundamental issue in many database applications, such as query mediation, database integration, catalog matching and data warehousing. In this thesis, we first address hot to match catalogue schemas. A catalogue is a simple database that holds information about a set of objects, typically classified using terms taken from a given thesaurus. We introduce a matching approach, based on the notion of similarity, which applies to pairs of thesauri and to pairs of lists of properties. We then describe matchings based on cooccurrence of information and introduce variations that explore certain heuristics. Lastly, we discuss experimental results that evaluate the precision of the matchings introduced and that measure the influence of the heuristics. We then focus on the mre complex problem of matching two schemas that belong to an expressive OWL dialect. We adopt an instance-based approach and, therefore, assume that a set of instances from each schema is available. We first decompose the problem of OWL schema matching into the problem of vocabulary matching and the problem of concept mapping. We also introduce sufficient conditions guaranteeing that a vocabulary matching induces a correct concept mapping. Next, we describe OWL schema matching technique based on the notion of similarity. Lastly, we evaluate the precision of the technique using data available on the Web. Unlike any of the previous instance-based techniques, the matching process we describe uses similarity functions to induce vocabulary matchings in a non-trivial, coping with an expressive OWL dialect. We also illustrate, through a set of examples, that the structure of OWL schemas may lead to incorrect concept mappings and indicate how to avoid such pitfalls.
|
6 |
Casamento de esquemas XML e esquemas relacionais / Matching of XML schemas and relational schemaMergen, Sérgio Luis Sardi January 2005 (has links)
O casamento entre esquemas XML e esquemas relacionais é necessário em diversas aplicações, tais como integração de informação e intercâmbio de dados. Tipicamente o casamento de esquemas é um processo manual, talvez suportado por uma interface grá ca. No entanto, o casamento manual de esquemas muito grandes é um processo dispendioso e sujeito a erros. Disto surge a necessidade de técnicas (semi)-automáticas de casamento de esquemas que auxiliem o usuário fornecendo sugestões de casamento, dessa forma reduzindo o esforço manual aplicado nesta tarefa. Apesar deste tema já ter sido estudado na literatura, o casamento entre esquemas XML e esquemas relacionais é ainda um tema em aberto. Isto porque os trabalhos existentes ou se aplicam para esquemas de nidos no mesmo modelo, ou são genéricos demais para o problema em questão. O objetivo desta dissertação é o desenvolvimento de técnicas especí cas para o casamento de esquemas XML e esquemas relacionais. Tais técnicas exploram as particularidades existentes entre estes esquemas para inferir valores de similaridade entre eles. As técnicas propostas são avaliadas através de experimentos com esquemas do mundo real. / The matching between XML schemas and relational schemas has many applications, such as information integration and data exchange. Typically, schema matching is done manually by domain experts, sometimes using a graphical tool. However, the matching of large schemas is a time consuming and error-prone task. The use of (semi-)automatic schema matching techniques can help the user in nding the correct matches, thereby reducing his labor. The schema matching problem has already been addressed in the literature. Nevertheless, the matching of XML schemas and relational schemas is still an open issue. This comes from the fact that the existing work is whether speci c for schemas designed in the same model, or too generic for the problem in discussion. The mais goal of this dissertation is to develop speci c techniques for the matching of XML schemas and relational schemas. Such techniques exploit the particularities found when analyzing the two schemas together, and use these cues to leverage the matching process. The techniques are evaluated by running experiments with real-world schemas.
|
7 |
Casamento de esquemas XML e esquemas relacionais / Matching of XML schemas and relational schemaMergen, Sérgio Luis Sardi January 2005 (has links)
O casamento entre esquemas XML e esquemas relacionais é necessário em diversas aplicações, tais como integração de informação e intercâmbio de dados. Tipicamente o casamento de esquemas é um processo manual, talvez suportado por uma interface grá ca. No entanto, o casamento manual de esquemas muito grandes é um processo dispendioso e sujeito a erros. Disto surge a necessidade de técnicas (semi)-automáticas de casamento de esquemas que auxiliem o usuário fornecendo sugestões de casamento, dessa forma reduzindo o esforço manual aplicado nesta tarefa. Apesar deste tema já ter sido estudado na literatura, o casamento entre esquemas XML e esquemas relacionais é ainda um tema em aberto. Isto porque os trabalhos existentes ou se aplicam para esquemas de nidos no mesmo modelo, ou são genéricos demais para o problema em questão. O objetivo desta dissertação é o desenvolvimento de técnicas especí cas para o casamento de esquemas XML e esquemas relacionais. Tais técnicas exploram as particularidades existentes entre estes esquemas para inferir valores de similaridade entre eles. As técnicas propostas são avaliadas através de experimentos com esquemas do mundo real. / The matching between XML schemas and relational schemas has many applications, such as information integration and data exchange. Typically, schema matching is done manually by domain experts, sometimes using a graphical tool. However, the matching of large schemas is a time consuming and error-prone task. The use of (semi-)automatic schema matching techniques can help the user in nding the correct matches, thereby reducing his labor. The schema matching problem has already been addressed in the literature. Nevertheless, the matching of XML schemas and relational schemas is still an open issue. This comes from the fact that the existing work is whether speci c for schemas designed in the same model, or too generic for the problem in discussion. The mais goal of this dissertation is to develop speci c techniques for the matching of XML schemas and relational schemas. Such techniques exploit the particularities found when analyzing the two schemas together, and use these cues to leverage the matching process. The techniques are evaluated by running experiments with real-world schemas.
|
8 |
Casamento de esquemas XML e esquemas relacionais / Matching of XML schemas and relational schemaMergen, Sérgio Luis Sardi January 2005 (has links)
O casamento entre esquemas XML e esquemas relacionais é necessário em diversas aplicações, tais como integração de informação e intercâmbio de dados. Tipicamente o casamento de esquemas é um processo manual, talvez suportado por uma interface grá ca. No entanto, o casamento manual de esquemas muito grandes é um processo dispendioso e sujeito a erros. Disto surge a necessidade de técnicas (semi)-automáticas de casamento de esquemas que auxiliem o usuário fornecendo sugestões de casamento, dessa forma reduzindo o esforço manual aplicado nesta tarefa. Apesar deste tema já ter sido estudado na literatura, o casamento entre esquemas XML e esquemas relacionais é ainda um tema em aberto. Isto porque os trabalhos existentes ou se aplicam para esquemas de nidos no mesmo modelo, ou são genéricos demais para o problema em questão. O objetivo desta dissertação é o desenvolvimento de técnicas especí cas para o casamento de esquemas XML e esquemas relacionais. Tais técnicas exploram as particularidades existentes entre estes esquemas para inferir valores de similaridade entre eles. As técnicas propostas são avaliadas através de experimentos com esquemas do mundo real. / The matching between XML schemas and relational schemas has many applications, such as information integration and data exchange. Typically, schema matching is done manually by domain experts, sometimes using a graphical tool. However, the matching of large schemas is a time consuming and error-prone task. The use of (semi-)automatic schema matching techniques can help the user in nding the correct matches, thereby reducing his labor. The schema matching problem has already been addressed in the literature. Nevertheless, the matching of XML schemas and relational schemas is still an open issue. This comes from the fact that the existing work is whether speci c for schemas designed in the same model, or too generic for the problem in discussion. The mais goal of this dissertation is to develop speci c techniques for the matching of XML schemas and relational schemas. Such techniques exploit the particularities found when analyzing the two schemas together, and use these cues to leverage the matching process. The techniques are evaluated by running experiments with real-world schemas.
|
9 |
An Instance based Approach to Find the Types of Correspondence between the Attributes of Heterogeneous DatasetsRiaz, Muhammad Atif, Munir, Sameer January 2012 (has links)
Context: Determining attribute correspondence is the most important, time consuming and knowledge intensive part during databases integration. It is also used in other data manipulation applications such as data warehousing, data design, semantic web and e-commerce. Objectives: In this thesis the aim is to investigate how to find the types of correspondence between the attributes of heterogeneous datasets when schema design information of the data sets is unknown. Methods: A literature review was conducted to extract the knowledge related to the approaches that are used to find the correspondence between the attributes of heterogeneous datasets. Extracted knowledge from the literature review is used in developing an instance based approach for finding types of correspondence between the attributes of heterogeneous datasets when schema design information is unknown. To validate the proposed approach an experiment was conducted in the real environment using the data provided by the Telecom Industry (Ericsson) Karlskrona. Evaluation of the results was carried using the well known and mostly used measures from information retrieval field precision, recall and F-measure. Results: To find the types of correspondence between the attributes of heterogeneous datasets, good results depend on the ability of the algorithm to avoid the unmatched pairs of rows during the Row Similarity Phase. An evaluation of proposed approach is performed via experiments. We found 96.7% (average of three experiments) F-measure. Conclusions: The analysis showed that the proposed approach was feasible to be used and it provided users a mean to find the corresponding attributes and the types of correspondence between corresponding attributes, based on the information extracted from the similar pairs of rows from the heterogeneous data sets where their similarity based on the same common primary keys values.
|
10 |
Analyse der Wiederverwendung zum Schema-MatchingMaßmann, Sabine 20 October 2017 (has links)
Datenbanken finden in mehr und mehr Bereichen einen Einsatz. Angestrebter Datenaustausch zur Vernetzung von Informationen und Gewinnung zusätzlicher Kenntnisse ist wegen der Heterogenität von den Datenbanken problembehaftet. Schema- Matching beinhaltet durch das Erzeugen von Korrespondenzen zwischen Schemata eine Lösungsmöglichkeit für zumindestens einen Teil dieser Probleme. Viele Ansätze wurden schon für das Schema-Matching entwickelt und werden in dieser Arbeit dargestellt. Es werden Strategien zur Wiederverwendung näher erläutert und Erweiterungen für die Mapping-Wiederverwendung vorgestellt. Die neuen Strategien, die in dieser Diplomarbeit entstehen, wurden in dem Schema-Matching-Prototyp COMA++ integriert und zusammen mit anderen Strategien evaluiert.
|
Page generated in 0.0769 seconds