  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Automatische Mapping-Verarbeitung auf Webdaten (Automatic Mapping Processing on Web Data)

Thor, Andreas 19 October 2017
The World Wide Web provides a vast amount of information. Besides the content of web pages, the links between them are an important source of information. For example, recommendations of other interesting products help customers of large e-commerce websites find a suitable product. In addition, pages on different websites often hold information about the same real-world object, so that links between them make all available information from several data sources accessible for analyses such as a price comparison. In this thesis, links are grouped into so-called mappings, where a mapping represents a set of pairwise assignments, called correspondences, between instance data (objects). The term correspondence does not only denote a link between identical objects; more generally, it expresses a semantic relationship between two objects. This dissertation focuses on the processing of such mappings, i.e., their generation, optimization, and use.
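
The definition of a mapping in this abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the class name `Correspondence`, the helper `targets_of`, the relation labels, and the object identifiers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Correspondence:
    """One pairwise assignment between two objects (hypothetical model)."""
    source: str    # identifier of the first object
    target: str    # identifier of the second object
    relation: str  # semantic relationship, not necessarily equality


# A mapping is modelled here as an immutable set of correspondences.
Mapping = FrozenSet[Correspondence]


def targets_of(mapping: Mapping, source: str) -> Set[str]:
    """Return every object that the mapping links to the given source object."""
    return {c.target for c in mapping if c.source == source}


# Illustrative data only: one "same object" link and one recommendation link.
example_mapping: Mapping = frozenset({
    Correspondence("shopA:item42", "shopB:sku-9", "same-as"),
    Correspondence("shopA:item42", "shopA:item77", "recommended-with"),
})

linked = targets_of(example_mapping, "shopA:item42")
```

Keeping the relation on each correspondence reflects the abstract's point that a correspondence can express any semantic relationship, for instance a recommendation, rather than only identity.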
2

Entwicklung eines Data Warehouses zur Durchführung von Zitierungsanalysen (Development of a Data Warehouse for Conducting Citation Analyses)

Schnerwitzki, Tino 16 November 2017
Previous publications have already carried out various citation analyses. However, each used different data sources, which prevents a direct comparison of their results. The aim of this thesis is to create a basis for carrying out citation analyses flexibly. By developing a data warehouse, different data sources are to be integrated and consolidated in order to enable a variety of analysis perspectives and calculation methods on a uniform data set. In particular, the thesis addresses the specifics of using web data sources as well as various data cleaning methods.
3

Query-Time Data Integration

Eberius, Julian 16 December 2015
Today, data is collected in ever-increasing scale and variety, opening up enormous potential for new insights and data-centric products. However, in many cases the volume and heterogeneity of new data sources preclude up-front integration using traditional ETL processes and data warehouses. In some cases, it is even unclear if and in what context the collected data will be utilized. Therefore, there is a need for agile methods that defer the effort of integration until the usage context is established. This thesis introduces Query-Time Data Integration as an alternative concept to traditional up-front integration. It aims at enabling users to issue ad-hoc queries on their own data as if all potential other data sources were already integrated, without declaring specific sources and mappings to use. Automated data search and integration methods are then coupled directly with query processing on the available data. The ambiguity and uncertainty introduced through fully automated retrieval and mapping methods are compensated for by answering those queries with ranked lists of alternative results. Each result is then based on different data sources or query interpretations, allowing users to pick the result most suitable to their information need. To this end, this thesis makes three main contributions. Firstly, we introduce a novel method for Top-k Entity Augmentation, which is able to construct a top-k list of consistent integration results from a large corpus of heterogeneous data sources. It improves on the state of the art by producing a set of individually consistent but mutually diverse alternative solutions, while minimizing the number of data sources used. Secondly, based on this novel augmentation method, we introduce the DrillBeyond system, which is able to process Open World SQL queries, i.e., queries referencing arbitrary attributes not defined in the queried database.
The original database is then augmented at query time with Web data sources providing those attributes. Its hybrid augmentation/relational query processing enables the use of ad-hoc data search and integration in data analysis queries, and improves both performance and quality when compared to using separate systems for the two tasks. Finally, we studied the management of large-scale dataset corpora such as data lakes or Open Data platforms, which are used as data sources for our augmentation methods. We introduce Publish-time Data Integration as a new technique for data curation systems managing such corpora, which aims at improving the individual reusability of datasets without requiring up-front global integration. This is achieved by automatically generating metadata and format recommendations, allowing publishers to enhance their datasets with minimal effort. Collectively, these three contributions are the foundation of a Query-time Data Integration architecture that enables ad-hoc data search and integration queries over large heterogeneous dataset collections.
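
The idea of an Open World SQL query, a query that references an attribute the database does not define, can be illustrated with a small Python sketch using the standard `sqlite3` module. This is not the DrillBeyond implementation: the function `open_world_query`, the table `country`, the attribute `population`, and the dictionary standing in for a Web data source are all assumptions made for illustration, and the sketch returns a single result rather than a ranked top-k list of alternatives.

```python
import sqlite3

# Hypothetical stand-in for a Web data source; names and values are invented.
WEB_SOURCE = {"Atlantis": 1.5, "Utopia": 2.5}


def open_world_query(conn, sql, table, key_column, attribute, source):
    """Run sql; if it fails (assumed: because `attribute` is undefined),
    add that attribute from `source` and retry."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        # Assumption: the error means the attribute is missing from the schema.
        # Identifiers are interpolated directly, so they must be trusted input.
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {attribute} REAL")
        for key, value in source.items():
            conn.execute(
                f"UPDATE {table} SET {attribute} = ? WHERE {key_column} = ?",
                (value, key),
            )
        return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (name TEXT, gdp REAL)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?)",
    [("Atlantis", 10.0), ("Utopia", 20.0)],
)

# "population" is not a column of "country" when the query is issued.
rows = open_world_query(
    conn,
    "SELECT name, population FROM country ORDER BY name",
    table="country",
    key_column="name",
    attribute="population",
    source=WEB_SOURCE,
)
```

In a plain relational system the query would simply be rejected; the sketch shows only the basic step of filling in the missing attribute at query time, leaving out the source search, consistency checks, and ranking that the abstract describes.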
4

Query-Time Data Integration

Eberius, Julian 10 December 2015
