  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Dynamic Energy-Aware Database Storage and Operations

Behzadnia, Peyman 29 March 2018
Energy consumption has become a first-class optimization goal in the design and implementation of data-intensive computing systems. This is particularly true of database management systems (DBMS), which are among the most important servers in the software stack of modern data centers. The data storage system is an essential component of a database and has been the subject of much research aimed at reducing its energy consumption. In previous work, dynamic power management (DPM) techniques, which make real-time decisions to transition disks to low-power modes, are normally used to save energy in storage systems. In this research, we tackle the limitations of earlier DPM proposals and design a dynamic energy-aware disk storage system for database servers. We introduce a DPM optimization model integrated with a model predictive control (MPC) strategy to minimize the power consumption of the disk-based storage system while satisfying given performance requirements. It dynamically determines the state of the disks and plans inter-disk data fragment migration to achieve a desirable balance between power consumption and query response time. Furthermore, by analyzing our optimization model to identify structural properties of optimal solutions, we propose a fast heuristic DPM algorithm that can be integrated into large-scale disk storage systems, where finding the optimal solution might take too long, to achieve near-optimal power savings within a short computation time. The proposed ideas are evaluated through simulations using an extensive set of synthetic workloads. The results show that our solution achieves up to 1.65 times more energy saving while providing up to 1.67 times shorter response time compared to the best existing algorithm in the literature.
Stream join is a dynamic and expensive database operation that performs joins in real time on continuous data streams. Stream joins, also known as window joins, impose high computational time and potentially higher energy consumption than other database operations, and thus we also tackle the energy efficiency of stream join processing in this research. Given that there is a strong linear correlation between the energy efficiency and the performance of in-memory parallel join algorithms in database servers, we study the parallelization of stream join algorithms on multicore processors to achieve both energy efficiency and high performance. Equi-join is the most frequent type of join in query workloads, and the symmetric hash join (SHJ) algorithm is the most effective algorithm for evaluating equi-joins over data streams. To the best of our knowledge, we are the first to propose a shared-memory parallel symmetric hash join algorithm on multicore CPUs. Furthermore, we introduce a novel parallel hash-based stream join algorithm, called chunk-based pairing hash join, that aims at increasing data throughput and scalability. We also tackle parallel processing of multi-way stream joins, where more than two input data streams are involved in the join operation. To the best of our knowledge, we are also the first to propose an in-memory parallel multi-way hash-based stream join on multicore processors. Experimental evaluation of our proposed parallel algorithms demonstrates high throughput, significant scalability, and low latency while reducing energy consumption. Our parallel symmetric hash join and chunk-based pairing hash join achieve up to 11 times and 12.5 times more throughput, respectively, than the state-of-the-art parallel stream join algorithm. These two algorithms also provide up to around 22 times and 24.5 times more throughput, respectively, than non-parallel (sequential) stream join computation with a single processing thread.
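For readers unfamiliar with the symmetric hash join named in the abstract, the following is a minimal single-threaded sketch of the general textbook technique, not the thesis's parallel algorithm. The function name, the event format `(side, key, payload)`, and the absence of window eviction are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Iterator, Tuple

# Hypothetical event format: (stream side "L" or "R", join key, payload).
Event = Tuple[str, Hashable, object]


def symmetric_hash_join(
    events: Iterable[Event],
) -> Iterator[Tuple[Hashable, object, object]]:
    """Yield (key, left_payload, right_payload) as soon as a match exists.

    Assumption: the window is unbounded, so tuples are never evicted.
    A real window join would also expire old tuples from both tables.
    """
    # One hash table per input stream, keyed by the join attribute.
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, key, payload in events:
        other = "R" if side == "L" else "L"
        # Build step: store the arriving tuple in its own stream's table.
        tables[side][key].append(payload)
        # Probe step: look up the same key in the opposite stream's table.
        for match in tables[other].get(key, ()):
            yield (key, payload, match) if side == "L" else (key, match, payload)


if __name__ == "__main__":
    stream = [("L", 1, "a"), ("R", 1, "x"), ("R", 2, "y"), ("L", 1, "b")]
    for row in symmetric_hash_join(stream):
        print(row)
```

Because each arriving tuple is both inserted and probed, results are produced incrementally regardless of which stream delivers its tuple first, which is what makes the approach suitable for continuous streams.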
2

Master Data Integration hub - řešení pro konsolidaci referenčních dat v podniku / Master Data Integration hub - solution for company-wide consolidation of referrential data

Bartoš, Jan January 2011
In current information systems, the requirement to integrate disparate applications into a cohesive package is strongly emphasized. While well-established technologies that facilitate functional and communication integration (ESB, message brokers, web services) already exist, tools and methodologies for continuous integration of disparate data sources at the enterprise-wide level are still in development. Master Data Management (MDM) is a major approach in the area of data integration, and of referential data management in particular. It encompasses referential data integration, data quality management and referential data consolidation, metadata management, master data ownership, the principle of accountability for master data, and the processes related to referential data management. The thesis focuses on the technological aspects of MDM implementation, realized by introducing a centralized repository for master data, the Master Data Integration Hub (MDI Hub). The MDI Hub is an application that enables the integration and consolidation of referential data stored in disparate systems and applications, based on predefined workflows. It also handles the propagation of master data back to the source systems and provides services such as dictionary management and data quality monitoring. The objective of the thesis is to cover the design and implementation aspects of the MDI Hub, which forms the application part of MDM. The introduction discusses the motivation for referential data consolidation and lists the techniques used in developing an MDI Hub solution. The main part of the thesis proposes the design of an MDI Hub reference architecture and suggests the activities performed during MDI Hub implementation. The thesis is based on information from specialized publications, on knowledge gathered by delivering projects with the companies Adastra and Ataccama, and on co-workers' know-how and experience.
The most important contribution of the thesis is a comprehensive view of MDI Hub design and a proposal for an MDI Hub reference architecture, which can serve as a basis for a particular MDI Hub implementation.
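The consolidation step described in the abstract can be illustrated with a small sketch. This is not the thesis's design: the function name, the record layout, and the survivorship rule (take the first non-empty value from sources in a given priority order) are all hypothetical choices made only to show what merging records from disparate systems into one master record can look like.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: every record carries a "source" field
# naming the system it came from, plus arbitrary attributes.
Record = Dict[str, Optional[str]]


def consolidate(records: List[Record], priority: List[str]) -> Record:
    """Merge the records describing one entity into a single master record.

    Assumed survivorship rule: for each attribute, keep the first
    non-empty value found when visiting sources in priority order.
    """
    by_source = {rec["source"]: rec for rec in records}
    master: Record = {}
    for source in priority:
        rec = by_source.get(source)
        if rec is None:
            continue  # this source holds no record for the entity
        for attr, value in rec.items():
            if attr == "source":
                continue
            # Only fill attributes that no higher-priority source supplied.
            if value and not master.get(attr):
                master[attr] = value
    return master


if __name__ == "__main__":
    crm = {"source": "crm", "name": "ACME Corp", "vat": None}
    erp = {"source": "erp", "name": "Acme", "vat": "CZ123"}
    print(consolidate([crm, erp], priority=["erp", "crm"]))
```

In a hub of the kind the abstract describes, the resulting master record would then be propagated back to the source systems; that step is omitted here.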
3

Datová kvalita, integrita a konsolidace dat v BI / Data quality, data integrity and consolidation of data in BI

Dražil, Michal January 2008
This thesis deals with enterprise data quality, data integrity and data consolidation from the perspective of Business Intelligence (BI), a field that is currently experiencing significant growth. The aim of the thesis is to provide a comprehensive view of data quality in terms of BI, to analyze problems in the area of data quality control, and to propose options to address them. The thesis also aims to analyze and assess the features of specialized software tools for data quality. Finally, it aims to identify the critical success factors for data quality in CRM and BI projects. The thesis is divided into two parts. The first (theoretical) part deals with data quality, data integrity and data consolidation in relation to BI, and tries to identify the key issues related to these areas. The second (practical) part first covers the features of software tools for data quality and offers a basic summary and a breakdown of the tools. This part also provides a basic comparison of a few selected software products specializing in corporate data quality assurance. The practical part then describes how data quality was addressed within a specific BI/CRM project conducted by Clever Decision Ltd. The thesis is intended primarily for BI and data quality experts, as well as others who are interested in these disciplines. Its main contribution is that it provides a comprehensive view not only of data quality itself, but also of the issues directly related to corporate data quality assurance. The thesis can serve as guidance for one of the first implementation phases in BI projects, which deals with data integration, data consolidation and solving problems in the area of data quality.
4

[en] TOWARDS A WELL-INTERLINKED WEB THROUGH MATCHING AND INTERLINKING APPROACHES / [pt] INTERLIGANDO RECURSOS NA WEB ATRAVÉS DE ABORDAGENS DE MATCHING E INTERLINKING

BERNARDO PEREIRA NUNES 07 January 2016
[en] With the emergence of Linked (Open) Data, a number of novel and notable research challenges have been raised. The openness that often characterises Linked Data offers an opportunity to homogeneously integrate and connect heterogeneous data sources on the Web. As disparate data sources with overlapping or related resources are provided by different data publishers, their integration and consolidation becomes a real challenge. An additional challenge of Linked Data lies in the creation of a well-interlinked graph of Web data. Identifying and linking not only identical Web resources, but also lateral Web resources, provides the data consumer with a richer representation of the data and the possibility of exploiting connected resources. In this thesis, we present three approaches that tackle data integration, consolidation and linkage problems. Our first approach combines mutual information and genetic programming techniques for complex datatype property matching, a problem rarely addressed in the literature. In the second and third approaches, we adopt and extend a measure from social network theory to address data consolidation and interlinking. Furthermore, we present a Web-based application named Cite4Me that provides a new perspective on search and retrieval of Linked Open Data sets, as well as the benefits of using our approaches. Finally, we validate our approaches through extensive evaluations using real-world datasets, reporting results that outperform state-of-the-art approaches.
