About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
171

Data warehouse schema design

Lechtenbörger, Jens. January 2001
Doctoral dissertation, Universität Münster (Westfalen), 2001.
172

SB-Index : um índice espacial baseado em bitmap para data warehouse geográfico / SB-index: a bitmap-based spatial index for geographic data warehouses

Siqueira, Thiago Luís Lopes 26 August 2009
Geographic Data Warehouses (GDW) have become one of the main technologies used in decision-making processes and spatial analysis, since they integrate Data Warehouses, On-Line Analytical Processing and Geographic Information Systems. As a result, a GDW enables spatial analyses together with agile and flexible multidimensional analytical queries over huge volumes of data. On the other hand, query performance is a challenge in a GDW: queries must retrieve data related to ad-hoc spatial query windows while avoiding the high cost of star joins. Clearly, mechanisms that provide efficient query processing, such as index structures, are essential. This master's thesis introduces a novel index for GDW, the SB-index, based on the Bitmap Join Index and the Minimum Bounding Rectangle (MBR). The SB-index inherits the legacy techniques of the Bitmap Index, introduces them into GDW, and supports predefined spatial attribute hierarchies. The SB-index was validated through experimental performance tests. Comparisons among the SB-index, the star join aided by an R-tree, and the star join aided by GiST indicated that the SB-index significantly reduces elapsed query-processing time, by 76% to 96%, for queries defined over the spatial predicates of intersection, enclosure and containment and applied to roll-up and drill-down operations. In addition, the impact of increasing data volume on performance was analyzed; the increase did not impair the performance of the SB-index. Moreover, this thesis experimentally investigates how spatial data redundancy affects query response time and storage requirements in a GDW. Redundant and non-redundant GDW schemas were compared, leading to the conclusion that redundancy causes substantial performance losses. Aiming at improving query performance, the SB-index was then evaluated on the redundant GDW schema; the results showed that it reduces elapsed query-processing time by 25% to 99%. Finally, a specific enhancement of the SB-index was developed to deal with spatial data redundancy; with this enhancement, the minimum observed performance gain became 80%.
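The bitmap-plus-MBR idea at the heart of the SB-index lends itself to a compact illustration. The following Python sketch is a toy rendering of that filtering step, not the thesis's implementation; the two-member index and all names are hypothetical, and members whose MBR only partially overlaps the window would still need refinement against their exact geometries.

```python
# Toy sketch of the SB-index filtering idea: each spatial dimension member
# keeps its Minimum Bounding Rectangle (MBR) plus a bitmap over fact rows
# (here an int used as a bit vector). All names are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class MBR:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "MBR") -> bool:
        return not (self.xmax < other.xmin or other.xmax < self.xmin or
                    self.ymax < other.ymin or other.ymax < self.ymin)

# One entry per spatial dimension member: (MBR, bitmap of joining fact rows).
sb_index = [
    (MBR(0, 0, 10, 10), 0b00101),    # member A joins fact rows 0 and 2
    (MBR(20, 20, 30, 35), 0b11010),  # member B joins fact rows 1, 3 and 4
]

def candidate_rows(window: MBR) -> int:
    """OR together the bitmaps of all members whose MBR intersects the ad-hoc
    query window, yielding candidate fact rows without a star join."""
    result = 0
    for mbr, bitmap in sb_index:
        if window.intersects(mbr):
            result |= bitmap
    return result

print(bin(candidate_rows(MBR(5, 5, 25, 25))))  # -> 0b11111 (both members hit)
```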
173

Proposta de um sistema de apoio à decisão para controle e gerenciamento agrícola em usinas de açúcar e álcool / Proposal of a decision support system for agricultural control and management in sugar and ethanol plants

Renato Tavares 04 July 2008
Alongside the rapid evolution of computing, information and knowledge have been receiving growing attention, and databases have become a key part of this process, making information widely available to support knowledge acquisition and subsequent decision-making. This work proposes a Decision Support System (DSS) that applies data warehousing methodologies to agricultural control and management in sugar and ethanol plants. A structured, extensible environment was developed for the analysis of non-volatile data that are logically and physically transformed, originate from diverse applications, are aligned with the company's structure, are updated and kept for a long period of time, are expressed in business terms, and are summarized for fast analysis. With the deployment of these technologies, the company will be able to obtain managerial- and strategic-level information to support its decision-making processes, which was not possible with the information systems previously in place.
174

Informationsförvaltning inom en stor organisation : En fallstudie på Trafikverket / Information management within a large organization: A case study at Trafikverket

Glad, Therese January 2016
This study investigates how a large organization can manage its information by examining the organization's existing information management and exploring proposals for its future information management. It further investigates how such an organization can establish clear governance, collaboration, handling, and allocation of roles and responsibilities around information management. The study is qualitative, with data collected through document studies and interviews. It follows an abductive research approach and is a normative case study, since its goal is to provide guidance and propose measures for the case that the collaboration partner asked me to study. The case is a typical one, as the results may interest more than the study's collaboration partner, for example other large organizations with similar information environments. To build the study's theoretical base, I conducted literature reviews on topics relevant to its purpose: information management, Business Intelligence, Data Warehouses and their architecture, and the Business Intelligence Competency Center. The study makes a practical knowledge contribution: it answers practical problems the collaboration partner has experienced with a non-functioning information management, and it contributes suggestions for future information management. The suggestions involve a centralized Data Warehouse and the development of a function that handles information management and disseminates its governance throughout the organization.
175

Vývoj a implementace BI nadstavby nad systémem MS Dynamics NAV / Development and implementation of a BI solution on top of the MS Dynamics NAV system

Votruba, Tomáš January 2013
This thesis describes the process of creating a Business Intelligence solution on top of the ERP system MS Dynamics NAV; the successful development and deployment of a prototype version is its main goal. In line with its sub-objectives, the work is divided into a theoretical and a practical part. The theoretical part starts with a survey of work on similar themes, then describes the current state of the Czech market for BI solutions on MS Dynamics NAV, the basic principles of Business Intelligence development, and the Balanced Scorecard method for setting benchmarks. The practical part describes the development of the Business Intelligence solution, starting with identifying a potential customer (a wholesale distribution company) and gathering its BI requirements, continuing through the design of a data warehouse in MS SQL Server, the creation of a data pump (ETL process) connected to the MS Dynamics NAV database, the loading of the data warehouse, and the creation of OLAP cubes, and ending with example output reports from the working solution in MS Excel 2013.
176

Data Warehouses na era do Big Data: processamento eficiente de Junções Estrela no Hadoop / Data warehouses in the Big Data era: efficient processing of star joins in Hadoop

Jaqueline Joice Brito 12 December 2017
The era of Big Data is here: the combination of unprecedented amounts of data collected every day with the rise of open-source solutions for massively parallel processing has shifted the industry toward data-driven solutions. From recommendation systems that help you find your next significant other to the dawn of self-driving cars, cloud computing has enabled companies of all sizes and areas to achieve their full potential with minimal overhead. In particular, the use of these technologies for data warehousing applications has greatly decreased costs and provided remarkable scalability, empowering business-oriented applications such as Online Analytical Processing (OLAP). One of the most essential primitives in data warehouses is the Star Join, i.e. the join of a central fact table with its satellite dimension tables. As the database volume scales, Star Joins become impractical and may seriously limit applications. This thesis proposes specialized solutions to optimize the processing of Star Joins, using the Hadoop software family on a cluster of 21 nodes. We show that the primary bottleneck in the computation of Star Joins on Hadoop lies in excessive disk spill and in the overhead of network communication. To mitigate these negative effects, we propose two solutions based on combining the Spark framework with either Bloom filters or the broadcast technique, which reduced computation time by at least 38%. Furthermore, we show that full scans may significantly hinder the performance of queries with low selectivity. We therefore propose a distributed Bitmap Join Index that can be processed as a loosely bound secondary index and used with random access in the Hadoop Distributed File System (HDFS). We also implemented three versions (one in MapReduce and two in Spark) of a processing algorithm that uses the distributed index, reducing total computation time by up to 88% for low-selectivity Star Joins from the Star Schema Benchmark (SSB). Because the system should ideally be able to perform both random access and full scans, our solution relies on a framework-agnostic two-layer architecture that enables a query optimizer to select which approach to use for each query. Given the ubiquity of joins as primitive queries, our solutions are likely to fit a broad range of applications. Our contributions not only leverage the strengths of massively parallel frameworks but also exploit more efficient access methods to provide scalable and robust solutions to Star Joins with a significant drop in total computation time.
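As a hedged illustration of the broadcast variant (one of the two Spark-based solutions described above), the PySpark sketch below joins a large fact table against small, pre-filtered dimensions without shuffling the fact table. The paths, column names and filter values are simplified stand-ins for the SSB schema, not the thesis code.

```python
# Illustrative PySpark sketch of a broadcast-based Star Join: small dimension
# tables are shipped to every worker, so the large fact table is joined
# without a shuffle, reducing disk spill and network traffic.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("star-join-broadcast").getOrCreate()

fact = spark.read.parquet("hdfs:///dw/lineorder")     # large fact table
customer = spark.read.parquet("hdfs:///dw/customer")  # small dimensions
part = spark.read.parquet("hdfs:///dw/part")

# Filter dimensions first (high selectivity helps), then broadcast them so
# only the tiny filtered dimensions cross the network.
result = (fact
          .join(broadcast(customer.filter("region = 'ASIA'")), "custkey")
          .join(broadcast(part.filter("category = 'MFGR#12'")), "partkey")
          .groupBy("orderdate").sum("revenue"))
result.show()
```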
177

Avaliação do Star Schema Benchmark aplicado a bancos de dados NoSQL distribuídos e orientados a colunas / Evaluation of the Star Schema Benchmark applied to NoSQL column-oriented distributed database systems

Lucas de Carvalho Scabora 06 May 2016
Due to the explosive increase in data volume, centralized data warehousing applications have become very costly and face several problems in dealing with data scalability: they need both to store huge volumes of data and to perform analytical (OLAP) queries against these voluminous data efficiently. One solution is to employ NoSQL databases managed in parallel and distributed environments. Among the challenges in these scenarios is the need to analyze the performance of data warehousing applications that store the data warehouse (DW) in column-oriented NoSQL databases. In this context, benchmarks are widely used to perform standardized, experimental analyses of distinct systems. However, most DW benchmarks target relational database systems and centralized environments. This master's research investigates how to extend the Star Schema Benchmark (SSB), which was proposed for centralized DWs, to the distributed, column-oriented NoSQL database HBase. We introduce proposals and analyses, mainly based on experimental performance tests, covering each of the four steps of a benchmark: schema and workload, data generation, parameters and metrics, and validation. The main results are: (i) the proposal of the FactDate schema, which optimizes queries that access few dimensions of the DW; (ii) an investigation of the applicability of different schemas to different business scenarios; (iii) the proposal of two queries additional to the SSB workload; (iv) an analysis of the data distribution generated by the SSB, verifying whether the data aggregated by OLAP queries are balanced between the nodes of a cluster; (v) an investigation of the influence of three important parameters of the Hadoop MapReduce framework on OLAP query processing; (vi) an evaluation of the relationship between OLAP query performance and the number of nodes in a cluster; and (vii) the use of hierarchical materialized views, through the Spark framework, to optimize the processing of consecutive OLAP queries that require progressively more or less aggregated data. These results represent important findings that enable the future proposal of a benchmark for DWs stored in NoSQL databases and managed in parallel and distributed environments.
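To make the schema discussion concrete, here is a hedged sketch, via the happybase client and assuming a local HBase Thrift server, of how a fact row might be laid out with the date dimension denormalized into its own column family, in the spirit of the FactDate schema. The table name, column families and values are illustrative assumptions, not the thesis's exact schema definition.

```python
# Hedged sketch of one possible SSB fact-row layout in HBase: fact measures
# in one column family, the date dimension denormalized into another, so
# date-only OLAP queries touch a single family.
import happybase

conn = happybase.Connection("localhost")  # assumes a running HBase Thrift server
conn.create_table("lineorder", {"fact": {}, "date": {}, "cust": {}})  # fails if it exists
table = conn.table("lineorder")

# Row key = orderkey-linenumber; each cell is (column family, qualifier).
table.put(b"00001-1", {
    b"fact:revenue": b"2116823",
    b"fact:quantity": b"17",
    b"date:year": b"1996",
    b"date:month": b"1",
    b"cust:region": b"ASIA",
})

# Scanning only the needed families/columns avoids reading whole rows.
for key, data in table.scan(columns=[b"fact:revenue", b"date:year"]):
    print(key, data)
```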
178

Qualitätsgetriebene Datenproduktionssteuerung in Echtzeit-Data-Warehouse-Systemen / Quality-driven data production control in real-time data warehouse systems

Thiele, Maik 31 May 2010
Whereas data warehouse systems were formerly used mostly for data analysis in support of management decisions, they have since evolved into the central platform for the integrated information supply of an enterprise. In particular, this includes embedding the data warehouse into operational processes, which require very fresh data on the one hand and fast query processing on the other. At the same time, classical data warehouse applications persist that require high-quality, refined data. The users of a data warehouse system thus have diverse and partly conflicting requirements regarding data freshness, query latency and data stability. This dissertation develops methods and techniques that address and resolve this conflict. The overarching goal was to develop a real-time data warehouse architecture that covers the full breadth of information supply, from historical to current data. First, a method for scheduling continuous update streams was devised; it takes the conflicting requirements of the data warehouse users into account and provably produces optimal schedules. Next, scheduling was examined in the context of multi-stage data production processes, analyzing in particular under which conditions scheduling can be applied profitably in such processes. To support the analysis of complex data warehouse processes, a visualization of the evolution of data states across the production processes was proposed, providing a tool for exploratory study of the optimization potential of data production processes. Because a real-time data warehouse subject to operational data changes leads to inconsistencies in report production, a decoupled data layer optimized for report production was developed. An aggregation concept was further devised to accelerate query processing, and the completeness of report queries is guaranteed by special query techniques. Finally, two data warehouse case studies of large companies were presented and their specific challenges analyzed; the concepts developed in this dissertation were assessed for their benefit and applicability in these practical scenarios.
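One of the dissertation's core technical moves, mapping the scheduling of update streams to the knapsack problem and solving it with dynamic programming, can be sketched briefly. The Python below is an illustrative 0/1 knapsack over pending updates under a time budget, with hypothetical numbers; the dissertation's actual model is multi-criteria and proven optimal for its setting.

```python
# Illustrative 0/1-knapsack view of update scheduling: within the time budget
# between queries, pick the subset of pending updates that maximizes the
# gained data freshness. Costs, gains and the budget are hypothetical.
def schedule_updates(updates, budget):
    """updates: list of (cost, freshness_gain); budget: available time units.
    Returns (best total gain, chosen update indices) via dynamic programming."""
    best = [0] * (budget + 1)              # best[b] = max gain with budget b
    choice = [[] for _ in range(budget + 1)]
    for i, (cost, gain) in enumerate(updates):
        for b in range(budget, cost - 1, -1):  # iterate backwards: 0/1 items
            if best[b - cost] + gain > best[b]:
                best[b] = best[b - cost] + gain
                choice[b] = choice[b - cost] + [i]
    return best[budget], choice[budget]

pending = [(3, 40), (2, 25), (4, 45), (1, 10)]  # (load time, freshness gain)
print(schedule_updates(pending, budget=5))       # -> (65, [0, 1])
```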
179

Strategy and methodology for enterprise data warehouse development. Integrating data mining and social networking techniques for identifying different communities within the data warehouse.

Rifaie, Mohammad January 2010
Data warehouse technology has been successfully integrated into the information infrastructure of major organizations as a potential solution for eliminating redundancy and providing comprehensive data integration. Recognizing the importance of the data warehouse as the main data repository within an organization, this dissertation addresses different aspects of data warehouse architecture and performance. Many data warehouse architectures have been presented by industry analysts and research organizations, varying from independent, physical business-unit-centric data marts to the centralised two-tier hub-and-spoke data warehouse; the operational data store is a third tier, offered later to address business requirements for intra-day data loading. While the industry-available architectures are all valid, I found them suboptimal in efficiency (cost) and effectiveness (productivity). In this dissertation I advocate a new architecture (the Hybrid Architecture), which encompasses the industry-advocated architectures. The Hybrid Architecture demands the acquisition, loading and consolidation of enterprise atomic and detailed data into a single integrated enterprise data store (the Enterprise Data Warehouse), where business-unit-centric Data Marts and Operational Data Stores (ODS) are built in the same instance of the Enterprise Data Warehouse. To highlight the role of data warehouses in different applications, we describe an effort to develop a data warehouse for a geographical information system (GIS), and we further study the importance of data practices, quality and governance for financial institutions by commenting on the RBC Financial Group case. The development and deployment of the Enterprise Data Warehouse based on the Hybrid Architecture spawned its own issues and challenges: organic data growth and business requirements to load additional data significantly increase the amount of stored data and, consequently, the number of users; enterprise data warehouse obesity, performance degradation and navigation difficulties are chief amongst these issues. Association rule mining and social networks are adopted in this thesis to address them. We describe an approach that uses frequent pattern mining and social network techniques to discover different communities within the data warehouse, including sets of tables frequently accessed together, sets of tables retrieved together most of the time, and sets of attributes that mostly appear together in queries. We concentrate on tables in the discussion; however, the model is general enough to discover other communities. We first build a frequent pattern mining model by considering each query as a transaction and the tables as items. Then we mine closed frequent itemsets of tables; these itemsets include tables that are mostly accessed together and hence should be treated as one unit in storage and retrieval for better overall performance. We utilize social network construction and analysis to find maximum-sized sets of related tables, a more robust approach than a union of overlapping itemsets. We derive the Jaccard distance between the closed itemsets and construct the social network of tables by adding links where the distance falls below a given threshold.
The constructed network is analyzed to discover communities of tables that are mostly accessed together. The reported test results are promising and demonstrate the applicability and effectiveness of the developed approach.
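The pipeline described above — queries as transactions, frequent itemsets of tables, a distance-based network, communities as connected groups — can be sketched end to end. The following Python is a simplified illustration over a hypothetical query log: frequent pairs stand in for the thesis's closed frequent itemsets, and the threshold is an assumption.

```python
# Simplified sketch of the community-discovery approach: each query is a
# transaction over the tables it touches; frequently co-accessed table sets
# are linked when their Jaccard distance is small, and each connected
# component of the resulting network yields one community of tables.
from itertools import combinations

query_log = [
    {"sales", "customer", "date"},
    {"sales", "customer"},
    {"sales", "product", "date"},
    {"inventory", "warehouse"},
    {"inventory", "warehouse", "product"},
]

def frequent_pairs(log, min_support=2):
    counts = {}
    for tables in log:
        for pair in combinations(sorted(tables), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return [set(p) for p, c in counts.items() if c >= min_support]

def jaccard_distance(a, b):
    return 1 - len(a & b) / len(a | b)

def communities(itemsets, max_distance=0.7):
    # Union-find over itemsets: link close itemsets, then take the union of
    # tables in each connected component as one community.
    parent = list(range(len(itemsets)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in combinations(range(len(itemsets)), 2):
        if jaccard_distance(itemsets[i], itemsets[j]) <= max_distance:
            parent[find(i)] = find(j)
    groups = {}
    for i, s in enumerate(itemsets):
        groups.setdefault(find(i), set()).update(s)
    return list(groups.values())

print(communities(frequent_pairs(query_log)))
# e.g. [{'sales', 'customer', 'date'}, {'inventory', 'warehouse'}]
```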
180

Partition-based workload scheduling in living data warehouse environments

Thiele, Maik, Fischer, Ulrike, Lehner, Wolfgang 04 July 2023
The demand for so-called living or real-time data warehouses is increasing in many application areas, such as manufacturing, event monitoring and telecommunications. In these fields, users normally expect short response times for their queries and high freshness for the requested data. However, meeting these fundamental requirements is challenging due to the high loads and the continuous flow of write-only updates and read-only queries that may conflict with each other. We therefore present the concept of workload balancing by election (WINE), which allows users to express their individual demands on quality of service and quality of data. WINE exploits this information to balance and prioritize both types of transactions, queries and updates, according to the varying user needs. A simulation study shows that our proposed algorithm outperforms competing baseline algorithms over the entire spectrum of workloads and user requirements.
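As a loose illustration of the election idea — not WINE's actual algorithm — the Python sketch below lets each waiting query vote between data freshness and low latency, and uses the average vote to decide whether to execute a pending update or the next query. The queue contents, weights and decision rule are all assumptions.

```python
# Hedged sketch of election-based workload balancing: each query carries a
# QoD weight in [0, 1] (0 = pure quality of service, answer me fast;
# 1 = pure quality of data, apply updates first). The mean vote of waiting
# queries is the probability of processing an update before the next query.
from collections import deque
import random

updates = deque(f"update-{i}" for i in range(5))
queries = deque((f"query-{i}", random.random()) for i in range(5))

def next_transaction():
    """Pick the next transaction based on the aggregate freshness vote."""
    if not queries:
        return updates.popleft() if updates else None
    if not updates:
        return queries.popleft()[0]
    mean_qod = sum(w for _, w in queries) / len(queries)
    if random.random() < mean_qod:
        return updates.popleft()   # freshness-leaning workload: apply update
    return queries.popleft()[0]    # latency-leaning workload: answer query

while (t := next_transaction()) is not None:
    print("executing", t)
```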
