161

Migrating an Operational Database Schema to Data Warehouse Schemas

PHIPPS, CASSANDRA J. 22 May 2002 (has links)
No description available.
162

A Metamodeling Approach to Merging Data Warehouse Conceptual Schemas

Vaidyanathan, Veena January 2008 (has links)
No description available.
163

Automating Multiple Schema Generation using Dimensional Design Patterns

Deshpande, Monali A. 23 July 2009 (has links)
No description available.
164

Strategy and methodology for enterprise data warehouse development : integrating data mining and social networking techniques for identifying different communities within the data warehouse

Rifaie, Mohammad January 2010 (has links)
Data warehouse technology has been successfully integrated into the information infrastructure of major organizations as a potential solution for eliminating redundancy and providing comprehensive data integration. Realizing the importance of a data warehouse as the main data repository within an organization, this dissertation addresses different aspects related to data warehouse architecture and performance. Many data warehouse architectures have been presented by industry analysts and research organizations, ranging from independent, physical business-unit-centric data marts to the centralised two-tier hub-and-spoke data warehouse. The operational data store is a third tier which was offered later to address the business requirements for intra-day data loading. While the industry-available architectures are all valid, I found them to be suboptimal in efficiency (cost) and effectiveness (productivity). In this dissertation, I advocate a new architecture (the Hybrid Architecture) which encompasses the industry-advocated architecture. The Hybrid Architecture demands the acquisition, loading and consolidation of enterprise atomic and detailed data into a single integrated enterprise data store (the Enterprise Data Warehouse) where business-unit-centric Data Marts and Operational Data Stores (ODS) are built in the same instance of the Enterprise Data Warehouse. To highlight the role of data warehouses for different applications, we describe an effort to develop a data warehouse for a geographical information system (GIS). We further study the importance of data practices, quality and governance for financial institutions by commenting on the RBC Financial Group case. The development and deployment of the Enterprise Data Warehouse based on the Hybrid Architecture spawned its own issues and challenges. Organic data growth and business requirements to load additional new data will significantly increase the amount of stored data; consequently, the number of users will also increase. Enterprise data warehouse obesity, performance degradation and navigation difficulties are chief amongst the issues and challenges. Association rule mining and social networks have been adopted in this thesis to address these issues and challenges. We describe an approach that uses frequent pattern mining and social network techniques to discover different communities within the data warehouse. These communities include sets of tables frequently accessed together, sets of tables retrieved together most of the time, and sets of attributes that mostly appear together in the queries. We concentrate on tables in the discussion; however, the model is general enough to discover other communities. We first build a frequent pattern mining model by considering each query as a transaction and the tables as items. Then, we mine closed frequent itemsets of tables; these itemsets include tables that are mostly accessed together and hence should be treated as one unit in storage and retrieval for better overall performance. We utilize social network construction and analysis to find maximum-sized sets of related tables; this is a more robust approach as opposed to a union of overlapping itemsets. We derive the Jaccard distance between the closed itemsets and construct the social network of tables by adding links that represent distance above a given threshold.
The constructed network is analyzed to discover communities of tables that are mostly accessed together. The reported test results are promising and demonstrate the applicability and effectiveness of the developed approach.
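To make the community-discovery pipeline concrete, the following is a minimal sketch of the steps the abstract describes: each query treated as a transaction of tables, closed frequent itemsets mined, itemsets linked by Jaccard similarity, and communities read off the resulting graph. It is not the author's code: the toy query log, the support and similarity thresholds, and the use of connected components as the final community step are assumptions made purely for illustration.

```python
from itertools import combinations
from collections import defaultdict
import networkx as nx

# Hypothetical query log: each query is a transaction, the tables it touches are items.
query_log = [
    {"sales", "customer", "date"},
    {"sales", "customer"},
    {"sales", "product", "date"},
    {"inventory", "product"},
    {"sales", "customer", "date"},
    {"inventory", "product", "supplier"},
]

MIN_SUPPORT = 2   # assumed absolute support threshold
THRESHOLD = 0.3   # assumed Jaccard similarity cut-off for adding a link

# 1) Count the support of every itemset of tables (brute force; fine for a toy log).
support = defaultdict(int)
for txn in query_log:
    for r in range(1, len(txn) + 1):
        for itemset in combinations(sorted(txn), r):
            support[itemset] += 1
frequent = {s: c for s, c in support.items() if c >= MIN_SUPPORT}

# 2) Keep closed itemsets only: no proper superset has the same support.
closed = [s for s in frequent
          if not any(set(s) < set(t) and frequent[t] == frequent[s] for t in frequent)]

# 3) Link closed itemsets whose table sets are similar enough (Jaccard).
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

g = nx.Graph()
g.add_nodes_from(closed)
for a, b in combinations(closed, 2):
    if jaccard(a, b) >= THRESHOLD:
        g.add_edge(a, b)

# 4) Each connected group of itemsets yields one community of tables accessed together.
for component in nx.connected_components(g):
    print(sorted(set().union(*component)))
```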
165

Avaliação do Star Schema Benchmark aplicado a bancos de dados NoSQL distribuídos e orientados a colunas / Evaluation of the Star Schema Benchmark applied to column-oriented distributed NoSQL database systems

Scabora, Lucas de Carvalho 06 May 2016 (has links)
Com o crescimento do volume de dados manipulado por aplicações de data warehousing, soluções centralizadas tornam-se muito custosas e enfrentam dificuldades para tratar a escalabilidade do volume de dados. Nesse sentido, existe a necessidade tanto de se armazenar grandes volumes de dados quanto de se realizar consultas analíticas (ou seja, consultas OLAP) sobre esses dados volumosos de forma eficiente. Isso pode ser facilitado por cenários caracterizados pelo uso de bancos de dados NoSQL gerenciados em ambientes paralelos e distribuídos. Dentre os desafios relacionados a esses cenários, destaca-se a necessidade de se promover uma análise de desempenho de aplicações de data warehousing que armazenam os dados do data warehouse (DW) em bancos de dados NoSQL orientados a colunas. A análise experimental e padronizada de diferentes sistemas é realizada por meio de ferramentas denominadas benchmarks. Entretanto, benchmarks para DW foram desenvolvidos majoritariamente para bancos de dados relacionais e ambientes centralizados. Nesta pesquisa de mestrado são investigadas formas de se estender o Star Schema Benchmark (SSB), um benchmark de DW centralizado, para o banco de dados NoSQL distribuído e orientado a colunas HBase. São realizadas propostas e análises principalmente baseadas em testes de desempenho experimentais considerando cada uma das quatro etapas de um benchmark, ou seja, esquema e carga de trabalho, geração de dados, parâmetros e métricas, e validação. Os principais resultados obtidos pelo desenvolvimento do trabalho são: (i) proposta do esquema FactDate, o qual otimiza consultas que acessam poucas dimensões do DW; (ii) investigação da aplicabilidade de diferentes esquemas a cenários empresariais distintos; (iii) proposta de duas consultas adicionais à carga de trabalho do SSB; (iv) análise da distribuição dos dados gerados pelo SSB, verificando se os dados agregados pelas consultas OLAP estão balanceados entre os nós de um cluster; (v) investigação da influência de três importantes parâmetros do framework Hadoop MapReduce no processamento de consultas OLAP; (vi) avaliação da relação entre o desempenho de consultas OLAP e a quantidade de nós que compõem um cluster; e (vii) proposta do uso de visões materializadas hierárquicas, por meio do framework Spark, para otimizar o desempenho no processamento de consultas OLAP consecutivas que requerem a análise de dados em níveis progressivamente mais ou menos detalhados. Os resultados obtidos representam descobertas importantes que visam possibilitar a proposta futura de um benchmark para DWs armazenados em bancos de dados NoSQL dentro de ambientes paralelos e distribuídos. / Due to the explosive increase in data volume, centralized data warehousing applications become very costly and are facing several problems to deal with data scalability. This is related to the fact that these applications need to store huge volumes of data and to perform analytical queries (i.e., OLAP queries) against these voluminous data efficiently. One solution is to employ scenarios characterized by the use of NoSQL databases managed in parallel and distributed environments. Among the challenges related to these scenarios, there is a need to investigate the performance of data warehousing applications that store the data warehouse (DW) in column-oriented NoSQL databases. In this context, benchmarks are widely used to perform standard and experimental analysis of distinct systems. 
However, most benchmarks for DWs focus on relational database systems and centralized environments. In this master's research, we investigate how to extend the Star Schema Benchmark (SSB), which was proposed for centralized DWs, to the distributed, column-oriented NoSQL database HBase. We introduce proposals and analyses based mainly on experimental performance tests considering each of the four steps of a benchmark, i.e., schema and workload, data generation, parameters and metrics, and validation. The main results of this master's research are as follows: (i) proposal of the FactDate schema, which optimizes queries that access few dimensions of the DW; (ii) investigation of the applicability of different schemas to different business scenarios; (iii) proposal of two additional queries for the SSB workload; (iv) analysis of the distribution of the data generated by the SSB, verifying whether the data aggregated by OLAP queries are balanced across the nodes of a cluster; (v) investigation of the influence of three important Hadoop MapReduce framework parameters on OLAP query processing; (vi) evaluation of the relationship between OLAP query performance and the number of nodes in a cluster; and (vii) use of hierarchical materialized views, through the Spark framework, to optimize the processing of consecutive OLAP queries that require progressively more or less aggregated data. These results represent important findings that enable the future proposal of a benchmark for DWs stored in NoSQL databases and managed in parallel and distributed environments.
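As an illustration of item (vii), here is a minimal PySpark sketch of the hierarchical-materialized-view idea: a finer-grained aggregate is cached and coarser roll-ups are answered from it rather than by rescanning the fact table. The table paths are placeholders, the column names follow the usual SSB conventions, and the dissertation's actual implementation may differ.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ssb-hierarchical-mv").getOrCreate()

# Hypothetical SSB-style tables stored as Parquet; the paths are placeholders.
lineorder = spark.read.parquet("/data/ssb/lineorder")
date_dim = spark.read.parquet("/data/ssb/date")

# Finer-grained materialized view: revenue per year and month, kept in memory.
mv_month = (lineorder
    .join(date_dim, lineorder.lo_orderdate == date_dim.d_datekey)
    .groupBy("d_year", "d_yearmonthnum")
    .agg(F.sum("lo_revenue").alias("revenue"))
    .cache())
mv_month.count()  # forces materialization of the cached view

# A consecutive, coarser roll-up is answered from the cached view,
# not by rescanning the fact table.
mv_year = mv_month.groupBy("d_year").agg(F.sum("revenue").alias("revenue"))
mv_year.show()
```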
166

Proposta de um sistema de apoio à decisão para controle e gerenciamento agrícola em usinas de açúcar e álcool / Proposal of a decision support system for agricultural control and management in sugar and ethanol plants

Tavares, Renato 04 July 2008 (has links)
Aliada à crescente evolução da computação, dois fatores também começaram a receber maior atenção: o conhecimento e a informação. Esta evolução faz com que a informação possa estar disponibilizada a todos, contribuindo de certa forma, para o auxílio na aquisição de conhecimento e visando posterior tomada de decisão, observa-se também a grande importância dos bancos de dados. Este trabalho apresenta a proposta de um Sistema de Apoio à Decisão (SAD), utilizando-se de avançadas metodologias de armazenamento de dados em poderosos bancos de dados (Data Warehouse), para o controle e gerenciamento agrícola em usinas de açúcar e álcool. Foi desenvolvido um ambiente estruturado, extensível, projetado para a análise de dados não voláteis, logicamente e fisicamente transformados, provenientes de diversas aplicações, alinhados com a estrutura da empresa, atualizados e mantidos por um longo período de tempo, referidos em termos utilizados no negócio e sumarizados para análise rápida. Com a implantação destas novas tecnologias, a empresa estará apta a obter informações do nível gerencial e estratégico para ajudar nos seus processos de tomada de decisão, o que antes não era possível com os atuais sistemas de informação existentes na empresa. / With the ongoing evolution of computing, two factors have come to receive greater attention: knowledge and information. This evolution makes information available to everyone, supporting the acquisition of knowledge and subsequent decision-making, and it also underlines the importance of databases. This work proposes a Decision Support System (DSS) that uses data warehousing techniques for agricultural control and management in sugar and ethanol plants. A structured, extensible environment was developed for the analysis of non-volatile data that are logically and physically transformed, originate from diverse applications, are aligned with the company's structure, are updated and kept for a long period of time, are expressed in business terms, and are summarized for fast analysis. With the implementation of these new technologies, the company will be able to obtain managerial- and strategic-level information to support its decision-making processes, which was not possible with the company's existing information systems.
167

Data Warehouses na era do Big Data: processamento eficiente de Junções Estrela no Hadoop / Data warehouses in the Big Data era: efficient Star Join processing in Hadoop

Brito, Jaqueline Joice 12 December 2017 (has links)
The era of Big Data is here: the combination of unprecedented amounts of data collected every day with the promotion of open-source solutions for massively parallel processing has shifted the industry in the direction of data-driven solutions. From recommendation systems that help you find your next significant other to the dawn of self-driving cars, Cloud Computing has enabled companies of all sizes and areas to achieve their full potential with minimal overhead. In particular, the use of these technologies for Data Warehousing applications has greatly decreased costs and provided remarkable scalability, empowering business-oriented applications such as Online Analytical Processing (OLAP). One of the most essential primitives in Data Warehouses is the Star Join, i.e. the join of a central fact table with satellite dimensions. As the volume of the database grows, Star Joins become impractical and may seriously limit applications. In this thesis, we propose specialized solutions to optimize the processing of Star Joins. To achieve this, we used the Hadoop software family on a cluster of 21 nodes. We show that the primary bottleneck in the computation of Star Joins on Hadoop lies in the excessive disk spill and the overhead of network communication. To mitigate these negative effects, we propose two solutions based on a combination of the Spark framework with either Bloom filters or the Broadcast technique, which reduced the computation time by at least 38%. Furthermore, we show that the use of full scans may significantly hinder the performance of queries with low selectivity. Thus, we propose a distributed Bitmap Join Index that can be processed as a secondary index with loose binding and can be used with random access in the Hadoop Distributed File System (HDFS). We also implemented three versions (one in MapReduce and two in Spark) of our processing algorithm that uses the distributed index, which reduced the total computation time by up to 88% for Star Joins with low selectivity from the Star Schema Benchmark (SSB). Because, ideally, the system should be able to perform both random access and full scans, our solution was designed to rely on a two-layer architecture that is framework-agnostic and enables the use of a query optimizer to select which approach should be used as a function of the query. Due to the ubiquity of joins as primitive queries, our solutions are likely to fit a broad range of applications. Our contributions not only leverage the strengths of massively parallel frameworks but also exploit more efficient access methods to provide scalable and robust solutions to Star Joins with a significant drop in total computation time. / A era do Big Data chegou: a combinação entre o volume de dados coletados diariamente com o surgimento de soluções de código aberto para o processamento massivo de dados mudou para sempre a indústria. De sistemas de recomendação que assistem às pessoas a encontrarem seus pares românticos à criação de carros auto-dirigidos, a Computação em Nuvem permitiu que empresas de todos os tamanhos e áreas alcançassem o seu pleno potencial com custos reduzidos. Em particular, o uso dessas tecnologias em aplicações de Data Warehousing reduziu custos e proporcionou alta escalabilidade para aplicações orientadas a negócios, como em processamento on-line analítico (Online Analytical Processing, OLAP). Junções Estrela são das primitivas mais essenciais em Data Warehouses, ou seja, consultas que realizam junções de tabelas de fato com tabelas de dimensões.
Conforme o volume de dados aumenta, Junções Estrela tornam-se custosas e podem limitar o desempenho das aplicações. Nesta tese são propostas soluções especializadas para otimizar o processamento de Junções Estrela. Para isso, utilizamos a família de software Hadoop em um cluster de 21 nós. Nós mostramos que o gargalo primário na computação de Junções Estrela no Hadoop reside no excesso de operações de escrita em disco (disk spill) e na sobrecarga da rede devido à comunicação excessiva entre os nós. Para reduzir estes efeitos negativos, são propostas duas soluções em Spark baseadas nas técnicas Bloom filters ou Broadcast, reduzindo o tempo total de computação em pelo menos 38%. Além disso, mostramos que a realização de uma leitura completa das tabelas (full table scan) pode prejudicar significativamente o desempenho de consultas com baixa seletividade. Assim, nós propomos um Índice Bitmap de Junção distribuído que é implementado como um índice secundário e que pode ser combinado com acesso aleatório no Hadoop Distributed File System (HDFS). Nós implementamos três versões (uma em MapReduce e duas em Spark) do nosso algoritmo de processamento baseado nesse índice distribuído, as quais reduziram o tempo de computação em até 77% para Junções Estrela de baixa seletividade do Star Schema Benchmark (SSB). Como idealmente o sistema deve ser capaz de executar tanto acesso aleatório quanto full scan, nós também propusemos uma arquitetura genérica que permite a inserção de um otimizador de consultas capaz de selecionar quais abordagens devem ser usadas dependendo da consulta. Devido ao fato de consultas de junção serem frequentes, nossas soluções são pertinentes a uma ampla gama de aplicações. As contribuições desta tese não só fortalecem o uso de frameworks de processamento de código aberto, como também exploram métodos mais eficientes de acesso aos dados para promover uma melhora significativa no desempenho de Junções Estrela.
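To give a flavor of the Broadcast technique mentioned in the abstract above, here is a minimal PySpark sketch (not the thesis code): the small dimension tables are broadcast to every executor so that the star join avoids shuffling the large fact table. The paths, filters, and choice of dimensions are illustrative placeholders; column names follow the SSB schema.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("star-join-broadcast").getOrCreate()

# Placeholder paths; SSB-style fact and dimension tables.
lineorder = spark.read.parquet("/data/ssb/lineorder")   # large fact table
customer = spark.read.parquet("/data/ssb/customer")     # small dimension
part = spark.read.parquet("/data/ssb/part")             # small dimension

americas = customer.filter(F.col("c_region") == "AMERICA")
mfgr12 = part.filter(F.col("p_category") == "MFGR#12")

# F.broadcast() ships each (filtered) dimension to every executor, so the star
# join is evaluated map-side without shuffling the fact table over the network.
result = (lineorder
    .join(F.broadcast(americas), lineorder.lo_custkey == americas.c_custkey)
    .join(F.broadcast(mfgr12), lineorder.lo_partkey == mfgr12.p_partkey)
    .groupBy("c_nation")
    .agg(F.sum("lo_revenue").alias("revenue")))

result.show()
```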
168

[en] INFORMATION QUALITY AND ANALYTICAL ENVIRONMENT: A CASE STUDY OF THE SAP BW IMPLEMENTATION IN PETROBRAS / [pt] AMBIENTE ANALÍTICO E QUALIDADE DE INFORMAÇÕES: UM ESTUDO DE CASO DA IMPLANTAÇÃO DO SAP BW NA PETROBRAS

ANA CLAUDIA LIMA PINHEIRO 21 August 2006 (has links)
[pt] Desde meados da década de 90, muitas empresas implementaram Sistemas Integrados de Gestão Empresarial com o objetivo de integrar seus processos de negócio, obter uma gestão melhor de suas operações e da informação que é originada em cada uma das áreas da empresa. Na Petrobras, este projeto foi iniciado em 2000 e em outubro de 2004, após algumas implantações em outras empresas do grupo, foi implantado o SAP R/3 na holding. Em paralelo, foi implantado também um ambiente de informações gerenciais para suportar o processo de gestão da empresa, com o objetivo de disponibilizar informações corporativas e com qualidade para os usuários. O presente estudo tem por objetivo avaliar a situação da qualidade das informações disponibilizadas neste ambiente, e foi escolhida a área de Materiais e Serviços para esta análise. Foram identificados, de acordo com o referencial teórico, critérios que orientaram esta avaliação e a partir destes critérios foi elaborado um questionário, que foi aplicado para cerca de 130 usuários. As respostas foram consolidadas e avaliadas estatisticamente onde foram identificados os percentuais de satisfação dos usuários com relação a cada critério avaliado. Em seguida, à luz do referencial teórico, os critérios foram agrupados e foi feita uma avaliação geral de cada um dos grupamentos, identificando possíveis razões para os números encontrados, apresentando exemplos obtidos na pesquisa documental, e sugerindo ações de ajuste ou melhoria. Finalizando foram listados em ordem decrescente, os maiores problemas encontrados, bem como suas ações de correção, que servirão na prática, para orientação dos ajustes que precisam ser feitos no ambiente analítico da Petrobras. / [en] Since the mid-1990s, many companies have implemented integrated enterprise management systems to integrate their business processes and gain better control over their operations and over the information generated in each area of the company. In Petrobras, this project was initiated in the year 2000, and in October 2004, following implementations in other companies of the group, the system went live in the holding company. In parallel, a management information environment was also deployed to support the company's management processes, with the purpose of providing high-quality corporate information to its users. The objective of the present study is to evaluate the quality of the information available in this environment; the area of Materials and Services was chosen for the analysis. In line with the theoretical framework, criteria were established to guide the evaluation, and from these criteria a questionnaire was prepared and submitted to about 130 users. The answers were consolidated and evaluated statistically, producing the percentage levels of user satisfaction for each criterion. Then, in the light of the theoretical framework, the criteria were grouped and a general evaluation of each group was carried out, identifying possible reasons for the figures found, presenting examples obtained in the documentary research, and suggesting adjustment or improvement actions. Finally, the major problems found were listed in descending order, together with their corrective actions, which will serve in practice to guide the adjustments that need to be made in Petrobras's analytical environment.
169

Understanding cryptic schemata in large extract-transform-load systems

Albrecht, Alexander, Naumann, Felix January 2012 (has links)
Extract-Transform-Load (ETL) tools are used for the creation, maintenance, and evolution of data warehouses, data marts, and operational data stores. ETL workflows populate those systems with data from various data sources by specifying and executing a DAG of transformations. Over time, hundreds of individual workflows evolve as new sources and new requirements are integrated into the system. The maintenance and evolution of large-scale ETL systems requires much time and manual effort. A key problem is to understand the meaning of unfamiliar attribute labels in source and target databases and ETL transformations. Hard-to-understand attribute labels lead to frustration and to time wasted developing and understanding ETL workflows. We present a schema decryption technique to support ETL developers in understanding cryptic schemata of sources, targets, and ETL transformations. For a given ETL system, our recommender-like approach leverages the large number of mapped attribute labels in existing ETL workflows to produce good and meaningful decryptions. In this way we are able to decrypt attribute labels consisting of a number of unfamiliar few-letter abbreviations, such as UNP_PEN_INT, which we can expand to UNPAID_PENALTY_INTEREST. We evaluate our schema decryption approach on three real-world repositories of ETL workflows and show that it is able to suggest high-quality decryptions for cryptic attribute labels in a given schema. / Extract-Transform-Load (ETL) Tools werden häufig beim Erstellen, der Wartung und der Weiterentwicklung von Data Warehouses, Data Marts und operationalen Datenbanken verwendet. ETL Workflows befüllen diese Systeme mit Daten aus vielen unterschiedlichen Quellsystemen. Ein ETL Workflow besteht aus mehreren Transformationsschritten, die einen DAG-strukturierten Graphen bilden. Mit der Zeit entstehen hunderte individueller ETL Workflows, da neue Datenquellen integriert oder neue Anforderungen umgesetzt werden müssen. Die Wartung und Weiterentwicklung von großen ETL Systemen benötigt viel Zeit und manuelle Arbeit. Ein zentrales Problem ist dabei das Verständnis unbekannter Attributnamen in Quell- und Zieldatenbanken und ETL Transformationen. Schwer verständliche Attributnamen führen zu Frustration und hohen Zeitaufwänden bei der Entwicklung und dem Verständnis von ETL Workflows. Wir präsentieren eine Schema-Decryption-Technik, die ETL Entwicklern das Verständnis kryptischer Schemata in Quell- und Zieldatenbanken und ETL Transformationen erleichtert. Unser Ansatz berücksichtigt für ein gegebenes ETL System die Vielzahl verknüpfter Attributnamen in den existierenden ETL Workflows. So werden gute und aussagekräftige "Decryptions" gefunden und wir sind in der Lage, Attributnamen, die aus unbekannten Abkürzungen bestehen, zu "decrypten". So wird z.B. für den Attributnamen UNP_PEN_INT als Decryption UNPAID_PENALTY_INTEREST vorgeschlagen. Unser Schema-Decryption-Ansatz wurde für drei ETL-Repositories evaluiert und es zeigte sich, dass unser Ansatz qualitativ hochwertige Decryptions für kryptische Attributnamen vorschlägt.
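As a rough illustration of the recommender-like idea (not the authors' algorithm), the following sketch learns token expansions from already-mapped label pairs and uses the most frequent expansion to decrypt a new label. The training pairs, the token-by-token alignment, and the frequency-based scoring are simplifying assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical training data: attribute-label pairs observed in existing ETL
# mappings (cryptic source label -> documented target label).
mapped_pairs = [
    ("UNP_PEN_INT", "UNPAID_PENALTY_INTEREST"),
    ("CUST_ADDR", "CUSTOMER_ADDRESS"),
    ("PEN_AMT", "PENALTY_AMOUNT"),
    ("INT_RATE", "INTEREST_RATE"),
    ("UNP_BAL", "UNPAID_BALANCE"),
]

# Learn, per abbreviation token, how often it co-occurs with each full word.
expansions = defaultdict(Counter)
for cryptic, readable in mapped_pairs:
    for abbr, word in zip(cryptic.split("_"), readable.split("_")):
        expansions[abbr][word] += 1

def decrypt(label: str) -> str:
    """Expand each token to its most frequent known expansion; keep unknowns as-is."""
    return "_".join(
        expansions[tok].most_common(1)[0][0] if tok in expansions else tok
        for tok in label.split("_")
    )

print(decrypt("UNP_INT_AMT"))  # -> UNPAID_INTEREST_AMOUNT
```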
170

Qualitätsgetriebene Datenproduktionssteuerung in Echtzeit-Data-Warehouse-Systemen

Thiele, Maik 10 August 2010 (has links) (PDF)
Wurden früher Data-Warehouse-Systeme meist nur zur Datenanalyse für die Entscheidungsunterstützung des Managements eingesetzt, haben sie sich nunmehr zur zentralen Plattform für die integrierte Informationsversorgung eines Unternehmens entwickelt. Dies schließt vor allem auch die Einbindung des Data-Warehouses in operative Prozesse mit ein, für die zum einen sehr aktuelle Daten benötigt werden und zum anderen eine schnelle Anfrageverarbeitung gefordert wird. Daneben existieren jedoch weiterhin klassische Data-Warehouse-Anwendungen, welche hochqualitative und verfeinerte Daten benötigen. Die Anwender eines Data-Warehouse-Systems haben somit verschiedene und zum Teil konfligierende Anforderungen bezüglich der Datenaktualität, der Anfragelatenz und der Datenstabilität. In der vorliegenden Dissertation wurden Methoden und Techniken entwickelt, die diesen Konflikt adressieren und lösen. Die umfassende Zielstellung bestand darin, eine Echtzeit-Data-Warehouse-Architektur zu entwickeln, welche die Informationsversorgung in ihrer ganzen Breite -- von historischen bis hin zu aktuellen Daten -- abdecken kann. Zunächst wurde ein Verfahren zur Ablaufplanung kontinuierlicher Aktualisierungsströme erarbeitet. Dieses berücksichtigt die widerstreitenden Anforderungen der Nutzer des Data-Warehouse-Systems und erzeugt bewiesenermaßen optimale Ablaufpläne. Im nächsten Schritt wurde die Ablaufplanung im Kontext mehrstufiger Datenproduktionsprozesse untersucht. Gegenstand der Analyse war insbesondere, unter welchen Bedingungen eine Ablaufplanung in Datenproduktionsprozessen gewinnbringend anwendbar ist. Zur Unterstützung der Analyse komplexer Data-Warehouse-Prozesse wurde eine Visualisierung der Entwicklung der Datenzustände, über die Produktionsprozesse hinweg, vorgeschlagen. Mit dieser steht ein Werkzeug zur Verfügung, mit dem explorativ Datenproduktionsprozesse auf ihr Optimierungspotenzial hin untersucht werden können. Das den operativen Datenänderungen unterworfene Echtzeit-Data-Warehouse-System führt in der Berichtsproduktion zu Inkonsistenzen. Daher wurde eine entkoppelte und für die Anwendung der Berichtsproduktion optimierte Datenschicht erarbeitet. Es wurde weiterhin ein Aggregationskonzept zur Beschleunigung der Anfrageverarbeitung entwickelt. Die Vollständigkeit der Berichtsanfragen wird durch spezielle Anfragetechniken garantiert. Es wurden zwei Data-Warehouse-Fallstudien großer Unternehmen vorgestellt sowie deren spezifische Herausforderungen analysiert. Die in dieser Dissertation entwickelten Konzepte wurden auf ihren Nutzen und ihre Anwendbarkeit in den Praxisszenarien hin überprüft. / While data warehouse systems were formerly used mostly for data analysis in support of management decisions, they have since evolved into the central platform for an enterprise's integrated information supply. Above all, this includes embedding the data warehouse in operational processes, which require both very current data and fast query processing. At the same time, classical data warehouse applications persist that require high-quality, refined data. The users of a data warehouse system therefore have different and partly conflicting requirements regarding data freshness, query latency, and data stability. This dissertation develops methods and techniques that address and resolve this conflict. The overarching goal was to design a real-time data warehouse architecture that covers the full breadth of information supply, from historical to current data. First, a scheduling approach for continuous update streams was devised; it takes the conflicting requirements of the data warehouse users into account and provably produces optimal schedules. Next, scheduling was examined in the context of multi-stage data production processes, analyzing in particular under which conditions scheduling can be applied profitably within such processes. To support the analysis of complex data warehouse processes, a visualization of the evolution of data states across the production processes was proposed, providing a tool for exploring the optimization potential of data production processes. Since a real-time data warehouse system is subject to operational data changes, inconsistencies arise in report production; therefore, a decoupled data layer optimized for reporting was developed. Furthermore, an aggregation concept was developed to speed up query processing, and the completeness of report queries is guaranteed by special query techniques. Finally, two data warehouse case studies of large companies were presented and their specific challenges analyzed; the concepts developed in this dissertation were examined for their benefit and applicability in these practical scenarios.
