Global ETD Search

1	The representation of time in data warehouses Todman, Christopher Derek January 1999 This thesis investigates the problems of specifying and implementing temporal requirements in data warehouses. It focuses on two areas: firstly, methods for identifying and capturing business information needs and the associated temporal requirements at the conceptual level; and secondly, methods for classifying and implementing those requirements at the logical level using the relational model. At the conceptual level, eight candidate methodologies were investigated to examine their suitability for creating data models appropriate for a data warehouse. The methods were evaluated on their representation of time, their ability to reflect the dimensional nature of data warehouse models, and their simplicity of use. The research found that none of the methods under review fully satisfied these criteria. At the logical level, the research concluded that the methods widely used in current practice produce data structures that either cannot answer some very basic questions involving history or return inaccurate results. Specific proposals are made in three areas. Firstly, a new conceptual model is described that is designed to capture the information requirements for dimensional models and fully supports time. Secondly, a new approach at the logical level is proposed. It provides the data structures needed to implement the requirements captured in the conceptual model, so that historical questions can be answered simply and accurately. Thirdly, a set of rules is developed to help minimise the inaccuracy caused by time. A guide has been produced that gives practitioners the tools and instructions needed to implement data warehouses using the methods developed in the thesis. 621.3822
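As a point of reference only, the sketch below shows one widely used technique for keeping history in a dimension: a Kimball-style type 2 slowly changing dimension with effective-date ranges, used to answer an "as of" question. It is not the conceptual or logical model proposed in the thesis, and the customer, region, and sales data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerVersion:
    """One row of a type 2 dimension: attributes valid over [valid_from, valid_to)."""
    customer_id: int
    region: str
    valid_from: date
    valid_to: Optional[date]  # None means this version is still current


# Hypothetical history: customer 42 moved from "North" to "South" on 2020-07-01.
CUSTOMER_DIM = [
    CustomerVersion(42, "North", date(2019, 1, 1), date(2020, 7, 1)),
    CustomerVersion(42, "South", date(2020, 7, 1), None),
]

# Hypothetical sales facts: (customer_id, sale_date, amount).
SALES = [(42, date(2020, 3, 15), 100.0), (42, date(2020, 9, 1), 250.0)]


def region_as_of(customer_id: int, as_of: date) -> Optional[str]:
    """Return the region that was valid for the customer on the given date."""
    for row in CUSTOMER_DIM:
        if row.customer_id != customer_id:
            continue
        if row.valid_from <= as_of and (row.valid_to is None or as_of < row.valid_to):
            return row.region
    return None  # no version covers this date


def sales_by_region_at_time_of_sale() -> dict:
    """Attribute each sale to the region the customer belonged to when the sale happened."""
    totals = {}
    for customer_id, sale_date, amount in SALES:
        region = region_as_of(customer_id, sale_date)
        totals[region] = totals.get(region, 0.0) + amount
    return totals


print(sales_by_region_at_time_of_sale())  # {'North': 100.0, 'South': 250.0}
```

Overwriting the region in place (a type 1 change) would instead attribute both sales to "South", one example of how a design that discards history can return inaccurate answers to historical questions.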
2	Automating Multiple Schema Generation using Dimensional Design Patterns Deshpande, Monali A. 23 July 2009 No description available. Computer Science data warehouse schema star schema dimensional design patterns
3	Avaliação do Star Schema Benchmark aplicado a bancos de dados NoSQL distribuídos e orientados a colunas / Evaluation of the Star Schema Benchmark applied to NoSQL column-oriented distributed databases systems Scabora, Lucas de Carvalho 06 May 2016 Due to the explosive growth in data volume, centralized data warehousing applications have become very costly and face several problems in dealing with data scalability. These applications need to store huge volumes of data and to perform analytical queries (i.e., OLAP queries) over those volumes efficiently. One solution is to use NoSQL databases managed in parallel and distributed environments. Among the challenges in these scenarios is the need to investigate the performance of data warehousing applications that store the data warehouse (DW) in column-oriented NoSQL databases. Benchmarks are widely used to perform standardized, experimental analyses of distinct systems.
However, most DW benchmarks focus on relational database systems and centralized environments. In this master's research, we investigate how to extend the Star Schema Benchmark (SSB), which was proposed for centralized DWs, to the distributed, column-oriented NoSQL database HBase. We introduce proposals and analyses based mainly on experimental performance tests covering each of the four steps of a benchmark, i.e., schema and workload, data generation, parameters and metrics, and validation. The main results of this research are as follows: (i) proposal of the FactDate schema, which optimizes queries that access few dimensions of the DW; (ii) investigation of the applicability of different schemas to different business scenarios; (iii) proposal of two additional queries for the SSB workload; (iv) analysis of the distribution of the data generated by the SSB, verifying whether the data aggregated by OLAP queries are balanced across the nodes of a cluster; (v) investigation of the influence of three important parameters of the Hadoop MapReduce framework on OLAP query processing; (vi) evaluation of the relationship between OLAP query performance and the number of nodes in a cluster; and (vii) use of hierarchical materialized views, via the Spark framework, to optimize the processing of consecutive OLAP queries that require progressively more or less aggregated data. These results are important findings intended to enable the future proposal of a benchmark for DWs stored in NoSQL databases and managed in parallel and distributed environments. NoSQL database Data warehouse Hadoop MapReduce HBase Star Schema Benchmark
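Result (vii) derives coarser aggregates from finer ones when consecutive OLAP queries drill up. The thesis does this with Spark; the sketch below only illustrates the general idea in plain Python, assuming a hypothetical fact list keyed by (year, month, day) with revenue values. Each level of the hierarchy is materialized from the level directly below it rather than from the base facts.

```python
from collections import defaultdict

# Hypothetical line-order facts: ((year, month, day), revenue).
FACTS = [
    ((1997, 1, 1), 10.0),
    ((1997, 1, 1), 5.0),
    ((1997, 1, 2), 7.0),
    ((1997, 2, 1), 3.0),
    ((1998, 1, 1), 4.0),
]


def roll_up(view: dict, key_len: int) -> dict:
    """Aggregate a materialized view to a coarser level by truncating its key."""
    coarser = defaultdict(float)
    for key, revenue in view.items():
        coarser[key[:key_len]] += revenue
    return dict(coarser)


# Finest materialized view: revenue per day, built once from the base facts.
by_day = defaultdict(float)
for day_key, revenue in FACTS:
    by_day[day_key] += revenue
by_day = dict(by_day)

# Each coarser view is derived from the previous view, not from FACTS,
# so consecutive drill-up queries scan progressively smaller inputs.
by_month = roll_up(by_day, 2)
by_year = roll_up(by_month, 1)

print(by_month)  # {(1997, 1): 22.0, (1997, 2): 3.0, (1998, 1): 4.0}
print(by_year)   # {(1997,): 25.0, (1998,): 4.0}
```

Reusing a lower view this way is only valid for distributive measures such as SUM or COUNT; an average would need to carry its sum and count separately through each level.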
5	Supporting Data Warehouse Design with Data Mining Approach Tsai, Tzu-Chao 06 August 2001 The traditional relational database model lacks the capability to process large volumes of data within a limited time. Data warehouses and online analytical processing (OLAP) emerged to address this requirement. Data warehouses improve the productivity of corporate decision makers through the consolidation, conversion, transformation, and integration of operational data, and they support OLAP. Data warehouse design is a complex, knowledge-intensive process. It must consider not only the structure of the underlying operational databases (source-driven design) but also the information requirements of decision makers (user-driven design). Past research focused predominantly on supporting the source-driven design process and paid less attention to the user-driven process. The goal of this research is therefore to propose a user-driven data warehouse design support system based on a knowledge discovery approach. Specifically, a Data Warehouse Design Support System was proposed, using generalization hierarchies and generalized star schemas as data warehouse design knowledge. Techniques for learning this design knowledge and reasoning over it were developed. An empirical evaluation study was conducted to validate the effectiveness of the proposed techniques in supporting the data warehouse design process. The results showed that the techniques were useful in supporting data warehouse design, especially in reducing missing design elements and enhancing potentially useful ones. Data Mining Data Warehouse Knowledge Discovery Data Warehouse Design Star Schema
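The abstract names generalization hierarchies as one form of design knowledge. The sketch below illustrates only the concept, not the thesis's learning technique: a hypothetical product hierarchy (item, then subcategory, then category) is used to generalize a dimension attribute so that facts can be summarized at a higher level, the kind of roll-up a generalized star schema has to support.

```python
from collections import Counter

# Hypothetical generalization hierarchy: each value maps to its parent concept.
PARENT = {
    "espresso": "coffee",
    "latte": "coffee",
    "green tea": "tea",
    "coffee": "hot drinks",
    "tea": "hot drinks",
}

# Hypothetical fact rows: (product, quantity sold).
SALES = [("espresso", 3), ("latte", 2), ("green tea", 4)]


def generalize(value: str, levels: int) -> str:
    """Climb the hierarchy `levels` steps, stopping early if the top is reached."""
    for _ in range(levels):
        if value not in PARENT:
            break
        value = PARENT[value]
    return value


def quantity_at_level(levels: int) -> Counter:
    """Sum quantities after generalizing each product by the given number of levels."""
    totals = Counter()
    for product, qty in SALES:
        totals[generalize(product, levels)] += qty
    return totals


print(quantity_at_level(1))  # Counter({'coffee': 5, 'tea': 4})
print(quantity_at_level(2))  # Counter({'hot drinks': 9})
```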
6	Rozšíření projektu vývoje aplikačního SW systému v bankovní instituci o BI nadstavbu / Implementation of BI module into a SW system in banking institution Růžičková, Lucie January 2017 The aim of this thesis is to implement a Business Intelligence (BI) solution in the investment banking sector. The first part focuses on the general use of BI in the banking sector and on future developments in this area. It then defines the objectives of, and reasons for, implementing BI in investment banking, explains the basic investment banking terms used in the thesis, and introduces the project in which the BI solution is implemented. The second part focuses on determining the requirements and analyzing them, from which the data warehouse design is derived. The next part deals with implementing the designed data warehouse and the ETL packages that load the data. The reporting tool is then described through a sample implementation of the reports. In the last part, the implemented BI solution is evaluated by its users.
7	Extending dimensional modeling through the abstraction of data relationships and development of the semantic data warehouse Hart, Robert 04 December 2017 The Kimball methodology, often referred to as dimensional modelling, is well established in data warehousing and business intelligence as a highly successful means of turning data into information. Yet the Kimball approach has weaknesses that make it difficult to rapidly extend or interrelate dimensional models in complex business areas such as health care. This thesis looks at the development of a methodology that provides for the rapid extension and interrelation of Kimball dimensional models. This is achieved through techniques similar to those employed in the semantic web, which allow rapid analysis of, and insight into, highly variable data that was previously difficult to obtain. Kimball Star Schema Health Information Business Intelligence Data Warehouse RDF Triplets Dimensional Model Health Data Research
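The thesis relates Kimball models using techniques similar to those of the semantic web and lists RDF triplets among its keywords. As an illustration under stated assumptions only, the sketch below stores a few hypothetical facts from two separately built dimensional models as subject-predicate-object triples in plain Python (no RDF library) and answers a question that crosses both models by following shared identifiers. The predicates and identifiers are invented for the example.

```python
# Hypothetical triples drawn from two separately built dimensional models
# (a diagnosis model and an admissions model) that share patient identifiers.
TRIPLES = {
    ("patient:1", "hasDiagnosis", "dx:E11"),
    ("patient:2", "hasDiagnosis", "dx:J45"),
    ("dx:E11", "label", "Type 2 diabetes"),
    ("dx:J45", "label", "Asthma"),
    ("patient:1", "admittedTo", "hospital:A"),
    ("patient:2", "admittedTo", "hospital:B"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a stored triple."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is a stored triple."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def hospitals_for_diagnosis(label):
    """Follow label -> diagnosis -> patient -> hospital across both models."""
    hospitals = set()
    for dx in subjects("label", label):
        for patient in subjects("hasDiagnosis", dx):
            hospitals |= objects(patient, "admittedTo")
    return hospitals


print(hospitals_for_diagnosis("Type 2 diabetes"))  # {'hospital:A'}
```

In this representation a new relationship is just another triple rather than a change to a fact table's schema, which is roughly the kind of flexibility the abstract attributes to semantic-web techniques.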
8	Prototyp för dynamiskt beslutsstöd / Prototype for dynamic decision support Lundstedt, Mattias, Norell, Axel January 2014 The company Nethouse was commissioned to specify the requirements for, develop, and implement a business system for Sveriges Skorstensfejaremästares Riksförbund (SSR). SSR's member companies carry out chimney sweeping on behalf of Swedish municipalities and depend on collected data related to their operations. In the newly developed system, named Ritz, this information is collected in a central database and made available to a number of stakeholders through new technology and more modern solutions. The system is entirely web-based and runs as a cloud service, accessible either through a web page or as a mobile application. Data access is based, at company level, on "stamped" data in the database, and role-based access control is used to restrict company users to their own company's data. This degree project aimed to develop a prototype decision-support solution for dynamic access to the datasets stored in Ritz. Nethouse requested a prototype BI solution demonstrating to Ritz stakeholders the possibilities and benefits of implementing one. Because integration and maintenance are important factors for Nethouse, one requirement was that the prototype be developed with Microsoft software, like the rest of Ritz. The prototype was completed by building a central data warehouse following Ralph Kimball's methodologies and by implementing an OLAP cube in Microsoft SSAS. Data was transferred from the data sources to the solution's data warehouse through an ETL process developed in Microsoft SSIS. The resulting cube was designed mainly to answer the kinds of questions that county administrative boards put to chimney sweep companies for inspection purposes, and it supports queries against the two core business processes, chimney sweeping and fire safety inspection.
These queries can be filtered on several dimensions, such as time, performer, status, and inspection outcome. The prototype also restricts users to the information they are entitled to see by linking users and objects to geographical divisions called districts. This dynamic security solution provides a good basis for handling future changes in user permissions. The chosen solution preserves the dynamic nature of the system, since the decision-support service can be accessed from several sources that support connecting to Microsoft's multidimensional decision-support solutions, including Excel and SQL Server Reporting Services. Decision support BI Business intelligence star schema snowflake schema SSIS SSRS SSAS Data warehouse Data mart ETL OLAP Computer Engineering
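The prototype restricts each user to data in the districts that user is linked to; in the thesis this is implemented within Microsoft SSAS. The sketch below only models the idea in plain Python, with hypothetical user, district, and inspection records, to show why a user-to-district mapping makes a permission change a data update rather than a code change. It is not SSAS configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inspection:
    object_id: str
    district: str
    outcome: str


# Hypothetical bridge table: which districts each user may see.
USER_DISTRICTS = {
    "anna": {"North", "Central"},
    "bo": {"South"},
}

# Hypothetical inspection facts, each tagged with the district of its object.
INSPECTIONS = [
    Inspection("chimney-1", "North", "passed"),
    Inspection("chimney-2", "South", "failed"),
    Inspection("chimney-3", "Central", "passed"),
]


def visible_inspections(user: str) -> list:
    """Return only rows whose district is linked to the user; unknown users see nothing."""
    allowed = USER_DISTRICTS.get(user, set())
    return [row for row in INSPECTIONS if row.district in allowed]


print([r.object_id for r in visible_inspections("anna")])  # ['chimney-1', 'chimney-3']

# Granting access is a data change, not a code change:
USER_DISTRICTS["bo"].add("North")
print([r.object_id for r in visible_inspections("bo")])  # ['chimney-1', 'chimney-2']
```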
9	Using Work Domain Analysis to Evaluate the Design of a Data Warehouse System Iveroth, Axel January 2019 Being able to perform good data analysis is a fundamental part of running any business or organization. One way of enabling data analysis is a data warehouse system: a type of database that gathers and transforms data from multiple sources and structures it with the goal of simplifying analysis. It is commonly used to support decision-making. Although a data warehouse enables data analysis, it is also relevant to consider how well the system supports that analysis. This thesis is a qualitative study that investigates how work domain analysis (WDA) can be used to evaluate the design of a data warehouse system. To do so, a case study was performed at the IT company Norconsult Astando. A data warehouse system was designed for an issue management system and evaluated using the abstraction hierarchy (AH) model. The research showed that analysis was enabled by adopting Kimball's bottom-up approach and a star schema design with an accumulating snapshot fact table. The evaluation showed that most of the design choices made for the data warehouse were captured in the AH. It was concluded that, with sufficient data collection methods, WDA can be used to a large extent when evaluating a data warehouse system. data warehouse work domain analysis cognitive work analysis abstraction hierarchy star schema accumulating snapshot fact table evaluation issue management system waterfall model Computer and Information Sciences
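An accumulating snapshot fact table keeps one row per process instance (here, per issue) and updates that row's milestone dates as the instance progresses. The sketch below assumes hypothetical milestones (reported, assigned, resolved) for an issue management system; the abstract does not specify the thesis's actual columns.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class IssueSnapshot:
    """One accumulating-snapshot row: milestone dates are filled in as the issue progresses."""
    issue_id: int
    reported: date
    assigned: Optional[date] = None
    resolved: Optional[date] = None

    def days_to_resolve(self) -> Optional[int]:
        """Lag measure between two milestones; None while the issue is still open."""
        if self.resolved is None:
            return None
        return (self.resolved - self.reported).days


FACT_TABLE: Dict[int, IssueSnapshot] = {}


def record_event(issue_id: int, milestone: str, when: date) -> None:
    """Insert the row on the first event, then update the same row for later milestones."""
    if milestone == "reported":
        FACT_TABLE[issue_id] = IssueSnapshot(issue_id, when)
    elif milestone in ("assigned", "resolved"):
        setattr(FACT_TABLE[issue_id], milestone, when)
    else:
        raise ValueError(f"unknown milestone: {milestone}")


record_event(7, "reported", date(2019, 3, 1))
record_event(7, "assigned", date(2019, 3, 2))
print(FACT_TABLE[7].days_to_resolve())  # None
record_event(7, "resolved", date(2019, 3, 11))
print(FACT_TABLE[7].days_to_resolve())  # 10
```

Unlike a transaction fact table, which would append one row per event, this design keeps a single row per issue, so lag measures such as time to resolution can be read directly from that row.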
10	Multi-Model Snowflake Schema Creation Gruenberg, Rebecca 25 April 2022 No description available. Computer Science data lakes multi-model database snowflake schema star schema multi-model snowflake schema meta-model
