31

Návrh využití nástrojů Business Intelligence pro potřeby malé firmy / Concept of Business Intelligence Using for SME's

Stryka, Lukáš January 2008 (has links)
This Diploma Thesis deals with the analysis of the current processes in a small software company. Based on an evaluation of the identified weaknesses, new extensions of the information system are designed. The first extension is a new module for on-line sales and cashless on-line payments. The second is the integration of Business Intelligence tools to help streamline the company's marketing strategies.
32

Materialized Views in the Presence of Reporting Functions

Lehner, Wolfgang, Habich, Dirk, Just, Michael 15 June 2022 (has links)
Materialized views are a well-known optimization strategy with the potential for massive improvements in query processing time, especially for aggregation queries over large tables. To realize this potential, the query optimizer has to know how and when to exploit materialized views. Reporting functions are a novel technique for formulating sequence-oriented queries in SQL. They provide a column-wise ordering, partitioning, and windowing mechanism for aggregation functions and therefore extend the well-known approach of grouping and applying simple aggregation functions. So far, however, work on materialized views has not considered reporting functions, even though they are frequently used in data warehouse environments. In this paper, we introduce materialized reporting function views and show how to rewrite both reporting function queries and plain aggregation queries against this new kind of materialized view. We demonstrate the efficiency of our approach with a large number of experiments.
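To make the notion of a reporting function concrete, the sketch below (not taken from the paper) runs a standard SQL window aggregate through Python's sqlite3 module; the table, columns, and data are invented for illustration, and an SQLite build with window-function support (3.25 or later) is assumed.

```python
# Minimal sketch of a reporting (window) function: a per-region running total.
# Assumes an SQLite build >= 3.25 (window-function support); data is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, month INTEGER, revenue REAL);
    INSERT INTO sales VALUES
        ('north', 1, 100), ('north', 2, 150), ('north', 3, 120),
        ('south', 1,  80), ('south', 2,  90), ('south', 3, 200);
""")

# The OVER clause supplies the column-wise partitioning and ordering described
# in the abstract, turning SUM() into a cumulative (reporting) aggregate.
query = """
    SELECT region, month, revenue,
           SUM(revenue) OVER (PARTITION BY region ORDER BY month) AS running_total
    FROM sales
    ORDER BY region, month;
"""
for row in conn.execute(query):
    print(row)
```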
33

Secret sharing approaches for secure data warehousing and on-line analysis in the cloud / Approches de partage de clés secrètes pour la sécurisation des entrepôts de données et de l’analyse en ligne dans le nuage

Attasena, Varunya 22 September 2015 (has links)
Les systèmes d’information décisionnels dans le cloud computing sont des solutions de plus en plus répandues. En effet, ces dernières offrent des capacités d’aide à la décision via l’élasticité des ressources pay-per-use du cloud. Toutefois, les questions de sécurité des données demeurent une des principales préoccupations, notamment lorsqu’il s’agit de traiter des données sensibles de l’entreprise. Beaucoup de questions de sécurité sont soulevées en termes de stockage, de protection, de disponibilité, d’intégrité, de sauvegarde et de récupération des données, ainsi que de transfert des données dans un cloud public. Les risques de sécurité peuvent provenir non seulement des fournisseurs de services de cloud computing mais aussi d’intrus malveillants. Les entrepôts de données dans les nuages devraient contenir des données sécurisées afin de permettre à la fois un traitement d’analyse en ligne hautement protégé et efficacement rafraîchi, et ce à plus faibles coûts de stockage et d’accès avec le modèle de paiement à la demande. Dans cette thèse, nous proposons deux nouvelles approches pour la sécurisation des entrepôts de données dans les nuages, basées respectivement sur le partage vérifiable de clé secrète (bpVSS) et le partage vérifiable et flexible de clé secrète (fVSS). Le partage de clé secrète chiffre et distribue les données auprès de plusieurs fournisseurs de cloud, ce qui permet de garantir la confidentialité et la disponibilité des données. bpVSS et fVSS comblent cinq lacunes des approches existantes de partage de clés secrètes. Tout d’abord, ils permettent le traitement de l’analyse en ligne. Deuxièmement, ils garantissent l’intégrité des données à l’aide de signatures internes et externes. Troisièmement, ils aident les utilisateurs à minimiser le coût de l’entreposage dans le cloud en limitant le volume global de données cryptées ; de plus, fVSS répartit les volumes de données cryptées en fonction des tarifs des fournisseurs. Quatrièmement, fVSS améliore la sécurité du partage de clé secrète en imposant une nouvelle contrainte : aucun groupe de fournisseurs de services ne peut détenir suffisamment de données cryptées pour reconstruire ou casser le secret. Cinquièmement, fVSS permet l’actualisation de l’entrepôt de données même si certains fournisseurs de services sont défaillants. Pour évaluer l’efficacité de bpVSS et fVSS, nous étudions théoriquement les facteurs qui influent sur nos approches en matière de sécurité, de complexité et de coût financier dans le modèle de paiement à la demande. Nous validons également expérimentalement la pertinence de nos approches avec le Star Schema Benchmark, afin de démontrer leur efficacité par rapport aux méthodes existantes.
/ Cloud business intelligence is an increasingly popular solution for delivering decision-support capabilities via elastic, pay-per-use resources. However, data security is one of the top concerns when dealing with sensitive data. Many security issues are raised by data storage in a public cloud, including data privacy, data availability, data integrity, data backup and recovery, and data transfer safety. Moreover, security risks may come from both cloud service providers and intruders, while cloud data warehouses should be both highly protected and effectively refreshed and analyzed through on-line analysis processing. Hence, users seek secure data warehouses at the lowest possible storage and access costs within the pay-as-you-go paradigm. In this thesis, we propose two novel approaches for securing cloud data warehouses: base-p verifiable secret sharing (bpVSS) and flexible verifiable secret sharing (fVSS). Secret sharing encrypts and distributes data over several cloud service providers, thus enforcing data privacy and availability. bpVSS and fVSS address five shortcomings of existing secret-sharing-based approaches. First, they allow on-line analysis processing. Second, they enforce data integrity with the help of both inner and outer signatures. Third, they help users minimize the cost of cloud warehousing by limiting the global share volume; moreover, fVSS balances the load among service providers with respect to their pricing policies. Fourth, fVSS improves secret-sharing security by imposing a new constraint: no group of cloud service providers can hold enough shares to reconstruct or break the secret. Fifth, fVSS allows refreshing the data warehouse even when some service providers fail. To evaluate the efficiency of bpVSS and fVSS, we theoretically study the factors that impact our approaches with respect to security, complexity, and monetary cost in the pay-as-you-go paradigm. We also validate the relevance of our approaches experimentally with the Star Schema Benchmark and demonstrate their superiority over related existing methods.
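To illustrate the basic mechanism that such approaches build on, here is a minimal sketch of classical Shamir secret sharing over a prime field; it is not the bpVSS or fVSS scheme from the thesis, and the field size, share counts, and secret are illustrative only.

```python
# Minimal sketch of classical Shamir secret sharing (NOT bpVSS/fVSS):
# a secret is split into n shares, any `threshold` of which reconstruct it.
import random

PRIME = 2_147_483_647  # a Mersenne prime, large enough for small demo secrets

def make_shares(secret, n_shares, threshold):
    """Evaluate a random degree-(threshold-1) polynomial at x = 1..n_shares."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    def poly(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, poly(x)) for x in range(1, n_shares + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

# Example: split a numeric warehouse value among 4 providers, any 3 rebuild it.
shares = make_shares(secret=42_000, n_shares=4, threshold=3)
assert reconstruct(shares[:3]) == 42_000
```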
34

Optimisation des performances dans les entrepôts distribués avec MapReduce : traitement des problèmes de partitionnement et de distribution des données / Optimizing data management for large-scale distributed data warehouses using MapReduce

Arres, Billel 08 February 2016 (has links)
Dans ce travail de thèse, nous abordons les problèmes liés au partitionnement et à la distribution des grands volumes d’entrepôts de données distribués avec MapReduce. Dans un premier temps, nous abordons le problème de la distribution des données. Dans ce cas, nous proposons une stratégie d’optimisation du placement des données, basée sur le principe de la colocalisation. L’objectif est d’optimiser les traitements lors de l’exécution des requêtes d’analyse à travers la définition d’un schéma de distribution intentionnelle des données, permettant de réduire la quantité de données transférées entre les noeuds lors des traitements, plus précisément lors de la phase de tri (shuffle). Nous proposons dans un second temps une nouvelle démarche pour améliorer les performances du framework Hadoop, qui est l’implémentation standard du paradigme MapReduce. Celle-ci se base sur deux principales techniques d’optimisation. La première consiste en un pré-partitionnement vertical des données entreposées, réduisant ainsi le nombre de colonnes dans chaque fragment. Ce partitionnement sera ensuite complété par le partitionnement horizontal appliqué par défaut par Hadoop. L’objectif dans ce cas est d’améliorer l’accès aux données à travers la réduction de la taille des différents blocs de données. La seconde technique permet, en capturant les affinités entre les attributs d’une charge de requêtes et ceux de l’entrepôt, de définir un placement efficace de ces blocs de données à travers les noeuds qui composent le cluster. Notre troisième proposition traite le problème de l’impact du changement de la charge de requêtes sur la stratégie de distribution des données, du moment que cette dernière dépend étroitement des affinités des attributs des requêtes et de l’entrepôt. Nous avons proposé, à cet effet, une approche dynamique qui permet de prendre en considération les nouvelles requêtes d’analyse qui parviennent au système. Pour pouvoir intégrer l’aspect de "dynamicité", nous avons utilisé un système multi-agents (SMA) pour la gestion automatique et autonome des données entreposées, et cela à travers la redéfinition de nouveaux schémas de distribution et la redistribution des blocs de données. Enfin, pour valider nos contributions, nous avons conduit un ensemble d’expérimentations afin d’évaluer les différentes approches proposées dans ce manuscrit. Nous étudions l’impact du partitionnement et de la distribution intentionnelle sur le chargement des données, l’exécution des requêtes d’analyse, la construction de cubes OLAP, ainsi que l’équilibrage de la charge (load balancing). Nous avons également défini un modèle de coût qui nous a permis d’évaluer et de valider la stratégie de partitionnement proposée dans ce travail.
/ In this manuscript, we address the problems of data partitioning and distribution for large-scale data warehouses distributed with MapReduce. First, we address the problem of data distribution. In this case, we propose a strategy to optimize data placement on distributed systems, based on the colocation principle. The objective is to optimize query performance through the definition of an intentional data distribution schema that reduces the amount of data transferred between nodes during processing, specifically during MapReduce's shuffle phase. Second, we propose a new approach to improve data partitioning and placement in distributed file systems, especially Hadoop-based systems, Hadoop being the standard implementation of the MapReduce paradigm. The aim is to overcome the default data partitioning and placement policies, which do not take any relational data characteristics into account. Our proposal proceeds in two steps: based on the query workload, it first defines an efficient partitioning schema; the system then defines a data distribution schema that best meets users' needs by collocating data blocks on the same or nearby nodes. The objective in this case is to optimize query execution and parallel-processing performance by improving data access. Our third proposal addresses the problem of workload dynamicity, since users' analytical needs evolve over time. In this case, we propose the use of multi-agent systems (MAS) as an extension of our data partitioning and placement approach. Through the autonomy and self-control that characterize MAS, we developed a platform that automatically defines new distribution schemas as new queries reach the system and rebalances the data accordingly. This relieves the system administrator of the burden of managing load balancing, in addition to improving query performance through careful data partitioning and placement policies. Finally, to validate our contributions, we conducted a set of experiments to evaluate the different approaches proposed in this manuscript. We study the impact of intentional data partitioning and distribution on the data warehouse loading phase, the execution of analytical queries, the construction of OLAP cubes, and load balancing. We also defined a cost model that allowed us to evaluate and validate the partitioning strategy proposed in this work.
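As a rough illustration of the colocation principle described above, and not the thesis's actual implementation, the sketch below routes fact and dimension rows that share a join key to the same node so that the corresponding join needs no shuffle; the node count, keys, and data are invented.

```python
# Minimal sketch of key-based colocation: rows sharing a join key are placed
# on the same node, so their join can run locally without a shuffle.
import zlib
from collections import defaultdict

N_NODES = 4  # illustrative cluster size

def node_for(join_key):
    """Deterministic placement: identical keys always map to the same node."""
    return zlib.crc32(join_key.encode()) % N_NODES

fact_rows = [("cust_1", 100.0), ("cust_2", 40.0), ("cust_1", 25.0)]
dim_rows = [("cust_1", "Lyon"), ("cust_2", "Paris")]

placement = defaultdict(lambda: {"facts": [], "dims": []})
for key, amount in fact_rows:
    placement[node_for(key)]["facts"].append((key, amount))
for key, city in dim_rows:
    placement[node_for(key)]["dims"].append((key, city))

# Each node now holds matching fact and dimension fragments and can join them
# locally, avoiding data transfer during the shuffle phase.
for node, fragments in sorted(placement.items()):
    print(node, fragments)
```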
35

On-line analytical processing in distributed data warehouses

Lehner, Wolfgang, Albrecht, Jens 14 April 2022 (has links)
The concepts of 'data warehousing' and 'on-line analytical processing' have attracted growing interest in both the research community and the commercial product market. Today, the trend is moving away from complex centralized data warehouses toward distributed data marts integrated under a common conceptual schema. However, as the first part of this paper demonstrates, there are many problems and few solutions for large distributed decision-support systems in globally operating corporations. After discussing the benefits and problems of the distributed approach, the paper outlines possibilities for achieving good performance in distributed on-line analytical processing. Finally, the architectural framework of the prototypical distributed OLAP system CUBESTAR is outlined.
36

Processing reporting function views in a data warehouse environment

Lehner, Wolfgang, Hummer, W., Schlesinger, L. 02 June 2022 (has links)
Reporting functions are a novel technique for formulating sequence-oriented queries in SQL. They extend the classical way of grouping and applying aggregation functions by additionally providing a column-based ordering, partitioning, and windowing mechanism. The application area of reporting functions ranges from simple ranking queries (TOP(n) analyses) over cumulative queries (year-to-date analyses) to sliding-window queries. We discuss the problem of deriving reporting function queries from materialized reporting function views, which is one of the most important issues in efficiently processing queries in a data warehouse environment. Two different derivation algorithms, including their relational mappings, are introduced and compared in a test scenario.
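To give one concrete, invented example of such a derivation, which is not one of the paper's two algorithms: if a view materializes cumulative year-to-date revenue per month, both a year-to-date lookup and the plain monthly figures can be answered from the view alone, the latter by differencing consecutive cumulative values.

```python
# Minimal sketch of answering queries from a materialized reporting-function
# view; the view contents and figures are invented for illustration.
ytd_view = {1: 100.0, 2: 250.0, 3: 370.0, 4: 540.0}  # month -> cumulative revenue

def ytd(month):
    """Year-to-date query: answered by a direct lookup in the view."""
    return ytd_view[month]

def monthly(month):
    """Plain monthly aggregate: derived by differencing cumulative values."""
    return ytd_view[month] - ytd_view.get(month - 1, 0.0)

print(ytd(3))      # 370.0 -- cumulative revenue through March
print(monthly(3))  # 120.0 -- March alone, recovered from the view
```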
37

On solving the view selection problem in distributed data warehouse architectures

Lehner, Wolfgang, Bauer, Andreas 02 June 2022 (has links)
The use of materialized views in a data warehouse installation is a common tool for speeding up queries, mostly aggregation queries. The problems that come along with materialized aggregate views have triggered a huge variety of proposals, such as picking the optimal set of aggregation combinations, transparently rewriting user queries to take advantage of the summary data, or synchronizing pre-computed summary data as soon as the base data changes. This paper focuses on the problem of view selection in the context of distributed data warehouse architectures. While much research has been done on the view selection problem in the centralized case, we are not aware of any other work discussing view selection in distributed data warehouse systems. The paper proposes an extension of the concept of an aggregation lattice to capture the distributed semantics. Moreover, we extend a greedy selection algorithm with an adequate cost model for the distributed case. In a performance study, we finally compare our findings with the approach of applying a selection algorithm locally to each node in a distributed warehouse environment.
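For background, the sketch below shows the classical greedy heuristic for centralized view selection over an aggregation lattice (in the spirit of Harinarayan, Rajaraman, and Ullman); it is not the paper's distributed extension or cost model, and the lattice, view names, and sizes are invented.

```python
# Minimal sketch of greedy view selection over an aggregation lattice.
# View names (combinations of product p, customer c, supplier s) and row
# counts are invented for illustration.
SIZES = {"pcs": 6_000_000, "pc": 6_000_000, "ps": 800_000, "cs": 6_000_000,
         "p": 200_000, "c": 100_000, "s": 50, "none": 1}

# ANSWERS[v] = views whose queries can be answered from materialized view v
ANSWERS = {"pcs": set(SIZES),
           "pc": {"pc", "p", "c", "none"},
           "ps": {"ps", "p", "s", "none"},
           "cs": {"cs", "c", "s", "none"},
           "p": {"p", "none"}, "c": {"c", "none"},
           "s": {"s", "none"}, "none": {"none"}}

def greedy_select(k):
    """Pick k extra views, each time maximizing the estimated benefit."""
    materialized = {"pcs"}                    # the base cuboid is always kept
    cost = {v: SIZES["pcs"] for v in SIZES}   # cheapest current way to answer v

    def benefit(v):
        return sum(max(cost[w] - SIZES[v], 0) for w in ANSWERS[v])

    for _ in range(k):
        best = max((v for v in SIZES if v not in materialized), key=benefit)
        materialized.add(best)
        for w in ANSWERS[best]:
            cost[w] = min(cost[w], SIZES[best])
    return materialized

# With these numbers the heuristic picks ps, c and s besides the base cuboid.
print(greedy_select(k=3))
```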
38

Optimistic Coarse-Grained Cache Semantics for Data Marts

Lehner, Wolfgang, Thiele, Maik, Albrecht, Jens 15 June 2022 (has links)
Data marts and caching are two closely related concepts in the domain of multi-dimensional data. Both store pre-computed data to provide fast response times for complex OLAP queries, and for both it must be guaranteed that every query can be completely processed. However, they differ greatly in their update behaviour, which we exploit to build a specific data mart extended by cache semantics. In this paper, we introduce a novel cache exploitation concept for data marts - coarse-grained caching - in which the containedness check for a multi-dimensional query is performed by comparing expected and actual cardinalities. To this end, we subdivide the multi-dimensional data into coarse partitions, the so-called cubelets, which allow us to specify completeness criteria for incoming queries. We show that during query processing, the completeness check is performed at no additional cost.
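A minimal sketch of the cardinality-based containedness idea follows; it is not the paper's system. The cache keeps coarse partitions together with the row counts the data mart says they should have, and a query may be answered from the cache only if every partition it touches is completely present. Partition keys and counts are invented.

```python
# Minimal sketch of a cardinality-based completeness check for a
# coarse-grained cache; partition keys and counts are invented.
EXPECTED = {("2024", "north"): 3, ("2024", "south"): 2}  # counts known to the data mart
CACHE = {
    ("2024", "north"): [("jan", 10), ("feb", 12), ("mar", 9)],  # fully cached
    ("2024", "south"): [("jan", 7)],                            # partially cached
}

def answerable_from_cache(partitions):
    """A query is answerable only if every partition it touches is complete."""
    return all(len(CACHE.get(p, [])) == EXPECTED[p] for p in partitions)

print(answerable_from_cache([("2024", "north")]))                     # True
print(answerable_from_cache([("2024", "north"), ("2024", "south")]))  # False
```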
39

Building a real data warehouse for market research

Lehner, Wolfgang, Albrecht, J., Teschke, M., Kirsche, T. 08 April 2022 (has links)
This paper reports the results of the evaluation phase of building a data production system for the retail research division of GfK, Europe's largest market research company. The application-specific requirements, such as end-user needs and data volume, are very different from those of the data warehouses discussed in the literature, making it a 'real' data warehouse. In a case study, these requirements are compared with state-of-the-art solutions offered by leading software vendors. Each of the common architectures (MOLAP, ROLAP, HOLAP) was represented by a product. The result of this comparison is that all systems have to be massively tailored to GfK's needs, especially to cope with metadata management and the maintenance of aggregations.
