1

Graphs enriched by Cubes (GreC) : a new approach for OLAP on information networks / Graphes enrichis par des Cubes (GreC) : une nouvelle approche pour l’OLAP sur des réseaux d’information

Jakawat, Wararat 27 September 2016 (has links)
L'analyse en ligne OLAP (Online Analytical Processing) est une des technologies les plus importantes dans les entrepôts de données, elle permet l'analyse multidimensionnelle de données. Cela correspond à un outil d'analyse puissant, tout en étant flexible en termes d'utilisation pour naviguer dans les données, plus ou moins en profondeur. OLAP a été le sujet de différentes améliorations et extensions, avec sans cesse de nouveaux problèmes en lien avec le domaine et les données, par exemple le multimédia, les données spatiales, les données séquentielles, etc. A l'origine, OLAP a été introduit pour analyser des données structurées que l'on peut qualifier de classiques. Cependant, l'émergence des réseaux d'information induit alors un nouveau domaine intéressant qu'il convient d'explorer. Extraire des connaissances à partir de larges réseaux constitue une tâche complexe et non évidente. Ainsi, l'analyse OLAP peut être une bonne alternative pour observer les données avec certains points de vue. Différents types de réseaux d'information peuvent aider les utilisateurs dans différentes activités, en fonction de différents domaines. Ici, nous focalisons notre attention sur les réseaux d'informations bibliographiques construits à partir des bases de données bibliographiques. Ces données permettent d'analyser non seulement la production scientifique, mais également les collaborations entre auteurs. Il existe différents travaux qui proposent d'avoir recours aux technologies OLAP pour les réseaux d'information, approche nommée « graph OLAP ». Beaucoup de techniques se basent sur ce qu'on peut appeler cube de graphes. Dans cette thèse, nous proposons une nouvelle approche de “graph OLAP” que nous appelons “Graphes enrichis par des Cubes” (GreC). Notre proposition consiste à enrichir les graphes avec des cubes plutôt que de construire des cubes de graphes. En effet, les noeuds et/ou les arêtes du réseau considéré sont décrits par des cubes de données. Cela permet des analyses intéressantes pour l'utilisateur qui peut naviguer au sein d'un graphe enrichi de cubes selon différents niveaux d'analyse, avec des opérateurs dédiés. En outre, notons quatre principaux aspects dans GreC. Premièrement, GreC considère la structure du réseau afin de permettre des opérations OLAP topologiques, et pas seulement des opérations OLAP classiques et informationnelles. Deuxièmement, GreC propose une vision globale du graphe avec des informations multidimensionnelles. Troisièmement, le problème de dimension à évolution lente est pris en charge dans le cadre de l'exploration du réseau. Quatrièmement, et dernièrement, GreC permet l'analyse de données avec une évolution du réseau parce que notre approche permet d'observer la dynamique à travers la dimension temporelle qui peut être présente dans les cubes pour la description des noeuds et/ou arêtes. Pour évaluer GreC, nous avons implémenté notre approche et mené une étude expérimentale sur des jeux de données réelles pour montrer l'intérêt de notre approche. L'approche GreC comprend différents algorithmes. Nous avons validé de manière expérimentale la pertinence de nos algorithmes et montrons leurs performances. / Online Analytical Processing (OLAP) is one of the most important technologies in data warehouse systems; it enables multidimensional analysis of data. It is a powerful analysis tool that remains flexible to use, allowing users to navigate within the data at varying levels of depth.
OLAP has been the subject of improvements and extensions, with new problems constantly arising from new domains and data types, for instance multimedia, spatial data and sequence data. Originally, OLAP was introduced to analyze classical structured data. However, the emergence of information networks opens up yet another interesting domain to explore. Extracting knowledge from large networks is a complex and non-trivial task. Therefore, OLAP analysis can be a good alternative for observing the data from particular points of view. Many kinds of information networks can help users with various activities, depending on the domain. Here, we focus on bibliographic networks built from bibliographic databases. These data allow analyzing not only scientific production but also collaborations between authors. Several research works propose to use OLAP technologies for information networks, an approach known as Graph OLAP. Many Graph OLAP techniques are based on cubes of graphs. In this thesis, we propose a new Graph OLAP approach called Graphs enriched by Cubes (GreC). In a different and complementary way, our proposal consists in enriching graphs with cubes rather than building cubes of graphs. Indeed, the nodes and/or edges of the considered network are described by data cubes. This allows interesting analyses for the user, who can navigate within a graph enriched by cubes at different granularity levels, with dedicated operators. In addition, GreC has four main aspects. First, GreC takes into account the structure of the network in order to support topological OLAP operations, and not only classical, informational OLAP operations. Second, GreC offers a global view of the network together with multidimensional information. Third, the slowly changing dimension problem is taken into account when exploring the network. Lastly, GreC supports analyzing the evolution of a network, because our approach allows observing this evolution through the time dimension that may be present in the cubes describing nodes and/or edges. To evaluate GreC, we implemented our approach and performed an experimental study on a real bibliographic dataset to show the interest of our proposal. The GreC approach includes several algorithms, whose relevance and performance we also validated experimentally.
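To make the idea concrete, here is a minimal Python sketch (an illustration only, not the thesis's implementation) of a co-authorship graph whose edges carry small data cubes, together with a roll-up operator that aggregates each cube along chosen dimensions while leaving the graph topology untouched. The author names, venues and counts are invented.

from collections import defaultdict

# Illustrative sketch: edges of a co-authorship graph carry cubes indexed by
# (year, venue) -> publication count. Rolling up an edge cube is an
# informational Graph OLAP step; the graph structure itself is unchanged.
edges = {
    ("AuthorA", "AuthorB"): {(2014, "VLDB"): 2, (2015, "VLDB"): 1, (2015, "EDBT"): 1},
    ("AuthorA", "AuthorC"): {(2015, "EDBT"): 3},
}

def roll_up(edge_cubes, keep):
    """Aggregate every edge cube over the dimensions not listed in `keep`.

    `keep` is a tuple of dimension positions to retain, e.g. (0,) keeps the year.
    """
    rolled = {}
    for edge, cube in edge_cubes.items():
        agg = defaultdict(int)
        for coords, measure in cube.items():
            agg[tuple(coords[i] for i in keep)] += measure
        rolled[edge] = dict(agg)
    return rolled

# Keep only the year dimension: per-pair publication counts by year.
print(roll_up(edges, keep=(0,)))
# {('AuthorA', 'AuthorB'): {(2014,): 2, (2015,): 2}, ('AuthorA', 'AuthorC'): {(2015,): 3}}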
2

A Case Study In Weather Pattern Searching Using A Spatial Data Warehouse Model

Koylu, Caglar 01 June 2008 (has links) (PDF)
Data warehousing and Online Analytical Processing (OLAP) technologies have been used to access, visualize and analyze multidimensional, aggregated, and summarized data. A large part of data contains spatial components, which convey valuable information and must be included in the exploration and analysis phases of a spatial decision support system (SDSS). On the other hand, Geographic Information Systems (GISs) provide a wide range of tools to analyze spatial phenomena and therefore must be included in the analysis phases of a decision support system (DSS). In this regard, this study aims to search for answers to the problem of how to design a spatially enabled data warehouse architecture in order to support spatio-temporal data analysis and exploration of multidimensional data. Consequently, in this study, the concepts of OLAP and GISs are synthesized in an integrated fashion to maximize the benefits generated from the strengths of both systems by building a spatial data warehouse model. In this context, a multidimensional spatio-temporal data model is proposed as a result of this synthesis. This model addresses the integration problem of spatial, non-spatial and temporal data and facilitates spatial data exploration and analysis. The model is evaluated by implementing a case study in weather pattern searching.
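As a rough illustration of the kind of spatio-temporal roll-up such a model supports (the stations, provinces and measurements below are hypothetical, not taken from the case study), a spatial dimension hierarchy can be used to aggregate weather facts from stations up to provinces:

from collections import defaultdict

# Hypothetical spatial hierarchy: station -> province.
station_to_province = {"ANK01": "Ankara", "ANK02": "Ankara", "IZM01": "Izmir"}

# Fact rows: (station, month, total_precipitation_mm) — invented figures.
facts = [
    ("ANK01", "2007-01", 42.0),
    ("ANK02", "2007-01", 37.5),
    ("IZM01", "2007-01", 110.2),
]

def roll_up_to_province(rows):
    """Spatial roll-up: aggregate station-level facts to the province level."""
    totals = defaultdict(float)
    for station, month, precip in rows:
        totals[(station_to_province[station], month)] += precip
    return dict(totals)

print(roll_up_to_province(facts))
# {('Ankara', '2007-01'): 79.5, ('Izmir', '2007-01'): 110.2}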
3

Developing an XML-based, exploitable linguistic database of the Hebrew text of Gen. 1:1-2:3

Kroeze, J.H. (Jan Hendrik) 28 July 2008 (has links)
The thesis discusses a series of related techniques that prepare and transform raw linguistic data for advanced processing in order to unveil hidden grammatical patterns. A three-dimensional array is identified as a suitable data structure to build a data cube to capture multidimensional linguistic data in a computer's temporary storage facility. It also enables online analytical processing, like slicing, to be executed on this data cube in order to reveal various subsets and presentations of the data. XML is investigated as a suitable mark-up language to permanently store such an exploitable databank of Biblical Hebrew linguistic data. This concept is illustrated by tagging a phonetic transcription of Genesis 1:1-2:3 on various linguistic levels and manipulating this databank. Transferring the data set between an XML file and a three-dimensional array creates a stable environment allowing editing and advanced processing of the data in order to confirm existing knowledge or to mine for new, yet undiscovered, linguistic features. Two experiments are executed to demonstrate possible text-mining procedures. Finally, visualisation is discussed as a technique that enhances interaction between the human researcher and the computerised technologies supporting the process of knowledge creation. Although the data set is very small, there are exciting indications that the compilation and analysis of aggregate linguistic data may assist linguists to perform rigorous research, for example regarding the definitions of semantic functions and the mapping of these functions onto the syntactic module. / Thesis (PhD (Information Technology))--University of Pretoria, 2008. / Information Science / unrestricted
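A minimal Python sketch of this data-cube idea follows (the annotations and tag names are invented for illustration and do not reproduce the thesis's actual tag set): a three-dimensional structure indexed by clause, word position and linguistic level, with a slicing operation along the level axis.

# Toy linguistic data cube: clause -> word position -> linguistic level -> value.
cube = {
    1: {  # clause 1 (hypothetical annotations)
        0: {"translit": "bereshit", "pos": "noun", "syntax": "adjunct"},
        1: {"translit": "bara",     "pos": "verb", "syntax": "predicate"},
    },
    2: {  # clause 2
        0: {"translit": "vehaarets", "pos": "noun", "syntax": "subject"},
    },
}

def slice_level(cube, level):
    """Return a 2-D slice: for every clause and word position, only the chosen level."""
    return {c: {w: ann[level] for w, ann in words.items()} for c, words in cube.items()}

print(slice_level(cube, "pos"))
# {1: {0: 'noun', 1: 'verb'}, 2: {0: 'noun'}}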
4

Klient pro zobrazování OLAP kostek / Client for Displaying OLAP Cubes

Podsedník, Lukáš January 2010 (has links)
At the beginning, the project describes the basics and utilization of data warehousing and the OLAP techniques and operations used within data warehouses. Then follows a description of a commercial OLAP client; based on the features of this product, a requirement analysis for a freeware OLAP cube client (displayer) is described, including the choice of functionality to be implemented in the client. Using this requirement analysis, the structural design of the application (including UML diagrams) is made. The best solution among the compared libraries, frameworks and development environments is chosen for the design. The next chapter deals with the implementation and the tools and frameworks used in it. At the end, the thesis evaluates the achieved results and the options for further improvement.
5

Optimisation des performances dans les entrepôts distribués avec Mapreduce : traitement des problèmes de partitionnement et de distribution des données / Optimizing data management for large-scale distributed data warehouses using MapReduce

Arres, Billel 08 February 2016 (has links)
Dans ce travail de thèse, nous abordons les problèmes liés au partitionnement et à la distribution des grands volumes d’entrepôts de données distribués avec Mapreduce. Dans un premier temps, nous abordons le problème de la distribution des données. Dans ce cas, nous proposons une stratégie d’optimisation du placement des données, basée sur le principe de la colocalisation. L’objectif est d’optimiser les traitements lors de l’exécution des requêtes d’analyse à travers la définition d’un schéma de distribution intentionnelle des données permettant de réduire la quantité des données transférées entre les noeuds lors des traitements, plus précisément lors de la phase de tri (shuffle). Nous proposons dans un second temps une nouvelle démarche pour améliorer les performances du framework Hadoop, qui est l’implémentation standard du paradigme Mapreduce. Celle-ci se base sur deux principales techniques d’optimisation. La première consiste en un pré-partitionnement vertical des données entreposées, réduisant ainsi le nombre de colonnes dans chaque fragment. Ce partitionnement sera complété par la suite par un autre partitionnement d’Hadoop, qui est horizontal, appliqué par défaut. L’objectif dans ce cas est d’améliorer l’accès aux données à travers la réduction de la taille des différents blocs de données. La seconde technique permet, en capturant les affinités entre les attributs d’une charge de requêtes et ceux de l’entrepôt, de définir un placement efficace de ces blocs de données à travers les noeuds qui composent le cluster. Notre troisième proposition traite le problème de l’impact du changement de la charge de requêtes sur la stratégie de distribution des données, du moment que cette dernière dépend étroitement des affinités des attributs des requêtes et de l’entrepôt. Nous avons proposé, à cet effet, une approche dynamique qui permet de prendre en considération les nouvelles requêtes d’analyse qui parviennent au système. Pour pouvoir intégrer l’aspect de "dynamicité", nous avons utilisé un système multi-agents (SMA) pour la gestion automatique et autonome des données entreposées, et cela, à travers la redéfinition des nouveaux schémas de distribution et de la redistribution des blocs de données. Enfin, pour valider nos contributions nous avons conduit un ensemble d’expérimentations pour évaluer nos différentes approches proposées dans ce manuscrit. Nous étudions l’impact du partitionnement et la distribution intentionnelle sur le chargement des données, l’exécution des requêtes d’analyses, la construction de cubes OLAP, ainsi que l’équilibrage de la charge (Load Balancing). Nous avons également défini un modèle de coût qui nous a permis d’évaluer et de valider la stratégie de partitionnement proposée dans ce travail. / In this manuscript, we address the problems of data partitioning and distribution for large-scale data warehouses distributed with MapReduce. First, we address the problem of data distribution. In this case, we propose a strategy to optimize data placement on distributed systems, based on the collocation principle. The objective is to optimize query performance through the definition of an intentional data distribution schema that reduces the amount of data transferred between nodes during processing, specifically during MapReduce's shuffle phase. Secondly, we propose a new approach to improve data partitioning and placement in distributed file systems, especially Hadoop-based systems, Hadoop being the standard implementation of the MapReduce paradigm.
The aim is to overcome the default data partitioning and placement policies, which do not take any relational data characteristics into account. Our proposal proceeds in two steps. Based on the query workload, it defines an efficient partitioning schema. After that, the system defines a data distribution schema that best meets the users' needs by collocating data blocks on the same or closest nodes. The objective in this case is to optimize query execution and parallel processing performance by improving data access. Our third proposal addresses the problem of workload dynamicity, since users' analytical needs evolve over time. In this case, we propose the use of multi-agent systems (MAS) as an extension of our data partitioning and placement approach. Through the autonomy and self-control that characterize MAS, we developed a platform that automatically defines new distribution schemas as new queries are added to the system, and applies data rebalancing according to the new schema. This relieves the system administrator of the burden of managing load balance, besides improving query performance by adopting careful data partitioning and placement policies. Finally, to validate our contributions we conducted a set of experiments to evaluate the different approaches proposed in this manuscript. We study the impact of intentional data partitioning and distribution on the data warehouse loading phase, the execution of analytical queries, OLAP cube construction, and load balancing. We also defined a cost model that allowed us to evaluate and validate the partitioning strategy proposed in this work.
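The affinity-driven placement can be illustrated with a small Python sketch (a hypothetical workload and a simple greedy grouping heuristic, not the exact algorithm of the thesis): attribute pairs that co-occur often in the query workload are grouped so that their vertical fragments can be collocated on the same node.

from itertools import combinations
from collections import Counter

# Hypothetical analytical workload: attributes referenced by each query.
workload = [
    {"customer", "date", "sales"},
    {"customer", "sales"},
    {"date", "sales", "store"},
]

# Count pairwise co-occurrences ("affinities") over the workload.
affinity = Counter()
for attrs in workload:
    for a, b in combinations(sorted(attrs), 2):
        affinity[(a, b)] += 1

# Greedily merge the most affine pairs into colocation groups.
# (Simplistic: assumes a pair never bridges two existing groups.)
groups = []
for (a, b), _ in affinity.most_common():
    g = next((g for g in groups if a in g or b in g), None)
    (g.update((a, b)) if g else groups.append({a, b}))

print(affinity.most_common(3))
print(groups)  # e.g. [{'customer', 'date', 'sales', 'store'}]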
6

Plan Bouquets : An Exploratory Approach to Robust Query Processing

Dutt, Anshuman January 2016 (has links) (PDF)
Over the last four decades, relational database systems, with their mathematical basis in first-order logic, have provided a congenial and efficient environment to handle enterprise data during its entire life cycle of generation, storage, maintenance and processing. An organic reason for their pervasive popularity is intrinsic support for declarative user queries, wherein the user only specifies the end objectives, and the system takes on the responsibility of identifying the most efficient means, called “plans”, to achieve these objectives. A crucial input to generating efficient query execution plans is the set of compile-time estimates of the data volumes that are output by the operators implementing the algebraic predicates present in the query. These volume estimates are typically computed using the “selectivities” of the predicates. Unfortunately, a pervasive problem encountered in practice is that these selectivities often differ significantly from the values actually encountered during query execution, leading to poor plan choices and grossly inflated response times. While the database research community has spent considerable efforts to address the above challenge, the prior techniques all suffer from a systemic limitation - the inability to provide any guarantees on the execution performance. In this thesis, we materially address this long-standing open problem by developing a radically different query processing strategy that lends itself to attractive guarantees on run-time performance. Specifically, in our approach, the compile-time estimation process is completely eschewed for error-prone selectivities. Instead, from the set of optimal plans in the query’s selectivity error space, a limited subset called the “plan bouquet” is selected such that at least one of the bouquet plans is 2-optimal at each location in the space. Then, at run time, an exploratory sequence of cost-budgeted executions from the plan bouquet is carried out, eventually finding a plan that executes to completion within its assigned budget. The duration and switching of these executions are controlled by a graded progression of isosurfaces projected onto the optimal performance profile. We prove that this construction provides viable guarantees on the worst-case performance relative to an oracular system that magically possesses accurate a priori knowledge of all selectivities. Moreover, it ensures repeatable execution strategies across different invocations of a query, an extremely desirable feature in industrial settings. Our second contribution is a suite of techniques that substantively improve on the performance guarantees offered by the basic bouquet algorithm. First, we present an algorithm that skips carefully chosen executions from the basic plan bouquet sequence, leveraging the observation that an expensive execution may provide better coverage as compared to a series of cheaper siblings, thereby reducing the aggregate exploratory overheads. Next, we explore randomized variants with regard to both the sequence of plan executions and the constitution of the plan bouquet, and show that the resulting guarantees are markedly superior, in expectation, to the corresponding worst-case values. From a deployment perspective, the above techniques are appealing since they are completely “black-box”, that is, non-invasive with regard to the database engine, implementable using only API features that are commonly available in modern systems.
As a proof of concept, the bouquet approach has been fully prototyped in QUEST, a Java-based tool that provides a visual and interactive demonstration of the bouquet identification and execution phases. In a similar spirit, we propose an efficient isosurface identification algorithm that avoids exploration of large portions of the error space and drastically reduces the effort involved in bouquet construction. The plan bouquet approach is ideally suited for “canned” query environments, where the computational investment in bouquet identification is amortized over multiple query invocations. The final contribution of this thesis is extending the advantage of compile-time sub-optimality guarantees to ad hoc query environments, where the overheads of the off-line bouquet identification may turn out to be impractical. Specifically, we propose a completely revamped bouquet algorithm that constructs the cost-budgeted execution sequence in an “on-the-fly” manner. This is achieved through a “white-box” interaction style with the engine, whereby the plan output cardinalities exposed by the engine are used to compute lower bounds on the error-prone selectivities during plan executions. For this algorithm, the sub-optimality guarantees are in the form of a low-order polynomial of the number of error-prone selectivities in the query. The plan bouquet approach has been empirically evaluated on both PostgreSQL and a commercial engine, ComOpt, over the TPC-H and TPC-DS benchmark environments. Our experimental results indicate that it delivers orders-of-magnitude improvements in the worst-case behavior, without impairing the average-case performance, as compared to the native optimizers of these systems. In absolute terms, the worst-case sub-optimality is upper bounded by 20 across the suite of queries, and the average performance is empirically found to be within a factor of 4 with respect to the optimal. Even with the on-the-fly bouquet algorithm, the guarantees are found to be within a factor of 3 as compared to those achievable in the corresponding canned query environment. Overall, the plan bouquet approach provides novel performance guarantees that open up exciting possibilities for robust query processing.
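The exploratory, cost-budgeted execution can be illustrated with a toy Python simulation (the plan names and costs below are invented; in the actual approach the bouquet and its isosurfaces are derived from the optimizer's selectivity error space): budgets grow geometrically, and under each budget the eligible bouquet plans are tried and abandoned as soon as they exceed it.

def run_bouquet(bouquet_costs, true_cost_of, start_budget=1.0):
    """Toy cost-budgeted exploration.

    bouquet_costs: plan -> optimizer cost at its isosurface (invented numbers).
    true_cost_of:  plan -> actual execution cost at the unknown true selectivities.
    """
    budget, spent = start_budget, 0.0
    while True:
        # Try plans in increasing order of their isosurface cost.
        for plan, iso_cost in sorted(bouquet_costs.items(), key=lambda kv: kv[1]):
            if iso_cost > budget:
                continue                      # plan belongs to a later isosurface
            actual = true_cost_of[plan]
            if actual <= budget:
                return plan, spent + actual   # plan finished within its budget
            spent += budget                   # partial execution charged, then killed
        budget *= 2                           # move to the next (doubled) cost budget

bouquet = {"P1": 1.0, "P2": 4.0, "P3": 16.0}   # hypothetical isosurface costs
truth = {"P1": 40.0, "P2": 12.0, "P3": 18.0}   # hypothetical costs at true selectivities
print(run_bouquet(bouquet, truth))              # -> ('P2', 55.0) for these numbers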
7

Měření výkonnosti podniku / Corporate Performance Measurement

Pavlová, Petra January 2012 (has links)
This thesis deals with the application of Business Intelligence (BI) to support corporate performance management in ISS Europe, spol. s r. o. This company provides licences for and implements original software products as well as third-party software products. First, an analysis is conducted in the given company, which then serves as a basis for the implementation of a BI solution that should be interconnected with the company's strategy. The main goal is the implementation of a pilot BI solution to aid the monitoring and optimisation of corporate performance. The secondary goals include the analysis of related concepts, the analysis of the business strategy, the identification of strategic goals and systems, and the proposition and implementation of a pilot BI solution. In its theoretical part, this thesis focuses on the analysis of concepts related to corporate performance and BI implementations and briefly describes the company together with its business strategy. The following practical part builds on the theoretical findings. An analysis of the company is carried out using the Balanced Scorecard (BSC) methodology, the result of which is depicted in a strategic map. This methodology is then supplemented by the Activity Based Costing (ABC) analytical method, which divides expenses according to activities. The results show which expenses are linked to handling individual development, implementation and operational demands for particular contracts. This is followed by an original proposition and the implementation of a BI solution, which includes the creation of a Data Warehouse (DWH), the design of Extract, Transform and Load (ETL) and Online Analytical Processing (OLAP) systems and the generation of sample reports. The main contribution of this thesis is in providing the company management with an analysis of company data from a multidimensional perspective, which can be used as a basis for prompt and correct decision-making, realistic planning and performance and product optimisation.
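As a hedged illustration of the Activity Based Costing step mentioned above (the activities, drivers and amounts are hypothetical, not ISS Europe's actual figures), overhead for each activity can be allocated to contracts in proportion to their consumption of the activity's driver:

# Hypothetical overhead per activity (e.g. in CZK).
overhead = {"development": 60000.0, "implementation": 30000.0, "operations": 10000.0}

# Activity driver consumption per contract (e.g. hours logged per activity).
drivers = {
    "ContractA": {"development": 300, "implementation": 100, "operations": 20},
    "ContractB": {"development": 100, "implementation": 100, "operations": 80},
}

def allocate(overhead, drivers):
    """Allocate each activity's overhead to contracts in proportion to driver usage."""
    totals = {a: sum(d[a] for d in drivers.values()) for a in overhead}
    return {
        contract: round(sum(overhead[a] * use[a] / totals[a] for a in overhead), 2)
        for contract, use in drivers.items()
    }

print(allocate(overhead, drivers))
# {'ContractA': 62000.0, 'ContractB': 38000.0}  (allocations sum to the total overhead)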
8

Data marts as management information delivery mechanisms: utilisation in manufacturing organisations with third party distribution

Ponelis, S.R. (Shana Rachel) 06 August 2003 (has links)
Customer knowledge plays a vital part in organisations today, particularly in sales and marketing processes, where customers can either be channel partners or final consumers. Managing customer data and/or information across business units, departments, and functions is vital. Frequently, channel partners gather and capture data about downstream customers and consumers that organisations further upstream in the channel require to be incorporated into their information systems in order to allow for management information delivery to their users. In this study, the focus is placed on manufacturing organisations using third-party distribution, since the flow of information between channel partner organisations in a supply chain (in contrast to the flow of products) provides an important link between organisations and increasingly represents a source of competitive advantage in the marketplace. The purpose of this study is to determine whether there is a significant difference in the use of sales and marketing data marts as management information delivery mechanisms in manufacturing organisations in different industries, particularly the pharmaceutical and branded consumer products industries. The case studies presented in this dissertation indicate that there are significant differences between the use of sales and marketing data marts in different manufacturing industries, which can be ascribed to the industry, both directly and indirectly. / Thesis (MIS(Information Science))--University of Pretoria, 2002. / Information Science / MIS / unrestricted
