11 |
Analýza veřejně dostupných dat Českého statistického úřadu / Analysis of Public Data of the Czech Statistical Office
Pohl, Ondřej, January 2017
The aim of this thesis is to analyze Czech Statistical Office data on foreign trade. First, the reader is introduced to Business Intelligence and data warehousing. Next, the basics of OLAP analysis and data mining are explained. The following parts describe and analyze the foreign trade data using OLAP technology and data mining in MS SQL Server, including the implementation of selected analytical tasks.
|
12 |
Klient pro zobrazování OLAP kostek / Client for Displaying OLAP Cubes
Podsedník, Lukáš, January 2010
At the beginning, the project describes the basics and use of data warehousing and the OLAP techniques and operations used within data warehouses. A description of one commercial OLAP client follows; based on the features of this product, a requirements analysis for a freeware OLAP cube client is described, selecting the functionality to be implemented in the client. Using the requirements analysis, the structural design of the application (including UML diagrams) is made. The best solution among the compared libraries, frameworks and development environments is chosen for the design. The next chapter covers the implementation and the tools and frameworks used in it. At the end, the thesis evaluates the results achieved and the options for further improvement.
|
13 |
Multiuživatelský systém pro podporu znovuvyužití materiálů / Multiuser System for Material Reusing
Kolarik, Petr, January 2007
This text documents a multi-user system that supports the reuse of materials. It deals with possible structures according to the functional system specification and with its implementation in PHP using the MySQL database system. It follows the progress of system creation from the ER diagram through the use-case diagram to the programming itself. The work shows how to design a web advertisement system that enables a user to define personal multi-level views on data. This project could serve as the basis for a commercial project, which could verify the usability of the designed structure of the individual parts.
|
14 |
Optimisation des performances dans les entrepôts distribués avec Mapreduce : traitement des problèmes de partionnement et de distribution des données / Optimizing data management for large-scale distributed data warehouses using MapReduce
Arres, Billel, 08 February 2016
Dans ce travail de thèse, nous abordons les problèmes liés au partitionnement et à la distribution des grands volumes d’entrepôts de données distribués avec Mapreduce. Dans un premier temps, nous abordons le problème de la distribution des données. Dans ce cas, nous proposons une stratégie d’optimisation du placement des données, basée sur le principe de la colocalisation. L’objectif est d’optimiser les traitements lors de l’exécution des requêtes d’analyse à travers la définition d’un schéma de distribution intentionnelle des données permettant de réduire la quantité des données transférées entre les noeuds lors des traitements, plus précisément lors phase de tri (shuffle). Nous proposons dans un second temps une nouvelle démarche pour améliorer les performances du framework Hadoop, qui est l’implémentation standard du paradigme Mapreduce. Celle-ci se base sur deux principales techniques d’optimisation. La première consiste en un pré-partitionnement vertical des données entreposées, réduisant ainsi le nombre de colonnes dans chaque fragment. Ce partitionnement sera complété par la suite par un autre partitionnement d’Hadoop, qui est horizontal, appliqué par défaut. L’objectif dans ce cas est d’améliorer l’accès aux données à travers la réduction de la taille des différents blocs de données. La seconde technique permet, en capturant les affinités entre les attributs d’une charge de requêtes et ceux de l’entrepôt, de définir un placement efficace de ces blocs de données à travers les noeuds qui composent le cluster. Notre troisième proposition traite le problème de l’impact du changement de la charge de requêtes sur la stratégie de distribution des données. Du moment que cette dernière dépend étroitement des affinités des attributs des requêtes et de l’entrepôt. Nous avons proposé, à cet effet, une approche dynamique qui permet de prendre en considération les nouvelles requêtes d’analyse qui parviennent au système. 
Pour pouvoir intégrer l’aspect de "dynamicité", nous avons utilisé un système multi-agents (SMA) pour la gestion automatique et autonome des données entreposées, et cela, à travers la redéfinition des nouveaux schémas de distribution et de la redistribution des blocs de données. Enfin, pour valider nos contributions nous avons conduit un ensemble d’expérimentations pour évaluer nos différentes approches proposées dans ce manuscrit. Nous étudions l’impact du partitionnement et la distribution intentionnelle sur le chargement des données, l’exécution des requêtes d’analyses, la construction de cubes OLAP, ainsi que l’équilibrage de la charge (Load Balancing). Nous avons également défini un modèle de coût qui nous a permis d’évaluer et de valider la stratégie de partitionnement proposée dans ce travail. / In this manuscript, we address the problems of data partitioning and distribution for large-scale data warehouses distributed with MapReduce. First, we address the problem of data distribution. In this case, we propose a strategy to optimize data placement on distributed systems, based on the collocation principle. The objective is to optimize query performance through the definition of an intentional data distribution schema that reduces the amount of data transferred between nodes during processing, specifically during MapReduce’s shuffle phase. Secondly, we propose a new approach to improve data partitioning and placement in distributed file systems, especially Hadoop-based systems, Hadoop being the standard implementation of the MapReduce paradigm. The aim is to overcome the default data partitioning and placement policies, which do not take any relational data characteristics into account. Our proposal proceeds in two steps. Based on the query workload, it defines an efficient partitioning schema. 
After that, the system defines a data distribution schema that best meets users’ needs by collocating data blocks on the same or closest nodes. The objective in this case is to optimize query execution and parallel processing performance by improving data access. Our third proposal addresses the problem of workload dynamicity, since users’ analytical needs evolve over time. In this case, we propose the use of multi-agent systems (MAS) as an extension of our data partitioning and placement approach. Through the autonomy and self-control that characterize MAS, we developed a platform that automatically defines new distribution schemas as new queries arrive in the system, and applies data rebalancing according to the new schema. This relieves the system administrator of the burden of managing load balance, besides improving query performance by adopting careful data partitioning and placement policies. Finally, to validate our contributions, we conduct a set of experiments to evaluate the different approaches proposed in this manuscript. We study the impact of intentional data partitioning and distribution on the data warehouse loading phase, the execution of analytical queries, OLAP cube construction, and load balancing. We also defined a cost model that allowed us to evaluate and validate the partitioning strategy proposed in this work.
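As an illustration only, the idea of deriving a vertical partitioning from a query workload can be sketched as follows. The abstract does not give the thesis's actual algorithm, so everything here is an assumption: affinity is taken to be the number of queries that access two attributes together, attributes are grouped with a simple threshold and union-find, and the names `attribute_affinity`, `vertical_fragments` and the sample columns are hypothetical.

```python
from collections import Counter
from itertools import combinations


def attribute_affinity(workload):
    """Count how often each pair of attributes is accessed by the same query.

    `workload` is a list of queries, each given as a list of attribute names
    (an assumed, simplified representation of a query workload).
    """
    affinity = Counter()
    for attrs in workload:
        for a, b in combinations(sorted(set(attrs)), 2):
            affinity[(a, b)] += 1
    return affinity


def vertical_fragments(attributes, workload, threshold):
    """Group attributes whose pairwise affinity reaches `threshold`.

    Every attribute used in `workload` must also appear in `attributes`.
    Returns a sorted list of fragments, each a sorted list of attribute names.
    """
    parent = {a: a for a in attributes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in attribute_affinity(workload).items():
        if count >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for a in attributes:
        groups.setdefault(find(a), []).append(a)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical fact-table columns and a toy workload of four queries.
columns = ["cust_id", "region", "amount", "date", "comment"]
queries = [
    ["region", "amount"],
    ["region", "amount", "date"],
    ["cust_id", "comment"],
    ["region", "amount"],
]
fragments = vertical_fragments(columns, queries, threshold=2)
print(fragments)
```

With this toy workload only `region` and `amount` are accessed together often enough to share a fragment; a real system would also have to weigh fragment sizes and block placement across nodes, which this sketch ignores.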
|
15 |
雲端服務中銷售員支援之研究 / A study on sales force support in cloud service
翁玉麟, Unknown Date
客戶關係管理(Customer Relationship Management, CRM)藉由各種資訊技術來留住客戶,以產生更多的商業價值。然而,許多文獻指出,CRM系統的失敗率很高,尤其是CRM主要的核心能力--銷售員自動化(Sales Force Automation, SFA)。研究指出改善的方式包含更好的管理支援、培訓、系統易用性和強烈的使用動機等等。接續此建議,本文提出了一個銷售員支援(Sales Force Support, SFS)系統,藉由線上分析處理(Online Analytical Processing, OLAP)、資料採礦(Data Mining, DM)和雲端服務(Cloud Service)等技術,協助彙整及提供支援銷售員的客戶推薦 (Customer Recommendation)和自我績效評估(Self Evaluation)功能,以刺激更好的銷售能力、滿足客戶與管理。可望提高系統的易用性和業務人員的使用動機,藉以橋接銷售員和管理人員之間的差異。為了評估推薦功能之適用性,本論文也發展一套驗證指標,並採用一套隨機數學模型(Stochastic Mathematical Model),作為強化推薦預測之嘗試。 / Customer Relationship Management (CRM) adopts various information technologies to retain and acquire customers in order to generate more business value. However, earlier studies indicate that the failure rate of CRM systems is high, and even higher for Sales Force Automation (SFA), a core component of CRM. They usually suggest improvements such as better management support, more training, user friendliness, and usage motivation. Following these suggestions, this research proposes a Sales Force Support (SFS) system that integrates technologies such as Online Analytical Processing (OLAP), Data Mining (DM), and cloud services to provide supporting information for customer recommendation and self-evaluation, in order to better stimulate sales and satisfy customers and management. These objectives can be achieved by enhancing user friendliness and usage motivation, and by bridging the differences between the sales force and management. To evaluate the fitness of the recommendation function, a set of validation measures is also developed. In addition, a stochastic mathematical model is applied in an attempt to enhance recommendation prediction.
|
16 |
Plan Bouquets: An Exploratory Approach to Robust Query Processing
Dutt, Anshuman, January 2016
Over the last four decades, relational database systems, with their mathematical basis in first-order logic, have provided a congenial and efficient environment to handle enterprise data during its entire life cycle of generation, storage, maintenance and processing. An organic reason for their pervasive popularity is intrinsic support for declarative user queries, wherein the user only specifies the end objectives, and the system takes on the responsibility of identifying the most efficient means, called “plans”, to achieve these objectives. A crucial input to generating efficient query execution plans are the compile-time estimates of the data volumes that are output by the operators implementing the algebraic predicates present in the query. These volume estimates are typically computed using the “selectivities” of the predicates. Unfortunately, a pervasive problem encountered in practice is that these selectivities often differ significantly from the values actually encountered during query execution, leading to poor plan choices and grossly inflated response times. While the database research community has spent considerable efforts to address the above challenge, the prior techniques all suffer from a systemic limitation - the inability to provide any guarantees on the execution performance.
In this thesis, we materially address this long-standing open problem by developing a radically different query processing strategy that lends itself to attractive guarantees on run-time performance. Specifically, in our approach, the compile-time estimation process is completely eschewed for error-prone selectivities. Instead, from the set of optimal plans in the query’s selectivity error space, a limited subset called the “plan bouquet”, is selected such that at least one of the bouquet plans is 2-optimal at each location in the space. Then, at run time, an exploratory sequence of cost-budgeted executions from the plan bouquet is carried out, eventually finding a plan that executes to completion within its assigned budget. The duration and switching of these executions is controlled by a graded progression of isosurfaces projected onto the optimal performance profile. We prove that this construction provides viable guarantees on the worst-case performance relative to an oracular system that magically possesses accurate a priori knowledge of all selectivities. Moreover, it ensures repeatable execution strategies across different invocations of a query, an extremely desirable feature in industrial settings.
Our second contribution is a suite of techniques that substantively improve on the performance guarantees offered by the basic bouquet algorithm. First, we present an algorithm that skips carefully chosen executions from the basic plan bouquet sequence, leveraging the observation that an expensive execution may provide better coverage as compared to a series of cheaper siblings, thereby reducing the aggregate exploratory overheads. Next, we explore randomized variants with regard to both the sequence of plan executions and the constitution of the plan bouquet, and show that the resulting guarantees are markedly superior, in expectation, to the corresponding worst case values.
From a deployment perspective, the above techniques are appealing since they are completely “black-box”, that is, non-invasive with regard to the database engine, implementable using only API features that are commonly available in modern systems. As a proof of concept, the bouquet approach has been fully prototyped in QUEST, a Java-based tool that provides a visual and interactive demonstration of the bouquet identification and execution phases. In similar spirit, we propose an efficient isosurface identification algorithm that avoids exploration of large portions of the error space and drastically reduces the effort involved in bouquet construction.
The plan bouquet approach is ideally suited for “canned” query environments, where the computational investment in bouquet identification is amortized over multiple query invocations. The final contribution of this thesis is extending the advantage of compile-time sub-optimality guarantees to ad hoc query environments where the overheads of the off-line bouquet identification may turn out to be impractical. Specifically, we propose a completely revamped bouquet algorithm that constructs the cost-budgeted execution sequence in an “on-the-fly” manner. This is achieved through a “white-box” interaction style with the engine, whereby the plan output cardinalities exposed by the engine are used to compute lower bounds on the error-prone selectivities during plan executions. For this algorithm, the sub-optimality guarantees are in the form of a low order polynomial of the number of error-prone selectivities in the query.
The plan bouquet approach has been empirically evaluated on both PostgreSQL and a commercial engine ComOpt, over the TPC-H and TPC-DS benchmark environments. Our experimental results indicate that it delivers orders of magnitude improvements in the worst-case behavior, without impairing the average-case performance, as compared to the native optimizers of these systems. In absolute terms, the worst-case sub-optimality is upper bounded by 20 across the suite of queries, and the average performance is empirically found to be within a factor of 4 of the optimal. Even with the on-the-fly bouquet algorithm, the guarantees are found to be within a factor of 3 as compared to those achievable in the corresponding canned query environment.
Overall, the plan bouquet approach provides novel performance guarantees that open up exciting possibilities for robust query processing.
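The run-time phase described in this abstract, a sequence of cost-budgeted executions that stops at the first plan finishing within its budget, can be illustrated with a toy sketch. It is not the QUEST implementation: it assumes a single error-prone selectivity, budgets that double from one step to the next (the abstract says only "graded progression"), and two invented cost functions. The names `run_bouquet`, `nested_loop` and `hash_join` are hypothetical.

```python
from typing import Callable, List, Tuple

# A plan is (name, cost as a function of the true selectivity).
Plan = Tuple[str, Callable[[float], float]]


def run_bouquet(contours: List[Tuple[float, List[Plan]]],
                true_selectivity: float) -> Tuple[str, float]:
    """Simulate cost-budgeted executions over increasing budgets.

    `contours` lists (budget, plans) pairs in increasing budget order.
    A plan whose actual cost exceeds its budget is aborted and the whole
    budget counts as wasted work. Returns the name of the plan that
    completed and the total cost spent, including aborted attempts.
    """
    spent = 0.0
    for budget, plans in contours:
        for name, cost_fn in plans:
            actual = cost_fn(true_selectivity)
            if actual <= budget:
                return name, spent + actual
            spent += budget  # aborted once the budget is exhausted
    raise RuntimeError("no plan finished within the largest budget")


# Invented cost models: one plan is cheap at low selectivity, one at high.
nested_loop: Plan = ("nested_loop", lambda s: 10 + 1000 * s)
hash_join: Plan = ("hash_join", lambda s: 100 + 50 * s)

# Assumed doubling budgets, each paired with the plan to try at that step.
contours = [
    (20.0, [nested_loop]),
    (40.0, [nested_loop]),
    (80.0, [nested_loop]),
    (160.0, [hash_join]),
]

chosen, total = run_bouquet(contours, true_selectivity=0.5)
optimal = min(fn(0.5) for _, fn in (nested_loop, hash_join))
print(chosen, total, total / optimal)
```

In this toy run the three aborted attempts cost 140 units and the completing plan 125, so the total of 265 stays within a small constant factor of the optimal 125. Bounding that factor regardless of the true selectivity is the kind of guarantee the abstract refers to.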
|
17 |
Návrh metodiky testování BI řešení / Design of methodology for BI solutions testing
Jakubičková, Nela, January 2011
This thesis deals with Business Intelligence and its testing. It highlights the differences from classical software testing and designs a methodology for testing BI solutions that could be used in practice on real projects of BI companies. The aim of the thesis is to design such a methodology based on theoretical knowledge of Business Intelligence and software testing, with an emphasis on specific BI characteristics and requirements and in accordance with Clever Decision's requirements, and to test it in practice on a real project in this company. The thesis is based on a study of Czech and foreign literature in the field of Business Intelligence and software testing, as well as on the recommendations and experience of Clever Decision's employees. It is one of the few sources, if not the first, dealing with a methodology for testing BI solutions in the Czech language. This work could also serve as a basis for more comprehensive methodologies for testing BI solutions. The thesis is divided into a theoretical and a practical part. The theoretical part explains the purpose of Business Intelligence in enterprises. It describes the particular components of a BI solution, then software testing itself and the various types of tests, with emphasis on the differences and specifics of Business Intelligence. The theoretical part is followed by the designed methodology for testing BI solutions, which uses a generic model for BI/DW solution testing. The highlight of the practical part is the description of the testing of a real BI project at Clever Decision according to the designed methodology.
|
18 |
Měření výkonnosti podniku / Corporate Performance Measurement
Pavlová, Petra, January 2012
This thesis deals with the application of Business Intelligence (BI) to support the corporate performance management in ISS Europe, spol. s r. o. This company provides licences and implements original software products as well as third-party software products. First, an analysis is conducted in the given company, which then serves as basis for the implementation of the BI solution that should be interconnected with the company strategies. The main goal is the implementation of a pilot BI solution to aid the monitoring and optimisation of corporate performance. Among secondary goals are the analysis of related concepts, business strategy analysis, strategic goals and systems identification and the proposition and implementation of a pilot BI solution. In its theoretical part, this thesis focuses on the analysis of concepts related to corporate performance and BI implementations and shortly describes the company together with its business strategy. The following practical part is based on the theoretical findings. An analysis of the company is carried out using the Balanced Scorecard (BSC) methodology, the result of which is depicted in a strategic map. This methodology is then supplemented by the Activity Based Costing (ABC) analytical method, which divides expenses according to assets. The results are informational data about which expenses are linked to handling individual developmental, implementational and operational demands for particular contracts. This is followed by an original proposition and the implementation of a BI solution which includes the creation of a Data Warehouse (DWH), designing Extract Transform and Load (ETL) and Online Analytical Processing (OLAP) systems and generating sample reports. 
The main contribution of this thesis is in providing the company management with an analysis of company data using a multidimensional perspective which can be used as basis for prompt and correct decision-making, realistic planning and performance and product optimisation.
|
19 |
Data marts as management information delivery mechanisms: utilisation in manufacturing organisations with third party distribution
Ponelis, S.R. (Shana Rachel), 06 August 2003
Customer knowledge plays a vital part in organisations today, particularly in sales and marketing processes, where customers can either be channel partners or final consumers. Managing customer data and/or information across business units, departments, and functions is vital. Frequently, channel partners gather and capture data about downstream customers and consumers that organisations further upstream in the channel need to incorporate into their information systems in order to deliver management information to their users. This study focuses on manufacturing organisations using third party distribution, since the flow of information between channel partner organisations in a supply chain (in contrast to the flow of products) provides an important link between organisations and increasingly represents a source of competitive advantage in the marketplace. The purpose of this study is to determine whether there is a significant difference in the use of sales and marketing data marts as management information delivery mechanisms in manufacturing organisations in different industries, particularly pharmaceuticals and branded consumer products. The case studies presented in this dissertation indicate that there are significant differences in the use of sales and marketing data marts between manufacturing industries, which can be ascribed to the industry, both directly and indirectly. / Thesis (MIS(Information Science))--University of Pretoria, 2002. / Information Science / MIS / unrestricted
|
20 |
AL: Unified Analytics in Domain Specific Terms
Luong, Johannes, Habich, Dirk, Lehner, Wolfgang, 13 June 2022
Data driven organizations gather information on various aspects of their endeavours and analyze that information to gain valuable insights or to increase automatization. Today, these organizations can choose from a wealth of specialized analytical libraries and platforms to meet their functional and non-functional requirements. Indeed, many common application scenarios involve the combination of multiple such libraries and platforms in order to provide a holistic perspective. Due to the scattered landscape of specialized analytical tools, this integration can result in complex and hard to evolve applications. In addition, the necessary movement of data between tools and formats can introduce a serious performance penalty. In this article we present a unified programming environment for analytical applications. The environment includes AL, a programming language that combines concepts of various common analytical domains. Further, the environment also includes a flexible compilation system that uses a language-, domain-, and platform independent program intermediate representation to separate high level application logic and physical organisation. We provide a detailed introduction of AL, establish our program intermediate representation as a generally useful abstraction, and give a detailed explanation of the translation of AL programs into workloads for our experimental shared-memory processing engine.
|