About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Data mining with the SAP NetWeaver BI accelerator

Legler, Thomas, Lehner, Wolfgang, Ross, Andrew 03 July 2023 (has links)
The new SAP NetWeaver Business Intelligence accelerator is an engine that supports online analytical processing. It performs aggregation in memory and in query runtime over large volumes of structured data. This paper first briefly describes the accelerator and its main architectural features, and cites test results that indicate its power. Then it describes in detail how the accelerator may be used for data mining. The accelerator can perform data mining in the same large repositories of data and using the same compact index structures that it uses for analytical processing. A first such implementation of data mining is described and the results of a performance evaluation are presented. Association rule mining in a distributed architecture was implemented with a variant of the BUC iceberg cubing algorithm. Test results suggest that useful online mining should be possible with wait times of less than 60 seconds on business data that has not been preprocessed.
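The abstract names a variant of the BUC iceberg cubing algorithm as the basis of the distributed association rule mining. For readers unfamiliar with BUC, the following is a minimal sketch of bottom-up iceberg cube computation with minimum-support pruning; the tuple layout, the toy dimensions and the `min_support` value are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict

def buc(rows, dims, min_support, prefix=None, start=0, out=None):
    """Bottom-up computation (BUC) of an iceberg cube.

    rows: list of tuples of dimension values; dims: dimension names.
    Emits every group-by combination whose count >= min_support,
    using '*' as the "all values" placeholder for unexpanded dimensions.
    """
    if out is None:
        out = {}
    if prefix is None:
        prefix = ('*',) * len(dims)
    out[prefix] = len(rows)                       # aggregate for this group
    for d in range(start, len(dims)):             # expand one more dimension
        partitions = defaultdict(list)
        for row in rows:
            partitions[row[d]].append(row)
        for value, part in partitions.items():
            if len(part) >= min_support:          # iceberg pruning
                child = prefix[:d] + (value,) + prefix[d + 1:]
                buc(part, dims, min_support, child, d + 1, out)
    return out

# Illustrative usage on toy (product, region, quarter) facts:
facts = [("bike", "EU", "Q1"), ("bike", "EU", "Q2"),
         ("bike", "US", "Q1"), ("car", "EU", "Q1")]
cube = buc(facts, ("product", "region", "quarter"), min_support=2)
for group, count in sorted(cube.items()):
    print(group, count)
```

The pruning step is what makes the cube an "iceberg": any group that fails the support threshold is never expanded, which is also what keeps frequent-pattern counting tractable over large fact tables.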
22

健保醫療費用審查自動化之研究 / The Research of Automatic Peer Review in National Health Insurance

王復中, Wang, Fu-Chung Unknown Date (has links)
Since its implementation, National Health Insurance (NHI) has lowered the financial barriers people face when seeking medical care, through the principles of risk sharing and social mutual assistance, and has thereby improved the health of the whole population. However, slowing revenue growth and steadily rising expenditures have pushed the system into a serious financial crisis, and under the combined pressure of the current political and economic environment NHI revenue can no longer be raised effectively, so controlling medical expenditure has become an urgent priority. Yet NHI is a social welfare policy, and cost control must not come at the expense of the quality of care. Allocating medical resources effectively, so as to reduce waste, maintain quality and lighten expenditure, therefore depends on a sound claims review system. Reviewing medical claims, however, is knowledge-intensive work in its design, analysis, control and evaluation, and the review process can only be completed with the participation of professional reviewers, so making good use of information technology to support it has become a very important issue in healthcare management. This study uses claims for colds and related illnesses from the Northern Division of the Bureau of National Health Insurance as its sample. After analysing domestic and international recommendations for claims review, it designs a new automated review mechanism and develops a prototype automated review system based on data mining. The goal is to discover common rules in the records submitted by healthcare providers and to use these rules to screen out problematic records automatically, helping the Bureau and professional reviewers focus on suspect data and carry out reviews more efficiently. The results were examined by Bureau staff and professional review physicians and judged to be practicable; besides showing that data mining can be applied effectively to medical claims review with considerable benefit, the approach also reduces the review manpower required and improves review efficiency. The study further proposes a concrete architecture and implementation steps for putting this kind of automatic anomaly-screening review into practice, as a reference for the Bureau when it builds an automated review system in the future.
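The thesis's actual screening rules are not given in the abstract; purely as an illustration of the idea of learning common billing patterns from past claims and flagging deviations, a sketch might look like the following. The claim fields, the diagnosis-to-item profile and the `min_ratio` threshold are all invented for the example.

```python
from collections import Counter, defaultdict

def learn_billing_profile(claims, min_ratio=0.05):
    """Learn, per diagnosis, which billed items are 'common' in historical claims.

    claims: iterable of (diagnosis, [billed_items]); an item counts as common
    for a diagnosis if it appears in at least min_ratio of that diagnosis's claims.
    """
    totals = Counter()
    item_counts = defaultdict(Counter)
    for diagnosis, items in claims:
        totals[diagnosis] += 1
        for item in set(items):
            item_counts[diagnosis][item] += 1
    return {
        diag: {item for item, c in counts.items() if c / totals[diag] >= min_ratio}
        for diag, counts in item_counts.items()
    }

def flag_claim(profile, diagnosis, items):
    """Return the billed items that fall outside the learned profile."""
    common = profile.get(diagnosis, set())
    return [item for item in items if item not in common]

history = [("cold", ["consult", "antipyretic"]),
           ("cold", ["consult", "cough_syrup"]),
           ("cold", ["consult", "antipyretic", "cough_syrup"])]
profile = learn_billing_profile(history, min_ratio=0.5)
print(flag_claim(profile, "cold", ["consult", "mri_scan"]))  # -> ['mri_scan']
```

A filter of this kind does not decide that a claim is wrong; it only narrows the set of records a professional reviewer has to inspect, which is the efficiency gain the study reports.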
23

A Case Study In Weather Pattern Searching Using A Spatial Data Warehouse Model

Koylu, Caglar 01 June 2008 (has links) (PDF)
Data warehousing and Online Analytical Processing (OLAP) technology have been used to access, visualize and analyze multidimensional, aggregated and summarized data. A large part of this data contains spatial components, which convey valuable information and must therefore be included in the exploration and analysis phases of a spatial decision support system (SDSS). Geographic Information Systems (GISs), in turn, provide a wide range of tools to analyze spatial phenomena and must likewise be included in the analysis phases of a decision support system (DSS). In this regard, this study addresses the problem of how to design a spatially enabled data warehouse architecture that supports spatio-temporal analysis and exploration of multidimensional data. To that end, the concepts of OLAP and GISs are synthesized in an integrated fashion, building a spatial data warehouse model that draws on the strengths of both systems. A multidimensional spatio-temporal data model is proposed as the result of this synthesis; it addresses the integration of spatial, non-spatial and temporal data and facilitates spatial data exploration and analysis. The model is evaluated by implementing a case study in weather pattern searching.
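As a concrete, if simplified, illustration of what a spatially enabled cube query can look like (this is not the thesis's model), the sketch below stores weather facts with spatial and temporal dimensions and rolls the measure up by month inside a spatial window; the schema and the bounding-box predicate are assumptions for the example.

```python
from collections import defaultdict

# Toy fact table: (station_id, (lat, lon), year, month, temperature_c)
facts = [
    ("st1", (39.9, 32.8), 2007, 1, -2.0),
    ("st2", (38.4, 27.1), 2007, 1, 8.5),
    ("st1", (39.9, 32.8), 2007, 2, 1.5),
]

def in_bbox(point, bbox):
    """Simple spatial predicate: is (lat, lon) inside the bounding box?"""
    (lat, lon), (min_lat, min_lon, max_lat, max_lon) = point, bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

def rollup_avg_temp(facts, bbox):
    """Roll temperatures up to (year, month) over facts inside the region."""
    sums, counts = defaultdict(float), defaultdict(int)
    for station, point, year, month, temp in facts:
        if in_bbox(point, bbox):
            sums[(year, month)] += temp
            counts[(year, month)] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Average monthly temperature for stations inside an illustrative bounding box:
print(rollup_avg_temp(facts, bbox=(38.0, 30.0, 41.0, 35.0)))
```

In a full spatial data warehouse the bounding-box test would be replaced by proper spatial dimension members and GIS operators, but the interplay of a spatial filter with an OLAP roll-up is the same.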
24

Developing an XML-based, exploitable linguistic database of the Hebrew text of Gen. 1:1-2:3

Kroeze, J.H. (Jan Hendrik) 28 July 2008 (has links)
The thesis discusses a series of related techniques that prepare and transform raw linguistic data for advanced processing in order to unveil hidden grammatical patterns. A three-dimensional array is identified as a suitable data structure to build a data cube to capture multidimensional linguistic data in a computer's temporary storage facility. It also enables online analytical processing, like slicing, to be executed on this data cube in order to reveal various subsets and presentations of the data. XML is investigated as a suitable mark-up language to permanently store such an exploitable databank of Biblical Hebrew linguistic data. This concept is illustrated by tagging a phonetic transcription of Genesis 1:1-2:3 on various linguistic levels and manipulating this databank. Transferring the data set between an XML file and a three-dimensional array creates a stable environment allowing editing and advanced processing of the data in order to confirm existing knowledge or to mine for new, yet undiscovered, linguistic features. Two experiments are executed to demonstrate possible text-mining procedures. Finally, visualisation is discussed as a technique that enhances interaction between the human researcher and the computerised technologies supporting the process of knowledge creation. Although the data set is very small, there are exciting indications that the compilation and analysis of aggregate linguistic data may assist linguists to perform rigorous research, for example regarding the definitions of semantic functions and the mapping of these functions onto the syntactic module. / Thesis (PhD (Information Technology))--University of Pretoria, 2008. / Information Science / unrestricted
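As a rough illustration of the data structure the abstract describes (a three-dimensional array of clause x phrase x linguistic-module annotations, exchanged with an XML file), the sketch below uses plain nested lists and xml.etree.ElementTree; the tag names and the tiny sample annotations are illustrative, not the thesis's actual markup.

```python
import xml.etree.ElementTree as ET

# cube[clause][phrase][level] with levels: 0 = phonetic, 1 = syntactic, 2 = semantic
LEVELS = ("phonetic", "syntactic", "semantic")
cube = [
    [["bereshit", "adjunct", "time"], ["bara", "predicate", "action"]],
    [["elohim", "subject", "agent"], ["et hashamayim", "object", "patient"]],
]

def slice_level(cube, level):
    """OLAP-style slice: fix the 'level' axis and return a 2-D clause x phrase view."""
    return [[phrase[level] for phrase in clause] for clause in cube]

def to_xml(cube):
    """Serialise the cube to an XML tree for permanent storage."""
    root = ET.Element("text")
    for c, clause in enumerate(cube):
        c_el = ET.SubElement(root, "clause", n=str(c))
        for phrase in clause:
            p_el = ET.SubElement(c_el, "phrase")
            for name, value in zip(LEVELS, phrase):
                ET.SubElement(p_el, name).text = value
    return root

def from_xml(root):
    """Rebuild the three-dimensional array from the XML tree."""
    return [[[p.findtext(name) for name in LEVELS]
             for p in clause.findall("phrase")]
            for clause in root.findall("clause")]

print(slice_level(cube, 1))               # syntactic functions only
assert from_xml(to_xml(cube)) == cube     # stable XML round trip
```

The round trip between the in-memory array (for slicing and mining) and the XML file (for durable, exploitable storage) mirrors the workflow the abstract outlines.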
25

Analýza veřejně dostupných dat Českého statistického úřadu / Analysis of Public Data of the Czech Statistical Office

Pohl, Ondřej January 2017 (has links)
The aim of this thesis is the analysis of Czech Statistical Office data concerning foreign trade. The reader is first introduced to Business Intelligence and data warehousing, and the basics of OLAP analysis and data mining are explained. The remaining parts of the thesis describe and analyse the foreign-trade data with the help of OLAP technology and data mining in MS SQL Server, including the implementation of selected analytical tasks.
26

Klient pro zobrazování OLAP kostek / Client for Displaying OLAP Cubes

Podsedník, Lukáš January 2010 (has links)
The project first describes the basics and uses of data warehousing and of the OLAP techniques and operations applied within data warehouses. It then describes one of the commercial OLAP clients; based on the features of this product, a requirements analysis for a freeware client for displaying OLAP cubes is carried out and the functionality to be implemented in the client is chosen. From the requirements analysis, the structural design of the application (including UML diagrams) is derived, and the most suitable of the compared libraries, frameworks and development environments is chosen for the design. The next chapter covers the implementation and the tools and frameworks used in it. Finally, the thesis assesses the results achieved and the options for further improvement.
27

Multiuživatelský systém pro podporu znovuvyužití materiálů / Multiuser System for Material Reusing

Kolarik, Petr January 2007 (has links)
This text documents a multi-user system that supports the reuse of materials. It discusses possible structures for the system according to its functional specification, and its implementation in PHP together with the MySQL database system. It follows the creation of the system from the ER diagram through the use-case diagram to the programming itself, and shows how to design a web advertisement system that lets a user define personal multi-level views of the data. The project could serve as the basis for a commercial project, in which the usability of the designed structure of its individual parts can be verified.
28

Optimisation des performances dans les entrepôts distribués avec Mapreduce : traitement des problèmes de partionnement et de distribution des données / Optimizing data management for large-scale distributed data warehouses using MapReduce

Arres, Billel 08 February 2016 (has links)
Dans ce travail de thèse, nous abordons les problèmes liés au partitionnement et à la distribution des grands volumes d’entrepôts de données distribués avec Mapreduce. Dans un premier temps, nous abordons le problème de la distribution des données. Dans ce cas, nous proposons une stratégie d’optimisation du placement des données, basée sur le principe de la colocalisation. L’objectif est d’optimiser les traitements lors de l’exécution des requêtes d’analyse à travers la définition d’un schéma de distribution intentionnelle des données permettant de réduire la quantité des données transférées entre les noeuds lors des traitements, plus précisément lors de la phase de tri (shuffle). Nous proposons dans un second temps une nouvelle démarche pour améliorer les performances du framework Hadoop, qui est l’implémentation standard du paradigme Mapreduce. Celle-ci se base sur deux principales techniques d’optimisation. La première consiste en un pré-partitionnement vertical des données entreposées, réduisant ainsi le nombre de colonnes dans chaque fragment. Ce partitionnement sera complété par la suite par un autre partitionnement d’Hadoop, qui est horizontal, appliqué par défaut. L’objectif dans ce cas est d’améliorer l’accès aux données à travers la réduction de la taille des différents blocs de données. La seconde technique permet, en capturant les affinités entre les attributs d’une charge de requêtes et ceux de l’entrepôt, de définir un placement efficace de ces blocs de données à travers les noeuds qui composent le cluster. Notre troisième proposition traite le problème de l’impact du changement de la charge de requêtes sur la stratégie de distribution des données, du moment que cette dernière dépend étroitement des affinités des attributs des requêtes et de l’entrepôt. Nous avons proposé, à cet effet, une approche dynamique qui permet de prendre en considération les nouvelles requêtes d’analyse qui parviennent au système. Pour pouvoir intégrer l’aspect de "dynamicité", nous avons utilisé un système multi-agents (SMA) pour la gestion automatique et autonome des données entreposées, et cela, à travers la redéfinition des nouveaux schémas de distribution et de la redistribution des blocs de données. Enfin, pour valider nos contributions nous avons conduit un ensemble d’expérimentations pour évaluer nos différentes approches proposées dans ce manuscrit. Nous étudions l’impact du partitionnement et de la distribution intentionnelle sur le chargement des données, l’exécution des requêtes d’analyses, la construction de cubes OLAP, ainsi que l’équilibrage de la charge (Load Balancing). Nous avons également défini un modèle de coût qui nous a permis d’évaluer et de valider la stratégie de partitionnement proposée dans ce travail. / In this manuscript, we addressed the problems of data partitioning and distribution for large scale data warehouses distributed with MapReduce. First, we address the problem of data distribution. In this case, we propose a strategy to optimize data placement on distributed systems, based on the collocation principle. The objective is to optimize query performance through the definition of an intentional data distribution schema that reduces the amount of data transferred between nodes during processing, specifically during MapReduce's shuffling phase. Secondly, we propose a new approach to improve data partitioning and placement in distributed file systems, especially Hadoop-based systems, Hadoop being the standard implementation of the MapReduce paradigm. The aim is to overcome the default data partitioning and placement policies, which do not take any relational data characteristics into account. Our proposal proceeds in two steps. Based on the query workload, it defines an efficient partitioning schema. After that, the system defines a data distribution schema that best meets the user's needs by collocating data blocks on the same or closest nodes. The objective in this case is to optimize query execution and parallel processing performance by improving data access. Our third proposal addresses the problem of workload dynamicity, since users' analytical needs evolve over time. In this case, we propose the use of multi-agent systems (MAS) as an extension of our data partitioning and placement approach. Through the autonomy and self-control that characterize MAS, we developed a platform that automatically defines new distribution schemas as new queries arrive in the system, and applies a data rebalancing according to the new schema. This relieves the system administrator of the burden of managing load balancing, besides improving query performance by adopting careful data partitioning and placement policies. Finally, to validate our contributions we conduct a set of experiments to evaluate the different approaches proposed in this manuscript. We study the impact of intentional data partitioning and distribution on the data warehouse loading phase, the execution of analytical queries, OLAP cube construction, as well as load balancing. We also defined a cost model that allowed us to evaluate and validate the partitioning strategy proposed in this work.
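The abstract outlines a workload-driven step that groups warehouse attributes into vertical fragments according to how often they are queried together. A minimal sketch of that kind of affinity grouping (not the author's actual algorithm, which also handles block placement across Hadoop nodes) might look like the following; the greedy pairing rule, the fragment size limit and the workload format are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def attribute_affinities(workload):
    """Count how often each pair of attributes co-occurs in the query workload."""
    affinity = Counter()
    for query_attrs in workload:
        for a, b in combinations(sorted(set(query_attrs)), 2):
            affinity[(a, b)] += 1
    return affinity

def greedy_fragments(attributes, affinity, max_size):
    """Greedily merge attributes with the strongest affinity into vertical fragments."""
    fragments = {a: {a} for a in attributes}          # start with singleton fragments
    for (a, b), _ in affinity.most_common():
        fa, fb = fragments[a], fragments[b]
        if fa is not fb and len(fa) + len(fb) <= max_size:
            merged = fa | fb
            for attr in merged:
                fragments[attr] = merged               # union the two fragments
    return {frozenset(f) for f in fragments.values()}

workload = [("date", "store", "sales"), ("date", "sales"), ("customer", "region")]
aff = attribute_affinities(workload)
print(greedy_fragments({"date", "store", "sales", "customer", "region"}, aff, max_size=3))
```

Attributes that are frequently requested together end up in the same fragment, so a query touching only those attributes reads smaller blocks, which is the access-path benefit the thesis targets before deciding where to place the blocks in the cluster.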
29

雲端服務中銷售員支援之研究 / A study on sales force support in cloud service

翁玉麟 Unknown Date (has links)
客戶關係管理(Customer Relationship Management, CRM)藉由各種資訊技術來留住客戶,以產生更多的商業價值。然而,許多文獻指出,CRM系統的失敗率很高,尤其是CRM主要的核心能力--銷售員自動化(Sales Force Automation, SFA)。研究指出改善的方式包含更好的管理支援、培訓、系統易用性和強烈的使用動機等等。接續此建議,本文提出了一個銷售員支援(Sales Force Support, SFS)系統,藉由線上分析處理(Online Analytical Processing, OLAP)、資料採礦(Data Mining, DM)和雲端服務(Cloud Service)等技術,協助彙整及提供支援銷售員的客戶推薦 (Customer Recommendation)和自我績效評估(Self Evaluation)功能,以刺激更好的銷售能力、滿足客戶與管理。可望提高系統的易用性和業務人員的使用動機,藉以橋接銷售員和管理人員之間的差異。為了評估推薦功能之適用性,本論文也發展一套驗證指標,並採用一套隨機數學模型(Stochastic Mathematical Model),作為強化推薦預測之嘗試。 / Customer Relationship Management (CRM) adopts various information technologies to retain and attract customers in order to generate more business value. However, earlier studies indicate that the failure rate of CRM systems is high, and that it is even higher for Sales Force Automation (SFA), a major core of CRM. They usually suggest improvements such as better management support, more training, greater user friendliness, stronger usage motivation, and so on. Following these suggestions, this research proposes a Sales Force Support (SFS) system that integrates technologies such as OLAP (Online Analytical Processing), Data Mining (DM) and cloud services to provide supporting information for customer recommendation and self-evaluation, in order to better stimulate sales and satisfy both customers and management. These objectives are pursued by enhancing user friendliness and usage motivation and by bridging the differences between the sales force and management. To evaluate the fitness of the recommendation function, a set of validation measures is also developed, and a stochastic mathematical model is attempted as a way to enhance the recommendation prediction.
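The abstract does not spell out how the self-evaluation feature works; as one hedged illustration of the OLAP-style comparison it implies, the sketch below aggregates each salesperson's revenue by quarter and compares it against the peer average. The names, the data layout and the comparison metric are assumptions, not the thesis's design.

```python
from collections import defaultdict

def quarterly_revenue(orders):
    """Aggregate order revenue into a (salesperson, quarter) -> total cube slice."""
    cube = defaultdict(float)
    for salesperson, quarter, amount in orders:
        cube[(salesperson, quarter)] += amount
    return cube

def self_evaluation(cube, salesperson, quarter):
    """Compare one salesperson's total with the peer average for the same quarter."""
    own = cube.get((salesperson, quarter), 0.0)
    peers = [v for (s, q), v in cube.items() if q == quarter and s != salesperson]
    peer_avg = sum(peers) / len(peers) if peers else 0.0
    return {"own": own, "peer_avg": peer_avg,
            "ratio": own / peer_avg if peer_avg else None}

orders = [("alice", "2011Q1", 120.0), ("bob", "2011Q1", 80.0), ("carol", "2011Q1", 100.0)]
print(self_evaluation(quarterly_revenue(orders), "alice", "2011Q1"))
# -> own 120.0, peer_avg 90.0, ratio ~1.33
```

Feedback of this kind, delivered through a cloud service, is one plausible way to raise the usage motivation the abstract identifies as a key failure factor for SFA.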
30

Plan Bouquets : An Exploratory Approach to Robust Query Processing

Dutt, Anshuman January 2016 (has links) (PDF)
Over the last four decades, relational database systems, with their mathematical basis in first-order logic, have provided a congenial and efficient environment to handle enterprise data during its entire life cycle of generation, storage, maintenance and processing. An organic reason for their pervasive popularity is intrinsic support for declarative user queries, wherein the user only specifies the end objectives, and the system takes on the responsibility of identifying the most efficient means, called “plans”, to achieve these objectives. A crucial input to generating efficient query execution plans is the set of compile-time estimates of the data volumes that are output by the operators implementing the algebraic predicates present in the query. These volume estimates are typically computed using the “selectivities” of the predicates. Unfortunately, a pervasive problem encountered in practice is that these selectivities often differ significantly from the values actually encountered during query execution, leading to poor plan choices and grossly inflated response times. While the database research community has spent considerable effort to address the above challenge, the prior techniques all suffer from a systemic limitation: the inability to provide any guarantees on the execution performance. In this thesis, we materially address this long-standing open problem by developing a radically different query processing strategy that lends itself to attractive guarantees on run-time performance. Specifically, in our approach, the compile-time estimation process is completely eschewed for error-prone selectivities. Instead, from the set of optimal plans in the query’s selectivity error space, a limited subset, called the “plan bouquet”, is selected such that at least one of the bouquet plans is 2-optimal at each location in the space. Then, at run time, an exploratory sequence of cost-budgeted executions from the plan bouquet is carried out, eventually finding a plan that executes to completion within its assigned budget. The duration and switching of these executions are controlled by a graded progression of isosurfaces projected onto the optimal performance profile. We prove that this construction provides viable guarantees on the worst-case performance relative to an oracular system that magically possesses accurate a priori knowledge of all selectivities. Moreover, it ensures repeatable execution strategies across different invocations of a query, an extremely desirable feature in industrial settings. Our second contribution is a suite of techniques that substantively improve on the performance guarantees offered by the basic bouquet algorithm. First, we present an algorithm that skips carefully chosen executions from the basic plan bouquet sequence, leveraging the observation that an expensive execution may provide better coverage as compared to a series of cheaper siblings, thereby reducing the aggregate exploratory overheads. Next, we explore randomized variants with regard to both the sequence of plan executions and the constitution of the plan bouquet, and show that the resulting guarantees are markedly superior, in expectation, to the corresponding worst case values. From a deployment perspective, the above techniques are appealing since they are completely “black-box”, that is, non-invasive with regard to the database engine, implementable using only API features that are commonly available in modern systems.
As a proof of concept, the bouquet approach has been fully prototyped in QUEST, a Java-based tool that provides a visual and interactive demonstration of the bouquet identification and execution phases. In a similar spirit, we propose an efficient isosurface identification algorithm that avoids exploration of large portions of the error space and drastically reduces the effort involved in bouquet construction. The plan bouquet approach is ideally suited for “canned” query environments, where the computational investment in bouquet identification is amortized over multiple query invocations. The final contribution of this thesis is extending the advantage of compile-time sub-optimality guarantees to ad hoc query environments where the overheads of the off-line bouquet identification may turn out to be impractical. Specifically, we propose a completely revamped bouquet algorithm that constructs the cost-budgeted execution sequence in an “on-the-fly” manner. This is achieved through a “white-box” interaction style with the engine, whereby the plan output cardinalities exposed by the engine are used to compute lower bounds on the error-prone selectivities during plan executions. For this algorithm, the sub-optimality guarantees are in the form of a low order polynomial of the number of error-prone selectivities in the query. The plan bouquet approach has been empirically evaluated on both PostgreSQL and a commercial engine ComOpt, over the TPC-H and TPC-DS benchmark environments. Our experimental results indicate that it delivers orders of magnitude improvements in the worst-case behavior, without impairing the average-case performance, as compared to the native optimizers of these systems. In absolute terms, the worst case sub-optimality is upper bounded by 20 across the suite of queries, and the average performance is empirically found to be within a factor of 4 with respect to the optimal. Even with the on-the-fly bouquet algorithm, the guarantees are found to be within a factor of 3 as compared to those achievable in the corresponding canned query environment. Overall, the plan bouquet approach provides novel performance guarantees that open up exciting possibilities for robust query processing.
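To convey the control flow of the bouquet execution phase described above, here is a highly simplified simulation (not the QUEST prototype or the thesis's actual algorithm): plans from the bouquet are run under geometrically growing cost budgets until one finishes within its budget. The cost model, the doubling factor and the plan abstraction are assumptions for illustration.

```python
def bouquet_execute(bouquet, true_cost, start_budget=1.0, growth=2.0):
    """Run bouquet plans under increasing cost budgets until one completes.

    bouquet: list of plan ids ordered by the cost at which they become optimal.
    true_cost: maps plan id -> actual execution cost for the (unknown) selectivities.
    A plan 'completes' if its true cost fits within the current budget; otherwise
    it is aborted once the budget is spent and the next budget level is tried.
    """
    budget = start_budget
    spent = 0.0
    while True:
        for plan in bouquet:
            cost = true_cost[plan]
            if cost <= budget:               # plan finishes within this budget
                return plan, spent + cost
            spent += budget                  # partial execution aborted at the budget
        budget *= growth                     # move to the next iso-cost surface

# Toy bouquet of three plans whose actual costs are only known at run time:
plan, total = bouquet_execute(["P1", "P2", "P3"],
                              {"P1": 40.0, "P2": 9.0, "P3": 70.0})
print(plan, total)   # P2 completes once the budget reaches 16; total includes aborted work
```

In the actual approach only the plans that are optimal somewhere on the current isosurface are run at each budget level, and the geometric growth of the budgets is what yields the bounded worst-case sub-optimality; the loop above compresses all of that into a toy simulation.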
