1 |
A Personalized Smart Cube for Faster and Reliable Access to Data
Antwi, Daniel K. 02 December 2013
Organizations own data sources that contain millions, billions or even trillions of rows, and these data are usually highly dimensional in nature. Typically, these raw repositories comprise numerous independent data sources that are too big to be copied or joined, with the consequence that aggregations become highly problematic. Data cubes play an essential role in facilitating fast Online Analytical Processing (OLAP) in many multi-dimensional data warehouses. Current data cube computation techniques have had some success in addressing the above-mentioned aggregation problem. However, the combined problem of reducing data cube size for very large and highly dimensional databases, while guaranteeing fast query response times, has received less attention. Another issue is that OLAP tools often cause users to become lost in the ocean of data while performing data analysis. Most users are interested in only a subset of the data. For example, consider a business manager who wants to answer the crucial location-related business question, "Why are my sales declining at location X?" This manager wants fast, unambiguous, location-aware answers to his queries. He requires access to only the relevant filtered information, as found from the attributes that are directly correlated with his current needs. Therefore, it is important to determine and to extract only that small data subset that is highly relevant from a particular user's location and perspective.
In this thesis, we present the Personalized Smart Cube approach to address the above-mentioned scenario. Our approach consists of two main parts. Firstly, we combine vertical partitioning, partial materialization and dynamic computation to drastically reduce the size of the computed data cube while guaranteeing fast query response times. Secondly, our personalization algorithm dynamically monitors user query patterns and creates a personalized data cube for each user. This ensures that users utilize only that small subset of data that is most relevant to them.
Our experimental evaluation of the Personalized Smart Cube approach showed that our work compared favorably with other state-of-the-art methods. We evaluated our work on three main criteria, namely the storage space used, the query response time and the cost savings ratio of using a personalized cube. The results showed that our algorithm materializes a smaller number of views than other techniques and that it also compared favorably in terms of query response time. Further, our personalization algorithm is superior to the state-of-the-art Virtual Cube algorithm when evaluated in terms of the number of user queries that were successfully answered using a personalized cube instead of the base cube.
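The combination of partial materialization and dynamic computation described in this abstract can be illustrated with a small sketch. It is not the thesis's algorithm: the fact table, the `PartialCube` class and the choice of which view to precompute are all assumptions made for illustration, and vertical partitioning and personalization are left out. Only selected group-by views are stored; any other group-by is computed on demand from the smallest stored view that contains the requested dimensions, falling back to the base facts.

```python
from collections import defaultdict

# Hypothetical fact table: (location, product, year, sales). Values are made up.
DIMS = ("location", "product", "year")
FACTS = [
    ("Ottawa", "laptop", 2012, 10),
    ("Ottawa", "phone", 2012, 5),
    ("Toronto", "laptop", 2012, 7),
    ("Toronto", "laptop", 2013, 3),
]


def aggregate(rows, src_dims, dst_dims):
    """Roll rows keyed by src_dims up to the coarser group-by dst_dims (SUM)."""
    idx = [src_dims.index(d) for d in dst_dims]
    out = defaultdict(int)
    for *key, measure in rows:
        out[tuple(key[i] for i in idx)] += measure
    return [(*k, v) for k, v in out.items()]


class PartialCube:
    """Stores only the chosen views; everything else is computed when queried."""

    def __init__(self, facts, dims, materialized):
        self.facts = facts
        self.dims = tuple(dims)
        # Partial materialization: precompute just the listed group-bys.
        self.views = {
            tuple(v): aggregate(facts, self.dims, tuple(v)) for v in materialized
        }

    def query(self, group_by):
        group_by = tuple(group_by)
        # Stored views that contain every requested dimension can answer the query.
        candidates = [v for v in self.views if set(group_by) <= set(v)]
        if candidates:
            # Dynamic computation from the smallest usable stored view.
            src = min(candidates, key=lambda v: len(self.views[v]))
            rows = self.views[src]
        else:
            # No stored view helps, so aggregate the base facts.
            src, rows = self.dims, self.facts
        return {r[:-1]: r[-1] for r in aggregate(rows, src, group_by)}


cube = PartialCube(FACTS, DIMS, materialized=[("location", "product")])
sales_by_location = cube.query(("location",))  # answered from the stored view
sales_by_year = cube.query(("year",))          # computed from the base facts
```

In this toy setting the trade-off is visible directly: storing fewer views saves space, while queries that miss every stored view pay the cost of scanning the base data.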
|
2 |
A Personalized Smart Cube for Faster and Reliable Access to Data
Antwi, Daniel K. January 2013
|
3 |
Analýza a vizualizace statistických Linked Data / Analysing and Visualizing Statistical Linked Data
Helmich, Jiří January 2013
The thesis describes several ways of processing statistical data in the context of Linked Data, focusing in particular on the use of the Data Cube Vocabulary (DCV) metaformat. It describes tools for analysing and visualizing RDF data, not only from a statistical point of view. An integral part of the work is a description of the Payola tool, which the author is still developing. The main outcome of the thesis is the design and subsequent implementation of a system that converts RDF data so that they comply with the DCV. The system was implemented and integrated into the Payola application, and the author also implemented several other extensions of the system. The limitations arising from the integration with Payola are discussed as part of the implementation. In conclusion, the author describes a few experiments in which selected datasets were processed by the implemented system.
|
4 |
Fuzzy Spatial Data Cube Construction And Its Use In Association Rule Mining
Isik, Narin 01 June 2005
Spatial databases are becoming more popular because the amount of spatial data that needs to be handled has grown with the use of digital maps, satellite images, video cameras, medical equipment, sensor networks, etc. It is difficult to examine spatial data and to extract interesting knowledge from them; hence, applications that assist decision-making about spatial data, such as weather forecasting, traffic supervision and mobile communication, have been introduced. In this thesis, more natural and precise knowledge is generated from spatial data by constructing a fuzzy spatial data cube and extracting fuzzy association rules from it, in order to improve decision-making about spatial data. This involves extensive research into spatial knowledge discovery and how fuzzy logic can be used to develop it. We argue that incorporating fuzzy logic into spatial data cube construction necessitates a new method for aggregating fuzzy spatial data. We illustrate how this method also enhances the meaning of fuzzy spatial generalization rules and fuzzy association rules with a case study on weather pattern searching. This study contributes to spatial knowledge discovery by generating more understandable and interesting knowledge from spatial data: it extends spatial generalization with fuzzy memberships, extends the spatial aggregation in spatial data cube construction by utilizing weighted measures, and generates fuzzy association rules from the constructed fuzzy spatial data cube.
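The abstract does not spell out its aggregation method, so the following is only a generic sketch of the two ideas it names: a measure aggregated with fuzzy memberships as weights, and the support of a fuzzy association rule. The triangular membership function, the breakpoints of the "hot" and "humid" sets, the sample records and the min-based support are all assumptions made for illustration.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def hot(temperature_c):
    # Hypothetical fuzzy set "hot", peaking at 35 degrees Celsius.
    return triangular(temperature_c, 20, 35, 50)


def humid(humidity_pct):
    # Hypothetical fuzzy set "humid", peaking at 90 percent.
    return triangular(humidity_pct, 40, 90, 140)


# Hypothetical spatial cells: (temperature in Celsius, relative humidity in %).
RECORDS = [(30.0, 80.0), (25.0, 60.0), (35.0, 90.0), (20.0, 40.0)]


def weighted_mean(values, weights):
    """Aggregate a measure using membership degrees as weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        return None  # no record belongs to the fuzzy set at all
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def fuzzy_support(records, memberships):
    """Mean over records of the minimum membership across the rule's items."""
    degrees = [min(m(v) for m, v in zip(memberships, rec)) for rec in records]
    return sum(degrees) / len(records)


temperatures = [t for t, _ in RECORDS]
mean_hot_temperature = weighted_mean(temperatures, [hot(t) for t in temperatures])
support_hot_and_humid = fuzzy_support(RECORDS, (hot, humid))
```

Each record contributes to an aggregate in proportion to how strongly it belongs to the fuzzy set, instead of being counted fully or not at all as in a crisp cube.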
|
5 |
A Recursive Relative Prefix Sum Approach to Range Queries in Data Warehouses
Wu, Fa-Jung 07 July 2002
Data warehouses contain data consolidated from several operational databases and provide historical, summarized data, which are more appropriate for analysis than detailed, individual records. On-Line Analytical Processing (OLAP) provides advanced analysis tools to extract information from data stored in a data warehouse.
OLAP is designed to provide aggregate information that can be used to analyze the contents of databases and data warehouses. A range query applies an aggregation operation over all selected cells of an OLAP data cube, where the selection is specified by providing ranges of values for numeric dimensions. Range sum queries are very useful for finding trends and discovering relationships between attributes in the database. One method, the prefix sum method, promises that any range sum query on a data cube can be answered in constant time by precomputing some auxiliary information. However, it is hampered by its update cost. Today's interactive data analysis applications, which provide current or "near current" information, require fast response times and reasonable update times. Since the size of a data cube is exponential in the number of its dimensions, rebuilding the entire data cube can be very costly and is not realistic. To cope with this dynamic data cube problem, several strategies have been proposed. They all use specific data structures, which require extra storage, to answer range sum queries quickly. For example, the double relative prefix sum method makes use of three components to store auxiliary information: a block prefix array, a relative overlay array and a relative prefix array. Although the double relative prefix sum method improves the update cost, it increases the query time.
In this thesis, we present a method, called the recursive relative prefix sum method, which tries to provide a compromise between query cost and update cost. In the recursive relative prefix sum method with k levels, we use a relative prefix array and k relative overlay arrays. Our performance study shows that the update cost of our method is always less than that of the prefix sum method. In most cases, the update cost of our method is less than that of the relative prefix sum method. Moreover, in most cases, the query cost of our method is less than that of the double relative prefix sum method. Compared with the dynamic data cube method, our method has a lower storage cost and a shorter query time. Consequently, our recursive relative prefix sum method has a reasonable response time for ad hoc range queries on the data cube while, at the same time, greatly reducing the update cost. In some applications, however, updates may happen more frequently in some regions than in others. We also provide a solution for this situation, called the weighted relative prefix sum method. This method can therefore also provide a compromise between the range sum query cost and the update cost when the update probabilities of different regions are considered.
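The query/update trade-off that motivates this thesis can be seen in a minimal sketch of the baseline prefix sum method on a two-dimensional cube. This is the plain prefix sum method only, not the recursive relative prefix sum method, whose array layout the abstract does not give; the function names and the 3 x 3 example cube are assumptions for illustration.

```python
def build_prefix(cube):
    """Return P with one extra zero row and column, where
    P[i + 1][j + 1] is the sum of cube[0..i][0..j]."""
    rows, cols = len(cube), len(cube[0])
    p = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            p[i + 1][j + 1] = cube[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p


def range_sum(p, r1, c1, r2, c2):
    """Sum of cells r1..r2 by c1..c2 (inclusive) from four lookups,
    independent of the size of the range."""
    return p[r2 + 1][c2 + 1] - p[r1][c2 + 1] - p[r2 + 1][c1] + p[r1][c1]


def update(p, r, c, delta):
    """Add delta to cell (r, c). Every prefix entry at or beyond that cell
    has to change; the number of entries touched is returned."""
    touched = 0
    for i in range(r + 1, len(p)):
        for j in range(c + 1, len(p[0])):
            p[i][j] += delta
            touched += 1
    return touched


example_cube = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
prefix = build_prefix(example_cube)
lower_right_block = range_sum(prefix, 1, 1, 2, 2)   # cells 5, 6, 8, 9
entries_rewritten = update(prefix, 0, 0, 10)        # worst case: whole array
```

Changing the cell at the origin rewrites every entry of the prefix array, which is the update cost the relative and recursive relative variants try to reduce by confining changes to blocks.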
|
6 |
Preprocessing unbounded data for use in real time visualization: Building a visualization data cube of unbounded data
Hallman, Isabelle January 2019
This thesis evaluates the viability of a data cube as a basis for visualization of unbounded data. A cube designed for use with visualization of static data was adapted to allow for point-by-point insertions. The new cube was evaluated by measuring the time it took to insert different numbers of data points. The results indicate that the cube can keep up with data streams with a velocity of up to approximately 100 000 data points per second. The conclusion is that the cube is useful if the velocity of the data stream is within this bound and if the represented dimensions are sufficiently coarse-grained.
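Point-by-point insertion into an aggregating cube can be sketched as follows. The thesis's actual cube structure is not described in the abstract, so the `StreamingCube` class, its dictionary of cells and the fixed bin widths are assumptions for illustration. Each arriving point is binned along every dimension and folded into a running count and sum, so memory depends on the number of occupied cells and not on the length of the stream.

```python
from collections import defaultdict


class StreamingCube:
    """Hypothetical cube that keeps (count, sum) per binned cell."""

    def __init__(self, bin_widths):
        # One bin width per dimension; wider bins give a coarser cube.
        self.bin_widths = tuple(bin_widths)
        self.cells = defaultdict(lambda: [0, 0.0])

    def cell_key(self, coords):
        """Map raw coordinates to the integer index of their bin."""
        return tuple(int(c // w) for c, w in zip(coords, self.bin_widths))

    def insert(self, coords, value):
        """Fold one data point into its cell without storing the point."""
        cell = self.cells[self.cell_key(coords)]
        cell[0] += 1
        cell[1] += value

    def mean(self, key):
        """Average value in a cell, or None if nothing was inserted there."""
        cell = self.cells.get(key)
        return None if cell is None else cell[1] / cell[0]


stream_cube = StreamingCube(bin_widths=(10.0, 1.0))
for coords, value in [((12.5, 3.0), 4.0), ((17.0, 3.9), 6.0), ((25.0, 0.2), 1.0)]:
    stream_cube.insert(coords, value)
```

The first two points fall into the same cell, so three insertions occupy only two cells; coarser bins reduce the number of cells further, which matches the abstract's condition on the granularity of the dimensions.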
|
7 |
ETANA-CMV: A coordinated multiple view visual browsing interface for ETANA-DL
Sam Rajkumar, Johnny L. 21 February 2007
Archeological research embracing complex Information Technology techniques can result in vast quantities of heterogeneous information from different sites in different formats. ETANA-DL is an Archeological Digital Library (DL), providing services suited for the archeological domain. With a growing collection of records in the DL, it is a challenge to present them in an organized and meaningful way.
We have designed a new visual browsing interface called ETANA-CMV that aims to provide users a richer and more insightful browsing experience. ETANA-CMV allows users to navigate through the records in ETANA-DL that are multidimensional, hierarchical, and categorical in nature. ETANA-CMV was designed to be scalable, flexible, and easy to learn.
This interface employs a data cube based browsing index to counter performance issues that usually limit the interactivity of visual browsing interfaces to DLs. The interface has been integrated with the existing Browse Interface and the search service in ETANA-DL. Formative evaluation of the new visual interface led to several improvements in the interface. It appears that users were able to detect trends in the DL collections more accurately using visualization based strategies than with the existing textual browse interface. / Master of Science
|
8 |
[en] RDXEL: A TOOLKIT FOR RDF STATISTICAL DATA MANIPULATION THROUGH SPREADSHEETS / [pt] RDXEL: UM CONJUNTO DE FERRAMENTAS PARA MANIPULAÇÃO DE DADOS ESTATÍSTICOS EM RDF POR MEIO DE PLANILHAS
MARCIA LUCAS PESCE 03 May 2016
Statistical data represent one of the most important sources of information for both human activities and organizations. However, accessing, querying and correlating statistical data demand a great deal of effort, especially in situations that involve different organizations. Solutions that facilitate the manipulation and integration of large statistical databases therefore add value in this scenario. In this dissertation we propose a software framework that allows statistical data to be efficiently transformed and represented as RDF triples. Based on the Data Cube Vocabulary, the W3C standard used for the triplification process, the proposed solution makes it easy to query, analyze and reuse statistical data in RDF format. The reverse process, from RDF to Excel, is also supported, so as to offer a solution for the integration and use of RDF data in spreadsheets.
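The spreadsheet-to-RDF direction and its reverse can be sketched with plain tuples as follows. This is not the RDXEL toolkit: the `http://example.org/` namespace, the URI patterns and both function names are hypothetical. The only real terms are `qb:DataSet`, `qb:Observation` and `qb:dataSet` from the Data Cube Vocabulary, plus `rdf:type`. Literals are kept as untyped strings, and column names are assumed to be safe to embed in a URI.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for this sketch


def rows_to_triples(dataset_id, header, rows, measure_col):
    """Turn each spreadsheet row into one qb:Observation with one triple per cell."""
    dataset = f"<{EX}dataset/{dataset_id}>"
    triples = [(dataset, f"<{RDF_TYPE}>", f"<{QB}DataSet>")]
    for n, row in enumerate(rows, start=1):
        obs = f"<{EX}obs/{dataset_id}/{n}>"
        triples.append((obs, f"<{RDF_TYPE}>", f"<{QB}Observation>"))
        triples.append((obs, f"<{QB}dataSet>", dataset))
        for col, value in zip(header, row):
            kind = "measure" if col == measure_col else "dimension"
            triples.append((obs, f"<{EX}{kind}/{col}>", f'"{value}"'))
    return triples


def triples_to_rows(triples, header):
    """Reverse direction: regroup literal-valued triples into rows (as strings)."""
    by_observation = {}
    for subject, predicate, obj in triples:
        if obj.startswith('"'):  # only cell values are literals in this sketch
            column = predicate.rstrip(">").rsplit("/", 1)[1]
            by_observation.setdefault(subject, {})[column] = obj.strip('"')
    return [[cells[c] for c in header] for cells in by_observation.values()]


sheet_header = ["region", "year", "population"]
sheet_rows = [["North", "2015", "120"], ["South", "2015", "95"]]
sheet_triples = rows_to_triples("census", sheet_header, sheet_rows, "population")
round_trip = triples_to_rows(sheet_triples, sheet_header)
```

A full Data Cube description would also declare each column as a `qb:DimensionProperty` or `qb:MeasureProperty` inside a data structure definition; that part is omitted here to keep the sketch short.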
|
9 |
A Distributed Interactive Cube Exploration System
Jayachandran, Prasanth 06 August 2013
No description available.
|
10 |
[en] CATALOGUE OF LINKED DATA CUBE DESCRIPTIONS / [pt] CATÁLOGO DE DESCRIÇÕES DE CUBOS DE DADOS INTERLIGADOS
SOFIA RIBEIRO MANSO DE ABREU E SILVA 06 November 2014
Statistical data are considered one of the major sources of information and are essential in many fields, as they can work as social and economic indicators. A statistical data set comprises a collection of observations made at some points of a logical space and is often organized as what is called a data cube.
The proper definition of the data cubes, especially of their dimensions, helps in processing the observations and, more importantly, helps in combining observations from different data cubes. In this context, the Linked Data principles can be profitably applied to the definition of data cubes, in the sense that the principles offer a strategy to provide the missing semantics of the dimensions, including their values.
This dissertation first describes a mediation architecture to help describe and consume statistical data that are exposed as RDF triples but stored in relational databases. One of the features of this architecture is the Catalogue of Linked Data Cube Descriptions, which is described in detail in the dissertation. This catalogue holds a standardized RDF description of each data cube actually stored in the statistical (relational) databases. Therefore, the main discussion in this dissertation is how to represent the data cubes in RDF, i.e., how to map the database concepts to RDF in a way that makes it easy to query, analyze and reuse statistical data in the RDF format.
|