Spelling suggestions: "subject:"[een] QUERY PROCESSING"" "subject:"[enn] QUERY PROCESSING""
71 |
Ερωτήματα συνένωσης και βαθμολογημένης συνένωσης σε κατανεμημένα συστήματαΠατλάκας, Ιωάννης 28 February 2013 (has links)
Η ανάπτυξη των peer-to-peer βάσεων δεδομένων και η δυναμική εισαγωγή των συστημάτων αποθήκευσης σε νέφη υπολογιστών (cloudstores) ως τα κυρίαρχα μεγάλης κλίμακας συστήματα διαχείρισης δεδομένων, έχουν οδηγήσει τους ερευνητές να εξετάσουν το πρόβλημα της
υποστήριξης πολύπλοκων ερωτημάτων με ένα πλήρως αποκεντρωμένο τρόπο. Περίπλοκα ερωτήματα επιλογής (select), συνένωσης join, καθώς και βαθμολογημένα ερωτήματα έχουν κεντρίσει το ενδιαφέρον της κοινότητας διαχείρισης δεδομένων.
Ανάμεσα στις τάξεις των ερωτημάτων αυτών είναι το κεντρικής σημασίας top-k join. To κατανεμημένο top-k join, δεν έχει μελετηθεί επαρκώς, αν και συναντάται πολύ συχνά σε πραγματικό φόρτο εργασίας σε πολλά εμπορικά και άλλα συστήματα βάσεων δεδομένων.
Με την εργασία αυτή αντιμετωπίζουμε τέτοιου είδους ερωτήματα πάνω σε δεδομένα
που είναι κατανεμημένα σε ένα μεγάλου κλίμακας δίκτυο.
Οι συνεισφορές μας με αυτήν την εργασία περιλαμβάνουν: (α) ένα νέο κατανεμημένο ευρετήριο,
που επιτρέπει την πρόσβαση σε πλειάδες με τυχαίο και
διατεταγμένο τρόπο, (β) ένα σύνολο αλγόριθμων για βαθμολογημένα ερωτημάτατα συνένωσης join. Οι αλγόριθμοί μας
στηρίζονται στην προσαρμογή γνωστών αλγοριθμών κατωφλίου για βαθμολογημένο join
σε κατανεμημένο περιβάλλον, (γ) μία νέα χρήση των Bloom φίλτρων και ιστογραμμάτων για την περαιτέρω μείωση του εύρους ζώνης
που καταναλώνουν οι παραπάνω αλγόριθμοι, καθώς και απόδειξη για το ότι οι
αλγόριθμοί μας που βασίζονται σε φίλτρα Bloom και ιστογράμματα παράγουν το σωστό top-k αποτέλεσμα, (δ) μια σε βάθος συζήτηση του
σχεδιασμού των αλγορίθμων μας και θεμάτων που συνδέονται με τις επιδόσεις και τα trade-offs.
Επιπλέον διερευνούμε την αποτελεσματικότητα και την ποιότητα των προτεινόμενων λύσεων
μέσα από μία αναλυτική πειραματική αξιολόγηση, δείχνοντας τις περιπτώσεις που ο κάθε αλγόριθμός μας είναι κατάλληλος σε
μαζικώς κατανεμημένα και αποκεντρωμένα περιβάλλοντα, ενώ τονίζουμε τα trade-offs που προκύπτουν. / The advent of peer-to-peer databases and the recent rise of cloudstores as key large-scale data management
paradigms, have led researchers to look into the problem of
supporting complex queries in a fully decentralized manner.
Among the classes of queries considered in related centralized
work, there is one that stands out as largely overlooked in widely
distributed settings, albeit very common in real-world workloads:
top-k joins. With this work we tackle such queries over data
distributed across an internet-scale network.
Our contributions include: (a) a novel distributed indexing
scheme, allowing access to tuples in both a random and an
ordered manner; (b) a set of query processing algorithms based
on a novel adaptation of rank-join and threshold algorithms,
appropriate for use in a distributed environment; (c) a novel use
of Bloom Filters and histograms to further reduce the bandwidth
consumption of the above algorithms; a proof that ensures that
our algorithms based on Bloom filters and histograms produce
the correct top-k results; and (d) an in-depth discussion of the
design space and related performance trade-offs. We further
investigate the efficiency and quality of the proposed solutions
through an elaborate experimental evaluation, showcasing their
appropriateness for widely-distributed and massively decentralized environments and highlighting related trade-offs.
|
72 |
Análise e desenvolvimento de um novo algoritmo de junção espacial para SGBD geográficos / Analysis and design of a new algorithm to perform spatial join in geographic DBMSFornari, Miguel Rodrigues January 2006 (has links)
Um Sistema de Informação Geográfica armazena e mantém dados geográficos, combinando-os, para obter novas representações do espaço geográfico. A junção espacial combina duas relações de geometrias geo-referenciadas de acordo com algum predicado espacial, como intersecção e distância entre objetos. Trata-se de uma operação essencial, pois é constantemente utilizada e possui um alto custo de realização devido a realização de grande número de operações de Entrada/Saída e a complexidade do algoritmo. Este trabalho estuda o desempenho de algoritmos de junção espacial. Inicialmente, apresenta a análise dos algoritmos já publicados na literatura, obtendo expressões de custo para número de operações de disco e processamento. Após, descreve-se a implementação de alguns algoritmos em um ambiente de testes. Este ambiente permite ao usuário variar diversos parâmetros de entrada: cardinalidade dos conjuntos, memória disponível e predicado de junção, envolvendo dados reais e sintéticos. O ambiente de testes inclui os algoritmos de Laços Aninhados, Partition Based Spatial Join Method (PBSM), Synchronized Tree Transversal (STT) para árvores R* e Iterative Spatial Stripped Join (ISSJ). Os testes demonstraram que o STT é adequado para conjuntos pequenos de dados; o ISSJ se houver memória suficiente para ordenar os conjuntos internamente; e o PBSM se houver pouca memória disponível para buffer de dados. A partir da análise um novo algoritmo, chamado Histogram-based Hash Stripped Join (HHSJ) é apresentado. O HSSJ utiliza histogramas da distribuição dos objetos no espaço para definir o particionamento, armazena os objetos em arquivos organizados em hash e subdivide o espaço em faixas (strips) para reduzir o processamento. Os testes indicam que o HHSJ é mais rápido na maioria dos cenários, sendo ainda mais vantajoso quanto maior o número de objetos envolvidos na junção. Um módulo de otimização de consultas baseado em custos, capaz de escolher o melhor algoritmo para realizar a etapa de filtragem é descrito. O módulo utiliza informações estatísticas mantidas no dicionário de dados para estimar o tempo de resposta de cada algoritmo, e indicar o mais rápido para realizar uma operação específica. Este otimizador de consultas acertou a indicação em 88,9% dos casos, errando apenas na junção de conjuntos pequenos, quando o impacto é menor. / A Geographic Information System (GIS) stores geographic data, combining them to obtain new representations of the geographic space. The spatial join operation combines two sets of spatial features, A and B, based on a spatial predicate. It is a fundamental as well as one of the most expensive operations in GIS. Combining pairs of spatial, georreferenced data objects of two different, and probably large data sets implies the execution of a significant number of Input/Output (I/O) operations as well as a large number of CPU operations. This work presents a study about the performance of spatial join algorithms. Firstly, an analysis of the algorithms is realized. As a result, mathematical expressions are identified to predict the number of I/O operations and the algorithm complexity. After this, some of the algorithms (e.g.; Nested Loops, Partition Based Spatial Join Method (PBSM), Synchronized Tree Transversal (STT) to R-Trees and Iterative Spatial Stripped Join (ISSJ)) are implemented, allowing the execution of a series of tests in different spatial join scenarios. The tests were performed using both synthetic and real data sets. Based on the results, a new algorithm, called Histogram-based Hash Stripped Join (HHSJ), is proposed. The partitioning of the space is carried out according to the spatial distribution of the objects, maintained in histograms. In addition, a hash file is created for each input data set and used to enhance both the storage of and the access to the minimum bounding rectangles (MBR) of the respective set elements. Furthermore, the space is divided in strips, to reduce the processing time. The results showed that the new algorithm is faster in almost all scenarios, specially when bigger data sets are processed. Finally, a query optimizer based on costs, capable to choose the best algorithm to perform the filter step of a spatial join operation, is presented. The query optimizer uses statistical information stored in the data dictionary to estimate the response time for each algorithm and chooses the faster to realize the operation. This query optimizer choose the right one on 88.9% of cases, mistaken just in spatial join envolving small data sets, when the impact is small.
|
73 |
Análise e desenvolvimento de um novo algoritmo de junção espacial para SGBD geográficos / Analysis and design of a new algorithm to perform spatial join in geographic DBMSFornari, Miguel Rodrigues January 2006 (has links)
Um Sistema de Informação Geográfica armazena e mantém dados geográficos, combinando-os, para obter novas representações do espaço geográfico. A junção espacial combina duas relações de geometrias geo-referenciadas de acordo com algum predicado espacial, como intersecção e distância entre objetos. Trata-se de uma operação essencial, pois é constantemente utilizada e possui um alto custo de realização devido a realização de grande número de operações de Entrada/Saída e a complexidade do algoritmo. Este trabalho estuda o desempenho de algoritmos de junção espacial. Inicialmente, apresenta a análise dos algoritmos já publicados na literatura, obtendo expressões de custo para número de operações de disco e processamento. Após, descreve-se a implementação de alguns algoritmos em um ambiente de testes. Este ambiente permite ao usuário variar diversos parâmetros de entrada: cardinalidade dos conjuntos, memória disponível e predicado de junção, envolvendo dados reais e sintéticos. O ambiente de testes inclui os algoritmos de Laços Aninhados, Partition Based Spatial Join Method (PBSM), Synchronized Tree Transversal (STT) para árvores R* e Iterative Spatial Stripped Join (ISSJ). Os testes demonstraram que o STT é adequado para conjuntos pequenos de dados; o ISSJ se houver memória suficiente para ordenar os conjuntos internamente; e o PBSM se houver pouca memória disponível para buffer de dados. A partir da análise um novo algoritmo, chamado Histogram-based Hash Stripped Join (HHSJ) é apresentado. O HSSJ utiliza histogramas da distribuição dos objetos no espaço para definir o particionamento, armazena os objetos em arquivos organizados em hash e subdivide o espaço em faixas (strips) para reduzir o processamento. Os testes indicam que o HHSJ é mais rápido na maioria dos cenários, sendo ainda mais vantajoso quanto maior o número de objetos envolvidos na junção. Um módulo de otimização de consultas baseado em custos, capaz de escolher o melhor algoritmo para realizar a etapa de filtragem é descrito. O módulo utiliza informações estatísticas mantidas no dicionário de dados para estimar o tempo de resposta de cada algoritmo, e indicar o mais rápido para realizar uma operação específica. Este otimizador de consultas acertou a indicação em 88,9% dos casos, errando apenas na junção de conjuntos pequenos, quando o impacto é menor. / A Geographic Information System (GIS) stores geographic data, combining them to obtain new representations of the geographic space. The spatial join operation combines two sets of spatial features, A and B, based on a spatial predicate. It is a fundamental as well as one of the most expensive operations in GIS. Combining pairs of spatial, georreferenced data objects of two different, and probably large data sets implies the execution of a significant number of Input/Output (I/O) operations as well as a large number of CPU operations. This work presents a study about the performance of spatial join algorithms. Firstly, an analysis of the algorithms is realized. As a result, mathematical expressions are identified to predict the number of I/O operations and the algorithm complexity. After this, some of the algorithms (e.g.; Nested Loops, Partition Based Spatial Join Method (PBSM), Synchronized Tree Transversal (STT) to R-Trees and Iterative Spatial Stripped Join (ISSJ)) are implemented, allowing the execution of a series of tests in different spatial join scenarios. The tests were performed using both synthetic and real data sets. Based on the results, a new algorithm, called Histogram-based Hash Stripped Join (HHSJ), is proposed. The partitioning of the space is carried out according to the spatial distribution of the objects, maintained in histograms. In addition, a hash file is created for each input data set and used to enhance both the storage of and the access to the minimum bounding rectangles (MBR) of the respective set elements. Furthermore, the space is divided in strips, to reduce the processing time. The results showed that the new algorithm is faster in almost all scenarios, specially when bigger data sets are processed. Finally, a query optimizer based on costs, capable to choose the best algorithm to perform the filter step of a spatial join operation, is presented. The query optimizer uses statistical information stored in the data dictionary to estimate the response time for each algorithm and chooses the faster to realize the operation. This query optimizer choose the right one on 88.9% of cases, mistaken just in spatial join envolving small data sets, when the impact is small.
|
74 |
Análise e desenvolvimento de um novo algoritmo de junção espacial para SGBD geográficos / Analysis and design of a new algorithm to perform spatial join in geographic DBMSFornari, Miguel Rodrigues January 2006 (has links)
Um Sistema de Informação Geográfica armazena e mantém dados geográficos, combinando-os, para obter novas representações do espaço geográfico. A junção espacial combina duas relações de geometrias geo-referenciadas de acordo com algum predicado espacial, como intersecção e distância entre objetos. Trata-se de uma operação essencial, pois é constantemente utilizada e possui um alto custo de realização devido a realização de grande número de operações de Entrada/Saída e a complexidade do algoritmo. Este trabalho estuda o desempenho de algoritmos de junção espacial. Inicialmente, apresenta a análise dos algoritmos já publicados na literatura, obtendo expressões de custo para número de operações de disco e processamento. Após, descreve-se a implementação de alguns algoritmos em um ambiente de testes. Este ambiente permite ao usuário variar diversos parâmetros de entrada: cardinalidade dos conjuntos, memória disponível e predicado de junção, envolvendo dados reais e sintéticos. O ambiente de testes inclui os algoritmos de Laços Aninhados, Partition Based Spatial Join Method (PBSM), Synchronized Tree Transversal (STT) para árvores R* e Iterative Spatial Stripped Join (ISSJ). Os testes demonstraram que o STT é adequado para conjuntos pequenos de dados; o ISSJ se houver memória suficiente para ordenar os conjuntos internamente; e o PBSM se houver pouca memória disponível para buffer de dados. A partir da análise um novo algoritmo, chamado Histogram-based Hash Stripped Join (HHSJ) é apresentado. O HSSJ utiliza histogramas da distribuição dos objetos no espaço para definir o particionamento, armazena os objetos em arquivos organizados em hash e subdivide o espaço em faixas (strips) para reduzir o processamento. Os testes indicam que o HHSJ é mais rápido na maioria dos cenários, sendo ainda mais vantajoso quanto maior o número de objetos envolvidos na junção. Um módulo de otimização de consultas baseado em custos, capaz de escolher o melhor algoritmo para realizar a etapa de filtragem é descrito. O módulo utiliza informações estatísticas mantidas no dicionário de dados para estimar o tempo de resposta de cada algoritmo, e indicar o mais rápido para realizar uma operação específica. Este otimizador de consultas acertou a indicação em 88,9% dos casos, errando apenas na junção de conjuntos pequenos, quando o impacto é menor. / A Geographic Information System (GIS) stores geographic data, combining them to obtain new representations of the geographic space. The spatial join operation combines two sets of spatial features, A and B, based on a spatial predicate. It is a fundamental as well as one of the most expensive operations in GIS. Combining pairs of spatial, georreferenced data objects of two different, and probably large data sets implies the execution of a significant number of Input/Output (I/O) operations as well as a large number of CPU operations. This work presents a study about the performance of spatial join algorithms. Firstly, an analysis of the algorithms is realized. As a result, mathematical expressions are identified to predict the number of I/O operations and the algorithm complexity. After this, some of the algorithms (e.g.; Nested Loops, Partition Based Spatial Join Method (PBSM), Synchronized Tree Transversal (STT) to R-Trees and Iterative Spatial Stripped Join (ISSJ)) are implemented, allowing the execution of a series of tests in different spatial join scenarios. The tests were performed using both synthetic and real data sets. Based on the results, a new algorithm, called Histogram-based Hash Stripped Join (HHSJ), is proposed. The partitioning of the space is carried out according to the spatial distribution of the objects, maintained in histograms. In addition, a hash file is created for each input data set and used to enhance both the storage of and the access to the minimum bounding rectangles (MBR) of the respective set elements. Furthermore, the space is divided in strips, to reduce the processing time. The results showed that the new algorithm is faster in almost all scenarios, specially when bigger data sets are processed. Finally, a query optimizer based on costs, capable to choose the best algorithm to perform the filter step of a spatial join operation, is presented. The query optimizer uses statistical information stored in the data dictionary to estimate the response time for each algorithm and chooses the faster to realize the operation. This query optimizer choose the right one on 88.9% of cases, mistaken just in spatial join envolving small data sets, when the impact is small.
|
75 |
Processamento de consultas baseado em ontologias para sistemas de biodiversidade / Ontology based query processing for biodiversity systemsVilar, Bruno Siqueira Campos Mendonça, 1982- 15 August 2018 (has links)
Orientador: Claudia Maria Bauzer Medeiros / Dissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-15T00:34:29Z (GMT). No. of bitstreams: 1
Vilar_BrunoSiqueiraCamposMendonca_M.pdf: 1763680 bytes, checksum: 5a3ddb611bfab6ec3f16246598a66a5b (MD5)
Previous issue date: 2009 / Resumo: Sistemas de informação de biodiversidade lidam com um conjunto heterogêneo de informações providas por diferentes grupos de pesquisa. A diversificação pode ocorrer com relação 'as espécies estudadas, 'a estruturação das informações coletadas, ao local de estudo, metodologias de trabalho ou objetivos dos pesquisadores, dentre outros fatores. Esta heterogeneidade de dados, usuários e procedimentos dificulta o reuso e o compartilhamento de informações. Este trabalho contribui para diminuir tal obstáculo, melhorando o processo de consulta 'as informações em sistemas de biodiversidade. Para tanto, propõe um mecanismo de expansão de consultas que pré-processa uma consulta de usuário (cientista) agregando informações adicionais, provenientes de ontologias, para aproximar o resultado da intenção do usuário. Este mecanismo é baseado em serviços Web e foi implementado e testado usados dados e casos de uso reais. / Abstract: Biodiversity information systems need and manage heterogeneous information provided by different research groups. Heterogeneity occur with respect to the species studied, the structure of the information gathered, the region of study, the work methodologies, or the vocabularies and objectives of the researchers, among other factors. This heterogeneity of data, users and procedures hampers information sharing and reuse. This work contributes to reduce this obstacle, improving the query processing mechanisms in biodiversity systems. Its main interpretation is a query expansion mechanism that pre-processes a user (scientist) query aggregating additional information from ontologies, thereby approximating query results to what is intended by the user. This mechanism is based on Web services and was implemented and tested using real case studies. / Mestrado / Banco de Dados / Mestre em Ciência da Computação
|
76 |
[en] QEEF-G: ADAPTIVE PARALLEL EXECUTION OF ITERATIVE QUERIES / [pt] QEEF-G: EXECUÇÃO PARALELA ADAPTATIVA DE CONSULTAS ITERATIVASVINICIUS FONTES VIEIRA DA SILVA 25 April 2007 (has links)
[pt] O processamento de consulta paralelo tradicional utilize-
se de nós
computacionais para reduzir o tempo de processamento de
consultas. Com o
surgimento das grades computacionais, milhares de nós
podem ser utilizados,
desafiando as atuais técnicas de processamento de consulta
a oferecerem um
suporte massivo ao paralelismo em um ambiente onde as
condições variam todo a
instante. Em adição, as aplicações científicas executadas
neste ambiente oferecem
novas características de processamento de dados que devem
ser integradas em um
sistema desenvolvido para este ambiente. Neste trabalho
apresentamos o sistema
de processamento de consulta paralelo do CoDIMS-G, e seu
novo operador Orbit
que foi desenvolvido para suportar a avaliação de
consultas iterativas. Neste
modelo de execução as tuplas são constantemente avaliadas
por um fragmento
paralelo do plano de execução. O trabalho inclui o
desenvolvimento do sistema de
processamento de consulta e um novo algoritmo de
escalonamento que, considera
as variações de rede e o throughput de cada nó, permitindo
ao sistema se adaptar
constantemente as variações no ambiente. / [en] Traditional parallel query processing uses multiple
computing nodes to
reduce query response time. Within a Grid computing
context, the availability of
thousands of nodes challenge current parallel query
processing techniques to
support massive parallelism in a constantly varying
environment conditions. In
addition, scientific applications running on Grids offer
new data processing
characteristics that shall be integrated in such a
framework. In this work we
present the CoDIMS-G parallel query processing system with
a full-fledged new
query execution operator named Orbit. Orbit is designed
for evaluating massive
iterative based data processing. Tuples in Orbit iterate
over a parallelized
fragment of the query execution plan. This work includes
the development of the
query processing system and a new scheduling algorithm
that considers variation
on network and the throughput of each node. Such algorithm
permits the system
to adapt constantly to the changes in the environment.
|
77 |
Principles for Distributed Databases in Telecom Environment / Principer för distribuerade databaser inom Telecom MiljöAshraf, Imran, Khokhar, Amir Shahzed January 2010 (has links)
Centralized databases are becoming bottleneck for organizations that are physically distributed and access data remotely. Data management is easy in centralized databases. However, it carries high communication cost and most importantly high response time. The concept of distributing the data over various locations is very attractive for such organizations. In such cases the database is fragmented into fragments and distributed to the locations where it is needed. This kind of distribution provides local control of data and the data access is also very fast in such databases. However, concurrency control, query optimization and data allocations are the factors that affect the response time and must be investigated prior to implementing distributed databases. This thesis makes the use of mixed method approach to meet its objective. In quantitative section, we performed an experiment to compare the response time of two databases; centralized and fragmented/distributed. The experiment was performed at Ericsson. A literature review was also done to find out other important response time related issues like query optimization, concurrency control and data allocation. The literature review revealed that these factors can further improve the response time in distributed environment. Results of the experiment showed a substantial decrease in the response time due to the fragmentation and distribution. / Centraliserade databaser blir flaskhals för organisationer som är fysiskt distribuerade och tillgång till data på distans. Datahantering är lätt i centrala databaser. Men bär den höga kostnaden kommunikation och viktigast av hög svarstid. Konceptet att distribuera data över olika orter är mycket attraktiv för sådana organisationer. I sådana fall databasen är splittrade fragment och distribueras till de platser där det behövs. Denna typ av distribution ger lokal kontroll av uppgifter och dataåtkomst är också mycket snabb i dessa databaser. Men, samtidighet kontroll, frågeoptimering och data anslagen är de faktorer som påverkar svarstiden och måste utredas innan genomförandet distribuerade databaser. Denna avhandling gör användningen av blandade metod strategi för att nå sitt mål. I kvantitativa delen utförde vi ett experiment för att jämföra svarstid på två databaser, centraliserad och fragmenterad / distribueras. Försöket utfördes på Ericsson. En litteraturstudie har gjorts för att ta reda på andra viktiga svarstid liknande frågor som frågeoptimering, samtidighet kontroll och data tilldelning. Litteraturgenomgången visade att dessa faktorer ytterligare kan förbättra svarstiden i distribuerad miljö. Resultaten av försöket visade en betydande minskning av den svarstid på grund av splittring och distribution.
|
78 |
AHEAD: Adaptable Data Hardening for On-the-Fly Hardware Error Detection during Database Query ProcessingKolditz, Till, Habich, Dirk, Lehner, Wolfgang, Werner, Matthias, de Bruijn, S. T. J. 13 June 2022 (has links)
We have already known for a long time that hardware components are not perfect and soft errors in terms of single bit flips happen all the time. Up to now, these single bit flips are mainly addressed in hardware using general-purpose protection techniques. However, recent studies have shown that all future hardware components become less and less reliable in total and multi-bit flips are occurring regularly rather than exceptionally. Additionally, hardware aging effects will lead to error models that change during run-time. Scaling hardware-based protection techniques to cover changing multi-bit flips is possible, but this introduces large performance, chip area, and power overheads, which will become non-affordable in the future. To tackle that, an emerging research direction is employing protection techniques in higher software layers like compilers or applications. The available knowledge at these layers can be efficiently used to specialize and adapt protection techniques. Thus, we propose a novel adaptable and on-the-fly hardware error detection approach called AHEAD for database systems in this paper. AHEAD provides configurable error detection in an end-to-end fashion and reduces the overhead (storage and computation) compared to other techniques at this level. Our approach uses an arithmetic error coding technique which allows query processing to completely work on hardened data on the one hand. On the other hand, this enables on-the-fly detection during query processing of (i) errors that modify data stored in memory or transferred on an interconnect and (ii) errors induced during computations. Our exhaustive evaluation clearly shows the benefits of our AHEAD approach.
|
79 |
Efficient Approximate OLAP Querying Over Time SeriesPerera, Kasun S., Hahmann, Martin, Lehner, Wolfgang, Pedersen, Torben Bach, Thomsen, Christian 15 June 2023 (has links)
The ongoing trend for data gathering not only produces larger volumes of data, but also increases the variety of recorded data types. Out of these, especially time series, e.g. various sensor readings, have attracted attention in the domains of business intelligence and decision making. As OLAP queries play a major role in these domains, it is desirable to also execute them on time series data. While this is not a problem on the conceptual level, it can become a bottleneck with regards to query run-time. In general, processing OLAP queries gets more computationally intensive as the volume of data grows. This is a particular problem when querying time series data, which generally contains multiple measures recorded at fine time granularities. Usually, this issue is addressed either by scaling up hardware or by employing workload based query optimization techniques. However, these solutions are either costly or require continuous maintenance. In this paper we propose an approach for approximate OLAP querying of time series that offers constant latency and is maintenance-free. To achieve this, we identify similarities between aggregation cuboids and propose algorithms that eliminate the redundancy these similarities present. In doing so, we can achieve compression rates of up to 80% while maintaining low average errors in the query results.
|
80 |
CROSS-DB: a feature-extended multidimensional data model for statistical and scientific databasesLehner, Wolfgang, Ruf, Thomas, Teschke, Michael 13 September 2022 (has links)
Statistical and scientific computing applications exhibit characteristics that are fundamentally different from classical database system application domains. The CROSS-DB data model presented in this paper is optimized for use in such applications by providing advanced data modelling methods and application-oriented query facilities, thus providing a framework for optimized data management procedures. CROSS-DB (which stands for Classification-oriented, Redundancy-based Optimization of Statistical and Scientific DataBases) is based on a multidimensional data view. The model differs from other approaches by o~ering two complementary rnechanisrnsfor structuring qualifying information, classification and feature description. Using these mechanisms results in a normalized, low-dimensional database schema which ensures both, modelling uniqueness and understandability while providing enhanced modelling flexibility.
|
Page generated in 0.0649 seconds