Global ETD Search

1	Multiple Continuous Query Processing with Relative Window Predicates "Juggler" Silva, Asima 27 May 2004 (has links) "Efficient querying over streaming data is a critical technology which requires the ability to handle numerous and possibly similar queries in real time dynamic environments such as the stock market and medical devices. Existing DBMS technology is not well suited for this domain since it was developed for static historical data. Queries over streams often contain relative window predicates such as in the query: ``Heart rate decreased to fifty-two beats per second within four seconds after the patient's temperature started rising." Relative window predicates are a specific type of join between streams that is based on the tuple's timestamp. In our operator, called Juggler, predicates are classified into three types: attribute, join, and window. Attribute predicates are stream values compared to a constant. Join predicates are stream values compared to another stream's values. Window predicates are join predicates where the streams' timestamp values are compared. Juggler's composite operator incorporates the processing of similar though not identical, query functionalities as one complex computation process. This execution strategy handles multi-way joins for multiple selection and join predicates. It adaptively orders the execution of predicates by their selectivity to efficiently process multiple continuous queries based on stream characteristics. In Juggler, all similar predicates are grouped into lists. These indices are represented by a collection of bits. Every tuple contains the bit structure representation of the predicate lists which encodes tuple predicate evaluation history. Every query also contains a similar bit structure to encode the predicate's relationship to the registered queries. The tuple's and query's bit structures are compared to assess if the tuple has satisfied a query. Juggler is designed and implemented in Java. Experiments were conducted to verify correctness and to assess the performance of Juggler's three features. Its adaptivity of reordering the evaluation of predicate types performed as well as the most selective predicate ordering. Its ability to exploit similar predicates in multiple queries showed reduction in number of comparisons. Its effectiveness when multiple queries are combined in a single Juggler operator indicated potential performance improvements after optimization of Juggler's data structures." reordering predicates multi-join operator sliding windows window predicates join algorithm continuous queries Query languages (Computer science)
2	Parallélisme et équilibrage de charges dans le traitement de la jointure sur des architectures distribuées / Parallelism and load balancing in the treatment of the join on distributed architectures Al Hajj Hassan, Mohamad 16 December 2009 (has links) L’émergence des applications de bases de données dans les domaines tels que le data warehousing,le data mining et l’aide à la décision qui font généralement appel à de très grands volumes de donnéesrend la parallélisation des algorithmes des jointures nécessaire pour avoir un temps de réponse acceptable.Une accélération linéaire est l’objectif principal des algorithmes parallèles, cependant dans les applicationsréelles, elle est difficilement atteignable : ceci est dû généralement d’une part aux coûts de communicationsinhérents aux systèmes multi-processeurs et d’autre part au déséquilibre des charges des différents processeurs.En plus, dans un environnement hétérogène multi-utilisateur, la charge des différents processeurspeut varier de manière dynamique et imprévisible.Dans le cadre de cette thèse, nous nous intéressons au traitement de la jointure et de la multi-jointure surles architectures distribuées hétérogènes, les grilles de calcul et les systèmes de fichiers distribués. Nousavons proposé une variété d’algorithmes, basés sur l’utilisation des histogrammes distribués, pour traiterde manière efficace le déséquilibre des données, tout en garantissant un équilibrage presque parfait dela charge des différents processeurs même dans un environnement hétérogène et multi-utilisateur. Cesalgorithmes sont basés sur une approche dynamique de redistribution des données permettant de réduire lescoûts de communication à un minimum tout en traitant de manière très efficace le problème de déséquilibredes valeurs de l’attribut de jointure.L’analyse de complexité de nos algorithmes et les résultats expérimentaux obtenus montrent que cesalgorithmes possèdent une accélération presque linéaire. / The appeal of parallel processing becomes very strong in applications which require ever higher performanceand particularly in applications such as : data-warehousing, decision support, On-Line Analytical Processing(OLAP) and more generally DBMS. A linear speed-up is the main objective of parallel algorithms. However,in real applications, it’s not obvious to reach this objective due to the high communication cost in parallel anddistributed systems and to the possible skew in the charge of different processors. In addition, on heterogeneousmulti-user architectures, the load of each processor may highly vary in a dynamic and unpredictableway.In this thesis, we are interested in treating the join and multi-join queries on distributed multi-user heteregeneoussystems, grid systems and distributed file systems. We have proposed several algorithms based onusing distributed histograms. These algorithms are based on a dynamic data distribution and task allocationwhich makes them insensitive to data skew and ensure perfect balancing properties during all stages of joincomputation even on heteregeneous multi-user environment. The complexity analysis of our algorithms andthe experimental results show that they have a near-linear speedup. Jointures parallèles Multi-jointure Déséquilibre des données Equilibrage dynamique de charges Parallel joins Multi-join Data skew Dynamic load balancing
3	Scalable Integration View Computation and Maintenance with Parallel, Adaptive and Grouping Techniques Liu, Bin 19 August 2005 (has links) " Materialized integration views constructed by integrating data from multiple distributed data sources help to achieve better access, reliable performance, and high availability for a wide range of applications. In this dissertation, we propose parallel, adaptive, and grouping techniques to address scalability challenges in high-performance integration view computation and maintenance due to increasingly large data sources and high rates of source updates. State-of-the-art parallel integration view computation makes the common assumption that the maximal pipelined parallelism leads to superior performance. We instead propose segmented bushy parallel processing that combines pipelined parallelism with alternate forms of parallelism to achieve an overall more effective strategy. Experimental studies conducted over a cluster of high-performance PCs confirm that the proposed strategy has an on average of 50\% improvement in terms of total processing time in comparison to existing solutions. Run-time adaptation becomes critical for parallel integration view computation due to its long running and memory intensive nature. We investigate two types of state level adaptations, namely, state spill and state relocation, to address the run-time memory shortage. We propose lazy-disk and active-disk approaches that integrate both adaptations to maximize run-time query throughput in a memory constrained environment. We also propose global throughput-oriented state adaptation strategies for computation plans with multiple state intensive operators. Extensive experiments confirm the effectiveness of our proposed adaptation solutions. Once results have been computed and materialized, it's typically more efficient to maintain them incrementally instead of full recomputation. However, state-of-the-art incremental view maintenance require O($n^2$) maintenance queries with n being the number of data sources that the view is defined upon. Moreover, they do not exploit view definitions and data source processing capabilities to further improve view maintenance performance. We propose novel grouping maintenance algorithms that dramatically reduce the number of maintenance queries to (O(n)). A cost-based view maintenance framework has been proposed to generate optimized maintenance plans tuned to particular environmental settings. Extensive experimental studies verify the effectiveness of our maintenance algorithms as well as the maintenance framework. " parallel multi-join computation state level adaptation materialized view maintenance grouping maintenance cyclic join views distributed data sources Virtual storage (Computer science)

1

Page generated in 0.0336 seconds