Spelling suggestions: "subject:"found"" "subject:"sound""
611 |
Robust Query Optimization for Analytical Database SystemsHertzschuch, Axel 25 September 2023 (has links)
Querying and efficiently analyzing complex data is required to gain valuable business insights, to support machine learning applications, and to make up-to-date information available. Therefore, this thesis investigates opportunities and challenges of selecting the most efficient execution strategy for analytical queries. These challenges include hard-to-capture data characteristics such as skew and correlation, the support of arbitrary data types, and the optimization time overhead of complex queries. Existing approaches often rely on optimistic assumptions about the data distribution, which can result in significant response time delays when these assumptions are not met. On the contrary, we focus on robust query optimization, emphasizing consistent query performance and applicability. Our presentation follows the general select-project-join query pattern, representing the fundamental stages of analytical query processing. To support arbitrary data types and complex filter expressions in the select stage, a novel sampling-based selectivity estimator is developed. Our approach exploits information from filter subexpressions and estimates correlations that are not captured by existing sampling-based methods. We demonstrate improved estimation accuracy and query execution time. Further, to minimize the runtime overhead of sampling, we propose new techniques that exploit access patterns and auxiliary database objects such as indices. For the join stage, we introduce a robust optimization approach by developing an upper-bound join enumeration strategy that connects accurate filter selectivity estimates –e.g., using our sampling-based approach– to join ordering. We demonstrate that join orders based on our upper-bound join ordering strategy achieve more consistent performance and faster workload execution on state-of-the-art database systems. However, besides identifying good logical join orders, it is crucial to determine appropriate physical join operators before query plan execution. To understand the importance of fine-grained physical operator selections, we exhaustively execute fixed join orders with all possible operator combinations. This analysis reveals that none of the investigated query optimizers fully reaches the potential of optimal operator decisions. Based on these insights and to achieve fine-grained operator selections for the previously determined join orders, the thesis presents a lightweight learning-based physical execution plan refinement component called. We show that this refinement component consistently outperforms existing approaches for physical operator selection while enabling a novel two-stage optimizer design. We conclude the thesis by providing a framework for the two-stage optimizer design that allows users to modify, replicate, and further analyze the concepts discussed throughout this thesis.:1 INTRODUCTION
1.1 Analytical Query Processing . . . . . . . . . . . . . . . . . . . 12
1.2 Select-Project-Join Queries . . . . . . . . . . . . . . . . . . . 13
1.3 Basics of SPJ Query Optimization . . . . . . . . . . . . . . . . . 14
1.3.1 Plan Enumeration . . . . . . . . . . . . . . . . . . . . . . . . 14
1.3.2 Cost Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.3 Cardinality Estimation . . . . . . . . . . . . . . . . . . . . . 15
1.4 Robust SPJ Query Optimization . . . . . . . . . . . . . . . . . . 16
1.4.1 Tail Latency Root Cause Analysis . . . . . . . . . . . . . . . . 17
1.4.2 Tenets of Robust Query Optimization . . . . . . . . . . . . . . 19
1.5 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.6 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2 SELECT (-PROJECT) STAGE
2.1 Sampling for Selectivity Estimation . . . . . . . . . . . . . . . 24
2.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2.1 Combined Selectivity Estimation (CSE) . . . . . . . . . . . . . 29
2.2.2 Kernel Density Estimator . . . . . . . . . . . . . . . . . . . . 31
2.2.3 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3 Beta Estimator for 0-Tuple-Situations . . . . . . . . . . . . . . 33
2.3.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.2 Beta Distribution in Non-0-TS . . . . . . . . . . . . . . . . . 35
2.3.3 Parameter Estimation in 0-TS . . . . . . . . . . . . . . . . . . 37
2.3.4 Selectivity Estimation and Predicate Ordering . . . . . . . . . 39
2.3.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.4 Customized Sampling Techniques . . . . . . . . . . . . . . . . . . 53
2.4.1 Focused Sampling . . . . . . . . . . . . . . . . . . . . . . . . 54
2.4.2 Conditional Sampling . . . . . . . . . . . . . . . . . . . . . . 56
2.4.3 Zone Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3 JOIN STAGE: LOGICAL ENUMERATION
3.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.1.1 Point Estimates . . . . . . . . . . . . . . . . . . . . . . . . 63
3.1.2 Join Cardinality Upper Bound . . . . . . . . . . . . . . . . . . 64
3.2 Upper Bound Join Enumeration with Synopsis (UES) . . . . . . . . . 66
3.2.1 U-Block: Simple Upper Bound for Joins . . . . . . . . . . . . . 67
3.2.2 E-Block: Customized Enumeration Scheme . . . . . . . . . . . . . 68
3.2.3 UES Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.3.1 General Performance . . . . . . . . . . . . . . . . . . . . . . 72
3.3.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4 JOIN STAGE: PHYSICAL OPERATOR SELECTION
4.1 Operator Selection vs Join Ordering . . . . . . . . . . . . . . . 77
4.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.2.1 Adaptive Query Processing . . . . . . . . . . . . . . . . . . . 80
4.2.2 Bandit Optimizer (Bao) . . . . . . . . . . . . . . . . . . . . . 81
4.3 TONIC: Learned Physical Join Operator Selection . . . . . . . . . 82
4.3.1 Query Execution Plan Synopsis (QEP-S) . . . . . . . . . . . . . 83
4.3.2 QEP-S Life-Cycle . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3.3 QEP-S Design Considerations . . . . . . . . . . . . . . . . . . 87
4.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.4.1 Performance Factors . . . . . . . . . . . . . . . . . . . . . . 90
4.4.2 Rate of Improvement . . . . . . . . . . . . . . . . . . . . . . 92
4.4.3 Data Shift . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.4.4 TONIC - Runtime Traits . . . . . . . . . . . . . . . . . . . . . 97
4.4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5 TWO-STAGE OPTIMIZER FRAMEWORK
5.1 Upper-Bound-Driven Join Ordering Component . . . . . . . . . . . . 101
5.2 Physical Operator Selection Component . . . . . . . . . . . . . . 103
5.3 Example Query Optimization . . . . . . . . . . . . . . . . . . . . 103
6 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
A APPENDIX
A.1 Basics of Query Execution . . . . . . . . . . . . . . . . . . . . 123
A.2 Why Q? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
A.3 0-TS Proof of Unbiased Estimate . . . . . . . . . . . . . . . . . 125
A.4 UES Upper Bound Property . . . . . . . . . . . . . . . . . . . . . 127
A.5 TONIC – Selectivity-Aware Branching . . . . . . . . . . . . . . . 128
A.6 TONIC – Sequences of Query Execution . . . . . . . . . . . . . . . 129
|
612 |
Analysis of Flow Prolongation Using Graph Neural Network in FIFO Multiplexing System / Analys av Flödesförlängning Med Hjälp av Graph Neural Network i FIFO-Multiplexering SystemWang, Weiran January 2023 (has links)
Network Calculus views a network system as a queuing framework and provides a series of mathematical functions for finding an upper bound of an end-to-end delay. It is crucial for the design of networks and applications with a hard delay guarantee, such as the emerging Time Sensitive Network. Even though several approaches in Network Calculus can be used directly to find bounds on the worst-case delay, these bounds are usually not tight, and making them tight is a hard problem due to the extremely intensive computing requirements. This problem has also been proven as NP-Hard. One newly introduced solution to tighten the delay bound is the so-called Flow Prolongation. It extends the paths of cross flows to new sink servers, which naturally increases the worst-case delay, but might at the same time decrease the delay bound. The most straightforward and the most rigorous solution to find the optimal Flow Prolongation combinations is by doing exhaustive searches. However, this approach is not scalable with the network size. Thus, a machine learning model, Graph Neural Network (GNN), has been introduced for the prediction of the optimal Flow Prolongation combinations, mitigating the scalability issue. However, early research also found out that machine learning models consistently misclassify adversarial examples. In this thesis, Fast Gradient Sign Method (FGSM) is used to benchmark how adversarial attacks will influence the delay bound achieved by the Flow Prolongation method. It is performed by slightly modifying the input network features based on their gradients. To achieve this, we first learned the usage of NetCal DNC, an Free and Open Source Software, to calculate the Pay Multiplexing Only Once (PMOO), one of the Network Calculus methods for the delay bound calculation. Then we reproduced the GNN model based on PMOO, and achieved an accuracy of 65%. Finally, the FGSM is implemented on a newly created dataset with a large number of servers and flows inside. Our results demonstrate that with at most 14% changes on the network features input, the accuracy of GNN drastically decreases to an average 9.45%, and some prominent examples are found whose delay bounds are largely loosened by the GNN Flow Prolongation prediction after the FGSM attack. / Nätverkskalkylen behandlar ett nätverkssystem som ett system av köer och tillhandahåller ett antal matematiska funktioner som används för att hitta en övre gräns för end-to-end förseningar. Det är mycket viktigt för designen av nätverk och applikationer med strikta begränsningar för förseningar, så som det framväxande Time Sensitive Network. Även om ett flertal tillvägagångssätt i nätverkskalkylen kan användas direkt för att finna gränsen för förseningar i det värsta fallet så är dessa vanligtvis inte snäva. Att göra gränserna snäva är svårt då det är ett NP-svårt problem som kräver extremt mycket beräkningar. En lösning för att strama åt förseningsgränserna som nyligen introducerats kallas Flow Prolongation. Den utökar vägarna av korsflöden till nya sink servrar, vilket naturligt ökar förseningen i värsta fallet, men kan eventuellt också sänka förseningsgränsen. Den enklaste och mest rigorösa lösningen för att hitta de optimala Flow Prolongation kombinationerna är att göra uttömmande sökningar. Detta tillvägagångssätt är dock inte skalbart för stora nätverk. Därför har en maskininlärningsmodell, ett Graph Neural Network (GNN), introducerats för att förutspå de optimala Flow Prolongation kombinationerna och samtidigt mildra problemen med skalbarhet. Dock så visar de tidiga fynden att maskininlärningsmodeller ofta felaktigt klassificerar motstridiga exempel. I detta projekt används Fast Gradient Sign Method (FGSM) för att undersöka hur motståndarattacker kan påverka förseningsgränsen som hittas med hjälp av Flow Prolongation metoden. Detta görs genom att modifiera indata-nätverksfunktionerna en aning baserat på dess gradienter. För att uppnå detta lärde vi oss först att använda NetCal DNC, en mjukvara som är gratis och Open Source, för att kunna beräkna Pay Multiplexinng Only Once (PMOO), en metod inom nätverkskalkylen för att beräkna förseningsgränser. Sedan reproducerade GNN modellen baserat på PMOO, och uppnådde en träffsäkerhet på 65%. Slutligen implementerades FGSM på ett nytt dataset med ett stort antal servrar och flöden. Våra resultat visar att förändringar på upp till 14% på indata-nätverksfunktionerna resulterar i att träffsäkerheten hos GNN minskar drastiskt till ett genomsnitt på 9.45%. Vissa exempel identifierades där förseningsgränsen utvidgas kraftfullt i GNN Flow Prolongation förutsägelsen efter FGSM attacken.
|
613 |
Three Essays in Parallel Machine SchedulingGarg, Amit January 2008 (has links)
No description available.
|
614 |
Redefining the Effectiveness of Upward Bound: An Analysis of its Measuring Standards and a Proposition for the FutureMusick, Chloe Jae 22 June 2017 (has links)
No description available.
|
615 |
Phylogenetic Inference Using a Discrete-Integer Linear Programming ModelSands, William Alvah January 2017 (has links)
No description available.
|
616 |
Cultural and Linguistic Issues of Sitcom Dubbing: An Analysis of "Friends"Vierrether, Tanja 16 August 2017 (has links)
No description available.
|
617 |
Understanding Outward Bound Instructors’ Inclusive Praxis: Practices and Influential FactorsWarner, Robert P. 13 July 2018 (has links)
No description available.
|
618 |
CLASSIFICATION OF BOUND WATER AND COLLAGEN DENATURATION STATUS OF CORTICAL BONE BY RAMAN SPECTROSCOPYUNAL, MUSTAFA 08 February 2017 (has links)
No description available.
|
619 |
Capacity allocation and rescheduling in supply chainsLiu, Zhixin 20 September 2007 (has links)
No description available.
|
620 |
Key agreement against quantum adversariesKalach, Kassem 08 1900 (has links)
Key agreement is a cryptographic scenario between two legitimate parties, who need to establish a common secret key over a public authenticated channel, and an eavesdropper who intercepts all their messages in order to learn the secret.
We consider query complexity in which we count only the number of evaluations (queries) of a given black-box function, and classical communication channels.
Ralph Merkle provided the first unclassified scheme for secure communications over insecure channels.
When legitimate parties are willing to ask O(N) queries for some parameter N, any classical eavesdropper needs Omega(N^2) queries before being able to learn their secret, which is is optimal.
However, a quantum eavesdropper can break this scheme in O(N) queries.
Furthermore, it was conjectured that any scheme, in which legitimate parties are classical, could be broken in O(N) quantum queries.
In this thesis, we introduce protocols à la Merkle that fall into two categories.
When legitimate parties are restricted to use classical computers, we offer the first secure classical scheme. It requires Omega(N^{13/12}) queries of a quantum eavesdropper to learn the secret.
We give another protocol having security of Omega(N^{7/6}) queries.
Furthermore, for any k>= 2, we introduce a classical protocol in which legitimate parties establish a secret in O(N)
queries while the optimal quantum eavesdropping strategy requires Theta(N^{1/2+k/{k+1}}) queries, approaching Theta(N^{3/2}) when k increases.
When legitimate parties are provided with quantum computers, we present two quantum protocols improving on the best known scheme before this work.
Furthermore, for any k>= 2, we give a quantum protocol in which legitimate parties establish a secret in O(N) queries while the optimal quantum eavesdropping strategy requires Theta(N^{1+{k}/{k+1}})} queries, approaching Theta(N^{2}) when k increases. / Un protocole d'échange de clés est un scénario cryptographique entre deux partis légitimes ayant besoin de se mettre d'accord sur une clé commune secrète via un canal public authentifié où tous les messages sont interceptés par un espion voulant connaître leur secret.
Nous considérons un canal classique et mesurons la complexité de calcul en termes du nombre d'évaluations (requêtes) d'une fonction donnée par une boîte noire.
Ralph Merkle fut le premier à proposer un schéma non classifié permettant de réaliser des échanges securisés avec des canaux non securisés.
Lorsque les partis légitimes sont capables de faire O(N) requêtes pour un certain paramètre N, tout espion classique doit faire Omega(N^2) requêtes avant de pouvoir apprendre leur secret, ce qui est optimal.
Cependant, un espion quantique peut briser ce schéma avec O(N) requêtes.
D'ailleurs, il a été conjecturé que tout protocole, dont les partis légitimes sont classiques, pourrait être brisé avec O(N) requêtes quantiques.
Dans cette thèse, nous introduisons deux catégories des protocoles à la Merkle.
Lorsque les partis légitimes sont restreints à l'utilisation des ordinateurs classiques, nous offrons le premier schéma classique sûr. Il oblige tout adversaire quantique à faire Omega(N^{13/12}) requêtes avant d'apprendre le secret. Nous offrons aussi un protocole ayant une sécurité de Omega(N^{7/6}) requêtes. En outre, pour tout k >= 2, nous donnons un protocole classique pour lequel les partis légitimes établissent un secret avec O(N)
requêtes alors que la stratégie optimale d'espionnage quantique nécessite Theta(N^{1/2 + k/{k +1}}) requêtes, se rapprochant de Theta(N^{3/2}) lorsque k croît.
Lors les partis légitimes sont équipés d'ordinateurs quantiques, nous présentons deux protocoles supérieurs au meilleur schéma connu avant ce travail.
En outre, pour tout k >= 2, nous offrons un protocole quantique pour lequel les partis légitimes établissent un secret avec O(N) requêtes alors que
l'espionnage quantique optimale
nécessite Theta(N^{1+{k}/{k+1}}) requêtes, se rapprochant de Theta(N^{2}) lorsque k croît.
|
Page generated in 0.0476 seconds