301

An investigation into and contributions to query handling and optimization by database management systems / L. Muller

Muller, Leslie January 2006 (has links)
The problems associated with the effective design and use of databases are increasing. The information contained in a database is becoming more complex and the size of the data is causing space problems. Technology must continually develop to accommodate this growing need. An inquiry was conducted in order to find effective guidelines that could support queries in general in terms of performance and productivity. Two database management systems were researched to compare the theoretical aspects with the techniques implemented in practice. Microsoft SQL Server and MySQL were chosen as the candidates and both were put under close scrutiny. The systems were researched to uncover the methods employed by each to manage queries. The query optimizer forms the basis of each of these systems and manages the parsing and execution of any query. The methods employed by each system for storing data were researched. The way that each system manages table joins, uses indices and chooses optimal execution plans was researched. Adjusted algorithms were introduced for various index processes such as B+ trees and hash indexes. Guidelines were compiled that are independent of the database management systems and help to optimize relational databases. Practical implementations of queries were used to acquire and analyse the execution plan for both MySQL and SQL Server. This plan, along with a few other variables such as execution time, is discussed for each system. A model is used for both database management systems in this experiment. / Thesis (M.Sc. (Computer Science))--North-West University, Potchefstroom Campus, 2007.
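To make the optimizer's choice between access paths concrete, the sketch below shows a minimal cost model deciding between a full table scan and a B+ tree index lookup. The cost formulas and constants are hypothetical simplifications for illustration, not the actual SQL Server or MySQL cost models.

```python
# A minimal, illustrative cost model for choosing an access path. The cost
# formulas and constants are hypothetical, not real SQL Server/MySQL internals.

def full_scan_cost(num_pages: float) -> float:
    """Sequential scan: read every page once."""
    return num_pages

def btree_index_cost(num_rows: float, selectivity: float, tree_height: int = 3) -> float:
    """B+ tree lookup: descend the tree, then fetch one page per matching row
    (worst case for an unclustered index)."""
    return tree_height + selectivity * num_rows

def choose_access_path(num_rows: float, num_pages: float, selectivity: float) -> str:
    scan = full_scan_cost(num_pages)
    index = btree_index_cost(num_rows, selectivity)
    return "index scan" if index < scan else "full scan"

# Highly selective predicate -> index wins; unselective predicate -> scan wins.
print(choose_access_path(num_rows=1_000_000, num_pages=10_000, selectivity=0.0001))  # index scan
print(choose_access_path(num_rows=1_000_000, num_pages=10_000, selectivity=0.5))     # full scan
```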
302

Optimal Path Queries in Very Large Spatial Databases

Zhang, Jie January 2005 (has links)
Researchers have been investigating the optimal route query problem for a long time. Optimal route queries are categorized as either unconstrained or constrained queries. Many main memory based algorithms have been developed to deal with the optimal route query problem. Among these, Dijkstra's shortest path algorithm is one of the most popular algorithms for the unconstrained route query problem. The constrained route query problem is more complicated than the unconstrained one, and some constrained route query problems such as the Traveling Salesman Problem and Hamiltonian Path Problem are NP-hard. There are many algorithms dealing with the constrained route query problem, but most of them only solve a specific case. In addition, all of them require that the entire graph resides in the main memory. Recently, due to the needs of applications involving very large graphs, such as the digital maps managed by Geographic Information Systems (GIS), several disk-based algorithms have been derived by using divide-and-conquer techniques to solve the shortest path problem in a very large graph. However, until now little research has been conducted on the disk-based constrained problem.

This thesis presents two algorithms: 1) a new disk-based shortest path algorithm (DiskSPNN), and 2) a new disk-based optimal path algorithm (DiskOP) that answers an optimal route query without passing a set of forbidden edges in a very large graph. Both algorithms fit within the same divide-and-conquer framework as the existing disk-based shortest path algorithms proposed by Ning Zhang and Heechul Lim. Several techniques, including query super graph, successor fragment and open boundary node pruning, are proposed to improve the performance of the previous disk-based shortest path algorithms. Furthermore, these techniques are applied to the DiskOP algorithm with minor changes. The proposed DiskOP algorithm depends on the concept of collecting a set of boundary vertices and simultaneously relaxing their adjacent super edges. Even if the forbidden edges are distributed in all the fragments of a graph, the DiskOP algorithm requires little memory. Our experimental results indicate that the DiskSPNN algorithm performs better than the original ones with respect to the I/O cost as well as the running time, and the DiskOP algorithm successfully solves a specific constrained route query problem in a very large graph.
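As an in-memory reference point for the query type DiskOP answers, the sketch below runs Dijkstra's algorithm with an optional set of forbidden edges. It only illustrates the unconstrained baseline and the forbidden-edge constraint; the thesis's disk-based, divide-and-conquer machinery (fragments, super edges, boundary vertices) is not reproduced, and the example graph is invented.

```python
# Dijkstra's shortest path with an optional forbidden-edge set (in-memory only).
import heapq

def shortest_path(graph, source, target, forbidden=frozenset()):
    """graph: {u: [(v, weight), ...]}; forbidden: set of (u, v) edges to avoid."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            if (u, v) in forbidden:
                continue  # skip forbidden edges
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return float("inf")

g = {"a": [("b", 1), ("c", 5)], "b": [("c", 1)], "c": []}
print(shortest_path(g, "a", "c"))                          # 2.0 (via b)
print(shortest_path(g, "a", "c", forbidden={("b", "c")}))  # 5.0 (direct edge)
```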
303

Exploiting parallelism in decomposition methods for constraint satisfaction

Akatov, Dmitri January 2010 (has links)
Constraint Satisfaction Problems (CSPs) are NP-complete in general; however, there are many tractable subclasses that rely on restricting the structure of their underlying hypergraphs. It is a well-known fact, for instance, that CSPs whose underlying hypergraph is acyclic are tractable. Trying to define "nearly acyclic" hypergraphs led to the definition of various hypergraph decomposition methods. An important member of this class is the hypertree decomposition method, introduced by Gottlob et al. It possesses the property that CSPs falling into this class can be solved efficiently, and that hypergraphs in this class can be recognized efficiently as well. Beyond polynomial tractability, complexity analysis has shown that both of the aforementioned problems lie in the low complexity class LOGCFL and are thus also efficiently parallelizable. A parallel algorithm has been proposed for the "evaluation problem"; however, all algorithms for the "recognition problem" presented to date are sequential. The main contribution of this dissertation is the creation of an object-oriented programming library including a task scheduler which allows the parallelization of a whole range of computational problems fulfilling certain complexity-theoretic restrictions. This library merely requires the programmer to provide the implementation of several classes and methods, representing a general alternating algorithm, while the mechanics of the task scheduler remain hidden. In particular, we use this library to create an efficient parallel algorithm which computes hypertree decompositions of a fixed width. Another result of a more theoretical nature is the definition of a new type of decomposition method, called Balanced Decompositions. Solving CSPs of bounded balanced width and recognizing such hypergraphs is only quasi-polynomial, but still parallelizable to a certain extent. A complexity-theoretic analysis leads to the definition of a new complexity class hierarchy, called the DC-hierarchy, with the first class in this hierarchy, DC1, precisely capturing the complexity of solving CSPs of bounded balanced width.
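As a concrete reference point for the tractable base case mentioned above, the sketch below implements the classical GYO reduction for recognizing acyclic hypergraphs. This is a textbook sequential procedure, not the dissertation's parallel hypertree-decomposition algorithm, and the example hypergraphs are invented.

```python
# GYO reduction: a hypergraph is (alpha-)acyclic iff repeatedly removing
# lonely vertices and absorbed edges reduces it to nothing.

def is_alpha_acyclic(hyperedges):
    """hyperedges: iterable of vertex sets."""
    edges = [set(e) for e in hyperedges if e]
    changed = True
    while changed:
        changed = False
        # Remove vertices that occur in exactly one edge.
        for e in edges:
            lonely = {v for v in e if sum(v in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Remove empty edges and edges absorbed by (contained in) another edge.
        kept = []
        for i, e in enumerate(edges):
            absorbed = any((e < f) or (e == f and j < i)
                           for j, f in enumerate(edges) if j != i)
            if not e or absorbed:
                changed = True
            else:
                kept.append(e)
        edges = kept
    return not edges  # acyclic iff everything reduces away

print(is_alpha_acyclic([{1, 2}, {2, 3}, {3, 4}]))  # True (a path)
print(is_alpha_acyclic([{1, 2}, {2, 3}, {1, 3}]))  # False (a triangle)
```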
304

Heterogeneity-Aware Placement Strategies for Query Optimization

Karnagel, Tomas 31 May 2017 (has links) (PDF)
Computing hardware is changing from systems with homogeneous CPUs to systems with heterogeneous computing units like GPUs, Many Integrated Cores, or FPGAs. This trend is caused by scaling problems of homogeneous systems, where heat dissipation and energy consumption are limiting further growth in compute performance. Heterogeneous systems provide differently optimized computing hardware, which allows different operations to be computed on the most appropriate computing unit, resulting in faster execution and less energy consumption. For database systems, this is a new opportunity to accelerate query processing, allowing faster and more interactive querying of large amounts of data. However, the current hardware trend is also a challenge, as most database systems do not support heterogeneous computing resources and it is not clear how to support these systems best. In the past, mainly single operators were ported to different computing units with great results, but a system-wide application was missing. To efficiently support heterogeneous systems, a systems approach for query processing and query optimization is needed. In this thesis, we tackle the optimization challenge in detail. As a starting point, we evaluate three different approaches on isolated use-cases to assess their advantages and limitations. First, we evaluate a fork-join approach of intra-operator parallelism, where the same operator is executed on multiple computing units at the same time, each execution with different data partitions. Second, we evaluate using one computing unit statically to accelerate one operator, which offers high code-optimization potential because the usage of hardware and software is static and known in advance. Third, we evaluate dynamically placing operators onto computing units, depending on the operator, the available computing hardware, and the given data sizes. We argue that the first and second approaches suffer from multiple overheads or high implementation costs. The third approach, dynamic placement, shows good performance while being highly extensible to different computing units and different operator implementations. To automate this dynamic approach, we first propose general placement optimization for query processing. This general approach includes runtime estimation of operators on different computing units as well as two approaches for defining the actual operator placement according to the estimated runtimes. The two placement approaches are local optimization, which decides the placement locally at run-time, and global optimization, where the placement is decided at compile-time while allowing a global view for enhanced data sharing. The main limitation of the latter is the high dependency on cardinality estimation of intermediate results, as estimation errors for the cardinalities propagate to the operator runtime estimation and placement optimization. Therefore, we propose adaptive placement optimization, allowing the placement optimization to become fully independent of cardinality estimation, effectively eliminating the main source of inaccuracy for runtime estimation and placement optimization. Finally, we define an adaptive placement sequence, incorporating all our proposed techniques of placement optimization. We implement this sequence as a virtualization layer between the database system and the heterogeneous hardware. Our implementation is based on preexisting interfaces to the database system and the hardware, allowing non-intrusive integration into existing database systems. We evaluate our techniques using two different database systems and two different OLAP benchmarks, accelerating the query processing through heterogeneous execution.
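The sketch below illustrates the local-optimization placement idea in its simplest form: for each operator, pick the computing unit with the lowest estimated runtime plus the cost of moving the input there. The runtime and transfer-cost numbers are hypothetical placeholders, not the thesis's estimation models or implementation.

```python
# Greedy, local operator placement over hypothetical runtime/transfer estimates.

def place_operators(plan, runtime_est, transfer_cost, start_unit="CPU"):
    """plan: ordered list of operator names.
    runtime_est[(op, unit)] -> estimated runtime in ms.
    transfer_cost[(src, dst)] -> cost of moving the intermediate result."""
    placement, current = [], start_unit
    for op in plan:
        best_unit = min(
            (unit for (o, unit) in runtime_est if o == op),
            key=lambda u: runtime_est[(op, u)] + transfer_cost[(current, u)],
        )
        placement.append((op, best_unit))
        current = best_unit
    return placement

runtime_est = {("scan", "CPU"): 40, ("scan", "GPU"): 60,
               ("join", "CPU"): 90, ("join", "GPU"): 25,
               ("aggregate", "CPU"): 15, ("aggregate", "GPU"): 12}
transfer_cost = {("CPU", "CPU"): 0, ("CPU", "GPU"): 20,
                 ("GPU", "GPU"): 0, ("GPU", "CPU"): 20}
print(place_operators(["scan", "join", "aggregate"], runtime_est, transfer_cost))
# [('scan', 'CPU'), ('join', 'GPU'), ('aggregate', 'GPU')]
```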
305

Geometric Computing over Uncertain Data

Zhang, Wuzhou January 2015 (has links)
Entering the era of big data, human beings are faced with an unprecedented amount of geometric data today. Many computational challenges arise in processing this new deluge of geometric data. A critical one is data uncertainty: the data is inherently noisy and inaccurate, and often lacks completeness. The past few decades have witnessed the influence of geometric algorithms in various fields including GIS, spatial databases, and computer vision. Yet most existing geometric algorithms are built on the assumption that the data is precise and are incapable of properly handling data in the presence of uncertainty. This thesis explores a few algorithmic challenges in what we call geometric computing over uncertain data.

We study the nearest-neighbor searching problem, which returns the nearest neighbor of a query point in a set of points, in a probabilistic framework. This thesis investigates two different nearest-neighbor formulations: expected nearest neighbor (ENN), where we consider the expected distance between each input point and a query point, and probabilistic nearest neighbor (PNN), where we estimate the probability of each input point being the nearest neighbor of a query point.

For the ENN problem, we consider a probabilistic framework in which the location of each input point and/or query point is specified as a probability density function and the goal is to return the point that minimizes the expected distance. We present methods for computing an exact ENN or an ε-approximate ENN, for a given error parameter 0 < ε < 1, under different distance functions. These methods build an index of near-linear size and answer ENN queries in polylogarithmic or sublinear time, depending on the underlying function. As far as we know, these are the first nontrivial methods for answering exact or ε-approximate ENN queries with provable performance guarantees. Moreover, we extend our results to answer exact or ε-approximate k-ENN queries. Notably, when only the query points are uncertain, we obtain state-of-the-art results for top-k aggregate (group) nearest-neighbor queries in the L1 metric using the weighted SUM operator.

For the PNN problem, we consider a probabilistic framework in which the location of each input point is specified as a probability distribution function. We present efficient algorithms for (i) computing all points that are nearest neighbors of a query point with nonzero probability; (ii) estimating, within a specified additive error, the probability of a point being the nearest neighbor of a query point; (iii) using this estimate to return the point that maximizes the probability of being the nearest neighbor, or all points whose probability of being the nearest neighbor exceeds some threshold. We also present experimental results to demonstrate the effectiveness of our approach.

We study the convex-hull problem, which asks for the smallest convex set that contains a given point set, in a probabilistic setting. In our framework, the uncertainty of each input point is described by a probability distribution over a finite number of possible locations, including a null location to account for non-existence of the point. Our results include both exact and approximation algorithms for computing the probability of a query point lying inside the convex hull of the input, time-space tradeoffs for the membership queries, a connection between Tukey depth and membership queries, as well as a new notion of β-hull that may be a useful representation of uncertain hulls.

We study contour trees of terrains, which encode the topological changes of the level set at height ℓ as we raise ℓ from −∞ to +∞ on the terrain, in a probabilistic setting. We consider a terrain that is defined by linearly interpolating each triangle of a triangulation. In our framework, the uncertainty lies in the height of each vertex in the triangulation, and we assume that it is described by a probability distribution. We first show that the probability of a vertex being a critical point, and the expected number of nodes (resp. edges) of the contour tree, can be computed exactly and efficiently. Then we present efficient sampling-based methods for estimating, with high probability, (i) the probability that two points lie on an edge of the contour tree, within additive error; (ii) the expected distance of two points p, q and the probability that the distance of p, q is at least ℓ on the contour tree, within additive and/or relative error, where the distance of p, q on a contour tree is defined to be the difference between the maximum height and the minimum height on the unique path from p to q on the contour tree. / Dissertation
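To make the ENN formulation above concrete, the sketch below answers an expected-nearest-neighbor query by brute force for points whose locations are discrete distributions. The thesis builds near-linear-size indexes with polylogarithmic or sublinear query time; this O(n·k) scan and its toy data are only an assumed baseline for illustration.

```python
# Brute-force expected-nearest-neighbor (ENN) query over discretely distributed points.
import math

def expected_distance(query, uncertain_point):
    """uncertain_point: list of ((x, y), probability) pairs summing to 1."""
    return sum(p * math.dist(query, loc) for loc, p in uncertain_point)

def enn(query, uncertain_points):
    """Return the index of the uncertain point minimizing expected distance."""
    return min(range(len(uncertain_points)),
               key=lambda i: expected_distance(query, uncertain_points[i]))

points = [
    [((0.0, 0.0), 0.5), ((4.0, 0.0), 0.5)],  # two equally likely locations
    [((1.5, 0.0), 1.0)],                      # a certain point
]
print(enn((0.0, 0.0), points))  # 1 -- expected distance 1.5 beats 2.0
```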
306

On a generic framework for query optimization in heterogeneous and distributed environments

Liu, Tianxao 06 June 2011 (has links)
This thesis proposes a generic framework for query optimization in heterogeneous and distributed environments. We propose a generic source description model (GSD), which allows describing any type of information related to query processing and optimization. With GSD, we can in particular obtain the cost information needed to calculate the costs of different execution plans. Our generic framework for query optimization provides a set of unitary functions used to perform optimization by applying different search strategies. Our experimental results show the accuracy of cost calculation when using GSD, and the flexibility of our generic framework when changing search strategies. Our approach has been implemented and integrated in a data integration product (DVS) marketed by Xcalia – Progress Software Corporation. For queries with many inter-site joins accessing large data sources, the time needed to find the optimal plan is on the order of 2 seconds, and the execution time of the optimized plan is reduced by a factor of 28 compared with that of the non-optimized original plan.
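The sketch below illustrates the framework's separation between cost information and search strategy: an exhaustive and a greedy join-order search share one cost function. The cost formula and the example cardinalities are hypothetical stand-ins for GSD-derived cost information, not the DVS implementation.

```python
# Two interchangeable search strategies over the same (hypothetical) cost model.
from itertools import permutations

def plan_cost(order, cardinalities, selectivity=0.1):
    """Cost of joining relations left-to-right: scan the first relation,
    then pay for each intermediate result size."""
    size = cardinalities[order[0]]
    cost = size
    for rel in order[1:]:
        size = size * cardinalities[rel] * selectivity
        cost += size
    return cost

def exhaustive_search(cards):
    return min(permutations(cards), key=lambda o: plan_cost(o, cards))

def greedy_search(cards):
    remaining, order = set(cards), []
    while remaining:
        nxt = min(remaining, key=lambda r: plan_cost(tuple(order) + (r,), cards))
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)

cards = {"customers": 10_000, "orders": 200_000, "regions": 50}
print(exhaustive_search(cards))  # best order found by full enumeration
print(greedy_search(cards))      # same or worse, but far cheaper to compute
```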
307

Learning via Query Synthesis

Alabdulmohsin, Ibrahim Mansour 07 May 2017 (has links)
Active learning is a subfield of machine learning that has been successfully used in many applications. One of the main branches of active learning is query synthesis, where the learning agent constructs artificial queries from scratch in order to reveal sensitive information about the underlying decision boundary. It has found applications in areas such as adversarial reverse engineering, automated science, and computational chemistry. Nevertheless, the existing literature on membership query synthesis has generally focused on finite concept classes or toy problems, with limited extension to real-world applications. In this thesis, I develop two spectral algorithms for learning halfspaces via query synthesis. The first algorithm is a maximum-determinant convex optimization method, while the second algorithm is a Markovian method that relies on Khachiyan's classical update formulas for solving linear programs. The general theme of these methods is to construct an ellipsoidal approximation of the version space and then to synthesize queries via spectral decomposition. Moreover, I describe how these algorithms can be extended to other settings, such as pool-based active learning. Having demonstrated that halfspaces can be learned quite efficiently via query synthesis, the second part of this thesis proposes strategies for mitigating the risk of reverse engineering in adversarial environments. One approach that can be used to render query synthesis algorithms ineffective is to implement a randomized response. In this thesis, I propose a semidefinite program (SDP) for learning a distribution of classifiers, subject to the constraint that any individual classifier picked at random from this distribution provides reliable predictions with a high probability. This algorithm is then justified both theoretically and empirically. A second approach is to use a non-parametric classification method, such as similarity-based classification. In this thesis, I argue that learning via the empirical kernel maps, also commonly referred to as 1-norm Support Vector Machine (SVM) or Linear Programming (LP) SVM, is the best method for handling indefinite similarities. The advantages of this method are established both theoretically and empirically.
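The sketch below gives a highly simplified illustration of the spectral theme described above: approximate the version space by a sample of consistent weight vectors and synthesize the next query along their top principal direction. This is not the thesis's maximum-determinant or Khachiyan-based algorithm, and the sampled hypotheses are random placeholders.

```python
# Simplified "query synthesis via spectral decomposition" over a sampled version space.
import numpy as np

def synthesize_query(consistent_weights):
    """consistent_weights: (m, d) array of weight vectors still consistent
    with all labels so far. Returns a unit-norm query point."""
    centered = consistent_weights - consistent_weights.mean(axis=0)
    cov = centered.T @ centered / len(consistent_weights)
    eigvals, eigvecs = np.linalg.eigh(cov)
    direction = eigvecs[:, -1]          # axis along which hypotheses disagree most
    return direction / np.linalg.norm(direction)

rng = np.random.default_rng(0)
weights = rng.normal(size=(200, 3))
weights /= np.linalg.norm(weights, axis=1, keepdims=True)  # hypotheses on the unit sphere
print(synthesize_query(weights))  # a synthetic point to be labeled by the oracle
```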
308

Implementation of parallel query processing in the PostgreSQL database system

Vojtek, Daniel January 2011 (has links)
Title: Implementation of parallel query processing in PostgreSQL Author: Bc. Daniel Vojtek Department: Department of Software Engineering Supervisor: Mgr. Július Štroffek Supervisor's e-mail address: julo@stroffek.cz Abstract: Parallel query processing can help with processing of huge amounts of data stored in database systems. The aim of this diploma thesis was to explore the possibilities, analyze, design and finally implement parallel query processing in the open source database system PostgreSQL. I used a Master/Worker design pattern, in which the standard PostgreSQL backend process is the master. As workers, I used processes created by the postmaster. In the thesis I focused on preparing an infrastructure necessary for parallel processing. I defined a new top-level memory context over shared memory, which allows efficient and convenient memory allocations. Then I implemented creation of new worker processes based on master process requirements. To be able to control these workers I defined controlling structures using state machines. Then I implemented a parallel sort operation and the SQL operator UNION ALL using this infrastructure. The result of this diploma thesis is not only the implementation of the infrastructure and some parallel operations, but also a description of the problems encountered during the...
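The sketch below shows the Master/Worker pattern for a parallel sort in miniature: the master partitions the input, workers sort their partitions, and the master merges the sorted runs. It uses Python's multiprocessing as an assumed stand-in; the thesis's actual implementation lives inside PostgreSQL's backend/postmaster processes and shared-memory contexts.

```python
# Master/Worker parallel sort: partition, sort chunks in workers, merge in the master.
from multiprocessing import Pool
from heapq import merge
import random

def sort_chunk(chunk):
    """Worker: sort one partition independently."""
    return sorted(chunk)

def parallel_sort(values, num_workers=4):
    """Master: partition the input, fan out to workers, merge sorted runs."""
    chunk_size = max(1, len(values) // num_workers)
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with Pool(num_workers) as pool:
        sorted_runs = pool.map(sort_chunk, chunks)
    return list(merge(*sorted_runs))

if __name__ == "__main__":
    data = [random.random() for _ in range(100_000)]
    assert parallel_sort(data) == sorted(data)
```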
309

Google Econometrics: An Application to the Czech Republic

Platil, Lukáš January 2014 (has links)
This thesis examines the applicability of Google Econometrics - the use of search volume data of particular queries as explanatory variables in time series modeling - in the case of the Czech Republic. We analyze the contribution of Google data by comparing out-of-sample nowcasting performance and in-sample fit with control variables in three related areas: using an autoregressive model for unemployment, vector autoregression and logit models for GDP and household consumption, and a Granger causality test for consumer confidence. The improvement in the quality of unemployment nowcasting is modest but statistically significant; a sentiment index based on Google queries shows a reciprocal relationship with the official Consumer Confidence Indicator, and it also provides superior nowcasts for household consumption as well as in-sample fit in logit models; its performance in GDP nowcasting is average among control variables. Overall, the results suggest that Google Econometrics is applicable also to the Czech Republic, despite the fact that the internet penetration rate and Google popularity were lower over the analyzed period compared with developed economies where these methods were usually tested. In the future, Google data may be used together with other leading and coincident indicators to...
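The sketch below mirrors the nowcasting comparison described above on synthetic data: an AR(1) model of unemployment versus the same model augmented with a Google search-volume index, compared by out-of-sample RMSE. The data-generating process and coefficients are invented purely for illustration, not the thesis's data or specification.

```python
# AR(1) vs AR(1) + Google-index nowcasting on synthetic monthly data.
import numpy as np

rng = np.random.default_rng(42)
T = 120                                     # ten years of monthly observations
google = rng.normal(size=T)                 # standardized search-volume index
unemp = np.zeros(T)
for t in range(1, T):                       # unemployment depends on its lag and on searches
    unemp[t] = 0.8 * unemp[t - 1] + 0.5 * google[t] + rng.normal(scale=0.3)

def out_of_sample_rmse(X, y, split=90):
    """Fit by OLS on the first `split` observations, evaluate on the rest."""
    beta, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    resid = y[split:] - X[split:] @ beta
    return float(np.sqrt(np.mean(resid ** 2)))

y = unemp[1:]
lag = unemp[:-1]
baseline = np.column_stack([np.ones(T - 1), lag])               # AR(1)
augmented = np.column_stack([np.ones(T - 1), lag, google[1:]])  # AR(1) + Google index
print("AR(1) RMSE:        ", out_of_sample_rmse(baseline, y))
print("AR(1)+Google RMSE: ", out_of_sample_rmse(augmented, y))  # should be lower
```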
310

Robust and Efficient Algorithms for Protein 3-D Structure Alignment and Genome Sequence Comparison

Zhao, Zhiyu 07 August 2008 (has links)
Sequence analysis and structure analysis are two of the fundamental areas of bioinformatics research. This dissertation specifically discusses protein-structure-related problems, including protein structure alignment and query, and genome-sequence-related problems, including haplotype reconstruction and genome rearrangement. It first presents an algorithm for pairwise protein structure alignment that is tested with structures from the Protein Data Bank (PDB). In many cases it outperforms two other well-known algorithms, DaliLite and CE. The preliminary algorithm is a graph-theory-based approach, which uses the concept of "stars" to reduce the complexity of clique-finding algorithms. The algorithm is then improved by introducing "double-center stars" in the graph and applying a self-learning strategy. The updated algorithm is tested with a much larger set of protein structures and shown to be an improvement in accuracy, especially in cases of weak similarity. A protein structure query algorithm is designed to search for similar structures in the PDB, using the improved alignment algorithm. It is compared with SSM and shows better performance, with a lower maximum and average Q-score for missing proteins. An interesting problem dealing with the calculation of the diameter of a 3-D sequence of points arose, and its connection to sublinear-time computation is discussed. The diameter of a 3-D sequence is approximated by a series of sublinear-time deterministic, zero-error, and bounded-error randomized algorithms, and we obtain a series of separations concerning the power of sublinear-time computation. This dissertation also discusses two genome sequence related problems. A probabilistic model is proposed for reconstructing haplotypes from SNP matrices with incomplete and inconsistent errors. The experiments with simulated data show both high accuracy and speed, conforming to the theoretically provable efficiency and accuracy of the algorithm. Finally, a genome rearrangement problem is studied. The concept of non-breaking similarity is introduced. Approximating the exemplar non-breaking similarity to within a factor of n^(1−ε) is proven to be NP-hard. Interestingly, for several practical cases, several polynomial-time algorithms are presented.
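For the diameter problem mentioned above, the sketch below shows the simple linear-time 2-approximation often used as a baseline: pick any point and return the distance to the point farthest from it, which by the triangle inequality is at least half the true diameter. The dissertation's sublinear-time algorithms are more involved; this and its toy data are only assumed for illustration.

```python
# Linear-time 2-approximation of the diameter of a 3-D point sequence.
import math

def approx_diameter(points):
    """points: list of (x, y, z) tuples; returns d with
    true_diameter / 2 <= d <= true_diameter."""
    anchor = points[0]
    return max(math.dist(anchor, p) for p in points)

pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
print(approx_diameter(pts))  # 5.0; the true diameter is sqrt(26) ≈ 5.10
```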
