Global ETD Search

1	F2DR: A Distributed Hash Table Algorithm Rea, Dana 17 January 2013 Considerable research has been directed toward the study of consistent hashing and distributed hash tables. While many useful research results have emerged from this work, existing solutions can be improved in the areas of time efficiency, system growth and changes in input distributions. For resolution, systems such as Chord [69][70], Memcached [23] and others use a binary search over the set of intervals to determine the node. Also, relying on a pseudo-random designation of partitions on the continuum can result in poor worst-case time performance due to load imbalance. The work proposes F2DR, a system that maps an arithmetic distribution of intervals on a continuum to a fluid set of nodes. Any point on the continuum can be resolved to a node in O(1) time using O(n) space. The system also contains flexible mechanisms for adapting to load patterns through dynamic restructuring. In all, F2DR provides a fresh formulation of consistent hashing that offers several advantages over previous work. / School of Computer Science, University of Guelph Shared Nothing
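The contrast the abstract draws between binary-search resolution and O(1) resolution can be illustrated with a minimal sketch. This is not the F2DR algorithm: it assumes, purely for illustration, that the "arithmetic distribution of intervals" means equal-width intervals on the continuum [0, 1), and the function names `resolve_binary_search` and `resolve_arithmetic` are hypothetical.

```python
from bisect import bisect_right


def resolve_binary_search(boundaries, nodes, point):
    """Resolve a point by binary search over sorted interval start points.

    boundaries[i] is the start of the interval owned by nodes[i];
    boundaries[0] must be 0.0. Runs in O(log n) time.
    """
    idx = bisect_right(boundaries, point) - 1
    return nodes[idx]


def resolve_arithmetic(nodes, point):
    """Resolve a point when intervals are equal-width (an assumption here).

    The owning interval is computed directly from the point, so the
    lookup takes O(1) time with O(n) space for the node table.
    """
    idx = min(int(point * len(nodes)), len(nodes) - 1)
    return nodes[idx]


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d"]
    boundaries = [0.0, 0.25, 0.5, 0.75]
    print(resolve_binary_search(boundaries, nodes, 0.6))
    print(resolve_arithmetic(nodes, 0.6))
```

With equal-width intervals both functions agree on every point; the arithmetic version simply skips the search. How F2DR keeps this property while restructuring for load is described in the thesis, not in this sketch.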
2	Efficient Virtualization of Scientific Data Narayanan, Sivaramakrishnan 16 September 2008 No description available. Computer Science large data parallel shared-nothing ontology rules
3	Resource Efficient Parallel VLDB with Customizable Degree of Redundancy Xiong, Fanfan January 2009 This thesis focuses on the practical use of very large scale relational databases. It leverages two recent breakthroughs in parallel and distributed computing: a) synchronous transaction replication technologies by Justin Y. Shi and Suntain Song; and b) the Stateless Parallel Processing principle pioneered by Justin Y. Shi. These breakthroughs enable scalable performance and reliability of database service using multiple redundant shared-nothing database servers. This thesis presents a Functional Horizontal Partitioning method with a customizable degree of redundancy to address practical problems in very large scale database applications. The prototype VLDB implementation is designed for transparent, non-intrusive deployment. The prototype system supports Microsoft SQL Server databases. Computational experiments are conducted using an industry-standard benchmark (TPC-E). / Computer and Information Science Computer Science Relational Database Partition Scalability Shared-nothing Distributed System Speedup Synchronous Transaction Replication Vldb Oltp
4	Processing Exact Results for Queries over Data Streams Chakraborty, Abhirup 23 February 2010 In a growing number of information-processing applications, such as network-traffic monitoring, sensor networks, financial analysis, and data mining for e-commerce, data takes the form of continuous data streams rather than traditional stored databases/relational tuples. These applications share common features such as the need for real-time analysis, huge volumes of data, and unpredictable and bursty arrivals of stream elements. In all of these applications, it is infeasible to process queries over data streams by loading the data into a traditional database management system (DBMS) or into main memory. Such an approach does not scale with high stream rates. As a consequence, systems that can manage streaming data have gained tremendous importance. The need to process a large number of continuous queries over bursty, high-volume online data streams, potentially in real time, makes it imperative to design algorithms that use limited resources. This dissertation focuses on processing exact results for join queries over high-speed data streams using limited resources, and proposes several novel techniques for processing join queries incorporating secondary storage and non-dedicated computers. Existing approaches for stream joins either (a) deal with memory limitations by shedding load, and therefore cannot produce exact or highly accurate results for stream joins over data streams with time-varying arrivals of stream tuples, or (b) suffer from large I/O overheads due to random disk accesses. The proposed techniques exploit the high bandwidth of a disk subsystem by rendering the data access pattern largely sequential, eliminating small, random disk accesses. This dissertation proposes an I/O-efficient algorithm to process hybrid join queries that join a fast, time-varying or bursty data stream and a persistent disk relation.
Such a hybrid join is the crux of a number of common transformations in an active data warehouse. Experimental results demonstrate that the proposed scheme reduces the response time of output results by exploiting spatio-temporal locality within the input stream, and minimizes disk overhead through disk-I/O amortization. The dissertation also proposes an algorithm to parallelize a stream join operator over a shared-nothing system. The proposed algorithm distributes the processing load across a number of independent, non-dedicated nodes, based on a fixed or predefined communication pattern; dynamically maintains the degree of declustering in order to minimize communication and processing overheads; and presents mechanisms for reducing storage and communication overheads while scaling over a large number of nodes. We present experimental results showing the efficacy of the proposed algorithms. Data Streams query processing join processing Shared Nothing Cluster Sliding windows adaptive parallelism
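The idea of parallelizing a sliding-window stream join over shared-nothing nodes can be sketched as follows. This is a simplified illustration and not the dissertation's algorithm: it assumes static hash partitioning on the join key (no dynamic declustering), in-memory windows only, and timestamps that arrive in non-decreasing order. The class names `JoinNode` and `PartitionedJoin` are hypothetical.

```python
from collections import deque


class JoinNode:
    """One shared-nothing node holding a time window per input stream."""

    def __init__(self, window):
        self.window = window
        self.buffers = {"R": deque(), "S": deque()}

    def insert(self, stream, ts, key, value):
        other = "S" if stream == "R" else "R"
        buf = self.buffers[other]
        # Expire tuples of the opposite stream that have left the window.
        while buf and buf[0][0] <= ts - self.window:
            buf.popleft()
        # Probe the opposite window; results are always (R value, S value).
        matches = [
            (value, v) if stream == "R" else (v, value)
            for (_, k, v) in buf
            if k == key
        ]
        self.buffers[stream].append((ts, key, value))
        return matches


class PartitionedJoin:
    """Routes each tuple to exactly one node by hashing its join key."""

    def __init__(self, num_nodes, window):
        self.nodes = [JoinNode(window) for _ in range(num_nodes)]

    def insert(self, stream, ts, key, value):
        node = self.nodes[hash(key) % len(self.nodes)]
        return node.insert(stream, ts, key, value)


if __name__ == "__main__":
    join = PartitionedJoin(num_nodes=3, window=10)
    join.insert("R", 1, 7, "r1")
    print(join.insert("S", 2, 7, "s1"))
```

Because tuples with equal keys always land on the same node, each node can produce exact results for its partition without communicating with the others. The sketch keeps the number of nodes fixed, whereas the abstract describes adjusting the degree of declustering at run time.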
