
Elastic Data Stream Processing

Heinze, Thomas 27 October 2021
Data stream processing systems are used to process data from high-velocity data sources, such as financial, sensor, or logistics data. Many use cases force these systems into a distributed setup in order to meet strict requirements on system throughput and end-to-end latency. The major challenge for a distributed data stream processing system is unpredictable load peaks. Most systems address this problem by overprovisioning, which leads to low system utilization and high monetary cost for the user. This doctoral thesis studies a potential solution: automatically scaling in or out in response to the changing workload. This approach is called elastic scaling, and it allows cost-efficient execution of the system with a high quality of service. In this thesis, we present our elastic scaling data stream processing system FUGU and address three major challenges of such systems: 1) consideration of user-defined end-to-end latency constraints during elastic scaling, 2) study of different auto-scaling techniques, and 3) combination of elastic scaling with different fault tolerance techniques. First, we demonstrate how our system takes user-defined end-to-end latency constraints into account when making scaling decisions. Each scaling decision causes a short latency peak, because processing must be paused while operators are moved. FUGU estimates the latency peaks for different scaling decisions and tries to minimize the resulting latency peak while achieving monetary costs similar to those of alternative approaches. Second, we study different auto-scaling techniques for elastic scaling data stream processing systems. Auto-scaling techniques are a very important part of such systems, as they derive the scaling decisions. In this thesis, we study three auto-scaling techniques: Threshold-based Scaling, Reinforcement Learning, and the novel Online Parameter Optimization. Online Parameter Optimization overcomes the shortcomings of the other two approaches by avoiding manual tuning and by being robust to different workload patterns. Finally, we present an integration of elastic scaling with different replication techniques for high availability, which minimizes the monetary cost while guaranteeing a maximum recovery time. We leverage two replication approaches in FUGU and evaluate the trade-off between recovery time and overhead. FUGU estimates the recovery time and adaptively selects the replication technique used for each operator. All of these contributions are carefully evaluated in three real-world scenarios, and we discuss how they relate to prior work.
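
The threshold-based scaling mentioned in the abstract can be illustrated with a minimal Python sketch. This is not FUGU's implementation: the class name `ThresholdScaler`, the helper `simulate`, the utilization thresholds, the host limits, and the one-host-per-step policy are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThresholdScaler:
    """Hypothetical threshold-based auto-scaler (illustrative values only)."""
    upper: float = 0.8   # assumed utilization above which we scale out
    lower: float = 0.3   # assumed utilization below which we scale in
    min_hosts: int = 1   # assumed lower bound on the number of hosts
    max_hosts: int = 16  # assumed upper bound on the number of hosts

    def decide(self, utilization: float, hosts: int) -> int:
        """Return the new host count for the observed average utilization."""
        if utilization > self.upper and hosts < self.max_hosts:
            return hosts + 1  # overloaded: add one host
        if utilization < self.lower and hosts > self.min_hosts:
            return hosts - 1  # underloaded: release one host
        return hosts          # within thresholds: keep the current setup


def simulate(scaler: ThresholdScaler, loads: List[float], hosts: int = 2) -> List[int]:
    """Replay a load trace; each load is in units of one host's capacity."""
    history = []
    for load in loads:
        # Average utilization per host, capped at full saturation.
        utilization = min(load / hosts, 1.0)
        hosts = scaler.decide(utilization, hosts)
        history.append(hosts)
    return history


if __name__ == "__main__":
    print(simulate(ThresholdScaler(), [0.5, 1.9, 2.8, 2.8, 0.5, 0.5]))
```

The fixed `upper` and `lower` values in this sketch are exactly the kind of manually tuned parameters that the abstract says Online Parameter Optimization avoids.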
