1 |
Targeted Prioritized Processing in Overloaded Data Stream Systems
Works, Karen E. 11 December 2013
"We are in an era of big data, sensors, and monitoring technology. One consequence of this technology is the continuous generation of massive volumes of streaming data. To support this, stream processing systems have emerged. These systems must produce results while meeting near-real-time response obligations. However, computation-intensive processing on high-velocity streams is challenging. Stream arrival rates are often unpredictable and can fluctuate, so systems cannot always process all incoming data within the required response time. Yet some results may be inherently much more significant than others. The delay or complete neglect of certain highly significant results could have catastrophic consequences. Unfortunately, this critical problem of targeted prioritized (TP) processing in overloaded environments remains largely unaddressed to date. In this talk, I will describe four key challenges that my dissertation successfully tackled. First, I address the problem of optimally processing the most significant tuples, identified by the user at compile time, before less critical ones. Second, I propose a new aggregate operator that increases the accuracy of aggregate results produced for TP systems. Third, I address the problem of identifying and pulling forward significant tuples at run time via dynamic determinants. Fourth, I design multi-input operators, such as the join operator, which produce multi-stream results in significance order. My experimental studies explore a rich diversity of workloads, queries, and data sets, including real data streams. The results substantiate that my approaches significantly improve on state-of-the-art approaches."
2 |
Integrated resource management for data stream systems
Berthold, Henrike, Schmidt, Sven, Lehner, Wolfgang, Hamann, Claude-Joachim 13 December 2022
Data stream systems have to deal with massive data volumes. To run several queries in parallel, or even a single query, resources must be planned carefully, and the resulting quality of service (QoS) is lower than the best achievable one. Typical QoS measures are the output delay and the amount of stream data used for processing. In this paper, we introduce a model that describes the stream operators, and the streams between them, in the operator graph of a stream query. The model allows us to calculate the resources consumed by a query graph for a given result quality. Furthermore, it can be used to determine in advance whether the QoS requirement of a given query can be met with the system resources actually available. This model is the basis for building QoS-guaranteeing systems.