12 October 2011
"Complex Event Processing (CEP) has become increasingly important for tracking and monitoring anomalies and trends in event streams emitted from business processes such as supply chain management to online stores in e-commerce. These monitoring applications submit complex event queries to track sequences of events that match a given pattern. While the state-of-the-art CEP systems mostly focus on the execution of flat sequence queries, we instead support the execution of nested CEP queries specified by the (NEsted Event Language) NEEL. However the iterative execution often results in the repeated recomputation of similar or even identical results for nested sub- expressions as the window slides over the event stream. This work proposes to optimize NEEL execution performance by caching intermediate results. In particular a method of applying selective caching of intermediate results called Continuous Sliding Caching technique has been designed. Then a further optimization of the previous technique which we call the Semantic Caching and the Continuous Semantic Caching have been proposed. Techniques for incrementally loading, purging and exploiting the cache content are described. Our experimental study using real- world stock trades evaluates the performance of our proposed caching strategies for different query types."
25 April 2012
Many modern applications, including online financial feeds, tag-based mass transit systems and RFID-based supply chain management systems transmit real-time data streams. There is a need for event stream processing technology to analyze this vast amount of sequential data to enable online operational decision making. This dissertation focuses on innovating several techniques at the core of a scalable E-Analytic system to achieve efficient, scalable and robust methods for in-memory multi-dimensional nested pattern analysis over high-speed event streams. First, I address the problem of processing flat pattern queries on event streams with out-of-order data arrival. I design two alternate solutions: aggressive and conservative strategies respectively. The aggressive strategy produces maximal output under the optimistic assumption that out-of-order event arrival is rare. The conservative method works under the assumption that out-of-order data may be common, and thus produces output only when its correctness can be guaranteed. Second, I design the integration of CEP and OLAP techniques (ECube model) for efficient multi-dimensional event pattern analysis at different abstraction levels. Strategies of drill-down (refinement from abstract to specific patterns) and of roll-up (generalization from specific to abstract patterns) are developed for the efficient workload evaluation. I design a cost-driven adaptive optimizer called Chase that exploits reuse strategies for optimal E-Cube hierarchy execution. Then, I explore novel optimization techniques to support the high- performance processing of powerful nested CEP patterns. A CEP query language called NEEL, is designed to express nested CEP pattern queries composed of sequence, negation, AND and OR operators. To allow flexible execution ordering, I devise a normalization procedure that employs rewriting rules for flattening a nested complex event expression. To conserve CPU and memory consumption, I propose several strategies for efficient shared processing of groups of normalized NEEL subexpressions. Our comprehensive experimental studies, using both synthetic as well as real data streams demonstrate superiority of our proposed strategies over alternate methods in the literature in both effectiveness and efficiency.
Cabanillas Macias, Cristina, Baumgrass, Anne, Di Ciccio, Claudio
(has links) (PDF)
The field of Smart Logistics is attracting interest in several areas of research, including Business Process Management. Awide range of research works are carried out to enhance the capability of monitoring the execution of ongoing logistics processes and predict their likely evolvement. In order to do this, it is crucial to have in place an IT infrastructure that provides the capability of automatically intercepting the digitalised transportation-related events stemming from widespread sources, along with their elaboration, interpretation and dispatching. In this context, we present here the service-oriented software architecture of such an event-based information engine. In particular, we describe the requisites that it must meet. Thereafter, we present the interfaces and subsequently the service-oriented components that are in charge of realising them. The outlined architecture is being utilised as the reference model for an ongoing European research project on Smart Logistics, namely GET Service.
abstract: Most existing approaches to complex event processing over streaming data rely on the assumption that the matches to the queries are rare and that the goal of the system is to identify these few matches within the incoming deluge of data. In many applications, such as stock market analysis and user credit card purchase pattern monitoring, however the matches to the user queries are in fact plentiful and the system has to efficiently sift through these many matches to locate only the few most preferable matches. In this work, we propose a complex pattern ranking (CPR) framework for specifying top-k pattern queries over streaming data, present new algorithms to support top-k pattern queries in data streaming environments, and verify the effectiveness and efficiency of the proposed algorithms. The developed algorithms identify top-k matching results satisfying both patterns as well as additional criteria. To support real-time processing of the data streams, instead of computing top-k results from scratch for each time window, we maintain top-k results dynamically as new events come and old ones expire. We also develop new top-k join execution strategies that are able to adapt to the changing situations (e.g., sorted and random access costs, join rates) without having to assume a priori presence of data statistics. Experiments show significant improvements over existing approaches. / Dissertation/Thesis / M.S. Computer Science 2011
02 January 2013
Complex Event Processing (CEP) is the technical choice for high performance analytics in time-critical decision-making applications. Although current CEP systems support sequence pattern detection on continuous event streams, they do not support the computation of aggregated values over the matched sequences of a query pattern. Instead, aggregation is typically applied as a post processing step after CEP pattern detection, leading to an extremely inefficient solution for sequence aggregation. Meanwhile, the state-of-art aggregation techniques over traditional stream data are not directly applicable in the context of the sequence-semantics of CEP. In this paper, we propose an approach, called A-Seq, that successfully pushes the aggregation computation into the sequence pattern detection process. A-Seq succeeds to compute aggregation online by dynamically recording compact partial sequence aggregation without ever constructing the to-be-aggregated matched sequences. Techniques are devised to tackle all the key CEP- specific challenges for aggregation, including sliding window semantics, event purging, as well as sequence negation. For scalability, we further introduce the Chop-Connect methodology, that enables sequence aggregation sharing among queries with arbitrary substring relationships. Lastly, our cost-driven optimizer selects a shared execution plan for effectively processing a workload of CEP aggregation queries. Our experimental study using real data sets demonstrates over four orders of magnitude efficiency improvement for a wide range of tested scenarios of our proposed A-Seq approach compared to the state-of-art solutions, thus achieving high-performance CEP aggregation analytics.
05 January 2018
Advances in hardware, software and communication networks have enabled applications to generate data at unprecedented volume and velocity. An important type of this data are event streams generated from financial transactions, health sensors, web logs, social media, mobile devices, and vehicles. The world is thus poised for a sea-change in time-critical applications from financial fraud detection to health care analytics empowered by inferring insights from event streams in real time. Event processing systems continuously evaluate massive workloads of Kleene queries to detect and aggregate event trends of interest. Examples of these trends include check kites in financial fraud detection, irregular heartbeat in health care analytics, and vehicle trajectories in traffic control. These trends can be of any length. Worst yet, their number may grow exponentially in the number of events. State-of-the-art systems do not offer practical solutions for trend analytics and thus suffer from long delays and high memory costs. In this dissertation, we propose the following event trend detection and aggregation techniques. First, we solve the trade-off between CPU processing time and memory usage while computing event trends over high-rate event streams. Namely, our event trend detection approach guarantees minimal CPU processing time given limited memory. Second, we compute online event trend aggregation at multiple granularity levels from fine (per matched event), to medium (per event type), to coarse (per pattern). Thus, we minimize the number of aggregates Â– reducing both time and space complexity compared to the state-of-the-art approaches. Third, we share intermediate aggregates among multiple event sequence queries while avoiding the expensive construction of matched event sequences. In several comprehensive experimental studies, we demonstrate the superiority of the proposed strategies over the state-of-the-art techniques with respect to latency, throughput, and memory costs.
Rozet, Allison M.
07 May 2020
Streaming analytics deploy Kleene pattern queries to detect and aggregate event trends against high-rate data streams. Despite increasing workloads, most state-of-the-art systems process each query independently, thus missing cost-saving sharing opportunities. Sharing complex event trend aggregation poses several technical challenges. First, the execution of nested and diverse Kleene patterns is difficult to share. Second, we must share aggregate computation without the exponential costs of constructing the event trends. Third, not all sharing opportunities are beneficial because sharing aggregation introduces overhead. We propose a novel framework, Muse (Multi-query Snapshot Execution), that shares aggregation queries with Kleene patterns while avoiding expensive trend construction. It adopts an online sharing strategy that eliminates re-computations for shared sub-patterns. To determine the beneficial sharing plan, we introduce a cost model to estimate the sharing benefit and design the Muse refinement algorithm to efficiently select robust sharing candidates from the search space. Finally, we explore optimization decisions to further improve performance. Our experiments over a wide range of scenarios demonstrate that Muse increases throughput by 4 orders of magnitude compared to state-of-the-art approaches with negligible memory requirements.
20 May 2022
Event Stream Processing (ESP) Systeme überwachen kontinuierliche Datenströme, um benutzerdefinierte Queries auszuwerten. Die Herausforderung besteht darin, dass die Queryverarbeitung zustandsbehaftet ist und die Anzahl von Teilübereinstimmungen mit der Größe der verarbeiteten Events exponentiell anwächst. Die Dynamik von Streams und die Notwendigkeit, entfernte Daten zu integrieren, erschweren die Zustandsverwaltung. Erstens liefern heterogene Eventquellen Streams mit unvorhersehbaren Eingaberaten und Queryselektivitäten. Während Spitzenzeiten ist eine erschöpfende Verarbeitung unmöglich, und die Systeme müssen auf eine Best-Effort-Verarbeitung zurückgreifen. Zweitens erfordern Queries möglicherweise externe Daten, um ein bestimmtes Event für eine Query auszuwählen. Solche Abhängigkeiten sind problematisch: Das Abrufen der Daten unterbricht die Stream-Verarbeitung. Ohne eine Eventauswahl auf Grundlage externer Daten wird das Wachstum von Teilübereinstimmungen verstärkt. In dieser Dissertation stelle ich Strategien für optimiertes Zustandsmanagement von ESP Systemen vor. Zuerst ermögliche ich eine Best-Effort-Verarbeitung mittels Load Shedding. Dabei werden sowohl Eingabeeevents als auch Teilübereinstimmungen systematisch verworfen, um eine Latenzschwelle mit minimalem Qualitätsverlust zu garantieren. Zweitens integriere ich externe Daten, indem ich das Abrufen dieser von der Verwendung in der Queryverarbeitung entkoppele. Mit einem effizienten Caching-Mechanismus vermeide ich Unterbrechungen durch Übertragungslatenzen. Dazu werden externe Daten basierend auf ihrer erwarteten Verwendung vorab abgerufen und mittels Lazy Evaluation bei der Eventauswahl berücksichtigt. Dabei wird ein Kostenmodell verwendet, um zu bestimmen, wann welche externen Daten abgerufen und wie lange sie im Cache aufbewahrt werden sollen. Ich habe die Effektivität und Effizienz der vorgeschlagenen Strategien anhand von synthetischen und realen Daten ausgewertet und unter Beweis gestellt. / Event stream processing systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. However, the challenge is that query processing is stateful and it maintains partial matches that grow exponentially in the size of processed events. State management is complicated by the dynamicity of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is unreasonable, and systems shall resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: Fetching the remote data interrupts the stream processing. Yet, without event selection based on remote data, the growth of partial matches is amplified. In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the shedding elements to satisfy a latency bound while striving for a minimal loss in result quality. Second, to efficiently integrate remote data, I decouple the fetching of remote data from its use in query evaluation by a caching mechanism. To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation that postpones the event selection based on remote data to avoid interruptions. A cost model is used to determine when to fetch which remote data items and how long to keep them in the cache. I evaluated the above techniques with queries over synthetic and real-world data. I show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency.
12 November 2008
As the use of Web expands, Web Service is gradually becoming the basic system infrastructure. However, as it matures and a large number of Web Service becomes available, the focus will shift from service development to service management. One key component in management systems is monitoring. The growing complexity of Web Service platforms and their dynamically varying workloads make manually monitoring them a demanding task. Therefore monitoring tools are required to support the management efforts. Our approach, Web Service Monitoring System (WSMS), utilizes Autonomic Computing technology to monitor Web Service for an automated manager. WSMS correlates lower level events into a meaningful diagnosed symptom which provides higher level information for problem determination. It also gains the ability to take autonomic actions and solve the original problem using corrective actions. In this thesis, a complete design of WSMS is presented along with a practical implementation showing viability and proof of concept of WSMS. / Thesis (Master, Computing) -- Queen's University, 2008-11-12 16:20:13.738
30 April 2013
Recently numerous emerging applications, ranging from on-line financial transactions, RFID based supply chain management, traffic monitoring to real-time object monitoring, generate high-volume event streams. To meet the needs of processing event data streams in real-time, Complex Event Processing technology (CEP) has been developed with the focus on detecting occurrences of particular composite patterns of events. By analyzing and constructing several real-world CEP applications, we found that CEP needs to be extended with advanced services beyond detecting pattern queries. We summarize these emerging needs in three orthogonal directions. First, for applications which require access to both streaming and stored data, we need to provide a clear semantics and efficient schedulers in the face of concurrent access and failures. Second, when a CEP system is deployed in a sensitive environment such as health care, we wish to mitigate possible privacy leaks. Third, when input events do not carry the identification of the object being monitored, we need to infer the probabilistic identification of events before feed them to a CEP engine. Therefore this dissertation discusses the construction of a framework for extending CEP to support these critical services. First, existing CEP technology is limited in its capability of reacting to opportunities and risks detected by pattern queries. We propose to tackle this unsolved problem by embedding active rule support within the CEP engine. The main challenge is to handle interactions between queries and reactions to queries in the high-volume stream execution. We hence introduce a novel stream-oriented transactional model along with a family of stream transaction scheduling algorithms that ensure the correctness of concurrent stream execution. And then we demonstrate the proposed technology by applying it to a real-world healthcare system and evaluate the stream transaction scheduling algorithms extensively using real-world workload. Second, we are the first to study the privacy implications of CEP systems. Specifically we consider how to suppress events on a stream to reduce the disclosure of sensitive patterns, while ensuring that nonsensitive patterns continue to be reported by the CEP engine. We formally define the problem of utility-maximizing event suppression for privacy preservation. We then design a suite of real-time solutions that eliminate private pattern matches while maximizing the overall utility. Our first solution optimally solves the problem at the event-type level. The second solution, at event-instance level, further optimizes the event-type level solution by exploiting runtime event distributions using advanced pattern match cardinality estimation techniques. Our experimental evaluation over both real-world and synthetic event streams shows that our algorithms are effective in maximizing utility yet still efficient enough to offer near real time system responsiveness. Third, we observe that in many real-world object monitoring applications where the CEP technology is adopted, not all sensed events carry the identification of the object whose action they report on, so called €œnon-ID-ed€� events. Such non-ID-ed events prevent us from performing object-based analytics, such as tracking, alerting and pattern matching. We propose a probabilistic inference framework to tackle this problem by inferring the missing object identification associated with an event. Specifically, as a foundation we design a time-varying graphic model to capture correspondences between sensed events and objects. Upon this model, we elaborate how to adapt the state-of-the-art Forward-backward inference algorithm to continuously infer probabilistic identifications for non-ID-ed events. More important, we propose a suite of strategies for optimizing the performance of inference. Our experimental results, using large-volume streams of a real-world health care application, demonstrate the accuracy, efficiency, and scalability of the proposed technology.
Page generated in 0.1253 seconds