11

Application of a Temporal Database Framework for Processing Event Queries

January 2012 (has links)
abstract: This dissertation presents the Temporal Event Query Language (TEQL), a new language for querying event streams. Event Stream Processing enables online querying of streams of events to extract relevant data in a timely manner. TEQL enables querying of interval-based event streams using temporal database operators. Temporal databases and temporal query languages have been a subject of research for more than 30 years and are a natural fit for expressing queries that involve a temporal dimension. However, operators developed in this context cannot be directly applied to event streams. The research extends a preexisting relational framework for event stream processing to support temporal queries, identifying the language features and formal semantic extensions this requires. The extended framework supports continuous, step-wise evaluation of temporal queries, and the incremental evaluation of TEQL operators is formalized to avoid re-computation of previous results. The research includes the development of a prototype that supports the integrated event and temporal query processing framework, with support for incremental evaluation and materialization of intermediate results. TEQL enables reporting temporal data in the output, direct specification of conditions over timestamps, and specification of temporal relational operators. Through the integration of temporal database operators with event languages, a new class of temporal queries is made possible for querying event streams. New features include semantic aggregation, extraction of temporal patterns using set operators, and a more accurate specification of event co-occurrence. / Dissertation/Thesis / Ph.D. Computer Science 2012
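
As a rough illustration of the incremental-evaluation idea described above (not TEQL itself; the class, the event fields, and the watermark-based pruning rule are all assumptions), a temporal interval join can keep a small operator state and join each arriving event only against still-relevant partners instead of recomputing over history:

```python
from collections import deque

class IncrementalIntervalJoin:
    """Emit pairs of events from streams A and B whose [start, end]
    intervals overlap. State is pruned by a watermark so past results
    are never recomputed."""

    def __init__(self):
        self.state_a = deque()  # events from A awaiting partners
        self.state_b = deque()

    def on_event(self, source, event, watermark):
        own, other = ((self.state_a, self.state_b) if source == "A"
                      else (self.state_b, self.state_a))
        own.append(event)
        # Join only against retained state: incremental, no re-scan of history.
        matches = [(event, e) for e in other
                   if event["start"] <= e["end"] and e["start"] <= event["end"]]
        # Prune state that can no longer overlap any future event.
        for buf in (self.state_a, self.state_b):
            while buf and buf[0]["end"] < watermark:
                buf.popleft()
        return matches

join = IncrementalIntervalJoin()
print(join.on_event("A", {"id": 1, "start": 0, "end": 5}, watermark=0))  # []
print(join.on_event("B", {"id": 2, "start": 3, "end": 8}, watermark=2))  # one match
```
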
12

Stream Processing in the Robot Operating System framework

Hongslo, Anders January 2012 (has links)
Streams of information rather than static databases are becoming increasingly important with the rapid changes involved in a number of fields such as finance, social media and robotics. DyKnow is a stream-based knowledge processing middleware which has been used in autonomous Unmanned Aerial Vehicle (UAV) research. ROS (Robot Operating System) is an open-source robotics framework providing hardware abstraction, device drivers, communication infrastructure, tools, libraries and other functionality. This thesis describes a design and realization of stream processing in ROS based on the stream-based knowledge processing middleware DyKnow. It describes how relevant information in ROS can be selected, labeled, merged and synchronized to provide streams of states. Such stream processing has many applications, such as execution monitoring or evaluating metric temporal logic formulas through progression over state sequences containing the features of the formulas. Overviews are given of DyKnow and ROS before comparing the two and describing the design. The stream processing capabilities implemented in ROS are demonstrated through performance evaluations which show that such stream processing is fast and efficient. The resulting realization in ROS is also readily extensible to provide further stream processing functionality.
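
A minimal sketch of the kind of stream synchronization described above, in plain Python rather than the actual DyKnow/ROS interfaces (the nearest-timestamp pairing policy and the tolerance are assumptions made for the sketch):

```python
def synchronize(stream_a, stream_b, tolerance=0.05):
    """stream_a/stream_b: lists of (timestamp, value) sorted by time.
    Yields (timestamp, value_a, value_b) state tuples whose samples
    lie within `tolerance` seconds of each other."""
    i = j = 0
    while i < len(stream_a) and j < len(stream_b):
        ta, va = stream_a[i]
        tb, vb = stream_b[j]
        if abs(ta - tb) <= tolerance:
            yield (max(ta, tb), va, vb)  # merged state sample
            i += 1
            j += 1
        elif ta < tb:
            i += 1  # drop the older, unmatched sample
        else:
            j += 1

altitude = [(0.00, 100.2), (0.10, 100.5), (0.20, 100.9)]
velocity = [(0.01, 4.9), (0.11, 5.1), (0.31, 5.0)]
for state in synchronize(altitude, velocity):
    print(state)
```
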
13

Implementation and Evaluation of Data Stream Processing in the Big Data Environment Using the Example of Apache Flink

Oelschlegel, Jan 17 May 2021 (has links)
The processing of data streams is increasingly coming into focus in the construction of modern Big Data infrastructures. The industry partner of this master's thesis, integrationfactory GmbH & Co. KG, wants to expand its Big Data business in order to support its customers in these areas as a consulting firm. From the outset, the focus was placed on Apache Flink, an emerging stream processing framework. The goal of this thesis is to implement various typical use cases of the company with Flink and then evaluate them. To this end, the central problem statement is first established and the objectives are derived from it. For better understanding, important basic terms and concepts are then introduced. A separate chapter is devoted to the framework itself, giving the reader a comprehensive yet compact insight into Flink. Various sources were consulted for this, including direct contact with active developers of the framework, so that questions that initially remained open due to missing information in the primary sources could be clarified afterwards and added to the chapter in edited form. In the main part of the thesis, the defined use cases are implemented using the DataStream API and FlinkSQL, a choice that is also justified. The programmed jobs are executed in the company's own Big Data lab, a virtualized environment for testing technologies. As the central problem of this master's thesis, both interfaces are evaluated for their suitability with respect to the use cases. Based on the knowledge from the fundamentals chapters and the experience gained from developing the jobs, evaluation criteria are established using the Analytic Hierarchy Process. Finally, the results are evaluated and put into context.:1. Introduction 1.1. Motivation 1.2. Problem Statement 1.3. Objectives 2. Fundamentals 2.1. Definitions 2.1.1. Big Data 2.1.2. Bounded vs. unbounded streams 2.1.3. Stream vs. table 2.2. Stateful stream processing 2.2.1. History 2.2.2. Requirements 2.2.3. Pattern types 2.2.4. How stateful data stream processing works 3. Apache Flink 3.1. History 3.2. Architecture 3.3. Time-based processing 3.4. Data types and serialization 3.5. State management 3.6. Checkpoints and recovery 3.7. Programming interfaces 3.7.1. DataStream API 3.7.2. FlinkSQL & Table API 3.7.3. Integration with Hive 3.8. Deployment and operation 4. Implementation 4.1. Development environment 4.2. Server environment 4.3. Flink configuration 4.4. Source data 4.5. Use cases 4.6. Realization as Flink jobs 4.6.1. DataStream API 4.6.2. FlinkSQL 4.7. Review of the results 5. Evaluation 5.1. Analytic Hierarchy Process 5.1.1. Procedure and methodology 5.1.2. Phase 1: Problem statement 5.1.3. Phase 2: Criteria structure 5.1.4. Phase 3: Construction of the comparison matrices 5.1.5. Phase 4: Evaluation of the alternatives 5.2. Evaluation of the AHP 6. Conclusion and outlook 6.1. Conclusion 6.2. Outlook
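
As a rough illustration of the declarative interface evaluated in the thesis (the table schema, datagen connector settings, and aggregation are invented for this sketch and are not the thesis's actual jobs), a FlinkSQL streaming job expressed through PyFlink might look like this:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Streaming TableEnvironment; jobs here run locally rather than in a lab cluster.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Synthetic source for the sketch; a real job would use Kafka, files, etc.
t_env.execute_sql("""
    CREATE TABLE sensor_readings (
        sensor_id STRING,
        reading DOUBLE,
        ts TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH ('connector' = 'datagen')
""")

# Tumbling-window average per sensor: the kind of logic that can be written
# declaratively in FlinkSQL or imperatively in the DataStream API.
result = t_env.execute_sql("""
    SELECT sensor_id,
           TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start,
           AVG(reading) AS avg_reading
    FROM sensor_readings
    GROUP BY sensor_id, TUMBLE(ts, INTERVAL '1' MINUTE)
""")
result.print()
```
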
14

Scalable Stream Processing and Management for Time Series Data

Mousavi, Bamdad 15 June 2021 (has links)
There has been enormous growth in the generation of time series data in the past decade. This trend is driven by the widespread adoption of IoT technologies, the data generated by monitoring cloud computing resources, and cyber-physical systems. Although time series data have been a topic of discussion in the domain of data management for several decades, this recent growth has brought the topic to the forefront. Many of the time series management systems available today lack the necessary features to successfully manage and process the sheer amount of time series now being generated. In this thesis we strive to examine the field and study the prior work in time series management. We then propose a large system capable of handling time series management end to end, from generation to consumption by the end user. Our system is composed of open-source data processing frameworks. It has the capability to collect time series data, perform stream processing over it, store it for immediate and future processing, and create the necessary visualizations. We present the implementation of the system and perform experiments to show its scalability in handling growing pipelines of incoming data from various sources.
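
A minimal sketch of one stage such a pipeline could contain, downsampling raw time series points into fixed windows before storage (the window size, field layout, and sample data are assumptions for illustration):

```python
from collections import defaultdict

def downsample(points, window_seconds=60):
    """points: iterable of (unix_ts, value). Returns {window_start: mean},
    reducing storage and query cost for older, high-rate series."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_seconds].append(value)  # align to window start
    return {start: sum(vals) / len(vals)
            for start, vals in sorted(buckets.items())}

raw = [(1623760000, 0.5), (1623760030, 0.7), (1623760075, 0.6)]
print(downsample(raw))
```
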
15

Empirical Evaluation of Edge Computing for Smart Building Streaming IoT Applications

Ghaffar, Talha 13 March 2019 (has links)
Smart buildings are one of the most important emerging applications of the Internet of Things (IoT). The astronomical growth in IoT devices, the data generated by these devices, and ubiquitous connectivity have given rise to a new computing paradigm, referred to as "Edge computing", which argues for data analysis to be performed at the "edge" of the IoT infrastructure, near the data source. The development of efficient Edge computing systems must be based on an advanced understanding of the performance benefits that Edge computing can offer. The goal of this work is to develop this understanding by examining the end-to-end latency and throughput performance characteristics of Smart building streaming IoT applications when deployed at the resource-constrained infrastructure Edge, and to compare this against the performance that can be achieved by utilizing the Cloud's data-center resources. This work also presents a real-time streaming application to detect and localize the footstep impacts generated by a building's occupant while walking. We characterize this application's performance for Edge and Cloud computing and utilize a hybrid scheme that (1) offers up to around 60% and 65% lower latency than Edge and Cloud, respectively, for similar throughput performance and (2) enables processing of higher ingestion rates by eliminating the network bottleneck. / Master of Science / Among the various emerging applications of the Internet of Things (IoT) are Smart buildings, which allow us to monitor and manipulate various operating parameters of a building by instrumenting it with sensor and actuator devices (Things). These devices operate continuously and generate unbounded streams of data that need to be processed at low latency. This data, until recently, has been processed by IoT applications deployed in the Cloud, at the cost of the high network latency of accessing the Cloud's resources. However, the increasing availability of IoT devices, ubiquitous connectivity, and exponential growth in the volume of IoT data have given rise to a new computing paradigm, referred to as "Edge computing". Edge computing argues that IoT data should be analyzed near its source (at the network's Edge) in order to eliminate the high latency of accessing the Cloud for data processing. In order to develop efficient Edge computing systems, an in-depth understanding of the trade-offs involved in the Edge and Cloud computing paradigms is required. In this work, we seek to understand these trade-offs and the potential benefits of Edge computing. We examine the end-to-end latency and throughput performance characteristics of Smart building streaming IoT applications by deploying them at the resource-constrained Edge and compare this against the performance that can be achieved by Cloud deployment. We also present a real-time streaming application to detect and localize the footstep impacts generated by a building's occupant while walking. We characterize this application's performance for Edge and Cloud computing and utilize a hybrid scheme that (1) offers up to around 60% and 65% lower latency than Edge and Cloud, respectively, for similar throughput performance and (2) enables processing of higher ingestion rates by eliminating the network bottleneck.
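
A toy sketch of the placement decision behind such a hybrid scheme (the rule, parameter names, and thresholds are invented for illustration and are not the thesis's algorithm):

```python
def choose_placement(ingest_rate_hz, edge_capacity_hz,
                     latency_budget_ms, cloud_rtt_ms):
    """Decide where to run a streaming stage for one deployment window."""
    if ingest_rate_hz <= edge_capacity_hz:
        return "edge"    # within Edge capacity: no WAN hop needed
    if cloud_rtt_ms <= latency_budget_ms:
        return "cloud"   # Edge saturated, Cloud still meets the deadline
    return "hybrid"      # pre-filter at the Edge, aggregate in the Cloud

print(choose_placement(ingest_rate_hz=1200, edge_capacity_hz=800,
                       latency_budget_ms=100, cloud_rtt_ms=40))  # "cloud"
```
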
16

Benchmarking and Scheduling Strategies for Distributed Stream Processing

Shukla, Anshu January 2017 (has links) (PDF)
The velocity dimension of Big Data refers to the need to rapidly process data that arrives continuously as streams of messages or events. Distributed Stream Processing Systems (DSPS) refer to distributed programming and runtime platforms that allow users to define a composition of dataflow logic that is executed on distributed resources over streams of incoming messages. A DSPS uses commodity clusters and Cloud Virtual Machines (VMs) for its execution. In order to meet the required performance for these applications, the DSPS needs to schedule these dataflows efficiently over the resources. Despite their growing use, resource scheduling for DSPSs tends to be done in an ad hoc manner, favoring empirical and reactive approaches rather than a model-driven and analytical approach. Such empirical strategies may arrive at an approximate schedule for the dataflow that needs further tuning to meet the quality of service. We propose a model-based scheduling approach that makes use of performance profiles and benchmarks developed for tasks in the dataflow to plan both the resource allocation and the resource mapping that together form the schedule planning process. We propose the Model Based Allocation (MBA) and the Slot Aware Mapping (SAM) approaches that effectively utilize knowledge of the performance model of logic tasks to provide efficient and predictable scheduling behavior. We implemented and validated these algorithms using the popular open-source Apache Storm DSPS for several micro and application dataflows. The results show that our model-driven approach is able to reduce the amount of required resources (VMs) by 30%–50% relative to existing techniques. We also see that our strategies offer predictable behavior, ensuring that the expected and actual rates supported and resources used match closely. This can enable deterministic schedule planning even under dynamic conditions. Besides this static scheduling, we also examine the ability to dynamically consolidate tasks onto fewer VMs when the load on the dataflow decreases or the VMs get fragmented. We propose reliable task migration models for Apache Storm dataflows that are able to rapidly move the task assignment in the cluster and resume the dataflow execution without any message loss.
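
A small sketch in the spirit of model-based allocation (the task profile numbers and slot arithmetic are invented; MBA and SAM themselves are defined in the thesis): per-task peak rates obtained from benchmarking let the planner compute the parallelism and VM count analytically, instead of tuning reactively.

```python
import math

def plan_allocation(task_peak_rates, input_rate, slots_per_vm=4):
    """task_peak_rates: {task: msgs/sec one instance sustains (benchmarked)}.
    Returns the per-task instance counts and the VM count they imply."""
    threads = {t: math.ceil(input_rate / rate)
               for t, rate in task_peak_rates.items()}
    vms = math.ceil(sum(threads.values()) / slots_per_vm)
    return threads, vms

profile = {"parse": 12000, "filter": 30000, "aggregate": 8000}  # assumed profiles
print(plan_allocation(profile, input_rate=50000))
# ({'parse': 5, 'filter': 2, 'aggregate': 7}, 4)
```
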
17

Scalable Validation of Data Streams

Xu, Cheng January 2016 (has links)
In manufacturing industries, sensors are often installed on industrial equipment, generating high volumes of data in real time. To shorten machine downtime and reduce maintenance costs, it is critical to analyze such streams efficiently in order to detect abnormal equipment behavior. For validating data streams to detect anomalies, a data stream management system called SVALI has been developed. Based on requirements from the application domain, different stream window semantics are explored and an extensible set of window-forming functions is implemented, where dynamic registration of window aggregations allows incremental evaluation of aggregate functions over windows. To facilitate stream validation on a high level, the system provides two second-order system validation functions, model-and-validate and learn-and-validate. Model-and-validate allows the user to define mathematical models based on physical properties of the monitored equipment, while learn-and-validate builds statistical models by sampling the stream in real time as it flows. To validate geographically distributed equipment with short response time, SVALI is a distributed system where many SVALI instances can be started and run in parallel on board the equipment. Central analyses are made at a monitoring center, where streams of detected anomalies are combined and analyzed on a cluster computer. SVALI is an extensible system where functions can be implemented using external libraries written in C, Java, and Python without any modifications of the original code. The system and the developed functionality have been applied to several applications, both industrial and for sports analytics.
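
A minimal sketch of the two validation styles described above, with invented function names, models, and thresholds rather than SVALI's actual interface:

```python
import statistics

def model_and_validate(window, model, tolerance):
    """Flag measurements that deviate from a user-defined physical model."""
    return [m for m in window if abs(m["value"] - model(m)) > tolerance]

def learn_and_validate(training_window, window, k=3.0):
    """Flag measurements outside a statistical model learned from the stream."""
    mean = statistics.mean(m["value"] for m in training_window)
    std = statistics.stdev(m["value"] for m in training_window)
    return [m for m in window if abs(m["value"] - mean) > k * std]

# Assumed physical model: expected pressure is proportional to pump speed.
pump_model = lambda m: 0.02 * m["rpm"]
window = [{"rpm": 1000, "value": 20.1}, {"rpm": 1000, "value": 35.0}]
print(model_and_validate(window, pump_model, tolerance=2.0))  # second reading flagged
```
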
18

A programming language based on recurrence equations and polyhedral compilation for stream processing

Leben, Jakob 31 July 2019 (has links)
The work presented in this dissertation contributes to the field of programming language design and implementation for stream processing applications. There is a fast-expanding domain of stream processing applications which demand processing high-volume streams quickly and often in real time. Examples include analysis and synthesis of audio, video and other digital media, sensor array signals, real-time physical simulation, etc. High performance is crucial in this domain. When choosing between available programming methods, the programmer often chooses one that maximizes performance while sacrificing ease of programming, code comprehension, maintainability and reusability. This work contributes towards improving the state of the art by jointly maximizing these aspects. High-volume streams are often most naturally represented as multi-dimensional arrays with one infinite dimension representing time. Algorithms working with such streams are typically defined mathematically using recurrence equations. A programming language is presented in this dissertation which enables an almost literal translation of such mathematical definitions to computer programs. The language also supports powerful facilities for abstraction and code reuse such as polymorphic and higher-order functions. Together, these features enable a more natural expression of algorithms and improve code modularity and reusability. A major contribution of this dissertation is the compilation of the proposed language in the polyhedral framework, specifically targeting general-purpose multi-core processors. This framework provides powerful means of analysis and transformations of computations on multi-dimensional arrays, which enables data-locality optimizations essential for high performance on general-purpose processors with deep memory hierarchies. The benefit of this framework for computations on finite arrays has been extensively explored. However, this dissertation presents essential extensions that enable the application of state-of-the-art optimizations in this framework on infinite arrays representing streams. / Graduate
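
A toy analogue of the idea: a recurrence equation such as y[t] = a·x[t] + (1−a)·y[t−1] (a one-pole smoothing filter) maps almost literally onto a computation over an array whose time dimension is infinite. A Python generator over an unbounded iterable is a rough stand-in for what the proposed language expresses natively:

```python
import itertools

def smooth(x, a=0.1, y0=0.0):
    """x: any (possibly infinite) iterable; yields y[t] for t = 0, 1, ..."""
    y_prev = y0
    for x_t in x:
        y_t = a * x_t + (1 - a) * y_prev   # the recurrence, almost literally
        yield y_t
        y_prev = y_t

samples = itertools.cycle([0.0, 1.0])      # an unbounded input stream
print(list(itertools.islice(smooth(samples), 5)))
```
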
19

Semantic Query Optimization for Processing XML Streams with Minimized Memory Footprint

Li, Ming 25 August 2007 (has links)
"XML streams have become increasingly prevalent in modern applications, ranging from network traffic monitoring to real-time information publishing. XQuery evaluation over XML streams require the temporary buffering of XML elements, which not only utilizes system buffer and CPU resources but also causes un-necessary output latency. This thesis presents a semantic query optimization solution to minimize memory footprint during XQuery evaluation by exploiting XML schema knowledge. In many practical applications, XML streams are generated conforming to pre-defined schema constraints typically expressed via a DTD or an XML schema specification. Utilizing such constraints enables us to on-the-fly predict the non-occurrence of a given pattern within a bound context. This helps us to avoid data buffering and to release buffered data at an earlier moment, thus achieving a minimized memory footprint. In this work, we focus on one particular class of constraints, namely, the Pattern Non-Occurrence (PNO) constraint. We develop an automaton-based technique to detect PNO constraints at runtime. For a given query, optimization opportunities which can be triggered by runtime PNO detection are explored for memory footprint minimization. Optimization decisions are encoded using our proposed Condition-Action Graph (CAG). The optimization-embedded execution strategy is then proposed to execute an optimized plan by detecting PNO constraints at run-time and then triggering the corresponding encoded actions when certain predefined conditions are satisfied. To ensure the efficiency of such PNO-triggered optimization, we propose optimization strategy on shrinking the CAGs by utilizing constraint knowledge during the query plan compiling phase. We implement our optimization technique within the Raindrop XQuery engine. Our system implementation processes XQuery utilizing the Raindrop algebra. It is efficiently augmented by our optimization module, which uses Glushkov automaton technique to capture and monitor PNO constraints in parallel with the query-driven pattern retrieval. Finally, we conduct experimental studies using both real and synthetic data streams to illustrate that our techniques bring significant performance improvement in both memory and CPU usage as well as improved output latency over state-of-the-art solutions, with little overhead."
20

A benchmark suite for distributed stream processing systems

Bordin, Maycon Viana January 2017 (has links)
A datum has no value on its own; only when it is interpreted, contextualized, and aggregated with other data does it gain value and become information. In some classes of applications, the value lies not only in the information itself but also in the speed with which it is obtained. High-frequency trading is a good example, where profitability is directly proportional to latency (LOVELESS; STOIKOV; WAEBER, 2013). With the evolution of hardware and data processing tools, many applications that once took hours to produce results now need to produce them in minutes or seconds (BARLOW, 2013). Besides requiring real-time or near-real-time processing, this type of application is characterized by the continuous ingestion of large and unbounded amounts of data in the form of tuples or events. The growing demand for applications with these requirements led to the creation of systems that provide a programming model abstracting away details such as scheduling, fault tolerance, processing, and query optimization. These systems are known as Stream Processing Systems (SPS), Data Stream Management Systems (DSMS) (CHAKRAVARTHY, 2009), or Stream Processing Engines (SPE) (ABADI et al., 2005). More recently, these systems have adopted a distributed architecture as a way to handle ever-larger amounts of data (ZAHARIA et al., 2012). Among these systems are S4, Storm, Spark Streaming, Flink Streaming and, more recently, Samza and Apache Beam. These systems model data processing as a dataflow graph, with vertices representing operators and edges representing data streams. But the similarities do not go much further than that, as each system has its own particularities regarding fault-tolerance and recovery mechanisms, operator scheduling and parallelism, and communication patterns. In this scenario, it would be useful to have a tool for comparing these systems under different workloads, to help select the most suitable platform for a specific job. This work proposes a benchmark composed of applications from different areas, as well as a framework for the development and evaluation of distributed SPSs. / Recently a new application domain characterized by the continuous and low-latency processing of large volumes of data has been gaining attention. The growing number of applications of this kind has led to the creation of Stream Processing Systems (SPSs), systems that abstract the details of real-time applications from the developer. More recently, the ever-increasing volumes of data to be processed gave rise to distributed SPSs. Several distributed SPSs are currently on the market; however, the existing benchmarks designed for evaluating this kind of system cover only a few applications and workloads, while these systems support a much wider range of applications. In this work a benchmark for stream processing systems is proposed. Based on a survey of papers on real-time and streaming applications, the most common applications and areas were identified, as well as the metrics most used in the performance evaluation of such applications. With this information, the benchmark's metrics were selected, along with a list of candidate applications. These went through a workload characterization in order to select a diverse set of applications.
To ease the evaluation of SPSs, a framework was created with an API that generalizes application development and collects metrics, with the possibility of extending it to support other platforms in the future. To demonstrate its usefulness, a subset of the applications was executed on Storm and Spark on the Azure Platform, and the results show the benchmark suite's value in comparing these systems.
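
A minimal sketch of the kind of API such a framework could expose (class and method names are assumptions, not the actual framework's interface): applications implement one interface, and the harness collects throughput and latency metrics uniformly, regardless of the underlying SPS.

```python
import time
from abc import ABC, abstractmethod

class BenchmarkApp(ABC):
    @abstractmethod
    def setup(self, config): ...
    @abstractmethod
    def process(self, tuple_): ...   # one unit of the streaming workload

class Harness:
    def __init__(self, app):
        self.app, self.latencies = app, []

    def run(self, workload):
        start = time.perf_counter()
        for tuple_ in workload:
            t0 = time.perf_counter()
            self.app.process(tuple_)                       # measured call
            self.latencies.append(time.perf_counter() - t0)
        elapsed = time.perf_counter() - start
        return {"throughput_tps": len(self.latencies) / elapsed,
                "avg_latency_ms": 1000 * sum(self.latencies) / len(self.latencies)}

class NoOp(BenchmarkApp):
    def setup(self, config): pass
    def process(self, tuple_): sum(range(100))  # stand-in for real work

print(Harness(NoOp()).run(range(10_000)))
```
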
