Spelling suggestions: "subject:"[een] DATA STREAM PROCESSING"" "subject:"[enn] DATA STREAM PROCESSING""
1 |
Efficient Multi-Core Implementation of the IPsec Encapsulating Security Payload Protocol for a Single Security Association / Effektiv, flerkärnig implementation av IPsec Encapsulating Security Payload protokollet för en Security AssociationHellsing, Mattias, Albin, Odervall January 2018 (has links)
As the mobile Internet traffic increases, the workload of the base stations processing this traffic increases with it. To cope with this, the telecommunication providers responsible for the systems deployed in these base stations have looked to parallelism. This, together with the fact that these providers have a vested interest in protecting their users' data from potential attackers, means that there is a need for efficient parallel packet processing software which handles encryption as well as authentication. A well known protocol for encryption and authentication of IP packets is the Encapsulating Security Payload (ESP) protocol of the IPsec protocol suite. IPsec establishes simplex connections, called Security Associations (SA), between entities that wish to communicate. This thesis investigates a special case of this problem where the work of encrypting and authenticating the packets within a single SA is parallelized. This problem was investigated by developing and comparing two multi-threaded implementations based on the Eventdev, an event driven programming library, and ring buffer libraries of Data Plane Development Kit (DPDK). One additional Eventdev-based implementation was also investigated which schedules linked lists of packets, instead of single packets, in an attempt to reduce the overhead of scheduling packets to the worker cores. These implementations were then evaluated in terms of throughput, latency, speedup, and last level cache miss rates. The results showed that the ring buffer-based implementation performed the best in all metrics while the single packet-scheduling Eventdev-based implementation was outperformed by the one using linked lists of packets. It was shown that the packet generation, which was done by the receiving core, was the main limiting factor for all implementations. In addition, the memory resources such as the memory bus, memory controller and prefetching hardware were shown to likely be an area of contention and a possible bottleneck as the packet generation rate increases. The conclusion drawn from this was that a parallelized packet retrieval solution such as Receive Side Scaling (RSS) together with minimizing memory resource contention is necessary to further improve performance.
|
2 |
Implementierung und Evaluierung einer Verarbeitung von Datenströmen im Big Data Umfeld am Beispiel von Apache FlinkOelschlegel, Jan 17 May 2021 (has links)
Die Verarbeitung von Datenströmen rückt zunehmend in den Fokus beim Aufbau moderner Big Data Infrastrukturen. Der Praxispartner dieser Master-Thesis, die integrationfactory GmbH & Co. KG, möchte zunehmend den Big Data Bereich ausbauen, um den Kunden auch in diesen Aspekten als Beratungshaus Unterstützung bieten zu können. Der Fokus wurde von Anfang an auf Apache Flink gelegt, einem aufstrebenden Stream-Processing-Framework. Das Ziel dieser Arbeit ist die Implementierung verschiedener typischer Anwendungsfälle des Unternehmens mithilfe von Flink und die anschließende Evaluierung
dieser. Im Rahmen dessen wird am Anfang zunächst die zentrale Problemstellung festgehalten und daraus die Zielstellungen abgeleitet. Zum besseren Verständnis werden im Nachgang wichtige Grundbegriffe und Konzepte vermittelt. Es wird außerdem dem Framework ein eigenes Kapitel gewidmet, um den Leser einen umfangreichen aber dennoch kompakten Einblick in Flink zu geben. Dabei wurde auf verschiedene Quellen eingegangen, mitunter wurde auch ein direkter Kontakt mit aktiven Entwicklern des Frameworks aufgebaut. Dadurch konnten zunächst unklare Sachverhalte durch fehlende Informationen aus den Primärquellen im Nachgang geklärt und aufbereitet in das Kapitel hinzugefügt werden. Im Hauptteil der Arbeit wird eine Implementierung von definierten Anwendungsfällen
vorgenommen. Dabei kommen die Datastream-API und FlinkSQL zum Einsatz, dessen Auswahl auch begründet wird. Die Ausführung der programmierten Jobs findet im firmeneigenen Big Data Labor statt, einer virtualisierten Umgebung zum Testen von Technologien. Als zentrales Problem dieser Master-Thesis sollen beide Schnittstellen auf die Eignung hinsichtlich der Anwendungsfälle evaluiert werden. Auf Basis des Wissens aus den Grundlagen-Kapiteln und der Erfahrungen aus der Entwicklung der Jobs werden Kriterien zur Bewertung mithilfe des Analytic Hierarchy Processes aufgestellt. Im Nachgang findet eine Auswertung statt und die Einordnung des Ergebnisses.:1. Einleitung
1.1. Motivation
1.2. Problemstellung
1.3. Zielsetzung
2. Grundlagen
2.1. Begriffsdefinitionen
2.1.1. Big Data
2.1.2. Bounded vs. unbounded Streams
2.1.3. Stream vs. Tabelle
2.2. Stateful Stream Processing
2.2.1. Historie
2.2.2. Anforderungen
2.2.3. Pattern-Arten
2.2.4. Funktionsweise zustandsbehafteter Datenstromverarbeitung
3. Apache Flink
3.1. Historie
3.2. Architektur
3.3. Zeitabhängige Verarbeitung
3.4. Datentypen und Serialisierung
3.5. State Management
3.6. Checkpoints und Recovery
3.7. Programmierschnittstellen
3.7.1. DataStream-API
3.7.2. FlinkSQL & Table-API
3.7.3. Integration mit Hive
3.8. Deployment und Betrieb
4. Implementierung
4.1. Entwicklungsumgebung
4.2. Serverumgebung
4.3. Konfiguration von Flink
4.4. Ausgangsdaten
4.5. Anwendungsfälle
4.6. Umsetzung in Flink-Jobs
4.6.1. DataStream-API
4.6.2. FlinkSQL
4.7. Betrachtung der Resultate
5. Evaluierung
5.1. Analytic Hierarchy Process
5.1.1. Ablauf und Methodik
5.1.2. Phase 1: Problemstellung
5.1.3. Phase 2: Struktur der Kriterien
5.1.4. Phase 3: Aufstellung der Vergleichsmatrizen
5.1.5. Phase 4: Bewertung der Alternativen
5.2. Auswertung des AHP
6. Fazit und Ausblick
6.1. Fazit
6.2. Ausblick
|
3 |
Leistungsoptimierung der persistenten Datenverwaltung in DSP-Architekturen zur Live-Analyse von SensordatenWeißbach, Manuel 28 October 2021 (has links)
Aufgrund der in vielen Bereichen stets wachsenden Menge an zu verarbeitenden Daten haben sich Big-Data-Anwendungen in den letzten Jahren zunehmend verbreitet. Twitter gab bereits im Jahr 2011 an, täglich 15 Millionen URLs in Echtzeit zu untersuchen, um die Verbreitung von Spamlinks zu unterbinden [1]. Facebook verarbeitet pro Minute über vier Millionen „Gefällt mir“-Klicks und verwaltet über 300 Petabyte Daten [2]. Über das Businessportal LinkedIn wurden 2011 rund eine Milliarde Nachrichten pro Tag zugestellt, 2015 waren es laut Angaben des Unternehmens bereits 1,1 Billionen täglich versendete Nachrichten [3]. Diesem starken Anstieg liegt ein exponentielles Wachstum zugrunde, das für Big Data typisch ist.
Gartner definiert den Begriff „Big Data“ auf Basis seiner spezifischen Eigenschaften, die in englischer Sprache auch als die „drei V´s“ bezeichnet werden: „Volume“, „Variety“ und „Velocity“ [4]. Neben der enormen Menge an zu verarbeitenden Daten („Volume“) und ihrer Vielfalt und Unstrukturiertheit („Variety“), ist demnach auch die Geschwindigkeit („Velocity“), in der die Daten generiert werden, ein wesentliches Merkmal von Big Data [5, 6]. Soll trotz der ständigen und immer schneller werdenden Generierung neuer Daten ein Verarbeitungsrückstau vermieden werden, so folgt daraus auch die Notwendigkeit, die kontinuierlich wachsenden Datenmengen immer schneller zu verarbeiten.
|
4 |
Scalable Validation of Data StreamsXu, Cheng January 2016 (has links)
In manufacturing industries, sensors are often installed on industrial equipment generating high volumes of data in real-time. For shortening the machine downtime and reducing maintenance costs, it is critical to analyze efficiently this kind of streams in order to detect abnormal behavior of equipment. For validating data streams to detect anomalies, a data stream management system called SVALI is developed. Based on requirements by the application domain, different stream window semantics are explored and an extensible set of window forming functions are implemented, where dynamic registration of window aggregations allow incremental evaluation of aggregate functions over windows. To facilitate stream validation on a high level, the system provides two second order system validation functions, model-and-validate and learn-and-validate. Model-and-validate allows the user to define mathematical models based on physical properties of the monitored equipment, while learn-and-validate builds statistical models by sampling the stream in real-time as it flows. To validate geographically distributed equipment with short response time, SVALI is a distributed system where many SVALI instances can be started and run in parallel on-board the equipment. Central analyses are made at a monitoring center where streams of detected anomalies are combined and analyzed on a cluster computer. SVALI is an extensible system where functions can be implemented using external libraries written in C, Java, and Python without any modifications of the original code. The system and the developed functionality have been applied on several applications, both industrial and for sports analytics.
|
5 |
Handling Tradeoffs between Performance and Query-Result Quality in Data Stream ProcessingJi, Yuanzhen 27 March 2018 (has links) (PDF)
Data streams in the form of potentially unbounded sequences of tuples arise naturally in a large variety of domains including finance markets, sensor networks, social media, and network traffic management. The increasing number of applications that require processing data streams with high throughput and low latency have promoted the development of data stream processing systems (DSPS). A DSPS processes data streams with continuous queries, which are issued once and return query results to users continuously as new tuples arrive.
For stream-based applications, both the query-execution performance (in terms of, e.g., throughput and end-to-end latency) and the quality of produced query results (in terms of, e.g., accuracy and completeness) are important. However, a DSPS often needs to make tradeoffs between these two requirements, either because of the data imperfection within the streams, or because of the limited computation capacity of the DSPS itself. Performance versus result-quality tradeoffs caused by data imperfection are inevitable, because the quality of the incoming data is beyond the control of a DSPS, whereas tradeoffs caused by system limitations can be alleviated—even erased—by enhancing the DSPS itself.
This dissertation seeks to advance the state of the art on handling the performance versus result-quality tradeoffs in data stream processing caused by the above two aspects of reasons. For tradeoffs caused by data imperfection, this dissertation focuses on the typical data-imperfection problem of stream disorder and proposes the concept of quality-driven disorder handling (QDDH). QDDH enables a DSPS to make flexible and user-configurable tradeoffs between the end-to-end latency and the query-result quality when dealing with stream disorder. Moreover, compared to existing disorder handling approaches, QDDH can significantly reduce the end-to-end latency, and at the same time provide users with desired query-result quality. In this dissertation, a generic buffer-based QDDH framework and three instantiations of the generic framework for distinct query types are presented. For tradeoffs caused by system limitations, this dissertation proposes a system-enhancement approach that combines the row-oriented and the column-oriented data layout and processing techniques in data stream processing to improve the throughput. To fully exploit the potential of such hybrid execution of continuous queries, a static, cost-based query optimizer is introduced. The optimizer works at the operator level and takes the unique property of execution plans of continuous queries—feasibility—into account.
|
6 |
Handling Tradeoffs between Performance and Query-Result Quality in Data Stream ProcessingJi, Yuanzhen 28 November 2017 (has links)
Data streams in the form of potentially unbounded sequences of tuples arise naturally in a large variety of domains including finance markets, sensor networks, social media, and network traffic management. The increasing number of applications that require processing data streams with high throughput and low latency have promoted the development of data stream processing systems (DSPS). A DSPS processes data streams with continuous queries, which are issued once and return query results to users continuously as new tuples arrive.
For stream-based applications, both the query-execution performance (in terms of, e.g., throughput and end-to-end latency) and the quality of produced query results (in terms of, e.g., accuracy and completeness) are important. However, a DSPS often needs to make tradeoffs between these two requirements, either because of the data imperfection within the streams, or because of the limited computation capacity of the DSPS itself. Performance versus result-quality tradeoffs caused by data imperfection are inevitable, because the quality of the incoming data is beyond the control of a DSPS, whereas tradeoffs caused by system limitations can be alleviated—even erased—by enhancing the DSPS itself.
This dissertation seeks to advance the state of the art on handling the performance versus result-quality tradeoffs in data stream processing caused by the above two aspects of reasons. For tradeoffs caused by data imperfection, this dissertation focuses on the typical data-imperfection problem of stream disorder and proposes the concept of quality-driven disorder handling (QDDH). QDDH enables a DSPS to make flexible and user-configurable tradeoffs between the end-to-end latency and the query-result quality when dealing with stream disorder. Moreover, compared to existing disorder handling approaches, QDDH can significantly reduce the end-to-end latency, and at the same time provide users with desired query-result quality. In this dissertation, a generic buffer-based QDDH framework and three instantiations of the generic framework for distinct query types are presented. For tradeoffs caused by system limitations, this dissertation proposes a system-enhancement approach that combines the row-oriented and the column-oriented data layout and processing techniques in data stream processing to improve the throughput. To fully exploit the potential of such hybrid execution of continuous queries, a static, cost-based query optimizer is introduced. The optimizer works at the operator level and takes the unique property of execution plans of continuous queries—feasibility—into account.
|
7 |
Hardware Utilisation Techniques for Data Stream ProcessingMeldrum, Max January 2019 (has links)
Recent years have seen an increase in use of the stream processing architecture to compose continuous analytics applications. This thesis presents the design of a Rust-based stream processor that adopts two separate techniques to tackle existing weaknesses in modern production-grade stream processors. The first technique employs a data analytics language on top of the streaming runtime, in order to provide both dataflow and low-level compiler optimisations. This technique is motivated by an analysis of the impact that the lack of compiler integration may have on the end-to-end performance of streaming pipelines in Apache Flink. In the second technique streaming operators are scheduled using a task-parallel approach to boost performance for skewed data distributions. The experimental results for data-parallel streaming pipelines in this thesis demonstrate, that the scheduling model of the prototype achieves performance improvements in skewed scenarios without exhibiting any significant losses in performance during uniform distributions. / Under senare år har användningen av strömbearbetningsarkitekturen ökat för att komponera kontinuerliga analysapplikationer. Denna avhandling presenterar designen av en Rust-baserad strömprocessor som använder två separata tekniker för att hantera befintliga svagheter i moderna strömprocessorer. Den första tekniken använder ett dataanalysspråk ovanpå strömprocessorn, för att ge både dataflöde och kompilatoroptimeringar på låg nivå. Denna teknik är motiverad av en analys av påverkan som bristen på kompilatorintegration kan ha på den slutliga prestandan för analysapplikationer i Apache Flink. I den andra tekniken schemaläggs strömningsoperatörer med hjälp av en uppgiftsparallell metod för att öka prestanda för skev datadistribution. De experimentella resultaten för data-parallella analysapplikationer i denna avhandling visar att schemaläggningsmodellen för prototypen uppnår prestandaförbättringar i ojämna distributioner utan att uppvisa några betydande förluster i prestanda under enhetliga fördelningar.
|
8 |
Quality-of-Service-Aware Data Stream ProcessingSchmidt, Sven 13 March 2007 (has links)
Data stream processing in the industrial as well as in the academic field has gained more and more importance during the last years. Consider the monitoring of industrial processes as an example. There, sensors are mounted to gather lots of data within a short time range. Storing and post-processing these data may occasionally be useless or even impossible. On the one hand, only a small part of the monitored data is relevant. To efficiently use the storage capacity, only a preselection of the data should be considered. On the other hand, it may occur that the volume of incoming data is generally too high to be stored in time or–in other words–the technical efforts for storing the data in time would be out of scale. Processing data streams in the context of this thesis means to apply database operations to the stream in an on-the-fly manner (without explicitly storing the data). The challenges for this task lie in the limited amount of resources while data streams are potentially infinite. Furthermore, data stream processing must be fast and the results have to be disseminated as soon as possible. This thesis focuses on the latter issue. The goal is to provide a so-called Quality-of-Service (QoS) for the data stream processing task. Therefore, adequate QoS metrics like maximum output delay or minimum result data rate are defined. Thereafter, a cost model for obtaining the required processing resources from the specified QoS is presented. On that basis, the stream processing operations are scheduled. Depending on the required QoS and on the available resources, the weight can be shifted among the individual resources and QoS metrics, respectively. Calculating and scheduling resources requires a lot of expert knowledge regarding the characteristics of the stream operations and regarding the incoming data streams. Often, this knowledge is based on experience and thus, a revision of the resource calculation and reservation becomes necessary from time to time. This leads to occasional interruptions of the continuous data stream processing, of the delivery of the result, and thus, of the negotiated Quality-of-Service. The proposed robustness concept supports the user and facilitates a decrease in the number of interruptions by providing more resources.
|
9 |
[en] A DYNAMIC LOAD BALANCING MECHANISM FOR DATA STREAM PROCESSING ON DDS SYSTEMS / [pt] UM MECANISMO DE BALANCEAMENTO DE CARGA DINÂMICO PARA PROCESSAMENTO DE FLUXO DE DADOS EM SISTEMAS DDSRAFAEL OLIVEIRA VASCONCELOS 04 November 2014 (has links)
[pt] Esta dissertação apresenta a solução de balanceamento de carga baseada em fatias de processamento de dados (Data Processing Slice Load Balancing solution) para permitir o balanceamento de carga dinâmico do processamento de fluxos de dados em sistemas baseados em DDS (Data Distribution Service). Um grande número de aplicações requer o processamento contínuo de alto volume de dados oriundos de várias fontes distribuídas., tais como monitoramento de rede, sistemas de engenharia de tráfego, roteamento inteligente de carros em áreas metropolitanas, redes de sensores, sistemas de telecomunicações, aplicações financeiras e meteorologia. Conceito chave da solução proposta é o Data Processing Slice, o qual é a unidade básica da carga de processamento dos dados dos nós servidores em um domínio DDS. A solução consiste de um nó balanceador, o qual é responsável por monitorar a carga atual de um conjunto de nós processadores homogêneos e quando um desbalanceamento de carga é detectado, coordenar ações para redistribuir entre os nós processadores algumas fatias de carga de trabalho de forma segura. Experimentos feitos com grandes fluxos de dados que demonstram a baixa sobrecarga, o bom desempenho e a confiabilidade da solução apresentada. / [en] This thesis presents the Data Processing Slice Load Balancing solution to enable dynamic load balancing of Data Stream Processing on DDS-based systems (Data Distribution Service). A large number of applications require continuous and timely processing of high-volume of data originated from many distributed sources, such as network monitoring, traffic engineering systems, intelligent routing of cars in metropolitan areas, sensor networks, telecommunication systems, financial applications and meteorology. The key concept of the proposed solution is the Data Processing Slice (DPS), which is the basic unit of data processing load of server nodes in a DDS Domain. The Data Processing Slice Load Balancing solution consists of a load balancer, which is responsible for monitoring the current load of a set of homogenous data processing nodes and when a load unbalance is detected, it coordinates the actions to redistribute some data processing slices among the processing nodes in a secure way. Experiments with large data stream have demonstrated the low overhead, good performance and the reliability of the proposed solution.
|
10 |
Datenqualität in Sensordatenströmen / Data Quality in Sensor Data StreamsKlein, Anja 23 March 2010 (has links) (PDF)
Die stetige Entwicklung intelligenter Sensorsysteme erlaubt die Automatisierung und Verbesserung komplexer Prozess- und Geschäftsentscheidungen in vielfältigen Anwendungsszenarien.
Sensoren können zum Beispiel zur Bestimmung optimaler Wartungstermine oder zur Steuerung von Produktionslinien genutzt werden. Ein grundlegendes Problem bereitet dabei die Sensordatenqualität, die durch Umwelteinflüsse und Sensorausfälle
beschränkt wird. Ziel der vorliegenden Arbeit ist die Entwicklung eines Datenqualitätsmodells, das Anwendungen und Datenkonsumenten Qualitätsinformationen für eine umfassende Bewertung unsicherer Sensordaten zur Verfügung stellt. Neben Datenstrukturen zur
effizienten Datenqualitätsverwaltung in Datenströmen und Datenbanken wird eine umfassende Datenqualitätsalgebra zur Berechnung der Qualität von Datenverarbeitungsergebnissen
vorgestellt. Darüber hinaus werden Methoden zur Datenqualitätsverbesserung entwickelt, die speziell auf die Anforderungen der Sensordatenverarbeitung angepasst sind. Die Arbeit wird durch Ansätze zur nutzerfreundlichen Datenqualitätsanfrage
und -visualisierung vervollständigt.
|
Page generated in 0.0738 seconds