Global ETD Search

61	[en] AN EFFICIENT APPROACH TO COORDINATED RECONFIGURATION IN DISTRIBUTED DATA STREAM SYSTEMS / [pt] UMA ABORDAGEM EFICIENTE PARA RECONFIGURAÇÃO COORDENADA EM SISTEMAS DISTRIBUÍDOS DE PROCESSAMENTO DE DATA STREAMS RAFAEL OLIVEIRA VASCONCELOS 24 July 2017 (has links) [pt] Ao mesmo tempo em que sistemas de processamento de fluxo de dados devem prover serviços de análise e manipulação de dados ininterruptamente (disponibilidade 24x7), eles comumente também precisam lidar com mudanças em seus ambientes de execução (e.g., alterar a topologia da rede) e nos requisitos que eles devem cumprir (e.g., adição de novas funções de processamento dos fluxos de dados). Por um lado, reconfiguração dinâmica de software (i.e., a capacidade de substituir parte do software em tempo de execução) é uma característica desejável. Por outro lado, sistemas de fluxo de dados podem sofrer com a interrupção e sobrecarga causada pela reconfiguração. Por conta da necessidade de reconfigurar (i.e., evoluir) o sistema ao mesmo tempo em que o sistema não pode ser interrompido (i.e., bloqueado), reconfiguração consistente e não bloqueante é ainda considerada um problema em aberto na literatura. Esta tese apresenta e valida uma abordagem não quiescente para reconfiguração dinâmica de software que preserva a consistência de sistemas de fluxo de dados distribuídos. A abordagem proposta permite que o sistema seja reconfigurado gradual e suavemente, sem precisar interromper o processamento do fluxo de dados ou atingir a quiescência. A avaliação indica que a abordagem proposta realiza reconfiguração distribuída consistentemente e tem um impacto desprezível sobre a diminuição na disponibilidade e no desempenho do sistema. Além disto, a implementação da abordagem proposta teve um desempenho melhor em todos os testes comparativos. 
/ [en] While many data stream systems have to provide continuous (24x7) services with no acceptable downtime, they also have to cope with changes in their execution environments and in the requirements with which they must comply (e.g., moving from on-premises architecture to a cloud system, changing the network technology, adding new functionality or modifying existing parts). On the one hand, dynamic software reconfiguration (i.e., the capability of evolving on the fly) is a desirable feature. On the other hand, stream systems may suffer from the disruption and overhead caused by the reconfiguration. Because the system must be reconfigured (i.e., evolved) while it must not be disrupted (i.e., blocked), consistent and non-disruptive reconfiguration is still considered an open problem. This thesis presents and validates a non-quiescent approach for dynamic software reconfiguration that preserves the consistency of distributed data stream processing systems. Unlike many works that require the system to reach a safe state (e.g., quiescence) before performing a reconfiguration, the proposed approach enables the system to smoothly evolve (i.e., be reconfigured) in a non-disruptive way without reaching quiescence. The evaluation indicates that the proposed approach supports consistent distributed reconfiguration and has negligible impact on availability and performance. Furthermore, the implementation of the proposed approach showed better performance results in all experiments than the quiescent approach and Upstart. [pt] ADAPTABILIDADE [en] ADAPTABILITY [pt] COMUNICACAO MOVEL [en] MOBILE COMMUNICATION [pt] RECONFIGURACAO DINAMICA [en] DYNAMIC RECONFIGURATION [pt] PROCESSAMENTO DE FLUXO DE DADOS [en] DATA STREAM PROCESSING [pt] ADAPTACAO DE SOFTWARE [en] SOFTWARE ADAPTATION
62	[en] DG2CEP: AN ON-LINE ALGORITHM FOR REAL-TIME DETECTION OF SPATIAL CLUSTERS FROM LARGE DATA STREAMS THROUGH COMPLEX EVENT PROCESSING / [pt] DG2CEP: UM ALGORITMO ON-LINE PARA DETECÇÃO EM TEMPO REAL DE AGLOMERADOS ESPACIAIS EM GRANDES FLUXOS DE DADOS ATRAVÉS DE PROCESSAMENTO DE FLUXO DE DADOS MARCOS PAULINO RORIZ JUNIOR 08 June 2017 (has links) [pt] Clusters (ou concentrações) de objetos móveis, como veículos e seres humanos, é um padrão de mobilidade relevante para muitas aplicações. Uma detecção rápida deste padrão e de sua evolução, por exemplo, se o cluster está encolhendo ou crescendo, é útil em vários cenários, como detectar a formação de engarrafamentos ou detectar uma rápida dispersão de pessoas em um show de música. A detecção on-line deste padrão é uma tarefa desafiadora porque requer algoritmos que sejam capazes de processar de forma contínua e eficiente o alto volume de dados enviados pelos objetos móveis em tempo hábil. Atualmente, a maioria das abordagens para a detecção destes clusters operam em lote. As localizações dos objetos móveis são armazenadas durante um determinado período e depois processadas em lote por uma rotina externa, atrasando o resultado da detecção do cluster até o final do período ou do próximo lote. Além disso, essas abordagem utilizam extensivamente estruturas de dados e operadores espaciais, o que pode ser problemático em cenários de grande fluxos de dados. Com intuito de abordar estes problemas, propomos nesta tese o DG2CEP, um algoritmo que combina o conhecido algoritmo de aglomeração por densidade (DBSCAN) com o paradigma de processamento de fluxos de dados (Complex Event Processing) para a detecção contínua e rápida dos aglomerados. Nossos experimentos com dados reais indicam que o DG2CEP é capaz de detectar a formação e dispersão de clusters rapidamente, em menos de alguns segundos, para milhares de objetos móveis. 
Além disso, os resultados obtidos indicam que o DG2CEP possui maior similaridade com DBSCAN do que abordagens baseadas em lote. / [en] Spatial concentrations (or spatial clusters) of moving objects, such as vehicles and humans, are a mobility pattern that is relevant to many applications. Fast detection of this pattern and its evolution, e.g., whether the cluster is shrinking or growing, is useful in numerous scenarios, such as detecting the formation of traffic jams or detecting a fast dispersion of people at a music concert. On-line detection of this pattern is a challenging task because it requires algorithms that are capable of continuously and efficiently processing the high volume of position updates in a timely manner. Currently, the majority of approaches for spatial cluster detection operate in batch mode, where moving objects' location updates are recorded during time periods of a certain length and then batch-processed by an external routine, thus delaying the result of the cluster detection until the end of the time period. Further, they extensively use spatial data structures and operators, which can be troublesome to maintain or parallelize in on-line scenarios. To address these issues, in this thesis we propose DG2CEP, an algorithm that combines the well-known density-based clustering algorithm DBSCAN with the data stream processing paradigm Complex Event Processing (CEP) to achieve continuous and timely detection of spatial clusters. Our experiments with real-world data streams indicate that DG2CEP is able to detect the formation and dispersion of clusters with small latency while having a higher similarity to DBSCAN than batch-based approaches. [pt] PROCESSAMENTO DE FLUXO DE DADOS [en] DATA STREAM PROCESSING [pt] AGLOMERACAO ESPACIAL [pt] AGLOMERACAO EM FLUXO DE DADOS [pt] AGLOMERACAO EM TEMPO REAL [pt] DETECCAO ON-LINE DE AGLOMERADOS
63	Parallel algorithms and data structures for interactive applications / Algoritmos Paralelos e Estruturas de Dados para Aplicações Interativas / Algorithmes et Structures de Données Parallèles pour Applications Interactives Toss, Julio January 2017 (has links) La quête de performance a été une constante à travers l’histoire des systèmes informatiques. Il y a plus d’une décennie maintenant, le modèle de traitement séquentiel montrait ses premiers signes d’épuisement pour satisfaire les exigences de performance. Les barrières du calcul séquentiel ont poussé à un changement de paradigme et ont établi le traitement parallèle comme standard dans les systèmes informatiques modernes. Avec l’adoption généralisée d’ordinateurs parallèles, de nombreux algorithmes et applications ont été développés pour s’adapter à ces nouvelles architectures. Cependant, dans des applications non conventionnelles, avec des exigences d’interactivité et de temps réel, la parallélisation efficace est encore un défi majeur. L’exigence de performance en temps réel apparaît, par exemple, dans les simulations interactives où le système doit prendre en compte l’entrée de l’utilisateur dans une itération de calcul de la boucle de simulation. Le même type de contrainte apparaît dans les applications d’analyse de données en continu. Par exemple, lorsque des donnes issues de capteurs de trafic ou de messages de réseaux sociaux sont produites en flux continu, le système d’analyse doit être capable de traiter ces données à la volée rapidement sur ce flux tout en conservant un budget de mémoire contrôlé La caractéristique dynamique des données soulève plusieurs problèmes de performance tel que la décomposition du problème pour le traitement en parallèle et la maintenance de la localité mémoire pour une utilisation efficace du cache. Les optimisations classiques qui reposent sur des modèles pré-calculés ou sur l’indexation statique des données ne conduisent pas aux performances souhaitées. 
Dans cette thèse, nous abordons les problèmes dépendants de données sur deux applications différentes : la première dans le domaine de la simulation physique interactive et la seconde sur l’analyse des données en continu. Pour le problème de simulation, nous présentons un algorithme GPU parallèle pour calculer les multiples plus courts chemins et des diagrammes de Voronoi sur un graphe en forme de grille. Pour le problème d’analyse de données en continu, nous présentons une structure de données parallélisable, basée sur des Packed Memory Arrays, pour indexer des données dynamiques géo-référencées tout en conservant une bonne localité de mémoire. / A busca por desempenho tem sido uma constante na história dos sistemas computacionais. Ha mais de uma década, o modelo de processamento sequencial já mostrava seus primeiro sinais de exaustão pare suprir a crescente exigência por performance. Houveram "barreiras"para a computação sequencial que levaram a uma mudança de paradigma e estabeleceram o processamento paralelo como padrão nos sistemas computacionais modernos. Com a adoção generalizada de computadores paralelos, novos algoritmos foram desenvolvidos e aplicações reprojetadas para se adequar às características dessas novas arquiteturas. No entanto, em aplicações menos convencionais, com características de interatividade e tempo real, alcançar paralelizações eficientes ainda representa um grande desafio. O requisito por desempenho de tempo real apresenta-se, por exemplo, em simulações interativas onde o sistema deve ser capaz de reagir às entradas do usuário dentro do tempo de uma iteração da simulação. O mesmo tipo de exigência aparece em aplicações de monitoramento de fluxos contínuos de dados (streams). 
Por exemplo, quando dados provenientes de sensores de tráfego ou postagens em redes sociais são produzidos em fluxo contínuo, o sistema de análise on-line deve ser capaz de processar essas informações em tempo real e ao mesmo tempo manter um consumo de memória controlada A natureza dinâmica desses dados traz diversos problemas de performance, tais como a decomposição do problema para processamento em paralelo e a manutenção da localidade de dados para uma utilização eficiente da memória cache. As estratégias de otimização tradicionais, que dependem de modelos pré-computados ou de índices estáticos sobre os dados, não atendem às exigências de performance necessárias nesses cenários. Nesta tese, abordamos os problemas dependentes de dados em dois contextos diferentes: um na área de simulações baseada em física e outro em análise de dados em fluxo contínuo. Para o problema de simulação, apresentamos um algoritmo paralelo, em GPU, para computar múltiplos caminhos mínimos e diagramas de Voronoi em um grafo com topologia de grade. Para o problema de análise de fluxos de dados, apresentamos uma estrutura de dados paralelizável, baseada em Packed Memory Arrays, para indexar dados dinâmicos geo-localizados ao passo que mantém uma boa localidade de memória. / The quest for performance has been a constant through the history of computing systems. It has been more than a decade now since the sequential processing model had shown its first signs of exhaustion to keep performance improvements. Walls to the sequential computation pushed a paradigm shift and established the parallel processing as the standard in modern computing systems. With the widespread adoption of parallel computers, many algorithms and applications have been ported to fit these new architectures. However, in unconventional applications, with interactivity and real-time requirements, achieving efficient parallelizations is still a major challenge. 
Real-time performance requirements show up, for instance, in user-interactive simulations where the system must be able to react to the user’s input within a computation time-step of the simulation loop. The same kind of constraint appears in streaming data monitoring applications. For instance, when an external source of data, such as traffic sensors or social media posts, provides a continuous flow of information to be consumed by an online analysis system, the consumer system has to keep a controlled memory budget and deliver quickly processed information about the stream. Common optimizations relying on pre-computed models or static indexes of data are not possible in these highly dynamic scenarios. The dynamic nature of the data brings up several performance issues originating from the problem decomposition for parallel processing and from the data locality maintenance for efficient cache utilization. In this thesis we address data-dependent problems in two different applications: one in physically based simulations and another in streaming data analysis. To deal with the simulation problem, we present a parallel GPU algorithm for computing multiple shortest paths and Voronoi diagrams on a grid-like graph. Our contribution to the streaming data analysis problem is a parallelizable data structure, based on packed memory arrays, for indexing dynamic geo-located data while keeping good memory locality. Algorithmes parallèles Localité de données Traitement de flux de données Traitement en temps réel Simulation physique Processamento : Imagens Algoritmos paralelos Parallel processing Data locality Stream processing Real-time processing Physically based simulation
64	Unsupervised Spatio-Temporal Activity Learning and Recognition in a Stream Processing Framework / Oövervakad maskininlärning och klassificering av spatio-temporala aktiviteter i ett ström-baserat ramverk Tiger, Mattias January 2014 (has links) Learning to recognize and predict common activities, performed by objects and observed by sensors, is an important and challenging problem related both to artificial intelligence and robotics. In this thesis, the general problem of dynamic adaptive situation awareness is considered and we argue for the need for an on-line bottom-up approach. A candidate for a bottom layer is proposed, which we consider to be capable of future extensions that can bring us closer towards the goal. We present a novel approach to adaptive activity learning, where a mapping between raw data and primitive activity concepts is learned and continuously improved on-line and unsupervised. The approach takes streams of observations of objects as input and learns a probabilistic representation of both the observed spatio-temporal activities and their causal relations. The dynamics of the activities are modeled using sparse Gaussian processes and their causal relations using probabilistic graphs. The learned model supports both estimating the most likely activity and predicting the most likely future (and past) activities. Methods and ideas from a wide range of previous work are combined to provide a uniform and efficient way to handle a variety of common problems related to learning, classifying and predicting activities. The framework is evaluated both by learning activities in a simulated traffic monitoring application and by learning the flight patterns of an internally developed autonomous quadcopter system. 
The conclusion is that our framework is capable of learning the observed activities in real time with good accuracy. We see this work as a step towards unsupervised learning of activities for robotic systems to adapt to new circumstances autonomously and to learn new activities on the fly that can be detected and predicted immediately. / Att lära sig känna igen och förutsäga vanliga aktiviteter genom att analysera sensordata från observerade objekt är ett viktigt och utmanande problem relaterat både till artificiell intelligens och robotik. I det här exjobbet studerar vi det generella problemet rörande adaptiv situationsmedvetenhet, och vi argumenterar för behovet av ett angreppssätt som arbetar on-line (direkt på ny data) och från botten upp. Vi föreslår en möjlig lösning som vi anser bereder väg för framtida utökningar som kan ta oss närmare detta mål. Vi presenterar en ny metod för adaptiv aktivitetsinlärning, där en mappning mellan rå-data och grundläggande aktivitetskoncept, samt deras kausala relationer, lärs och är kontinuerligt förfinade utan behov av övervakning. Tillvägagångssättet bygger på användandet av strömmar av observationer av objekt, och inlärning sker av en statistik representation för både de observerade spatio-temporala aktiviteterna och deras kausala relationer. Aktiviteternas dynamik modelleras med hjälp av glesa Gaussiska processer och för att modellera aktiviteternas kausala samband används probabilistiska grafer. Givet observationer av ett objekt så stödjer de inlärda modellerna både skattning av den troligaste aktiviteten och förutsägelser av de mest troliga framtida (och dåtida) aktiviteterna utförda. Metoder och idéer från en rad olika tidigare arbeten kombineras på ett sätt som möjliggör ett enhetligt och effektivt sätt att hantera en mängd vanliga problem relaterade till inlärning, klassificering och förutsägelser av aktiviteter. 
Ramverket är utvärderat genom att dels inlärning av aktiviteter i en simulerad trafikövervakningsapplikation och dels genom inlärning av flygmönster hos ett internt utvecklad quadrocoptersystem. Slutsatsen är att vårt ramverk klarar av att lära sig de observerade aktivisterna i realtid med god noggrannhet. Vi ser detta arbete som ett steg mot oövervakad inlärning av aktiviteter för robotsystem, så att dessa kan anpassa sig till nya förhållanden autonomt och lära sig nya aktiviteter direkt och som då dessutom kan börja detekteras och förutsägas omedelbart. Activity learning Activity recognition Activity prediction Unsupervised On-line learning Artificial Intelligence Spatio-temporal Stream processing Sparse Gaussian process Computer Sciences Datavetenskap (datalogi)
65	Performance Optimizations and Operator Semantics for Streaming Data Flow Programs Sax, Matthias J. 01 July 2020 (has links) Unternehmen sammeln mehr Daten als je zuvor und müssen auf diese Informationen zeitnah reagieren. Relationale Datenbanken eignen sich nicht für die latenzfreie Verarbeitung dieser oft unstrukturierten Daten. Um diesen Anforderungen zu begegnen, haben sich in der Datenbankforschung seit dem Anfang der 2000er Jahre zwei neue Forschungsrichtungen etabliert: skalierbare Verarbeitung unstrukturierter Daten und latenzfreie Datenstromverarbeitung. Skalierbare Verarbeitung unstrukturierter Daten, auch bekannt unter dem Begriff "Big Data"-Verarbeitung, hat in der Industrie schnell Einzug erhalten. Gleichzeitig wurden in der Forschung Systeme zur latenzfreien Datenstromverarbeitung entwickelt, die auf eine verteilte Architektur, Skalierbarkeit und datenparallele Verarbeitung setzen. Obwohl diese Systeme in der Industrie vermehrt zum Einsatz kommen, gibt es immer noch große Herausforderungen im praktischen Einsatz. Diese Dissertation verfolgt zwei Hauptziele: Zuerst wird das Laufzeitverhalten von hochskalierbaren datenparallelen Datenstromverarbeitungssystemen untersucht. Im zweiten Hauptteil wird das "Dual Streaming Model" eingeführt, das eine Semantik zur gleichzeitigen Verarbeitung von Datenströmen und Tabellen beschreibt. Das Ziel unserer Untersuchung ist ein besseres Verständnis über das Laufzeitverhalten dieser Systeme zu erhalten und dieses Wissen zu nutzen um Anfragen automatisch ausreichende Rechenkapazität zuzuweisen. Dazu werden ein Kostenmodell und darauf aufbauende Optimierungsalgorithmen für Datenstromanfragen eingeführt, die Datengruppierung und Datenparallelität einbeziehen. Das vorgestellte Datenstromverarbeitungsmodell beschreibt das Ergebnis eines Operators als kontinuierlichen Strom von Veränderugen auf einer Ergebnistabelle. 
Dabei behandelt unser Modell die Diskrepanz der physikalischen und logischen Ordnung von Datenelementen inhärent und erreicht damit eine deterministische Semantik und eine minimale Verarbeitungslatenz. / Modern companies are able to collect more data and require insights from it faster than ever before. Relational databases do not meet the requirements for processing the often unstructured data sets with reasonable performance. The database research community started to address these trends in the early 2000s. Two new research directions have attracted major interest since: large-scale non-relational data processing and low-latency data stream processing. Large-scale non-relational data processing, commonly known as "Big Data" processing, was quickly adopted in the industry. In parallel, low-latency data stream processing was mainly driven by the research community, which developed new systems that embrace a distributed architecture and scalability and exploit data parallelism. While these systems have gained more and more attention in the industry, there are still major challenges in operating them at large scale. The goal of this dissertation is two-fold: first, to investigate runtime characteristics of large-scale data-parallel distributed streaming systems; and second, to propose the "Dual Streaming Model" to express semantics of continuous queries over data streams and tables. Our goal is to improve the understanding of system and query runtime behavior with the aim of provisioning queries automatically. We introduce a cost model for streaming data flow programs that takes into account the two techniques of record batching and data parallelization. Additionally, we introduce optimization algorithms that leverage our model for cost-based query provisioning. The proposed Dual Streaming Model expresses the result of a streaming operator as a stream of successive updates to a result table, inducing a duality between streams and tables. 
Our model handles the inconsistency of the logical and the physical order of records within a data stream natively, which allows for deterministic semantics as well as low latency query execution. Datenstromverarbeitung Datenflussprogram Parallelität Optimierung Verarbeitungssemantik Data Stream Processing Data Flow Program Parallelization Optimization Processing Semantics 004 Informatik ST 265 ddc:004
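The stream-table duality mentioned in the abstract above can be illustrated with a minimal sketch. It is not taken from the dissertation: the function names `count_updates` and `materialize` and the sample keys are hypothetical, and a running count per key merely stands in for an arbitrary streaming operator.

```python
from typing import Dict, Iterable, Iterator, Tuple


def count_updates(records: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit one (key, new_count) update per input record.

    The emitted sequence is a stream of successive updates to a result table.
    """
    counts: Dict[str, int] = {}
    for key in records:
        counts[key] = counts.get(key, 0) + 1
        yield key, counts[key]


def materialize(updates: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Replay an update stream into a table; a later update for a key
    overwrites the earlier one."""
    table: Dict[str, int] = {}
    for key, value in updates:
        table[key] = value
    return table


# Hypothetical input stream of keys (illustrative data only).
changelog = list(count_updates(["a", "b", "a", "c", "a"]))
table = materialize(changelog)
```

Under these assumptions, `changelog` is the stream view of the operator result and `table` is the table view obtained by replaying it; either one can be derived from the other.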
66	Datenqualität in Sensordatenströmen Klein, Anja 19 June 2009 (has links) The steady development of intelligent sensor systems allows complex process and business decisions to be automated and improved in a wide range of application scenarios. Sensors can be used, for example, to determine optimal maintenance dates or to control production lines. A fundamental problem here is sensor data quality, which is limited by environmental influences and sensor failures. The goal of this thesis is to develop a data quality model that provides applications and data consumers with quality information for a comprehensive assessment of uncertain sensor data. In addition to data structures for efficient data quality management in data streams and databases, a comprehensive data quality algebra for computing the quality of data processing results is presented. Furthermore, methods for improving data quality are developed that are specifically tailored to the requirements of sensor data processing. The thesis is completed by approaches to user-friendly data quality querying and visualization. info:eu-repo/classification/ddc/004 ddc:004
67	Queryable Workflows: Extending Dataflow Streaming with Dynamic Request/Reply Communication / Arbetsflöden som kan efterfrågas: Utökning av dataflödesströmning med dynamisk begäran/återkopplingskommunikation Huang, Chengyang January 2023 (has links) Stream processing systems have been widely adopted in applications such as recommendation systems, anomaly detection, and system monitoring due to their real-time capabilities. Improving observability in stream processing systems can further expand their application scenarios, including the implementation of stateful serverless applications. Stateful serverless applications are an emerging model in serverless computing that focuses on addressing the challenges of state management, enabling developers to build distributed applications in a simpler way. One possible implementation of stateful serverless applications is based on stream processing engines. However, the current approaches to observability in stream processing engines suffer from problems with efficiency, consistency, and functionality, resulting in limited practical use cases. To address these challenges, we propose Queryable Workflow, an extension to stream processing engines. This extension allows users to access or modify the state within stream processing engines with transactional semantics using a SQL interface, enabling use cases such as ad-hoc querying, serializable updates, or even stateful serverless applications. We implemented our system on stream processing engines such as Portals and Apache Flink, and evaluated their performance. The results showed that our system achieved a 4.33x throughput improvement and a 30% latency reduction compared to a baseline implemented with Apache Flink and Apache Kafka. With hand-crafted optimizations, our system processed over 29,000 queries per second with a 99th percentile latency of 8.58 ms under a single-threaded runtime. 
Our proposed system provides a viable option for implementing stateful serverless applications that require transactional guarantees, while also expanding the potential application scenarios for stream processing engines. / Strömbehandlingssystem har på grund av sina realtidsegenskaper fått stor spridning i tillämpningar som rekommendationssystem, anomalidetektering och systemövervakning. Förbättrad observerbarhet i stream processing-system kan ytterligare utöka deras tillämpningsscenarier, inklusive implementeringen av stateful serverless-applikationer. Stateful serverless-applikationer är en framväxande modell inom serverless computing som fokuserar på att hantera utmaningarna med tillståndshantering, vilket gör det möjligt för utvecklare att bygga distribuerade applikationer på ett enklare sätt. En möjlig implementering av stateful serverless-applikationer är baserad på stream processing-motorer. De nuvarande metoderna för observerbarhet i strömbehandlingsmotorer lider dock av problem som effektivitet, konsistens och funktionalitet, vilket resulterar i begränsade praktiska användningsfall. För att ta itu med dessa utmaningar föreslog vi Queryable Workflow, ett tillägg till stream processing-motorer. Med detta tillägg kan användare komma åt eller ändra tillståndet i strömbehandlingsmotorer med transaktionssemantik med hjälp av ett SQL-gränssnitt, vilket möjliggör användningsfall som ad hoc-förfrågningar, serialiserbara uppdateringar eller till och med serverlösa applikationer med tillstånd. Vi implementerade vårt system på stream processing-motorer som Portals och Apache Flink, och utvärderade deras prestanda. Resultatet visade att vårt system har förbättrat genomströmningen 4,33 gånger och minskat latensen med 30% jämfört med en baslinje som implementerats med Apache Flink och Apache Kafka. Med handgjorda optimeringar lyckades vårt system bearbeta över 29 000 frågor per sekund med en 99:e percentil latens på 8,58 ms under en enkeltrådad körtid. 
Vårt föreslagna system har gett ett hållbart alternativ för att implementera stateful serverless-applikationer som kräver transaktionsgarantier, samtidigt som det också utökat de potentiella applikationsscenarierna för stream processing-motorer. Stream Processing Observability SQL Query Engine Stateful Serverless Searbetning av Strömmar Observabilitet SQL-förfrågningsmotor Stateful Serverless Computer and Information Sciences Data- och informationsvetenskap
68	Laufzeitadaption von zustandsbehafteten Datenstromoperatoren Wolf, Bernhard 10 December 2012 (has links) Changing data stream queries at runtime is made difficult in particular by stateful data stream operators. Because the states are held in main memory and are lost on a restart, migration methods were developed in the past to preserve the internal operator states during a change. These migration methods are based on two different approaches, state transfer and parallel execution, but their realization restricts them to centralized execution. With growing requirements regarding data volumes and response times, data stream systems are increasingly executed in a distributed manner, for example in sensor networks or distributed IT systems. Existing migration strategies are unsuitable, or suitable only to a limited extent, for adapting queries at runtime in such settings. This thesis contributes to solving this problem and to optimizing migration in data stream systems. Using preventive maintenance strategies in factory environments as an example, requirements for data stream processing, and for migration in particular, are derived. The general goal is accordingly a migration that is as fast as possible while results continue to be output. A detailed analysis of the existing migration strategies discusses their strengths and weaknesses with respect to these requirements. For the adaptation of running data stream queries, a general methodology is presented that serves as the basis for the new strategies. This adaptation methodology supports two methods for determining migration configurations: a numerical method for periodic data streams and a heuristic method that can also be applied to aperiodic data streams. 
An essential function for minimizing the migration duration is the restriction to necessary state values, since in distributed environments a transmission time must be budgeted for the state transfer; existing methods take neither aspect into account. Using newly developed state transfer methods, the transmission order of the individual state values can also be influenced. The concepts were implemented in an OSGi-based prototype and additionally analyzed by simulation. A comprehensive evaluation demonstrates that all components and concepts work. The performance comparison between the existing and the new migration strategies is clearly in favor of the new strategies, which are moreover able to meet all requirements. info:eu-repo/classification/ddc/004 ddc:004 Data stream; Database
69	INVESTIGATORY ANALYSIS OF BIG DATA’S ROLE AND IMPACT ON LOCAL ORGANIZATIONS, INSTITUTIONS, AND BUSINESSES’ DECISION-MAKING AND DAY-TO-DAY OPERATIONS Markle, Scott Timothy 30 March 2023 (has links) No description available. Computer Science web scraping comparative analysis Big Data survey stream processing batch processing hesitancies and obstructions industry utilization collegiate supplement ParseHub Simplescraper
70	[pt] CEP DISTRIBUÍDO PARA AQUISIÇÃO E PROCESSAMENTO DE INFORMAÇÃO ADAPTATIVOS CIENTES DE CONTEXTO / [en] DISTRIBUTED CEP FOR CONTEXT-AWARE ADAPTIVE ACQUIREMENT AND PROCESSING OF INFORMATION FERNANDO BENEDITO VERAS MAGALHAES 07 June 2021 (has links) [pt] A disseminação atual da IoT aumenta a implantação de soluções de processamento de fluxo de dados para monitorar e controlar elementos do mundo real. Uma dessas soluções é o Processamento de Eventos Complexos (CEP). Inicialmente, um único computador ou cluster concentraria toda a execução do CEP. No entanto, a execução centralizada do CEP não é ideal para lidar com o alto volume, velocidade e volatilidade dos fluxos de dados dos sensores IoT. Em vez disso, as aplicações CEP devem criar e decentralizar o processamento de eventos CEP, de preferência tendo agentes CEP na nuvem e em dispositivos na borda. Além disso, tão importante quanto a descentralização, é decidir como o processamento será dividido entre esses dispositivos. Dito isso, estar ciente do contexto atual de cada dispositivo, por exemplo, sua localização e sensores disponíveis, pode ajudar a coletar e (parcialmente) processar os dados em dispositivos próximos ao local onde os dados foram produzidos. Este trabalho apresenta uma plataforma de CEP distribuído com ciência de contexto chamada Global CEP Manager (GCM). GCM é um serviço do middleware ContextNet que oferece suporte à implantação e ao rearranjo dinâmico de consultas CEP baseados em contexto para motores CEP em execução na nuvem, em dispositivos na borda estacionários e M-Hubs, que são dispositivos na borda móveis do ContextNet. O GCM usa o ContextMatcher, que também faz parte deste trabalho. ContextMatcher é um módulo para aplicações ContextNet que permite a entrega de mensagens para nós cujo contexto esteja de compatível com um determinado conjunto de características contextuais. 
/ [en] The current dissemination of IoT increases the deployment of stream processing solutions for monitoring and controlling elements of the real world. One of those solutions is Complex Event Processing (CEP). Initially, a single computer/cluster would concentrate all the CEP execution. However, a centralized execution of CEP is not suitable for coping with the high volume, velocity, and volatility of IoT sensors’ data streams. Instead, applications using CEP should deploy a distributed CEP Event Processing Network, preferably having CEP agents both in the cloud and at edge devices. Also, deciding the arrangement used to split the processing among these tiers and their devices can be just as important. That said, being aware of each device's current context, for instance, its location and available sensors, can help to collect and (partially) process the data on devices close to the data's production site. This work presents a context-aware distributed CEP platform called Global CEP Manager (GCM). GCM is a service of the ContextNet middleware that supports the context-based deployment and dynamic rearrangement of CEP queries to CEP engines executing in the cloud, stationary edge devices, and M-Hubs, which are ContextNet's mobile edge devices. GCM uses the ContextMatcher, which is also part of this work. ContextMatcher is a module for ContextNet applications that enables the delivery of messages to nodes that match a specified set of contextual requirements. [pt] PROCESSAMENTO DE EVENTOS COMPLEXOS [pt] PROCESSAMENTO DE FLUXOS DISTRIBUIDO [pt] CIENCIA DE CONTEXTO [pt] INTERNET DAS COISAS [en] COMPLEX EVENT PROCESSING [en] DISTRIBUTED STREAM PROCESSING [en] CONTEXT-AWARENESS [en] INTERNET OF THINGS
