  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Secure log-management for an Apache Kafka-based data-streaming service / Säker logghantering i en Apache Kafka baserad data-streaming tjänst

Kull, Hjalmar, Hujic, Mirza January 2023 (has links)
This thesis investigates the prospect of using Apache Kafka to manage data streams based on secrecy/classification level and to separate these streams so that the requirements set by each classification level are met. Basalt AB is responsible for managing classified data for private and state actors, including the Swedish Armed Forces and other organizations. There is interest in a data-streaming solution that can securely stream large amounts of data while coordinating different data classifications and managing user access. This thesis examines the viability of logically and physically separating producer data streams into categories based on the classification level of the data in an Apache Kafka cluster, and of managing access control through the use of Access Control Lists (ACLs). To protect against embedded attackers, it examines the viability of using the Shamir Secret Sharing (SSS) algorithm to segment messages and, on top of that, multi-factor authentication to ensure that no message can be read by a lone individual. The work seeks to contribute to the existing body of knowledge by improving security and ensuring the integrity of data through granular user management of event logs in an Apache Kafka cluster, which is of interest to organizations that require protection from both external and internal attackers. Our results indicate that Apache Kafka is an appropriate tool for streaming secret data; we used a secret-sharing algorithm to segment data and the Simple Authentication and Security Layer (SASL) to build a multi-factor authentication system.
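The combination of Shamir Secret Sharing with multi-factor authentication means no single party holds enough material to read a message. As a rough illustration of the splitting step only (a minimal sketch, not the thesis's implementation; the choice of prime field and share generation are assumptions), a (k, n) threshold scheme can be written as:

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this

def split_secret(secret, k, n, prime=PRIME):
    """Split `secret` into n shares; any k of them reconstruct it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [random.randrange(prime) for _ in range(k - 1)]
    def eval_poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % prime
        return acc
    return [(x, eval_poly(x)) for x in range(1, n + 1)]

def reconstruct(shares, prime=PRIME):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret
```

Fewer than k shares reveal nothing about the secret, which is what makes the scheme useful against a lone insider.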
2

Systém sledování změn v pasivních optických sítích / System for monitoring changes in passive optical networks

Pancák, Matej January 2021 (has links)
This diploma thesis describes the design and implementation of a system for monitoring events in passive optical networks, specifically GPON networks. The main technologies used in the implementation are Apache Kafka, Docker, and the Python programming language. Several filters are implemented within the created application; these filters obtain essential information from the captured frames for traffic analysis on the given network. The result of the thesis is a functional system that extracts information about network traffic from the captured GPON frames and stores it in Apache Kafka, where the stored data is accessible for further processing. The work also provides examples of how to process the stored data, along with information about its meaning and structure.
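The abstract mentions filters that pull essential fields out of captured frames before the results land in Kafka. A minimal sketch of one such filter stage over already-parsed frames (the field names `onu_id`, `alloc_id`, and `plen` are hypothetical, chosen only for illustration, not the thesis's actual schema):

```python
def make_field_filter(field, predicate):
    """Return a filter that keeps frames whose `field` satisfies `predicate`."""
    def apply(frames):
        return [f for f in frames if field in f and predicate(f[field])]
    return apply

# Illustrative parsed frames; real frames would come from a GPON capture.
frames = [
    {"onu_id": 1, "alloc_id": 1024, "plen": 48},
    {"onu_id": 2, "alloc_id": 1025, "plen": 1490},
    {"onu_id": 1, "alloc_id": 1024, "plen": 300},
]

# Keep only large frames, e.g. to spot bulk transfers on the network.
large_frames = make_field_filter("plen", lambda v: v > 1000)(frames)
```

Each filter's output could then be serialized and produced to its own Kafka topic for downstream consumers.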
3

Building a high throughput microscope simulator using the Apache Kafka streaming framework

Lugnegård, Lovisa January 2018 (has links)
Today, microscopy imaging is a widely used and powerful method for investigating biological processes, and microscopes can produce large amounts of data in a short time. It is therefore impossible to analyse all the data thoroughly because of time and cost constraints. HASTE (Hierarchical Analysis of Temporal and Spatial Image Data), a collaborative research project between Uppsala University, AstraZeneca, and Vironova, addresses this specific problem: the idea is to analyse the image data in real time in order to make fast decisions on whether to analyse further, store, or discard the data. To facilitate the development of this system, a microscope simulator has been designed and implemented with a strong focus on parameters relating to data throughput. Apart from building the simulator, the Apache Kafka framework has been evaluated for streaming large images. The results of this project are both a working simulator whose performance is similar to that of the microscope and an evaluation of Apache Kafka showing that it is possible to stream image data with the framework.
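Streaming large images through Kafka typically means working around the broker's default maximum message size (on the order of 1 MB), so an image is chunked into multiple messages and reassembled downstream. A minimal sketch of that idea (the message layout here is an assumption for illustration, not the simulator's actual format):

```python
MAX_MSG = 1_000_000  # roughly Kafka's default max message size in bytes

def chunk_image(image_id, data, max_size=MAX_MSG):
    """Split raw image bytes into sequence-numbered messages."""
    total = (len(data) + max_size - 1) // max_size
    return [
        {"image_id": image_id, "seq": i, "total": total,
         "payload": data[i * max_size:(i + 1) * max_size]}
        for i in range(total)
    ]

def reassemble(messages):
    """Rebuild the original bytes from a complete set of chunks."""
    parts = sorted(messages, key=lambda m: m["seq"])
    assert len(parts) == parts[0]["total"], "missing chunks"
    return b"".join(m["payload"] for m in parts)
```

Keying all chunks of one image by `image_id` would keep them on the same partition and therefore in order for a consumer.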
4

Implementering av testplattform för end-to-end streaming telemetry i nätverk

Erlandsson, Niklas January 2020 (has links)
The goals of this study are to implement a test environment for streaming telemetry and to compare two alternatives for analysing the collected data in real time: the Python libraries PyKafka and Confluent-Kafka-Python. The comparison focused on three areas: documentation, amount of code, and memory usage. The test environment for streaming telemetry was set up with a router running Cisco IOS XR software that sends data to a Cisco Pipeline collector, which in turn forwards the data to a Kafka cluster. The comparison of the two libraries for interfacing with the cluster was made in Python. The results showed that both libraries have well-written documentation and differ negligibly in amount of code, while memory usage was considerably lower with Confluent-Kafka-Python.
The study shows that streaming telemetry together with real-time analysis makes a good complement to, or replacement of, SNMP. It further recommends Confluent-Kafka-Python for production implementations of streaming telemetry, particularly in large networks with a large number of devices, given the lower memory usage.
5

Проектирование системы информирования клиентов : магистерская диссертация / Design of a system for informing customers

Кашин, А. А., Kashin, A. A. January 2023 (has links)
The aim of the work is to model the existing process of informing customers, optimize this process, conduct a comparative analysis of existing informing systems, and design the architecture of an in-house system. In the course of the work, a comparative analysis of message brokers was carried out, and the advantages and disadvantages of each were identified. To connect to the existing corporate platform, an implementation plan was developed and data migration to the target system was performed with the help of a purpose-built synchronization program.
6

Collecting Information from a decentralized microservice architecture

Ekbjörn, Carl, Sonesson, Daniel January 2018 (has links)
As a system grows in size, it is common for it to be transformed into a microservice architecture. To be able to monitor this new architecture, there is a need to collect information from the microservices. The software company IDA Infront is transitioning its product iipax to a microservice architecture and is faced with this problem; to solve it, they propose the use of a Message-oriented Middleware (MOM). Many different MOMs are suitable for this task, and the aim of this thesis is to determine which is best in terms of latency, throughput, and scalability. Out of four suitable MOMs, Apache Kafka and RabbitMQ were chosen for further testing and benchmarking. The tests show that RabbitMQ sends single, infrequent messages faster than Kafka (latency), but that Kafka is faster at sending many messages rapidly and with an increased number of producers (throughput and scalability). However, the scalability test suggests that RabbitMQ may scale better with a larger number of microservices, so more testing is needed for a definite conclusion.
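Latency and throughput measurements of the kind described can be harnessed generically over any send callable, so the same code works against a stubbed function or a real broker client. A minimal sketch (not the thesis's benchmark code; timing methodology here is simplified):

```python
import time

def measure_latency(send, message, repeats=100):
    """Average time per call when sending one message at a time (seconds)."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        send(message)
        samples.append(time.perf_counter() - t0)
    return sum(samples) / len(samples)

def measure_throughput(send, message, count=10_000):
    """Messages per second when sending `count` messages back-to-back."""
    t0 = time.perf_counter()
    for _ in range(count):
        send(message)
    return count / (time.perf_counter() - t0)
```

In a real benchmark, `send` would wrap the Kafka or RabbitMQ client's publish call, and the single-message case would wait for an acknowledgement before timing stops.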
7

Výpočetní úlohy pro řešení paralelního zpracování dat / Computational tasks for solving parallel data processing

Rexa, Denis January 2019 (has links)
The goal of this diploma thesis was to create four laboratory exercises for the subject "Parallel Data Processing", in which students explore the options and capabilities of Apache Spark as a parallel computing platform. The work also includes the basic setup and use of Apache Kafka and the NoSQL database Apache Cassandra. The other two lab assignments focus on the Travelling Salesman Problem. The first was designed to demonstrate the difficulty of a task whose complexity grows exponentially; the second consists of an optimization algorithm that solves the problem on a cluster. This algorithm is subjected to performance measurements on clusters. The conclusion of the thesis contains recommendations for optimization as well as a comparison of runs with different numbers of computing devices.
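The point of the first assignment, that the Travelling Salesman Problem's cost explodes combinatorially with the number of cities, can be illustrated with a brute-force solver that checks all (n-1)! tours and is therefore only feasible for very small inputs (a teaching sketch, not the thesis's cluster algorithm):

```python
from itertools import permutations
from math import dist

def tsp_brute_force(points):
    """Exact shortest closed tour by enumerating all (n-1)! orderings."""
    start, *rest = range(len(points))
    best_len, best_tour = float("inf"), None
    for perm in permutations(rest):
        tour = (start, *perm, start)  # closed tour starting and ending at 0
        length = sum(dist(points[a], points[b]) for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour
```

Going from 10 to 15 cities multiplies the number of tours by more than 200,000, which is exactly the wall students are meant to hit before moving to the optimization-based assignment.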
8

Platforma pro sběr kryptoměnových adres / Platform for Cryptocurrency Address Collection

Bambuch, Vladislav January 2020 (has links)
The goal of this thesis is to create a platform for collecting and displaying metadata about cryptocurrency addresses from both the public and the dark web. To achieve this goal, I used web-scraping technologies written in PHP. The complications accompanying automated processing of web pages were solved with Apache Kafka and its process-scaling capabilities. The modularity of the platform was achieved using a microservices architecture and Docker containerization. The work enables a unique way to search for potential criminal activities that took place outside the blockchain, using a web application for managing the platform and searching the extracted data. The created platform simplifies the addition of new, mutually independent modules, with Apache Kafka mediating the communication between them. The result of this work can be used for the detection and prevention of cybercrime. Users of this system may be law enforcement agencies or other parties interested in the reputation and credibility of cryptocurrency addresses.
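As an illustration of the kind of extraction such a platform performs on scraped pages, a sketch that pulls Bitcoin-style addresses out of raw text with simplified regular expressions (these patterns are illustrative, not exhaustive; real validation would also verify the address checksum, and the thesis's own extractors are written in PHP):

```python
import re

# Simplified, illustrative patterns for two common Bitcoin address formats.
BTC_PATTERNS = [
    re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"),  # legacy Base58 (P2PKH/P2SH)
    re.compile(r"\bbc1[ac-hj-np-z02-9]{11,71}\b"),       # bech32 (SegWit)
]

def extract_btc_addresses(text):
    """Return all candidate Bitcoin addresses found in `text`."""
    found = []
    for pattern in BTC_PATTERNS:
        found.extend(pattern.findall(text))
    return found
```

Each match, together with the source URL and timestamp, would be published to a Kafka topic for the downstream modules to enrich and store.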
9

Parallel Kafka Producer Applications : Their performance and its limitations

Sundbom, Arvid January 2023 (has links)
This paper examines multi-threaded Kafka producer applications, and how the performance of such applications is affected by how the number of producer instances relates to the number of executing threads. Specifically, it compares the performance of such applications when a single producer instance is shared among all threads with the performance when each thread is allotted a separate, private instance. This comparison is carried out for a number of different producer configurations and varying levels of computational work per message produced. Overall, the data indicates that private producer instances yield higher performance, in terms of data throughput, than sharing a single instance among the executing threads. The magnitude of this difference is affected, to some extent, by the configuration profiles used to create the producer instances, as well as by the computational workload of the application hosting the producers. Specifically, configuring producers for reliability seems to increase the difference, as does increasing the rate at which messages are produced. As a result of this work, Brod, a wrapper library [56] based on an implementation of a client library for Apache Kafka [25], has been developed. The purpose of the library is to provide functionality that simplifies the development of multi-threaded Kafka producer applications.
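The shared-versus-private producer pattern under comparison can be sketched with a stand-in producer class (a dummy, not an actual Kafka client, so the thread structure is visible without a running broker):

```python
import threading

class DummyProducer:
    """Stand-in for a Kafka producer; counts produced messages thread-safely."""
    def __init__(self):
        self._lock = threading.Lock()
        self.sent = 0
    def produce(self, topic, value):
        with self._lock:
            self.sent += 1

def run_with_private_producers(n_threads, per_thread):
    """One producer instance per thread: no cross-thread contention."""
    producers = [DummyProducer() for _ in range(n_threads)]
    threads = [
        threading.Thread(target=lambda p=p: [p.produce("t", b"m") for _ in range(per_thread)])
        for p in producers
    ]
    for t in threads: t.start()
    for t in threads: t.join()
    return sum(p.sent for p in producers)

def run_with_shared_producer(n_threads, per_thread):
    """All threads funnel through one instance: internal locks become shared."""
    producer = DummyProducer()
    threads = [
        threading.Thread(target=lambda: [producer.produce("t", b"m") for _ in range(per_thread)])
        for _ in range(n_threads)
    ]
    for t in threads: t.start()
    for t in threads: t.join()
    return producer.sent
```

With a real client, the private-instance variant trades higher memory and more broker connections for the reduced contention the paper measures.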
10

Geo-distributed multi-layer stream aggregation

Cannalire, Pietro January 2018 (has links)
Standard processing architectures satisfy many applications by employing existing stream-processing frameworks that can manage distributed data processing. In some specific cases, having geographically distributed data sources requires distributing the processing even further, over a large area, by employing a geographically distributed architecture. The issue addressed in this work is the reduction of data movement across the network, which in a geo-distributed architecture flows continuously from streaming sources to the processing location and among processing entities within the same distributed cluster. Reducing data movement can be critical for decreasing bandwidth costs, since accessing links placed in the middle of the network can be expensive, and costs grow as the amount of data exchanged increases. In this work we create a different way to deploy geographically distributed architectures by relying on Apache Spark Structured Streaming and Apache Kafka. The features needed for an algorithm to run on a geo-distributed architecture are provided. The algorithms executed on this architecture apply windowing and data-synopsis techniques to produce summaries of the input data and to address the issues of the geographically distributed architecture. The computation of the average and the Misra-Gries algorithm are then implemented to test the designed architecture. This thesis contributes a new model for building geographically distributed architectures. The experimental results show that, for the algorithms running on top of the geo-distributed architecture, computation time is reduced on average by 70% compared to the distributed setup; similarly, the amount of data exchanged across the network is reduced on average by 99%.
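The Misra-Gries algorithm mentioned above is a classic data-synopsis technique: it maintains at most k-1 counters over a stream and guarantees that any item occurring more than n/k times in a stream of length n survives in the summary, which is why it suits bandwidth-constrained geo-distributed aggregation. A minimal single-node sketch (the thesis runs it on Spark Structured Streaming; this standalone version only shows the core update rule):

```python
def misra_gries(stream, k):
    """Heavy-hitters summary with at most k-1 counters.

    Guarantee: every item with frequency > len(stream)/k remains in the
    returned dict, though its stored count may undercount the true frequency.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement all counters; drop any that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

Because each site's summary is tiny compared to its raw stream, shipping summaries instead of events is what yields the large reduction in network traffic reported above.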
