  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Secure log-management for an Apache Kafka-based data-streaming service / Säker logghantering i en Apache Kafka baserad data-streaming tjänst

Kull, Hjalmar, Hujic, Mirza January 2023 (has links)
This thesis investigates the prospect of using Apache Kafka to manage data streams based on secrecy/classification level and to separate these streams so that the requirements set by each classification level are met. Basalt AB is responsible for managing classified data for private and state actors, including the Swedish Armed Forces and other organizations. There is interest in a data-streaming solution that can securely stream large amounts of data while coordinating different data classifications and managing user access. This thesis examines the viability of logically and physically separating producer data streams into categories based on the classification level of the data in an Apache Kafka cluster, and of managing access control through the use of Access Control Lists (ACLs). To protect against embedded attackers, it examines the viability of using the Shamir Secret Sharing (SSS) algorithm to segment messages and, on top of that, multi-factor authentication to ensure that no message can be read by a lone individual. The work seeks to contribute to the existing body of knowledge by improving security and ensuring the integrity of data through granular user management of event logs in an Apache Kafka cluster, which is of interest to organizations that require protection from both external and internal attackers. Our results indicate that Apache Kafka is an appropriate tool for streaming secret data; we used a secret-sharing algorithm to segment data and the Simple Authentication and Security Layer (SASL) to build a multi-factor authentication system.
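The combination of Shamir Secret Sharing with multi-factor authentication means no single party holds enough material to read a message. As a rough illustration of the splitting step only (a minimal sketch, not the thesis's implementation; the choice of prime field and share generation are assumptions), a (k, n) threshold scheme can be written as:

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this

def split_secret(secret, k, n, prime=PRIME):
    """Split `secret` into n shares; any k of them reconstruct it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [random.randrange(prime) for _ in range(k - 1)]
    def eval_poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % prime
        return acc
    return [(x, eval_poly(x)) for x in range(1, n + 1)]

def reconstruct(shares, prime=PRIME):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret
```

Fewer than k shares reveal nothing about the secret, which is what makes the scheme useful against a lone insider.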
2

Systém sledování změn v pasivních optických sítích / System for monitoring changes in passive optical networks

Pancák, Matej January 2021 (has links)
This diploma thesis describes the design and implementation of a system for monitoring events in passive optical networks, specifically GPON networks. The main technologies used in the implementation are Apache Kafka, Docker, and the Python programming language. Several filters are implemented within the created application; these filters obtain essential information from the captured frames for traffic analysis on the given network. The result of the thesis is a functional system that extracts information about network traffic from the captured GPON frames and stores it in Apache Kafka, where the stored data is accessible for further processing. The work also provides examples of how to process the stored data, along with information about its meaning and structure.
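The abstract mentions filters that pull essential fields out of captured frames before the results land in Kafka. A minimal sketch of one such filter stage over already-parsed frames (the field names `onu_id`, `alloc_id`, and `plen` are hypothetical, chosen only for illustration, not the thesis's actual schema):

```python
def make_field_filter(field, predicate):
    """Return a filter that keeps frames whose `field` satisfies `predicate`."""
    def apply(frames):
        return [f for f in frames if field in f and predicate(f[field])]
    return apply

# Illustrative parsed frames; real frames would come from a GPON capture.
frames = [
    {"onu_id": 1, "alloc_id": 1024, "plen": 48},
    {"onu_id": 2, "alloc_id": 1025, "plen": 1490},
    {"onu_id": 1, "alloc_id": 1024, "plen": 300},
]

# Keep only large frames, e.g. to spot bulk transfers on the network.
large_frames = make_field_filter("plen", lambda v: v > 1000)(frames)
```

Each filter's output could then be serialized and produced to its own Kafka topic for downstream consumers.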
3

Building a high throughput microscope simulator using the Apache Kafka streaming framework

Lugnegård, Lovisa January 2018 (has links)
Today, microscopy imaging is a widely used and powerful method for investigating biological processes, and microscopes can produce large amounts of data in a short time. It is therefore impossible to analyse all the data thoroughly because of time and cost constraints. HASTE (Hierarchical Analysis of Temporal and Spatial Image Data), a collaborative research project between Uppsala University, AstraZeneca, and Vironova, addresses this specific problem: the idea is to analyse the image data in real time in order to make fast decisions on whether to analyse further, store, or discard the data. To facilitate the development of this system, a microscope simulator has been designed and implemented with a strong focus on parameters relating to data throughput. Apart from building the simulator, the Apache Kafka framework has been evaluated for streaming large images. The results of this project are both a working simulator whose performance is similar to that of the microscope and an evaluation of Apache Kafka showing that it is possible to stream image data with the framework.
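Streaming large images through Kafka typically means working around the broker's default maximum message size (on the order of 1 MB), so an image is chunked into multiple messages and reassembled downstream. A minimal sketch of that idea (the message layout here is an assumption for illustration, not the simulator's actual format):

```python
MAX_MSG = 1_000_000  # roughly Kafka's default max message size in bytes

def chunk_image(image_id, data, max_size=MAX_MSG):
    """Split raw image bytes into sequence-numbered messages."""
    total = (len(data) + max_size - 1) // max_size
    return [
        {"image_id": image_id, "seq": i, "total": total,
         "payload": data[i * max_size:(i + 1) * max_size]}
        for i in range(total)
    ]

def reassemble(messages):
    """Rebuild the original bytes from a complete set of chunks."""
    parts = sorted(messages, key=lambda m: m["seq"])
    assert len(parts) == parts[0]["total"], "missing chunks"
    return b"".join(m["payload"] for m in parts)
```

Keying all chunks of one image by `image_id` would keep them on the same partition and therefore in order for a consumer.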
4

Implementering av testplattform för end-to-end streaming telemetry i nätverk

Erlandsson, Niklas January 2020 (has links)
The goals of this study are to implement a test environment for streaming telemetry and to compare two alternatives for analysing the collected data in real time: the Python libraries PyKafka and Confluent-Kafka-Python. The comparison focused on three areas: documentation, amount of code, and memory usage. The test environment for streaming telemetry was set up with a router running Cisco IOS XR software that sends data to a Cisco Pipeline collector, which in turn forwards the data to a Kafka cluster. The comparison of the two libraries for interfacing with the cluster was made in Python. The results showed that both libraries have well-written documentation and differ negligibly in amount of code, while memory usage was considerably lower with Confluent-Kafka-Python.
The study shows that streaming telemetry together with real-time analysis makes a good complement to, or replacement of, SNMP. It further recommends Confluent-Kafka-Python for production implementations of streaming telemetry, particularly in large networks with a large number of devices, given the lower memory usage.
5

Проектирование системы информирования клиентов : магистерская диссертация / Design of a system for informing customers

Кашин, А. А., Kashin, A. A. January 2023 (has links)
The aim of the work is to model the existing process of informing customers, optimize this process, conduct a comparative analysis of existing informing systems, and design the architecture of an in-house system. In the course of the work, a comparative analysis of message brokers was carried out, and the advantages and disadvantages of each were identified. To connect to the existing corporate platform, an implementation plan was developed and data migration to the target system was performed with the help of a purpose-built synchronization program.
6

Collecting Information from a decentralized microservice architecture

Ekbjörn, Carl, Sonesson, Daniel January 2018 (has links)
As a system grows in size, it is common for it to be transformed into a microservice architecture. To be able to monitor this new architecture, there is a need to collect information from the microservices. The software company IDA Infront is transitioning its product iipax to a microservice architecture and is faced with this problem; to solve it, they propose the use of a Message-oriented Middleware (MOM). Many different MOMs are suitable for this task, and the aim of this thesis is to determine which is best in terms of latency, throughput, and scalability. Out of four suitable MOMs, Apache Kafka and RabbitMQ were chosen for further testing and benchmarking. The tests show that RabbitMQ sends single, infrequent messages faster than Kafka (latency), but that Kafka is faster at sending many messages rapidly and with an increased number of producers (throughput and scalability). However, the scalability test suggests that RabbitMQ may scale better with a larger number of microservices, so more testing is needed for a definite conclusion.
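Latency and throughput measurements of the kind described can be harnessed generically over any send callable, so the same code works against a stubbed function or a real broker client. A minimal sketch (not the thesis's benchmark code; timing methodology here is simplified):

```python
import time

def measure_latency(send, message, repeats=100):
    """Average time per call when sending one message at a time (seconds)."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        send(message)
        samples.append(time.perf_counter() - t0)
    return sum(samples) / len(samples)

def measure_throughput(send, message, count=10_000):
    """Messages per second when sending `count` messages back-to-back."""
    t0 = time.perf_counter()
    for _ in range(count):
        send(message)
    return count / (time.perf_counter() - t0)
```

In a real benchmark, `send` would wrap the Kafka or RabbitMQ client's publish call, and the single-message case would wait for an acknowledgement before timing stops.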
7

Výpočetní úlohy pro řešení paralelního zpracování dat / Computational tasks for solving parallel data processing

Rexa, Denis January 2019 (has links)
The goal of this diploma thesis was to create four laboratory exercises for the subject "Parallel Data Processing", in which students explore the options and capabilities of Apache Spark as a parallel computing platform. The work also includes the basic setup and use of Apache Kafka and the NoSQL database Apache Cassandra. The other two lab assignments focus on the Travelling Salesman Problem. The first was designed to demonstrate the difficulty of a task whose complexity grows exponentially; the second consists of an optimization algorithm that solves the problem on a cluster. This algorithm is subjected to performance measurements on clusters. The conclusion of the thesis contains recommendations for optimization as well as a comparison of runs with different numbers of computing devices.
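The point of the first assignment, that the Travelling Salesman Problem's cost explodes combinatorially with the number of cities, can be illustrated with a brute-force solver that checks all (n-1)! tours and is therefore only feasible for very small inputs (a teaching sketch, not the thesis's cluster algorithm):

```python
from itertools import permutations
from math import dist

def tsp_brute_force(points):
    """Exact shortest closed tour by enumerating all (n-1)! orderings."""
    start, *rest = range(len(points))
    best_len, best_tour = float("inf"), None
    for perm in permutations(rest):
        tour = (start, *perm, start)  # closed tour starting and ending at 0
        length = sum(dist(points[a], points[b]) for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour
```

Going from 10 to 15 cities multiplies the number of tours by more than 200,000, which is exactly the wall students are meant to hit before moving to the optimization-based assignment.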
8

Platforma pro sběr kryptoměnových adres / Platform for Cryptocurrency Address Collection

Bambuch, Vladislav January 2020 (has links)
The goal of this thesis is to create a platform for collecting and displaying metadata about cryptocurrency addresses from both the public and the dark web. To achieve this goal, I used web-scraping technologies written in PHP. The complications accompanying automated processing of web pages were solved with Apache Kafka and its process-scaling capabilities. The modularity of the platform was achieved using a microservices architecture and Docker containerization. The work enables a unique way to search for potential criminal activities that took place outside the blockchain, using a web application for managing the platform and searching the extracted data. The created platform simplifies the addition of new, mutually independent modules, with Apache Kafka mediating the communication between them. The result of this work can be used for the detection and prevention of cybercrime. Users of this system may be law enforcement agencies or other parties interested in the reputation and credibility of cryptocurrency addresses.
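As an illustration of the kind of extraction such a platform performs on scraped pages, a sketch that pulls Bitcoin-style addresses out of raw text with simplified regular expressions (these patterns are illustrative, not exhaustive; real validation would also verify the address checksum, and the thesis's own extractors are written in PHP):

```python
import re

# Simplified, illustrative patterns for two common Bitcoin address formats.
BTC_PATTERNS = [
    re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"),  # legacy Base58 (P2PKH/P2SH)
    re.compile(r"\bbc1[ac-hj-np-z02-9]{11,71}\b"),       # bech32 (SegWit)
]

def extract_btc_addresses(text):
    """Return all candidate Bitcoin addresses found in `text`."""
    found = []
    for pattern in BTC_PATTERNS:
        found.extend(pattern.findall(text))
    return found
```

Each match, together with the source URL and timestamp, would be published to a Kafka topic for the downstream modules to enrich and store.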
9

Parallel Kafka Producer Applications : Their performance and its limitations

Sundbom, Arvid January 2023 (has links)
This paper examines multi-threaded Kafka producer applications, and how the performance of such applications is affected by how the number of producer instances relates to the number of executing threads. Specifically, it compares the performance of such applications when a single producer instance is shared among all threads with the performance when each thread is allotted a separate, private instance. This comparison is carried out for a number of different producer configurations and varying levels of computational work per message produced. Overall, the data indicates that private producer instances yield higher performance, in terms of data throughput, than sharing a single instance among the executing threads. The magnitude of this difference is affected, to some extent, by the configuration profiles used to create the producer instances, as well as by the computational workload of the application hosting the producers. Specifically, configuring producers for reliability seems to increase the difference, as does increasing the rate at which messages are produced. As a result of this work, Brod, a wrapper library [56] based on an implementation of a client library for Apache Kafka [25], has been developed. The purpose of the library is to provide functionality that simplifies the development of multi-threaded Kafka producer applications.
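The shared-versus-private producer pattern under comparison can be sketched with a stand-in producer class (a dummy, not an actual Kafka client, so the thread structure is visible without a running broker):

```python
import threading

class DummyProducer:
    """Stand-in for a Kafka producer; counts produced messages thread-safely."""
    def __init__(self):
        self._lock = threading.Lock()
        self.sent = 0
    def produce(self, topic, value):
        with self._lock:
            self.sent += 1

def run_with_private_producers(n_threads, per_thread):
    """One producer instance per thread: no cross-thread contention."""
    producers = [DummyProducer() for _ in range(n_threads)]
    threads = [
        threading.Thread(target=lambda p=p: [p.produce("t", b"m") for _ in range(per_thread)])
        for p in producers
    ]
    for t in threads: t.start()
    for t in threads: t.join()
    return sum(p.sent for p in producers)

def run_with_shared_producer(n_threads, per_thread):
    """All threads funnel through one instance: internal locks become shared."""
    producer = DummyProducer()
    threads = [
        threading.Thread(target=lambda: [producer.produce("t", b"m") for _ in range(per_thread)])
        for _ in range(n_threads)
    ]
    for t in threads: t.start()
    for t in threads: t.join()
    return producer.sent
```

With a real client, the private-instance variant trades higher memory and more broker connections for the reduced contention the paper measures.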
10

Geo-distributed multi-layer stream aggregation

Cannalire, Pietro January 2018 (has links)
Standard processing architectures satisfy many applications by employing existing stream-processing frameworks that can manage distributed data processing. In some specific cases, having geographically distributed data sources requires distributing the processing even further, over a large area, by employing a geographically distributed architecture. The issue addressed in this work is the reduction of data movement across the network, which in a geo-distributed architecture flows continuously from streaming sources to the processing location and among processing entities within the same distributed cluster. Reducing data movement can be critical for decreasing bandwidth costs, since accessing links placed in the middle of the network can be expensive, and costs grow as the amount of data exchanged increases. In this work we create a different way to deploy geographically distributed architectures by relying on Apache Spark Structured Streaming and Apache Kafka. The features needed for an algorithm to run on a geo-distributed architecture are provided. The algorithms executed on this architecture apply windowing and data-synopsis techniques to produce summaries of the input data and to address the issues of the geographically distributed architecture. The computation of the average and the Misra-Gries algorithm are then implemented to test the designed architecture. This thesis contributes a new model for building geographically distributed architectures. The experimental results show that, for the algorithms running on top of the geo-distributed architecture, computation time is reduced on average by 70% compared to the distributed setup; similarly, the amount of data exchanged across the network is reduced on average by 99%.
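The Misra-Gries algorithm mentioned above is a classic data-synopsis technique: it maintains at most k-1 counters over a stream and guarantees that any item occurring more than n/k times in a stream of length n survives in the summary, which is why it suits bandwidth-constrained geo-distributed aggregation. A minimal single-node sketch (the thesis runs it on Spark Structured Streaming; this standalone version only shows the core update rule):

```python
def misra_gries(stream, k):
    """Heavy-hitters summary with at most k-1 counters.

    Guarantee: every item with frequency > len(stream)/k remains in the
    returned dict, though its stored count may undercount the true frequency.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement all counters; drop any that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

Because each site's summary is tiny compared to its raw stream, shipping summaries instead of events is what yields the large reduction in network traffic reported above.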
