
Replacing batch-based data extraction with event streaming with Apache Kafka: A comparative study

Axelsson, Richard, January 2022
For growing organisations that have built their data flow around a monolithic database server, an ever-increasing number of applications and an ever-increasing demand for data freshness will eventually push the existing system to its limits, prompting either hardware upgrades or an updated data architecture. By switching from full extractions of data at regular intervals to an approach where only changes are extracted, resource consumption could potentially be decreased while data freshness is increased.

The objective of this thesis is to provide insights into how implementing an event streaming setup with Apache Kafka, connected to SQL Server through the Debezium source connector, affects resource consumption on the database server. Related studies have often focused on steps further downstream in the data pipeline. This thesis can therefore contribute to an area where more knowledge is needed.

Through an empirical study using two different setups in the same system, traditional data extraction in batches and extraction through event streaming are measured and compared. The point of measurement is the SQL Server database from which data is extracted. Both memory utilisation and CPU utilisation are measured, using SQL Server Profiler. Different parameters for table sizes, volumes of data and intervals between changes are used to simulate different scenarios.

One of the takeaways of the results is that, at the same number of total changes, the size of the individual transactions has a large impact on the resource consumption caused by event streaming. The study shows that an overhead cost is involved with each transaction, and also that the regular polling that the source connector performs causes resource consumption even when the system is idle.

The thesis concludes that event streaming can offer reduced resource consumption on the database server. However, when the source table size is small and the number of changes large, extraction in batches is less resource-intensive.
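
The trade-off described in the abstract can be illustrated with a toy cost model. This is only a sketch under stated assumptions: the names `CostParams`, `batch_cost` and `streaming_cost` are hypothetical, the unit costs are arbitrary placeholders rather than measurements from the thesis, and the linear form of the model is an assumption made for illustration. It is not the author's measurement method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    """Illustrative unit costs in arbitrary units (NOT values from the thesis)."""
    per_row_read: float = 1.0      # cost of reading one row, in either approach
    per_transaction: float = 5.0   # assumed fixed overhead per captured transaction
    per_poll: float = 0.5          # assumed cost of one connector poll, even when idle


def batch_cost(table_rows: int, extractions: int, p: CostParams) -> float:
    """Full extraction: every run reads the whole table, changed or not."""
    return table_rows * extractions * p.per_row_read


def streaming_cost(total_changes: int, rows_per_transaction: int,
                   polls: int, p: CostParams) -> float:
    """Change-based extraction: pay per changed row, per transaction and per poll."""
    transactions = -(-total_changes // rows_per_transaction)  # ceiling division
    return (total_changes * p.per_row_read
            + transactions * p.per_transaction
            + polls * p.per_poll)


p = CostParams()

# Large table, few changes: streaming is cheaper in this toy model.
large_table = (batch_cost(1_000_000, 24, p), streaming_cost(10_000, 100, 1_000, p))

# Small table, many single-row transactions: batch extraction is cheaper.
small_table = (batch_cost(1_000, 24, p), streaming_cost(100_000, 1, 1_000, p))

# Same total changes, different transaction sizes.
many_small_txns = streaming_cost(10_000, 1, 0, p)
few_large_txns = streaming_cost(10_000, 1_000, 0, p)

# No changes at all: polling alone still has a cost.
idle = streaming_cost(0, 1, 100, p)
```

With these placeholder costs, the model reproduces the qualitative pattern the abstract reports: per-transaction overhead dominates when changes arrive as many small transactions, and polling contributes a non-zero cost even with no changes. The actual magnitudes depend on the measurements in the thesis, which this sketch does not attempt to reproduce.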
