1.
Assessing Query Execution Time and Implementational Complexity in Different Databases for Time Series Data / Utvärdering av frågeexekveringstid och implementeringskomplexitet i olika databaser för tidsseriedata
Jama Mohamud, Nuh; Söderström Broström, Mikael (January 2024)
Traditional database management systems are designed for general-purpose data handling and fail to work efficiently with time-series data, which is characterized by high volume, rapid ingestion rates, and a focus on temporal relationships. However, which solution fits best is not a trivial question. This thesis therefore analyzes four Database Management Systems (DBMSs) to determine their suitability for managing time series data, with a specific focus on Internet of Things (IoT) applications. The DBMSs examined are PostgreSQL, TimescaleDB, ClickHouse, and InfluxDB. The thesis evaluates query performance across varying dataset sizes and time ranges, as well as the implementational complexity of each DBMS. The benchmarking results indicate that InfluxDB consistently delivers the best performance, though at the cost of higher implementational complexity and time consumption. ClickHouse emerges as a strong alternative with the second-best performance and the simplest implementation. The thesis also identifies potential biases in the benchmarking tools and suggests that TimescaleDB's performance may have been affected by configuration errors. The findings provide significant insight into the performance and implementation challenges of the selected DBMSs. Despite limitations in fully addressing the research questions, the thesis offers a valuable overview of the examined DBMSs in terms of performance and implementational complexity; these results should be weighed alongside additional research when selecting a DBMS for time series data.
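The benchmarking approach described above, timing the same time-range aggregation against each DBMS over datasets of varying size, can be sketched in a few lines. The snippet below is an illustrative harness only, not the thesis's benchmark tool: it uses an in-memory SQLite database as a stand-in backend with a synthetic IoT readings table, since the actual PostgreSQL/TimescaleDB/ClickHouse/InfluxDB servers are not available here.

```python
import sqlite3
import statistics
import time

def build_dataset(conn, rows=10_000):
    """Create a synthetic IoT readings table: (timestamp, sensor_id, value)."""
    conn.execute("CREATE TABLE readings (ts INTEGER, sensor_id INTEGER, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        ((i, i % 100, float(i % 17)) for i in range(rows)),
    )

def time_query(conn, sql, params=(), runs=5):
    """Return the median wall-clock execution time of a query over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

conn = sqlite3.connect(":memory:")
build_dataset(conn)
# A typical time-series workload: aggregate per sensor over a time range.
median_s = time_query(
    conn,
    "SELECT sensor_id, AVG(value) FROM readings "
    "WHERE ts BETWEEN ? AND ? GROUP BY sensor_id",
    (1_000, 9_000),
)
print(f"median query time: {median_s * 1000:.3f} ms")
```

Repeating the run and taking the median, as above, reduces the influence of cold caches and scheduler noise, which matters when the compared systems differ by small margins.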
2.
Построение потокового захвата изменения данных для аналитических хранилищ данных / Building a streaming data change capture system for analytical data warehouses (master's thesis)
Голиков, А. А. (Golikov, A. A.) (January 2024)
The object of the thesis is streaming data change capture. The purpose of the work is to analyze methods for building a streaming data change capture system and to implement the best method identified in that analysis. Research methods: theoretical analysis, testing, programming. The result is the successful implementation and testing of a streaming data change capture system based on Kafka Connect, combining the Debezium connector for MySQL with the ClickHouse Kafka Connect Sink for ClickHouse; the work also resolves the latter's limitation in processing records deleted from the data source, as well as obtaining the current state of the data from the source. The results apply to data engineering and artificial intelligence. The significance of the work lies in the possibility of its practical deployment at the author's workplace, as well as in its flexible approach to solving the problem under tooling constraints.
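The deleted-records limitation mentioned above is a known difficulty when a sink cannot issue real DELETEs against the target table. One common workaround, shown here as a hypothetical sketch and not necessarily the method used in the thesis, is to fold the `op`/`before`/`after`/`ts_ms` fields of a Debezium change event envelope into a soft-delete row, so that a ClickHouse table in the style of ReplacingMergeTree can collapse versions later; the `_version` and `_is_deleted` column names below are illustrative assumptions.

```python
import json

def to_clickhouse_row(event_json):
    """Map a Debezium-style change event to a row for a versioned table with a
    soft-delete flag, so deletes survive a sink that cannot issue real DELETEs."""
    event = json.loads(event_json)
    op = event["op"]                 # "c" = create, "u" = update, "d" = delete
    source_ts = event["ts_ms"]       # reused as the version column
    if op == "d":
        row = dict(event["before"])  # a delete event carries the row's old state
        row.update(_version=source_ts, _is_deleted=1)
    else:
        row = dict(event["after"])
        row.update(_version=source_ts, _is_deleted=0)
    return row

insert_event = json.dumps({
    "op": "c", "ts_ms": 1700000000000,
    "before": None, "after": {"id": 1, "name": "alice"},
})
delete_event = json.dumps({
    "op": "d", "ts_ms": 1700000005000,
    "before": {"id": 1, "name": "alice"}, "after": None,
})
print(to_clickhouse_row(insert_event))
print(to_clickhouse_row(delete_event))
```

With this shape, queries against the target table filter on the soft-delete flag (or rely on the merge engine to discard superseded versions), which also preserves the "current state of the data" requirement the abstract mentions.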
3.
A Comparative Analysis of the Ingestion and Storage Performance of Log Aggregation Solutions: Elastic Stack & SigNoz
Duras, Robert (January 2024)
As infrastructures and software grow in complexity, keeping track of what they do becomes increasingly important. It is the job of log aggregation solutions to condense log data into a form that is easier to search, visualize, and analyze. Many log aggregation solutions exist today, each with pros and cons suited to different types of data and architectures, which makes selecting one an important decision. This thesis analyzes two full-stack log aggregation solutions, Elastic Stack and SigNoz, with the goal of evaluating how the ingestion and storage components of the two stacks perform with smaller and larger amounts of data. The evaluation was carried out by ingesting log files of varying sizes into each solution while tracking performance metrics, which were then analyzed for similarities and differences. The thesis found that SigNoz featured higher average CPU usage, faster processing times, and lower memory usage. Elastic Stack was found to do more processing and indexing of the data, requiring more memory and storage space in exchange for more detailed searchability of the ingested data; consequently, Elastic Stack also needed more storage space than SigNoz to store the ingested logs. The hope is that these findings provide insight into the area and help those choosing between the two solutions make a more informed decision.
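The two quantities the thesis compares, ingestion performance and storage footprint, can be illustrated with a toy measurement. The sketch below is not part of either stack: it "ingests" synthetic web-server-style log lines by parsing them, then reports throughput and raw versus compressed size as a rough stand-in for the indexing and storage overhead that a real solution such as Elastic Stack or SigNoz would exhibit.

```python
import gzip
import time

def make_logs(n=5_000):
    """Generate synthetic web-server-style log lines."""
    return [
        f'192.0.2.{i % 255} - - [01/Jan/2024:00:00:{i % 60:02d}] '
        f'"GET /item/{i} HTTP/1.1" 200 {128 + i % 512}'
        for i in range(n)
    ]

def ingest(lines):
    """Toy 'ingestion': parse each line into fields, time the whole batch,
    and report throughput plus raw vs. compressed storage footprint."""
    start = time.perf_counter()
    parsed = [line.split() for line in lines]
    elapsed = time.perf_counter() - start
    raw = "\n".join(lines).encode()
    return {
        "lines_per_s": len(parsed) / elapsed,
        "raw_bytes": len(raw),
        "compressed_bytes": len(gzip.compress(raw)),
    }

stats = ingest(make_logs())
print(stats)
```

The raw-versus-compressed gap hints at why storage requirements diverge between systems: a stack that builds rich inverted indexes (as the thesis observed for Elastic Stack) trades disk space for searchability, while a column-oriented, compression-heavy store can keep the footprint smaller.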