1 | Optimizing Distributed Tracing Overhead in a Cloud Environment with OpenTelemetry
Elias, Norgren. January 2024.
To gain observability in distributed systems, some form of telemetry generation and gathering must be implemented. This is especially important when systems have layers of dependencies on other microservices. One method for observability is distributed tracing: building causal event chains, called traces, between microservices. With traces, it is possible to find bottlenecks and dependencies within each call chain. One framework for implementing distributed tracing is OpenTelemetry. Developers must make design choices when deploying OpenTelemetry in a Kubernetes cluster. For example, OpenTelemetry provides a collector that gathers spans, the segments that make up a trace, from microservices. Collectors can be deployed either one per node (a daemonset) or one per service (sidecars). This study compared the performance impact of the sidecar and daemonset setups against a baseline with no OpenTelemetry at all, analysing CPU usage, network usage, and RAM usage. Tests covered a permutation of four scenarios: experiments were run on two and four nodes, each with both a balanced and an unbalanced service placement. The experiments ran in a cloud environment using Kubernetes, against an emulation of one of Nasdaq's systems based on real data from the company. The study concluded that adding OpenTelemetry increased resource usage in all cases. Compared to no OpenTelemetry, the daemonset setup increased CPU usage by 46.5%, network usage by 18.25%, and memory usage by 47.5% on average. The sidecar setup performed worse than the daemonset setup in most cases and for most resources, especially in RAM and CPU usage.
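For a service being instrumented, the daemonset/sidecar choice mainly changes where spans are exported. The following is a minimal sketch using the OpenTelemetry Python SDK, not code from the thesis; COLLECTOR_TOPOLOGY and NODE_IP are assumed environment variable names. A sidecar collector shares the pod and is reached over localhost, while a daemonset collector is typically reached via the node IP injected through the Kubernetes downward API (status.hostIP).

```python
import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Sidecar: collector in the same pod, reachable on localhost.
# Daemonset: one collector per node; NODE_IP is an assumed env var mapped
# from the Kubernetes downward API field status.hostIP.
endpoint = (
    "http://localhost:4317"
    if os.getenv("COLLECTOR_TOPOLOGY", "sidecar") == "sidecar"
    else f"http://{os.environ['NODE_IP']}:4317"
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=endpoint)))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("order-service")  # hypothetical service name
with tracer.start_as_current_span("handle-request"):
    pass  # business logic here; the span joins the distributed trace
```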
2 | Performance Overhead Of OpenTelemetry Sampling Methods In A Cloud Infrastructure
Karkan, Tahir Mert. January 2024.
This thesis explores the overhead of distributed tracing in OpenTelemetry, under different sampling strategies, in a cloud environment. Distributed tracing produces telemetry data that lets developers analyse causal events in a system with temporal information. This comes at the cost of overhead in CPU, memory, and network usage, as the telemetry data has to be generated and sent through collectors that process traces and finally forward them to a backend. Sampling can reduce this overhead at the price of losing some information; three strategies were evaluated: head-based sampling, tail-based sampling, and a mixture of the two. To measure this information loss, synthetic error messages were introduced into traces and used to gauge how many traces with errors each sampling strategy detected. All three strategies were compared for services sending larger and smaller amounts of data between nodes in Kubernetes, in both two-node and four-node setups. The thesis was conducted with Nasdaq, which has an interest in high-performing monitoring tools, and the company's systems were analysed and emulated for relevance. The thesis concluded that tail-based sampling had the highest overhead (on average 71.33% CPU, 23.7% memory, and 5.6% network overhead compared to head-based sampling), with the benefit of capturing all the errors. Head-based sampling had the least overhead, except on the node hosting Jaeger as the trace backend, where its higher total sampling rate added on average 12.75% CPU overhead in the four-node setup compared to mixed sampling; mixed sampling, however, captured more errors. Measuring the overall time taken for the experiments, the highest impact was observed when more requests had to be sent between nodes.
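To make the trade-off concrete, here is a hedged sketch assuming the OpenTelemetry Python SDK. Head-based sampling is a one-line sampler configuration in the SDK, since the keep/drop decision is derived from the trace ID at span creation. Tail-based sampling is in practice done in a collector (for example the contrib tail_sampling processor), not in SDK code; the toy decision function below only illustrates why it catches every error at the cost of buffering whole traces first.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Head-based sampling: decide up front from the trace ID. Cheap, but traces
# that later turn out to contain errors may already have been dropped.
provider = TracerProvider(sampler=ParentBased(TraceIdRatioBased(0.1)))  # keep ~10%
trace.set_tracer_provider(provider)

# Tail-based sampling happens after a trace is complete, once all spans have
# been buffered in the collector. A toy decision function for illustration:
def tail_decision(spans: list[dict], ratio: float = 0.1) -> bool:
    """Keep every trace containing an error; sample the rest at `ratio`."""
    if any(s.get("status") == "ERROR" for s in spans):
        return True  # errors are always captured, hence the higher overhead
    # Toy pseudo-random keep decision (Python's str hash is process-salted,
    # which is acceptable for this illustration only).
    return hash(spans[0]["trace_id"]) % 100 < ratio * 100
```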
3 | Adopting Observability-Driven Development for Cloud-Native Applications: Designing an End-to-end Observability Pipeline Using Open-source Software
Ni, Chujie. January 2023.
As cloud-native applications become more distributed, complex, and unpredictable with the adoption of microservices and other new architectural components, traditional monitoring solutions are inadequate for providing end-to-end visibility and for proactively identifying deviations from expected behaviour before they disrupt services. In response to these challenges, observability-driven development (ODD) is proposed as a methodology that leverages tools and practices to observe the state and detect the behaviour of systems. Unlike the leading IT giants, which develop their own proprietary tools and platforms, many non-IT companies and smaller organizations still have difficulty adopting observability-driven development: proprietary development demands extensive resources and manpower, while connecting to third-party platforms may compromise data security. This thesis proposes an end-to-end observability pipeline composed entirely of open-source components. The pipeline collects and correlates metrics, logs, and traces to facilitate software development and to help troubleshoot in production. It is designed to be adaptive and extensible so that companies can adopt it as a first step towards observability-driven development and customize it to meet their specific requirements.
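A common mechanism for correlating the three signals is to stamp log records with the IDs of the active trace, so the backend can join logs to the corresponding trace. The following is a minimal sketch using the OpenTelemetry Python API and the standard logging module; the "checkout" instrumentation name and the log format are illustrative choices, not taken from the thesis.

```python
import logging

from opentelemetry import trace

class TraceContextFilter(logging.Filter):
    """Attach the active trace/span IDs to every log record so a backend
    can join logs with the corresponding trace."""
    def filter(self, record: logging.LogRecord) -> bool:
        ctx = trace.get_current_span().get_span_context()
        record.trace_id = format(ctx.trace_id, "032x")  # 128-bit trace ID as hex
        record.span_id = format(ctx.span_id, "016x")    # 64-bit span ID as hex
        return True

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"
))
handler.addFilter(TraceContextFilter())
logging.getLogger().addHandler(handler)

tracer = trace.get_tracer("checkout")  # hypothetical instrumentation name
with tracer.start_as_current_span("charge-card"):
    logging.getLogger().warning("payment retried")  # record carries trace context
```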
4 | A Comparative Analysis of the Ingestion and Storage Performance of Log Aggregation Solutions: Elastic Stack & SigNoz
Duras, Robert. January 2024.
As infrastructures and software grow in complexity, keeping track of them becomes increasingly important. It is the job of log aggregation solutions to condense log data into a form that is easier to search, visualize, and analyze. Many log aggregation solutions exist today, with different strengths and weaknesses for different types of data and architectures, which makes choosing one an important decision. This thesis analyzes two full-stack log aggregation solutions, Elastic Stack and SigNoz, with the goal of evaluating how the ingestion and storage components of the two stacks perform with smaller and larger amounts of data. The solutions were evaluated by ingesting log files of varying sizes while tracking their performance, and the resulting metrics were analyzed for similarities and differences. The thesis found that SigNoz had higher CPU usage on average, faster processing times, and lower memory usage. Elastic Stack did more processing and indexing on the data, requiring more memory and more storage space for the ingested logs in exchange for more detailed searchability. The hope is that these findings provide insight into the area and help those choosing between the two solutions make a more informed decision.
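The measurement approach generalizes to any ingestion endpoint: stream log files of varying sizes while sampling resource usage. Below is a minimal sketch of such a harness in Python, assuming the third-party psutil and requests packages; the INGEST_URL endpoint is hypothetical and stands in for whichever stack is under test, since the thesis's actual tooling is not specified here.

```python
import time

import psutil     # third-party: process/system resource metrics
import requests   # third-party: HTTP client

INGEST_URL = "http://localhost:8080/ingest"  # hypothetical ingestion endpoint

def ingest_and_measure(log_path: str, batch_size: int = 1000) -> dict:
    """Stream a log file to an ingestion endpoint in batches while sampling
    system CPU and memory, returning simple summary statistics."""
    cpu_samples, mem_samples = [], []
    start = time.monotonic()
    with open(log_path) as f:
        batch = []
        for line in f:
            batch.append(line)
            if len(batch) >= batch_size:
                requests.post(INGEST_URL, data="".join(batch))
                batch = []
                # cpu_percent(interval=None) reports usage since the last call
                cpu_samples.append(psutil.cpu_percent(interval=None))
                mem_samples.append(psutil.virtual_memory().percent)
        if batch:
            requests.post(INGEST_URL, data="".join(batch))
    return {
        "seconds": time.monotonic() - start,
        "avg_cpu_percent": sum(cpu_samples) / max(len(cpu_samples), 1),
        "avg_mem_percent": sum(mem_samples) / max(len(mem_samples), 1),
    }
```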