61

Active Assurance in Kubernetes

Wennerström, William January 2021
No description available.
62

Project-based Multi-tenant Container Registry For Hopsworks

Kashyap, Pradyumna Krishna January 2020
There has been substantial growth in the usage of data in the past decade; cloud technologies and big data platforms have gained popularity as they help process such data on a large scale. Hopsworks is one such managed platform for scale-out data science. It is an open-source platform for the development and operation of Machine Learning models, available on-premise and as a managed platform in the cloud. As most of these platforms provide data science environments that collate the required libraries, Hopsworks provides users with Anaconda environments. Hopsworks provides multi-tenancy, ensuring a secure model for managing sensitive data on the shared platform. Most Hopsworks features are built around projects: each project includes an Anaconda environment that provides users with a number of libraries capable of processing data. Each project creation triggers the creation of a base Anaconda environment, and each added library updates this environment. For an on-premise application, as data science teams are diverse and work towards building repeatable and scalable models, it becomes increasingly important to manage these environments in a central location locally. The purpose of this thesis is to provide secure storage for these Anaconda environments. As Hopsworks uses a Kubernetes cluster to serve models, these environments can be containerized and stored in a secure container registry on the Kubernetes cluster. The provided solution also aims to extend the multi-tenancy feature of Hopsworks onto the hosted local storage. The implementation comprises two parts: the first is to host a compatible open-source container registry that stores the container images on a local Kubernetes cluster with fault tolerance, avoiding a single point of failure; the second is to leverage the multi-tenancy feature of Hopsworks by storing the images on the self-sufficient secure registry with project-level isolation.
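The project-level isolation described above can be pictured as a repository-per-project naming scheme. Below is a minimal sketch, assuming a hypothetical in-cluster registry host and helper names; it is not the Hopsworks implementation:

```python
# Minimal sketch of project-scoped image naming for a multi-tenant
# registry. The registry host and the repository layout are assumptions.

REGISTRY_HOST = "registry.cluster.local"  # hypothetical in-cluster registry


def project_image(project: str, env_hash: str) -> str:
    """Build a repository path that isolates images per project."""
    # One repository prefix per project lets per-path registry ACLs
    # keep tenants from pulling each other's images.
    return f"{REGISTRY_HOST}/{project}/anaconda-env:{env_hash}"


def can_pull(user_projects: set[str], image_ref: str) -> bool:
    """Allow a pull only if the user is a member of the owning project."""
    repo_path = image_ref.removeprefix(f"{REGISTRY_HOST}/")
    owning_project = repo_path.split("/", 1)[0]
    return owning_project in user_projects


print(project_image("demo_project", "a1b2c3"))
print(can_pull({"demo_project"}, project_image("demo_project", "a1b2c3")))
```

Many registries support per-repository access control, which is what makes a path-per-project layout a natural isolation boundary.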
63

Spark on Kubernetes using HopsFS as a backing store : Measuring performance of Spark with HopsFS for storing and retrieving shuffle files while running on Kubernetes

Saini, Shivam January 2020
Data is a raw list of facts and details, such as numbers, words, measurements, or observations, that is not useful by itself. Data processing is a technique that helps process the data in order to extract useful information from it. Today, the world produces huge amounts of data that cannot be processed using traditional methods. Apache Spark (Spark) is an open-source, distributed, general-purpose cluster-computing framework for large-scale data processing. To fulfill its task, Spark uses a cluster of machines to process the data in a parallel fashion. The external shuffle service is a distributed component of an Apache Spark cluster that provides resilience in case of a machine failure. A cluster manager helps Spark manage the cluster of machines and provides Spark with the resources required to run the application. Kubernetes is a new cluster manager that enables Spark to run in a containerized environment. However, running the external shuffle service is not possible when running Spark with Kubernetes as the resource manager. This strongly impacts the performance of Spark applications due to tasks failed because of machine failures. As a solution to this problem, the open-source Spark community has developed a plugin that can provide resiliency similar to that of the external shuffle service. When used with Spark applications, the plugin asynchronously backs up the data onto an external storage. In order not to compromise Spark application performance, it is important that the external storage provides Spark with minimal latency. HopsFS is a next-generation distribution of the Hadoop Distributed File System (HDFS) that provides special support for small files (<64 KB) by storing them in a NewSQL database, enabling lower client latencies. The thesis shows that HopsFS provides 16% higher performance to Spark applications for small files as compared to larger ones. The work also shows that using the plugin to back up Spark data on HopsFS can reduce the total execution time of Spark applications by 20%-30% compared to recalculating tasks in case of a node failure.
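As an illustration of the setup the abstract describes, here is a hedged PySpark sketch of running on Kubernetes with an HDFS-compatible store such as HopsFS. The container image, endpoints, and in particular the shuffle-plugin class name are assumptions, not the plugin evaluated in the thesis:

```python
# A hedged sketch of Spark on Kubernetes with an HDFS-compatible backing
# store. The plugin class below is a placeholder, not the thesis's plugin.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.default.svc:443")
    .appName("shuffle-resilience-demo")
    # Container image holding the Spark distribution (assumed name).
    .config("spark.kubernetes.container.image", "example/spark:3.0")
    # Point Hadoop at HopsFS, which speaks the HDFS protocol.
    .config("spark.hadoop.fs.defaultFS", "hdfs://hopsfs-namenode:8020")
    # Hypothetical shuffle-data backup plugin; the actual class differs.
    .config("spark.shuffle.sort.io.plugin.class",
            "com.example.AsyncBackupShuffleDataIO")
    .getOrCreate()
)

# A shuffle-heavy job: groupBy forces a shuffle whose files such a
# plugin would asynchronously copy to the external store.
df = spark.range(10_000_000)
df.groupBy((df.id % 100).alias("bucket")).count().show()
```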
64

Managing Microservices with a Service Mesh : An implementation of a service mesh with Kubernetes and Istio

Mara Jösch, Ronja January 2020
The adoption of microservices facilitates extending computer systems in size, complexity, and distribution. Alongside their benefits, they introduce the possibility of partial failures. Besides focusing on the business logic, developers have to tackle cross-cutting concerns of service-to-service communication, which now defines the applications' reliability and performance. Currently, developers use libraries embedded into the application code to address these concerns. However, this increases the complexity of the code and requires the maintenance and management of various libraries. The service mesh is a relatively new technology that may enable developers to stay focused on their business logic. This thesis investigates one of the available service meshes, Istio, to identify its benefits and limitations. The main benefits found are that Istio adds resilience and security, allows features that are currently difficult to implement, and enables a cleaner structure and a standard implementation of features within and across teams. The drawbacks are that it decreases performance by adding CPU usage, memory usage, and latency. Furthermore, the main disadvantage of Istio is its limited testing tooling. Based on the findings, the company's Webcore Infra team can make a more informed decision on whether or not to introduce Istio.
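The kind of overhead evaluation described above can be reproduced with a small measurement harness. The sketch below, with an assumed service URL and request count, measures per-request latency; running it against the same service with and without the Istio sidecar injected gives the latency difference:

```python
# Not from the thesis: a small harness in the spirit of its evaluation,
# measuring request latency toward a service with and without a mesh
# sidecar. The URL and sample count are assumptions.
import statistics
import time

import requests

URL = "http://demo-service.default.svc.cluster.local/api/health"  # assumed


def measure_latencies(url: str, n: int = 200) -> list[float]:
    """Return per-request latency in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        requests.get(url, timeout=5)
        samples.append((time.perf_counter() - start) * 1000)
    return samples


latencies = measure_latencies(URL)
print(f"p50={statistics.median(latencies):.1f} ms  "
      f"p95={statistics.quantiles(latencies, n=20)[18]:.1f} ms")
```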
65

Constraint based network communications in a virtual environment of a proprietary hardware

Bhonagiri, Saaish, Mudugonda, Soumith Kumar January 2022
Specialized hardware remains a key component of mobile networks, but at the same time the telecom industry is adopting a vision of a fully programmable, distributed, end-to-end network with cloud-style management and Software-Defined Networking. In such a programmable network it will be possible to place workloads across abstracted compute and networking infrastructure. But whereas virtualization of standard compute resources is a mature technology, well supported in cloud management systems such as OpenStack and Kubernetes, this is not the case for specialized hardware with more complex constraints; there is a significant gap in terms of advanced constraint-aware and service-level-aware schedulers. The main objective of this thesis is to adapt the specialized hardware to the features of edge computing, which provides the opportunity to explore how technologies can advance industrial processes, and to achieve flexibility by choosing where on the board a workload should be processed based on available resources. Utilizing this technology, highly intensive applications can be handled at the network's edge. This requires virtualizing the proprietary hardware and running workloads in VMs and containers. In this thesis, we discuss kernel bypass, PCI passthrough, and MPI communication technologies in a virtual environment, considering the hardware constraints and software requirements so that these technologies can be integrated into OpenStack and Kubernetes in the future.
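To make the constraints concrete: in Kubernetes, PCI-passthrough and kernel-bypass workloads are typically requested through device-plugin resources and hugepages. The manifest below is illustrative only; the resource name and image are hypothetical and depend on the device plugin written for the actual hardware:

```python
# Illustrative only: a pod manifest requesting PCI-passthrough style
# resources via a Kubernetes device plugin. "example.com/sriov-nic"
# is a hypothetical resource name.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "dpdk-workload"},
    "spec": {
        "containers": [{
            "name": "worker",
            "image": "example/dpdk-app:latest",  # assumed image
            "resources": {
                "limits": {
                    "example.com/sriov-nic": "1",  # passthrough NIC
                    "hugepages-2Mi": "1Gi",        # kernel-bypass buffers
                    "memory": "2Gi",
                    "cpu": "4",
                },
            },
        }],
    },
}

# Creating it with the official client (cluster access assumed):
# from kubernetes import client, config
# config.load_kube_config()
# client.CoreV1Api().create_namespaced_pod("default", pod_manifest)
```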
66

A Comparison of CI/CD Tools on Kubernetes

Johansson, William January 2022
Kubernetes is a fast-emerging technological platform for developing and operating modern IT applications. The capacity to deploy new apps and change old ones at a faster rate with less chance of error is one of the key value propositions of the Kubernetes platform. A continuous integration and continuous deployment (CI/CD) pipeline is a crucial component of the technology. Such pipelines compile all updated code, run specific tests, and may then automatically deploy the produced code artifacts to a running system. There is a thriving ecosystem of CI/CD tools. Tools can be divided into two types: integrated and standalone. Integrated tools are used for both pipeline phases, CI and CD; standalone tools are used for just one of the phases, which requires two independent programs to build up the pipeline. Some tools predate Kubernetes and have been adapted to operate on it, while others are new and designed specifically for use with Kubernetes clusters. CD systems are classified as push-style (artifacts from outside the cluster are pushed into the cluster) or pull-style (a CD tool running inside the cluster pulls built artifacts into the cluster). Pull- and push-style pipelines affect how cluster credentials are managed and whether they ever need to leave the cluster. This thesis investigates the deployment time, fault tolerance, and access security of pipelines. Using a simple microservices application, a testing setup is created to measure the metrics of the pipelines. Drone, Argo Workflows, ArgoCD, and GoCD are the tools compared in this study; they are coupled to form various pipelines. The pipeline using the Kubernetes-specific tools, Argo Workflows and ArgoCD, is the fastest; the pipeline with GoCD is somewhat slower; and the Drone pipeline is the slowest. The pipeline that used Argo Workflows and ArgoCD could also withstand failures, whereas the other pipelines, using Drone and GoCD, were unable to recover and timed out. Pull pipelines handle Kubernetes access differently from push pipelines: the cluster credentials never have to leave the cluster, whereas push pipelines need the credentials in the external environment where the CD tool runs.
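Deployment time, the first metric above, can be measured by polling the rollout status from outside the cluster. A sketch follows using the official Kubernetes Python client; the deployment name, namespace, and timeout are assumptions, and this is one possible measurement method rather than the thesis's own harness:

```python
# A sketch of measuring deployment time: poll the Deployment until the
# new revision is fully rolled out. Names and namespace are assumed.
import time

from kubernetes import client, config


def wait_for_rollout(name: str, namespace: str = "default",
                     timeout: float = 300.0) -> float:
    """Return seconds until all replicas of `name` report ready."""
    config.load_kube_config()
    apps = client.AppsV1Api()
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        dep = apps.read_namespaced_deployment(name, namespace)
        desired = dep.spec.replicas or 0
        updated = dep.status.updated_replicas or 0
        ready = dep.status.ready_replicas or 0
        if desired and updated == desired and ready == desired:
            return time.monotonic() - start
        time.sleep(1)
    raise TimeoutError(f"rollout of {name} did not finish in {timeout}s")


print(f"rollout took {wait_for_rollout('demo-app'):.1f} s")
```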
67

Optimizing Systems for Deep Learning Applications

Albahar, Hadeel Ahmad 01 March 2023
Modern systems for Machine Learning (ML) workloads support heterogeneous workloads and resources. However, existing resource managers in these systems do not differentiate between heterogeneous GPU resources. Moreover, users are usually unaware of the sufficient and appropriate type and amount of GPU resources to request for their ML jobs. In this thesis, we analyze the performance of ML training and inference jobs and identify ML model and GPU characteristics that impact this performance. We then propose ML-based prediction models to accurately determine appropriate and sufficient resource requirements to ensure improved job latency and GPU utilization in the cluster. / Doctor of Philosophy / We daily interact with and use many software applications such as social media, e-commerce, healthcare, and finance. These applications rely on different computing systems as well as artificial intelligence to deliver users the best service and experience. In this dissertation, we present optimizations to improve the performance of these artificial intelligence applications while at the same time improving the performance and the utilization of the systems and the heterogeneous resources they run on. We propose utilizing machine learning models, that learn from historical data of application performance as well as application and resource characteristics, to predict the necessary and sufficient resource requirements for these applications to ensure the optimal performance for the application and the underlying system.
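A minimal sketch of the prediction idea described above, assuming invented job features and toy training data: learn a regressor over historical job telemetry and use it to suggest GPU memory for a new job.

```python
# Hedged sketch: predict sufficient GPU memory for an ML job from
# historical runs. Feature names and data are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Columns: model parameters (millions), batch size, input size (MB),
# memory bandwidth (GB/s) of the candidate GPU.
X_train = np.array([
    [25, 32, 150, 320],
    [60, 16, 300, 320],
    [110, 8, 600, 900],
    [340, 4, 1200, 900],
])
# Target: observed peak GPU memory (GB) from historical runs.
y_train = np.array([4.1, 7.9, 11.5, 21.0])

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

new_job = np.array([[120, 16, 500, 900]])
print(f"predicted GPU memory: {model.predict(new_job)[0]:.1f} GB")
```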
68

Optimizing Distributed Tracing Overhead in a Cloud Environment with OpenTelemetry

Elias, Norgren January 2024
To gain observability in distributed systems, some telemetry generation and gathering must be implemented. This is especially important when systems have layers of dependencies on other microservices. One method for observability is distributed tracing: the act of building causal event chains, called traces, between microservices. With traces, it is possible to find bottlenecks and dependencies within each call chain. One framework for implementing distributed tracing is OpenTelemetry. The developer must make design choices when deploying OpenTelemetry in a Kubernetes cluster. For example, OpenTelemetry provides a collector that gathers spans, the parts of a trace emitted by microservices. These collectors can be deployed one per node, called a daemonset, or one per service, called sidecars. This study compared the performance impact of the sidecar and daemonset setups to that of having no OpenTelemetry implemented, analyzing CPU usage, network usage, and RAM usage. Tests were run as permutations of four scenarios: on 4 and 2 nodes, and with balanced and unbalanced service placement. The experiments ran in a cloud environment using Kubernetes, on an emulation of one of Nasdaq's systems based on real data from the company. The study concluded that OpenTelemetry added overhead, i.e. increased resource usage, in all cases. The daemonset setup, compared to no OpenTelemetry, increased CPU usage by 46.5%, network usage by 18.25%, and memory usage by 47.5% on average. The sidecar setup performed worse than the daemonset setup in most cases and for most resources, especially RAM and CPU usage.
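On the application side, the difference between the two topologies reduces to where the OTLP exporter points. A hedged sketch with the OpenTelemetry Python SDK, assuming the node IP is injected via the Kubernetes downward API as NODE_IP:

```python
# Illustrative wiring for the two collector topologies discussed above.
# Daemonset: one collector per node, reached through the node's IP.
# Sidecar: a collector container in the same pod, reached via localhost.
import os

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
    OTLPSpanExporter,
)
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

node_ip = os.environ.get("NODE_IP")  # assumed downward-API injection
endpoint = f"http://{node_ip}:4317" if node_ip else "http://localhost:4317"

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint=endpoint, insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo")
with tracer.start_as_current_span("handle-request"):
    pass  # application work traced here
```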
69

Performance Overhead Of OpenTelemetry Sampling Methods In A Cloud Infrastructure

Karkan, Tahir Mert January 2024
This thesis explores the overhead of distributed tracing in OpenTelemetry, using different sampling strategies, in a cloud environment. Distributed tracing produces telemetry data that allows developers to analyse causal events in a system with temporal information. This comes at the cost of overhead in terms of CPU, memory, and network usage, as the telemetry data has to be generated and sent through collectors that handle traces and finally forward them to a backend. By sampling with three different strategies (head-based sampling, tail-based sampling, and a mixture of the two), overhead can be reduced at the price of losing some information. To measure how this information loss affects observability, synthetic error messages were introduced into traces and used to gauge how many traces with errors each sampling strategy detects. All three strategies were compared for services that sent more and less data between nodes in Kubernetes, in both a two-node and a four-node setup. This thesis was conducted with Nasdaq, as high-performing monitoring tools are in the company's interest, and its systems were analysed and emulated for relevance. The thesis concluded that tail-based sampling had the highest overhead (on average 71.33% CPU, 23.7% memory, and 5.6% network overhead compared to head-based sampling), for the benefit of capturing all the errors. Head-based sampling had the least overhead, except on the node that hosted Jaeger as the trace backend, where its higher total sampling rate added on average 12.75% CPU overhead in the four-node setup compared to mixed sampling; mixed sampling, however, captured more errors. When measuring the overall time taken for the experiments, the highest impact was observed when more requests had to be sent between nodes.
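Head-based sampling is configured in the SDK, while tail-based sampling is configured in the collector, which buffers complete traces before deciding (for example, keeping every trace that contains an error). A sketch of the head-based side in the OpenTelemetry Python SDK, with an arbitrary 10% ratio:

```python
# Hedged sketch of head-based sampling: the decision is made when the
# root span starts, and child spans follow the parent's decision.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Keep roughly 10% of traces; the ratio here is an arbitrary example.
sampler = ParentBased(root=TraceIdRatioBased(0.10))
trace.set_tracer_provider(TracerProvider(sampler=sampler))

tracer = trace.get_tracer("demo")
with tracer.start_as_current_span("root-operation") as span:
    # With head-based sampling, an error recorded here is lost if the
    # trace was not sampled at the root, which is the information-loss
    # trade-off measured in the thesis.
    span.set_attribute("error", True)
```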
70

A Comparison Between Different Frameworks Based on Application Metrics à la Argo Rollouts

Gustaf, Söderlund January 2024
This research investigates the integration and effectiveness of two monitoring frameworks, the Four Golden Signals and the RED Method, with Argo Rollouts for automated deployments. The study aims to identify which framework integrates better with Argo Rollouts, compare their effectiveness in automating deployment procedures, and assess the impact of automated deployments on application performance. The experiments involve fault injections, such as HTTP 500 errors and delays, to evaluate the frameworks' ability to detect unhealthy deployments and trigger rollbacks. Both frameworks were successfully integrated using Prometheus for metric collection and custom analysis templates for health assessment. The Four Golden Signals provided more comprehensive insights due to its additional metrics (saturation and latency), whereas the RED Method was simpler to configure and interpret. The findings highlight the importance of carefully calibrating metric thresholds to accurately identify unhealthy deployments. Future work could explore Blue-Green deployments, investigate the robustness of systems under security breaches, and assess cost savings from using Argo Rollouts over time.
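An analysis step of the kind described above boils down to a Prometheus query and a threshold. The sketch below expresses the RED "errors" signal as such a health check; the Prometheus URL, metric labels, and 5% threshold are assumptions, not the thesis's actual templates:

```python
# Not from the thesis: a sketch of the health check behind such an
# analysis step. Query Prometheus for the 5xx error ratio and fail the
# rollout when it crosses a threshold.
import requests

PROMETHEUS = "http://prometheus.monitoring.svc.cluster.local:9090"  # assumed

QUERY = (
    'sum(rate(http_requests_total{job="demo-app",status=~"5.."}[5m]))'
    ' / sum(rate(http_requests_total{job="demo-app"}[5m]))'
)


def deployment_is_healthy(threshold: float = 0.05) -> bool:
    """Return False when the 5xx error ratio exceeds the threshold."""
    resp = requests.get(f"{PROMETHEUS}/api/v1/query",
                        params={"query": QUERY}, timeout=10)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    error_ratio = float(result[0]["value"][1]) if result else 0.0
    return error_ratio <= threshold


print("healthy" if deployment_is_healthy() else "roll back")
```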
