  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Rethinking Serverless for Machine Learning Inference

Ellore, Anish Reddy 21 August 2023
In the era of artificial intelligence and machine learning, AI/ML inference tasks have become exceedingly popular. However, executing these workloads on dedicated hardware may not be feasible for many users due to high maintenance costs, varying load patterns, and time to production. Furthermore, ML inference workloads are stateless, and most of them are not extremely latency sensitive. For example, tasks such as fake review removal, abusive language detection, tweet classification, image tagging, and free-tier chatbots do not require real-time inference. All these characteristics make serverless platforms a good fit for deployment. In this work, we identify the bottlenecks involved in hosting these inference jobs on serverless and optimize serverless for better performance and resource utilization. Specifically, we identify model loading and model memory duplication as major bottlenecks in serverless inference, and to address these problems, we propose a new approach that rethinks the way we serve FaaS requests. To support this design, we employ a hybrid scaling approach to implement the autoscale feature of serverless.
/ Master of Science /
Most modern software applications leverage the power of machine learning to incorporate intelligent features. For instance, platforms like Yelp employ machine learning algorithms to detect fake reviews, while intelligent chatbots such as ChatGPT provide interactive conversations. Even Netflix relies on machine learning to recommend personalized content to its users. The process of creating these machine learning services involves several stages, including data collection, model training using the collected data, and serving the trained model to deploy the service. This final stage, known as inference, is crucial for delivering real-time predictions or responses to user queries. In our research, we focus on serverless computing as the preferred infrastructure for deploying these popular inference workloads.
Serverless, also referred to as Function as a Service (FaaS), is an execution paradigm in cloud computing that allows users to run their code efficiently by providing scalability, elasticity, and fine-grained billing. In this work, we identify model loading and model memory duplication as major bottlenecks in serverless inference. To solve these problems, we propose a new approach that rethinks the way we serve FaaS requests. To support this design, we use a hybrid scaling approach to implement the autoscale feature of serverless.
2

An Open-Source Framework for Large-Scale ML Model Serving

Sigfridsson, Petter January 2022
The machine learning (ML) industry has taken great strides forward and is today facing new challenges. Many more models are developed, used, and served within the industry. The datasets that models are trained on are constantly changing. This demands that modern machine learning processes handle a large number of models and extreme load, and support recurring updates in a scalable manner. Model serving is the concept intended to handle these challenges. It is a relatively new concept, and more effort is required to address both its conceptual and technical challenges. Existing ML model serving solutions aim to be scalable for the purpose of serving one model at a time. The industry, however, requires that the whole ML process, the number of served models, and recurring updates all be scalable. That is why this thesis presents an open-source framework for large-scale ML model serving that aims to meet the requirements of today’s ML industry. The presented framework is shown to handle a large-scale ML model serving environment in a scalable way, but with some limitations. Results show that the number of parallel requests the framework can handle can be optimized, which would make the solution more efficient in terms of resource utilization. One avenue for future improvement could be to integrate the developed framework as an application into the open-source machine learning platform STACKn.
