Towards High-Accuracy and Resource-Efficient Edge-Assisted Augmented Reality

Immersive applications such as augmented reality (AR) and mixed reality (MR) often need to perform latency-critical analytics tasks on every frame captured by the camera. These tasks, often powered by deep neural networks (DNNs) for their superior accuracy, must be offloaded to edge servers with GPUs because of their computational intensity. Achieving high-accuracy and efficient AR task offloading faces two fundamental challenges unaddressed by prior work: (1) In practice, multiple DNN-supported tasks need to be offloaded concurrently to achieve the app functionality -- how should the client schedule these offloaded tasks, which compete for shared edge server resources, to maximize the app's quality of experience (QoE)? (2) Concurrent AR clients from a large user base offload to a cluster of GPU servers -- how should the servers schedule the offloaded tasks to maximize the number of clients served and lower the operating cost?

To tackle the first challenge, we design a framework, AccuMO, that balances the offloading frequencies of different tasks by dynamically scheduling the offloading of multiple tasks from an AR client to an edge server, thereby optimizing the overall accuracy across tasks and hence app QoE. Our design employs two novel ideas: (1) task-specific lightweight models that predict the offloading accuracy drop as a function of offloading frequency and frame content, and (2) a general two-level control feedback loop that concurrently balances offloading among tasks and adapts between offloading and using local algorithms for each task.

We tackle the challenge of supporting concurrent AR clients in two steps. We first focus on maximizing the capacity of individual edge servers, where we present ARISE, which untangles the intricate interplay between the per-client offloading schedule and batched inference on the server by proactively coordinating offloading requests from different AR clients. In the second step, we focus on a cluster of heterogeneous GPU servers, a setup that exposes the synergy between the diversity in DNN layers and the diversity in GPU architectures: many layers in DNN models show comparable inference latency on low-class and high-class GPUs. We exploit this overlooked capability of low-class GPUs using pipeline parallelism and present a novel inference serving system, IPIPE, that employs pool-based pipeline parallelism with a mixed-integer linear programming (MILP)-based control plane and a data plane that performs resource reservation-based adaptive batching.

10.25394/pgs.26322106.v1

Mobile computing

Networking and communications

Neural networks

Mobile augmented reality

Edge computing

DNN offloading

DNN serving

Machine learning as a service

Identifier	oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/26322106
Date	21 July 2024
Creators	Qiang Xu (19166152)
Source Sets	Purdue University
Detected Language	English
Type	Text, Thesis
Rights	CC BY 4.0
Relation	https://figshare.com/articles/thesis/Towards_High-Accuracy_and_Resource-Efficient_Edge-Assisted_Augmented_Reality/26322106
