Reinforcement Learning Based Fair Edge-User Allocation for Delay-Sensitive Edge Computing Applications

Cloud gaming systems are among the most challenging networked applications, since they must stream high-quality, bandwidth-heavy video in real time to players' devices. While all industry solutions today are centralized, we introduce an AI-assisted hybrid networking architecture that, in addition to the central cloud servers, also uses some players' computing resources as additional points of service. We describe the problem, its mathematical formulation, and a potential solution strategy.

Edge computing is a promising paradigm that brings servers closer to users, leading to lower latencies and enabling latency-sensitive applications such as cloud gaming, virtual/augmented reality, telepresence, and telecollaboration. Because of the large number of possible edge servers and incoming user requests, choosing the optimal user-server matching has become a difficult challenge, especially in the 5G era, where the network can offer very low latencies. In this thesis, we introduce the problem of fair server selection, defined as not only complying with an application's latency threshold but also reducing the variance of the latency among users in the same session. Given the dynamic and rapidly evolving nature of such an environment and the limited capacity of the servers, we propose a Reinforcement Learning (RL) solution in the form of a Quadruple Q-Learning model with action suppression, Q-value normalization, and a reward function that minimizes the variance of the latency. Our evaluations in the context of a cloud gaming application show that, compared to existing methods, our proposed method not only better meets the application's latency threshold but is also fairer: it reduces the standard deviation of the latencies by up to 35% when using geo-distance, and it improves fairness by up to 18.7% over existing solutions when using RTT delay, especially during resource scarcity. Additionally, the RL solution can act as a heuristic algorithm even when it is not fully trained.
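The fairness objective described above can be illustrated with a minimal sketch. It is not the thesis's actual reward function, which the abstract does not specify; the function name fairness_reward, the penalty weight, and the exact way the two terms are combined are assumptions made for illustration only. The sketch penalizes both the spread (standard deviation) of latencies within one session and the number of users whose latency exceeds the application's threshold.

```python
from statistics import pstdev


def fairness_reward(latencies_ms, threshold_ms, penalty=1.0):
    """Hypothetical reward for one session (an illustrative assumption).

    Higher is better: the reward falls as the latency spread among the
    session's users grows and as more users exceed the latency threshold.
    """
    if not latencies_ms:
        raise ValueError("a session needs at least one user latency")
    # Population standard deviation of the latencies in the session.
    spread = pstdev(latencies_ms)
    # Number of users whose latency violates the application's threshold.
    violations = sum(1 for latency in latencies_ms if latency > threshold_ms)
    return -(spread + penalty * violations)
```

Under this sketch, a session in which every user sees the same latency below the threshold receives the maximum reward of zero, while unequal or over-threshold latencies push the reward negative.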

While designing this solution, we also introduced action suppression, Quadruple Q-Learning, and normalization of the Q-values, leading to a more scalable and implementable RL system. We focus on algorithms for distributed applications and especially esports, but the principles we discuss apply to other domains and applications where fairness can be a crucial aspect to be optimized.
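As a rough illustration of action suppression only, the following sketch shows one common way to mask unavailable actions in tabular Q-learning: servers with no free capacity are excluded both when choosing an action and when computing the bootstrap target. This is an assumption about what action suppression could look like, not the thesis's implementation; the names allowed_servers, select_server, and q_update, the use of free-slot counts, and the hyperparameter values are all hypothetical. Quadruple Q-Learning and Q-value normalization are not sketched here.

```python
import random


def allowed_servers(free_slots):
    """Indices of servers that still have capacity (hypothetical capacity model)."""
    return [i for i, slots in enumerate(free_slots) if slots > 0]


def select_server(q_row, free_slots, epsilon=0.1, rng=random):
    """Epsilon-greedy choice restricted to servers that are not suppressed."""
    allowed = allowed_servers(free_slots)
    if not allowed:
        return None  # every server is full, so no action is available
    if rng.random() < epsilon:
        return rng.choice(allowed)  # explore among allowed servers only
    return max(allowed, key=lambda i: q_row[i])  # exploit among allowed servers


def q_update(q_row, action, reward, next_q_row, next_free_slots,
             alpha=0.1, gamma=0.9):
    """Standard Q-learning update; the max is taken over allowed next actions."""
    allowed = allowed_servers(next_free_slots)
    best_next = max((next_q_row[i] for i in allowed), default=0.0)
    q_row[action] += alpha * (reward + gamma * best_next - q_row[action])
```

Restricting both the selection and the update to allowed servers keeps the agent from learning values for assignments that the capacity limits would never permit.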

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/42915
Date: 15 November 2021
Creators: Alchalabi, Alaa Eddin
Contributors: Shirmohammadi, Shervin
Publisher: Université d'Ottawa / University of Ottawa
Source Sets: Université d’Ottawa
Language: English
Detected Language: English
Type: Thesis
Format: application/pdf
Rights: Attribution 4.0 International, http://creativecommons.org/licenses/by/4.0/
