Automating Network Operation Centers using Reinforcement Learning

Reinforcement learning (RL) has been at the core of recent advances toward the
AI promise of general intelligence. Unlike other machine learning (ML) paradigms,
such as supervised learning (SL), which learns to mimic how humans act, RL tries
to mimic how humans learn, and in many tasks it has discovered new strategies and
achieved super-human performance. This is possible mainly because RL algorithms
are allowed to interact with the world and collect the training data they need by
themselves. This is not possible in SL, where the ML model is limited to a dataset
collected by humans, which can be biased towards sub-optimal solutions.
The downside of RL is its high cost when trained on real systems. This cost stems
from the fact that the actions taken by an RL model during the initial phase of
training are essentially random. To overcome this issue, it is common to train RL
models in simulators before deploying them in production. However, designing a
realistic simulator that faithfully resembles the real environment is far from easy.
Furthermore, simulator-based approaches do not exploit the vast amount of field
data already at their disposal.
This work investigates new ways to bridge the gap between SL and RL through an
offline pre-training phase. The idea is to use field data to pre-train RL models in an
offline setting (similar to SL), and then allow them to safely explore and improve
their performance beyond human level. The proposed training pipeline includes: (i)
a process to convert static datasets into an RL environment, (ii) an MDP-aware
data-augmentation process for the offline dataset, and (iii) a pre-training step that
improves the RL exploration phase. We show how to apply this approach to design
an action recommendation engine (ARE) that automates network operation centers
(NOCs), a task still handled by teams of network professionals using hand-crafted
rules. Our RL algorithm learns to maximize the Quality of Experience (QoE) of
NOC users while minimizing operational expenditure (OPEX) relative to traditional
algorithms. Furthermore, our algorithm is scalable and can be used to control
large-scale networks of arbitrary size.
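
As a rough illustration of the three-stage pipeline above, here is a minimal,
self-contained Python sketch on a toy tabular MDP. The names (OfflineEnv,
augment, pretrain_q), the Dyna-style rollout augmentation, and the tabular
Q-learning are illustrative assumptions on our part, not the thesis's actual
design.

import random
from collections import defaultdict

class OfflineEnv:
    """(i) Empirical MDP built from logged (state, action, reward, next_state) tuples."""
    def __init__(self, dataset):
        self.model = defaultdict(list)                 # (s, a) -> [(r, s')]
        self.states = sorted({s for s, _, _, _ in dataset})
        for s, a, r, s2 in dataset:
            self.model[(s, a)].append((r, s2))
        self.state = None

    def reset(self):
        self.state = random.choice(self.states)
        return self.state

    def step(self, action):
        outcomes = self.model.get((self.state, action))
        if not outcomes:                               # action never logged here:
            return self.state, 0.0, True               # end the episode (a design choice)
        r, s2 = random.choice(outcomes)
        self.state = s2
        return s2, r, False

def augment(dataset, env, n_rollouts=100, horizon=5):
    """(ii) MDP-aware augmentation: synthesize extra transitions by rolling out
    the empirical model, so new samples respect the logged dynamics."""
    synthetic = []
    for _ in range(n_rollouts):
        s = env.reset()
        for _ in range(horizon):
            logged = [a for (ss, a) in env.model if ss == s]
            if not logged:
                break
            a = random.choice(logged)
            s2, r, done = env.step(a)
            synthetic.append((s, a, r, s2))
            if done:
                break
            s = s2
    return dataset + synthetic

def pretrain_q(dataset, gamma=0.9, alpha=0.1, epochs=50):
    """(iii) Offline pre-training: Q-learning sweeps over the logged data, yielding
    a warm-started policy (instead of a random one) for online exploration."""
    q = defaultdict(float)
    actions = sorted({a for _, a, _, _ in dataset})
    for _ in range(epochs):
        for s, a, r, s2 in dataset:
            target = r + gamma * max(q[(s2, b)] for b in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q, actions

# Usage on toy logged data (states are ints, actions are strings):
dataset = [(0, "up", 0.0, 1), (1, "up", 1.0, 2), (1, "down", 0.0, 0),
           (2, "down", 0.5, 1), (0, "down", 0.0, 0)]
env = OfflineEnv(dataset)
q, actions = pretrain_q(augment(dataset, env))
policy = lambda s: max(actions, key=lambda a: q[(s, a)])  # warm-started greedy policy
print([policy(s) for s in env.states])                    # e.g. ['up', 'up', 'down']

The sketch ends where the thesis's contribution begins: the warm-started policy
would then explore safely online, rather than starting from random actions.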

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/44971
Date: 18 May 2023
Creators: Altamimi, Sadi
Contributors: Shirmohammadi, Shervin
Publisher: Université d'Ottawa / University of Ottawa
Source Sets: Université d'Ottawa
Language: English
Detected Language: English
Type: Thesis
Format: application/pdf
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International, http://creativecommons.org/licenses/by-nc-nd/4.0/
