
Autonomic computer network defence using risk states and reinforcement learning

Autonomic Computer Network Defence aims to give IT networks a self-protection capability that limits the risk caused by malicious and accidental events. To achieve this, networks require an automated controller with a policy that selects the most appropriate action in any undesired network state. Due to the complexity and constant evolution of the Computer Network Defence environment, a priori design of an automated controller is not effective. A solution for generating and continuously improving decision policies is needed.
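As a sketch of the controller/policy separation described above (not taken from the thesis), a policy can be modelled as a function from the current undesired network state and the set of valid defence actions to the next action to apply; the names below are hypothetical placeholders.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class NetworkState:
    # Which assets are currently affected by malicious or accidental events.
    affected_assets: Tuple[str, ...]

@dataclass(frozen=True)
class DefenceAction:
    # One remediation step the automated controller may apply.
    name: str
    target_asset: str

# A policy maps an undesired state and the valid actions to the action to take.
Policy = Callable[[NetworkState, List[DefenceAction]], DefenceAction]

def controller_step(state: NetworkState,
                    valid_actions: List[DefenceAction],
                    policy: Policy) -> DefenceAction:
    # One decision cycle of the automated controller: defer to the policy.
    return policy(state, valid_actions)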
A significant technical challenge in Autonomic Computer Network Defence is finding a strategy to efficiently generate, trial, and compare different policies and retain the best-performing one. To address this challenge, we use Reinforcement Learning to explore Computer Network Defence action and state spaces and to learn which policy optimally reduces risk. A simulated Computer Network Defence environment is implemented using Discrete Event Dynamic System simulation and a novel graph model. A network asset value assessment technique and a dynamic risk assessment algorithm are also implemented to provide evaluation metrics. This environment is used to train two Reinforcement Learning agents, one with a table policy and the other with a neural network policy. The resulting policies are then compared on risk performance against three empirical policies that serve as an evaluation baseline: letting risk grow without taking any action; randomly selecting among valid actions; and addressing affected assets in decreasing order of their pre-computed asset values, highest value first.
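To illustrate how a table policy might be learned in such an environment, the following minimal Q-learning sketch assumes a simulator exposing reset(), valid_actions(state), and step(action), with step() returning the next state, the risk reduction achieved (used here as the reward), and a termination flag; these interfaces and the hyperparameters are assumptions for illustration, not the thesis implementation. The asset-value baseline policy is sketched alongside.

import random
from collections import defaultdict

def train_table_policy(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    # Tabular Q-learning: learn a value for each (state, action) pair,
    # rewarding the amount of risk reduced at each step.
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            actions = env.valid_actions(state)
            if random.random() < epsilon:
                action = random.choice(actions)                      # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])   # exploit
            next_state, risk_reduction, done = env.step(action)
            best_next = max((q[(next_state, a)]
                             for a in env.valid_actions(next_state)), default=0.0)
            # Standard one-step Q-learning backup.
            q[(state, action)] += alpha * (risk_reduction
                                           + gamma * best_next
                                           - q[(state, action)])
            state = next_state
    return q

def asset_value_policy(state, valid_actions, asset_value):
    # Baseline: act on the affected asset with the highest pre-computed value.
    return max(valid_actions, key=lambda a: asset_value[a.target_asset])

The greedy table policy derived from q selects, in each state, the valid action with the largest learned value.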
We found that in all test scenarios, both Reinforcement Learning policies reduced overall risk compared to random selection of valid actions. In one simple scenario, both Reinforcement Learning policies converged to the same optimal policy, with better risk performance than the other assessed policies. More generally, for the tested scenarios and training strategies, we found that a simple policy that addresses affected assets in decreasing order of asset value can yield superior results.

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/28367
Date: January 2009
Creators: Beaudoin, Luc
Publisher: University of Ottawa (Canada)
Source Sets: Université d’Ottawa
Language: English
Detected Language: English
Type: Thesis
Format: 91 p.
