
Deep Recurrent Q Networks for Dynamic Spectrum Access in Dynamic Heterogeneous Environments with Partial Observations

Dynamic Spectrum Access (DSA) has strong potential to address the need for improved spectrum efficiency. Unfortunately, traditional DSA approaches such as simple "sense-and-avoid" fail to provide sufficient performance in many scenarios. The combination of sensing with deep reinforcement learning (DRL) has therefore been shown to be a promising alternative to these simplistic approaches. Compared with traditional reinforcement learning methods, DRL requires neither explicit estimation of transition probability matrices nor prohibitively large matrix computations. Further, since many learning approaches cannot solve the resulting online Partially Observable Markov Decision Process (POMDP), Deep Recurrent Q-Networks (DRQN) have been proposed to determine the optimal channel access policy via online learning. The fundamental goal of this dissertation is to develop DRL-based solutions to this POMDP-DSA problem. We consider three main aspects: (1) optimal transmission strategies, (2) combined intelligent sensing and transmission strategies, and (3) learning efficiency, i.e., online convergence speed. Four key challenges arise in this problem: (1) the proposed DRQN-based node does not know the other nodes' behavior patterns a priori and must predict the future channel state from previous observations; (2) the impact on primary-user throughput must be limited both during and after learning; (3) resources can be wasted on sensing/observation; and (4) convergence speed must be improved without degrading performance.
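The partial-observation problem behind challenge (1) can be made concrete with a small simulation. The sketch below assumes each primary-user channel is a two-state (idle/busy) Markov chain, lets the node access and observe only one channel per slot, and runs a model-based greedy (myopic) baseline that tracks the belief w that each channel is idle via w' = w * p11 + (1 - w) * p01. The channel model, the +1/-1 reward, and every name in the code (`MarkovChannel`, `belief_update`, `PartiallyObservedDSA`, `run_myopic`) are hypothetical illustrations, not the dissertation's simulation setup. Note that the baseline needs the true transition probabilities p01 and p11, which is the explicit model knowledge that a DRQN avoids.

```python
import random
from dataclasses import dataclass


@dataclass
class MarkovChannel:
    """Hypothetical primary-user channel: a two-state Markov chain (1 = idle, 0 = busy)."""
    p01: float      # P(idle in next slot | busy now)
    p11: float      # P(idle in next slot | idle now)
    state: int = 1  # current state, assumed to start idle

    def step(self, rng):
        p_idle = self.p11 if self.state == 1 else self.p01
        self.state = 1 if rng.random() < p_idle else 0


def belief_update(omega, p01, p11):
    """Predict P(idle next slot) from the current P(idle); requires known p01, p11."""
    return omega * p11 + (1.0 - omega) * p01


class PartiallyObservedDSA:
    """Each slot the agent accesses ONE channel and observes only that channel's state."""

    def __init__(self, channels, seed=0):
        self.channels = channels
        self.rng = random.Random(seed)

    def step(self, action):
        for ch in self.channels:  # primary users evolve independently
            ch.step(self.rng)
        observed = self.channels[action].state
        reward = 1.0 if observed == 1 else -1.0  # assumed: +1 success, -1 collision
        return observed, reward


def run_myopic(env, slots):
    """Greedy baseline: access the channel with the highest belief of being idle.

    It reads the true p01/p11 from the environment, which a model-free DRQN does not need.
    """
    beliefs = [0.5] * len(env.channels)  # uninformed prior P(idle) for every channel
    total = 0.0
    for _ in range(slots):
        action = max(range(len(beliefs)), key=lambda i: beliefs[i])
        observed, reward = env.step(action)
        total += reward
        for i, ch in enumerate(env.channels):
            # Accessed channel: restart from the observed state; others: propagate belief.
            current = float(observed) if i == action else beliefs[i]
            beliefs[i] = belief_update(current, ch.p01, ch.p11)
    return total
```

For example, with one channel that is always busy (`MarkovChannel(0.0, 0.0)`) and one that is always idle (`MarkovChannel(1.0, 1.0)`), the baseline collides once on the busy channel and then stays on the idle one, so `run_myopic(env, 10)` returns `8.0`.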
We demonstrate in this dissertation that the proposed DRQN can learn (1) the optimal transmission strategy in a variety of environments under partial observations and (2) a sensing strategy that provides near-optimal throughput in different environments while dramatically reducing the needed sensing resources; that it is (3) robust to imperfect observations and (4) flexible enough to accommodate dynamic environments, multi-channel transmission, and the presence of multiple agents; and that (5) its learning can be accelerated using one of three different approaches.

Doctor of Philosophy

With the development of wireless communication technologies such as 5G, global mobile data traffic has grown tremendously, which makes spectrum resources even more critical for future networks. However, spectrum is an expensive and scarce resource. Dynamic Spectrum Access (DSA) has strong potential to address the need for improved spectrum efficiency. Unfortunately, traditional DSA approaches such as simple "sense-and-avoid" fail to provide sufficient performance in many scenarios. The combination of sensing with deep reinforcement learning (DRL) has therefore been shown to be a promising alternative to these simplistic approaches. Compared with traditional reinforcement learning methods, DRL does not require explicit estimation of transition probability matrices or extensive matrix computations. Furthermore, since many learning methods cannot solve the resulting online partially observable Markov decision process (POMDP), a deep recurrent Q-network (DRQN) is proposed to determine the optimal channel access policy through online learning. The basic goal of this dissertation is to develop a DRL-based solution to this POMDP-DSA problem. It focuses on improving performance in three directions: 1. finding the optimal (or a sub-optimal) channel access strategy under a fixed partial-observation pattern; 2. building on direction 1, finding a more intelligent way to dynamically and efficiently learn a better (higher-efficiency) sensing/observation policy together with the corresponding channel access strategy; and 3. while maintaining performance, using different machine learning algorithms or network structures to improve learning efficiency so that users do not wait too long to reach the expected performance. Through research in these three directions, we have found an efficient and versatile solution: DRQN-based techniques.
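The core DRQN idea can be sketched as a Q-network whose hidden state is carried from slot to slot, so that its Q-value for each channel depends on the whole observation history rather than only the latest partial observation. The sketch below is a pure-Python forward pass with a plain Elman (tanh) recurrent cell, random untrained weights, an assumed observation encoding (+1 sensed idle, -1 sensed busy, 0 not observed), and epsilon-greedy channel selection; `TinyRecurrentQNet`, `encode`, and `epsilon_greedy` are hypothetical names. Practical DRQNs typically use an LSTM or GRU cell and train the weights from temporal-difference targets, which this sketch omits.

```python
import math
import random


class TinyRecurrentQNet:
    """Toy recurrent Q-network: Elman (tanh) cell plus a linear Q-value head.

    Hypothetical sizes and random, untrained weights; forward pass only.
    """

    def __init__(self, n_channels, hidden=8, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small uniform random weights (an assumed initialization).
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_x = init(hidden, n_channels)  # observation -> hidden
        self.w_h = init(hidden, hidden)      # hidden -> hidden (the recurrence)
        self.w_q = init(n_channels, hidden)  # hidden -> one Q-value per channel
        self.h = [0.0] * hidden              # summary of the observation history

    def reset(self):
        """Clear the history, e.g. at the start of a new episode."""
        self.h = [0.0] * len(self.h)

    def q_values(self, obs):
        """Update the hidden state with this slot's observation and return Q-values."""
        prev = self.h
        self.h = [
            math.tanh(
                sum(w * x for w, x in zip(row_x, obs))
                + sum(w * hv for w, hv in zip(row_h, prev))
            )
            for row_x, row_h in zip(self.w_x, self.w_h)
        ]
        return [sum(w * hv for w, hv in zip(row, self.h)) for row in self.w_q]


def encode(n_channels, action, observed_idle):
    """Assumed encoding: +1 sensed idle, -1 sensed busy, 0 for unobserved channels."""
    obs = [0.0] * n_channels
    obs[action] = 1.0 if observed_idle else -1.0
    return obs


def epsilon_greedy(q, epsilon, rng):
    """Explore a random channel with probability epsilon, else pick the best Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda i: q[i])
```

Because the hidden state `h` persists between calls, feeding the same current observation after different histories yields different Q-values. This is how a recurrent network can implicitly summarize past observations instead of tracking beliefs with known transition probabilities.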

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/111994
Date: 23 September 2022
Creators: Xu, Yue
Contributors: Electrical Engineering, Buehrer, Richard M., Headley, William C., Dhillon, Harpreet Singh, Liu, Lingjia, Wang, Yue J.
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Language: English
Detected Language: English
Type: Dissertation
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
