431. Zero-Knowledge Agent Trained for the Game of Risk. Bethdavid, Simon. January 2020.
Recent developments in deep reinforcement learning applied to abstract strategy games such as Go, chess and Hex have sparked interest within military planning. This Master's thesis explores whether an algorithm similar to Expert Iteration and AlphaZero can be applied to wargames. The studied wargame is Risk, a turn-based multiplayer game played on a simplified political map of the world. These algorithms combine an expert, in the form of a Monte Carlo tree search, with an apprentice, implemented as a neural network. The apprentice is trained by imitation learning to mimic the expert decisions generated from self-play, and is then used as a heuristic in subsequent tree searches. The results demonstrate that a Monte Carlo tree search algorithm can, to some degree, be employed on a strategy game such as Risk, dominating a random-playing agent. The neural network, fed with a state representation in the form of a vector, had difficulty learning the expert decisions and could not beat a random-playing agent, which halted the expert/apprentice learning process. Possible solutions are suggested as future work.
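As a rough illustration of the expert/apprentice loop this abstract describes, a minimal Python sketch follows. It is not the thesis's implementation: the game interface, the mcts_search routine, and the apprentice's fit/predict methods are all assumed placeholders.

# Minimal sketch of an Expert Iteration style training loop, assuming a generic
# turn-based game interface; the Risk-specific state encoding, MCTS internals and
# the network architecture are placeholders, not the thesis's code.
import numpy as np

def expert_iteration(game, mcts_search, apprentice, n_iterations=100, n_games=50):
    for _ in range(n_iterations):
        states, target_policies = [], []
        for _ in range(n_games):                          # self-play with the expert
            state = game.initial_state()
            while not game.is_terminal(state):
                # MCTS expert, guided by the current apprentice as a heuristic
                visit_counts = mcts_search(state, prior_fn=apprentice.predict)
                policy = visit_counts / visit_counts.sum()
                states.append(game.to_vector(state))      # vector state representation
                target_policies.append(policy)
                action = np.random.choice(len(policy), p=policy)
                state = game.next_state(state, action)
        # imitation learning: the apprentice mimics the expert's search policy
        apprentice.fit(np.array(states), np.array(target_policies))
    return apprentice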
432. Device to Device Communications for Smart Grid. Shimotakahara, Kevin. 17 June 2020.
This thesis identifies and addresses two barriers to the adoption of Long Term Evolution (LTE) Device-to-Device (D2D) communication-enabled smart grid applications in regions outside core network coverage. The first barrier is the lack of accessible simulation software with which engineers can develop and test the feasibility of their LTE D2D-enabled smart grid application designs. The second barrier is the lack of a distributed resource allocation algorithm for LTE D2D communications tailored to the needs of smart grid applications.
The first barrier is addressed with a simulator, built in Matlab/Simulink, that models both the power system and the underlying communication system, i.e., the LTE D2D communication protocol stack. The simulator combines Matlab's LTE System Toolbox, SimEvents, and Simscape Power Systems with in-house interface software that facilitates D2D communications in smart grid applications. To test the simulator, a simple fault location, isolation, and restoration (FLISR) application was implemented, showing that the LTE message timing is consistent with the relay signaling in the power system.
The second barrier is addressed with a multi-agent Q-learning based resource allocation algorithm that allows LTE D2D communication agents to generate orthogonal transmission schedules outside of network coverage. The algorithm reduces packet drop rates (PDR) in distributed D2D communication networks to meet the quality-of-service requirements of microgrid communications. The PDR and latency performance of the proposed algorithm were compared to the random self-allocation mechanism introduced in the Third Generation Partnership Project's LTE Release 12. The proposed algorithm outperformed the LTE mechanism in all tested scenarios, demonstrating 20-40% absolute reductions in PDR and 10-20 ms reductions in latency across all microgrid applications.
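A minimal sketch of the general idea of multi-agent Q-learning for distributed resource selection follows. The single-state formulation, the collision-based reward and all names are illustrative assumptions, not the thesis's exact design.

# Illustrative tabular Q-learning agent that picks one of N resource blocks per
# scheduling period; a collision-free choice is rewarded, while a collision with
# another agent (which would raise the packet drop rate) is penalised.
import random

class ResourceAllocationAgent:
    def __init__(self, n_resources, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = [0.0] * n_resources          # one Q-value per resource block
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select_resource(self):
        if random.random() < self.epsilon:    # explore occasionally
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda r: self.q[r])   # otherwise exploit

    def update(self, resource, collided):
        reward = -1.0 if collided else 1.0    # assumed reward shaping
        best_next = max(self.q)               # single-state bootstrap target
        self.q[resource] += self.alpha * (reward + self.gamma * best_next - self.q[resource])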
433. Is the Click the Trick? The Efficacy of Clickers and Other Reinforcement Methods in Training Naïve Dogs to Perform New Tasks. January 2020.
A handheld metal noisemaker known as a “clicker” is widely used to train new behaviors in dogs; however, evidence for the superior efficacy of clickers, as opposed to providing solely primary reinforcement or other secondary reinforcers, in the acquisition of novel behavior in dogs is almost entirely anecdotal. Three experiments were conducted to determine under what circumstances a clicker may result in acquisition of a novel behavior more rapidly or to a higher level compared to other readily available reinforcement methods. In Experiment 1, three groups of 30 dogs each were trained to emit a novel sit-and-stay behavior of increasing duration with either the delivery of food alone, a verbal stimulus paired with food, or a clicker with food. The group that received only a primary reinforcer reached a significantly higher criterion of training success than the group trained with a verbal secondary reinforcer. Performance of the group experiencing a clicker secondary reinforcer was intermediate between the other two groups, but not significantly different from either. In Experiment 2, three different groups of 25 dogs each were shaped to emit a nose-targeting behavior and then perform that behavior at increasing distances from the experimenter, using the same three methods of positive reinforcement as in Experiment 1. No statistically significant differences between the groups were found. In Experiment 3, three groups of 30 dogs each were shaped to emit a nose-targeting behavior upon an array of wooden blocks, with task difficulty increasing throughout testing, using the same three methods of positive reinforcement as previously. No statistically significant differences between the groups were found. Overall, the findings suggest that both clickers and other forms of positive reinforcement can be used successfully in training a dog to perform a novel behavior, but that no positive reinforcement method has significantly greater efficacy than any other. / Dissertation/Thesis / Masters Thesis Psychology 2020
434. Better cooperation through communication in multi-agent reinforcement learning. Kiseliou, Ivan. January 2020.
Cooperative needs play a critical role in the organisation of natural systems of communication. A number of recent studies in multi-agent reinforcement learning have established that artificial intelligence agents are similarly able to develop functional communication when required to complete a cooperative task. This thesis studies the emergence of communication in reinforcement learning agents, using a custom card game environment as a test-bed. Two contrasting approaches, encompassing continuous and discrete modes of communication, were appraised experimentally. Based on the average game completion rate, the agents provisioned with a continuous communication channel consistently exceeded the no-communication baseline. A qualitative analysis of the agents’ behavioural strategies reveals a clearly defined communication protocol as well as the deployment of playing tactics unseen in the baseline agents. On the other hand, the agents equipped with the discrete channel failed to learn to utilise it effectively, ultimately showing no improvement over the baseline.
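A minimal sketch of an agent policy with a continuous communication channel, in the spirit of the approach described above: besides its game action, the agent emits a small real-valued message that is appended to its partner's observation on the next turn. The architecture, message size and names are assumptions, not the thesis's model.

# Policy network that outputs a discrete action together with a bounded,
# real-valued message for the partner agent; sizes are illustrative.
import torch
import torch.nn as nn

class CommAgent(nn.Module):
    def __init__(self, obs_dim, n_actions, msg_dim=4, hidden=64):
        super().__init__()
        # the agent also receives its partner's previous message as input
        self.body = nn.Sequential(nn.Linear(obs_dim + msg_dim, hidden), nn.ReLU())
        self.action_head = nn.Linear(hidden, n_actions)    # discrete card play
        self.message_head = nn.Linear(hidden, msg_dim)     # continuous message

    def forward(self, obs, partner_msg):
        h = self.body(torch.cat([obs, partner_msg], dim=-1))
        action_logits = self.action_head(h)
        message = torch.tanh(self.message_head(h))         # bounded real-valued signal
        return action_logits, message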
435. On Hierarchical Goal Based Reinforcement Learning. Denis, Nicholas. 27 August 2019.
Discrete-time sequential decision processes require that an agent select an action at each time step. As humans, we plan over long time horizons and use temporal abstraction by selecting temporally extended actions such as “make lunch” or “get a master's degree”, each of which is composed of more granular actions. This thesis concerns itself with such hierarchical temporal abstractions in the form of macro actions and options, as they apply to goal-based Markov Decision Processes (MDPs). A novel algorithm for discovering hierarchical macro actions in goal-based MDPs is introduced, along with a novel algorithm that uses landmark options for transfer learning in multi-task goal-based reinforcement learning settings. Theoretical properties regarding the life-long regret of an agent executing the latter algorithm are also discussed.
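For reference, a minimal sketch of the options framework the abstract builds on: an option bundles an initiation set, an intra-option policy and a termination condition, and the agent runs it as a single temporally extended action. The environment interface and names are illustrative assumptions; this is not the thesis's discovery or transfer algorithm.

# Minimal option abstraction and a helper that executes one option to completion.
class Option:
    def __init__(self, initiation_set, policy, termination):
        self.initiation_set = initiation_set   # states where the option may start
        self.policy = policy                   # maps state -> primitive action
        self.termination = termination         # maps state -> probability of stopping

def execute_option(env, state, option, rng):
    """Run one option as a temporally extended action; returns next state and summed reward."""
    total_reward, done = 0.0, False
    while not done:
        action = option.policy(state)
        state, reward, done = env.step(action)   # assumed (state, reward, done) interface
        total_reward += reward
        if rng.random() < option.termination(state):
            break
    return state, total_reward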
436. Sustaining Chaos Using Deep Reinforcement Learning. Unknown date.
Numerous examples arise in fields ranging from mechanics to biology where the disappearance of chaos can be detrimental. Preventing this transient nature of chaos has proven to be quite challenging. This work shows the utility of reinforcement learning (RL), a specific class of machine learning techniques, in discovering effective control mechanisms for this problem. The autonomous control algorithm is able to prevent the disappearance of chaos in the Lorenz system exhibiting meta-stable chaos, without requiring any a priori knowledge about the underlying dynamics. The autonomous decisions taken by the RL algorithm are analyzed to understand how the system’s dynamics are impacted. Learning from this analysis, a simple control law capable of restoring chaotic behavior is formulated. The reverse-engineering approach adopted in this work underlines the immense potential of the techniques used here to discover effective control strategies in complex dynamical systems. The autonomous nature of the learning algorithm makes it applicable to a diverse variety of non-linear systems, and highlights the potential of RL-enabled control for regulating other transient-chaos-like catastrophic events. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2020. / FAU Electronic Theses and Dissertations Collection
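A minimal sketch of the control setting described above, assuming forward-Euler integration of the Lorenz equations with a small additive actuation chosen by the agent at each step. The parameter values, the actuation point, and the reward idea in the comments are illustrative assumptions, not the thesis's setup.

# Lorenz dynamics with a control input; rho is chosen in the regime (roughly
# 13.9 < rho < 24.1) where chaos in the Lorenz system is transient/meta-stable.
import numpy as np

def lorenz_step(state, u, dt=0.01, sigma=10.0, rho=20.0, beta=8.0/3.0):
    """One Euler step of the Lorenz system with control input u added to dx/dt."""
    x, y, z = state
    dx = sigma * (y - x) + u       # the agent's small perturbation enters here
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

def rollout(policy, state0=(1.0, 1.0, 1.0), steps=5000):
    """Roll out a control policy; a reward could penalise collapse onto a fixed point."""
    state = np.array(state0, dtype=float)
    for _ in range(steps):
        u = policy(state)          # e.g. a small bounded action from the RL agent
        state = lorenz_step(state, u)
    return state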
437. A Deep Reinforcement Learning Approach for Robotic Bicycle Stabilization. January 2020.
Bicycle stabilization has become a popular research topic because of the bicycle's complex dynamic behavior and the large body of bicycle modeling research. Riding a bicycle requires accurately performing several tasks, such as balancing and navigation, which may be difficult for disabled people; these difficulties could be partially reduced by providing steering assistance. For stabilization of these highly maneuverable and efficient machines, many control techniques have been applied, achieving interesting results but with limitations that include strict environmental requirements. This thesis expands on the work of Randlov and Alstrom, using reinforcement learning for bicycle self-stabilization with robotic steering. It applies the deep deterministic policy gradient algorithm, which can handle continuous action spaces, something that is not possible with the Q-learning technique. The algorithm was trained in virtual environments, followed by simulations to assess the results. Furthermore, hardware testing was conducted on the smart bicycle platform of Arizona State University's RISE lab to evaluate its self-balancing performance. A detailed analysis of the bicycle trial runs is presented, and the testing was validated by plotting the real-time states and actions collected during outdoor testing, including the roll angle of the bicycle. Further improvements regarding model training and hardware testing are also presented. / Dissertation/Thesis / Masters Thesis Mechanical Engineering 2020
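A minimal sketch of the actor side of a DDPG-style continuous steering controller, as one way the approach described above could be structured. The observation size, steering limit, network sizes, and names are assumptions, not the thesis's implementation.

# Deterministic actor that maps the bicycle's state (e.g. roll angle, roll rate,
# steering angle, speed) to a bounded continuous steering command, plus the
# Polyak target-network update used in DDPG.
import torch
import torch.nn as nn

class SteeringActor(nn.Module):
    def __init__(self, obs_dim=4, max_steer_rad=0.6, hidden=128):
        super().__init__()
        self.max_steer = max_steer_rad
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Tanh(),     # continuous action in [-1, 1]
        )

    def forward(self, obs):
        return self.max_steer * self.net(obs)    # scaled steering command

def soft_update(target, source, tau=0.005):
    """Polyak averaging of target-network weights, as used in DDPG."""
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * sp.data)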
438. Regret analysis of constrained irreducible MDPs with reset action. Watanabe, Takashi. 23 March 2020.
Kyoto University / Doctoral dissertation / Doctor of Human and Environmental Studies (No. 22535) / Graduate School of Human and Environmental Studies / Examining committee: Assoc. Prof. 櫻川 貴司, Prof. 立木 秀樹, Prof. 日置 尋久 / DGAM
439. A Study on Resolution and Retrieval of Implicit Entity References in Microblogs. Lu, Jun-Li. 23 March 2020.
Kyoto University / Doctoral dissertation / Doctor of Informatics (No. 22580) / Department of Social Informatics, Graduate School of Informatics / Examining committee: Prof. 吉川 正俊, Prof. 黒橋 禎夫, Prof. 田島 敬史, Prof. 田中 克己 (Professor Emeritus, Kyoto University) / DFAM
440. Deep Reinforcement Learning for Distributed Fog Network Probing. Guan, Xiaoding. 01 September 2020.
The sixth generation (6G) of wireless communication systems will rely significantly on fog/edge network architectures for service provisioning. To satisfy stringent quality-of-service requirements using dynamically available resources at the edge, new network access schemes are needed. In this thesis, we consider a cognitive dynamic edge/fog network where primary users (PUs) may temporarily share their resources and act as fog nodes for secondary users (SUs). We develop strategies for distributed dynamic fog probing so that SUs can discover available connections to the fog nodes. To handle the large state space of connectivity availability, which covers the availability of channels, computing resources, and fog nodes, as well as the partial observability of the states, we design a novel distributed Deep Q-learning Fog Probing (DQFP) algorithm. Our goal is to develop multi-user strategies for accessing fog nodes in a distributed manner without any centralized scheduling or message passing. Using cooperative and competitive utility functions, we analyze the impact of the multi-user dynamics on connectivity availability and establish design principles for the DQFP algorithm.
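A minimal sketch of a per-SU deep Q-network that scores probing actions over (fog node, channel) pairs, in the spirit of the DQFP idea described above. The observation encoding, action decoding, and all names are illustrative assumptions, not the thesis's exact design.

# Q-network over a local observation of recent availability; each action is one
# (fog node, channel) pair to probe next.
import torch
import torch.nn as nn

class ProbingQNetwork(nn.Module):
    def __init__(self, obs_dim, n_fog_nodes, n_channels, hidden=128):
        super().__init__()
        self.n_actions = n_fog_nodes * n_channels    # one action per (node, channel) pair
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, self.n_actions),
        )

    def forward(self, obs):
        return self.net(obs)                         # Q-value for each probing action

def greedy_probe(q_net, obs, n_channels):
    """Decode the greedy action into a (fog node, channel) pair to probe."""
    a = int(q_net(obs).argmax().item())
    return divmod(a, n_channels)                     # (node index, channel index)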