Spelling suggestions: "subject:"reinforcement learning"" "subject:"einforcement learning""
431 |
SUSTAINING CHAOS USING DEEP REINFORCEMENT LEARNINGUnknown Date (has links)
Numerous examples arise in fields ranging from mechanics to biology where disappearance of Chaos can be detrimental. Preventing such transient nature of chaos has been proven to be quite challenging. The utility of Reinforcement Learning (RL), which is a specific class of machine learning techniques, in discovering effective control mechanisms in this regard is shown. The autonomous control algorithm is able to prevent the disappearance of chaos in the Lorenz system exhibiting meta-stable chaos, without requiring any a-priori knowledge about the underlying dynamics. The autonomous decisions taken by the RL algorithm are analyzed to understand how the system’s dynamics are impacted. Learning from this analysis, a simple control-law capable of restoring chaotic behavior is formulated. The reverse-engineering approach adopted in this work underlines the immense potential of the techniques used here to discover effective control strategies in complex dynamical systems. The autonomous nature of the learning algorithm makes it applicable to a diverse variety of non-linear systems, and highlights the potential of RLenabled control for regulating other transient-chaos like catastrophic events. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2020. / FAU Electronic Theses and Dissertations Collection
|
432 |
A Deep Reinforcement Learning Approach for Robotic Bicycle StabilizationJanuary 2020 (has links)
abstract: Bicycle stabilization has become a popular topic because of its complex dynamic behavior and the large body of bicycle modeling research. Riding a bicycle requires accurately performing several tasks, such as balancing and navigation which may be difficult for disabled people. Their problems could be partially reduced by providing steering assistance. For stabilization of these highly maneuverable and efficient machines, many control techniques have been applied – achieving interesting results, but with some limitations which includes strict environmental requirements. This thesis expands on the work of Randlov and Alstrom, using reinforcement learning for bicycle self-stabilization with robotic steering. This thesis applies the deep deterministic policy gradient algorithm, which can handle continuous action spaces which is not possible for Q-learning technique. The research involved algorithm training on virtual environments followed by simulations to assess its results. Furthermore, hardware testing was also conducted on Arizona State University’s RISE lab Smart bicycle platform for testing its self-balancing performance. Detailed analysis of the bicycle trial runs are presented. Validation of testing was done by plotting the real-time states and actions collected during the outdoor testing which included the roll angle of bicycle. Further improvements in regard to model training and hardware testing are also presented. / Dissertation/Thesis / Masters Thesis Mechanical Engineering 2020
|
433 |
Regret analysis of constrained irreducible MDPs with reset action / リセット行動が存在する制約付き既約MDPに対するリグレット解析Watanabe, Takashi 23 March 2020 (has links)
京都大学 / 0048 / 新制・課程博士 / 博士(人間・環境学) / 甲第22535号 / 人博第938号 / 新制||人||223(附属図書館) / 2019||人博||938(吉田南総合図書館) / 京都大学大学院人間・環境学研究科共生人間学専攻 / (主査)准教授 櫻川 貴司, 教授 立木 秀樹, 教授 日置 尋久 / 学位規則第4条第1項該当 / Doctor of Human and Environmental Studies / Kyoto University / DGAM
|
434 |
A Study on Resolution and Retrieval of Implicit Entity References in Microblogs / マイクロブログにおける暗黙的な実体参照の解決および検索に関する研究Lu, Jun-Li 23 March 2020 (has links)
京都大学 / 0048 / 新制・課程博士 / 博士(情報学) / 甲第22580号 / 情博第717号 / 新制||情||123(附属図書館) / 京都大学大学院情報学研究科社会情報学専攻 / (主査)教授 吉川 正俊, 教授 黒橋 禎夫, 教授 田島 敬史, 教授 田中 克己(京都大学 名誉教授) / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM
|
435 |
Deep Reinforcement Learning For Distributed Fog Network ProbingGuan, Xiaoding 01 September 2020 (has links)
The sixth-generation (6G) of wireless communication systems will significantly rely on fog/edge network architectures for service provisioning. To satisfy stringent quality of service requirements using dynamically available resources at the edge, new network access schemes are needed. In this paper, we consider a cognitive dynamic edge/fog network where primary users (PUs) may temporarily share their resources and act as fog nodes for secondary users (SUs). We develop strategies for distributed dynamic fog probing so SUs can find out available connections to access the fog nodes. To handle the large-state space of the connectivity availability that includes availability of channels, computing resources, and fog nodes, and the partial observability of the states, we design a novel distributed Deep Q-learning Fog Probing (DQFP) algorithm. Our goal is to develop multi-user strategies for accessing fog nodes in a distributed manner without any centralized scheduling or message passing. By using cooperative and competitive utility functions, we analyze the impact of the multi-user dynamics on the connectivity availability and establish design principles for our DQFP algorithm.
|
436 |
Reinforcement Learning: New Algorithms and An Application for Integer ProgrammingTang, Yunhao January 2021 (has links)
Reinforcement learning (RL) is a generic paradigm for the modeling and optimization of sequential decision making. In the recent decade, progress in RL research has brought about breakthroughs in several applications, ranging from playing video games, mastering board games, to controlling simulated robots. To bring the potential benefits of RL to other domains, two elements are critical: (1) Efficient and general-purpose RL algorithms; (2) Formulations of the original applications into RL problems. These two points are the focus of this thesis.
We start by developing more efficient RL algorithms. In Chapter 2, we propose Taylor Expansion Policy Optimization, a model-free algorithmic framework that unifies a number of important prior work as special cases. This unifying framework also allows us to develop a natural algorithmic extension to prior work, with empirical performance gains. In Chapter 3, we propose Monte-Carlo Tree Search as Regularized Policy Optimization, a model-based framework that draws close connections between policy optimization and Monte-Carlo tree search. Building on this insight, we propose Policy Optimization Zero (POZero), a novel algorithm which leverages the strengths of regularized policy search to achieve significant performance gains over MuZero.
To showcase how RL can be applied to other domains where the original applications could benefit from learning systems, we study the acceleration of integer programming (IP) solvers with RL. Due to the ubiquity of IP solvers in industrial applications, such research holds the promise of significant real life impacts and practical values. In Chapter 4, we focus on a particular formulation of Reinforcement Learning for Integer Programming: Learning to Cut. By combining cutting plane methods with selection rules learned by RL, we observe that the RL-augmented cutting plane solver achieves significant performance gains over traditional heuristics. This serves as a proof-of-concept of how RL can be combined with general IP solvers, and how learning augmented optimization systems might achieve significant acceleration in general.
|
437 |
Multi-criteria decision making using reinforcement learning and its application to food, energy, and water systems (FEWS) problemAishwarya Vikram Deshpande (11819114) 20 December 2021 (has links)
<p>Multi-criteria decision making (MCDM) methods have evolved over the past several decades. In today’s world with rapidly growing industries, MCDM has proven to be significant in many application areas. In this study, a decision-making model is devised using reinforcement learning to carry out multi-criteria optimization problems. Learning automata algorithm is used to identify an optimal solution in the presence of single and multiple environments (criteria) using pareto optimality. The application of this model is also discussed, where the model provides an optimal solution to the food, energy, and water systems (FEWS) problem.</p>
|
438 |
Viewpoint Optimization for Autonomous Strawberry Harvesting with Deep Reinforcement LearningSather, Jonathon J 01 June 2019 (has links)
Autonomous harvesting may provide a viable solution to mounting labor pressures in the United States' strawberry industry. However, due to bottlenecks in machine perception and economic viability, a profitable and commercially adopted strawberry harvesting system remains elusive. In this research, we explore the feasibility of using deep reinforcement learning to overcome these bottlenecks and develop a practical algorithm to address the sub-objective of viewpoint optimization, or the development of a control policy to direct a camera to favorable vantage points for autonomous harvesting. We evaluate the algorithm's performance in a custom, open-source simulated environment and observe affirmative results. Our trained agent yields 8.7 times higher returns than random actions and 8.8 percent faster exploration than our best baseline policy, which uses visual servoing. Visual investigation shows the agent is able to fixate on favorable viewpoints, despite having no explicit means to propagate information through time. Overall, we conclude that deep reinforcement learning is a promising area of research to advance the state of the art in autonomous strawberry harvesting.
|
439 |
Reinforcement Learning Based Fair Edge-User Allocation for Delay-Sensitive Edge Computing ApplicationsAlchalabi, Alaa Eddin 15 November 2021 (has links)
Cloud Gaming systems are among the most challenging networked-applications, since they deal with streaming high-quality and bulky video in real-time to players’ devices. While all industry solutions today are centralized, we introduce an AI-assisted hybrid networking architecture that, in addition to the central cloud servers, also uses some players’ computing resources as additional points of service. We describe the problem, its mathematical formulation, and potential solution strategy.
Edge computing is a promising paradigm that brings servers closer to users, leading to lower latencies and enabling latency-sensitive applications such as cloud gaming, virtual/augmented reality, telepresence, and telecollaboration. Due to the high number of possible edge servers and incoming user requests, the optimum choice of user-server matching has become a difficult challenge, especially in the 5G era where the network can offer very low latencies. In this thesis, we introduce the problem of fair server selection as not only complying with an application's latency threshold but also reducing the variance of the latency among users in the same session. Due to the dynamic and rapidly evolving nature of such an environment and the capacity limitation of the servers, we propose as solution a Reinforcement Learning method in the form of a Quadruple Q-Learning model with action suppression, Q-value normalization, and a reward function that minimizes the variance of the latency. Our evaluations in the context of a cloud gaming application show that, compared to a existing methods, our proposed method not only better meets the application's latency threshold but is also more fair with a reduction of up to 35\% in the standard deviation of the latencies while using the geo-distance, and it shows improvements in fairness up to 18.7\% compared to existing solutions using the RTT delay especially during resource scarcity. Additionally, the RL solution can act as a heuristic algorithm even when it is not fully trained.
While designing this solution, we also introduced action suppression, Quadruple Q-Learning, and normalization of the Q-values, leading to a more scalable and implementable RL system. We focus on algorithms for distributed applications and especially esports, but the principles we discuss apply to other domains and applications where fairness can be a crucial aspect to be optimized.
|
440 |
An Application of Sliding Mode Control to Model-Based Reinforcement LearningParisi, Aaron Thomas 01 September 2019 (has links)
The state-of-art model-free reinforcement learning algorithms can generate admissible controls for complicated systems with no prior knowledge of the system dynamics, so long as sufficient (oftentimes millions) of samples are available from the environ- ment. On the other hand, model-based reinforcement learning approaches seek to leverage known optimal or robust control to reinforcement learning tasks by mod- elling the system dynamics and applying well established control algorithms to the system model. Sliding-mode controllers are robust to system disturbance and modelling errors, and have been widely used for high-order nonlinear system control. This thesis studies the application of sliding mode control to model-based reinforcement learning. Computer simulation results demonstrate that sliding-mode control is viable in the setting of reinforcement learning. While the system performance may suffer from problems such as deviations in state estimation, limitations in the capacity of the system model to express the system dynamics, and the need for many samples to converge, this approach still performs comparably to conventional model-free reinforcement learning methods.
|
Page generated in 0.1293 seconds