41

Využití opakovaně posilovaného učení pro řízení čtyřnohého robotu / Using of Reinforcement Learning for Four Legged Robot Control

Ondroušek, Vít January 2011 (has links)
This Ph.D. thesis focuses on using reinforcement learning for four-legged robot control. The main aim is to create an adaptive control system for the walking robot that can plan its walking gait with a Q-learning algorithm. This aim is achieved through the design of a complex three-layered architecture based on the DEDS paradigm. A small set of elementary reactive behaviors forms the basis of the proposed solution, and a set of composite control laws is designed using simultaneous activations of these behaviors. Both types of controllers are able to operate on flat terrain as well as on rugged terrain. A model of all behaviors that can be achieved by activating these controllers is built using an appropriate discretization of the continuous state space. This model is used by the Q-learning algorithm to find optimal robot control strategies. The capabilities of the control unit are demonstrated on three complex tasks: rotating the robot, walking in a straight line, and walking on an inclined plane. These tasks are solved using spatial dynamic simulations of the four-legged robot with three degrees of freedom on each leg. The resulting walking gaits are evaluated using standardized quantitative indicators. Video files showing the operation of the elementary and composite controllers, as well as the resulting walking gaits, are an integral part of the thesis.
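The abstract gives no code, but the tabular Q-learning update it builds on can be sketched as follows. This is a minimal illustration, assuming a discretized state space and a finite set of composite control laws as actions; the names, hyperparameters, and reward are hypothetical, not taken from the thesis.

    # Minimal tabular Q-learning sketch (hypothetical state/action encoding).
    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # learning rate, discount, exploration

    Q = defaultdict(float)                   # Q[(state, action)] -> expected return

    def choose_action(state, actions):
        # Epsilon-greedy choice among the available composite control laws.
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_update(state, action, reward, next_state, actions):
        # One-step backup: Q <- Q + alpha * (r + gamma * max_a' Q(s', a') - Q).
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])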
42

Essays on Reinforcement Learning with Decision Trees and Accelerated Boosting of Partially Linear Additive Models

Dinger, Steven 01 October 2019 (has links)
No description available.
43

Board Game AI Using Reinforcement Learning

Strömberg, Linus, Lind, Viktor January 2022 (has links)
The purpose of this thesis is to develop an agent that learns to play an interpretation of the popular game Ticket To Ride. The project was done in collaboration with Piktiv AB. The thesis presents how an agent based on the Double Deep Q-network algorithm learns to play a version of Ticket To Ride using self-play, and documents how the game and the agent were developed, as well as how the agent was evaluated.
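A minimal sketch of the Double Deep Q-network target computation such an agent rests on, assuming batch arrays and two value networks exposed as plain callables; the names and shapes are illustrative, not the project's actual code:

    # Double DQN target: the online network selects the next action, the target
    # network evaluates it, which reduces vanilla DQN's overestimation bias.
    import numpy as np

    def double_dqn_targets(online_q, target_q, rewards, next_states, dones, gamma=0.99):
        # online_q/target_q map a batch of states to an array of Q-values per action.
        next_actions = np.argmax(online_q(next_states), axis=1)
        batch = np.arange(len(next_actions))
        next_values = target_q(next_states)[batch, next_actions]
        return rewards + gamma * (1.0 - dones) * next_values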
44

Automated touch-less customer order and robot deliver system design at Kroger

Shan, Xingjian 22 August 2022 (has links)
No description available.
45

Model Checked Reinforcement Learning For Multi-Agent Planning

Wetterholm, Erik January 2023 (has links)
Autonomous systems, or agents as they are sometimes called, can be anything from drones and self-driving cars to autonomous construction equipment. These systems are often tasked with accomplishing missions in groups, which may require that they work within the same area without colliding or disturbing other agents' tasks. There are several tools for planning and designing such systems, one of them being UPPAAL STRATEGO. Multi-agent planning (MAP) is about planning actions in optimal ways such that the agents can accomplish their mission efficiently. One method of doing this, named MCRL, utilizes Q-learning as the algorithm for finding an optimal plan. Because a Q-learning algorithm carries no correctness guarantee, these plans then need to be verified to ensure they accomplish what the user intended within the allowed time, something UPPAAL STRATEGO can do. Using this method also alleviates the state-explosion problem that grows with an increasing number of agents. With UPPAAL STRATEGO it is further possible to acquire the best- and worst-case execution times (BCET and WCET) and their corresponding traces. This thesis aims to obtain the BCET and WCET and their corresponding traces in the model.
46

Reinforcement Learning Based Resource Allocation for Network Slicing in O-RAN

Cheng, Nien Fang 06 July 2023 (has links)
Fifth Generation (5G) introduces technologies that expedite the adoption of mobile networks, such as densely connected devices, ultra-fast data rates, and low latency. With these visions for 5G, and 6G as the next step, the demand for higher transmission rates and lower latency keeps growing, possibly outpacing Moore's law. With Artificial Intelligence (AI) techniques maturing over the past decade, optimizing resource allocation in the network has become a pressing problem for Mobile Network Operators (MNOs) seeking to provide better Quality of Service (QoS) at lower cost. This thesis proposes a Reinforcement Learning (RL) solution for bandwidth allocation in network slicing, integrated into the disaggregated Open Radio Access Network (O-RAN) architecture. O-RAN redefines traditional Radio Access Network (RAN) elements as smaller components with detailed functional specifications, and this open modularization creates greater potential for managing the resources of different network slices. In 5G mobile networks there are three major types of network slices: Enhanced Mobile Broadband (eMBB), Ultra-Reliable Low Latency Communications (URLLC), and Massive Machine Type Communications (mMTC). Each network slice serves different needs in the 5G network, so resources can be reallocated accordingly. The virtualization of O-RAN divides the RAN into smaller function groups, which helps the network slices divide the shared resources further. Compared to traditional sequential signal processing, allocating dedicated resources to each network slice can improve performance individually, and shared resources can be customized statically based on each slice's feature requirements. To further enhance bandwidth utilization in the disaggregated O-RAN, this thesis proposes an RL algorithm for allocating the midhaul bandwidth shared between the Centralized Unit (CU) and the Distributed Units (DUs). A Python-based simulator covering several types of mobile User Equipment (UE) was implemented and then integrated with the proposed Q-learning model. The RL model optimizes bandwidth allocation in the midhaul between the Edge Open Clouds (O-Clouds, hosting DUs) and the Regional O-Cloud (hosting the CU). The results show up to 50% improvement in the throughput of the targeted slice, fairness to the other slices, and better overall bandwidth utilization on the O-Clouds. In addition, UE QoS improves significantly in terms of transmission time.
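As an illustration of the kind of Q-learning formulation the abstract describes, the sketch below shifts midhaul bandwidth between slices; the state encoding, action set, and reward are hypothetical simplifications, not the thesis simulator, and the one-step update then follows the standard Q-learning rule.

    # Hypothetical sketch: slice-level bandwidth shifting as a Q-learning problem.
    import itertools
    from collections import defaultdict

    SLICES = ("eMBB", "URLLC", "mMTC")
    STEP = 10                                          # Mbps moved per action
    ACTIONS = list(itertools.permutations(SLICES, 2))  # (take_from, give_to)
    Q = defaultdict(float)

    def state_key(alloc, demand):
        # Coarse discretization keeps the Q-table small.
        return tuple(alloc[s] // STEP for s in SLICES) + \
               tuple(min(demand[s], 200) // STEP for s in SLICES)

    def reward(alloc, demand):
        # Served traffic minus a penalty for starving any slice (illustrative).
        served = sum(min(alloc[s], demand[s]) for s in SLICES)
        starved = sum(max(demand[s] - alloc[s], 0) for s in SLICES)
        return served - 0.5 * starved

    def apply(alloc, action):
        # Move STEP Mbps from one slice's allocation to another, if available.
        src, dst = action
        if alloc[src] >= STEP:
            alloc = dict(alloc)
            alloc[src] -= STEP
            alloc[dst] += STEP
        return alloc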
47

A Learning based Adaptive Cruise and Lane Control System

Xu, Peng 31 August 2018 (has links)
No description available.
48

Reinforcement Learning Methods for OpenAI Environments

Winberg, Andreas, Öhrstam Lindström, Oliver January 2020 (has links)
Using the powerful methods developed in the field of reinforcement learning requires an understanding of the advantages and drawbacks of different methods, as well as the effects of the different adjustable parameters. This paper highlights the differences in performance and applicability between three different Q-learning methods: Q-table, deep Q-network, and double deep Q-network, where Q refers to the value assigned to a given state-action pair. The performance of these algorithms is evaluated on the two OpenAI gym environments MountainCar-v0 and CartPole-v0. The implementations are done in Python using the Tensorflow toolkit with Keras. The results show that the Q-table was the best choice in the Mountain Car environment because it was the easiest to implement and much faster to compute, but the network methods required far less training data. No significant difference in performance was found between the deep Q-network and the double deep Q-network. In the end, there is a trade-off between the number of episodes required and the computation time for each episode. The network parameters were also harder to tune, since much more time was needed to compute and visualize the result.
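A condensed sketch of the Q-table variant on MountainCar-v0, assuming the classic gym step API (obs, reward, done, info; gymnasium's 5-tuple would need a small change); the bin counts and hyperparameters are illustrative, not the paper's values:

    # Tabular Q-learning over a uniformly discretized (position, velocity) state.
    import gym
    import numpy as np

    env = gym.make("MountainCar-v0")
    BINS = np.array([20, 20])
    low, high = env.observation_space.low, env.observation_space.high
    q_table = np.zeros(tuple(BINS) + (env.action_space.n,))
    alpha, gamma, eps = 0.1, 0.99, 0.1

    def discretize(obs):
        # Map a continuous observation to integer bin indices.
        ratios = (obs - low) / (high - low)
        return tuple(np.clip((ratios * BINS).astype(int), 0, BINS - 1))

    for episode in range(5000):
        state, done = discretize(env.reset()), False
        while not done:
            action = env.action_space.sample() if np.random.rand() < eps \
                     else int(np.argmax(q_table[state]))
            obs, reward, done, _ = env.step(action)
            nxt = discretize(obs)
            td = reward + gamma * np.max(q_table[nxt]) - q_table[state + (action,)]
            q_table[state + (action,)] += alpha * td
            state = nxt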
49

A Distributed Q-learning Classifier System for task decomposition in real robot learning problems

Chapman, Kevin L. 04 March 2009 (has links)
A distributed reinforcement-learning system is designed and implemented on a mobile robot for the study of complex task decomposition in real robot learning environments. The Distributed Q-learning Classifier System (DQLCS) is evolved from the standard Learning Classifier System (LCS) proposed by J.H. Holland. Two of the limitations of the standard LCS are its monolithic nature and its complex apportionment of credit scheme, the bucket brigade algorithm (BBA). The DQLCS addresses both of these problems as well as the inherent difficulties faced by learning systems operating in real environments. We introduce Q-learning as the apportionment of credit component of the DQLCS, and we develop a distributed learning architecture to facilitate complex task decomposition. Based upon dynamic programming, the Q-learning update equation is derived and its advantages over the complex BBA are discussed. The distributed architecture is implemented to provide for faster learning by allowing the system to effectively decrease the size of the problem space it must explore. Holistic and monolithic shaping approaches are used to distribute reward among the learning modules of the DQLCS in a variety of real robot learning experiments. The results of these experiments support the DQLCS as a useful reinforcement learning paradigm and suggest future areas of study in distributed learning systems. / Master of Science
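For reference, the one-step Q-learning update mentioned above takes the standard dynamic-programming form (textbook notation; the thesis's own derivation may use different symbols), where \alpha is the learning rate and \gamma the discount factor:

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

Unlike the bucket brigade algorithm's chained credit passing, this backup bootstraps directly from the greedy value of the successor state in a single update.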
50

Learning Strategies in Multi-Agent Systems - Applications to the Herding Problem

Gadre, Aditya Shrikant 14 December 2001 (has links)
"Multi-Agent systems" is a topic for a lot of research, especially research involving strategy, evolution and cooperation among various agents. Various learning algorithm schemes have been proposed such as reinforcement learning and evolutionary computing. In this thesis two solutions to a multi-agent herding problem are presented. One solution is based on Q-learning algorithm, while the other is based on modeling of artificial immune system. Q-learning solution for the herding problem is developed, using region-based local learning for each individual agent. Individual and batch processing reinforcement algorithms are implemented for non-cooperative agents. Agents in this formulation do not share any information or knowledge. Issues such as computational requirements, and convergence are discussed. An idiotopic artificial immune network is proposed that includes individual B-cell model for agents and T-cell model for controlling the interaction among these agents. Two network models are proposed--one for evolving group behavior/strategy arbitration and the other for individual action selection. A comparative study of the Q-learning solution and the immune network solution is done on important aspects such as computation requirements, predictability, and convergence. / Master of Science
