251

Reinforcement Learning with Gaussian Processes for Unmanned Aerial Vehicle Navigation

Gondhalekar, Nahush Ramesh 03 August 2017 (has links)
We study the problem of Reinforcement Learning (RL) for Unmanned Aerial Vehicle (UAV) navigation with the smallest number of real-world samples possible. This work is motivated by applications of learning autonomous navigation for aerial robots in structural inspection. A naive RL implementation suffers from the curse of dimensionality in large continuous state spaces. Gaussian Processes (GPs) exploit spatial correlation to approximate state-action transition dynamics or the value function in large state spaces. By incorporating GPs into naive Q-learning, we achieve better performance with a smaller number of samples. The evaluation is performed using simulations with an aerial robot. We also present a Multi-Fidelity Reinforcement Learning (MFRL) algorithm that leverages Gaussian Processes to learn the optimal policy in a real-world environment using samples gathered from a lower-fidelity simulator. In MFRL, an agent uses multiple simulators of the real environment to perform actions. With multiple levels of fidelity in a simulator chain, the number of samples used in successively higher-fidelity simulators can be reduced. / Master of Science / Recent years have seen increasing development in the field of infrastructure inspection using Unmanned Aerial Vehicles (UAVs). This thesis presents work on UAV navigation using Reinforcement Learning (RL) with the smallest number of real-world samples possible. A naive RL implementation suffers from the curse of dimensionality in large continuous state spaces. Gaussian Processes (GPs) exploit spatial correlation to approximate state-action transition dynamics or the value function in large state spaces. By incorporating GPs into naive Q-learning, we achieve better performance with a smaller number of samples. The evaluation is performed using simulations with an aerial robot. We also present a Multi-Fidelity Reinforcement Learning (MFRL) algorithm that leverages Gaussian Processes to learn the optimal policy in a real-world environment using samples gathered from a lower-fidelity simulator. In MFRL, an agent uses multiple simulators of the real environment to perform actions. With multiple levels of fidelity in a simulator chain, the number of samples used in successively higher-fidelity simulators can be reduced. By developing a bidirectional simulator chain, we aim to provide a learning platform that lets robots safely learn required skills in the smallest possible number of real-world samples.
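As a rough, self-contained illustration of the GP-based Q-learning idea described in this abstract (a sketch under assumed names and a generic gym-style `env`, not the author's implementation), the snippet below fits a Gaussian Process to observed state-action samples so that Q-value estimates generalize across a continuous state space:

```python
# Sketch: GP-approximated Q-learning for a continuous state space (illustrative only).
# Assumes a small discrete action set and a generic `env` with a gym-style step() API.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

ACTIONS = [0, 1, 2, 3]           # e.g., discretized UAV motion primitives (assumed)
GAMMA, EPSILON = 0.99, 0.1

X, y = [], []                     # (state, action) features and Q-value targets
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)

def q_values(state):
    """Predict Q(state, a) for every action; fall back to zeros before the first fit."""
    if not X:
        return np.zeros(len(ACTIONS))
    feats = [np.append(state, a) for a in ACTIONS]
    return gp.predict(np.array(feats))

def step_and_update(env, state):
    """One epsilon-greedy step followed by a GP refit on the growing sample set."""
    q = q_values(state)
    a = np.random.choice(ACTIONS) if np.random.rand() < EPSILON else int(np.argmax(q))
    next_state, reward, done, _ = env.step(a)
    target = reward + (0.0 if done else GAMMA * np.max(q_values(next_state)))
    X.append(np.append(state, a))
    y.append(target)
    gp.fit(np.array(X), np.array(y))   # spatial correlation generalizes Q across states
    return next_state, done
```

Refitting the GP on every sample is cubic in the number of samples, which is tolerable precisely because the point of the approach is to keep that number small.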
252

Optimization of Trusses and Frames using Reinforcement Learning / 強化学習を用いたトラスと骨組の最適化

Chi-Tathon, Kupwiwat 25 March 2024 (has links)
Kyoto University / New system, doctoral program / Doctor of Engineering / Kō No. 25241 / Engineering Doctorate No. 5200 / Department of Architecture, Graduate School of Engineering, Kyoto University / (Chief Examiner) Professor Makoto Ohsaki, Professor Minehiro Nishiyama, Associate Professor Kohei Fujita / Qualifies under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Philosophy (Engineering) / Kyoto University / DFAM
253

Design of neurofuzzy controller using reinforcement learning with application to linear system and inverted pendulum

Saengdeejing, Apiwat 01 January 1998 (has links)
No description available.
254

Machine Learning Simulation: Torso Dynamics of Robotic Biped

Renner, Michael Robert 22 August 2007 (has links)
Military, medical, exploratory, and commercial robots have much to gain from exchanging wheels for legs. However, the equations of motion of dynamic bipedal walker models are highly coupled and non-linear, making the selection of an appropriate control scheme difficult. A temporal difference reinforcement learning method known as Q-learning develops complex control policies through environmental exploration and exploitation. As a proof of concept, Q-learning was applied through simulation to a benchmark single-pendulum swing-up/balance task; the value function was first approximated with a look-up table, and then with an artificial neural network. We then applied Evolutionary Function Approximation for Reinforcement Learning to effectively control the swing leg and torso of a 3-degree-of-freedom active dynamic bipedal walker in simulation. The model began each episode in a stationary vertical configuration. At each time step the learning agent was rewarded for horizontal hip displacement scaled by torso altitude, which promoted faster walking while maintaining an upright posture, and one of six coupled torque activations was applied through two first-order filters. Over the course of 23 generations, an approximation of the value function was evolved which enabled walking at an average speed of 0.36 m/s. The agent oscillated the torso forward and then backward at each step, driving the walker forward for forty-two steps in thirty seconds without falling over. This work represents the foundation for improvements in anthropomorphic bipedal robots, exoskeleton mechanisms to assist in walking, and smart prosthetics. / Master of Science
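For readers unfamiliar with the benchmark mentioned above, the following is a minimal tabular Q-learning sketch for a single-pendulum swing-up/balance task; the dynamics constants, discretization, and reward are assumptions chosen for illustration, not the thesis's setup:

```python
# Sketch: tabular Q-learning on a single-pendulum swing-up/balance task (illustrative only).
# Dynamics constants, bin counts, and reward shaping below are assumptions.
import numpy as np

N_TH, N_OM, N_A = 31, 31, 3              # bins for angle and angular velocity; torque choices
TORQUES = np.array([-2.0, 0.0, 2.0])
ALPHA, GAMMA, EPS, DT = 0.1, 0.99, 0.1, 0.02
Q = np.zeros((N_TH, N_OM, N_A))

def discretize(theta, omega):
    """Map continuous (angle, angular velocity) onto look-up-table indices."""
    i = int(np.clip((theta + np.pi) / (2 * np.pi) * (N_TH - 1), 0, N_TH - 1))
    j = int(np.clip((omega + 8.0) / 16.0 * (N_OM - 1), 0, N_OM - 1))
    return i, j

def pendulum_step(theta, omega, torque):
    """Euler step of a damped pendulum; theta = 0 is upright, reward peaks when balanced."""
    omega += (9.81 * np.sin(theta) - 0.1 * omega + torque) * DT
    theta = ((theta + omega * DT + np.pi) % (2 * np.pi)) - np.pi
    return theta, omega, -(theta ** 2 + 0.1 * omega ** 2)

theta, omega = np.pi, 0.0                # start hanging straight down
for _ in range(100_000):
    i, j = discretize(theta, omega)
    a = np.random.randint(N_A) if np.random.rand() < EPS else int(np.argmax(Q[i, j]))
    theta, omega, r = pendulum_step(theta, omega, TORQUES[a])
    ni, nj = discretize(theta, omega)
    Q[i, j, a] += ALPHA * (r + GAMMA * np.max(Q[ni, nj]) - Q[i, j, a])
```

Replacing the look-up table `Q` with a small neural network regressor gives the second variant the abstract describes, at the cost of the convergence guarantees of the tabular case.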
255

Neuro-Symbolic Distillation of Reinforcement Learning Agents

Abir, Farhan Fuad 01 January 2024 (has links) (PDF)
In the past decade, reinforcement learning (RL) has achieved breakthroughs across various domains, from surpassing human performance in strategy games to enhancing the training of large language models (LLMs) with human feedback. However, RL has yet to gain widespread adoption in mission-critical fields such as healthcare and autonomous vehicles. This is primarily attributed to the inherent lack of trust, explainability, and generalizability of neural networks in deep reinforcement learning (DRL) agents. While neural DRL agents leverage the power of neural networks to solve specific tasks robustly and efficiently, this often comes at the cost of explainability and generalizability. In contrast, pure symbolic agents maintain explainability and trust but often underperform on high-dimensional data. In this work, we developed a method to distill explainable and trustworthy agents using neuro-symbolic AI. Neuro-symbolic distillation combines the strengths of symbolic reasoning and neural networks, creating a hybrid framework that leverages the structured knowledge representation of symbolic systems alongside the learning capabilities of neural networks. The key steps of neuro-symbolic distillation involve training traditional DRL agents, followed by extracting, selecting, and distilling their learned policies into symbolic forms using symbolic regression and tree-based models. These symbolic representations are then employed instead of the neural agents to make interpretable decisions with comparable accuracy. The approach is validated through experiments on Lunar Lander and Pong, demonstrating that symbolic representations can effectively replace neural agents while enhancing transparency and trustworthiness. Our findings suggest that this approach mitigates the black-box nature of neural networks, providing a pathway toward more transparent and trustworthy AI systems. The implications of this research are significant for fields requiring both high performance and explainability, such as autonomous systems, healthcare, and financial modeling.
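The distillation step described above can be pictured with a short sketch: run the trained neural agent, log its decisions, and fit a shallow decision tree to them. The environment id, the `trained_agent.act` interface, and the tree depth are assumptions standing in for an agent trained elsewhere, not the thesis's actual pipeline:

```python
# Sketch: distilling a trained neural RL policy into an interpretable tree (illustrative only).
# `trained_agent` (a policy object with an `act(obs)` method) and the environment id are
# assumptions; they stand in for a DRL agent trained elsewhere.
import gymnasium as gym
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def collect_rollouts(env, trained_agent, episodes=50):
    """Run the neural agent and record the (observation, action) pairs it produces."""
    obs_log, act_log = [], []
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action = trained_agent.act(obs)            # assumed neural-policy interface
            obs_log.append(obs)
            act_log.append(action)
            obs, _, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
    return np.array(obs_log), np.array(act_log)

def distill_to_tree(env, trained_agent, max_depth=5):
    """Fit a shallow decision tree to the neural agent's decisions and print its rules."""
    X, y = collect_rollouts(env, trained_agent)
    tree = DecisionTreeClassifier(max_depth=max_depth).fit(X, y)
    print(export_text(tree))                           # human-readable symbolic policy
    return tree

# Usage sketch (assumes an already-trained agent; environment id may differ by gymnasium version):
# tree_policy = distill_to_tree(gym.make("LunarLander-v2"), trained_agent)
```

Keeping the tree shallow is the trade-off that buys interpretability: each decision path can be read as an explicit rule over observation features, at some cost in fidelity to the neural policy.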
256

Reinforcement Learning Framework For The Unreal Engine

Wheeler, Justin B 01 March 2023 (has links) (PDF)
This dissertation addresses the need for machine learning-based methods rather than traditional rule-based methods for controlling non-playable characters (NPCs). The goal of the Reinforcement Learning Framework for the Unreal Engine is to enable game development studios to create, train, and more easily implement smarter, more compelling AI characters in major video game releases. The framework contains three distinct software libraries: an Unreal Engine reinforcement learning library whose purpose is to enable Unreal Engine levels to act as reinforcement learning environments, a Python library that provides convenient abstractions and implementations for the reinforcement learning process, and a flexible connection system responsible for the communication between the two sides of the framework. In this dissertation, I describe the framework in detail, demonstrate the framework's capability by implementing, training, and evaluating an agent on the cartpole benchmark, and prove the system's viability by comparing it to similar tools already on the market.
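A minimal sketch of the kind of engine-to-Python bridge such a framework needs is shown below; the JSON message format and the `UnrealEnv` class are assumptions made for illustration, not the framework's real API:

```python
# Sketch: exposing a game-engine level to Python RL code over a socket (illustrative only).
# The newline-delimited JSON protocol and the class/field names are assumptions.
import json
import socket

class UnrealEnv:
    """Minimal gym-style wrapper that forwards reset/step requests to a running engine."""

    def __init__(self, host="127.0.0.1", port=9000):
        self.sock = socket.create_connection((host, port))
        self.reader = self.sock.makefile("r")

    def _request(self, payload):
        self.sock.sendall((json.dumps(payload) + "\n").encode())
        return json.loads(self.reader.readline())

    def reset(self):
        return self._request({"cmd": "reset"})["observation"]

    def step(self, action):
        reply = self._request({"cmd": "step", "action": action})
        return reply["observation"], reply["reward"], reply["done"], {}

# Usage sketch: a standard RL loop can now treat the engine level like any other environment.
# env = UnrealEnv()
# obs = env.reset()
# obs, reward, done, info = env.step(0)
```

The point of such a wrapper is that once the level looks like a standard environment, any off-the-shelf Python RL algorithm can train against it without engine-specific code.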
257

Learning Cooperation In Hunter-prey Problem Via State Abstraction

Iscen, Atil 01 June 2009 (has links) (PDF)
The Hunter-Prey (or Prey-Pursuit) problem is a common toy domain for Reinforcement Learning, but the size of its state space is exponential in parameters such as the size of the grid or the number of agents. Because this state-space growth makes flat Q-learning impractical across different scenarios, this thesis presents an approach that keeps the size of the state space constant by producing agents that use previously learned knowledge to perform in bigger scenarios containing more agents. Inspired by HRL methods, the method is composed of a parallel-subtasks schema that divides the task into choices of simpler subtasks, a state representation technique convenient for this schema, and its extension to bigger grids. Experimental results show that the proposed method successfully produces agents that perform close to hand-coded agents while using a constant-sized state space independent of the parameters of the domain.
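One way to picture a constant-size state abstraction of the kind described above is to encode only relative offsets to the nearest prey and the nearest fellow hunter; the sketch below is an assumed illustration, not the thesis's exact representation:

```python
# Sketch: a constant-size state abstraction for a grid-world hunter agent (illustrative only).
# Using clipped relative offsets to the nearest prey and nearest other hunter keeps the state
# representation independent of grid size and of how many agents are present (assumed design).
import numpy as np

def nearest_offset(me, others, clip=2):
    """Relative (dx, dy) to the closest entity, clipped so the state space stays bounded."""
    if not others:
        return (clip + 1, clip + 1)                      # sentinel: nothing in range
    dx, dy = min(((o[0] - me[0], o[1] - me[1]) for o in others),
                 key=lambda d: abs(d[0]) + abs(d[1]))
    return (int(np.clip(dx, -clip, clip)), int(np.clip(dy, -clip, clip)))

def abstract_state(me, prey_positions, hunter_positions):
    """Constant-size state: offsets to the nearest prey and the nearest fellow hunter."""
    return nearest_offset(me, prey_positions) + nearest_offset(me, hunter_positions)

# Usage sketch: the same tabular Q-function now works on any grid size or agent count.
# state = abstract_state((3, 4), [(5, 5), (0, 1)], [(2, 2)])
```

Because the abstracted state has a fixed number of bounded components, a policy learned on a small grid can be reused directly when the grid grows or more agents are added.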
258

Autonoma drönare : modifiering av belöningsfunktionen i airsim / Autonomous Drones : modification of the reward function in airsim

Dzeko, Elvir, Carlsson, Markus January 2018 (has links)
Within the fast-moving research field of self-flying drones, development is continuous in both research and industry. There are several research problems around autonomous vehicles, including autonomous control of drones. One interesting avenue for autonomous drone control is deep reinforcement learning, i.e., a combination of deep neural networks and reinforcement learning. Problems that often arise are time-consuming training, inefficient manoeuvring, and issues with unpredictability and safety. High costs can also be a problem. With the help of the simulation program AirSim, we have had the opportunity to test current algorithms without regard to cost and other limiting factors that can make work in this area difficult. Microsoft's in-house simulator AirSim lets users communicate with the drone in the program through its application programming interface, which makes it possible to test different algorithms. The research question addressed is how the existing reward function in the AirSim simulator can be improved with respect to avoiding obstacles and moving the drone from start to goal. The goal of the study is to examine and improve AirSim's existing Deep Q-Network algorithm, with a focus on the reward function, and to test it in different simulated environments. Through two experiments conducted in two different environments, the reward, the number of collisions, and the agent's behaviour in the simulator were observed. We did not manage to gather enough data to measure a clear improvement in the modified reward function's evaluation metrics; however, we did develop a reward function that performs well in that it avoids obstacles and reaches the goal. To compare which of the reward functions is better, more research on the subject is needed. Given the problems with collecting data, the conclusion is that we did not succeed in improving the algorithm, since we do not know whether it performs better or worse than the existing reward function. / Drones are growing in popularity, and so is research within the field of autonomous drones. There are several research problems around autonomous vehicles overall, but the one covered by this study is the autonomous manoeuvring of drones. One interesting path for autonomous drones is deep reinforcement learning, which is a combination of deep neural networks and reinforcement learning. Problems that researchers often encounter within the field range from time-consuming training and ineffective manoeuvring to problems with unpredictability and security. Even the high cost of testing can be an issue. With the help of simulation programs, we are able to test algorithms without any concern for cost or other real-world factors that could limit our work. Microsoft's own simulator AirSim lets users control the vehicle in the simulator through an application programming interface, which enables the possibility to test a variety of algorithms. The research question addressed in this study is how the pre-existing reward function can be improved with respect to avoiding obstacles and moving the drone from start to goal. The goal of this study is to find improvements to AirSim's pre-existing Deep Q-Network algorithm's reward function and test it in two different simulated environments. By conducting several experiments and storing the evaluation metrics produced by the agents, it was possible to observe a result. The observed evaluation metrics included the average reward the agent received over time, the number of collisions, and overall performance in the respective environment. We were not able to gather enough data to measure an improvement in the evaluation metrics for the modified reward function. The modified reward function performed well but did not display any substantially improved performance. To determine whether one reward function is better than the other, more research needs to be done. Given the difficulties of gathering data, the conclusion is that we created a reward function that we cannot say is better or worse than the benchmark reward function.
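As an illustration of the kind of shaped reward function discussed in this abstract (the weights, thresholds, and state fields are assumptions, not the authors' modified function), a progress-plus-collision-penalty reward might look like this:

```python
# Sketch: a shaped drone-navigation reward of the kind discussed above (illustrative only).
# GOAL, the weights, and the distance thresholds are assumptions, not the thesis's values.
import numpy as np

GOAL = np.array([50.0, 0.0, -10.0])       # assumed goal position in simulator coordinates

def shaped_reward(prev_pos, pos, collided, goal=GOAL):
    """Reward progress toward the goal, penalise collisions, and give a bonus on arrival."""
    if collided:
        return -100.0                      # strong penalty; the episode would end here
    prev_dist = np.linalg.norm(goal - prev_pos)
    dist = np.linalg.norm(goal - pos)
    if dist < 3.0:
        return 100.0                       # reached the goal region
    progress = prev_dist - dist            # positive when the drone moves toward the goal
    return 2.0 * progress - 0.05           # small per-step cost discourages hovering
```

Shaping of this sort trades off guidance against bias: dense progress terms speed up DQN training, but poorly chosen weights can reward behaviour that never actually reaches the goal, which is why comparing reward variants needs the kind of repeated experiments the authors describe.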
259

Solution Of Delayed Reinforcement Learning Problems Having Continuous Action Spaces

Ravindran, B 03 1900 (has links) (PDF)
No description available.
260

Using Reinforcement Learning in Partial Order Plan Space

Ceylan, Hakan 05 1900 (has links)
Partial order planning is an important approach that solves planning problems without completely specifying the orderings between the actions in the plan. This property provides greater flexibility in executing plans, making partial order planners a preferred choice over other planning methodologies. However, in order to find partially ordered plans, partial order planners search in plan space rather than in the space of world states, and an uninformed search in plan space leads to poor efficiency. In this thesis, I discuss applying a reinforcement learning method, the first-visit Monte Carlo method, to partial order planning in order to design agents that need no training data or heuristics yet are still able to make informed decisions in plan space based on experience. Communicating effectively with the agent is crucial in reinforcement learning. I address how this was accomplished in plan space and present results from an evaluation on a blocks-world test bed.
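For reference, a generic first-visit Monte Carlo value-estimation routine (the RL method named above) can be sketched as follows; the episode format here is an assumption, while the thesis applies the idea to states in plan space:

```python
# Sketch: generic first-visit Monte Carlo value estimation (illustrative only).
# Episodes are lists of (state, reward) pairs; states need only be hashable.
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) by averaging returns from the first visit to s in each episode."""
    returns = defaultdict(list)
    for episode in episodes:
        # Index of the first time each state appears in this episode.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            if state not in first_visit:
                first_visit[state] = t
        # Discounted return from every time step, computed backwards.
        G_at = [0.0] * (len(episode) + 1)
        for t in range(len(episode) - 1, -1, -1):
            G_at[t] = episode[t][1] + gamma * G_at[t + 1]
        for state, t in first_visit.items():
            returns[state].append(G_at[t])
    return {s: sum(g) / len(g) for s, g in returns.items()}

# Usage sketch: value estimates for two toy episodes over hashable "plan states".
# V = first_visit_mc([[("s0", 0), ("s1", 1)], [("s0", 0), ("s2", -1)]])
```

Because the method needs only complete episodes and hashable states, it fits settings like plan-space search where no model of transition dynamics is available.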
