251. A Reinforcement Learning-based Scheduler for Minimizing Casualties of a Military Drone Swarm (Jin, Heng, 14 July 2022)
In this thesis, we consider a swarm of military drones flying over unfriendly territory, where a drone can be shot down by an enemy with an age-based risk probability. We study the problem of scheduling surveillance image transmissions among the drones with the objective of minimizing overall casualties. We present Hector, a reinforcement learning-based scheduling algorithm. Specifically, Hector uses only the age of each detected target, a piece of locally available information at each drone, as the input to a neural network that makes scheduling decisions. Extensive simulations show that Hector reduces casualties significantly compared to a baseline round-robin algorithm. Further, Hector offers performance comparable to a high-performing greedy scheduler, which assumes complete knowledge of global information.

Master of Science

Drones have been successfully deployed by the military. The advancement of machine learning further empowers drones to automatically identify, recognize, and even eliminate adversary targets on the battlefield. However, to minimize unnecessary casualties to civilians, it is important to introduce additional checks and control from the control center before lethal force is authorized. Thus, the communication between drones and the control center becomes critical.

In this thesis, we study the problem of communication between a military drone swarm and the control center when the drones fly over unfriendly territory where they can be shot down by enemies. We present Hector, an algorithm based on machine learning, that minimizes overall drone casualties by scheduling data transmissions. Extensive simulations show that Hector reduces casualties significantly compared to traditional algorithms.
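A minimal sketch of the idea behind such an age-driven scheduler is shown below. The thesis does not publish Hector's actual architecture, so the two-layer network, its sizes and weights, and the argmax selection rule are illustrative stand-ins, not the real algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network mapping a drone's target age to a priority score.
W1, b1 = rng.normal(size=(8, 1)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)

def priority(age: float) -> float:
    """Score one drone using only its locally available target age."""
    h = np.maximum(0.0, W1 @ np.array([age]) + b1)  # ReLU hidden layer
    return float((W2 @ h + b2)[0])

def schedule(ages):
    """Pick the drone that transmits next: the highest-scoring one wins the slot."""
    return int(np.argmax([priority(a) for a in ages]))

ages = [3.0, 7.5, 1.2, 9.8]   # age of each drone's detected target
print(schedule(ages))          # index of the drone scheduled to transmit
```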
252. Considerations of Reinforcement Learning within Real-Time Wireless Communication Systems (Jones, Alyse M., 15 June 2022)
Afflicted heavily by spectrum congestion, the unpredictable, dynamic conditions of the radio frequency (RF) spectrum have increasingly become a major obstacle for devices today. More specifically, a significant threat in this kind of environment is interference caused by collisions, which is increasingly unavoidable in an overcrowded spectrum. Thus, these devices require a way to avoid such events. Cognitive radios (CRs) were proposed as a solution, owing to their transmission adaptability and decision-making capabilities. Through spectrum sensing, CRs capture the current condition of the RF spectrum and, based on their decision-making strategy, interpret these results to make an informed decision about what to do next to optimize their own communication. With the emergence of artificial intelligence, one such decision-making strategy CRs can utilize is Reinforcement Learning (RL). Unlike standard adaptive radios, CRs equipped with RL can predict the conditions of the RF spectrum and, using these predictions, determine what they must do in the future to operate optimally.

Recognizing the usefulness of RL in hard-to-predict environments such as the RF spectrum, research on RL within CRs has become more popular over the past decade, especially for interference mitigation. However, the existing literature neglects to confront the possible limitations that threaten the proper implementation of RL in RF systems. This thesis therefore investigates which limitations of real-time communication systems can hinder the performance of RL and, in light of these limitations, emphasizes the considerations that should guide the design and implementation of radio frequency reinforcement learning (RFRL) systems. The effects of latency, power, wireless channel impairments, different transmission protocols, and different spectrum sensing detectors are among the possible limitations simulated and analyzed in this work that are not typically considered in simulation-based prior art. To perform this investigation, a representative real-time OFDM transmit/receive chain is implemented within the GNU Radio framework. The system, operating over-the-air through USRPs, leverages reinforcement learning (e.g., Q-learning) to avoid interference with other spectrum users. Performance analysis of this representative system provides a systematic approach for predicting limiting factors within an implemented real-time system and thus aims to provide guidance on how to design these systems with practical limitations in mind.

M.S.

Because the space in which communication signals travel is congested with activity, collisions among signals, called interference, are becoming more of a problem in wireless communications. To avoid such occurrences, intelligent radios are used to adapt communication devices to operate optimally within this congested space. With the emergence of artificial intelligence, whereby devices can learn on their own how to adapt, one way an intelligent radio can dynamically adapt to the congestion is through Reinforcement Learning (RL), which enables prediction of signal activity within the communication space over time. Intelligent radios equipped with RL learn through trial and error how to operate optimally and how to avoid the places within the communication space that are busy and congested.

Recognizing the usefulness of RL in hard-to-predict environments, research on RL within intelligent radios has become more popular over the past decade, especially for navigating a communication space that is congested and where collisions are common. However, the existing literature neglects to confront the possible limitations that threaten the proper implementation of RL in communication systems. This thesis therefore investigates which limitations of real-world communication systems can hinder the performance of RL and, in light of these limitations, emphasizes the considerations that should guide the design and implementation of communication systems equipped with RL. Effects such as delays in the system, differences in how the signal operates, and how the signal is affected while traveling are among the possible limitations simulated and analyzed in this work that are not typically considered in prior art. To perform this investigation, a modern representative communication system was implemented on software-enabled radios. The system leverages reinforcement learning to avoid collisions with other signals in the communication space. Performance analysis of this representative system provides a systematic approach for predicting limiting factors within an implemented real-world communication system with RL and thus aims to provide guidance on how to design these systems with practical limitations in mind.
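As a hedged illustration of the kind of learner involved, the sketch below implements tabular Q-learning for transmit-channel selection against a toy sweeping interferer. The channel count, reward values, and interferer model are assumptions for illustration only, not details of the thesis's GNU Radio/USRP system.

```python
import random

random.seed(0)
N_CHANNELS = 4
# Q[state][action]: state = channel last sensed as occupied, action = transmit channel.
q = [[0.0] * N_CHANNELS for _ in range(N_CHANNELS)]
alpha, gamma, eps = 0.1, 0.9, 0.1

def interferer(t: int) -> int:
    """Toy interferer that sweeps the channels in a fixed cycle."""
    return t % N_CHANNELS

state = 0
for t in range(5000):
    # Epsilon-greedy choice of the next transmit channel.
    if random.random() < eps:
        action = random.randrange(N_CHANNELS)
    else:
        action = max(range(N_CHANNELS), key=lambda a: q[state][a])
    occupied = interferer(t)
    reward = -1.0 if action == occupied else 1.0   # collision penalty vs. clean slot
    next_state = occupied                           # spectrum-sensing observation
    q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
    state = next_state

# Greedy policy per sensed state; it learns to dodge channel (state + 1) % 4,
# the one the sweeping interferer occupies next.
print([max(range(N_CHANNELS), key=lambda a: q[s][a]) for s in range(N_CHANNELS)])
```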
253. Directional Airflow for HVAC Systems (Abedi, Milad, January 2019)
Directional airflow has been utilized for many years to enable targeted air conditioning in cars and airplanes, where the occupants can adjust the direction of flow. In the building sector, however, HVAC systems are usually equipped with stationary diffusers that can only supply air either diffusely or in a fixed direction into the room in which they are installed. In the present thesis, the possibility of adopting directional airflow in lieu of conventional uniform diffusers has been investigated. The potential benefits of such a modification to the control capabilities of the HVAC system, in terms of improvements in overall occupant thermal comfort and in the energy consumption of the HVAC system, have been investigated via a simulation study and an experimental study. In the simulation study, an average energy-consumption reduction of 59% per cycle was achieved. In the experimental study, the reduction in the required duration of airflow (proportional to energy consumption) was 64% per cycle. The feasibility of autonomous control of the directional airflow has been studied in a simulation experiment utilizing reinforcement learning, an artificial intelligence approach that facilitates autonomous control in unknown environments. To demonstrate the feasibility of enabling existing HVAC systems to control the direction of airflow, a device (called an active diffuser) was designed and prototyped. The active diffuser successfully replaced the existing uniform diffuser and effectively targeted occupant positions by accurately directing the airflow jet to the desired positions.

M.S.

The notion of an adjustable direction of airflow has been used in cars and airplanes for decades, enabling users to manually adjust the direction of airflow to their satisfaction. In buildings, however, incoming airflow is introduced to the room through non-adjustable uniform diffusers that aim to condition the air homogeneously. In the present thesis, the possibility of adopting directional airflow in place of conventional uniform diffusers has been investigated. The potential benefits of such a modification to the control capabilities of the HVAC system, in terms of improvements in overall occupant thermal comfort and in the energy consumption of the HVAC system, have been investigated via a simulation study and an experimental study. In the simulation study, an average energy-consumption reduction of 59% per cycle was achieved. In the experimental study, the reduction in the required duration of airflow (proportional to energy consumption) was 64% per cycle on average. The feasibility of autonomous control of the directional airflow has been studied in a simulation experiment utilizing reinforcement learning, an artificial intelligence approach that facilitates autonomous control in unknown environments. To demonstrate the feasibility of enabling existing HVAC systems to control the direction of airflow, a device (called an active diffuser) was designed and prototyped. The active diffuser successfully replaced the existing uniform diffuser and effectively targeted occupant positions by accurately directing the airflow jet to the desired positions.
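The abstract does not specify the RL formulation, so the sketch below is only a toy stand-in: a bandit-style learner that discovers which discrete diffuser angle to aim at an occupant. The angle set, occupant position, and comfort reward are all invented for illustration.

```python
import random

random.seed(0)
ANGLES = [-30, -15, 0, 15, 30]    # candidate jet directions in degrees (assumed)
OCCUPANT_ANGLE = 15               # where the occupant sits (assumed, hidden from agent)
q = [0.0] * len(ANGLES)           # one value per action: a bandit, not full RL
alpha, eps = 0.1, 0.2

def comfort_reward(angle: float) -> float:
    """Toy comfort model: reward peaks when the jet points at the occupant."""
    return 1.0 - abs(angle - OCCUPANT_ANGLE) / 45.0

for step in range(2000):
    a = random.randrange(len(ANGLES)) if random.random() < eps \
        else max(range(len(ANGLES)), key=lambda i: q[i])
    q[a] += alpha * (comfort_reward(ANGLES[a]) - q[a])   # incremental value update

best = max(range(len(ANGLES)), key=lambda i: q[i])
print(ANGLES[best])   # learned aim point; converges to 15 degrees here
```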
254. Reinforcement Learning with Gaussian Processes for Unmanned Aerial Vehicle Navigation (Gondhalekar, Nahush Ramesh, 03 August 2017)
We study the problem of Reinforcement Learning (RL) for Unmanned Aerial Vehicle (UAV) navigation with the smallest possible number of real-world samples. This work is motivated by applications of learning autonomous navigation for aerial robots in structural inspection. A naive RL implementation suffers from the curse of dimensionality in large continuous state spaces. Gaussian Processes (GPs) exploit spatial correlation to approximate state-action transition dynamics or value functions in large state spaces. By incorporating GPs into naive Q-learning, we achieve better performance with a smaller number of samples. The evaluation is performed in simulations with an aerial robot. We also present a Multi-Fidelity Reinforcement Learning (MFRL) algorithm that leverages Gaussian Processes to learn the optimal policy in a real-world environment using samples gathered from a lower-fidelity simulator. In MFRL, an agent uses multiple simulators of the real environment to perform actions. With multiple levels of fidelity in a simulator chain, the number of samples needed in successively higher-fidelity simulators can be reduced.

Master of Science

Increasing development in the field of infrastructure inspection using Unmanned Aerial Vehicles (UAVs) has been seen in recent years. This thesis presents work on UAV navigation using Reinforcement Learning (RL) with the smallest possible number of real-world samples. A naive RL implementation suffers from the curse of dimensionality in large continuous state spaces. Gaussian Processes (GPs) exploit spatial correlation to approximate state-action transition dynamics or value functions in large state spaces. By incorporating GPs into naive Q-learning, we achieve better performance with a smaller number of samples. The evaluation is performed in simulations with an aerial robot. We also present a Multi-Fidelity Reinforcement Learning (MFRL) algorithm that leverages Gaussian Processes to learn the optimal policy in a real-world environment using samples gathered from a lower-fidelity simulator. In MFRL, an agent uses multiple simulators of the real environment to perform actions. With multiple levels of fidelity in a simulator chain, the number of samples needed in successively higher-fidelity simulators can be reduced. By developing a bidirectional simulator chain, we aim to provide a learning platform for robots to safely learn required skills in the smallest possible number of real-world samples.
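A minimal sketch of the core mechanism follows, assuming a 1-D toy state space and scikit-learn's GP regressor; the thesis's actual UAV state representation, kernel choices, and Q-learning integration are not reproduced here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# A handful of visited states with Q-value estimates for one fixed action,
# e.g. produced by ordinary Q-learning backups (values invented here).
states = np.array([[0.0], [0.2], [0.5], [0.9]])
q_targets = np.array([0.1, 0.4, 0.8, 0.3])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-3)
gp.fit(states, q_targets)

# The GP generalizes Q across unvisited states by exploiting spatial
# correlation, which is what cuts the number of samples the agent needs.
novel = np.array([[0.35], [0.7]])
q_pred, q_std = gp.predict(novel, return_std=True)
print(q_pred, q_std)   # predicted Q-values plus uncertainty at unvisited states
```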
255. Design of neurofuzzy controller using reinforcement learning with application to linear system and inverted pendulum (Saengdeejing, Apiwat, 01 January 1998)
No description available.
256. Neuro-Symbolic Distillation of Reinforcement Learning Agents (Abir, Farhan Fuad, 01 January 2024)
In the past decade, reinforcement learning (RL) has achieved breakthroughs across various domains, from surpassing human performance in strategy games to enhancing the training of large language models (LLMs) with human feedback. However, RL has yet to gain widespread adoption in mission-critical fields such as healthcare and autonomous vehicles. This is primarily attributed to the inherent lack of trust, explainability, and generalizability of neural networks in deep reinforcement learning (DRL) agents. While neural DRL agents leverage the power of neural networks to solve specific tasks robustly and efficiently, this often comes at the cost of explainability and generalizability. In contrast, pure symbolic agents maintain explainability and trust but often underperform in high-dimensional data. In this work, we developed a method to distill explainable and trustworthy agents using neuro-symbolic AI. Neuro-symbolic distillation combines the strengths of symbolic reasoning and neural networks, creating a hybrid framework that leverages the structured knowledge representation of symbolic systems alongside the learning capabilities of neural networks. The key steps of neuro-symbolic distillation involve training traditional DRL agents, followed by extracting, selecting, and distilling their learned policies into symbolic forms using symbolic regression and tree-based models. These symbolic representations are then employed instead of the neural agents to make interpretable decisions with comparable accuracy. The approach is validated through experiments on Lunar Lander and Pong, demonstrating that symbolic representations can effectively replace neural agents while enhancing transparency and trustworthiness. Our findings suggest that this approach mitigates the black-box nature of neural networks, providing a pathway toward more transparent and trustworthy AI systems. The implications of this research are significant for fields requiring both high performance and explainability, such as autonomous systems, healthcare, and financial modeling.
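The sketch below illustrates the tree-based variant of the distillation pipeline named above: label sampled states with a trained agent's actions, then fit an interpretable decision tree to imitate that policy. The "neural policy" here is a stand-in function, not the thesis's Lunar Lander or Pong agents.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def neural_policy(state: np.ndarray) -> int:
    """Stand-in for a trained DRL agent's greedy action (not the thesis's agent)."""
    return int(state[0] + 0.5 * state[1] > 0.0)

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(2000, 2))              # sampled observations
actions = np.array([neural_policy(s) for s in states])       # teacher labels

# Distill: fit a small, human-readable tree to imitate the neural policy.
tree = DecisionTreeClassifier(max_depth=3).fit(states, actions)
print(export_text(tree, feature_names=["x0", "x1"]))         # the symbolic policy
print((tree.predict(states) == actions).mean())              # fidelity to the teacher
```

A tree this shallow can be audited rule by rule, which is the transparency argument the abstract makes; the cost is whatever fidelity gap remains between the tree and its neural teacher.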
257. Towards Human-level Dexterity via Robot Learning (Khandate, Gagan Muralidhar, January 2024)
Dexterous intelligence—the ability to perform complex interactions with multi-fingered hands—is a pinnacle of human physical intelligence and emergent higher-order cognitive skills. However, contrary to Moravec's paradox, dexterous intelligence in humans appears simple only superficially. Millions of years were spent co-evolving the human brain and hands, including rich tactile sensing. Achieving human-level dexterity with robotic hands has long been a fundamental goal in robotics and represents a critical milestone toward general embodied intelligence. In this pursuit, computational sensorimotor learning has made significant progress, enabling feats such as arbitrary in-hand object reorientation. However, we observe that achieving higher levels of dexterity requires overcoming very fundamental limitations of computational sensorimotor learning.
I develop robot learning methods for highly dexterous multi-fingered manipulation by directly addressing these limitations at their root cause. Chiefly, through key studies, this dissertation progressively builds an effective framework for reinforcement learning of dexterous multi-fingered manipulation skills. These methods adopt structured exploration, effectively overcoming the limitations of random exploration in reinforcement learning. The insights gained culminate in a highly effective reinforcement learning method that incorporates sampling-based planning for direct exploration. Additionally, this thesis explores a new paradigm of using visuo-tactile human demonstrations for dexterity, introducing corresponding imitation learning techniques.
258. Optimization of Trusses and Frames using Reinforcement Learning / 強化学習を用いたトラスと骨組の最適化 (Chi-Tathon, Kupwiwat, 25 March 2024)
Kyoto University / New-system doctoral program / Doctor of Engineering / Kō No. 25241 / Engineering Doctorate No. 5200 / New system||Eng||1992 (University Library) / Department of Architecture, Graduate School of Engineering, Kyoto University / (Chief examiner) Professor Makoto Ohsaki; Professor Minehiro Nishiyama; Associate Professor Kohei Fujita / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Engineering / Kyoto University / DFAM
259. Reinforcement Learning Framework for the Unreal Engine (Wheeler, Justin B, 01 March 2023)
This dissertation addresses the need for machine learning-based methods, rather than traditional rule-based methods, for controlling non-playable characters (NPCs). The goal of the Reinforcement Learning Framework for the Unreal Engine is to enable game development studios to create, train, and more easily implement smarter, more compelling AI characters in major video game releases. The framework contains three distinct software libraries: an Unreal Engine reinforcement learning library that enables Unreal Engine levels to act as reinforcement learning environments, a Python library that provides convenient abstractions and implementations for the reinforcement learning process, and a flexible connection system responsible for communication between the two sides of the framework. In this dissertation, I describe the framework in detail, demonstrate the framework's capability by implementing, training, and evaluating on the cartpole benchmark, and prove the system's viability by comparing it to similar tools already on the market.
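The framework's actual wire protocol is not described in this abstract. As a purely hypothetical sketch of what such a connection system could look like, the Python below exchanges newline-delimited JSON messages with an engine-side server over TCP; the message fields, port, and class name are all invented for illustration.

```python
import json
import socket

class EngineEnv:
    """Gym-like client for an engine-side RL server (hypothetical protocol)."""

    def __init__(self, host: str = "127.0.0.1", port: int = 9000):
        self.sock = socket.create_connection((host, port))
        self.io = self.sock.makefile("rw")        # newline-delimited JSON stream

    def _rpc(self, msg: dict) -> dict:
        self.io.write(json.dumps(msg) + "\n")
        self.io.flush()
        return json.loads(self.io.readline())      # one JSON reply per request

    def reset(self):
        return self._rpc({"cmd": "reset"})["observation"]

    def step(self, action):
        reply = self._rpc({"cmd": "step", "action": action})
        return reply["observation"], reply["reward"], reply["done"]

# Usage, assuming an engine-side server is listening:
#   env = EngineEnv()
#   obs = env.reset()
#   obs, reward, done = env.step(0)
```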
260. Machine Learning Simulation: Torso Dynamics of Robotic Biped (Renner, Michael Robert, 22 August 2007)
Military, medical, exploratory, and commercial robots have much to gain from exchanging wheels for legs. However, the equations of motion of dynamic bipedal walker models are highly coupled and non-linear, making the selection of an appropriate control scheme difficult. A temporal-difference reinforcement learning method known as Q-learning develops complex control policies through environmental exploration and exploitation. As a proof of concept, Q-learning was applied in simulation to a benchmark single-pendulum swing-up/balance task; the value function was approximated first with a look-up table and then with an artificial neural network. We then applied Evolutionary Function Approximation for Reinforcement Learning to effectively control the swing leg and torso of a 3-degree-of-freedom active dynamic bipedal walker in simulation. The model began each episode in a stationary vertical configuration. At each time step, the learning agent was rewarded for horizontal hip displacement scaled by torso altitude--which promoted faster walking while maintaining an upright posture--and one of six coupled torque activations was applied through two first-order filters. Over the course of 23 generations, an approximation of the value function was evolved which enabled walking at an average speed of 0.36 m/s. The agent oscillated the torso forward then backward at each step, driving the walker forward for forty-two steps in thirty seconds without falling over. This work represents a foundation for improvements in anthropomorphic bipedal robots, exoskeleton mechanisms to assist in walking, and smart prosthetics.

Master of Science
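Two concrete details from this abstract lend themselves to a short hedged sketch: a reward equal to horizontal hip displacement scaled by torso altitude, and torque activations passed through a first-order filter. The walker model itself is omitted and the time constants below are invented; this only illustrates the signal flow.

```python
def reward(hip_x_new: float, hip_x_old: float, torso_altitude: float) -> float:
    """Forward hip progress scaled by torso altitude: fast and upright is rewarded."""
    return (hip_x_new - hip_x_old) * torso_altitude

class FirstOrderFilter:
    """Smooths discrete torque activations before they reach a joint."""

    def __init__(self, tau: float, dt: float):
        self.alpha = dt / (tau + dt)   # standard discrete low-pass gain
        self.y = 0.0

    def step(self, u: float) -> float:
        self.y += self.alpha * (u - self.y)
        return self.y

filt = FirstOrderFilter(tau=0.1, dt=0.01)             # assumed time constants
print([round(filt.step(1.0), 3) for _ in range(5)])   # torque easing toward 1.0
```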