31 |
Relationship of cognitive style and reinforcement learning in counseling / Riemer, Helmut Herbert January 1967 (has links)
No description available.
|
32 |
The Application of Reinforcement Learning for Interceptor Guidance / Porter, Daniel Michael 04 October 2024 (has links)
The progression of hypersonic vehicle research and development has presented a challenge to modern missile defenses. These attack vehicles travel at speeds of Mach 5+, have low trajectories that result in late radar detections, and can be highly maneuverable. To counter this, new interceptors must be developed. This work explores using machine learning for the guidance of these interceptors through applied steering commands, with the intent to improve upon traditional guidance methods. Specifically, proximal policy optimization (PPO) was selected as the reinforcement learning algorithm due to its efficiency and its successful use in related work. A framework was developed and tuned for the interceptor guidance problem, combining the PPO algorithm with a specialized reward shaping method and tuned parameters for the engagements of interest. Low-fidelity vehicle models were used to reduce training time and narrow the scope of work towards improving the guidance algorithms. Models were trained and tested on several case studies to understand the benefits and limitations of an intelligently guided interceptor. Performance comparisons between the trained guidance models and traditional methods of guidance were made for cases with supersonic, hypersonic, weaving, and dynamically evasive attack vehicles. The models performed well with initial conditions outside their training sets, but more significant differences in the engagements had to be represented in training for the models to handle them. The models were therefore found to be more rigid than desired, limiting their effectiveness in new engagements. Compared to the traditional methods, the PPO-guided interceptor was able to intercept the attacker faster in most cases, and had a smaller miss distance against several evasive attackers. However, the PPO-guided interceptor had a lower percent kill against non-maneuvering attackers, and typically required larger lateral acceleration commands than traditional methods. This work acts as a strong foundation for using machine learning to guide missile interceptors, and presents both benefits and limitations of a current implementation. Proposals for future efforts involve increasing the fidelity and complexity of the vehicles, engagements, and guidance methods. / Master of Science / Hypersonic vehicles are advanced threats that are difficult to intercept due to their low trajectories, maneuverability, and high speeds. Machine learning is used to train a model to intelligently guide an interceptor against an attack vehicle, with the goal of protecting a target. A framework is developed and tuned to address the specifics of this problem space, using an existing advanced algorithm. Various case studies are explored, with both maneuvering and non-maneuvering attackers. The non-maneuvering cases include supersonic, constant-velocity engagements with one or more targets, as well as engagements with hypersonic attackers and an initially stationary interceptor. The evasive methods include preplanned weaving maneuvers and dynamic evasion. The test results from these guidance models are then compared to traditional methods of guidance. Although the performance varied by case, the machine learning models were found to be fairly rigid and did not perform well in engagements that significantly differed from what they were trained on. However, some performance benefits were observed, and additional strategies may be required to increase adaptability. This work provides a foundation for proposed future work, including improving the fidelity of the models and the complexity of the engagements.
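The abstract does not reproduce the thesis's shaping terms, but the reward shaping idea it describes can be illustrated with a minimal sketch: a dense term for closing the interceptor-attacker range plus a terminal intercept bonus. The function name, arguments, and coefficients below are assumptions for illustration, not values from the thesis.

```python
def shaped_reward(range_prev, range_curr, intercepted,
                  k_closing=1.0, hit_bonus=100.0):
    """Toy shaped reward for an intercept engagement: a dense
    range-closing term plus a terminal bonus. Coefficients are
    illustrative placeholders, not the thesis's tuned values."""
    reward = k_closing * (range_prev - range_curr)  # positive when closing
    if intercepted:
        reward += hit_bonus  # one-time bonus on a successful intercept
    return reward

# Example: range closes from 1200 m to 1150 m in one step, no intercept yet
print(shaped_reward(1200.0, 1150.0, intercepted=False))  # 50.0
```

Dense shaping of this kind is a common way to keep a PPO policy gradient informative over long engagements, at the cost of coefficients that must be re-tuned per engagement class.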
|
33 |
Nonparametric Inverse Reinforcement Learning and Approximate Optimal Control with Temporal Logic Tasks / Perundurai Rajasekaran, Siddharthan 30 August 2017 (has links)
"This thesis focuses on two key problems in reinforcement learning: How to design reward functions to obtain intended behaviors in autonomous systems using the learning-based control? Given complex mission specification, how to shape the reward function to achieve fast convergence and reduce sample complexity while learning the optimal policy? To answer these questions, the first part of this thesis investigates inverse reinforcement learning (IRL) method with a purpose of learning a reward function from expert demonstrations. However, existing algorithms often assume that the expert demonstrations are generated by the same reward function. Such an assumption may be invalid as one may need to aggregate data from multiple experts to obtain a sufficient set of demonstrations. In the first and the major part of the thesis, we develop a novel method, called Non-parametric Behavior Clustering IRL. This algorithm allows one to simultaneously cluster behaviors while learning their reward functions from demonstrations that are generated from more than one expert/behavior. Our approach is built upon the expectation-maximization formulation and non-parametric clustering in the IRL setting. We apply the algorithm to learn, from driving demonstrations, multiple driver behaviors (e.g., aggressive vs. evasive driving behaviors). In the second task, we study whether reinforcement learning can be used to generate complex behaviors specified in formal logic — Linear Temporal Logic (LTL). Such LTL tasks may specify temporally extended goals, safety, surveillance, and reactive behaviors in a dynamic environment. We introduce reward shaping under LTL constraints to improve the rate of convergence in learning the optimal and probably correct policies. Our approach exploits the relation between reward shaping and actor-critic methods for speeding up the convergence and, as a consequence, reducing training samples. We integrate compositional reasoning in formal methods with actor-critic reinforcement learning algorithms to initialize a heuristic value function for reward shaping. This initialization can direct the agent towards efficient planning subject to more complex behavior specifications in LTL. The investigation takes the initial step to integrate machine learning with formal methods and contributes to building highly autonomous and self-adaptive robots under complex missions."
|
34 |
Cost and Power Loss Aware Coalitions under Uncertainty in Transactive Energy Systems / Sadeghi, Mohammad 02 June 2022 (has links)
The need to cope with the rapid transformation of the conventional electrical grid into the future smart grid, with multiple connected microgrids, has led to the investigation of optimal smart grid architectures. The main components of future smart grids, such as generators, substations, controllers, smart meters, and collector nodes, are evolving; however, truly effective integration of these elements into the microgrid context to guarantee intelligent and dynamic functionality across the whole smart grid remains an open issue. Energy trading is a significant part of this integration.
In microgrids, energy trading refers to the use of surplus energy in one microgrid to satisfy the demand of another microgrid or a group of microgrids that form a microgrid community. Different techniques are employed to manage the energy trading process, such as optimization-based and conventional game-theoretical methods, which face several challenges, including complexity, scalability, and the ability to adapt to dynamic environments. A common challenge among all of these methods is adapting to changing circumstances. Optimization methods, for example, show promising performance in static scenarios, where the optimal solution is achieved for a specific snapshot of the system. To use such a technique in a dynamic environment, however, optimal solutions must be found for every time slot, which imposes significant complexity. Challenges such as this can be best addressed using game theory techniques empowered with machine learning methods across grid infrastructure and microgrid communities.
In this thesis, novel Bayesian coalitional game theory-based and Bayesian reinforcement learning-based coalition formation algorithms are proposed, which allow microgrids to exchange energy with their coalition members while minimizing the associated cost and power loss. In addition, a deep reinforcement learning scheme is developed to address the long convergence times that result from the sizeable state-action space of the methods mentioned above. The proposed algorithms are designed to cope with the uncertainty in the system. The advantages of the proposed methods are highlighted by comparing them with conventional coalitional game theory-based techniques, a Q-learning-based technique, random coalition formation, as well as with the case with no coalitions. The results show the superiority of the proposed methods in terms of power loss and cost minimization in dynamic environments.
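For context on the Q-learning baseline mentioned above, a single tabular update has the familiar form sketched below; the coalition-flavored reading of state and action is an assumption for illustration, not the thesis's formulation.

```python
import numpy as np

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step. In a coalition-formation reading,
    `state` might encode a microgrid's surplus/demand level and `action`
    the coalition it joins; both encodings are illustrative."""
    td_target = reward + gamma * np.max(Q[next_state])   # bootstrapped target
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q

# Toy usage: 4 discrete grid states, 3 candidate coalitions
Q = np.zeros((4, 3))
Q = q_update(Q, state=0, action=1, reward=-2.5, next_state=2)
```

When the joint state-action space of many microgrids grows sizeable, this table becomes intractable, which is the convergence-time problem the thesis's deep reinforcement learning scheme targets by replacing the table with a function approximator.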
|
35 |
FAST(ER) DATA GENERATION FOR OFFLINE RL AND FPS ENVIRONMENTS FOR DECISION TRANSFORMERS / Mark R Trovinger (17549493) 06 December 2023 (has links)
<p dir="ltr">Reinforcement learning algorithms have traditionally been implemented with the goal</p><p dir="ltr">of maximizing a reward signal. By contrast, Decision Transformer (DT) uses a transformer</p><p dir="ltr">model to predict the next action in a sequence. The transformer model is trained on datasets</p><p dir="ltr">consisting of state, action, return trajectories. The original DT paper examined a small</p><p dir="ltr">number of environments, five from the Atari domain, and three from continuous control,</p><p dir="ltr">and one that examined credit assignment. While this gives an idea of what the decision</p><p dir="ltr">transformer can do, the variety of environments in the Atari domain are limited. In this</p><p dir="ltr">work, we propose an extension of the environments that decision transformer can be trained</p><p dir="ltr">on by adding support for the VizDoom environment. We also developed a faster method for</p><p dir="ltr">offline RL dataset generation, using Sample Factory, a library focused on high throughput,</p><p dir="ltr">to generate a dataset comparable in quality to existing methods using significantly less time.</p><p dir="ltr"><br></p>
|
36 |
Physics-based reinforcement learning for autonomous manipulation / Scholz, Jonathan 07 January 2016 (has links)
With recent research advances, the dream of bringing domestic robots into our everyday lives has become more plausible than ever. Domestic robotics has grown dramatically in the past decade, with applications ranging from house cleaning to food service to health care. To date, the majority of the planning and control machinery for these systems is carefully designed by human engineers. A large portion of this effort goes into selecting the appropriate models and control techniques for each application, and these skills take years to master. Relieving the burden on human experts is therefore a central challenge for bringing robot technology to the masses.
This work addresses this challenge by introducing a physics engine as a model space for an autonomous robot, and defining procedures for enabling robots to decide when and how to learn these models. We also present an appropriate space of motor controllers for these models, and introduce ways to intelligently select when to use each controller based on the estimated model parameters. We integrate these components into a framework called Physics-Based Reinforcement Learning, which features a stochastic physics engine as the core model structure. Together these methods enable a robot to adapt to unfamiliar environments without human intervention.
The central focus of this thesis is on fast online model learning for objects with under-specified dynamics. We develop our approach across a diverse range of domestic tasks, starting with a simple table-top manipulation task, followed by a mobile manipulation task involving a single utility cart, and finally an open-ended navigation task with multiple obstacles impeding robot progress. We also present simulation results illustrating the efficiency of our method compared to existing approaches in the learning literature.
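A minimal sketch of the model-learning step at the heart of this approach: pick the physics-engine parameters that best explain observed transitions. The grid search and the toy 1-D dynamics below are illustrative assumptions, not the thesis's stochastic inference procedure.

```python
import numpy as np

def fit_physics_params(transitions, simulate, candidates):
    """Select physics parameters minimizing one-step prediction error.
    transitions: iterable of (state, action, next_state)
    simulate(state, action, params): the engine's one-step prediction
    candidates: candidate parameter values (e.g. mass, friction)."""
    def prediction_error(params):
        return sum(float(np.linalg.norm(simulate(s, a, params) - s_next)) ** 2
                   for s, a, s_next in transitions)
    return min(candidates, key=prediction_error)

# Toy usage: a 1-D sliding block with unknown friction coefficient mu
def simulate(s, a, mu, dt=0.1):
    pos, vel = s
    vel = vel + (a - mu * vel) * dt  # Euler step with linear friction
    return np.array([pos + vel * dt, vel])

s0 = np.array([0.0, 1.0])
obs = [(s0, 0.0, simulate(s0, 0.0, 0.3))]                   # data from mu = 0.3
print(fit_physics_params(obs, simulate, [0.1, 0.3, 0.5]))   # 0.3
```

Estimating a handful of interpretable physical parameters, rather than a free-form dynamics model, is what makes online model learning fast enough for objects with under-specified dynamics like the utility cart.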
|
37 |
Sequential frameworks for statistics-based value function representation in approximate dynamic programming / Fan, Huiyuan. January 2008 (has links)
Thesis (Ph.D.)--University of Texas at Arlington, 2008.
|
38 |
Corporate classifier systemsTomlinson, Andrew Stephen January 1999 (has links)
No description available.
|
39 |
Reinforcement-learning-based autonomous vehicle navigation in a dynamically changing environment / Ngai, Chi-kit. January 2007 (has links)
Thesis (Ph. D.)--University of Hong Kong, 2008. / Also available in print.
|
40 |
Learning successful strategies in repeated general-sum games / Crandall, Jacob W., January 2005 (has links) (PDF)
Thesis (Ph.D.)--Brigham Young University. Dept. of Computer Science, 2005. / Includes bibliographical references (p. 163-168).
|