531
Time-normalised discounting in reinforcement learning
Akan, Oguzhan; Waara Ankarstrand, Wilmer (January 2024)
Reinforcement learning has emerged as a powerful paradigm in machine learning, witnessing remarkable progress in recent years. Among reinforcement learning algorithms, Q-learning stands out, enabling agents to learn quickly from past actions. This study aims to investigate and enhance Q-learning methodologies, with a specific focus on tabular Q-learning. In particular, it addresses Q-learning with an action space containing actions that require different amounts of time to execute. With such an action space, the algorithm might converge to a suboptimal solution when using a constant discount factor, since discounting occurs per action and not per time step. We refer to this issue as the non-temporal discounting (NTD) problem. By introducing a time-normalised discounting function, we were able to address the issue of NTD. In addition, we were able to stabilise the solution by implementing a cost for specific actions. As a result, the model converged to the expected solution. Building on these results, it would be worthwhile to implement time-normalised discounting in a state-of-the-art reinforcement learning model such as deep Q-learning.
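The fix the abstract describes is compact enough to sketch. Below is a minimal illustration of a tabular Q-learning update with a time-normalised discount, assuming Q-values in a NumPy array and an environment that reports how many time steps each action consumed; the function name and signature are illustrative, not the thesis's actual code, and a full treatment would also discount reward accrued within an action's duration.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, duration, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step with a time-normalised discount.

    A constant per-action discount would use gamma directly; here the
    effective discount is gamma ** duration, so an action that takes
    three time steps is discounted as heavily as three unit-length
    actions, removing the bias toward long-duration actions.
    """
    effective_gamma = gamma ** duration
    target = r + effective_gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

Q = np.zeros((5, 2))                      # 5 states, 2 actions
q_update(Q, s=0, a=1, r=1.0, s_next=3, duration=3)
```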
532
Inverse Reinforcement Learning and Routing Metric Discovery
Shiraev, Dmitry Eric (01 September 2003)
Uncovering the metrics and procedures employed by an autonomous networking system is an important problem with applications in instrumentation, traffic engineering, and game-theoretic studies of multi-agent environments. This thesis presents a method for using inverse reinforcement learning (IRL) techniques to discover a composite metric used by a dynamic routing algorithm on an Internet Protocol (IP) network. The network and routing algorithm are modeled as a reinforcement learning (RL) agent and a Markov decision process (MDP). The problem of routing metric discovery is then posed as a problem of recovering the reward function, given observed optimal behavior. We show that this approach is empirically suited for determining the relative contributions of factors that constitute a composite metric. Experimental results for many classes of randomly generated networks are presented. / Master of Science
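To make the reward-recovery idea concrete, here is a small self-contained sketch, not the thesis's method: it assumes the composite metric is a linear combination of per-path features (e.g. delay, inverse bandwidth, load) and recovers the relative weights from observed route choices with a simple max-margin fit. All names and the synthetic data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([0.6, 0.3, 0.1])        # hidden composite-metric weights

# Synthetic observations: for each routing decision, two candidate paths,
# each summarised by aggregate features (delay, 1/bandwidth, load).
pairs = rng.uniform(0.0, 1.0, size=(200, 2, 3))
costs = pairs @ true_w
idx = costs.argmin(axis=1)                # the router picks the cheaper path
chosen = pairs[np.arange(200), idx]
rejected = pairs[np.arange(200), 1 - idx]

# Max-margin fit: push the chosen path's cost below the alternative's,
# encoding the IRL premise that observed behavior is optimal.
w = np.ones(3)
for _ in range(2000):
    margin = (chosen - rejected) @ w + 0.1
    grad = ((margin > 0)[:, None] * (chosen - rejected)).mean(axis=0)
    w = np.clip(w - 0.05 * grad, 0.0, None)

print("recovered relative weights:", np.round(w / w.sum(), 2))
```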
533
Encoding the Sensor Allocation Problem for Reinforcement Learning
Penn, Dylan R. (16 May 2024)
Traditionally, space situational awareness (SSA) sensor networks have relied on dynamic programming theory to generate tasking plans that govern how sensors are allocated to observe resident space objects. Deep reinforcement learning (DRL) techniques, which can be trained on simulated environments (readily available for the SSA sensor allocation problem) and have demonstrated performance in other fields, have the potential to exceed the performance of deterministic methods. The research presented in this dissertation develops techniques for encoding an SSA environment model so that DRL can be applied to the sensor allocation problem. This dissertation is the compilation of two separate but related studies. The first study compares two alternative invalid-action handling techniques, penalization and masking. The second study examines the performance of policies that incorporate forecast state knowledge in the observation space. / Doctor of Philosophy / Resident space objects (RSOs) are typically tracked by ground-based sensors (telescopes and radar). Determining how to allocate sensors to RSOs is a complex problem traditionally solved with dynamic programming techniques. Deep reinforcement learning (DRL), a subset of machine learning, has demonstrated performance in other fields and has the potential to exceed the performance of traditional techniques. The research presented in this dissertation develops techniques for encoding a space situational awareness environment model so that DRL can be applied to the sensor allocation problem. This dissertation is the compilation of two separate but related studies. The first study compares two alternative invalid-action handling techniques, penalization and masking. The second study examines the performance of policies that incorporate forecast state knowledge in the observation space.
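The two invalid-action schemes from the first study are easy to contrast in code. A minimal sketch follows, with an assumed environment API (`action_is_valid`, `step`) and an assumed penalty value; it illustrates the two general techniques, not the dissertation's implementation.

```python
import numpy as np

def penalized_step(env, action):
    """Penalization: invalid actions are permitted but earn a large
    negative reward and leave the environment state unchanged."""
    if not env.action_is_valid(action):      # assumed API
        return env.state, -10.0, False       # (state, reward, done)
    return env.step(action)

def masked_policy(logits, valid_mask):
    """Masking: invalid actions are excluded before the policy samples,
    by sending their logits to -inf ahead of the softmax."""
    masked = np.where(valid_mask, logits, -np.inf)
    probs = np.exp(masked - masked.max())
    return probs / probs.sum()

print(masked_policy(np.array([2.0, 1.0, 0.5]),
                    np.array([True, False, True])))  # invalid action gets p = 0
```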
534
Learning-based Optimal Control of Time-Varying Linear Systems Over Large Time Intervals
Baddam, Vasanth Reddy (January 2023)
We solve the problem of two-point boundary optimal control of linear time-varying systems with unknown model dynamics using reinforcement learning. Leveraging singular perturbation theory techniques, we transform the time-varying optimal control problem into two time-invariant subproblems. This allows the use of an off-policy iteration method to learn the controller gains. We show that the performance of the learning-based controller approximates that of the model-based optimal controller, and that the approximation accuracy improves as the control problem's time horizon increases. We also provide a simulation example to verify the results. / M.S. / We use reinforcement learning to find two-point boundary optimal controls for linear time-varying systems with unknown model dynamics. We divide the LTV control problem into two LTI subproblems using singular perturbation theory techniques. As a result, the controller gains can be identified via a learning technique. We show that the learning-based controller's performance approaches that of the model-based optimal controller, with approximation accuracy growing with the time horizon of the control problem. In addition, we provide a simulated scenario to back up our findings.
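For intuition, here is a minimal policy-iteration sketch for one of the resulting time-invariant subproblems, framed as discrete-time LQR. Unlike the thesis's off-policy method, this sketch assumes the model (A, B) is known; it only illustrates the evaluate-then-improve loop that the data-driven version replicates without the model. All matrices are illustrative.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.95, 0.10],
              [0.00, 0.90]])
B = np.array([[0.0],
              [0.1]])
Q, R = np.eye(2), np.eye(1)

K = np.zeros((1, 2))                       # initial stabilising gain
for _ in range(30):
    Acl = A - B @ K
    # Policy evaluation: cost matrix P of the current gain K solves
    # Acl' P Acl - P + Q + K' R K = 0.
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Policy improvement.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

print("converged gain K:", np.round(K, 4))
```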
535
A Shaping Procedure for Introducing Horses to Clipping
Hardaway, Alison K (12 1900)
The purpose of the current study was to evaluate a procedure for introducing horses to clipping. Negative reinforcement was used within a shaping paradigm. Shaping steps were conducted by the handler, starting with touching the horse with the hand, then touching the horse with the clippers while they were off, and culminating with touching the horse with the clippers while they were on. When a horse broke contact with either the hand or the clippers, the hand or the clippers were held at that point until the horse emitted an appropriate response. When the horse emitted an appropriate response, the clippers were removed and the handler stepped away from the horse. For all eight horses, this shaping plan was effective in enabling each horse to be clipped with minimal inappropriate behavior and without additional restraint. The entire process took under an hour per horse.
536
MILA - ein Labordemonstrator zur Erprobung von Methoden des Imitationslernens für autonome Arbeitsmaschinen (MILA: a laboratory demonstrator for testing imitation-learning methods for autonomous work machines)
Menz, C.; Fränzel, N.; Wenzel, A. (18 February 2025)
This paper presents a laboratory demonstrator for testing imitation-learning methods for autonomous work machines. The use of miniaturised models provides a cost-efficient and flexible test platform with which realistic scenarios can be reproduced at small scale. Following an introduction to the topic, the hardware and software design of the demonstrator is explained and the current state of implementation is discussed. Its functionality is demonstrated with an introductory application example: position control of a tractor model. The system presented is suitable as a basis for testing further scenarios and methods, and for their possible transfer to real applications.
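As a flavour of the imitation-learning pipeline such a demonstrator supports, here is a deliberately tiny behavioural-cloning sketch: fit a steering command to logged (state, action) demonstrations. The state layout, the linear model, and the synthetic "expert" are all assumptions for illustration, not part of the demonstrator described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Logged demonstrations: state = (lateral offset, heading error),
# action = steering command produced by a human driving the model tractor.
states = rng.uniform(-1.0, 1.0, size=(500, 2))
expert_gain = np.array([0.8, 1.5])                  # hidden expert behaviour
actions = states @ expert_gain + 0.01 * rng.normal(size=500)

# Behavioural cloning reduced to least squares for a linear policy.
learned_gain, *_ = np.linalg.lstsq(states, actions, rcond=None)
print("learned steering gains:", np.round(learned_gain, 2))
```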
537
Effects of Click + Continuous Food vs. Click + Intermittent Food on the Maintenance of Dog Behavior
Wennmacher, Pamela L. (05 1900)
There is disagreement among clicker trainers on whether food should be delivered every time the clicker (a conditioned reinforcer) is used. However, presenting a conditioned reinforcer without food can weaken the strength of the conditioned reinforcer and also disrupt its discriminative stimulus function. A within-subjects reversal design was used with two dogs to compare the behavioral effects of continuous pairings (C+F condition) vs. intermittent pairings (C+C+F condition) of the clicker with food. Results show that the C+C+F condition affects the frequency, accuracy, topography, and intensity of the behavior, and increases noncompliance and other unwanted behaviors. This study adds to the literature by evaluating the effects of conditioned reinforcement in an applied setting using discrete trials without undergoing extinction.
538
Internal-External Locus of Control, Perception of Teacher Intermittency of Reinforcement and Achievement
Welch, Linda N. (12 1900)
This study measured the relationships between locus of control, students' perception of the schedule of teacher reinforcement, and academic achievement. The Intellectual Achievement Responsibility questionnaire, Perception of Teacher Reinforcement scale, and Wide Range Achievement Test were used to measure these variables. All subscores of the Intellectual Achievement Responsibility questionnaire correlated significantly with achievement for the females, but no relationships were found for the males. Perception of the teacher as partially rewarding was significantly correlated with reading, spelling, and total achievement for the males and with reading and arithmetic achievement for the females. Perception of the teacher as partially punishing was significantly correlated with arithmetic achievement for the males, but was not related to achievement for the females.
539
Learned Helplessness: The Result of the Uncontrollability of Reinforcement or the Result of the Uncontrollability of Aversive Stimuli?
Benson, James S. (08 1900)
This research demonstrates that experience with uncontrollable reinforcement, here defined as continuous non-contingent positive feedback to solution attempts on insoluble problems, fails to produce the proactive interference phenomenon of learned helplessness, while uncontrollable aversive events, here defined as negative feedback to solution attempts on insoluble problems, do produce that phenomenon. These results partially support the "learned helplessness" hypothesis of Seligman (1975), which predicts that experience with uncontrollable reinforcement, whether the offset of negative events or the onset of positive ones, results in learning that responding is independent of reinforcement, and that this learning transfers to subsequent situations. This research further demonstrates that experience with controllability, here defined as solubility, results in enhanced competence.
540
An Evaluation of Negative Reinforcement During Error Correction Procedures
Maillard, Gloria Nicole (12 1900)
This study evaluated the effects of error correction procedures on sight-word acquisition. Participants were four typically developing children in kindergarten and first grade. We used an adapted alternating treatment design embedded within a multiple baseline design to evaluate the instructional efficacy of two error correction procedures, one with preferred items plus error correction and one with error correction only, and a concurrent-chains schedule to evaluate participant preference for instructional procedure. The results show no difference in acquisition rates between the procedures. The evaluation also showed that children prefer procedures that include a positive reinforcement component.