291
The Effects of Contingency Type on Accuracy and Reaction Time
Adams, Owen James, 08 1900
Positive and negative reinforcement contingencies have been compared in terms of preference, but their differential effects on reaction time and accuracy, with other variables controlled, remain unclear. Fifteen undergraduate students participated in a sound discrimination task that involved random mixed-trial presentations of positive and negative reinforcement contingencies. The participants' goal was to correctly identify whether a tone was shorter or longer than 600 milliseconds. On positive reinforcement trials, participants received feedback and money tallies only if they identified the sound length correctly, with each correct response earning 10 cents. On negative reinforcement trials, participants received feedback and money tallies only if they identified the sound length incorrectly, with each incorrect response subtracting 10 cents from the participant's total (which began at $4.00 to equalize the weights of the positive and negative reinforcement contingencies). Accuracy analyses showed a curvilinear relationship between each participant's number of errors and the binned duration of the sound stimulus, with no differences between the positive and negative reinforcement conditions. Results also indicated weak negative linear correlations at the single-subject level between comparison-stimulus duration and reaction time, with similar slopes for positive and negative reinforcement trials, and strong curvilinear correlations at the group level, indicating differences between grouped and individual analyses. Overall, our results appear to support abandoning the distinction between positive and negative reinforcement as two separate behavioral processes.
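To make the symmetric payoff structure concrete, here is a minimal Python sketch of the mixed-trial contingencies as described above; the trial count, tone-duration range, and random "participant" are illustrative assumptions, not the study's parameters.

```python
import random

def apply_contingency(contingency, tone_ms, response, balance):
    """Payoff rules as described in the abstract: positive trials add
    10 cents for a correct response; negative trials subtract 10 cents
    for an incorrect one. Feedback and tallies occur only on those trials."""
    correct = (response == "long") == (tone_ms > 600)
    if contingency == "positive" and correct:
        balance += 0.10
    elif contingency == "negative" and not correct:
        balance -= 0.10
    return balance

balance = 4.00  # starting amount, equalizing the two contingencies
for _ in range(40):  # illustrative trial count (not the study's)
    contingency = random.choice(["positive", "negative"])  # mixed trials
    tone_ms = random.uniform(300, 900)  # assumed comparison-duration range
    response = random.choice(["short", "long"])  # a guessing participant
    balance = apply_contingency(contingency, tone_ms, response, balance)
print(f"Final balance: ${balance:.2f}")
```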
292
Deep Reinforcement Learning of IoT System Dynamics for Optimal Orchestration and Boosted Efficiency
Haowei Shi (16636062), 30 August 2023
This thesis targets the orchestration challenge of wearable Internet of Things (IoT) systems: finding optimal system configurations in terms of energy efficiency, computing, and data-transmission activities. We first investigated reinforcement learning in simulated IoT environments to demonstrate its effectiveness, and then studied the algorithm on real-world wearable motion data to show its practical promise. More specifically, the first challenge arises in complex massive-device orchestration: the massive devices and the gateway/server must all be configured and managed. On the device side, the complexity lies in the diverse energy budgets, computing efficiencies, and so on; on the phone or server side, it lies in how this global diversity can be analyzed and how the system configuration can be optimized. We therefore propose a new reinforcement learning architecture, called boosted deep deterministic policy gradient, with enhanced actor-critic co-learning and multi-view state transformation. The proposed actor-critic co-learning allows for enhanced dynamics abstraction through a shared neural network component. Evaluated on a simulated massive-device task, the proposed deep reinforcement learning framework achieved much more efficient system configurations, with enhanced computing capability and improved energy efficiency. Secondly, we leveraged real-world motion data to demonstrate the potential of reinforcement learning to optimally configure motion sensors. We used sequential-data-estimation paradigms to obtain estimated data for some sensors, saving energy because those sensors no longer need to be activated during estimation intervals. We then applied the Deep Deterministic Policy Gradient algorithm to learn to control the estimation timing. This study provides a real-world demonstration of maximizing the energy efficiency of wearable IoT applications while maintaining data accuracy. Overall, this thesis advances wearable IoT system orchestration toward optimal system configurations.
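The abstract describes the shared neural network component only at a high level. A minimal PyTorch sketch of a DDPG-style actor-critic pair sharing a state encoder conveys the co-learning idea; the class names, layer sizes, and dimensions below are illustrative assumptions, not the thesis's boosted architecture.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """State encoder shared by actor and critic; gradients from both
    losses shape it, one reading of 'actor-critic co-learning'."""
    def __init__(self, state_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())

    def forward(self, state):
        return self.net(state)

class Actor(nn.Module):
    """Maps the encoded device/system state to a deterministic
    configuration action scaled to [-1, 1]."""
    def __init__(self, encoder, hidden_dim, action_dim):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(nn.Linear(hidden_dim, action_dim), nn.Tanh())

    def forward(self, state):
        return self.head(self.encoder(state))

class Critic(nn.Module):
    """Estimates Q(s, a) from the shared encoding and the action."""
    def __init__(self, encoder, hidden_dim, action_dim):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden_dim + action_dim, 1)

    def forward(self, state, action):
        return self.head(torch.cat([self.encoder(state), action], dim=-1))

encoder = SharedEncoder(state_dim=16)
actor, critic = Actor(encoder, 128, 4), Critic(encoder, 128, 4)
```

Because both networks backpropagate through the same encoder, the critic's TD error and the actor's policy objective jointly shape the state abstraction, which is one plausible mechanism for the enhanced dynamics abstraction claimed above.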
293
Offline Reinforcement Learning from Imperfect Human Guidance / 不完全な人間の誘導からのオフライン強化学習
Zhang, Guoxi, 24 July 2023
Kyoto University / New-system doctoral program / Doctor of Informatics / Degree No. Kō 24856 / Informatics Doctorate No. 838 / Call number 新制||情||140 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / Examining committee: Professor Hisashi Kashima (chair), Professor Tatsuya Kawahara, Professor Jun Morimoto / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
294
Multi-Task Reinforcement Learning: From Single-Agent to Multi-Agent Systems
Trang, Matthew Luu, 06 January 2023
Generalized collaborative drones are a technology with many potential benefits. General-purpose drones that can handle exploration, navigation, manipulation, and more without being reprogrammed would be an immense breakthrough for the usability and adoption of the technology. The ability to develop these multi-task, multi-agent drone systems is limited by the lack of available training environments, as well as by a deficiency of multi-task learning known as catastrophic forgetting. In this thesis, we present a set of simulation environments for exploring the abilities of multi-task drone systems and provide a platform for testing agents in incremental single-agent and multi-agent learning scenarios. The multi-task platform extends an existing drone simulation environment, written in Python using the PyBullet Physics Simulation Engine, to incorporate these environments. Using this platform, we present an analysis of incremental learning and detail its beneficial impact on multi-task learning speed and catastrophic forgetting. Finally, we introduce a novel algorithm, Incremental Learning with Second-Order Approximation Regularization (IL-SOAR), to mitigate some of the effects of catastrophic forgetting in multi-task learning. We show the impact of this method and contrast its performance with a multi-agent multi-task approach that uses a centralized policy-sharing algorithm. / Master of Science / Machine learning techniques allow drones to be trained to achieve tasks that are otherwise time-consuming or difficult. The goal of this thesis is to facilitate the creation of these complex drone machine learning systems by exploring reinforcement learning (RL), a field of machine learning in which the correct actions are learned through experience. Currently, RL methods are effective in designing drones that can solve one particular task. The next step for this technology is to develop RL systems that can generalize and perform well across multiple tasks. In this thesis, simulation environments in which drones learn complex tasks are created, and algorithms that can train drones on multiple hard tasks are developed and tested. We explore the benefits of a specific multi-task training technique known as incremental learning. Additionally, we consider one of the prohibitive factors of multi-task machine-learning solutions: the degradation of agent performance on previously learned tasks, known as catastrophic forgetting. We create an algorithm that aims to prevent the impact of forgetting when drones are trained sequentially on new tasks. We contrast this approach with a multi-agent solution in which multiple drones learn simultaneously across the tasks.
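IL-SOAR itself is not specified in the abstract. "Second-order approximation regularization" suggests an elastic-weight-consolidation-style quadratic penalty, so the following is a hedged PyTorch sketch of that family; the function name and the diagonal-curvature choice are our assumptions, not the thesis's actual algorithm.

```python
import torch

def soar_penalty(model, anchors, curvature, strength=1.0):
    """Quadratic penalty keeping parameters near the values learned on
    earlier tasks, weighted by an estimate of the loss curvature there
    (a diagonal second-order approximation, as in elastic weight
    consolidation)."""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        penalty = penalty + (curvature[name] * (p - anchors[name]) ** 2).sum()
    return strength * penalty

# After finishing task k-1, snapshot per-parameter anchors and curvature;
# while training task k, minimize:
#   task_loss + soar_penalty(model, anchors, curvature)
```

The intuition matches the incremental-learning setting described above: parameters that mattered for earlier tasks (high curvature) are held close to their old values, while unimportant ones stay free to adapt to the new task.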
295
The Effects of Combining Positive and Negative Reinforcement During Training
Murrey, Nicole A., 05 1900
The purpose of this experiment was to compare the effects of combining negative and positive reinforcement during teaching with the effects of using positive reinforcement alone. A behavior was trained under two stimulus conditions and procedures. One method involved presenting the cue "ven" and reinforcing successive approximations to the target behavior. The other method involved presenting the cue "punir", physically prompting the target behavior by pulling the leash, and delivering a reinforcer. Three other behaviors were trained with the two cues presented contingent on their occurrence. The results suggest that stimuli associated with both a positive reinforcer and an aversive stimulus produce a different dynamic than situations that use positive reinforcement or punishment alone.
296
CHILDREN'S DISCRIMINATION LEARNING AS A FUNCTION OF POSITIVE AND NEGATIVE CONSEQUENCES AND ORIENTING RESPONSES
Durning, Kathleen Phyllis, 1945-, January 1973
No description available.
297
Action selection in modular reinforcement learning
Zhang, Ruohan, 16 September 2014
Modular reinforcement learning is an approach to resolving the curse-of-dimensionality problem in traditional reinforcement learning. We design and implement a modular reinforcement learning algorithm based on three major components: Markov decision process decomposition, module training, and global action selection. We define and formalize the concepts of module class and module instance in the decomposition step. Under our decomposition framework, we train each module efficiently using the SARSA($\lambda$) algorithm. We then design, implement, test, and compare three action selection algorithms based on different heuristics: Module Combination, Module Selection, and Module Voting. For the last two algorithms, we propose a method to calculate module weights efficiently, using the standard deviation of each module's Q-values. We show that the Module Combination and Module Voting algorithms produce satisfactory performance in our test domain.
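A minimal NumPy sketch of the three selection heuristics, with module weights derived from the standard deviation of each module's Q-values as the abstract describes; the exact combination and voting rules below are our reading, not necessarily the thesis's formulas.

```python
import numpy as np

def module_weights(q_tables, state):
    """Weight each module by the spread of its Q-values in this state:
    a module whose action values differ sharply has more at stake."""
    stds = np.array([np.std(q[state]) for q in q_tables])
    return stds / (stds.sum() + 1e-8)

def module_combination(q_tables, state):
    """Sum Q-values across modules, then act greedily on the total."""
    return int(np.argmax(sum(q[state] for q in q_tables)))

def module_selection(q_tables, state):
    """Let the single highest-weight module pick the action."""
    w = module_weights(q_tables, state)
    return int(np.argmax(q_tables[int(np.argmax(w))][state]))

def module_voting(q_tables, state):
    """Each module casts a weighted vote for its own greedy action."""
    w = module_weights(q_tables, state)
    votes = np.zeros(len(q_tables[0][state]))
    for weight, q in zip(w, q_tables):
        votes[int(np.argmax(q[state]))] += weight
    return int(np.argmax(votes))

# q_tables: one (n_states, n_actions) array per module, e.g. from SARSA(lambda)
q_tables = [np.random.rand(10, 4) for _ in range(3)]
print(module_combination(q_tables, 0), module_selection(q_tables, 0),
      module_voting(q_tables, 0))
```

The standard-deviation weighting is cheap to compute and captures how strongly a module's preferences discriminate among actions in the current state, which is why it serves as a proxy for module importance in the Selection and Voting schemes.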
298
Analysis and design of Laminated Veneer Lumber beams with holes
Ardalany, Manoochehr, January 2013
Timber has experienced renewed interest as a building material in recent times. Although it has traditionally been the main choice for residential construction in New Zealand, with recently introduced engineered wood products like Laminated Veneer Lumber (LVL) the use of timber has expanded to other sectors such as commercial, industrial, and multi-story buildings. The application of timber in office and commercial buildings poses challenges, with requirements for long-span timber beams that nevertheless need holes to pass services. Current construction practice with timber is not well suited to these types of structures. There has been significant progress in designing timber structures since the introduction of timber codes like NZS 3603 (Timber Structures Standard); however, a number of problems, such as holes in beams, are still not addressed in the code. Experimental and numerical investigation is required to address the problem properly. In Europe, there have been a few attempts to address the strength reduction caused by holes/penetrations in glulam beams. However, LVL has not received much attention, owing to its smaller production and use. While other researchers have focused on glulam beams with holes, this research targets LVL beams with holes. This thesis extends existing knowledge on LVL beams with holes, and on reinforcement around the holes, through experimental tests and numerical analysis. An experimental program on LVL specimens was performed to determine the material properties of interest, which are used in the analysis and design chapters throughout the thesis. A wide-ranging experimental program was also performed on beams with holes and on beams with reinforcement around the holes. The experimental program pushes forward existing testing methods by measuring the load in the reinforcement.
299
CONDITIONED REINFORCEMENT FROM SHOCK TERMINATION
HIMADI, WILLIAM GEORGE, January 1982
This study addressed the question of whether a stimulus paired with the termination of shock would acquire a positive conditioned reinforcing function. Previous investigators have suggested that a stimulus paired with shock termination must increase the frequency of a response upon which it is made contingent. This test for conditioned reinforcement is incomplete, because multiple stimulus functions established during conditioning trials can influence the rate of responding. The solution to this multiple-stimulus-control problem draws on the effects of reinforcement upon events antecedent to the criterion response: reinforcement results in the establishment of discriminative stimulus control. The test for conditioned reinforcement from shock termination, therefore, would involve using the presumed conditioned reinforcer to establish discriminative control over a response. Subjects were four male albino rats of the Wistar strain. The experimental procedure was divided into three phases. The initial phase involved consecutive trials in which a tone was paired with shock offset. The next phase continued the tone/shock-offset pairings and, in addition, sometimes presented the tone alone to establish a lever press. In the third phase an attempt was made to bring the lever press under the discriminative stimulus control of a light. A successful response-shaping effect was obtained for two of the four rats. There was no establishment of discriminative stimulus control over lever pressing for the two rats that proceeded to the discrimination test for conditioned reinforcement. Conditioned reinforcement from shock termination was not revealed in this study. Establishing stable discriminative control over the criterion response would require a strong reinforcer relative to the other established stimulus functions. Future research should concentrate on developing procedures that maximize the conditioned reinforcing properties while minimizing control from competing stimulus functions.
300
THE EFFECTS OF VERBAL ELABORATIONS AND SOCIAL REINFORCEMENT ON CHILDREN'S PERFORMANCE IN A SIMPLE DISCRIMINATION TASK
MANOS, MICHAEL JOHN, January 1982
In this study, six educationally disadvantaged children were taught beginning letter sounds under two teaching conditions. After a baseline of no intervention, a single-subject alternating-treatments design was used to compare contingent elaborations and token reinforcement within children. Performance between treatments was analyzed in terms of the cumulative number of letter sounds learned, the total number of letter sounds learned, and the maintenance of learning. Token probes were implemented to ascertain whether tokens remained functionally reinforcing over the course of the study. Five children responded to treatment over baseline. Three of these, characterized by above-average Wepman auditory discrimination scores, performed better under elaborations until the final third of the study, when differential performance between treatments was less pronounced. The remaining subjects, characterized by below-average auditory discrimination, showed similar learning under both treatments or, in the case of one child, no learning. No differences in maintenance were observed. Implications for the classroom and suggestions for further research are discussed.