291

Offline Reinforcement Learning from Imperfect Human Guidance

Zhang, Guoxi 24 July 2023 (has links)
Kyoto University / New-system doctoral program / Doctor of Informatics / Kou No. 24856 / Joho-haku No. 838 / Shinsei||Jo||140 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Hisashi Kashima, Professor Tatsuya Kawahara, Professor Jun Morimoto / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
292

Multi-Task Reinforcement Learning: From Single-Agent to Multi-Agent Systems

Trang, Matthew Luu 06 January 2023 (has links)
Generalized collaborative drones are a technology with many potential benefits. General-purpose drones that can handle exploration, navigation, manipulation, and more without being reprogrammed would be an immense breakthrough for the usability and adoption of the technology. The ability to develop these multi-task, multi-agent drone systems is limited by the lack of available training environments, as well as by deficiencies of multi-task learning caused by a phenomenon known as catastrophic forgetting. In this thesis, we present a set of simulation environments for exploring the abilities of multi-task drone systems and provide a platform for testing agents in incremental single-agent and multi-agent learning scenarios. The multi-task platform extends an existing drone simulation environment, written in Python using the PyBullet physics simulation engine, with these new environments. Using this platform, we present an analysis of Incremental Learning and detail its beneficial impact on multi-task learning speed and catastrophic forgetting. Finally, we introduce a novel algorithm, Incremental Learning with Second-Order Approximation Regularization (IL-SOAR), to mitigate some of the effects of catastrophic forgetting in multi-task learning. We show the impact of this method and contrast its performance with a multi-agent, multi-task approach that uses a centralized policy-sharing algorithm. / Master of Science / Machine learning techniques allow drones to be trained to achieve tasks that are otherwise time-consuming or difficult. The goal of this thesis is to facilitate the creation of these complex drone machine learning systems by exploring reinforcement learning (RL), a field of machine learning in which the correct actions are learned through experience. Currently, RL methods are effective in designing drones that can solve one particular task.
The next step for this technology is to develop RL systems that can generalize and perform well across multiple tasks. In this thesis, simulation environments in which drones learn complex tasks are created, and algorithms that can train drones on multiple hard tasks are developed and tested. We explore the benefits of a specific multi-task training technique known as Incremental Learning. Additionally, we consider one of the prohibitive factors of multi-task machine learning solutions: the degradation of agent performance on previously learned tasks, known as catastrophic forgetting. We create an algorithm that aims to prevent forgetting when training drones sequentially on new tasks, and we contrast this approach with a multi-agent solution in which multiple drones learn the tasks simultaneously.
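The abstract above does not spell out IL-SOAR's update rule, so the following is only a hedged sketch of the general idea behind second-order approximation regularizers for forgetting: anchor the new task's weights to the previously learned weights with a quadratic penalty scaled by a per-parameter curvature estimate (here a diagonal Fisher approximation). The names `soar_penalty` and `regularized_loss`, and the diagonal-Fisher choice, are illustrative assumptions, not the thesis's actual method.

```python
import numpy as np

def soar_penalty(theta, theta_star, curvature_diag, lam=1.0):
    """Quadratic (second-order) penalty anchoring the current weights to the
    weights learned on previous tasks, scaled per parameter by a diagonal
    curvature estimate (e.g., a diagonal Fisher-information approximation)."""
    return 0.5 * lam * np.sum(curvature_diag * (theta - theta_star) ** 2)

def regularized_loss(task_loss, theta, theta_star, curvature_diag, lam=1.0):
    """Objective for the new task: the plain task loss plus the second-order
    regularizer that discourages moving parameters important to old tasks."""
    return task_loss + soar_penalty(theta, theta_star, curvature_diag, lam)
```

Under this sketch, parameters with large curvature entries (those that mattered most for earlier tasks) are the most expensive to move, which is the mechanism by which such regularizers slow catastrophic forgetting.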
293

The Effects of Combining Positive and Negative Reinforcement During Training.

Murrey, Nicole A. 05 1900 (has links)
The purpose of this experiment was to compare the effects of combining negative and positive reinforcement during teaching with the effects of using positive reinforcement alone. A behavior was trained under two stimulus conditions and procedures. One method involved presenting the cue "ven" and reinforcing successive approximations to the target behavior. The other method involved presenting the cue "punir", physically prompting the target behavior by pulling the leash, and then delivering a reinforcer. Three other behaviors were trained using the two cues, contingent on their occurrence. The results suggest that stimuli associated with both a positive reinforcer and an aversive stimulus produce a different dynamic than a situation that uses positive reinforcement or punishment alone.
294

CHILDREN'S DISCRIMINATION LEARNING AS A FUNCTION OF POSITIVE AND NEGATIVE CONSEQUENCES AND ORIENTING RESPONSES

Durning, Kathleen Phyllis, 1945- January 1973 (has links)
No description available.
295

Action selection in modular reinforcement learning

Zhang, Ruohan 16 September 2014 (has links)
Modular reinforcement learning is an approach to resolving the curse of dimensionality in traditional reinforcement learning. We design and implement a modular reinforcement learning algorithm based on three major components: Markov decision process decomposition, module training, and global action selection. We define and formalize the concepts of module class and module instance in the decomposition step. Under our decomposition framework, we train each module efficiently using the SARSA($\lambda$) algorithm. We then design, implement, test, and compare three action selection algorithms based on different heuristics: Module Combination, Module Selection, and Module Voting. For the last two algorithms, we propose a method to calculate module weights efficiently, using the standard deviation of each module's Q-values. We show that the Module Combination and Module Voting algorithms produce satisfactory performance in our test domain.
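A minimal sketch of the weighting heuristic described above, assuming tabular Q-values per module; the function names and the exact normalization are illustrative guesses, not the thesis's implementation. Each module's weight is proportional to the standard deviation of its Q-values in the current state, so modules that strongly differentiate between actions get more influence on the global choice.

```python
import numpy as np

def module_weights(q_values):
    """Weight each module by the standard deviation of its Q-values in the
    current state; modules indifferent among actions get little weight."""
    stds = np.array([np.std(q) for q in q_values])
    total = stds.sum()
    if total == 0:
        return np.full(len(q_values), 1.0 / len(q_values))  # all indifferent
    return stds / total

def module_combination(q_values):
    """Module Combination: pick the action maximizing the sum of the
    modules' Q-values (unweighted, as in GM-Sarsa-style combination)."""
    return int(np.argmax(np.sum(q_values, axis=0)))

def module_voting(q_values):
    """Module Voting: each module casts a weighted vote for its greedy
    action; the action with the largest total vote wins."""
    weights = module_weights(q_values)
    votes = np.zeros(len(q_values[0]))
    for w, q in zip(weights, q_values):
        votes[int(np.argmax(q))] += w
    return int(np.argmax(votes))
```

For example, a module whose Q-values in a state are nearly flat contributes almost nothing to the vote, while a module facing a high-stakes decision (large spread of Q-values) dominates it.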
296

Analysis and design of Laminated Veneer Lumber beams with holes

Ardalany, Manoochehr January 2013 (has links)
Timber has experienced renewed interest as a building material in recent times. Although traditionally in New Zealand it has been the main choice for residential construction, with recently introduced engineered wood products like Laminated Veneer Lumber (LVL), the use of timber has expanded to other sectors such as commercial, industrial, and multi-story buildings. The application of timber in office and commercial buildings poses some challenges, with requirements for long-span timber beams that nevertheless have holes to pass services. Current construction practice with timber is not well suited to these types of structures. There has been significant progress in designing timber structures since the introduction of timber codes like NZS 3603 (Timber Structures Standard); however, a number of problems, such as holes in beams, are still not addressed in the code. Experimental and numerical investigation is required to address the problem properly. In Europe, there have been a few attempts to address the problem of cutting holes and the strength reduction caused by holes/penetrations in glulam beams. However, LVL has not received much attention due to its smaller production and use. While other researchers are focusing on glulam beams with holes, this research targets LVL beams with holes. This thesis extends existing knowledge of LVL beams with holes, and of reinforcement around the holes, through experimental tests and numerical analysis. An experimental program on LVL specimens was performed to determine the material properties of interest that are used in the analysis and design chapters throughout the thesis. A wide-ranging experimental program was also performed on beams with holes, and on beams with reinforcement around the holes. The experimental program pushes forward existing methods of testing by measuring the load in the reinforcement
297

CONDITIONED REINFORCEMENT FROM SHOCK TERMINATION.

HIMADI, WILLIAM GEORGE. January 1982 (has links)
This study addressed the question of whether a stimulus paired with the termination of shock would acquire a positive conditioned reinforcing function. Previous investigators have suggested that a stimulus paired with shock termination must increase the frequency of a response upon which it is made contingent. This test for conditioned reinforcement is incomplete because multiple stimulus functions are established during conditioning trials that can influence the rate of responding. The solution to this multiple-stimulus-control problem involved the effects of reinforcement on events antecedent to the criterion response. Reinforcement results in the establishment of discriminative stimulus control. The test for conditioned reinforcement from shock termination, therefore, would involve using the presumed conditioned reinforcer to establish discriminative control over a response. Subjects were four male albino rats of the Wistar strain. The experimental procedure was divided into three phases. The initial phase involved consecutive trials in which a tone was paired with shock offset. The next phase continued tone/shock-offset pairings and, in addition, the tone alone was sometimes presented to establish a lever press. In the third phase an attempt was made to bring the lever press under the discriminative stimulus control of a light. A successful response-shaping effect was obtained for two of the four rats. There was no establishment of discriminative stimulus control over lever pressing for the two rats who proceeded to the discrimination test for conditioned reinforcement. Conditioned reinforcement from shock termination was not revealed in this study. The establishment of stable discriminative control over the criterion response would require a strong reinforcer relative to the other established stimulus functions.
Future research should concentrate on developing procedures to maximize the conditioned reinforcing properties while minimizing the control from competing stimulus functions.
298

THE EFFECTS OF VERBAL ELABORATIONS AND SOCIAL REINFORCEMENT ON CHILDREN'S PERFORMANCE IN A SIMPLE DISCRIMINATION TASK.

MANOS, MICHAEL JOHN. January 1982 (has links)
In this study, six educationally disadvantaged children were taught beginning letter sounds under two teaching conditions. After a baseline of no intervention, a single subject alternating treatments design was used to compare contingent elaborations and token reinforcement within children. Performance between treatments was analyzed in terms of cumulative number of letter sounds learned, total number of letter sounds learned, and maintenance of learning. Token probes were implemented to ascertain whether tokens remained functionally reinforcing over the course of the study. Five children responded to treatment over baseline. Three of these, characterized by above average Wepman auditory discrimination scores, performed better under elaborations until the final third of the study when differential performance between treatments was less pronounced. Remaining subjects, characterized by below average auditory discrimination, showed similar learning under both treatments or, as in the case of one child, no learning. No differences in maintenance were observed. Implications for the classroom and suggestions for further research were discussed.
299

COMPARISON OF LOCUS OF CONTROL AND JOB SATISFACTION OF ARIZONA PHARMACISTS.

Hardy, David Lynn. January 1982 (has links)
No description available.
300

Learning, cooperation and feedback in pattern recognition

Strens, Malcolm John Alexander January 1999 (has links)
No description available.
