201 |
Using Concurrent Schedules of Reinforcement to Decrease Behavior
Palmer, Ashlyn 12 1900 (has links)
We manipulated the delay and magnitude of reinforcers in two concurrent schedules of reinforcement to decrease a prevalent behavior while increasing another behavior already in the participant's repertoire. The first experiment manipulated delay, imposing a five-second delay between the behavior targeted for decrease and the delivery of reinforcement, while no delay followed the behavior targeted for increase. The second experiment manipulated magnitude, providing one piece of food for the behavior targeted for decrease and two pieces of food for the behavior targeted for increase. The experiments used an ABAB reversal design. Results suggest that behavior can be decreased without the use of extinction when contingencies favor the desirable behavior.
|
202 |
Cooperative Vehicular Communications for High Throughput Applications / 大容量車載アプリケーションに向けた車車間協調通信
Taya, Akihiro 24 September 2019 (has links)
Kyoto University / 0048 / New doctoral program / Doctor of Informatics / Kō No. 22099 / Jōhaku No. 709 / 新制||情||122 (University Library) / Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor 守倉 正博, Professor 原田 博司, Professor 梅野 健 / Eligible under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
|
203 |
Transfer of reinforcement learning for a robotic skill
Gómez Rosal, Dulce Adriana January 2018 (has links)
In this work, we develop transfer learning (TL) of reinforcement learning (RL) for the robotic skill of throwing a ball into a basket, from a computer-simulated environment to a real-world implementation. Whereas learning of the same skill has previously been explored using a Programming by Demonstration approach directly on the real-world robot, this work instead employs the model-based RL algorithm PILCO, which provides the robot with no previous knowledge or hints: the robot begins learning from a tabula rasa state, PILCO learns directly in the simulated environment, and, as part of its procedure, PILCO models the dynamics of the inflatable plastic ball used to perform the task. The robotic skill is represented as a Markov Decision Process, the robotic arm is a Kuka LWR4+, RL is enabled by PILCO, and TL is achieved through policy adjustments. Two learned policies were transferred, and although the results show that no exhaustive policy adjustments are required, large gaps remain between the simulated and the real environment in terms of the ball and robot dynamics. The contributions of this thesis include: a novel TL-of-RL framework for teaching the basketball skill to the Kuka robotic arm; the development of a pythonised version of PILCO; robust and extendable ROS packages for policy learning and adjustment on a simulated or real robot; a tracking-vision package using a Kinect camera; and an Orocos package for a position controller on the robotic arm.
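As a rough illustration of the model-based loop that PILCO-style learning follows (fit a dynamics model from rollouts, improve the policy against that model, then re-fit on real-robot data during transfer), here is a minimal Python sketch. The linear dynamics model, linear policy, and random-search improvement are stand-ins for PILCO's Gaussian-process model and analytic policy gradients, and all names are illustrative rather than taken from the thesis code.

```python
import numpy as np

def fit_dynamics_model(states, actions):
    # Fit state deltas as a linear function of (state, action) by least squares;
    # PILCO uses Gaussian-process regression here instead.
    X = np.hstack([states[:-1], actions])
    Y = states[1:] - states[:-1]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return lambda x, u: x + np.hstack([x, u]) @ W

def rollout_cost(model, params, x0, cost, horizon):
    # Simulate the learned model under a linear policy u = K x and sum the cost.
    x, total = x0, 0.0
    for _ in range(horizon):
        u = params @ x
        x = model(x, u)
        total += cost(x)
    return total

def improve_policy(model, params, x0, cost, horizon, iters=500, sigma=0.05):
    # Random-search hill climbing on the policy parameters (PILCO instead uses
    # analytic gradients through the Gaussian-process model).
    best_cost = rollout_cost(model, params, x0, cost, horizon)
    for _ in range(iters):
        cand = params + sigma * np.random.randn(*params.shape)
        c = rollout_cost(model, cand, x0, cost, horizon)
        if c < best_cost:
            params, best_cost = cand, c
    return params

# Transfer with policy adjustment (in spirit): collect a few real-robot rollouts,
# re-fit the dynamics model on them, and re-run improve_policy starting from the
# simulator-learned parameters instead of from scratch.
```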
|
204 |
Towards Superintelligence-Driven Autonomous Network Operation Centers Using Reinforcement Learning
Altamimi, Basel 25 October 2021 (has links)
Today's Network Operation Centers (NOCs) consist of teams of network professionals responsible for monitoring and taking actions to maintain their network's health. Most of these NOC actions are relatively complex and executed manually; only the simplest tasks can be automated with rules-based software. But today's networks are getting larger and more complex. Therefore, deciding what action to take in the face of non-trivial problems has essentially become an art that depends on the collective human intelligence of NOC technicians, specialized support teams organized by technology domain, and vendors' technical support. This model is getting increasingly expensive and inefficient, and the automation of all or at least some NOC tasks is now considered a desirable step towards autonomous and self-healing networks. In this work, we investigate whether such decisions can be taken by Artificial Intelligence instead of collective human intelligence, specifically by Deep Reinforcement Learning (DRL), which has been shown to outperform humans in computer games. We build an Action Recommendation Engine (ARE) based on RL, train it with expert rules or by letting it explore outcomes by itself, and show that it can learn new and more efficient strategies that outperform expert rules designed by humans by as much as 25%.
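A minimal sketch of what an RL-based recommendation loop of this kind can look like is given below. The states, actions, reward signal, and the expert-rule seeding are hypothetical placeholders; the thesis's ARE is built on deep RL rather than the tabular Q-learning used here for brevity.

```python
import random
from collections import defaultdict

# Hypothetical NOC actions; real action catalogues would be far richer.
ACTIONS = ["restart_service", "reroute_traffic", "open_ticket", "do_nothing"]

class ActionRecommendationEngine:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)            # Q[(state, action)] values
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def recommend(self, state, explore=True):
        # Epsilon-greedy: occasionally explore outcomes, otherwise exploit.
        if explore and random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update from an observed outcome (reward).
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

    def pretrain_from_expert_rules(self, rule_table, bonus=1.0):
        # Seed Q-values so expert-recommended actions start out preferred.
        for state, action in rule_table.items():
            self.q[(state, action)] += bonus
```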
|
205 |
Mobile Robot Obstacle Avoidance based on Deep Reinforcement Learning
Feng, Shumin January 2018 (has links)
Obstacle avoidance is one of the core problems in the field of autonomous navigation. An obstacle avoidance approach is developed for the navigation task of a reconfigurable multi-robot system named STORM, which stands for Self-configurable and Transformable Omni-Directional Robotic Modules. Various mathematical models have been developed in previous work in this field to avoid collisions for such robots. In this work, the proposed collision avoidance algorithm is trained via Deep Reinforcement Learning, which enables the robot to learn by itself from its experiences and then fit a mathematical model by updating the parameters of a neural network. The trained neural network is capable of choosing an action directly from the input sensor data. A virtual STORM locomotion module was trained to explore a Gazebo simulation environment without collision, using the proposed DRL-based collision avoidance strategies. The mathematical model of the avoidance algorithm was derived from the simulation, applied to the prototype of the locomotion module, and validated via experiments. A universal software architecture was also designed for the STORM modules. The software architecture has extensible and reusable features that improve design efficiency and enable parallel development. / Master of Science / In this thesis, an obstacle avoidance approach is described to enable autonomous navigation of a reconfigurable multi-robot system, STORM. The Self-configurable and Transformable Omni-Directional Robotic Modules (STORM) system is a novel approach to heterogeneous swarm robotics. The system has two types of robotic modules, namely the locomotion module and the manipulation module. Each module is able to navigate and perform tasks independently. In addition, the modules are designed to autonomously dock together to perform tasks that they are individually unable to accomplish.
The proposed obstacle avoidance approach is designed for the modules of STORM, but can be applied to mobile robots in general. In contrast to the existing collision avoidance approaches, the proposed algorithm was trained via deep reinforcement learning (DRL). This enables the robot to learn by itself from its experiences, and then fit a mathematical model by updating the parameters of a neural network. In order to avoid damage to the real robot during the learning phase, a virtual robot was trained inside a Gazebo simulation environment with obstacles. The mathematical model for the collision avoidance strategy obtained through DRL was then validated on a locomotion module prototype of STORM. This thesis also introduces the overall STORM architecture and provides a brief overview of the generalized software architecture designed for the STORM modules. The software architecture has expandable and reusable features that apply well to the swarm architecture while allowing for design efficiency and parallel development.
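To make the sensor-to-action mapping concrete, the following sketch shows a small feed-forward policy that maps a normalized laser scan to one of three motion commands. The layer sizes, action set, and randomly initialized weights are illustrative only; the thesis's trained network and DRL training setup are not reproduced here.

```python
import numpy as np

ACTIONS = ["forward", "turn_left", "turn_right"]   # placeholder action set

def init_policy(n_ranges=36, hidden=64, n_actions=len(ACTIONS), seed=0):
    # Randomly initialized two-layer network; in practice the weights would be
    # the result of DRL training in simulation.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(n_ranges, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(hidden, n_actions)),
        "b2": np.zeros(n_actions),
    }

def choose_action(policy, scan):
    # scan: array of normalized range readings; the network scores each action
    # and the robot executes the highest-scoring one (greedy policy).
    h = np.tanh(scan @ policy["W1"] + policy["b1"])
    q = h @ policy["W2"] + policy["b2"]
    return ACTIONS[int(np.argmax(q))]

# During training (e.g. a DQN-style DRL loop in the Gazebo simulation), the
# weights would be updated from (scan, action, reward, next_scan) transitions;
# only the trained-policy inference path is sketched here.
policy = init_policy()
print(choose_action(policy, np.ones(36)))
```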
|
206 |
Leveraging machine learning for managing prefetchers and designing secure standard cells
Eris, Furkan 23 May 2022 (has links)
Machine Learning (ML) has gained prominence in recent years and is currently used in a wide range of applications. Researchers have achieved impressive results, at or beyond human levels, in image processing, voice recognition, and natural language processing applications. Over the past several years, there has been a lot of work on designing efficient hardware for ML applications. Having seen the power of ML, researchers have lately begun exploring the use of ML for designing computing systems. In this thesis, we propose two ML-based design and management approaches: in the first approach, we propose to use ML algorithms to improve hardware prefetching in processors; in the second approach, we leverage Reinforcement Learning (RL)-based algorithms to automatically insert nanoantennas into standard cell libraries to secure them against Hardware Trojans (HTs).
In the first approach, we propose using ML to manage prefetchers and, in turn, improve processor performance. Classically, prefetcher improvements have focused on either adding new prefetchers to an existing hybrid prefetching system (a system made up of one or more prefetchers) or increasing the complexity of the existing prefetchers. Both approaches increase the number of prefetcher system configurations (PSCs). Here, a PSC is a particular setting for each prefetcher, such as whether it is on or off or, for more complex prefetchers, its aggressiveness level. While the choice of PSC for the hybrid prefetching system can be statically optimized for the average case, there are still opportunities to improve performance at runtime. To this end, we propose a prefetcher manager called Puppeteer to enable dynamic configuration of existing prefetchers. Puppeteer uses a suite of decision trees to adapt PSCs at runtime. We extensively test Puppeteer using a cycle-accurate simulator across 232 traces. We show up to 46.0% instructions-per-cycle (IPC) improvement over no prefetching in 1C, 25.8% in 4C, and 11.9% in 8C. We design Puppeteer using pruning methods that reduce the hardware overhead to only a few KB of storage, ensuring feasibility.
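As a toy illustration of runtime PSC selection, the snippet below chooses among a few hand-written configurations from sampled hardware counters. The counter names, thresholds, and PSC encoding are invented for the example; Puppeteer itself uses a suite of trained and pruned decision trees rather than hand-coded rules.

```python
# Hypothetical PSC encoding: which prefetchers are enabled at each cache level.
PSCS = {
    "all_off":      {"L1_stride": False, "L2_stream": False, "LLC_spatial": False},
    "conservative": {"L1_stride": True,  "L2_stream": False, "LLC_spatial": False},
    "aggressive":   {"L1_stride": True,  "L2_stream": True,  "LLC_spatial": True},
}

def select_psc(counters):
    """Pick a prefetcher system configuration from sampled hardware counters."""
    if counters["prefetch_accuracy"] < 0.3:
        # Inaccurate prefetches pollute the cache and waste bandwidth: back off.
        return PSCS["all_off"]
    if counters["memory_bandwidth_util"] > 0.8:
        # Bandwidth-bound phase: keep only the cheapest prefetcher on.
        return PSCS["conservative"]
    return PSCS["aggressive"]

# Would be re-evaluated periodically at runtime; the sample values are arbitrary.
sample = {"prefetch_accuracy": 0.72, "memory_bandwidth_util": 0.55}
print(select_psc(sample))
```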
In the second approach, we propose SecRLCAD, an RL-based Computer-Aided Design (CAD) flow to secure standard cell libraries. The chip supply chain has become globalized, and this globalization has raised security concerns since each step in chip design, fabrication, and testing is now prone to attacks. Prior work has shown that an HT in the form of a single capacitor with a couple of gates can be inserted during the fabrication step and later be utilized to gain privileged access to a processor. To combat this inserted HT, nanoantennas can be inserted strategically in standard cells to create an optical signature of the chip. However, inserting these nanoantennas is difficult and time-consuming. To aid human designers in speeding up the design of secure standard cells, we design an RL-based flow that inserts nanoantennas into each standard cell in a library. We evaluate our flow using the Nangate FreePDK 45nm library. We can secure and generate a clean library with an average area increase of 56%. / 2023-05-23T00:00:00Z
|
207 |
Kombinatorisk Optimering med Pointer Networks och Reinforcement Learning / Combinatorial Optimization with Pointer Networks and Reinforcement Learning
Holmberg, Axel, Hansson, Wilhelm January 2021 (has links)
Given the complexity and range of combinatorial optimization problems, solving them can be computationally easy or hard. There are many ways to solve them, but all available methods share a problem: they take a long time to run and have to be rerun whenever new cases are introduced. Machine learning could prove a viable way to solve combinatorial optimization problems because models can learn and generalize, eliminating the need to run a complex algorithm every time a new instance is presented. Uniter is a management consulting firm that provides services within product modularization. Product modularization makes it possible to create many different product variations based on customer needs. Finding the best combination for a specific customer's need requires solving a combinatorial optimization problem. Based on Uniter's need, this thesis sought to develop and evaluate a machine learning model consisting of a Pointer Network architecture trained using Reinforcement Learning. The task was to find the combination of parts yielding the lowest cost, given a use case. Each use case had different attributes that specified the need for the final product. For each use case, the model was tasked with selecting the most appropriate combination from a set of 4000 distinct combinations. Three experiments were conducted, examining whether the model could suggest an optimal solution after being trained on one use case, whether it could suggest an optimal solution for a previously seen use case, and whether it could suggest an optimal solution for an unseen use case. A single data set was used for all experiments. The suggested model was compared to three baselines: a weighted random selection, a naive model implementing a feed-forward network, and an exhaustive search. The results showed that the proposed model could not suggest an optimal solution in any of the experiments. In most tests conducted, the proposed model was also significantly slower at suggesting a solution than any baseline. The proposed model had high accuracy in all experiments, meaning it suggested almost only feasible solutions within the allowed solution space. However, when the model converged, it suggested only one combination for every use case, with the feed-forward baseline showing the same behavior. This behavior suggests that the model misinterpreted the task and identified a solution that would work in most cases instead of suggesting the optimal solution for each use case. The discussion concludes that an exhaustive search is preferable for the studied data set and that an alternative approach using supervised learning may be a better solution.
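For reference, the core of a Pointer Network decoder step is an attention distribution over the candidate set, which REINFORCE then shapes using the (negative) cost of the sampled choice as reward. The sketch below shows that mechanism in NumPy with toy dimensions and random data; the encoder, the embeddings of Uniter's combinations, and the full training loop from the thesis are not reproduced.

```python
import numpy as np

def pointer_distribution(query, candidates, W1, W2, v):
    # candidates: (N, d) embeddings of the selectable combinations
    # query:      (d,)  embedding of the current use case / decoder state
    scores = np.tanh(candidates @ W1 + query @ W2) @ v   # one score per candidate
    scores -= scores.max()                                # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs

def reinforce_grad_logp(probs, chosen):
    # Gradient of log p(chosen) with respect to the scores: one_hot - probs.
    grad = -probs.copy()
    grad[chosen] += 1.0
    return grad   # multiplied by (baseline - cost) and backpropagated in training

rng = np.random.default_rng(0)
d, N = 16, 4000                                   # toy embedding size, 4000 combinations
W1, W2, v = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d)
probs = pointer_distribution(rng.normal(size=d), rng.normal(size=(N, d)), W1, W2, v)
choice = rng.choice(N, p=probs)                   # sampled combination to evaluate
```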
|
208 |
Remembering the past to predict the future: a scale-invariant timeline for memory and anticipation
Goh, Wei Zhong 14 March 2022 (has links)
To guide action, animals anticipate what events will occur, and when they will occur, based on experience. How animals anticipate future events is an unsettled question. Although reinforcement learning is often used to model anticipation, it is resource-intensive outside of the simplest scenarios. In this dissertation, I show evidence of memory that is persistent and carries timing information, and specify an algorithm for how animals might anticipate the identity and timing of future events.
This dissertation consists of two studies. In the first study, I found that identity and timing of remembered odors are jointly represented in the same cells in the dentate gyrus and lateral entorhinal cortex. Further, odor memories persist well after new odors emerge. The study analyzed results from an experiment conducted by Woods et al. (2020) on mice passively exposed to separate odors for a period of 20 s per exposure. The results are consistent with a memory framework known as timing using inverse Laplace transform (TILT).
In the second study, I constructed a computational algorithm based on the TILT memory framework to anticipate the identity and timing of future events. The algorithm generates predictions based on memories of past events, and stored associations between cues and outcomes. The algorithm is resource-efficient even when the future depends on the indefinite past. The algorithm is scale-invariant and works well with chains of events.
Together, the studies support a novel computational mechanism which anticipates what events will occur, and when they will occur. The algorithm could be applied in machine learning in cases of long-range dependence on history. These studies predict that behavioral and neural responses of animals could depend on events well into the past. / 2024-03-13T00:00:00Z
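For readers unfamiliar with TILT, the sketch below gives one discrete-time rendering of the idea: a bank of leaky integrators with log-spaced decay rates maintains a running Laplace transform of the stimulus, and a Post-style inverse (a k-th derivative across the rate axis) reads out a coarse, scale-invariant estimate of what happened how long ago. The grid of rates, the value of k, and the finite-difference inverse are illustrative choices, not the parameters used in the dissertation.

```python
import math
import numpy as np

s = np.geomspace(0.05, 2.0, 40)        # leaky-integrator decay rates, log-spaced
F = np.zeros_like(s)                   # Laplace-domain memory state
dt = 0.1

def step(F, stimulus, dt=dt, s=s):
    # dF/dt = -s * F + stimulus  (forward-Euler update of each integrator)
    return F + dt * (-s * F + stimulus)

def timeline(F, k=4, s=s):
    # Post-style inversion: f(t ~ k/s) ~ (-1)^k / k! * s^(k+1) * d^k F / d s^k,
    # with the derivative approximated by finite differences on the rate grid.
    dF = F.copy()
    for _ in range(k):
        dF = np.gradient(dF, s)
    estimate = ((-1) ** k) * (s ** (k + 1)) * dF / math.factorial(k)
    past_times = k / s                 # each readout unit looks back roughly k/s
    return past_times, estimate

# Present a brief stimulus, let five seconds elapse, then read out the timeline.
for t in np.arange(0.0, 5.0, dt):
    F = step(F, stimulus=1.0 if t < 0.2 else 0.0)
past, est = timeline(F)
```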
|
209 |
Multi-Agent Reinforcement Learning: Analysis and Application
Paulo Cesar Heredia (12428121) 20 April 2022 (has links)
With the increasing availability of data and the rise of networked systems such as autonomous vehicles, drones, and smart grids, the application of data-driven machine learning methods to multi-agent systems has become an important topic. In particular, reinforcement learning has gained a lot of popularity due to its similarities with optimal control, with the potential of allowing us to develop optimal control systems using only observed data and without the need for a model of a system's state dynamics. In this thesis, we explore the application of reinforcement learning to multi-agent systems, which is known as multi-agent reinforcement learning (MARL). We have developed algorithms that address some challenges in the cooperative setting of MARL. We have also worked on better understanding the convergence guarantees of some known multi-agent reinforcement learning algorithms, which combine reinforcement learning with distributed consensus methods. In addition, with the aim of making MARL better suited to real-world problems, we have developed algorithms that address some practical challenges with MARL, and we have applied MARL to a real-world problem.
In the first part of this thesis, we focus on developing algorithms to address some open problems in MARL. One of these challenges is learning with output feedback, which is known as partial observability in the reinforcement learning literature. One of the main assumptions of reinforcement learning in the single-agent case is that the agent can fully observe the state of the plant it is controlling (we note the "plant" is often referred to as the "environment" in the reinforcement learning literature; we will use these terms interchangeably). In the single-agent case this assumption can be reasonable, since it only requires one agent to fully observe its environment. In the multi-agent setting, however, this assumption would require all agents to fully observe the state; furthermore, since each agent can affect the plant (or environment) with its actions, the assumption would also require that agents know the actions of other agents. We have also developed algorithms to address practical issues that may arise when applying reinforcement learning (RL) or MARL to large-scale real-world systems. One such algorithm is a distributed reinforcement learning algorithm that allows us to learn in cases where the states and actions are both continuous and of large dimensionality, which is the case for many real-world applications. Without the ability to handle continuous states and actions, many algorithms require discretization, which can become impractical for high-dimensional systems. We have also developed a distributed reinforcement learning algorithm that addresses the data scalability of RL. By data scalability we mean how to learn from a very large dataset that cannot be efficiently processed by a single agent with limited resources.
In the second part of this thesis, we provide a finite-sample analysis of some distributed reinforcement learning algorithms. By finite-sample analysis, we mean that we provide an upper bound on the squared error of the algorithm at a given iteration. Equivalently, since each iteration uses one data sample, we provide an upper bound on the squared error for a given number of data samples used. This type of analysis had been missing in the MARL literature, where most works have provided only asymptotic results for their proposed algorithms, which only tell us how the algorithmic error behaves as the number of samples used goes to infinity.
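To make the class of algorithms concrete, here is a toy sketch of the consensus-plus-local-TD structure that many distributed RL methods of this kind share: each agent averages its parameters with its neighbours' through a mixing matrix and then applies a TD(0) correction computed from its own reward. The environment, feature map, rewards, and mixing matrix below are placeholders and not the specific algorithms analysed in the thesis.

```python
import numpy as np

n_agents, d, gamma, alpha = 4, 3, 0.9, 0.05
rng = np.random.default_rng(1)

# Doubly-stochastic mixing matrix for a ring of 4 agents (the consensus step).
A = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

W = np.zeros((n_agents, d))                 # one parameter vector per agent

def features(state):
    return np.array([1.0, state, state ** 2])   # toy feature map

state = 0.5
for t in range(1000):
    next_state = float(np.clip(state + rng.normal(scale=0.1), 0.0, 1.0))
    phi, phi_next = features(state), features(next_state)
    W = A @ W                                # consensus: mix with neighbours
    for i in range(n_agents):
        reward_i = -abs(state - 0.5) + rng.normal(scale=0.01)    # local reward
        delta = reward_i + gamma * phi_next @ W[i] - phi @ W[i]  # local TD error
        W[i] += alpha * delta * phi          # local TD(0) correction
    state = next_state
```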
The third part of this thesis focuses on applications with real-world systems. We have explored a real-world problem, namely transactive energy systems (TES), which can be represented as a multi-agent system. We have applied various reinforcement learning algorithms with the aim of learning an optimal control policy for this system. Through simulations, we have compared the performance of these algorithms and have illustrated the effect of partial observability (output feedback) when compared to full state feedback.
In the last part, we present some additional work: a distributed observer that aims to address learning with output feedback by estimating the state. The proposed algorithm is designed so that we do not require a complete model of the state dynamics; instead, we use a parameterized model whose parameters are estimated along with the state.
|
210 |
Learning enhances encoding of time and temporal surprise in primary sensory cortex
Rabinovich, Rebecca January 2022 (has links)
Primary sensory cortex has long been believed to play a straightforward role in the initial processing of sensory information. Yet, the superficial layers of cortex overall are sparsely active, even during strong sensory stimulation; moreover, cortical activity is influenced by other modalities, task context, reward, and behavioral state. The experiments described in this thesis demonstrate that reinforcement learning dramatically alters representations among longitudinally imaged neurons in superficial layers of mouse primary somatosensory cortex. Cells were confirmed to be sparsely active in naïve animals; however, learning an object detection task recruited previously unresponsive neurons, enlarging the neuronal population sensitive to tactile stimuli.
In contrast, cortical responses habituated, decreasing upon repeated exposure to unrewarded stimuli. In addition, after conditioning, the cell population as well as individual neurons better encoded the rewarded stimuli and the animal's behavioral choice. Furthermore, in well-trained mice, the neuronal population encoded the passage of time. We further found evidence that this temporal information was contained in sequences of cell activity, meaning that different cells in the population activated at different moments within the trial. This kind of time-keeping was not observed in naïve animals, nor did it arise after repeated stimulus exposure. Finally, unexpected deviations in trial timing elicited even stronger responses than touch did. In conclusion, the superficial layers of sensory cortex exhibit a high degree of learning-dependent plasticity and are strongly modulated by non-sensory but behaviorally relevant features, such as timing and surprise.
|