111

Cocaine Use Modulates Neural Prediction Error During Aversive Learning

Wang, John Mujia 08 June 2015 (has links)
Cocaine use has contributed to 5 million individuals falling into the cycle of addiction. Prior research in cocaine dependence has focused mainly on rewards. Losses also play a critical role in cocaine dependence, as dependent individuals fail to avoid social, health, and economic losses even when they acknowledge them. However, dependent individuals are extremely adept at escaping negative states like withdrawal. To further understand whether cocaine use may contribute to dysfunctions in aversive learning, this paper uses fMRI and an aversive learning task to examine cocaine-dependent individuals abstinent from cocaine use (C-) and using as usual (C+). Of specific interest is the neural signal representing the actual loss compared to the expected loss, better known as the prediction error (δ), which individuals use to update future expectations. When abstinent (C-), dependent individuals exhibited a higher positive prediction error (δ+) signal in their striatum than when they were using as usual. Furthermore, their striatal δ+ signal enhancements from drug abstinence were predicted by higher positive learning rate (α+) enhancements. However, no relationships were found between drug-abstinence enhancements to negative learning rates (α-) and negative prediction error (δ-) striatal signals. Abstinent (C-) individuals' striatal δ+ signal was predicted by longer drug use history, signifying possible relief learning adaptations with time. Lastly, craving measures, especially the desire to use cocaine and the positive effects of cocaine, also positively correlated with C- individuals' striatal δ+ signal. This suggests possible relief learning adaptations in response to higher craving and withdrawal symptoms. Taken together, the enhanced striatal δ+ signal when abstinent and the adaptations in relief learning provide evidence that dependent individuals lack aversive learning ability while using as usual and show enhanced relief learning ability for avoiding negative situations such as withdrawal, suggesting a neurocomputational mechanism that pushes the dependent individual to maintain dependence. / Master of Science
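The quantities named in this abstract — the prediction error δ and the separate positive/negative learning rates α+ and α- — follow the standard delta-rule form used in computational models of aversive learning. The sketch below illustrates that update under those assumptions; the function and parameter names are illustrative, not code from the thesis.

```python
# Hedged sketch: a prediction-error learner with asymmetric learning rates
# (alpha_pos for positive errors, alpha_neg for negative errors). Illustrative
# names and values only.

def update_expectation(expected_loss, actual_loss, alpha_pos=0.3, alpha_neg=0.3):
    """Update the expected loss after observing one trial's outcome."""
    delta = actual_loss - expected_loss              # prediction error (δ)
    alpha = alpha_pos if delta > 0 else alpha_neg    # α+ applied to δ+, α- to δ-
    return expected_loss + alpha * delta, delta

expectation = 0.0
for outcome in [1.0, 0.0, 1.0, 1.0, 0.0]:            # observed losses per trial
    expectation, delta = update_expectation(expectation, outcome)
    print(f"δ = {delta:+.2f}, new expectation = {expectation:.2f}")
```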
112

Sample Complexity of Incremental Policy Gradient Methods for Solving Multi-Task Reinforcement Learning

Bai, Yitao 05 April 2024 (has links)
We consider a multi-task learning problem, where an agent is presented with N reinforcement learning tasks. To solve this problem, we are interested in studying the gradient approach, which iteratively updates an estimate of the optimal policy using the gradients of the value functions. The classic policy gradient method, however, may be expensive to implement in multi-task settings as it requires access to the gradients of all the tasks at every iteration. To circumvent this issue, in this paper we propose to study an incremental policy gradient method, where the agent uses the gradient of only one task at each iteration. Our main contribution is to provide theoretical results characterizing the performance of the proposed method. In particular, we show that incremental policy gradient methods converge to the optimal value of the multi-task reinforcement learning objectives at a sublinear rate O(1/√k), where k is the number of iterations. To illustrate its performance, we apply the proposed method to solve a simple multi-task variant of GridWorld problems, where an agent seeks to find a policy to navigate effectively in different environments. / Master of Science / First, we introduce a popular machine learning technique called Reinforcement Learning (RL), where an agent, such as a robot, uses a policy to choose an action, like moving forward, based on observations from sensors like cameras. The agent receives a reward that helps judge whether the policy is good or bad. The objective of the agent is to find a policy that maximizes the cumulative reward it receives by repeating the above process. RL has many applications, including Cruise autonomous cars, Google industry automation, training ChatGPT language models, and Walmart inventory management. However, RL suffers from task sensitivity and requires a lot of training data. For example, if the task changes slightly, the agent needs to train the policy from the beginning. This motivates the technique called Multi-Task Reinforcement Learning (MTRL), where different tasks give different rewards and the agent maximizes the sum of the cumulative rewards of all the tasks. We focus on the incremental setting, where the agent can only access the tasks one by one at random. In this case, we only need one agent, and it is not required to know which task it is performing. We show that the incremental policy gradient methods we propose converge to the optimal value of the MTRL objectives at a sublinear rate O(1/√k), where k is the number of iterations. To illustrate its performance, we apply the proposed method to solve a simple multi-task variant of GridWorld problems, where an agent seeks to find a policy to navigate effectively in different environments.
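The incremental setting described above can be summarized in a few lines: at every iteration the agent draws a single task, estimates that task's policy gradient, and takes one step on the shared parameters. The sketch below is a minimal illustration under that assumption; `estimate_policy_gradient` is a placeholder for a REINFORCE-style estimator and is not the thesis's implementation.

```python
# Hedged sketch of an incremental policy gradient iteration: one task's
# gradient per step, with a diminishing step size consistent with a
# sublinear O(1/sqrt(k)) convergence rate. Illustrative names only.
import numpy as np

def incremental_policy_gradient(tasks, estimate_policy_gradient, theta,
                                iterations=10_000, step_size=1e-2):
    for k in range(1, iterations + 1):
        task = tasks[np.random.randint(len(tasks))]       # access one task at random
        grad = estimate_policy_gradient(task, theta)      # gradient of that task's value
        theta = theta + (step_size / np.sqrt(k)) * grad   # diminishing step size
    return theta
```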
113

Reinforcing Reachable Routes

Thirunavukkarasu, Muthukumar 13 May 2004 (has links)
Reachability routing is a newly emerging paradigm in networking, where the goal is to determine all paths between a sender and a receiver. It is becoming relevant with the changing dynamics of the Internet and the emergence of low-bandwidth wireless/ad hoc networks. This thesis presents the case for reinforcement learning (RL) as the framework of choice to realize reachability routing, within the confines of the current Internet backbone infrastructure. The reinforcement learning setting offers several advantages, including loop resolution, multi-path forwarding capability, cost-sensitive routing, and minimal state overhead, while maintaining the incremental spirit of current backbone routing algorithms. We present the design and implementation of a new reachability algorithm that uses a model-based approach to achieve cost-sensitive multi-path forwarding. Assessment of the algorithm in various troublesome topologies shows consistently superior performance over classical reinforcement learning algorithms. Evaluations of the algorithm against different criteria, on many types of randomly generated networks as well as realistic topologies, are also presented. / Master of Science
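Classical reinforcement learning routing baselines of the kind referenced here typically follow a Q-routing-style per-hop update, in which each router keeps an estimate of the delivery cost to a destination via each neighbor and refines it from the neighbor's own best estimate. The sketch below illustrates that baseline update under those assumptions; the data structures and names are illustrative, not the thesis's code.

```python
# Hedged sketch of a Q-routing-style update. Q[node][dest][neighbor] holds
# node's estimated delivery cost to dest when forwarding via neighbor.
# Illustrative names only.

def q_routing_update(Q, node, neighbor, dest, queue_delay, link_delay, lr=0.5):
    """Refine node's cost estimate to dest via neighbor from observed delays."""
    neighbor_best = min(Q[neighbor][dest].values())        # neighbor's best onward estimate
    target = queue_delay + link_delay + neighbor_best      # observed cost + downstream cost
    Q[node][dest][neighbor] += lr * (target - Q[node][dest][neighbor])
```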
114

Android Game Testing using Reinforcement Learning

Khurana, Suhani 30 June 2023 (has links)
Android is the most popular operating system and occupies close to 70% of the market share. With the growth in the usage of Android OS, the number of games has also increased, and the Android Play Store has over 500,000 games. Testing of Android games is done either manually or through existing tools that automate parts of this testing. Manual testing requires a great deal of effort and can be expensive. The existing tools that automate testing do not make use of any domain knowledge. This can make the testing ineffective, as the game may involve complex strategies, intricate details, widgets, etc. Existing tools like Android Monkey and Time Machine generate random Android events, including gestures like touch, swipe, clicks, and other system-level events across the application. Some deep learning methods like Wuji were only created for combat-type games. These limitations make it imperative to create a testing paradigm that uses domain knowledge and is easy to use by a developer without any machine learning or deep learning expertise. In this work, we develop a tool called DRAG (Deep Reinforcement learning based Android Gamer), which leverages Reinforcement Learning to learn the requisite domain knowledge and play the game much as a human would. DRAG uses a unified Reinforcement Learning agent and environment and customizes only the action space for each game. This generalization is done in two steps: 1) record an 8-minute demo video of the game and capture the underlying Android action log; 2) analyze the recorded video and the action log to generate an action space for the Reinforcement Learning agent. The unified RL agent is trained by providing it the score and coverage as a reward and screenshots of the game as observed states. We chose a set of 19 different open-source games for evaluation of the tool. These games differ in the actions they require - some require tapping icons, some require swiping in random directions, and some require more complex actions that combine different gestures. Our tool outperformed the state-of-the-art TimeMachine on all 19 games and outperformed Monkey on 16 of the 19 games. This supports the claim that Deep Reinforcement Learning can be used to test Android games and can provide better results than tools that make no use of domain knowledge. / Master of Science / The popularity of the Android operating system has led to a significant increase in the number of available Android games, with over 500,000 games on the Android Play Store alone. However, ensuring the quality and functionality of these games has become a challenge. Traditional testing methods involve either time-consuming manual testing or the use of existing tools that lack the domain knowledge needed to handle complex game mechanics effectively. To overcome these limitations, we propose a solution called DRAG: the Deep Reinforcement Learning-based Android Gamer. Our tool utilizes Reinforcement Learning (RL) to acquire the domain knowledge needed to play Android games in a manner similar to human players. Unlike other tools, DRAG incorporates a unified RL agent and environment that can be customized for each specific game. The process of customizing the action space involves two main steps. First, we record an 8-minute demonstration video of the game while capturing the underlying Android action log. Then, we analyze the video and action log to generate a tailored action space for the game. The unified RL agent is trained using rewards based on the game's score and coverage, while observed screenshots of the game serve as input states. We evaluated DRAG using a diverse set of 19 open-source games, each requiring different actions such as tapping icons, swiping in random directions, or complex combinations of gestures. Our results demonstrate that DRAG outperforms the state-of-the-art TimeMachine in all 19 games and outperforms Monkey in 16 of the 19 games. These findings highlight the effectiveness of Deep Reinforcement Learning for testing Android games and its ability to deliver better results compared to tools lacking domain knowledge. Our work introduces a new testing approach that combines RL and domain knowledge, providing a user-friendly solution for developers without extensive machine or deep learning expertise. By automating game testing to replicate human gameplay, DRAG offers the potential for more efficient and effective quality assurance in the Android gaming ecosystem.
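The training setup described — screenshots as observed states, a game-specific action space derived from a recorded demo, and score plus coverage as the reward — can be pictured as a standard RL interaction loop. The sketch below is a minimal illustration of that loop; the environment and agent interfaces are assumptions, not DRAG's actual API.

```python
# Hedged sketch of a DRAG-style training loop: the agent picks a gesture from
# a game-specific action space and is rewarded with the change in game score
# plus code coverage. env/agent objects are hypothetical placeholders.

def train_game_tester(env, agent, action_space, episodes=100, max_steps=500):
    for _ in range(episodes):
        screenshot = env.reset()                                    # initial game screen
        for _ in range(max_steps):
            action = agent.select_action(screenshot, action_space)  # tap, swipe, gesture combo
            next_screenshot, score_delta, coverage_delta, done = env.step(action)
            reward = score_delta + coverage_delta                   # score + coverage reward
            agent.update(screenshot, action, reward, next_screenshot)
            screenshot = next_screenshot
            if done:
                break
```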
115

Reinforcement Learning for Self-adapting Time Discretizations of Complex Systems

Gallagher, Conor Dietrich 27 August 2021 (has links)
The overarching goal of this project is to develop intelligent, self-adapting numerical algorithms for the time discretization of complex real-world problems with Q-Learning methodologies. The specific application is ordinary differential equations, which can model problems in mathematics and the social and natural sciences, but which usually require approximations to solve because direct analytical solutions are rare. Using the traditional Brusselator and Lorenz differential equations as test beds, this research develops models to determine reward functions and dynamically tunes controller parameters that minimize both the error and the number of steps required for approximate mathematical solutions. Our best reward function is based on an error measure that does not overly punish rejected states. The Alpha-Beta Adjustment and Safety Factor Adjustment Model is the most efficient and accurate method for solving these mathematical problems. Allowing the model to change the alpha/beta value and safety factor by small amounts provides better results than if the model chose values from discrete lists. This method shows potential for training dynamic controllers with Reinforcement Learning. / Master of Science / This research applies Q-Learning, a subset of Reinforcement Learning and Machine Learning, to solve complex mathematical problems that cannot be solved analytically and therefore require approximate solutions. Specifically, this research applies mathematical modeling of ordinary differential equations, which are used in many fields, from theoretical sciences such as physics and chemistry, to applied technical fields such as medicine and engineering, to social and consumer-oriented fields such as finance and consumer purchasing habits, and to the realms of national and international security and communications. Q-Learning develops mathematical models that make decisions, learn from the outcome whether each decision was good or bad, and use this information to make the next decision. The research develops approaches to determine reward functions and controller parameters that minimize the error and number of steps associated with approximate mathematical solutions to ordinary differential equations. Error is how far the model's answer is from the true answer, and the number of steps is related to how long the solution takes and how much computational time and cost it requires. The Alpha-Beta Adjustment and Safety Factor Adjustment Model is the most efficient and accurate method for solving these mathematical problems and has potential for solving complex mathematical and societal problems.
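The idea of letting an RL agent nudge an ODE integrator's step-size controller by small amounts can be sketched with tabular Q-learning. The code below is an illustrative assumption of how such a loop might look: the state encoding, the small-adjustment action set, the reward shape, and the safety-factor bounds are all placeholders, not the thesis's model.

```python
# Hedged sketch: a Q-learning step that adjusts an integrator's safety factor
# by small amounts, rewarding low error without overly punishing rejections.
# All constants and encodings are illustrative assumptions.
import numpy as np

actions = [-0.05, 0.0, +0.05]                 # small adjustments to the safety factor
Q = np.zeros((10, len(actions)))              # state = discretized error/tolerance ratio

def discretize(err_ratio):
    return min(int(err_ratio * 5), 9)

def q_learning_step(Q, state, err, tol, safety, lr=0.1, gamma=0.9, eps=0.1):
    a = np.random.randint(len(actions)) if np.random.rand() < eps else int(Q[state].argmax())
    safety = float(np.clip(safety + actions[a], 0.5, 0.95))
    reward = -err / tol - (0.1 if err > tol else 0.0)   # mild penalty for rejected steps
    next_state = discretize(err / tol)
    Q[state, a] += lr * (reward + gamma * Q[next_state].max() - Q[state, a])
    return safety, next_state
```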
116

Inverse Reinforcement Learning and Routing Metric Discovery

Shiraev, Dmitry Eric 01 September 2003 (has links)
Uncovering the metrics and procedures employed by an autonomous networking system is an important problem with applications in instrumentation, traffic engineering, and game-theoretic studies of multi-agent environments. This thesis presents a method for utilizing inverse reinforcement learning (IRL) techniques to discover the composite metric used by a dynamic routing algorithm on an Internet Protocol (IP) network. The network and routing algorithm are modeled as a reinforcement learning (RL) agent and a Markov decision process (MDP). The problem of routing metric discovery is then posed as a problem of recovering the reward function, given observed optimal behavior. We show that this approach is empirically suited for determining the relative contributions of the factors that constitute a composite metric. Experimental results for many classes of randomly generated networks are presented. / Master of Science
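The core IRL idea here — recovering a reward (the composite metric) that explains observed optimal routing choices — can be illustrated with a small linear program: each observation says the chosen path's weighted cost was no worse than an alternative's, and we search for weights that satisfy all such constraints with maximal margin. The feature set and LP formulation below are illustrative assumptions, not the thesis's formulation.

```python
# Hedged sketch: recover relative weights of a linear composite routing metric
# from observed (chosen path, alternative path) feature pairs via a max-margin LP.
import numpy as np
from scipy.optimize import linprog

def recover_metric_weights(observations, n_features):
    """observations: list of (chosen_features, alternative_features) pairs."""
    rows = []
    for chosen, alt in observations:
        diff = np.asarray(alt) - np.asarray(chosen)   # alternative cost minus chosen cost
        rows.append(np.concatenate([-diff, [1.0]]))   # encode diff @ w >= margin t
    A_ub, b_ub = np.array(rows), np.zeros(len(rows))
    A_eq = np.array([[1.0] * n_features + [0.0]])     # weights sum to 1 (relative contributions)
    c = np.array([0.0] * n_features + [-1.0])         # maximize the margin t
    bounds = [(0, 1)] * n_features + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[:n_features]                          # recovered relative weights
```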
117

Encoding the Sensor Allocation Problem for Reinforcement Learning

Penn, Dylan R. 16 May 2024 (has links)
Traditionally, space situational awareness (SSA) sensor networks have relied on dynamic programming theory to generate tasking plans that govern how sensors are allocated to observe resident space objects. Deep reinforcement learning (DRL) techniques can be trained on simulated environments, which are readily available for the SSA sensor allocation problem, and have demonstrated performance in other fields; they therefore have the potential to exceed the performance of deterministic methods. The research presented in this dissertation develops techniques for encoding an SSA environment model to apply DRL to the sensor allocation problem. This dissertation is the compilation of two separate but related studies. The first study compares two alternative invalid-action handling techniques, penalization and masking. The second study examines the performance of policies that have forecast state knowledge incorporated in the observation space. / Doctor of Philosophy / Resident space objects (RSOs) are typically tracked by ground-based sensors (telescopes and radar). Determining how to allocate sensors to RSOs is a complex problem traditionally solved with dynamic programming techniques. Deep reinforcement learning (DRL), a subset of machine learning, has demonstrated performance in other fields and has the potential to exceed the performance of traditional techniques. The research presented in this dissertation develops techniques for encoding a space situational awareness environment model to apply DRL to the sensor allocation problem. This dissertation is the compilation of two separate but related studies. The first study compares two alternative invalid-action handling techniques, penalization and masking. The second study examines the performance of policies that have forecast state knowledge incorporated in the observation space.
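The two invalid-action handling schemes compared in the first study differ in where the constraint is enforced: penalization lets the agent choose any action but returns a negative reward for invalid ones, while masking removes invalid actions from the policy distribution before sampling. A minimal sketch of the distinction, with illustrative names and a hypothetical environment interface, is below.

```python
# Hedged sketch contrasting invalid-action penalization and masking.
import numpy as np

def penalized_step(env, action, invalid_penalty=-1.0):
    """Penalization: invalid actions are allowed but rewarded negatively."""
    if not env.is_valid(action):                     # hypothetical validity check
        return env.state, invalid_penalty, False     # no transition, negative reward
    return env.step(action)

def masked_policy(logits, valid_mask):
    """Masking: invalid actions get -inf logits, so they are never sampled."""
    masked = np.where(valid_mask, logits, -np.inf)
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return np.random.choice(len(logits), p=probs)
```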
118

Learning-based Optimal Control of Time-Varying Linear Systems Over Large Time Intervals

Baddam, Vasanth Reddy January 2023 (has links)
We solve the problem of two-point boundary optimal control of linear time-varying systems with unknown model dynamics using reinforcement learning. Leveraging singular perturbation theory, we transform the time-varying optimal control problem into two time-invariant subproblems. This allows the use of an off-policy iteration method to learn the controller gains. We show that the performance of the learning-based controller approximates that of the model-based optimal controller, and that the approximation accuracy improves as the control problem's time horizon increases. We also provide a simulation example to verify the results. / M.S. / We use reinforcement learning to find two-point boundary optimal controls for linear time-varying (LTV) systems with unknown model dynamics. We divide the LTV control problem into two linear time-invariant (LTI) subproblems using singular perturbation theory. As a result, the controller gains can be identified via a learning technique. We show that the learning-based controller's performance approaches that of the model-based optimal controller, with approximation accuracy growing with the time horizon of the control problem. In addition, we provide a simulated scenario to back up our findings.
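Learning controller gains by policy iteration can be pictured, in the model-based case, as alternating policy evaluation (a Lyapunov solve) with policy improvement; off-policy learning variants approximate the same iteration from trajectory data when the system matrices are unknown. The sketch below shows only the model-based discrete-time LQR form of that iteration, as a point of reference; the matrices and the initial stabilizing gain are illustrative assumptions, and this is not the thesis's algorithm.

```python
# Hedged sketch: model-based policy iteration for a discrete-time LQR
# subproblem. Requires an initial stabilizing gain K.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_policy_iteration(A, B, Q, R, K, iters=50):
    for _ in range(iters):
        Acl = A - B @ K                                     # closed-loop dynamics
        Qeff = Q + K.T @ R @ K
        P = solve_discrete_lyapunov(Acl.T, Qeff)            # policy evaluation
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # policy improvement
    return K, P
```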
119

Time-normalised discounting in reinforcement learning

Akan, Oguzhan, Waara Ankarstrand, Wilmer January 2024 (has links)
Reinforcement learning has emerged as a powerful paradigm in machine learning, witnessing remarkable progress in recent years. Among reinforcement algorithms, Q-learning stands out, enabling agents to learn quickly from past actions. This study aims to investigate and enhance Q-learning methodologies, with a specific focus on tabular Q-learning. In particular, it addresses Q-learning with an action space containing actions that require different amounts of time to execute. With such an action space the algorithm might converge to a suboptimal solution when using a constant discount factor, since discounting occurs per action and not per time step. We refer to this issue as the non-temporal discounting (NTD) problem. By introducing a time-normalised discounting function, we were able to address the issue of NTD. In addition, we were able to stabilise the solution by implementing a cost for specific actions. As a result, the model converged to the expected solution. Building on these results, it would be wise to implement time-normalised discounting in a state-of-the-art reinforcement learning model such as deep Q-learning.
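The fix described — discounting per unit of time rather than per action — amounts to raising the discount factor to the action's duration in the tabular Q-update, so that slow and fast actions are compared on the same time scale. The sketch below illustrates that form of the update; the data structures and names are illustrative, not the thesis's code.

```python
# Hedged sketch of a tabular Q-learning update with time-normalised discounting.
# Q is a dict of dicts: Q[state][action] -> value. Illustrative names only.

def q_update(Q, state, action, reward, next_state, duration, gamma=0.99, lr=0.1):
    """duration: number of time steps the action took to execute."""
    target = reward + (gamma ** duration) * max(Q[next_state].values())
    Q[state][action] += lr * (target - Q[state][action])
```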
120

Bayesian methods for knowledge transfer and policy search in reinforcement learning

Wilson, Aaron (Aaron Creighton) 28 July 2012 (has links)
How can an agent generalize its knowledge to new circumstances? To learn effectively an agent acting in a sequential decision problem must make intelligent action selection choices based on its available knowledge. This dissertation focuses on Bayesian methods of representing learned knowledge and develops novel algorithms that exploit the represented knowledge when selecting actions. Our first contribution introduces the multi-task Reinforcement Learning setting in which an agent solves a sequence of tasks. An agent equipped with knowledge of the relationship between tasks can transfer knowledge between them. We propose the transfer of two distinct types of knowledge: knowledge of domain models and knowledge of policies. To represent the transferable knowledge, we propose hierarchical Bayesian priors on domain models and policies respectively. To transfer domain model knowledge, we introduce a new algorithm for model-based Bayesian Reinforcement Learning in the multi-task setting which exploits the learned hierarchical Bayesian model to improve exploration in related tasks. To transfer policy knowledge, we introduce a new policy search algorithm that accepts a policy prior as input and uses the prior to bias policy search. A specific implementation of this algorithm is developed that accepts a hierarchical policy prior. The algorithm learns the hierarchical structure and reuses components of the structure in related tasks. Our second contribution addresses the basic problem of generalizing knowledge gained from previously-executed policies. Bayesian Optimization is a method of exploiting a prior model of an objective function to quickly identify the point maximizing the modeled objective. Successful use of Bayesian Optimization in Reinforcement Learning requires a model relating policies and their performance. Given such a model, Bayesian Optimization can be applied to search for an optimal policy. Early work using Bayesian Optimization in the Reinforcement Learning setting ignored the sequential nature of the underlying decision problem. The work presented in this thesis explicitly addresses this problem. We construct new Bayesian models that take advantage of sequence information to better generalize knowledge across policies. We empirically evaluate the value of this approach in a variety of Reinforcement Learning benchmark problems. Experiments show that our method significantly reduces the amount of exploration required to identify the optimal policy. Our final contribution is a new framework for learning parametric policies from queries presented to an expert. In many domains it is difficult to provide expert demonstrations of desired policies. However, it may still be a simple matter for an expert to identify good and bad performance. To take advantage of this limited expert knowledge, our agent presents experts with pairs of demonstrations and asks which of the demonstrations best represents a latent target behavior. The goal is to use a small number of queries to elicit the latent behavior from the expert. We formulate a Bayesian model of the querying process, an inference procedure that estimates the posterior distribution over the latent policy space, and an active procedure for selecting new queries for presentation to the expert. We show, in multiple domains, that the algorithm successfully learns the target policy and that the active learning strategy generally improves the speed of learning. / Graduation date: 2013
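The second contribution builds on the generic Bayesian Optimization loop for policy search: a surrogate model maps policy parameters to expected return, and an acquisition rule picks the next policy to evaluate. The sketch below shows only that generic loop with a Gaussian process surrogate and a UCB acquisition; it is an illustrative assumption and does not include the sequence-aware models this dissertation introduces.

```python
# Hedged sketch: Bayesian Optimization over policy parameters with a GP
# surrogate and a UCB acquisition. evaluate_policy and candidate_params are
# hypothetical placeholders (rollout-based return estimate, parameter grid).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def bayes_opt_policy_search(evaluate_policy, candidate_params, n_init=5, n_iters=20):
    rng = np.random.default_rng(0)
    idx = rng.choice(len(candidate_params), n_init, replace=False)
    X = [candidate_params[i] for i in idx]
    y = [evaluate_policy(p) for p in X]                  # noisy returns from rollouts
    gp = GaussianProcessRegressor()
    for _ in range(n_iters):
        gp.fit(np.array(X), np.array(y))
        mu, sigma = gp.predict(np.array(candidate_params), return_std=True)
        best = int(np.argmax(mu + 2.0 * sigma))          # UCB acquisition
        X.append(candidate_params[best])
        y.append(evaluate_policy(candidate_params[best]))
    return X[int(np.argmax(y))]                          # best policy parameters found
```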
