111. Trajectories of Risk Learning and Real-World Risky Behaviors During Adolescence
Wang, John M., 31 August 2020
Adolescence is a transition period during which individuals have increasing autonomy in decision-making for themselves (Casey, Jones, and Hare, 2008), often choosing among options about which they have little knowledge and experience. This process of individuation and independence is reflected in real-world risk-taking behaviors (Silveri et al., 2004), including higher rates of motor vehicle accidents, unwanted pregnancies, sexually transmitted diseases, drug addictions, and death (Casey et al., 2008). The extent to which adolescents continue to display increased behaviors with negative consequences during this period of life depends critically on their ability to explore and learn the potential consequences of actions within novel environments. This learning is not limited to the value of the outcome associated with making choices, but extends to the levels of risk taken in making those choices. While the existing adolescence literature has focused on neural substrates of risk preferences, how adolescents behaviorally and neurally learn about risks remains unknown. Success or failure to learn the potential variability of these consequences, or the risks involved, in ambiguous decisions is hypothesized to be a crucial process that allows individuals to make decisions based on their risk preferences. The alternative, in which adolescents fail to learn about the risks involved in their decisions, leaves the adolescent in a state of continued exploration of the ambiguity, reflected as continued risk-taking behavior. This dissertation comprises two papers. The first is a perspective paper outlining a paradigm in which risk-taking behavior observed during adolescence may be a product of each adolescent's ability to learn about risk. The second builds on the hypothesis of the perspective paper, first examining neural correlates of risk learning and quantifying individual risk learning abilities, and then examining longitudinal risk learning developmental trajectories in relation to real-world risk trajectories in adolescent individuals. / Doctor of Philosophy / Adolescence is a transition period during which individuals have increasing autonomy in decision-making for themselves, often choosing among options about which they have little knowledge and experience. This process of individuation and independence begins with the adolescent exploring their world and the options they are ignorant of. This is reflected in real-world risk-taking behaviors, including higher rates of motor vehicle accidents, unwanted pregnancies, sexually transmitted diseases, drug addictions, and death. We hypothesized and tested the premise that adolescents who fail to learn about the negative consequences of their actions while exploring will continue to partake in behaviors with negative consequences. This learning is not limited to the value of the outcome associated with making choices, but extends to the range of possible outcomes of the choices, or the risks involved. Indeed, individuals who fail to learn the risks involved in decisions with no known information show continued and greater risk-taking behavior, perhaps remaining in a state of continued exploration of the unknown.
112. Altered Neural and Behavioral Associability-Based Learning in Posttraumatic Stress Disorder
Brown, Vanessa, 24 April 2015
Posttraumatic stress disorder (PTSD) is accompanied by marked alterations in cognition and behavior, particularly when negative, high-value information is present (Aupperle, Melrose, Stein, & Paulus, 2012; Hayes, Vanelzakker, & Shin, 2012). However, the underlying processes are unclear; such alterations could result from differences in how this high-value information is updated or in its effects on processing future information. To untangle the effects of different aspects of behavior, we used a computational psychiatry approach to disambiguate the roles of increased learning from previously surprising outcomes (i.e., associability; Li, Schiller, Schoenbaum, Phelps, & Daw, 2011) and from large value differences (i.e., prediction error; Montague, 1996; Schultz, Dayan, & Montague, 1997) in PTSD. Combat-deployed military veterans with varying levels of PTSD symptoms completed a learning task while undergoing fMRI; behavioral choices and neural activation were modeled using reinforcement learning. We found that associability-based loss learning at a neural and behavioral level increased with PTSD severity, particularly with hyperarousal symptoms, and that the interaction of PTSD severity and neural markers of associability-based learning predicted behavior. In contrast, PTSD severity did not modulate the prediction error neural signal or the behavioral learning rate. These results suggest that increased associability-based learning underlies neurobehavioral alterations in PTSD. / Master of Science
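For readers unfamiliar with the associability model cited above (Li et al., 2011), the core update can be sketched in a few lines of Python. This is illustrative only: the parameter values are hypothetical rather than the fitted ones, and the actual model also includes choice and neural components not shown here.

```python
def hybrid_update(value, associability, reward, kappa=0.3, eta=0.5):
    """One trial of a hybrid Rescorla-Wagner / Pearce-Hall update.

    value         -- current expected outcome for the chosen cue
    associability -- running estimate of how surprising this cue has been
    kappa, eta    -- free parameters (illustrative values, not fitted ones)
    """
    delta = reward - value                                        # prediction error
    value += kappa * associability * delta                        # associability gates the learning rate
    associability = (1 - eta) * associability + eta * abs(delta)  # updated toward |prediction error|
    return value, associability

# Toy run: when the loss contingency suddenly appears, the large surprise
# raises associability, so subsequent learning from that cue speeds up.
v, a = 0.0, 1.0
for r in [0, 0, 0, 0, -1, -1, -1]:
    v, a = hybrid_update(v, a, r)
    print(f"reward={r:+d}  value={v:+.2f}  associability={a:.2f}")
```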
113. Cocaine Use Modulates Neural Prediction Error During Aversive Learning
Wang, John Mujia, 08 June 2015
Cocaine use has contributed to 5 million individuals falling into the cycle of addiction. Prior research on cocaine dependence has mainly focused on rewards. Losses also play a critical role in cocaine dependence, as dependent individuals fail to avoid social, health, and economic losses even when they acknowledge them. However, dependent individuals are extremely adept at escaping negative states like withdrawal. To further understand whether cocaine use may contribute to dysfunctions in aversive learning, this paper uses fMRI and an aversive learning task to examine cocaine-dependent individuals abstinent from cocaine use (C-) and using as usual (C+). Of specific interest is the neural signal representing the actual loss compared to the expected loss, better known as the prediction error (δ), which individuals use to update future expectations. When abstinent (C-), dependent individuals exhibited a higher positive prediction error (δ+) signal in their striatum than when they were using as usual. Furthermore, their striatal δ+ signal enhancements from drug abstinence were predicted by higher positive learning rate (α+) enhancements. However, no relationships were found between drug abstinence enhancements to negative learning rates (α-) and negative prediction error (δ-) striatal signals. Abstinent (C-) individuals' striatal δ+ signal was predicted by longer drug use history, signifying possible relief learning adaptations with time. Lastly, craving measures, especially the desire to use cocaine and the positive effects of cocaine, also positively correlated with C- individuals' striatal δ+ signal. This suggests possible relief learning adaptations in response to greater craving and withdrawal symptoms. Taken together, the enhanced striatal δ+ signal when abstinent and the adaptations in relief learning provide evidence supporting dependent individuals' lack of aversive learning ability while using as usual and enhanced relief learning ability for the purpose of avoiding negative situations such as withdrawal, suggesting a neurocomputational mechanism that pushes the dependent individual to maintain dependence. / Master of Science
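The signed prediction errors (δ+, δ-) and asymmetric learning rates (α+, α-) described above follow the standard reinforcement-learning decomposition. A minimal illustrative sketch (parameter values are hypothetical, not the fitted ones):

```python
def asymmetric_update(value, reward, alpha_pos=0.4, alpha_neg=0.2):
    """Value update with separate learning rates for better-than-expected
    (delta+) and worse-than-expected (delta-) outcomes."""
    delta = reward - value          # prediction error
    if delta >= 0:
        value += alpha_pos * delta  # positive prediction error, weighted by alpha+
    else:
        value += alpha_neg * delta  # negative prediction error, weighted by alpha-
    return value, delta

# Relief-learning intuition: if a loss of -2 is expected but only -1 occurs,
# the prediction error is positive even though the outcome is still aversive.
v = -2.0
v, d = asymmetric_update(v, -1.0)
print(d, v)  # delta = +1.0, updated value = -1.6
```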
114. Sample Complexity of Incremental Policy Gradient Methods for Solving Multi-Task Reinforcement Learning
Bai, Yitao, 05 April 2024
We consider a multi-task learning problem, where an agent is presented with N reinforcement learning tasks. To solve this problem, we are interested in studying the gradient approach, which iteratively updates an estimate of the optimal policy using the gradients of the value functions. The classic policy gradient method, however, may be expensive to implement in multi-task settings, as it requires access to the gradients of all the tasks at every iteration. To circumvent this issue, in this paper we propose to study an incremental policy gradient method, where the agent uses the gradient of only one task at each iteration. Our main contribution is to provide theoretical results characterizing the performance of the proposed method. In particular, we show that incremental policy gradient methods converge to the optimal value of the multi-task reinforcement learning objective at a sublinear rate O(1/√k), where k is the number of iterations. To illustrate its performance, we apply the proposed method to solve a simple multi-task variant of GridWorld problems, where an agent seeks to find a policy to navigate effectively in different environments. / Master of Science / First, we introduce a popular machine learning technique called Reinforcement Learning (RL), where an agent, such as a robot, uses a policy to choose an action, like moving forward, based on observations from sensors like cameras. The agent receives a reward that helps judge whether the policy is good or bad. The objective of the agent is to find a policy that maximizes the cumulative reward it receives by repeating the above process. RL has many applications, including Cruise autonomous cars, Google industry automation, training ChatGPT language models, and Walmart inventory management. However, RL suffers from task sensitivity and requires a lot of training data. For example, if the task changes slightly, the agent needs to train the policy from the beginning. This motivates the technique called Multi-Task Reinforcement Learning (MTRL), where different tasks give different rewards and the agent maximizes the sum of the cumulative rewards of all the tasks. We focus on the incremental setting, where the agent can only access the tasks one by one at random. In this case, we need only one agent, and it is not required to know which task it is performing. We show that the incremental policy gradient methods we propose converge to the optimal value of the MTRL objective at a sublinear rate O(1/√k), where k is the number of iterations. To illustrate its performance, we apply the proposed method to solve a simple multi-task variant of GridWorld problems, where an agent seeks to find a policy to navigate effectively in different environments.
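As a rough illustration of the incremental scheme described above, the sketch below uses toy surrogate gradients in place of real policy-gradient estimators; the random task selection and diminishing step size are assumptions made for the example, not necessarily the exact schedule analyzed in the thesis.

```python
import numpy as np

def incremental_gradient_ascent(grad_fns, theta0, steps=20000, step_size=0.05, seed=0):
    """At each iteration, update the shared parameters using the gradient
    of a single randomly chosen task (instead of summing over all tasks)."""
    theta = np.array(theta0, dtype=float)
    rng = np.random.default_rng(seed)
    for k in range(1, steps + 1):
        i = rng.integers(len(grad_fns))                          # one task per iteration
        theta += (step_size / np.sqrt(k)) * grad_fns[i](theta)   # diminishing step size
    return theta

# Toy "tasks": two concave objectives maximized at 1 and 3; their sum is
# maximized at 2, which the incremental iterates approach.
grads = [lambda th: -(th - 1.0), lambda th: -(th - 3.0)]
print(incremental_gradient_ascent(grads, [0.0]))  # roughly [2.0]
```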
115. Reinforcing Reachable Routes
Thirunavukkarasu, Muthukumar, 13 May 2004
Reachability routing is a newly emerging paradigm in networking, where the goal is to determine all paths between a sender and a receiver. It is becoming relevant with the changing dynamics of the Internet and the emergence of low-bandwidth wireless/ad hoc networks. This thesis presents the case for reinforcement learning (RL) as the framework of choice to realize reachability routing, within the confines of the current Internet backbone infrastructure. The setting of the reinforcement learning problem offers several advantages, including loop resolution, multi-path forwarding capability, cost-sensitive routing, and minimizing state overhead, while maintaining the incremental spirit of the current backbone routing algorithms. We present the design and implementation of a new reachability algorithm that uses a model-based approach to achieve cost-sensitive multi-path forwarding. Performance assessment of the algorithm in various troublesome topologies shows consistently superior performance over classical reinforcement learning algorithms. Evaluations of the algorithm based on different criteria on many types of randomly generated networks as well as realistic topologies are presented. / Master of Science
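For context, the classical RL routing baseline that work in this area is typically compared against is a Q-routing-style update (in the style of Boyan and Littman), in which each node learns per-destination delivery-time estimates for each neighbor. The sketch below is a generic illustration of that baseline, not the model-based reachability algorithm developed in the thesis.

```python
from collections import defaultdict

def q_routing_update(Q, node, dest, neighbor, queue_delay, link_delay, alpha=0.5):
    """Classical Q-routing update: the node's estimate of delivery time to
    `dest` via `neighbor` moves toward the observed local delay plus the
    neighbor's own best estimate for the same destination."""
    neighbor_best = min(Q[neighbor][dest].values()) if Q[neighbor][dest] else 0.0
    target = queue_delay + link_delay + neighbor_best
    Q[node][dest][neighbor] += alpha * (target - Q[node][dest][neighbor])

# Q[node][destination][next_hop] -> estimated delivery time (seconds)
Q = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
q_routing_update(Q, node="A", dest="D", neighbor="B", queue_delay=0.2, link_delay=1.0)
print(dict(Q["A"]["D"]))  # {'B': 0.6}
```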
116. Android Game Testing using Reinforcement Learning
Khurana, Suhani, 30 June 2023
Android is the most popular operating system and holds close to 70% of the market share. With the growth in the usage of Android OS, the number of games has also increased, and the Android Play Store now has over 500,000 games. Testing of Android games is done either manually or through existing tools that automate parts of this testing. Manual testing requires a great deal of effort and can be expensive. The existing tools that automate testing do not make use of any domain knowledge. This can make the testing ineffective, as a game may involve complex strategies, intricate details, widgets, etc. Existing tools like Android Monkey and TimeMachine generate random Android events, including gestures like touches, swipes, clicks, and other system-level events across the application. Some deep learning methods like Wuji were created only for combat-type games. These limitations make it imperative to create a testing paradigm that uses domain knowledge and is easy to use by a developer who has no machine learning or deep learning knowledge.
In this work, we develop a tool called DRAG (Deep Reinforcement learning based Android Gamer), which leverages Reinforcement Learning to learn the requisite domain knowledge and play the game much as a human would. DRAG uses a unified Reinforcement Learning agent and a unified Reinforcement Learning environment; only the action space is customized for each game. This customization is done in two steps: 1) record an 8-minute demo video of the game and capture the underlying Android action log; 2) analyze the recorded video and the action log to generate an action space for the Reinforcement Learning agent. The unified RL agent is trained by providing it the score and coverage as a reward and screenshots of the game as observed states. We chose a set of 19 different open-source games for the evaluation of the tool. These games differ in the action set each requires: some require tapping icons, some require swiping in random directions, and some require more complex actions that combine different gestures.
Our tool outperformed the state-of-the-art TimeMachine on all 19 games and outperformed Monkey on 16 of the 19 games. This strengthens the case that Deep Reinforcement Learning can be used to test Android games and can provide better results than tools that make no use of domain knowledge. / Master of Science / The popularity of the Android operating system has led to a significant increase in the number of available Android games, with over 500,000 games on the Android Play Store alone. However, ensuring the quality and functionality of these games has become a challenge. Traditional testing methods involve either time-consuming manual testing or the use of existing tools that lack the domain knowledge needed to handle complex game mechanics effectively.
To overcome these limitations, we propose a solution called DRAG: the Deep Reinforcement Learning-based Android Gamer. Our tool utilizes Reinforcement Learning (RL) to acquire the domain knowledge needed to play Android games in a manner similar to human players. Unlike other tools, DRAG incorporates a unified RL agent and environment that can be customized for each specific game. The process of customizing the action space involves two main steps. First, we record an 8-minute demonstration video of the game while capturing the underlying Android action log. Then, we analyze the video and action log to generate a tailored action space for the game. The unified RL agent is trained using rewards based on the game's score and coverage, while observed screenshots of the game serve as input states. We evaluated DRAG using a diverse set of 19 open-source games, each requiring different actions such as tapping icons, swiping in random directions, or complex combinations of gestures.
Our results demonstrate that DRAG outperforms state-of-the-art tools like TimeMachine on all 19 games and outperforms Monkey on 16 of the 19 games. These findings highlight the effectiveness of Deep Reinforcement Learning for testing Android games and its ability to deliver better results than tools lacking domain knowledge. Our work introduces a new testing approach that combines RL and domain knowledge, providing a user-friendly solution for developers without extensive machine or deep learning expertise. By automating game testing to replicate human gameplay, DRAG offers the potential for more efficient and effective quality assurance in the Android gaming ecosystem.
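To make the action-space step concrete, here is a hypothetical sketch of how gestures observed in a recorded demo's action log might be deduplicated into a discrete action set for a unified agent. The Gesture fields, coordinates, and helper names are illustrative assumptions, not DRAG's actual data structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gesture:
    kind: str    # "tap", "swipe", ...
    x: int       # screen coordinates of the gesture origin
    y: int
    dx: int = 0  # swipe displacement, zero for taps
    dy: int = 0

def build_action_space(action_log):
    """Collapse the gestures seen in the demo into a discrete, ordered
    action set that the RL agent can index into."""
    return sorted(set(action_log), key=lambda g: (g.kind, g.x, g.y, g.dx, g.dy))

demo_log = [
    Gesture("tap", 540, 1600),
    Gesture("tap", 540, 1600),             # duplicate taps collapse into one action
    Gesture("swipe", 540, 1200, 0, -400),  # upward swipe
]
actions = build_action_space(demo_log)
print(len(actions), actions)  # 2 distinct actions the agent may choose from
```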
117. Reinforcement Learning for Self-adapting Time Discretizations of Complex Systems
Gallagher, Conor Dietrich, 27 August 2021
The overarching goal of this project is to develop intelligent, self-adapting numerical algorithms for the time discretization of complex real-world problems with Q-Learning methodologies. The specific application is ordinary differential equations, which can resolve problems in mathematics and the social and natural sciences, but which usually require approximations to solve because direct analytical solutions are rare. Using the classic Brusselator and Lorenz differential equations as test beds, this research develops models to determine reward functions and to dynamically tune controller parameters that minimize both the error and the number of steps required for approximate mathematical solutions. Our best reward function is based on an error measure that does not overly punish rejected states. The Alpha-Beta Adjustment and Safety Factor Adjustment Model is the most efficient and accurate method for solving these mathematical problems. Allowing the model to change the alpha/beta value and safety factor by small amounts provides better results than having the model choose values from discrete lists. This method shows potential for training dynamic controllers with Reinforcement Learning. / Master of Science / This research applies Q-Learning, a subset of Reinforcement Learning and Machine Learning, to solve complex mathematical problems that cannot be solved analytically and therefore require approximate solutions. Specifically, this research applies mathematical modeling of ordinary differential equations, which are used in many fields, from theoretical sciences such as physics and chemistry, to applied technical fields such as medicine and engineering, to social and consumer-oriented fields such as finance and consumer purchasing habits, and to the realms of national and international security and communications. Q-Learning develops mathematical models that make decisions and, depending on the outcome, learn whether the decision was good or bad, using this information to make the next decision. The research develops approaches to determine reward functions and controller parameters that minimize the error and number of steps associated with approximate mathematical solutions to ordinary differential equations. Error is how far the model's answer is from the true answer, and the number of steps is related to how long the solution takes and how much computational time and cost it requires. The Alpha-Beta Adjustment and Safety Factor Adjustment Model is the most efficient and accurate method for solving these mathematical problems and has potential for solving complex mathematical and societal problems.
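The flavor of the approach, pairing a standard embedded Runge-Kutta step-size controller with a tabular Q-learning agent that nudges the safety factor, can be sketched as below. The ODE, state and action discretization, and reward shaping here are simplifying assumptions for illustration, not the reward functions or the Alpha-Beta/Safety Factor models developed in the thesis.

```python
import numpy as np

def adaptive_step(f, t, y, h, safety, tol=1e-4):
    """One step of an embedded Euler/Heun (order 1/2) pair with the classic
    step-size controller; `safety` is the factor the learning agent tunes."""
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1)
    y_low, y_high = y + h * k1, y + 0.5 * h * (k1 + k2)   # 1st- and 2nd-order solutions
    err = abs(y_high - y_low) + 1e-12
    accept = err <= tol
    h_new = h * min(2.0, max(0.2, safety * (tol / err) ** 0.5))
    return (y_high if accept else y), (t + h if accept else t), h_new, accept

f = lambda t, y: -2.0 * y                  # test problem: y' = -2y
actions = [-0.05, 0.0, 0.05]               # nudge the safety factor down, keep, or up
Q = np.zeros((2, len(actions)))            # state: was the previous step accepted?
alpha, gamma, eps = 0.1, 0.9, 0.1
rng = np.random.default_rng(1)

for episode in range(200):
    t, y, h, safety, state = 0.0, 1.0, 0.1, 0.9, 1
    while t < 2.0:
        a = rng.integers(len(actions)) if rng.random() < eps else int(Q[state].argmax())
        safety = float(np.clip(safety + actions[a], 0.5, 1.0))
        y, t, h, accepted = adaptive_step(f, t, y, h, safety)
        reward = -1.0 - (0.0 if accepted else 5.0)   # penalize every step, rejections more
        next_state = int(accepted)
        Q[state, a] += alpha * (reward + gamma * Q[next_state].max() - Q[state, a])
        state = next_state

print(Q)  # learned preferences for safety-factor adjustments per state
```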
118. Optimizing Urban Traffic Management Through AI with Digital Twin Simulation and Validation
Sioldea, Daniel, 08 1900
The number of vehicles on the road continuously increases, revealing a lack of robust and effective traffic management systems in urban settings. Urban traffic makes up a substantial portion of the total traffic problem, and current traffic light architecture has been noticeably limiting traffic flow. This thesis focuses on developing an artificial intelligence-based smart traffic management system using a double duelling deep Q network (DDDQN), validated through a user-controlled 3D simulation to determine the system's effectiveness.

This work leverages current fisheye camera architecture to present a system that can be implemented into existing infrastructure with little intrusion. The challenges surrounding large computer vision datasets and the challenges and limitations of fisheye cameras are discussed. The data and conditions required to replicate these features in a simulated environment are identified. Finally, a baseline traffic flow and traffic light phase model is created using camera data from the City of Hamilton.

A DDDQN optimization algorithm used to reduce individual traffic light queue lengths and wait times is developed using the SUMO traffic simulator. The algorithm is trained over different maps and is then deployed onto a large map of various streets in the City of Hamilton. The algorithm is tested through a user-controlled driving simulator, observing excellent performance results over long routes. / Thesis / Master of Applied Science (MASc)
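As background for the DDDQN terminology, the sketch below shows the two ingredients, a dueling Q-network head and the double-DQN target, in PyTorch. The observation size, number of signal phases, and hidden width are assumptions for illustration; the replay buffer, SUMO interface, and training loop from the thesis are omitted.

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, n_obs, n_actions, hidden=128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(n_obs, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, obs):
        h = self.feature(obs)
        adv = self.advantage(h)
        return self.value(h) + adv - adv.mean(dim=1, keepdim=True)

def double_dqn_target(online, target, reward, next_obs, done, gamma=0.99):
    """Double-DQN target: the online net selects the next action, the target
    net evaluates it, which reduces Q-value overestimation."""
    with torch.no_grad():
        next_action = online(next_obs).argmax(dim=1, keepdim=True)
        next_q = target(next_obs).gather(1, next_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q

# Toy shapes: 12 observation features per intersection (e.g., queue lengths
# per approach plus the current phase) and 4 selectable signal phases.
online, target = DuelingDQN(12, 4), DuelingDQN(12, 4)
target.load_state_dict(online.state_dict())
obs, next_obs = torch.randn(32, 12), torch.randn(32, 12)
actions = torch.randint(0, 4, (32, 1))
reward, done = torch.randn(32), torch.zeros(32)

q_taken = online(obs).gather(1, actions).squeeze(1)
loss = nn.functional.smooth_l1_loss(q_taken, double_dqn_target(online, target, reward, next_obs, done))
loss.backward()  # gradients flow only through the online network
```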
119. Inverse Reinforcement Learning and Routing Metric Discovery
Shiraev, Dmitry Eric, 01 September 2003
Uncovering the metrics and procedures employed by an autonomous networking system is an important problem with applications in instrumentation, traffic engineering, and game-theoretic studies of multi-agent environments. This thesis presents a method for utilizing inverse reinforcement learning (IRL) techniques to discover a composite metric used by a dynamic routing algorithm on an Internet Protocol (IP) network. The network and routing algorithm are modeled as a reinforcement learning (RL) agent and a Markov decision process (MDP). The problem of routing metric discovery is then posed as a problem of recovering the reward function, given observed optimal behavior. We show that this approach is empirically suited for determining the relative contributions of factors that constitute a composite metric. Experimental results for many classes of randomly generated networks are presented. / Master of Science
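The "recover the reward function from observed optimal behavior" idea can be illustrated with a toy version of the problem: given paths a router actually chose and the alternatives it rejected, find non-negative weights over path features (e.g., hop count, delay) under which the chosen paths are never costlier than the alternatives. The hinge-loss descent below is a generic IRL-flavored sketch, not the specific formulation used in the thesis, and the features and data are made up.

```python
import numpy as np

def recover_metric_weights(chosen, alternatives, steps=2000, lr=0.01):
    """chosen[i] is the feature vector of the path actually selected for
    observation i; alternatives[i] lists the competing paths' features.
    Returns normalized non-negative weights consistent with the choices."""
    w = np.ones(len(chosen[0])) / len(chosen[0])
    for _ in range(steps):
        grad = np.zeros_like(w)
        for c, alts in zip(chosen, alternatives):
            for a in alts:
                if w @ (np.asarray(c) - np.asarray(a)) > 0:   # chosen path looks costlier: a violation
                    grad += np.asarray(c) - np.asarray(a)
        norm = np.linalg.norm(grad)
        if norm > 0:
            w = np.clip(w - lr * grad / norm, 0.0, None)      # projected subgradient step
        w /= w.sum()
    return w

# Toy demo: the "true" composite metric weights delay twice as heavily as hop count.
rng = np.random.default_rng(0)
true_w = np.array([1.0, 2.0])
chosen, alternatives = [], []
for _ in range(50):
    paths = rng.uniform(1, 10, size=(4, 2))          # 4 candidate paths, features = [hops, delay]
    best = int(np.argmin(paths @ true_w))
    chosen.append(paths[best])
    alternatives.append(np.delete(paths, best, axis=0))
print(recover_metric_weights(chosen, alternatives))   # delay weight > hop weight, up to scaling
```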
120. Encoding the Sensor Allocation Problem for Reinforcement Learning
Penn, Dylan R., 16 May 2024
Traditionally, space situational awareness (SSA) sensor networks have relied on dynamic programming theory to generate tasking plans that govern how sensors are allocated to observe resident space objects. Deep reinforcement learning (DRL) techniques, which can be trained on simulated environments (readily available for the SSA sensor allocation problem) and have demonstrated performance in other fields, have the potential to exceed the performance of deterministic methods. The research presented in this dissertation develops techniques for encoding an SSA environment model to apply DRL to the sensor allocation problem. This dissertation is the compilation of two separate but related studies. The first study compares two alternative invalid-action handling techniques, penalization and masking. The second study examines the performance of policies that have forecast state knowledge incorporated in the observation space. / Doctor of Philosophy / Resident space objects (RSOs) are typically tracked by ground-based sensors (telescopes and radar). Determining how to allocate sensors to RSOs is a complex problem traditionally solved with dynamic programming techniques. Deep reinforcement learning (DRL), a subset of machine learning, has demonstrated performance in other fields and has the potential to exceed the performance of traditional techniques. The research presented in this dissertation develops techniques for encoding a space situational awareness environment model to apply DRL to the sensor allocation problem. This dissertation is the compilation of two separate but related studies. The first study compares two alternative invalid-action handling techniques, penalization and masking. The second study examines the performance of policies that have forecast state knowledge incorporated in the observation space.
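Of the two invalid-action handling techniques compared in the first study, masking removes invalid actions from the policy's choice set outright rather than letting the agent pick them and be penalized afterward. A minimal sketch of logit masking for a discrete allocation action space follows; the sensor/object counts and the reason an action is invalid are hypothetical.

```python
import torch

def masked_action_selection(logits, valid_mask):
    """Set the logits of invalid actions to -inf so they receive zero
    probability under the softmax, instead of penalizing them post hoc."""
    masked_logits = logits.masked_fill(~valid_mask, float("-inf"))
    probs = torch.softmax(masked_logits, dim=-1)
    action = torch.distributions.Categorical(probs=probs).sample()
    return action, probs

# Toy example: 6 discrete sensor-to-object allocation actions, two of which
# are invalid in the current state (e.g., the object is below the horizon).
logits = torch.randn(1, 6)
valid = torch.tensor([[True, True, False, True, False, True]])
action, probs = masked_action_selection(logits, valid)
print(action.item(), probs)   # invalid actions have probability exactly 0
```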