101

A Reinforcement Learning Characterization of Thermostatic Control for HVAC Demand Response and Experimentation Framework for Simulated Building Energy Control

Eubel, Christopher J. 27 October 2022 (has links)
No description available.
102

Zooming Algorithm for Lipschitz Bandits with Linear Safety Constraints

Hu, Tengmu January 2021 (has links)
No description available.
103

Cognitive Control in Cognitive Dynamic Systems and Networks

FATEMI BOOSHEHRI, SEYED MEHDI 29 January 2015 (has links)
The main idea of this thesis is to define and formulate the role of cognitive control in cognitive dynamic systems and complex networks in order to control the directed flow of information. A cognitive dynamic system is based on Fuster's principles of cognition, the most basic of which is the so-called global perception-action cycle, on which the other three build. Cognitive control, by definition, completes the executive part of this important cycle. In this thesis, we first provide the rationale for defining cognitive control in a way that suits engineering requirements. To this end, we introduce the novel idea of the entropic state and, with it, the two-state model. Next, on the sole basis of the entropic state and the concept of directed information flow, we formulate the learning algorithm as the first process of cognitive control. Most importantly, we show that the derived algorithm is a special case of Bellman's celebrated dynamic programming. Another key point is that cognitive control intrinsically differs from generic dynamic programming and its approximations (commonly known as reinforcement learning) in that it is stateless by definition. As a result, the derived algorithm has two desirable characteristics: (a) it converges to the optimal policy, and (b) it is free of the curse of dimensionality. Next, predictive planning is described as the second process of cognitive control. The planning process is based on shunt cycles (called mutually composite cycles herein) that bypass the environment and facilitate the prediction of future global perception-action cycles. Our results demonstrate that predictive planning significantly improves the functionality of cognitive control. We also deploy an explore/exploit strategy in order to apply a simple form of executive attention. The thesis is then expanded by applying cognitive control to two applications of practical importance. The first involves cognitive tracking radar, which is based on a benchmark example and provides the means for testing the theory. To establish a frame of reference, the results are compared against other cognitive controllers that use traditional Q-learning and the method of dynamic optimization. In both cases, the new algorithm demonstrates considerable improvement with a lower computational load. The second application addresses the problem of observability in stochastic complex networks, chosen for its importance in many practical situations. Given the cognitive control theory and its demonstrated performance, the idea here is to view the network as the environment of a cognitive dynamic system, so that the cognitive dynamic system, through its cognitive controller, plays a supervisory role over the network. The proposed methodology differs from the state of the art in the literature on two accounts: (1) stochasticity in both the modelling and the monitoring processes, and (2) complexity in terms of edge density. We present several examples to demonstrate the information-processing power of cognitive control in this context as well. The thesis concludes by outlining future research in three main directions. / Thesis / Doctor of Philosophy (PhD)
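The stateless learning rule the abstract describes can be illustrated with a minimal sketch, assuming a scalar entropic-state signal whose reduction serves as the incremental reward; the action set, environment model, and parameter values below are illustrative assumptions, not the thesis's actual formulation:

```python
import numpy as np

# Minimal sketch of a stateless cognitive-control loop (illustrative only).
# The "entropic state" is modeled as a scalar measure of state-estimation
# uncertainty; the reward for an action is the reduction it achieves in that
# uncertainty. Being stateless, the learner keeps one value per action rather
# than one per (state, action) pair, sidestepping the curse of dimensionality.

rng = np.random.default_rng(0)
n_actions = 5                      # e.g., candidate controller configurations
values = np.zeros(n_actions)       # one value per action -- no state dimension
alpha, epsilon = 0.1, 0.2          # learning rate, exploration probability

def entropic_reduction(action):
    """Stand-in environment: entropy reduction achieved by an action (assumed)."""
    return 1.0 / (1 + action) + 0.1 * rng.standard_normal()

for cycle in range(1000):          # global perception-action cycles
    # Explore/exploit strategy as a simple form of executive attention
    if rng.random() < epsilon:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax(values))
    r = entropic_reduction(a)      # incremental reward: drop in entropic state
    values[a] += alpha * (r - values[a])   # stateless value update

print("learned action values:", np.round(values, 3))
```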
104

Making Sense of Serotonin Through Spike Frequency Adaptation

Harkin, Emerson 04 December 2023 (has links)
What does serotonin do? Just as the diffuse axonal arbours of midbrain serotonin neurons touch nearly every corner of the forebrain, so too is this ancient neuromodulator involved in nearly every aspect of learning and behaviour. The role of serotonin in reward processing has received increasing attention in recent years, but there is little agreement about how the perplexing responses of serotonin neurons to emotionally salient stimuli should be interpreted, and essentially nothing is known about how they arise. Here I approach these two aspects of serotonergic function in reverse order. In the first part of this thesis, I construct an experimentally constrained spiking neural network model of the dorsal raphe nucleus (DRN), the main source of forebrain serotonergic input, and characterize its signal processing features. I show that potent spike-frequency adaptation deeply shapes DRN output while other aspects of its physiology are relatively less important. Overall, this part of my work suggests that in vivo serotonergic activity patterns arise from a temporal-derivative-like computation. But the temporal derivative of what? In the second part, I consider the possibility that the DRN is driven by an input that represents cumulative future reward, a quantity called state value in reinforcement learning theory. The resulting model reproduces established tuning features of serotonin neurons, including phasic activation by reward-predicting cues and punishments, reward-specific surprise tuning, and tonic modulation by reward and punishment context. Because these features are the basis of many and varied existing serotonergic theories, these results show that my theory, which I call value prediction, provides a unifying perspective on serotonergic function. Finally, in an empirical test of the theory, I re-analyze data from an in vivo trace conditioning experiment and find that value prediction accounts for the firing rates of serotonin neurons to a precision ≪0.1 Hz, outperforming previous models by a large margin. Here I establish serotonin as a new neural substrate of prediction and reward, a significant step towards understanding the role of serotonin signalling in the brain.
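The temporal-derivative computation at the heart of the value-prediction account can be sketched as follows; the ramping value trace, gain, time constants, and subtractive adaptation model are illustrative assumptions rather than the fitted model from the thesis:

```python
import numpy as np

# Illustrative sketch of "value prediction": DRN output modeled as a
# temporal-derivative-like transform of a state-value signal, with
# spike-frequency adaptation approximated by a slow subtractive trace.

dt = 0.01                                  # 10 ms time step
t = np.arange(0, 10, dt)
value = np.zeros_like(t)
ramp = (t > 2) & (t < 6)                   # value ramps up toward a reward at t = 6 s
value[ramp] = np.linspace(0, 1, ramp.sum())

adaptation = 0.0
tau_adapt = 1.0                            # adaptation time constant (s), assumed
baseline = 1.0                             # tonic firing rate (Hz), assumed
rate = np.zeros_like(t)

for i in range(1, len(t)):
    dv = (value[i] - value[i - 1]) / dt    # temporal derivative of state value
    drive = baseline + 5.0 * dv            # gain of 5 Hz per unit value/s, assumed
    adaptation += dt / tau_adapt * (rate[i - 1] - adaptation)
    rate[i] = max(0.0, drive - adaptation) # adaptation sharpens transients

print(f"peak rate {rate.max():.2f} Hz at t = {t[np.argmax(rate)]:.2f} s")
```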
105

An overview of the applications of reinforcement learning to robot programming: discussion on the literature and the potentials

Sunilkumar, Abishek, Bahrpeyma, Fouad, Reichelt, Dirk 13 February 2024 (has links)
There has been remarkable progress in the field of robotics over the past few years, whether in stationary robots that perform dynamically changing tasks in the manufacturing sector or in automated guided vehicles (AGVs) for warehouse management and space exploration. The use of artificial intelligence (AI), especially reinforcement learning (RL), has contributed significantly to the success of various robotics tasks, proving that the shift toward intelligent control paradigms is successful and feasible. A fascinating aspect of RL is its ability to function both as a low-level controller and as a high-level decision-making tool at the same time. An example is a manipulator robot whose task is to guide itself through an environment with irregular and recurrent obstacles. In this scenario, low-level controllers can receive the joint angles and execute smooth motion using joint trajectory controllers, while at a higher level RL can define complex paths designed to avoid obstacles and self-collisions. An important aspect of the successful operation of an AGV is the ability to make timely decisions. When convolutional neural network (CNN)-based architectures are combined with RL, agents can direct AGVs to their destinations effectively, mitigating the risk of catastrophic collisions. Even though many of these challenges can be addressed with classical solutions, devising such solutions takes a great deal of time and effort, making the process quite expensive. With an eye on the different categories of RL applications in robotics, this study provides an overview of the use of RL in robotic applications, examining the advantages and disadvantages of state-of-the-art applications. Additionally, we provide a targeted comparative analysis between classical robotics methods and RL-based robotics methods. Along with drawing conclusions from our analysis, we outline future possibilities and advancements that may accelerate the progress and autonomy of robotics.
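The division of labor the abstract describes — RL as a high-level planner on top of a classical low-level trajectory controller — can be sketched as follows; the waypoint graph, reward values, and 3-DOF kinematics are illustrative assumptions:

```python
import numpy as np

# Minimal sketch of the two-level pattern: an RL policy acts as the high-level
# planner (choosing collision-free waypoints), while a classical low-level
# controller tracks joint angles toward each waypoint.

rng = np.random.default_rng(1)

def low_level_step(joints, target, gain=0.2):
    """Classical controller: move joint angles smoothly toward a target."""
    return joints + gain * (target - joints)

# High-level RL: tabular Q-learning over a tiny fully connected waypoint graph.
n_waypoints = 6
Q = np.zeros((n_waypoints, n_waypoints))   # Q[s, a]: value of moving s -> a
obstacle, goal = 3, 5                      # assumed blocked node and goal node

for episode in range(2000):
    s = 0
    for _ in range(20):
        a = int(rng.integers(n_waypoints)) if rng.random() < 0.1 else int(np.argmax(Q[s]))
        r = -10.0 if a == obstacle else (10.0 if a == goal else -1.0)
        s2 = s if a == obstacle else a     # moving into an obstacle fails
        Q[s, a] += 0.1 * (r + 0.9 * np.max(Q[s2]) - Q[s, a])
        s = s2
        if s == goal:
            break

# Execute: follow greedy waypoints, tracking each with the low-level controller.
waypoint_angles = rng.uniform(-1, 1, (n_waypoints, 3))   # assumed 3-DOF targets
s, joints = 0, np.zeros(3)
while s != goal:
    s = int(np.argmax(Q[s]))
    for _ in range(50):
        joints = low_level_step(joints, waypoint_angles[s])
print("reached goal waypoint", s, "final joints:", np.round(joints, 2))
```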
106

Reinforcement Learning Application in Wavefront Sensorless Adaptive Optics System

Zou, Runnan 13 February 2024 (has links)
With the increasing exploration of space and the widespread use of communication tools worldwide, near-ground satellite communication has emerged as a promising tool in various fields such as aerospace, military, and microscopy. However, the presence of air and water in the atmosphere distorts the light signal, so it is essential for the ground station to retrieve the original signal from the distorted light sent from the satellite. Traditionally, Shack-Hartmann sensors or charge-coupled devices are integrated into the system for distortion measurement. In pursuit of a cost-effective system with optimal performance and enhanced response speed, the sensors and charge-coupled devices have been replaced in this project by a photodiode and a single-mode fiber. Since the system has limited observation capability, it requires a powerful controller for optimal performance. To address this issue, we have implemented an off-policy reinforcement learning framework, the soft actor-critic, in the adaptive optics system controller. This integration results in a model-free online controller capable of mitigating wavefront distortion. The soft actor-critic controller processes the acquired data from the photodiode and generates a two-dimensional array control signal for the deformable mirror, which corrects the wavefront distortion induced by the atmosphere and refocuses the signal to maximize the incoming power. The parameters of the soft actor-critic controller have been tuned to achieve optimal system performance. Simulations have been conducted to compare the performance of the proposed controller against wavefront sensor-based methods. The training and verification of the proposed controller have been conducted in both static and semi-dynamic atmospheres, under different atmospheric conditions. Simulation results demonstrate that, in severe atmospheric conditions, the adaptive optics system with the soft actor-critic controller achieves more than 55% and 30% Strehl ratio on average in static and semi-dynamic atmospheres, respectively. Furthermore, the distorted wavefront's power can be concentrated at the center of the focal plane and the fiber, providing an improved signal.
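A minimal sketch of this sensorless control loop, assuming a toy quadratic coupling model, a gymnasium-style environment, and the stable-baselines3 SAC implementation (the thesis's simulation setup and hyperparameters are not reproduced here):

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC

# Toy wavefront-sensorless adaptive-optics environment in the spirit of the
# abstract: the agent observes only a scalar photodiode power reading plus its
# last deformable-mirror (DM) command, and must learn commands that cancel an
# unknown aberration. The 8-actuator DM, quadratic coupling model, and static
# random atmosphere are illustrative assumptions.

class SensorlessAOEnv(gym.Env):
    def __init__(self, n_act=8, horizon=100):
        super().__init__()
        self.n_act, self.horizon = n_act, horizon
        self.action_space = spaces.Box(-1.0, 1.0, (n_act,), dtype=np.float32)
        self.observation_space = spaces.Box(-np.inf, np.inf, (n_act + 1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.aberration = self.np_random.uniform(-0.5, 0.5, self.n_act)
        self.cmd = np.zeros(self.n_act)
        self.t = 0
        return self._obs(0.0), {}

    def _obs(self, power):
        return np.concatenate(([power], self.cmd)).astype(np.float32)

    def step(self, action):
        self.cmd = np.asarray(action, dtype=np.float64)
        residual = self.aberration + self.cmd          # DM partially cancels the atmosphere
        power = float(np.exp(-np.sum(residual ** 2)))  # fiber-coupled power, Strehl-like
        self.t += 1
        return self._obs(power), power, False, self.t >= self.horizon, {}

env = SensorlessAOEnv()
model = SAC("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=5_000)  # short demo run; serious tuning needs far more
```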
107

Behavioral Training of Reward Learning Increases Reinforcement Learning Parameters and Decreases Depression Symptoms Across Repeated Sessions

Goyal, Shivani 12 1900 (has links)
Background: Disrupted reward learning has been suggested to contribute to the etiology and maintenance of depression. If deficits in reward learning are core to depression, we would expect that improving reward learning would decrease depression symptoms across time. Whereas previous studies have shown that reward learning can be changed within a single study session, effecting clinically meaningful change in learning requires this change to endure beyond task completion and transfer to real-world environments. With a longitudinal design, we investigate the potential for repeated sessions of behavioral training to create change in reward learning and decrease depression symptoms across time. Methods: 929 online participants (497 depression-present; 432 depression-absent) recruited from Amazon's Mechanical Turk platform completed a behavioral training paradigm and clinical self-report measures for up to eight total study visits. Participants were randomly assigned to one of 12 arms of the behavioral training paradigm, in which they completed a probabilistic reward learning task interspersed with queries about a feature of the task environment (11 learning arms) or a control query (1 control arm). Learning queries trained participants on one of four computational learning targets known to affect reinforcement learning (probability, average or extreme outcome values, and value comparison processes). A reinforcement learning model previously shown to distinguish depression-related differences in learning was fit to behavioral responses using hierarchical Bayesian estimation, providing estimates of reward sensitivity and learning rate for each participant on each visit. Reward sensitivity captured participants' value dissociation between high versus low outcome values, while learning rate captured how much participants learned from previously experienced outcomes. Mixed linear models assessed relationships between model-agnostic task performance, computational model-derived reinforcement learning parameters, depression symptoms, and study progression. Results: Across time, learning queries increased reward sensitivities in depression-absent participants (β = 0.036, p < 0.001, 95% CI (0.022, 0.049)). In contrast, control queries did not change reward sensitivities in depression-absent participants across time (β = 0.016, p = 0.303, 95% CI (-0.015, 0.048)). Learning rates were not affected across time for participants receiving learning queries (β = 0.001, p = 0.418, 95% CI (-0.002, 0.004)) or control queries (β = 0.002, p = 0.558, 95% CI (-0.005, 0.009)). Of the learning queries, those targeting value comparison processes improved depression symptoms (β = -0.509, p = 0.015, 95% CI (-0.912, -0.106)) and increased reward sensitivities across time (β = 0.052, p < 0.001, 95% CI (0.030, 0.075)) in depression-present participants. Increased reward sensitivities related to decreased depression symptoms across time in these participants (β = -2.905, p = 0.002, 95% CI (-4.75, -1.114)). Conclusions: Multiple sessions of targeted behavioral training improved reward learning for participants with a range of depression symptoms. Improved behavioral reward learning was associated with improved clinical symptoms across time, possibly because the learning transferred to real-world scenarios.
These results support disrupted reward learning as a mechanism contributing to the etiology and maintenance of depression and suggest the potential of repeated behavioral training to target deficits in reward learning. / Master of Science / Disrupted reward learning has been suggested to be central to depression. Work investigating how changing reward learning affects clinical symptoms has the potential to clarify the role of reward learning in depression. Here, we address this question by investigating whether multiple sessions of behavioral training change reward learning and decrease depression symptoms across time. We recruited 929 online participants to complete up to eight study visits. On each study visit, participants completed a depression questionnaire and one of 12 arms of a behavioral training paradigm, in which they completed a reward learning task interspersed with queries about the task. Queries trained participants on one of four learning targets known to affect reward learning (probability, average or extreme outcome values, and value comparison processes). We used reinforcement learning models to quantify specific reward learning processes, including how much participants valued high versus low outcomes (reward sensitivity) and how much participants learned from previously experienced outcomes (learning rate). Across study visits, we found that participants without depression symptoms who completed the targeted behavioral training increased their reward sensitivities (β = 0.036, p < 0.001, 95% CI (0.022, 0.049)). Of the queries, those targeting value comparison processes improved both depression symptoms (β = -0.509, p = 0.015, 95% CI (-0.912, -0.106)) and reward sensitivities (β = 0.052, p < 0.001, 95% CI (0.030, 0.075)) across study visits for participants with depression symptoms. These results suggest that multiple sessions of behavioral training can increase reward learning across time for participants with and without depression symptoms. Further, they support the role of disrupted reward learning in depression and suggest the potential for behavioral training to improve both reward learning and symptoms in depression.
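The two model parameters at the center of this analysis — a reward sensitivity that scales experienced outcomes and a learning rate that governs value updates — can be sketched with a simple delta-rule learner; the two-option task structure and exact model variant below are assumptions, not the study's specification:

```python
import numpy as np

# Minimal delta-rule sketch of a two-parameter reinforcement learning model:
# reward sensitivity (rho) scales experienced outcomes, and the learning rate
# (alpha) controls how much past outcomes update values.

def simulate(alpha, rho, n_trials=200, seed=0):
    rng = np.random.default_rng(seed)
    p_reward = np.array([0.7, 0.3])        # assumed reward probabilities
    q = np.zeros(2)
    choices = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        p = np.exp(q) / np.exp(q).sum()    # softmax choice between learned values
        c = rng.choice(2, p=p)
        outcome = float(rng.random() < p_reward[c])
        q[c] += alpha * (rho * outcome - q[c])   # sensitivity-scaled delta rule
        choices[t] = c
    return choices.mean()                  # fraction choosing the worse option

# Higher reward sensitivity sharpens value differences and choice preference:
for rho in (0.5, 1.0, 3.0):
    frac = simulate(alpha=0.2, rho=rho)
    print(f"rho={rho}: chose worse option on {frac:.2f} of trials")
```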
108

Automatic Selection of Dynamic Loop Scheduling Algorithms for Load Balancing using Reinforcement Learning

Dhandayuthapani, Sumithra 07 August 2004 (has links)
Scientific applications are large, complex, irregular, and computationally intensive, and they are characterized by data-parallel loops. The prevalence of independent iterations in these loops makes parallel computing the natural choice for solving these applications. The computational requirements of these problems vary due to variations in problem, algorithmic, and systemic characteristics during parallelization, leading to performance degradation. A considerable amount of research has been dedicated to the development of dynamic scheduling techniques based on probabilistic analysis to address the predictable and unpredictable factors that lead to severe load imbalance. The mathematical foundations of these scheduling algorithms have been previously developed and published in the literature. These techniques have been successfully integrated into scientific applications as well as into runtime systems. Recently, efforts have also been directed toward integrating these techniques into dynamic load balancing libraries for scientific applications. Selecting the optimal scheduling algorithm to load balance a specific scientific application in a dynamic parallel computing environment is very difficult without exhaustively testing all the scheduling techniques. This is a time-consuming process, and therefore there is a need for an automatic mechanism for selecting dynamic scheduling algorithms. In recent years, extensive work has been dedicated to the development of reinforcement learning, and some of its techniques have addressed load-balancing problems. However, they do not cover a number of aspects regarding the performance of scientific applications. First, these previously developed techniques address the load balancing problem only at a coarse granularity (for example, job scheduling), and the reinforcement learning techniques used for load balancing learn from training datasets obtained prior to the execution of the application. Moreover, scientific applications contain parameters whose variations are so irregular that training sets cannot accurately capture the entire spectrum of possible characteristics. Finally, algorithm selection using reinforcement learning has only been applied to simple sequential problems. This thesis addresses these limitations and provides a novel integrated approach for automating the selection of dynamic scheduling algorithms at a finer granularity to improve the performance of scientific applications using reinforcement learning. The integrated approach is tested experimentally on a scientific application that involves a large number of time steps: the Quantum Trajectory Method (QTM). A qualitative and quantitative analysis of the effectiveness of this novel approach is presented to underscore its significance for improving the performance of large-scale scientific applications.
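The selection mechanism the abstract argues for — learning online, during execution, which scheduling algorithm to apply at each time step — can be sketched as a value-based selector; the candidate algorithm list, timing model, and reward are illustrative assumptions, not the thesis's integrated approach:

```python
import random

# Illustrative sketch: at each time step of a time-stepping application, an
# agent picks one dynamic loop scheduling (DLS) algorithm and is rewarded for
# reducing that step's parallel execution time. Learning happens online, with
# no pre-trained datasets.

ALGORITHMS = ["STATIC", "FSC", "GSS", "FAC", "AWF"]  # typical DLS techniques

def execute_timestep(algorithm):
    """Stand-in for running one application time step (assumed timings)."""
    base = {"STATIC": 1.4, "FSC": 1.2, "GSS": 1.1, "FAC": 1.0, "AWF": 0.9}
    return base[algorithm] + random.gauss(0, 0.05)   # systemic noise

q = {a: 0.0 for a in ALGORITHMS}
alpha, epsilon = 0.1, 0.1

for step in range(2000):                  # e.g., QTM time steps
    if random.random() < epsilon:
        a = random.choice(ALGORITHMS)     # keep exploring: the load evolves online
    else:
        a = max(q, key=q.get)
    elapsed = execute_timestep(a)
    q[a] += alpha * (-elapsed - q[a])     # faster time steps earn higher reward

print("preferred algorithm:", max(q, key=q.get))
```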
109

Trajectories of Risk Learning and Real-World Risky Behaviors During Adolescence

Wang, John M. 31 August 2020 (has links)
Adolescence is a transition period during which individuals have increasing autonomy in decision-making for themselves (Casey, Jones, and Hare, 2008), often choosing among options about which they have little knowledge and experience. This process of individuation and independence is reflected in real-world risk-taking behaviors (Silveri et al., 2004), including higher rates of motor accidents, unwanted pregnancies, sexually transmitted diseases, drug addictions, and death (Casey et al., 2008). The extent to which adolescents continue to display behaviors with negative consequences during this period of life depends critically on their ability to explore and learn the potential consequences of actions within novel environments. This learning is not limited to the value of the outcome associated with making choices, but extends to the levels of risk taken in making those choices. While the existing adolescence literature has focused on the neural substrates of risk preferences, how adolescents behaviorally and neurally learn about risks remains unknown. Success or failure in learning the potential variability of these consequences, or the risks involved, in ambiguous decisions is hypothesized to be a crucial process that allows individuals to make decisions based on their risk preferences. In the alternative, adolescents who fail to learn about the risks involved in their decisions are left in a state of continued exploration of the ambiguity, reflected as continued risk-taking behavior. This dissertation comprises two papers. The first is a perspective paper outlining a paradigm in which risk-taking behavior observed during adolescence may be a product of each adolescent's ability to learn about risk. The second builds on the hypothesis of the perspective paper, first examining the neural correlates of risk learning and quantifying individual risk learning abilities, and then examining longitudinal risk learning developmental trajectories in relation to real-world risk trajectories in adolescent individuals. / Doctor of Philosophy / Adolescence is a transition period during which individuals have increasing autonomy in decision-making for themselves, often choosing among options about which they have little knowledge and experience. This process of individuation and independence begins with adolescents exploring their world and the options they are ignorant of. It is reflected in real-world risk-taking behaviors, including higher rates of motor accidents, unwanted pregnancies, sexually transmitted diseases, drug addictions, and death. We hypothesized and tested the premise that adolescents who fail to learn about the negative consequences of their actions while exploring will continue to partake in behaviors with negative consequences. This learning is not limited to the value of the outcome associated with making choices, but extends to the range of possible outcomes of those choices, or the risks involved. Indeed, those who fail to learn the risks involved in decisions with no known information show continued and greater risk-taking behavior, perhaps remaining in a state of continued exploration of the unknown.
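"Risk learning" in this sense — learning the variability of outcomes rather than just their mean — can be sketched with a pair of delta rules; the outcome distribution and learning rates below are illustrative assumptions:

```python
import numpy as np

# Sketch of risk learning: one delta rule tracks expected value, and a second
# tracks the expected squared prediction error, i.e., the outcome variance
# (risk) of an initially ambiguous option.

rng = np.random.default_rng(2)
alpha_v, alpha_r = 0.1, 0.1      # learning rates for value and for risk
v, risk = 0.0, 1.0               # initial estimates

for trial in range(500):
    outcome = rng.normal(2.0, 1.5)         # ambiguous option: mean 2, sd 1.5
    delta = outcome - v                    # value prediction error
    v += alpha_v * delta
    risk += alpha_r * (delta ** 2 - risk)  # risk prediction error

print(f"learned value {v:.2f} (true 2.0), learned risk {risk:.2f} (true var 2.25)")
```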
110

Altered Neural and Behavioral Associability-Based Learning in Posttraumatic Stress Disorder

Brown, Vanessa 24 April 2015 (has links)
Posttraumatic stress disorder (PTSD) is accompanied by marked alterations in cognition and behavior, particularly when negative, high-value information is present (Aupperle, Melrose, Stein, & Paulus, 2012; Hayes, Vanelzakker, & Shin, 2012). However, the underlying processes are unclear; such alterations could result from differences in how this high-value information is updated or in its effects on processing future information. To untangle the effects of different aspects of behavior, we used a computational psychiatry approach to disambiguate the roles of increased learning from previously surprising outcomes (i.e., associability; Li, Schiller, Schoenbaum, Phelps, & Daw, 2011) and from large value differences (i.e., prediction error; Montague, 1996; Schultz, Dayan, & Montague, 1997) in PTSD. Combat-deployed military veterans with varying levels of PTSD symptoms completed a learning task while undergoing fMRI; behavioral choices and neural activation were modeled using reinforcement learning. We found that associability-based loss learning at both the neural and behavioral levels increased with PTSD severity, particularly with hyperarousal symptoms, and that the interaction of PTSD severity and neural markers of associability-based learning predicted behavior. In contrast, PTSD severity did not modulate the prediction error neural signal or the behavioral learning rate. These results suggest that increased associability-based learning underlies neurobehavioral alterations in PTSD. / Master of Science
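The hybrid associability model referenced above (in the spirit of Li et al., 2011) gates each trial's learning rate by a running average of recent absolute prediction errors, so previously surprising cues are updated faster; a minimal sketch with assumed parameter values:

```python
import numpy as np

# Hybrid Pearce-Hall-style sketch: the value update is scaled by associability,
# which itself tracks recent absolute prediction errors (surprise).

rng = np.random.default_rng(3)
kappa, eta = 0.3, 0.4          # learning-rate scale and associability decay
v, assoc = 0.0, 1.0            # cue value and its associability

for trial in range(200):
    outcome = float(rng.random() < 0.7)           # cue pays off 70% of the time
    delta = outcome - v                           # prediction error
    v += kappa * assoc * delta                    # associability-gated update
    assoc = (1 - eta) * assoc + eta * abs(delta)  # surprise raises associability

print(f"value {v:.2f}, associability {assoc:.2f}")
```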
