61

Robot Navigation in Cluttered Environments with Deep Reinforcement Learning

Weideman, Ryan 01 June 2019 (has links) (PDF)
The application of robotics in cluttered and dynamic environments provides a wealth of challenges. This thesis proposes a deep reinforcement learning based system that determines collision-free robot navigation velocities directly from a sequence of depth images and a desired direction of travel. The system is designed such that a real robot could be placed in an unmapped, cluttered environment and be able to navigate in a desired direction with no prior knowledge. Deep Q-learning, coupled with the innovations of double Q-learning and dueling Q-networks, is applied. Two modifications of this architecture are presented that incorporate heading information, which the reinforcement learning agent can use to learn how to navigate to target locations while avoiding obstacles. The performance of these two extensions of the D3QN architecture is evaluated in simulation in simple and complex environments with a variety of common obstacles. Results show that both modifications enable the agent to successfully navigate to target locations, reaching 88% and 67% of goals in a cluttered environment, respectively.
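For reference, a minimal PyTorch sketch of the dueling double-DQN (D3QN) ingredients named above follows; the layer sizes, 84x84 input resolution, frame-stack depth, and discrete velocity set are illustrative assumptions, not the thesis' exact architecture.

```python
# A minimal sketch (not the thesis' exact D3QN) of a dueling Q-network over
# stacked depth images; layer sizes and the discrete action set are assumptions.
import torch
import torch.nn as nn

class DuelingDepthQNet(nn.Module):
    def __init__(self, n_frames=4, n_actions=5):
        super().__init__()
        # Convolutional trunk over a stack of depth frames.
        self.features = nn.Sequential(
            nn.Conv2d(n_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat_dim = 64 * 7 * 7  # for 84x84 depth inputs
        # Dueling heads: scalar state value V(s) and per-action advantage A(s, a).
        self.value = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, 1))
        self.advantage = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, n_actions))

    def forward(self, x):
        h = self.features(x)
        v, a = self.value(h), self.advantage(h)
        # Combine with the mean-advantage identifiability trick.
        return v + a - a.mean(dim=1, keepdim=True)

# Double-DQN target: the online net selects the argmax action, the target net evaluates it.
@torch.no_grad()
def double_dqn_target(online, target, reward, next_obs, done, gamma=0.99):
    best = online(next_obs).argmax(dim=1, keepdim=True)
    next_q = target(next_obs).gather(1, best).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q
```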
62

Machine Translation For Machines

Tebbifakhr, Amirhossein 25 October 2021 (has links)
Traditionally, Machine Translation (MT) systems are developed by targeting fluency (i.e. output grammaticality) and adequacy (i.e. semantic equivalence with the source text), criteria that reflect the needs of human end-users. However, recent advancements in Natural Language Processing (NLP) and the introduction of NLP tools in commercial services have opened new opportunities for MT. A particularly relevant one is the application of NLP technologies in low-resource language settings, where the paucity of training data limits the possibility of training reliable services. In this specific condition, MT can come into play by enabling so-called “translation-based” workarounds. The idea is simple: first, input texts in the low-resource language are translated into a resource-rich target language; then, the machine-translated text is processed by well-trained NLP tools in the target language; finally, the output of these downstream components is projected back to the source language. This results in a new scenario, in which the end-user of MT technology is no longer a human but another machine. We hypothesize that current MT training approaches are not optimal for this setting, in which the objective is to maximize the performance of a downstream tool fed with machine-translated text rather than human comprehension. Under this hypothesis, this thesis introduces a new research paradigm, which we name “MT for machines”, addressing a number of questions that arise from this novel view of the MT problem. Are there different quality criteria for humans and machines? What makes a good translation from the machine standpoint? What are the trade-offs between the two notions of quality? How to pursue machine-oriented objectives? How to serve different downstream components with a single MT system? How to exploit knowledge transfer to operate in different language settings with a single MT system? Elaborating on these questions, this thesis: i) introduces a novel and challenging MT paradigm, ii) proposes an effective method based on Reinforcement Learning and analyses its possible variants, iii) extends the proposed method to multitask and multilingual settings so as to serve different downstream applications and languages with a single MT system, iv) studies the trade-off between machine-oriented and human-oriented criteria, and v) discusses the successful application of the approach in two real-world scenarios.
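The translation-based workaround can be summarized in a few lines of code. Below is a schematic sketch assuming a sentence-level task (e.g., sentiment classification) so no back-projection step is needed; the translate/classify callables and the 0/1 machine-oriented reward are illustrative placeholders, not the thesis' actual models or reward function.

```python
# Schematic sketch of a "translation-based" workaround and a machine-oriented
# reward in the spirit of the thesis' RL objective. All callables are toy
# placeholders, not real MT or NLP systems.
from typing import Callable

def translation_based_pipeline(text_lr: str,
                               translate: Callable[[str], str],
                               classify: Callable[[str], str]) -> str:
    """Low-resource text -> resource-rich MT -> well-trained downstream tool."""
    text_rr = translate(text_lr)   # 1) translate into the resource-rich language
    return classify(text_rr)       # 2) sentence-level labels need no back-projection

def machine_oriented_reward(label_pred: str, label_gold: str) -> float:
    """Reward the MT system for downstream accuracy, not human-judged fluency."""
    return 1.0 if label_pred == label_gold else 0.0

# Usage with toy stand-ins:
label = translation_based_pipeline("texto de ejemplo",
                                   translate=lambda s: "example text",
                                   classify=lambda s: "neutral")
print(label, machine_oriented_reward(label, "neutral"))
```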
63

Evolutionary Optimization of Decision Trees for Interpretable Reinforcement Learning

Custode, Leonardo Lucio 27 April 2023 (has links)
While Artificial Intelligence (AI) is making giant steps, it is also raising concerns about its trustworthiness, due to the fact that widely-used black-box models cannot be fully understood by humans. One way to improve humans' trust in AI is to use interpretable AI models, i.e., models that can be thoroughly understood by humans, and thus trusted. However, interpretable AI models are not typically used in practice, as they are thought to perform worse than black-box models. This is especially evident in Reinforcement Learning, where relatively little work addresses the problem of performing Reinforcement Learning with interpretable models. In this thesis, we address this gap, proposing methods for Interpretable Reinforcement Learning. For this purpose, we optimize Decision Trees by combining Reinforcement Learning with Evolutionary Computation techniques, which allows us to overcome some of the challenges tied to optimizing Decision Trees in Reinforcement Learning scenarios. The experimental results show that these approaches are competitive with state-of-the-art scores while being far easier to interpret. Finally, we show the practical importance of Interpretable AI by digging into the inner workings of the solutions obtained.
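To make the combination concrete, here is a minimal sketch of evolving interpretable decision policies by episode return. The depth-1 "stump" encoding, the mutation scheme, and the CartPole-v1 task are assumptions made for illustration, not the thesis' exact genome grammar or operators.

```python
# A minimal sketch of evolving interpretable decision policies for RL: each
# genome is one readable split (feature, threshold, left_action, right_action),
# and fitness is the mean episode return. Requires `pip install gymnasium`.
import random
import gymnasium as gym

def act(genome, obs):
    f, t, left, right = genome          # one human-readable split per genome
    return left if obs[f] < t else right

def fitness(genome, env, episodes=3):
    total = 0.0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            obs, r, term, trunc, _ = env.step(act(genome, obs))
            total += r
            done = term or trunc
    return total / episodes

def evolve(env, pop_size=30, generations=20):
    n_obs = env.observation_space.shape[0]
    n_act = env.action_space.n
    pop = [(random.randrange(n_obs), random.uniform(-1, 1),
            random.randrange(n_act), random.randrange(n_act))
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda g: fitness(g, env), reverse=True)
        elite = scored[:pop_size // 5]
        # Refill the population by jittering elite thresholds (simple mutation).
        pop = elite + [(g[0], g[1] + random.gauss(0, 0.1), g[2], g[3])
                       for g in random.choices(elite, k=pop_size - len(elite))]
    return max(pop, key=lambda g: fitness(g, env))

best = evolve(gym.make("CartPole-v1"))
print("if obs[%d] < %.2f: action %d else: action %d" % best)
```

The final print statement is the point: the whole learned policy fits in one legible rule, which is what distinguishes this family of methods from black-box policy networks.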
64

Action-Based Representation Discovery in Markov Decision Processes

Osentoski, Sarah 01 September 2009 (has links)
This dissertation investigates the problem of representation discovery in discrete Markov decision processes, namely how agents can simultaneously learn representation and optimal control. Previous work on function approximation techniques for MDPs largely employed hand-engineered basis functions. In this dissertation, we explore approaches to automatically construct these basis functions and demonstrate that automatically constructed basis functions significantly outperform more traditional, hand-engineered approaches. We specifically examine two problems: how to automatically build representations for action-value functions by explicitly incorporating actions into a representation, and how representations can be automatically constructed by exploiting a pre-specified task hierarchy. We first introduce a technique for learning basis functions directly in state-action space. The approach constructs basis functions using spectral analysis of a state-action graph which captures the underlying structure of the state-action space of the MDP. We describe two approaches to constructing these graphs and evaluate the approach on MDPs with discrete state and action spaces. We show how our approach can be used to approximate state-action value functions when the agent has access to macro-actions: actions that take more than one time step and have predefined policies. We describe how the state-action graphs can be modified to incorporate information about the macro-actions and experimentally evaluate this approach for SMDPs with discrete state and action spaces. Finally, we describe how hierarchical reinforcement learning can be used to scale up automatic basis function construction. We extend automatic basis function construction techniques to multi-level task hierarchies and describe how basis function construction can exploit the value function decomposition given by a fixed task hierarchy. We demonstrate that combining task hierarchies with automatic basis function construction allows basis function techniques to scale to larger problems and leads to a significant speed-up in learning.
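For intuition, the following sketch builds basis functions by spectral analysis of a state-action graph, in the spirit described above: eigenvectors of the graph Laplacian over (state, action) nodes serve as features for approximating the action-value function. The tiny chain MDP and this particular graph construction are assumptions for illustration, not the dissertation's exact definitions.

```python
# Illustrative spectral basis construction in state-action space: Laplacian
# eigenvectors of a state-action graph become basis functions for Q(s, a).
import numpy as np

# Enumerate (state, action) nodes for a 5-state chain with actions {left, right}.
n_states, actions = 5, (-1, +1)
nodes = [(s, a) for s in range(n_states) for a in actions]
index = {sa: i for i, sa in enumerate(nodes)}

# Connect (s, a) to (s', a') whenever taking a in s reaches s' (any next action a').
W = np.zeros((len(nodes), len(nodes)))
for (s, a) in nodes:
    s_next = min(max(s + a, 0), n_states - 1)
    for a2 in actions:
        W[index[(s, a)], index[(s_next, a2)]] = 1.0
W = np.maximum(W, W.T)                      # symmetrize the adjacency

# Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(len(nodes)) - D_inv_sqrt @ W @ D_inv_sqrt

# The smoothest eigenvectors (smallest eigenvalues) become basis functions, so
# the action-value function is approximated linearly: Q ≈ Phi @ w.
eigvals, eigvecs = np.linalg.eigh(L)
Phi = eigvecs[:, :4]                        # k = 4 basis functions
print(Phi.shape)                            # (10, 4): one row per (s, a) pair
```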
65

A Reinforcement Learning Characterization of Thermostatic Control for HVAC Demand Response and Experimentation Framework for Simulated Building Energy Control

Eubel, Christopher J. 27 October 2022 (has links)
No description available.
66

Zooming Algorithm for Lipschitz Bandits with Linear Safety Constraints

Hu, Tengmu January 2021 (has links)
No description available.
67

Cognitive Control in Cognitive Dynamic Systems and Networks

FATEMI BOOSHEHRI, SEYED MEHDI 29 January 2015 (has links)
The main idea of this thesis is to define and formulate the role of cognitive control in cognitive dynamic systems and complex networks in order to control the directed flow of information. A cognitive dynamic system is based on Fuster's principles of cognition, the most basic of which is the so-called global perception-action cycle, on which the other three build. Cognitive control, by definition, completes the executive part of this important cycle. In this thesis, we first provide the rationale for defining cognitive control in a way that suits engineering requirements. To this end, the novel idea of the entropic state, and thereby the two-state model, is first described. Next, on the sole basis of the entropic state and the concept of directed information flow, we formulate the learning algorithm as the first process of cognitive control. Most importantly, we show that the derived algorithm is a special case of Bellman's celebrated dynamic programming. Another key point is that cognitive control intrinsically differs from generic dynamic programming and its approximations (commonly known as reinforcement learning) in that it is stateless by definition. As a result, the derived algorithm has two desirable characteristics: a) it converges to the optimal policy, and b) it is free of the curse of dimensionality. Next, predictive planning is described as the second process of cognitive control. The planning process is based on shunt cycles (called mutually composite cycles herein) that bypass the environment and facilitate the prediction of future global perception-action cycles. Our results demonstrate that predictive planning significantly improves the functionality of cognitive control. We also deploy an explore/exploit strategy to apply a simple form of executive attention. The thesis then applies cognitive control to two different applications of practical importance. The first involves cognitive tracking radar, which is based on a benchmark example and provides the means for testing the theory. To provide a frame of reference, the results are compared to other cognitive controllers that use traditional Q-learning and the method of dynamic optimization. In both cases, the new algorithm demonstrates considerable improvement with less computational load. For the second application, the problem of observability in stochastic complex networks is chosen due to its importance in many practical situations. Given cognitive control theory and its strong performance, the idea here is to view the network as the environment of a cognitive dynamic system, so that the cognitive dynamic system with its cognitive controller plays a supervisory role over the network. The proposed methodology differs from the state of the art in the literature on two accounts: 1) stochasticity in both the modelling and monitoring processes, and 2) complexity in terms of edge density. We present several examples to demonstrate the information processing power of cognitive control in this context as well. The thesis finishes by outlining future research in three main directions. / Thesis / Doctor of Philosophy (PhD)
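As a rough illustration of the stateless, entropy-driven learning idea described above, the sketch below maintains action values with no state argument and rewards incremental reductions in an entropic-state measure. The toy entropy model, the action set, and the epsilon-greedy explore/exploit schedule are assumptions for illustration, not the thesis' formulation.

```python
# Heavily simplified sketch: stateless action values (no state argument, unlike
# generic Q-learning), with reward taken as the reduction in an entropic-state
# measure. The entropy function below is a toy stand-in, not the radar model.
import random

actions = [0.1, 0.5, 1.0]          # e.g., candidate probe/waveform parameters
q = {a: 0.0 for a in actions}      # stateless action values
alpha, eps = 0.2, 0.1

def entropic_state(a, noise=0.05):
    # Toy stand-in: a smaller parameter leaves lower residual uncertainty.
    return a + random.uniform(-noise, noise)

h_prev = 1.0
for t in range(200):
    # epsilon-greedy, a simple form of the explore/exploit executive attention
    a = random.choice(actions) if random.random() < eps else max(q, key=q.get)
    h = entropic_state(a)
    reward = h_prev - h            # incremental reduction of the entropic state
    q[a] += alpha * (reward - q[a])
    h_prev = h

print(max(q, key=q.get))           # the action the controller settles on
```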
68

Making Sense of Serotonin Through Spike Frequency Adaptation

Harkin, Emerson 04 December 2023 (has links)
What does serotonin do? Just as the diffuse axonal arbours of midbrain serotonin neurons touch nearly every corner of the forebrain, so too is this ancient neuromodulator involved in nearly every aspect of learning and behaviour. The role of serotonin in reward processing has received increasing attention in recent years, but there is little agreement about how the perplexing responses of serotonin neurons to emotionally salient stimuli should be interpreted, and essentially nothing is known about how they arise. Here I approach these two aspects of serotonergic function in reverse order. In the first part of this thesis, I construct an experimentally-constrained spiking neural network model of the dorsal raphe nucleus (DRN), the main source of forebrain serotonergic input, and characterize its signal processing features. I show that potent spike-frequency adaptation deeply shapes DRN output while other aspects of its physiology are relatively less important. Overall, this part of my work suggests that in vivo serotonergic activity patterns arise from a temporal-derivative-like computation. But the temporal derivative of what? In the second part, I consider the possibility that the DRN is driven by an input that represents cumulative future reward, a quantity called state value in reinforcement learning theory. The resulting model reproduces established tuning features of serotonin neurons, including phasic activation by reward predicting cues and punishments, reward-specific surprise tuning, and tonic modulation by reward and punishment context. Because these features are the basis of many and varied existing serotonergic theories, these results show that my theory, which I call value prediction, provides a unifying perspective on serotonergic function. Finally, in an empirical test of the theory, I re-analyze data from an in vivo trace conditioning experiment and find that value prediction accounts for the firing rates of serotonin neurons to a precision ≪0.1 Hz, outperforming previous models by a large margin. Here I establish serotonin as a new neural substrate of prediction and reward, a significant step towards understanding the role of serotonin signalling in the brain.
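The "temporal-derivative-like computation" can be illustrated in a few lines: adaptation subtracts a lagging copy of the input, so driving the model with a state-value trace yields a phasic burst at a reward-predicting cue. The value trace and time constants below are toy assumptions, not the fitted DRN model from the thesis.

```python
# Conceptual sketch of the value-prediction idea: spike-frequency adaptation
# acts like a slow high-pass filter, so a value input V(t) produces a phasic,
# derivative-like response at a cue. All parameters are illustrative.
import numpy as np

dt, T = 0.001, 4.0                       # 1 ms steps, 4 s trial
t = np.arange(0, T, dt)
V = np.where(t > 1.0, 1.0 - np.exp(-(t - 1.0) / 0.5), 0.0)  # value rises after a cue at t = 1 s

# Adaptation variable: a lagging low-pass copy of the input.
tau_adapt = 0.3
v_slow = np.zeros_like(V)
for i in range(1, len(t)):
    v_slow[i] = v_slow[i-1] + dt / tau_adapt * (V[i-1] - v_slow[i-1])

rate = np.maximum(V - v_slow, 0.0)       # rectified, derivative-like firing rate
print(f"peak response at t = {t[rate.argmax()]:.2f} s")  # phasic burst just after the cue
```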
69

An overview of the applications of reinforcement learning to robot programming: discussion on the literature and the potentials

Sunilkumar, Abishek, Bahrpeyma, Fouad, Reichelt, Dirk 13 February 2024 (has links)
There has been remarkable progress in the field of robotics over the past few years, whether in stationary robots that perform dynamically changing tasks in the manufacturing sector or in automated guided vehicles (AGVs) for warehouse management and space exploration. The use of artificial intelligence (AI), especially reinforcement learning (RL), has contributed significantly to the success of various robotics tasks, proving that the shift toward intelligent control paradigms is successful and feasible. A fascinating aspect of RL is its ability to function both as a low-level controller and as a high-level decision-making tool. An example of this is a manipulator robot whose task is to guide itself through an environment with irregular and recurrent obstacles. In this scenario, low-level controllers can receive the joint angles and execute smooth motion using joint trajectory controllers. At a higher level, RL can also be used to define complex paths designed to avoid obstacles and self-collisions. An important aspect of the successful operation of an AGV is the ability to make timely decisions. When convolutional neural network (CNN)-based architectures are combined with RL, agents can direct AGVs to their destinations effectively, mitigating the risk of catastrophic collisions. Even though many of these challenges can be addressed with classical solutions, devising such solutions takes a great deal of time and effort, making the process quite expensive. With an eye on the different categories of RL applications to robotics, this study provides an overview of the use of RL in robotic applications, examining the advantages and disadvantages of state-of-the-art applications. Additionally, we provide a targeted comparative analysis between classical robotics methods and RL-based robotics methods. Along with conclusions drawn from our analysis, we outline future possibilities and advancements that may accelerate the progress and autonomy of robotics.
70

Reinforcement Learning Application in Wavefront Sensorless Adaptive Optics System

Zou, Runnan 13 February 2024 (has links)
With the increasing exploration of space and the widespread use of communication tools worldwide, near-ground satellite communication has emerged as a promising tool in various fields such as aerospace, the military, and microscopy. However, the presence of air and water in the atmosphere distorts the light signal, so it is essential for the ground station to retrieve the original signal from the distorted light signal sent by the satellite. Traditionally, Shack-Hartmann sensors or charge-coupled devices are integrated into the system for distortion measurement. In pursuit of a cost-effective system with optimal performance and enhanced response speed, this project replaces these sensors and charge-coupled devices with a photodiode and a single-mode fiber. Since the system has limited observation capability, it requires a powerful controller for optimal performance. To address this issue, we have implemented an off-policy reinforcement learning framework, the soft actor-critic, in the adaptive optics system controller. This integration results in a model-free online controller capable of mitigating wavefront distortion. The soft actor-critic controller processes the acquired data matrix from the photodiode and generates a two-dimensional array control signal for the deformable mirror, which corrects the wavefront distortion induced by the atmosphere and refocuses the signal to maximize the incoming power. The parameters of the soft actor-critic controller have been tuned to achieve optimal system performance. Simulations have been conducted to compare the performance of the proposed controller with wavefront sensor-based methods. The training and verification of the proposed controller have been conducted in both static and semi-dynamic atmospheres, under different atmospheric conditions. Simulation results demonstrate that, in severe atmospheric conditions, the adaptive optics system with the soft actor-critic controller achieves more than 55% and 30% Strehl ratio on average in static and semi-dynamic atmospheres, respectively. Furthermore, the distorted wavefront's power can be concentrated at the center of the focal plane and the fiber, providing an improved signal.
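For concreteness, here is a minimal sketch of such a sensorless control loop framed as a reinforcement-learning environment, trained with an off-the-shelf soft actor-critic implementation (the one in stable-baselines3). The two-mode "atmosphere", the exponential power model, and all dimensions are toy assumptions, not the thesis' simulator.

```python
# Toy sensorless AO loop as a gymnasium environment: the agent observes only a
# photodiode power reading and commands the deformable mirror; reward = power.
# Requires `pip install gymnasium stable-baselines3`.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC

class ToySensorlessAO(gym.Env):
    """Observation: photodiode power; action: deformable-mirror mode commands."""
    def __init__(self, n_modes=2, horizon=50):
        super().__init__()
        self.n, self.horizon = n_modes, horizon
        self.action_space = spaces.Box(-1.0, 1.0, shape=(n_modes,), dtype=np.float32)
        self.observation_space = spaces.Box(0.0, 1.0, shape=(1,), dtype=np.float32)

    def _power(self):
        # Fiber-coupled power decays with residual wavefront error (Strehl-like proxy).
        return float(np.exp(-np.sum((self.phase + self.dm) ** 2)))

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.phase = self.np_random.uniform(-0.5, 0.5, self.n)  # static aberration draw
        self.dm = np.zeros(self.n)
        self.t = 0
        return np.array([self._power()], dtype=np.float32), {}

    def step(self, action):
        self.dm = np.asarray(action, dtype=np.float64)
        self.t += 1
        p = self._power()
        # Reward the controller directly with measured power: no wavefront sensor.
        return np.array([p], dtype=np.float32), p, False, self.t >= self.horizon, {}

model = SAC("MlpPolicy", ToySensorlessAO(), verbose=0)
model.learn(total_timesteps=5_000)
```

The design choice worth noting is the reward signal: because only the coupled power is observable, the agent must learn to correct distortion from a scalar feedback alone, which is exactly what makes the sensorless setting harder than wavefront sensor-based control.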
