471 |
Reinforcement Learning Strategies for a Context-Aware Adaptive Cruise Control
Joganantham, Rubina 29 April 2022 (has links)
Adaptive Cruise Control (ACC), a smart combination of pre-existing cruise control and time-gap control, plays a major role in providing driving comfort. Currently available ACC systems allow the vehicle to maintain a set speed and to automatically adjust that speed to keep a fixed distance to the vehicle ahead. Here, the speed and the distance are set according to user preferences. Each individual user has their own perceptions and preferences, but the existing ACC system lacks the property of user adaptation. Hence, this thesis focuses on automating the distance settings of the ACC system so that they can be adapted to each individual user. To incorporate user-specific distance settings into the ACC, the most relevant contexts in which a change in ACC distance is needed are identified, and a standard distance setting is assigned to each. Reinforcement learning strategies are then applied whereby the pre-existing distance settings can be modified and adapted to the user once they start driving.
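The abstract leaves the learning setup unspecified; the sketch below is a hypothetical illustration (the contexts, gap levels, and comfort-based reward are invented, not from the thesis) of how a tabular agent could adapt a discrete ACC time-gap setting per driving context:

```python
import random

# Hypothetical sketch: a tabular epsilon-greedy agent adapts a discrete
# ACC time-gap setting per driving context. Contexts, gap levels, and
# the comfort-based reward model are illustrative assumptions.
CONTEXTS = ["city", "highway", "rain"]
GAPS = [1.0, 1.5, 2.0, 2.5]  # candidate time gaps in seconds (actions)

def train(preferred, episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(c, g): 0.0 for c in CONTEXTS for g in GAPS}
    for _ in range(episodes):
        c = rng.choice(CONTEXTS)
        if rng.random() < epsilon:                    # explore
            g = rng.choice(GAPS)
        else:                                         # exploit current best
            g = max(GAPS, key=lambda a: q[(c, a)])
        # Reward: driver comfort modeled as closeness to the (unknown)
        # preferred gap for this context, plus small noise.
        r = -abs(g - preferred[c]) + rng.gauss(0, 0.05)
        q[(c, g)] += alpha * (r - q[(c, g)])          # value update
    return {c: max(GAPS, key=lambda a: q[(c, a)]) for c in CONTEXTS}

policy = train({"city": 1.5, "highway": 2.0, "rain": 2.5})
print(policy)
```

After enough trips, the greedy action in each context matches the driver's preferred gap, which is the adaptive behavior the thesis targets.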
|
472 |
Exploiting Multi-Modal Fusion for Urban Autonomous Driving Using Latent Deep Reinforcement Learning
Khalil, Yasser 29 April 2022 (has links)
Human driving decisions are the leading cause of road fatalities. Autonomous driving naturally eliminates such incompetent decisions and thus can improve traffic safety and efficiency. Deep reinforcement learning (DRL) has shown great potential in learning complex tasks. Recently, researchers investigated various DRL-based approaches for autonomous driving. However, exploiting multi-modal fusion to generate pixel-wise perception and motion prediction and then leveraging these predictions to train a latent DRL has not been targeted yet. Unlike other DRL algorithms, the latent DRL algorithm distinguishes representation learning from task learning, enhancing sampling efficiency for reinforcement learning. In addition, supplying the latent DRL algorithm with accurate perception and motion prediction simplifies the surrounding urban scenes, improving training and thus learning a better driving policy. To that end, this Ph.D. research initially develops LiCaNext, a novel real-time multi-modal fusion network to produce accurate joint perception and motion prediction at a pixel level. Our proposed approach relies merely on a LIDAR sensor, where its multi-modal input is composed of bird's-eye view (BEV), range view (RV), and range residual images. Further, this Ph.D. thesis proposes leveraging these predictions with another simple BEV image to train a sequential latent maximum entropy reinforcement learning (MaxEnt RL) algorithm. A sequential latent model is deployed to learn a more compact latent representation from high-dimensional inputs. Subsequently, the MaxEnt RL model trains on this latent space to learn a driving policy. The proposed LiCaNext is trained on the public nuScenes dataset. Results demonstrated that LiCaNext operates in real-time and performs better than the state-of-the-art in perception and motion prediction, especially for small and distant objects. 
Furthermore, simulation experiments are conducted on CARLA to evaluate the performance of our proposed approach, which exploits LiCaNext predictions to train the sequential latent MaxEnt RL algorithm. The simulated experiments show that our proposed approach learns a better driving policy, outperforming other prevalent DRL-based algorithms. The learned driving policy achieves the objectives of safety, efficiency, and comfort. Experiments also reveal that the learned policy maintains its effectiveness under different environments and varying weather conditions.
|
473 |
Fleet management strategies for urban Mobility-on-Demand systems
Chaudhari, Harshal Anil 23 February 2022 (has links)
In recent years, the paradigm of personal urban mobility has radically evolved as an increasing number of Mobility-on-Demand (MoD) systems continue to revolutionize urban transportation. Hailed as the future of sustainable transportation, with significant implications on urban planning, these systems typically utilize a fleet of shared vehicles such as bikes, electric scooters, cars, etc., and provide a centralized matching platform to deliver point-to-point mobility to passengers. In this dissertation, we study MoD systems along three operational directions – (1) modeling: developing analytical models that capture the rich stochasticity of passenger demand and its impact on the fleet distribution, (2) economics: devising strategies to maximize revenue, and (3) control: developing coordination mechanisms aimed at optimizing platform throughput.
First, we focus on the metropolitan bike-sharing systems where platforms typically do not have access to real-time location data to ascertain the exact spatial distribution of their fleet. We formulate the problem of accurately predicting the fleet distribution as a Markov Chain monitoring problem on a graph representation of a city. Specifically, each monitor provides information on the exact number of bikes transitioning to a specific node or traversing a specific edge at a particular time. Under budget constraints on the number of such monitors, we design efficient algorithms to determine appropriate monitoring operations and demonstrate their efficacy over synthetic and real datasets.
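The graph-based Markov chain view can be made concrete with a small sketch (the zones and transition matrix below are invented for illustration; the dissertation's monitor-placement algorithms are not shown):

```python
# Illustrative sketch (not the thesis algorithm): predicting a bike
# fleet's spatial distribution by propagating it through a Markov
# chain whose states are city zones. The transition matrix is assumed.
def step(dist, P):
    """One transition: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# 3 zones; each row of P sums to 1 (probability a bike moves i -> j
# in one time step).
P = [[0.7, 0.2, 0.1],
     [0.3, 0.5, 0.2],
     [0.1, 0.3, 0.6]]
fleet = [100.0, 0.0, 0.0]   # all bikes start in zone 0
for _ in range(24):          # simulate 24 time steps
    fleet = step(fleet, P)
print([round(x, 1) for x in fleet])
```

The chain mixes to its stationary distribution within a day; the monitoring problem the dissertation studies is where to place a limited number of node/edge observers so that this predicted distribution stays accurate.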
Second, we focus on the revenue maximization strategies for individual strategic driving partners on ride-hailing platforms. Under the key assumption that large-scale platform dynamics are agnostic to the actions of an individual strategic driver, we propose a series of dynamic programming-based algorithms to devise contingency plans that maximize the expected earnings of a driver. Using robust optimization techniques, we rigorously reason about and analyze the sensitivity of such strategies to perturbations in passenger demand distributions.
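The contingency-planning idea is classic backward-induction dynamic programming. A minimal sketch, with an invented two-location earnings model (not the dissertation's actual formulation):

```python
# Hedged sketch of the dynamic-programming idea (locations, earnings,
# and costs are assumptions, not from the dissertation): at each time
# step and location, the driver takes the action maximizing expected
# earnings-to-go V[t][loc], computed by backward induction.
def best_plan(T, locs, earn, move_cost):
    """earn[i]: expected net fare from serving a ride at location i.
    move_cost: cost of relocating between locations (one step each)."""
    V = [[0.0] * len(locs) for _ in range(T + 1)]
    act = [[None] * len(locs) for _ in range(T)]
    for t in range(T - 1, -1, -1):
        for i in range(len(locs)):
            options = [("serve", i, earn[i] + V[t + 1][i])]
            for j in range(len(locs)):
                if j != i:
                    options.append(("move", j, -move_cost + V[t + 1][j]))
            a = max(options, key=lambda o: o[2])
            act[t][i], V[t][i] = (a[0], a[1]), a[2]
    return V, act

V, act = best_plan(T=8, locs=["suburb", "airport"], earn=[5.0, 12.0],
                   move_cost=4.0)
print(V[0])  # expected earnings over 8 steps from each start location
```

A driver starting in the low-earning location optimally pays the relocation cost early, because the higher earnings-to-go at the airport dominate over the remaining horizon; the robust-optimization analysis then asks how stable such a plan is to perturbed demand.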
Finally, we address the problem of large-scale fleet management. Recent approaches to the fleet management problem have leveraged model-free deep reinforcement learning (RL) algorithms to tackle complex decision-making problems. However, such methods suffer from a lack of explainability and often fail to generalize well. We consider an explicit need-based coordination mechanism and propose a non-deep RL-based algorithm that augments tabular Q-learning with a combinatorial optimization problem. Empirically, a case study on New York City taxi demand enables a rigorous assessment of the value, robustness, and generalizability of the proposed approaches.
|
474 |
Methode zur Gestaltung sicherer präskriptiver Systeme für die Therapieunterstützung (Method for Designing Safe Prescriptive Systems for Therapy Support)
Skowron, Philipp 01 March 2022 (has links)
This thesis addresses the design of safe prescriptive systems for therapy support. The goal is to model the development process from the definition of objectives through to delivery to the end customer, and to explicitly incorporate various organizational, technical, safety-critical, and therapeutic aspects. In this way, problems and obstacles in the development process that might otherwise lead to failure or lack of acceptance can be averted in advance. The method focuses in particular on the explicit consideration and modeling of the safety of learning, automated decision-support algorithms that actively support therapy. This is achieved through an explicit representation of safety-relevant requirements and their integration into all phases of the method's process model, as developed in this work. Not only technical and organizational safeguards play a role here, but also the bridge between development and domain, which enables a continuous transfer of knowledge throughout the process model to ensure the safety and usefulness of the therapy system. In addition to this knowledge coupling between development and domain, the developed measurement system for risk assessment of prescriptive algorithms supports the evaluation of safety risks by integrating a process-oriented, assessable risk estimation into existing management methods. Overall, the developed method and its components provide techniques, procedures, and workflows that enable the design of safe, therapeutically targeted decision-support systems with the involvement of the target group.
|
475 |
Reinforcement Learning for Multiple Time Series: Forex Trading Application
Dong, Juntao January 2020 (has links)
No description available.
|
476 |
The effect of stress on the explore-exploit dilemma
Ferguson, Thomas 05 April 2022 (has links)
When humans are faced with multiple options, they must decide whether to choose a novel or less certain option (explore) or stick with what they know (exploit). Exploration is a fundamental cognitive process. Importantly, when humans attempt to solve the explore-exploit dilemma, they must effectively incorporate both feedback and uncertainty to guide their actions. While prior work has shown that both acute (short-term) and chronic (long-term) stress can disrupt how humans solve the explore-exploit dilemma, the mechanisms by which this occurs are unclear. For example, does stress disrupt how people integrate feedback to guide their decisions to explore or exploit, or does it disrupt computations of uncertainty regarding their choices? Importantly, electroencephalography can help reveal the impact of stress on explore-exploit decision making by measuring neural signals sensitive to feedback learning and uncertainty. In the present dissertation, I provide evidence from a series of experiments in which I examined the impact of both acute and chronic stress on the explore-exploit dilemma while electroencephalographic data were collected. In experiment 1, I exposed participants to an acute stressor and then examined their decisions to switch or stay – as a proxy for explore and exploit decisions – in a multi-arm bandit paradigm. I found tentative evidence that the acute stress response disrupted both the feedback learning signal (the reward positivity) and the uncertainty signal (the switch P300). In experiment 2, I adopted a computational neuroscience approach and directly classified participants' decisions as explorations or exploitations using reinforcement learning models. There was only an effect of the acute stress response on feedback signals, in this case the feedback P300.
In experiments 1 and 2, I used contextual bandit tasks in which the reward probabilities of the options shifted throughout, and there was no behavioural effect of acute stress on task performance or exploration rate. However, in experiment 3, I examined a learnable bandit in which one option was preferred. Again, using computational modelling and electroencephalography, I found tentative evidence that the acute stress response disrupted the feedback learning signals (the feedback P300) and stronger evidence that acute stress disrupted the uncertainty signal (the exploration P300). I also observed that the acute stress response reduced task performance and increased the exploration rate. Lastly, in experiment 4, I examined the impact of chronic stress exposure on explore-exploit decision making and electrophysiology; although I found no effects of chronic stress, future research is needed. Taken together, these findings provide novel evidence for the neural mechanisms by which the acute stress response impacts the explore-exploit dilemma through disruptions to feedback learning and assessments of uncertainty. These findings also highlight how theories of the P300 signal may not properly capture the varied role of the P300 in cognition. / Graduate
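Reinforcement learning models of bandit choice of the general kind described (a delta-rule value update with a softmax choice rule) can be sketched as follows; the parameters and reward probabilities are illustrative, not those of the experiments:

```python
import math, random

# A standard delta-rule + softmax model of bandit choice, of the kind
# commonly fit to participants' data to classify each decision as an
# exploitation (highest-valued option) or an exploration (any other
# option). All parameters and reward probabilities are illustrative.
def simulate(p_reward, trials=500, alpha=0.2, beta=5.0, seed=1):
    rng = random.Random(seed)
    Q = [0.0] * len(p_reward)               # learned option values
    n_explore = 0
    for _ in range(trials):
        weights = [math.exp(beta * q) for q in Q]            # softmax
        c = rng.choices(range(len(Q)), weights=weights)[0]   # choice
        if Q[c] < max(Q):                   # non-best option chosen
            n_explore += 1
        reward = 1.0 if rng.random() < p_reward[c] else 0.0
        Q[c] += alpha * (reward - Q[c])     # delta-rule update
    return Q, n_explore

Q, n_explore = simulate([0.8, 0.3, 0.3])
print([round(q, 2) for q in Q], n_explore)
```

Fitting alpha (learning rate) and beta (inverse temperature) per participant is what lets such models label individual trials as explore or exploit, and stress effects can then be read off as shifts in those parameters or in the classified exploration rate.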
|
477 |
Pre-processing and Feature Extraction Methods for Smart Biomedical Signal Monitoring : Algorithms and Applications
Chahid, Abderrazak 11 1900 (has links)
Human health is monitored through several physiological measurements such as heart rate, blood pressure, brain activity, etc. These measurements are taken at predefined points on the body and recorded as temporal signals or images for diagnosis purposes. During diagnosis, physicians analyze these recordings, sometimes visually, to make treatment decisions. These recordings are usually contaminated with noise caused by different factors, such as physiological artifacts or electronic noise from the electrodes/instruments used. Therefore, pre-processing these signals and images becomes crucial to provide clinicians with useful information for making the right decisions. This Ph.D. work proposes and discusses different biomedical signal processing algorithms and their applications. It develops novel signal/image pre-processing algorithms, based on the Semi-Classical Signal Analysis method (SCSA), to enhance the quality of biomedical signals and images. The SCSA method is based on the decomposition of the input signal or image using the squared eigenfunctions of a semi-classical Schrödinger operator. This approach shows great potential in denoising and residual water-peak suppression for Magnetic Resonance Spectroscopy (MRS) signals compared to existing methods. In addition, it shows very promising noise removal, particularly for pulse-shaped signals and Magnetic Resonance (MR) images. In clinical practice, extracting informative characteristics or features from these pre-processed recordings is very important for advanced analysis and diagnosis. Therefore, new features are proposed and extracted based on the SCSA and fed to machine learning models for smart biomedical diagnosis, such as predicting epileptic spikes in Magnetoencephalography (MEG).
Moreover, a new Quantization-based Position Weight Matrix (QuPWM) feature extraction method is proposed for other biomedical classification tasks, such as predicting true Poly(A) regions in a DNA sequence and multiple hand-gesture prediction. These features can be used to understand different complex systems, such as the hand gesture/motion mechanism, and to help in the smart decision-making process. Finally, combining such features with reinforcement learning models will undoubtedly help automate diagnoses and enhance decision-making, which will accelerate the digitization of different industrial sectors. For instance, these features can help to study and understand fish growth in an end-to-end system for aquaculture environments. Specifically, this application's preliminary results show very encouraging insights into optimally controlling the feeding while preserving the desired growth profile.
|
478 |
Cognitive control modulates pain during effortful goal-directed behaviour
Heydari, Sepideh 10 September 2020 (has links)
Many theories of decision-making consider pain, monetary loss, and other forms of punishment to be interchangeable quantities that are processed by the same neural system. For example, standard reinforcement learning models utilize a single reinforcement term to represent both monetary losses and pain signals. By contrast, I propose that 1) pain signals present unique computational challenges, 2) these challenges are addressed in humans and other animals by the anterior cingulate cortex (ACC), and 3) pain is regulated by cognitive control during goal-directed tasks, following the principles of the hierarchical reinforcement learning model of the ACC (HRL-ACC). To show this, I conducted three studies. In Study 1, I conducted an electrophysiological study to investigate the effect of task goals on event-related brain potentials (ERPs) under conditions where pain and reward are used. Specifically, I investigated whether feedback stimuli predicting forthcoming pain would elicit the reward positivity, an ERP component that is more positive-going to positive feedback than to negative feedback, when the goal of the task is to find electrical shocks. Contrary to my predictions, a standard reward positivity was not elicited by pain feedback in this task. In Study 2, I conducted three behavioral experiments wherein the subjective costs of mild electrical shocks were equated with monetary losses for each individual participant using a calibration procedure. I hypothesized that decision-making behavior in the face of painful events and decision-making behavior in the face of monetary losses would differ despite the outcomes (pain vs. monetary loss) being equated for their subjective costs. This prediction was confirmed, demonstrating that the costs associated with pain and monetary losses differ in more than just magnitude. In Study 3, to explain these results, I developed an extension to an existing computational framework, the HRL-ACC model.
The extended model provides insight into choice behaviour in the pain and monetary loss (ML) conditions by showing that cognitive control levels converge to an average level across trials. In the pain condition, cognitive control fluctuates from trial to trial in a systematic fashion, causing trials with low shock levels to be over-valued and trials with high shock levels to be undervalued. By contrast, in the ML condition cognitive control wanes across trials because it is not needed, and the model displays normative behavior. These findings are in line with psychological approaches to pain treatment and provide neuro-cognitive explanations of their underlying mechanisms. In line with the HRL-ACC theory, I propose that the ACC regulates pain by motivating good performance in the face of physical punishments (but not monetary losses) in order to achieve the long-term goals produced by the ACC. / Graduate / 2021-08-18
|
479 |
Optimization Foundations of Reinforcement Learning
Bhandari, Jalaj January 2020 (has links)
Reinforcement learning (RL) has attracted rapidly increasing interest in the machine learning and artificial intelligence communities in the past decade. With tremendous success already demonstrated for game AI, RL offers great potential for applications in more complex, real-world domains, for example in robotics, autonomous driving, and even drug discovery. Although researchers have devoted a lot of engineering effort to deploying RL methods at scale, many state-of-the-art RL techniques still seem mysterious, with limited theoretical guarantees on their behaviour in practice.
In this thesis, we focus on understanding convergence guarantees for two key ideas in reinforcement learning, namely Temporal difference learning and policy gradient methods, from an optimization perspective. In Chapter 2, we provide a simple and explicit finite time analysis of Temporal difference (TD) learning with linear function approximation. Except for a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature. Our convergence results extend seamlessly to the study of TD learning with eligibility traces, known as TD(λ), and to Q-learning for a class of high-dimensional optimal stopping problems.
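The object of the Chapter 2 analysis, TD(0) with linear function approximation, can be sketched on a toy Markov reward process (the chain, features, and step size below are illustrative assumptions, not the thesis's setting):

```python
import random

# Minimal sketch of TD(0) with linear function approximation on a
# 3-state random-walk Markov reward process. The chain, the feature
# vectors, and the constant step size are illustrative assumptions.
random.seed(0)
P = {0: [1], 1: [0, 2], 2: [1]}       # uniform random-walk transitions
R = {0: 0.0, 1: 0.0, 2: 1.0}          # reward received on entering a state
phi = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 1.0]}  # features
gamma, alpha = 0.9, 0.05
w = [0.0, 0.0]                        # weight vector, v(s) = w . phi(s)

def v(s):
    return sum(wi * fi for wi, fi in zip(w, phi[s]))

s = 0
for _ in range(20000):
    s2 = random.choice(P[s])
    delta = R[s2] + gamma * v(s2) - v(s)   # TD error
    for i in range(2):
        w[i] += alpha * delta * phi[s][i]  # semi-gradient TD(0) update
    s = s2

print([round(x, 2) for x in w], round(v(1), 2))
```

The update mirrors a stochastic gradient step, which is exactly the analogy the chapter exploits: the iterates hover around the TD fixed point in the span of the features rather than the true value function.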
In Chapter 3, we turn our attention to policy gradient methods and present a simple and general understanding of their global convergence properties. The main challenge here is that even for simple control problems, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to a stationary point of the objective. We identify structural properties -- shared by finite MDPs and several classic control problems -- which guarantee that despite non-convexity, any stationary point of the policy gradient objective is globally optimal. In the final chapter, we extend our analysis for finite MDPs to show linear convergence guarantees for many popular variants of policy gradient methods like projected policy gradient, Frank-Wolfe, mirror descent and natural policy gradients.
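To make "stationary point of the policy gradient objective" concrete, here is a minimal REINFORCE sketch with a softmax policy on an invented two-action problem; the setup is illustrative, not drawn from the thesis:

```python
import math, random

# Hedged sketch: REINFORCE (vanilla policy gradient) with a softmax
# policy on an invented two-action, one-step problem. Gradient ascent
# on expected reward drives the policy toward the better action.
random.seed(0)
theta = [0.0, 0.0]                 # one logit per action
means = [1.0, 0.0]                 # expected reward of each action
alpha = 0.1                        # step size

def probs(th):
    z = [math.exp(t) for t in th]
    s = sum(z)
    return [x / s for x in z]

for _ in range(3000):
    p = probs(theta)
    a = 0 if random.random() < p[0] else 1
    r = means[a] + random.gauss(0, 0.1)       # noisy reward sample
    for i in range(2):
        # grad of log pi(a) for a softmax policy: indicator(i==a) - p[i]
        g = (1.0 if i == a else 0.0) - p[i]
        theta[i] += alpha * r * g             # REINFORCE update

print(probs(theta))
```

Here the objective is concave in the choice probabilities, so the only stationary point is the global optimum; the thesis's contribution is identifying structural conditions under which the same conclusion survives the non-convexity introduced by realistic policy parameterizations and multi-step MDPs.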
|
480 |
Learning from Scholarly Attributed Graphs for Scientific Discovery
Akujuobi, Uchenna Thankgod 18 October 2020 (has links)
Research and experimentation in various scientific fields are based on the knowledge and ideas found in scholarly literature. The advancement of research and development has thus strengthened the importance of literary analysis and understanding. However, in recent years, researchers have faced massive numbers of scholarly documents published at an exponentially increasing rate. Analyzing this vast number of publications is far beyond the capability of individual researchers.
This dissertation is motivated by the need for large-scale analysis of the exploding volume of scholarly literature for scientific knowledge discovery. In the first part of this dissertation, the interdependencies between scholarly publications are studied. First, I develop Delve – a data-driven search engine supported by our semi-supervised edge classification method. This system enables users to search and analyze the relationships between datasets and scholarly literature. Based on the Delve system, I propose to study information extraction as a node classification problem in attributed networks. Specifically, if we can learn the research topics of documents (nodes in a network), we can aggregate documents by topic and retrieve information specific to each topic (e.g., the top-k popular datasets).
Node classification in attributed networks has several challenges: a limited number of labeled nodes, effective fusion of topological structure and node/edge attributes, and the co-existence of multiple labels for one node. Existing node classification approaches can only address or partially address a few of these challenges. This dissertation addresses these challenges by proposing semi-supervised multi-class/multi-label node classification models to integrate node/edge attributes and topological relationships.
The second part of this dissertation examines the problem of analyzing the interdependencies between terms in scholarly literature. I present two algorithms for the automatic hypothesis generation (HG) problem, which refers to the discovery of meaningful implicit connections between scientific terms, including but not limited to diseases, drugs, and genes extracted from databases of biomedical publications. The automatic hypothesis generation problem is modeled as future connectivity prediction in a dynamic attributed graph. The key is to capture the temporal evolution of node-pair (term-pair) relations. Experimental results and case study analyses highlight the effectiveness of the proposed algorithms compared to the baselines.
|