41 |
Deep Reinforcement Learning for Autonomous Highway Driving Scenario
Pradhan, Neil, January 2021
We present an autonomous driving agent for a simulated highway driving scenario in which surrounding vehicles, such as cars and trucks, move with stochastically varying velocity profiles. The simulated environment focuses on testing tactical decision making in highway driving. When an agent (vehicle) maintains an optimal velocity range, it benefits both energy efficiency and the environment. To encourage the agent to stay within this range, this thesis proposes two novel reward structures: (a) a Gaussian reward structure and (b) an exponential rise-and-fall reward structure. Two deep reinforcement learning agents were trained, one per reward structure, to study their differences and to evaluate their performance on a set of parameters most relevant to highway driving scenarios. The algorithm implemented in this work is a double dueling deep Q-network with a prioritized experience replay buffer. Experiments were performed by adding noise to the inputs, simulating a partially observable Markov decision process, in order to compare the reliability of the different reward structures. A velocity occupancy grid was found to be a better input representation for the algorithm than a binary occupancy grid. Furthermore, a methodology for generating fuel-efficient policies is discussed and demonstrated with an example.
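The abstract does not give the exact reward formulas; the sketch below is only a plausible illustration of the two velocity-based reward shapes it names, with the target velocity, width, and rate parameters chosen purely for illustration.

```python
import numpy as np

# Illustrative sketch only: the exact reward formulas and parameter values used
# in the thesis are not given in the abstract; V_TARGET, SIGMA and K are
# hypothetical choices.
V_TARGET = 25.0   # desired velocity in m/s (assumed)
SIGMA = 3.0       # width of the Gaussian reward (assumed)
K = 0.5           # rise/fall rate of the exponential reward (assumed)

def gaussian_reward(v: float) -> float:
    """Reward peaks at the target velocity and decays smoothly away from it."""
    return float(np.exp(-((v - V_TARGET) ** 2) / (2 * SIGMA ** 2)))

def exponential_rise_fall_reward(v: float) -> float:
    """Reward rises exponentially toward the target velocity and falls
    exponentially beyond it."""
    if v <= V_TARGET:
        return float(np.exp(K * (v - V_TARGET)))   # rises toward 1 at the target
    return float(np.exp(-K * (v - V_TARGET)))      # falls off above the target

if __name__ == "__main__":
    for v in (15.0, 20.0, 25.0, 30.0):
        print(v, round(gaussian_reward(v), 3), round(exponential_rise_fall_reward(v), 3))
```

Both shapes give maximal reward at the target velocity; the main design difference is how quickly the reward decays when the agent drifts away from the optimal range.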
|
42 |
A belief-desire-intention architecture with a logic-based planner for agents in stochastic domains
Rens, Gavin B., 02 1900
This dissertation investigates high-level decision making for agents that are both goal- and utility-driven. We develop a partially observable Markov decision process (POMDP) planner which is an extension of an agent programming language called DTGolog, itself an extension of the Golog language. Golog is based on the situation calculus, a logic for reasoning about action. A POMDP planner on its own cannot cope well with dynamically changing environments and complicated goals. This is exactly a strength of the belief-desire-intention (BDI) model: BDI theory has been developed to design agents that can select goals intelligently, dynamically abandon and adopt new goals, and yet commit to intentions for achieving goals. The contribution of this research is twofold: (1) developing a relational POMDP planner for cognitive robotics, and (2) specifying a preliminary BDI architecture that can deal with stochasticity in action and perception by employing the planner. / Computing / M. Sc. (Computer Science)
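The abstract does not reproduce the architecture itself; as a rough, hedged sketch, the loop below illustrates how a BDI deliberation cycle might delegate plan generation to a POMDP planner. All class and method names are hypothetical, not taken from the dissertation.

```python
# Hedged sketch of a BDI deliberation cycle that calls a POMDP planner to
# execute intentions. Names (Intention, POMDPPlanner, BDIAgent) are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Intention:
    goal: str
    plan: List[str] = field(default_factory=list)  # actions produced by the planner

class POMDPPlanner:
    def plan(self, belief: dict, goal: str) -> List[str]:
        # Placeholder: a real planner would search the belief space for a policy.
        return [f"act_towards({goal})"]

class BDIAgent:
    def __init__(self, planner: POMDPPlanner):
        self.belief: dict = {}            # probabilistic belief about the world
        self.desires: List[str] = []      # candidate goals
        self.intentions: List[Intention] = []
        self.planner = planner

    def perceive(self, observation: dict) -> None:
        # Update the belief from a (possibly noisy) observation.
        self.belief.update(observation)

    def deliberate(self) -> None:
        # Drop intentions that are no longer desired; commit to new goals.
        self.intentions = [i for i in self.intentions if i.goal in self.desires]
        for goal in self.desires:
            if not any(i.goal == goal for i in self.intentions):
                self.intentions.append(Intention(goal=goal))

    def act(self) -> Optional[str]:
        # Use the POMDP planner to produce the next action for the current intention.
        if not self.intentions:
            return None
        intention = self.intentions[0]
        if not intention.plan:
            intention.plan = self.planner.plan(self.belief, intention.goal)
        return intention.plan.pop(0)

if __name__ == "__main__":
    agent = BDIAgent(POMDPPlanner())
    agent.desires = ["reach_charging_station"]
    agent.perceive({"battery": 0.2})
    agent.deliberate()
    print(agent.act())
```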
|
43 |
Near-optimal algorithms for sequential decision-making problems aimed at information gathering
Araya-López, Mauricio, 04 February 2013 (PDF)
The MDP formalism, like its variants, is typically used to control the state of a system through an agent and its policy. When the agent faces incomplete information, its policy may perform actions to acquire information, typically (1) in the case of partial observability, or (2) in the case of reinforcement learning. However, this information is only a means to better control the state of the system, so that information gathering is merely a consequence of maximizing the expected performance. This thesis is instead concerned with sequential decision-making problems in which acquiring information is an end in itself. More precisely, it first investigates how to modify the POMDP formalism to express information-gathering problems and proposes algorithms to solve them. This approach is then extended to reinforcement learning tasks that consist in actively learning the model of a system. In addition, this thesis proposes a new Bayesian reinforcement learning algorithm, which uses optimistic local transitions to gather information efficiently while optimizing the expected performance. Through a review of the state of the art, theoretical results, and empirical studies, this thesis demonstrates that these problems can be solved optimally in theory, that the proposed methods are near-optimal, and that they yield results comparable to or better than reference approaches. Beyond these concrete results, this thesis opens the way (1) to a better understanding of the relationship between information gathering and optimal policies in sequential decision-making processes, and (2) to an extension of the very large body of work on controlling the state of a system to information-gathering problems.
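The abstract describes rewarding information gathering in its own right. A common way to express this, used here only as a hedged illustration and not necessarily the thesis's exact formulation, is to define the reward over the belief itself, for example as the negative entropy of the belief. The sketch below shows such a belief-dependent reward together with a standard Bayesian belief update; the two-state transition and observation models are toy assumptions.

```python
import numpy as np

# Hedged illustration of an information-gathering reward over a POMDP belief:
# reward = negative Shannon entropy of the belief (higher when more certain).
# The two-state models below are toy assumptions, not taken from the thesis.
T = np.array([[0.9, 0.1],   # T[s, s'] = P(s' | s) for a single action
              [0.1, 0.9]])
O = np.array([[0.8, 0.2],   # O[s', o] = P(o | s') for observations o in {0, 1}
              [0.2, 0.8]])

def belief_update(b: np.ndarray, o: int) -> np.ndarray:
    """Standard Bayesian filter: predict with T, then weight by the observation likelihood."""
    predicted = T.T @ b
    unnormalized = O[:, o] * predicted
    return unnormalized / unnormalized.sum()

def neg_entropy_reward(b: np.ndarray) -> float:
    """Belief-dependent reward: negative Shannon entropy of the belief."""
    p = b[b > 0]
    return float(np.sum(p * np.log(p)))

b = np.array([0.5, 0.5])    # maximally uncertain prior
for obs in (1, 1, 0, 1):
    b = belief_update(b, obs)
    print(b.round(3), round(neg_entropy_reward(b), 3))
```

With a belief-dependent reward of this kind, an optimal policy actively selects actions that reduce uncertainty rather than only those that maximize control performance.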
|
45 |
Reimagining Human-Machine Interactions through Trust-Based Feedback
Kumar Akash, 17 June 2020
Intelligent machines, and more broadly, intelligent systems, are becoming increasingly common in the everyday lives of humans. Nonetheless, despite significant advancements in automation, human supervision and intervention are still essential in almost all sectors, ranging from manufacturing and transportation to disaster management and healthcare. These intelligent machines interact and collaborate with humans in a way that demands a greater level of trust between human and machine. While a lack of trust can lead to a human's disuse of automation, over-trust can result in a human trusting a faulty autonomous system, which could have negative consequences for the human. Therefore, human trust should be calibrated to optimize these human-machine interactions. This calibration can be achieved by designing human-aware automation that can infer human behavior and respond accordingly in real time.

In this dissertation, I present a probabilistic framework to model and calibrate a human's trust and workload dynamics during his/her interaction with an intelligent decision-aid system. More specifically, I develop multiple quantitative models of human trust, ranging from a classical state-space model to a classification model based on machine learning techniques. Both models are parameterized using data collected through human-subject experiments. Thereafter, I present a probabilistic dynamic model to capture the dynamics of human trust along with human workload. This model is used to synthesize optimal control policies aimed at improving context-specific performance objectives that vary automation transparency based on human state estimation. I also analyze the coupled interactions between human trust and workload to strengthen the model framework. Finally, I validate the optimal control policies using closed-loop human-subject experiments. The proposed framework provides a foundation toward widespread design and implementation of real-time adaptive automation based on human states for use in human-machine interactions.
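The dissertation's exact model equations are not reproduced in the abstract; as a hedged sketch, the snippet below shows a generic linear state-space formulation of trust dynamics of the kind the abstract refers to, where trust evolves with the machine's observed performance and a simple feedback rule adjusts automation transparency. All coefficients and thresholds are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of a linear state-space trust model: next trust is a weighted
# combination of current trust and the machine's observed performance, plus
# small process noise. A simple policy raises transparency when estimated trust
# is low. Coefficients and thresholds are assumptions, not dissertation values.
A, B = 0.85, 0.15          # trust inertia and sensitivity to performance (assumed)
TRUST_THRESHOLD = 0.5      # below this, show more transparent feedback (assumed)

def trust_update(trust: float, machine_performance: float, noise_std: float = 0.02) -> float:
    """One step of the trust dynamics with small process noise."""
    next_trust = A * trust + B * machine_performance + np.random.normal(0.0, noise_std)
    return float(np.clip(next_trust, 0.0, 1.0))

def choose_transparency(estimated_trust: float) -> str:
    """Simple feedback policy: increase transparency when estimated trust is low."""
    return "high_transparency" if estimated_trust < TRUST_THRESHOLD else "low_transparency"

if __name__ == "__main__":
    trust = 0.7
    for performance in (1.0, 1.0, 0.0, 0.0, 0.0):   # machine succeeds, then fails
        trust = trust_update(trust, performance)
        print(round(trust, 3), choose_transparency(trust))
```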
|