271

Risk-aware Autonomous Driving Using POMDPs and Responsibility-Sensitive Safety / POMDP-modellerad Riskmedveten Autonom Körning med Riskmått

Skoglund, Caroline January 2021 (has links)
Autonomous vehicles promise to play an important role in increasing the efficiency and safety of road transportation. Although several autonomous vehicles have appeared on public roads in recent years, ensuring their safety in uncertain and dynamic environments remains a challenging problem. This thesis studies the problem by developing a risk-aware decision-making framework. The system integrating the dynamics of an autonomous vehicle and its uncertain environment is modelled as a Partially Observable Markov Decision Process (POMDP). A risk measure is proposed based on the Responsibility-Sensitive Safety (RSS) distance, which quantifies the minimum distance to other vehicles required to ensure safety. This risk measure is incorporated into the reward function of the POMDP to achieve risk-aware decision making. The proposed risk-aware POMDP framework is evaluated in two case studies. In a single-lane car-following scenario, the ego vehicle successfully avoids a collision in an emergency event where the vehicle in front of it makes a full stop. In a merge scenario, the ego vehicle enters the main road from a ramp with a satisfactory distance to other vehicles. In conclusion, the risk-aware POMDP framework realizes a trade-off between safety and usability by keeping a reasonable distance and adapting to other vehicles' behaviours.
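The RSS distance on which this record's risk measure is built has a standard closed form. A minimal sketch follows; the parameter values (response time rho, acceleration and braking bounds) are illustrative defaults, not the calibration used in the thesis:

```python
# RSS minimum safe longitudinal distance between a rear (ego) vehicle and the
# vehicle directly in front of it. Parameter defaults are assumptions.
def rss_min_distance(v_rear, v_front, rho=1.0, a_max=3.0, b_min=4.0, b_max=8.0):
    """Smallest gap (m) such that the rear vehicle can always stop in time,
    assuming it may accelerate at a_max during its response time rho and then
    brakes at least at b_min, while the front vehicle brakes at most at b_max."""
    d = (v_rear * rho
         + 0.5 * a_max * rho ** 2
         + (v_rear + rho * a_max) ** 2 / (2 * b_min)
         - v_front ** 2 / (2 * b_max))
    return max(0.0, d)

def rss_risk(gap, v_rear, v_front):
    """One possible risk measure: how far the actual gap falls short of the
    RSS minimum (zero when the gap is already safe)."""
    return max(0.0, rss_min_distance(v_rear, v_front) - gap)
```

A reward term of the kind the abstract describes could then penalize `rss_risk` so that the planner trades safety margin against progress.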
272

Geometry of Optimization in Markov Decision Processes and Neural Network-Based PDE Solvers

Müller, Johannes 07 June 2024 (has links)
This thesis is divided into two parts dealing with the optimization problems in Markov decision processes (MDPs) and different neural network-based numerical solvers for partial differential equations (PDEs). In Part I we analyze the optimization problem arising in (partially observable) Markov decision processes using tools from algebraic statistics and information geometry, which can be viewed as neighboring fields of applied algebra and differential geometry, respectively. Here, we focus on infinite horizon problems and memoryless stochastic policies. Markov decision processes provide a mathematical framework for sequential decision-making on which most current reinforcement learning algorithms are built. They formalize the task of optimally controlling the state of a system through appropriate actions. For fully observable problems, the action can be selected knowing the current state of the system. This case has been studied extensively and optimizing the action selection is known to be equivalent to solving a linear program over the (generalized) stationary distributions of the Markov decision process, which are also referred to as state-action frequencies. In Chapter 3, we study partially observable problems where an action must be chosen based solely on an observation of the current state, which might not fully reveal the underlying state. We characterize the feasible state-action frequencies of partially observable Markov decision processes by polynomial inequalities. In particular, the optimization problem in partially observable MDPs is described as a polynomially constrained linear objective program that generalizes the (dual) linear programming formulation of fully observable problems. We use this to study the combinatorial and algebraic complexity of this optimization problem and to upper bound the number of critical points over the individual boundary components of the feasible set. 
Furthermore, we show that our polynomial programming formulation can be used to effectively solve partially observable MDPs using interior point methods, numerical algebraic techniques, and convex relaxations. Gradient-based methods, including variants of natural gradient methods, have gained tremendous attention in the theoretical reinforcement learning community, where they are commonly referred to as (natural) policy gradient methods. In Chapter 4, we provide a unified treatment of a variety of natural policy gradient methods for fully observable problems by studying their state-action frequencies from the standpoint of information geometry. For a variety of NPGs and reward functions, we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Morimura and co-authors and Kakade by observing that these arise from the Hessian geometries of the entropy and conditional entropy, respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions like log-barriers. We provide experimental evidence indicating that our predicted rates are essentially tight. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if the NPG is defined with respect to the Hessian geometry of the regularizer. This yields local quadratic convergence rates of these methods for step size equal to the inverse penalization strength, which recovers existing results as special cases. Part II addresses neural network-based PDE solvers that have recently experienced tremendous growth in popularity and attention in the scientific machine learning community. 
We focus on two approaches that represent the approximation of a solution of a PDE as the minimization over the parameters of a neural network: the deep Ritz method and physics-informed neural networks. In Chapter 5, we study the theoretical properties of the boundary penalty for these methods and obtain a uniform convergence result for the deep Ritz method for a large class of potentially nonlinear problems. For linear PDEs, we estimate the error of the deep Ritz method in terms of the optimization error, the approximation capabilities of the neural network, and the strength of the penalty. This reveals a trade-off in the choice of the penalization strength: too little penalization allows large boundary values, while too strong penalization leads to a poor solution of the PDE inside the domain. For physics-informed networks, we show that when working with neural networks that have zero boundary values, the second derivatives of the solution are also approximated, whereas otherwise only lower-order derivatives are. In Chapter 6, we propose energy natural gradient descent, a natural gradient method with respect to second-order information in the function space, as an optimization algorithm for physics-informed neural networks and the deep Ritz method. We show that this method, which can be interpreted as a generalized Gauss-Newton method, mimics Newton's method in function space except for an orthogonal projection onto the tangent space of the model. We show that for a variety of PDEs, energy natural gradients converge rapidly and approximations to the solution of the PDE are several orders of magnitude more accurate than those obtained with gradient descent, Adam, and Newton's method, even when these methods are given more computation time.
Contents:
Chapter 1. Introduction
  1.1 Notation and conventions
Part I. Geometry of Markov decision processes
Chapter 2. Background on Markov decision processes
  2.1 State-action frequencies
  2.2 The advantage function and Bellman optimality
  2.3 Rational structure of the reward and an explicit line theorem
  2.4 Solution methods for Markov decision processes
Chapter 3. State-action geometry of partially observable MDPs
  3.1 The state-action polytope of fully observable systems
  3.2 State-action geometry of partially observable systems
  3.3 Number and location of critical points
  3.4 Reward optimization in state-action space (ROSA)
Chapter 4. Geometry and convergence of natural policy gradient methods
  4.1 Natural gradients
  4.2 Natural policy gradient methods
  4.3 Convergence of natural policy gradient flows
  4.4 Locally quadratic convergence for regularized problems
  4.5 Discussion and outlook
Part II. Neural network-based PDE solvers
Chapter 5. Theoretical analysis of the boundary penalty method for neural network-based PDE solvers
  5.1 Presentation and discussion of the main results
  5.2 Preliminaries regarding Sobolev spaces and neural networks
  5.3 Proofs regarding uniform convergence for the deep Ritz method
  5.4 Proofs of error estimates for the deep Ritz method
  5.5 Proofs of implications of exact boundary values in residual minimization
Chapter 6. Energy natural gradients for neural network-based PDE solvers
  6.1 Energy natural gradients
  6.2 Experiments
  6.3 Conclusion and outlook
Bibliography
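The (dual) linear program over state-action frequencies that the abstract refers to for fully observable problems can be sketched directly. The two-state MDP below is a made-up example; only the constraint structure (occupancy measures balanced against the transition kernel) is the standard formulation:

```python
import numpy as np
from scipy.optimize import linprog

# Toy fully observable MDP (2 states, 2 actions); all numbers illustrative.
S, A, gamma = 2, 2, 0.9
P = np.zeros((S, A, S))                       # P[s, a, s'] transition kernel
P[0, 0] = [0.9, 0.1]; P[0, 1] = [0.2, 0.8]
P[1, 0] = [0.7, 0.3]; P[1, 1] = [0.1, 0.9]
r = np.array([[0.0, 1.0],                     # r[s, a] reward
              [2.0, 0.0]])
rho = np.array([0.5, 0.5])                    # initial state distribution

# Maximize <mu, r> over state-action frequencies mu(s, a) subject to the flow
# constraints; linprog minimizes, so negate the reward.
c = -r.flatten()
A_eq = np.zeros((S, S * A))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] = float(s == sp) - gamma * P[s, a, sp]
b_eq = (1 - gamma) * rho                      # sum_a mu(s',a) - g*P^T mu = (1-g)rho
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (S * A))
mu = res.x.reshape(S, A)                      # optimal state-action frequencies
value = -res.fun / (1 - gamma)                # corresponding discounted return
```

Summing the flow constraints over all next states shows that the frequencies `mu` always sum to one, so they form a distribution over state-action pairs, as in the thesis.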
273

Cognitive Networks: Foundations to Applications

Friend, Daniel 21 April 2009 (has links)
Fueled by the rapid advancement of digital and wireless technologies, the ever-increasing capabilities of wireless devices have placed upon us a tremendous challenge: how to put all of this capability to effective use. Individually, wireless devices have outpaced the ability of users to optimally configure them. Collectively, the complexity is far more daunting. Research in cognitive networks seeks to address the difficulty of effectively using the expanding capabilities of wireless networks by embedding greater degrees of intelligence within the network itself. In this dissertation, we address some fundamental questions related to cognitive networks, such as "What is a cognitive network?" and "What methods may be used to design a cognitive network?" We relate cognitive networks to a common artificial intelligence (AI) framework, the multi-agent system (MAS). We also discuss the key elements of learning and reasoning, with the ability to learn being the primary differentiator of a cognitive network. Having discussed the fundamentals, we proceed to further illustrate the cognitive networking principle by applying it to two problems: multichannel topology control for dynamic spectrum access (DSA) and routing in a mobile ad hoc network (MANET). The multichannel topology control problem involves configuring secondary network parameters to minimize the probability that the secondary network will cause an outage to a primary user in the future. This requires the secondary network to estimate an outage potential map, essentially a spatial map of predicted primary user density, which must be learned using prior observations of spectral occupancy made by secondary nodes. Due to the complexity of the objective function, we provide a suboptimal heuristic and compare its performance against heuristics targeting power-based and interference-based topology control objectives.
We also develop a genetic algorithm to provide reference solutions, since obtaining optimal solutions is impractical. We show how our approach to this problem qualifies as a cognitive network. In presenting our second application, we address the role of network state observations in cognitive networking. Essentially, we need a way to quantify how much information is needed about the state of the network to achieve a desired level of performance. This question is applicable to networking in general, but becomes increasingly important in the cognitive network context because of the potential volume of information that may be desired for decision-making. In this case, the application is routing in MANETs. Current MANET routing protocols are largely adapted from routing algorithms developed for wired networks. Although optimal routing in wired networks is grounded in dynamic programming, the critical assumption that enables the use of dynamic programming for wired networks, namely static link costs and states, need not hold for MANETs. We present a link-level model of a MANET, which models the network as a stochastically varying graph that possesses the Markov property. We present the Markov decision process as the appropriate framework for computing optimal routing policies for such networks. We then analyze the relationship between the optimal policy and link state information as a function of minimum distance from the forwarding node. The applications we focus on are quite different, both in their models and in their objectives. This difference is intentional and significant because it dissociates the technology, i.e., cognitive networks, from the application of the technology. As a consequence, the versatility of the cognitive networks concept is demonstrated. Simultaneously, we are able to address two open problems and provide useful results, as well as new perspectives, on both multichannel topology control and MANET routing.
This material is posted here with permission from the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of Virginia Tech library's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this material, you agree to all provisions of the copyright laws protecting it. / Ph. D.
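The link-level MDP view of MANET routing described above can be illustrated with value iteration on a toy network of stochastic links. The topology and delivery probabilities below are invented for illustration, not taken from the dissertation:

```python
# Toy 4-node network: states are nodes, actions are next-hop choices, and a
# transmission either succeeds (packet moves to the chosen neighbor) or fails
# (packet stays put). Value iteration yields the expected number of
# transmissions needed to reach the destination from each node.
links = {
    0: {1: 0.9, 2: 0.5},    # node -> {neighbor: link delivery probability}
    1: {0: 0.9, 3: 0.6},
    2: {0: 0.5, 3: 0.95},
    3: {},                  # destination (absorbing)
}
dest = 3
V = {n: 0.0 for n in links}
for _ in range(500):        # value iteration on the routing MDP
    for n in links:
        if n == dest or not links[n]:
            continue
        # Bellman update: one attempt costs 1, succeeds with probability p.
        V[n] = min(1 + p * V[m] + (1 - p) * V[n] for m, p in links[n].items())

# Optimal first hop from node 0 (expected cost over link m is 1/p + V[m]):
route = min(links[0], key=lambda m: 1 / links[0][m] + V[m])
```

Note the trade-off the model captures: node 0's direct-looking path through node 2 has a lossy first hop, so the policy prefers the reliable detour through node 1.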
274

Attityd-beteendegap hos svenska resenärer: Strategier för puffar och ansvarsfull turism inom inhemska resor / The attitude-behavior gap among Swedish travelers: Strategies for nudges and responsible tourism in domestic travel

Ilehag, Benjamin, Sidén, Elin, Nygren, Teresa January 2024 (has links)
In today's society, it is hard to ignore the negative impact travel has on the environment. Fossil fuels currently dominate the global energy system, and the tourism industry's contribution to climate change is significant. However, travelers have opportunities to reduce their own environmental impact by lowering their emissions. Studies have shown that even when people have positive attitudes towards the environment and sustainability, this does not always lead to responsible behavior, especially during vacations. This disconnect is known as the attitude-behavior gap. Nudges can help change the decision-making environment, informing people about different options and promoting more sustainable choices. This study explored whether an attitude-behavior gap exists among Swedes traveling within Sweden and which nudges could encourage more responsible tourism. A mix of quantitative surveys and qualitative interviews was used to examine attitudes and behaviors in detail. The findings revealed that while most Swedish travelers have a positive attitude towards sustainability and make responsible choices at home, they often abandon these behaviors when traveling. Swedish travelers were found to be most responsive to nudges that benefit them personally, such as cost-saving offers, more accessible information, and sustainable options that do not compromise their comfort, convenience, or time.
275

Hur fungerar egentligen köpbeslutsprocessen inom e-handel? : En kvalitativ studie om konsumenternas beteende vid konsumtion på Internet.

Hjärne, Sara, Perem, Mathilda, Wallin, Ewelina January 2014 (has links)
Title: How does the buying decision process really function within e-commerce? A qualitative study of consumer behavior when consuming on the Internet.
Institution: School of Economics, Linnaeus University, Växjö.
Course code: 2FE16E.
Authors: Sara Hjärne, Mathilda Perem, Ewelina Wallin.
Tutor: Dan Halvarsson.
Examiner: Åsa Devine.
Key words: buying decision process, need recognition, information search, evaluation of alternatives, purchase decision, post-purchase behavior, consumer decision making, e-commerce, e-commerce channels, online shopping, online purchase, e-retail, internet shopping, electronic shopping, consumer behavior, online apparel shopping, social media, decision making, online retailing, website design, customer satisfaction, web shopping, perceived risk, convenience, price, online consumption behavior.
Background: The buying decision process is a model that marketers use to better understand their customers and their behavior when purchasing a product. The process consists of five steps: need recognition, information search, evaluation of alternatives, purchase decision, and post-purchase evaluation. The buying decision process has long been an accepted model, but researchers argue that the introduction of the Internet as a consumption channel has changed this process. The Internet has also led to a shift of power in which customers today have greater influence, which strongly affects the buying decision process in e-commerce.
Purpose: The purpose is to explore how consumers perceive their behavior when they consume through e-commerce.
Research questions: How do consumers perceive the buying decision process they experience when they consume through e-commerce? How do consumers perceive different factors that are important to them when they consume through e-commerce?
Methodology: Qualitative study, cross-sectional design, semi-structured interviews.
Conclusion: This thesis shows that the traditional model of the buying decision process is not consistent with consumers' perception of how they undergo the process in an e-commerce context. The process is influenced by the factors influence, convenience, the webpage's atmosphere, risk, price, supply, consumption occasions, expectations, delivery, and returns.
276

Parsimonious reasoning in reinforcement learning for better credit assignment

Ma, Michel 08 1900 (has links)
This thesis explores the question of long-term credit assignment in reinforcement learning from the perspective of a parsimony inductive bias. In this context, a parsimonious agent looks to understand its environment through the least number of variables possible. Alternatively, given some credit or blame for some behavior, parsimony forces the agent to assign this credit (or blame) to only a select few latent variables. Before proposing novel methods for parsimonious credit assignment, previous work on long-term credit assignment is introduced in relation to the idea of sparsity. We then develop two new ideas for credit assignment in reinforcement learning that are motivated by parsimonious reasoning: one in the model-free setting, and one for model-based learning. To do so, we build upon various parsimony-related concepts from causality, supervised learning, and simulation, and apply them to the Markov Decision Process framework. The first, called counterfactual policy evaluation, considers minor deviations of what could have been given what has been. By restricting the space in which the agent can reason about alternatives, counterfactual policy evaluation is shown to have favorable variance properties for policy evaluation. Counterfactual policy evaluation also offers a new perspective on hindsight, generalizing previous work in hindsight credit assignment. The second contribution of this thesis is a latent attention augmented algorithm for model-based reinforcement learning: Latent Sparse Attentive Value Gradients (LSAVG). By fully integrating attention into the structure for policy optimization, we show that LSAVG is able to solve active memory tasks that its model-free counterpart was designed to tackle, without resorting to heuristics or biasing the original estimator.
277

Deep Reinforcement Learning for Autonomous Highway Driving Scenario

Pradhan, Neil January 2021 (has links)
We present an autonomous driving agent on a simulated highway driving scenario with vehicles such as cars and trucks moving with stochastically variable velocity profiles. The focus of the simulated environment is to test tactical decision making in highway driving scenarios. When an agent (vehicle) maintains an optimal range of velocity it is beneficial both in terms of energy efficiency and greener environment. In order to maintain an optimal range of velocity, in this thesis work I proposed two novel reward structures: (a) gaussian reward structure and (b) exponential rise and fall reward structure. I trained respectively two deep reinforcement learning agents to study their differences and evaluate their performance based on a set of parameters that are most relevant in highway driving scenarios. The algorithm implemented in this thesis work is double-dueling deep-Q-network with prioritized experience replay buffer. Experiments were performed by adding noise to the inputs, simulating Partially Observable Markov Decision Process in order to obtain reliability comparison between different reward structures. Velocity occupancy grid was found to be better than binary occupancy grid as input for the algorithm. Furthermore, methodology for generating fuel efficient policies has been discussed and demonstrated with an example. / Vi presenterar ett autonomt körföretag på ett simulerat motorvägsscenario med fordon som bilar och lastbilar som rör sig med stokastiskt variabla hastighetsprofiler. Fokus för den simulerade miljön är att testa taktiskt beslutsfattande i motorvägsscenarier. När en agent (fordon) upprätthåller ett optimalt hastighetsområde är det fördelaktigt både när det gäller energieffektivitet och grönare miljö. För att upprätthålla ett optimalt hastighetsområde föreslog jag i detta avhandlingsarbete två nya belöningsstrukturer: (a) gaussisk belöningsstruktur och (b) exponentiell uppgång och nedgång belöningsstruktur. 
Jag utbildade respektive två djupförstärkande inlärningsagenter för att studera deras skillnader och utvärdera deras prestanda baserat på en uppsättning parametrar som är mest relevanta i motorvägsscenarier. Algoritmen som implementeras i detta avhandlingsarbete är dubbel-duell djupt Q- nätverk med prioriterad återuppspelningsbuffert. Experiment utfördes genom att lägga till brus i ingångarna, simulera delvis observerbar Markov-beslutsprocess för att erhålla tillförlitlighetsjämförelse mellan olika belöningsstrukturer. Hastighetsbeläggningsgaller visade sig vara bättre än binärt beläggningsgaller som inmatning för algoritmen. Dessutom har metodik för att generera bränsleeffektiv politik diskuterats och demonstrerats med ett exempel.
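The abstract names the two velocity-tracking reward structures but gives no formulas; one plausible reading, with an assumed target velocity and shape parameters that are not taken from the thesis, is:

```python
import math

V_TARGET, SIGMA = 25.0, 5.0   # m/s; illustrative values, not the thesis's

def gaussian_reward(v):
    """Peaks at the target velocity and falls off symmetrically around it."""
    return math.exp(-((v - V_TARGET) ** 2) / (2 * SIGMA ** 2))

def rise_fall_reward(v, k_rise=0.15, k_fall=0.4):
    """Exponential rise below the target and a steeper exponential fall above
    it, so overspeeding is penalized more sharply than underspeeding."""
    if v <= V_TARGET:
        return math.exp(-k_rise * (V_TARGET - v))
    return math.exp(-k_fall * (v - V_TARGET))
```

Both shapes reward staying near `V_TARGET`; the asymmetric rise-and-fall variant additionally discourages exceeding it, which is one way to encode fuel-efficient behavior.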
278

Contributions to Simulation-based High-dimensional Sequential Decision Making / Contributions sur la prise de décision séquentielle basée sur des simulations dans des environnements complexes de grande dimension

Hoock, Jean-Baptiste 10 April 2013 (has links)
Ma thèse s'intitule « Contributions sur la prise de décision séquentielle basée sur des simulations dans des environnements complexes de grande dimension ». Le cadre de la thèse s'articule autour du jeu, de la planification et des processus de décision markovien. Un agent interagit avec son environnement en prenant successivement des décisions. L'agent part d'un état initial jusqu'à un état final dans lequel il ne peut plus prendre de décision. A chaque pas de temps, l'agent reçoit une observation de l'état de l'environnement. A partir de cette observation et de ses connaissances, il prend une décision qui modifie l'état de l'environnement. L'agent reçoit en conséquence une récompense et une nouvelle observation. Le but est de maximiser la somme des récompenses obtenues lors d'une simulation qui part d'un état initial jusqu'à un état final. La politique de l'agent est la fonction qui, à partir de l'historique des observations, retourne une décision. Nous travaillons dans un contexte où (i) le nombre d'états est immense, (ii) les récompenses apportent peu d'information, (iii) la probabilité d'atteindre rapidement un bon état final est faible et (iv) les connaissances a priori de l'environnement sont soit inexistantes soit difficilement exploitables. Les 2 applications présentées dans cette thèse répondent à ces contraintes : le jeu de Go et le simulateur 3D du projet européen MASH (Massive Sets of Heuristics). Afin de prendre une décision satisfaisante dans ce contexte, plusieurs solutions sont apportées :1. simuler en utilisant le compromis exploration/exploitation (MCTS)2. réduire la complexité du problème par des recherches locales (GoldenEye)3. construire une politique qui s'auto-améliore (RBGP)4. apprendre des connaissances a priori (CluVo+GMCTS) L'algorithme Monte-Carlo Tree Search (MCTS) est un algorithme qui a révolutionné le jeu de Go. 
A partir d'un modèle de l'environnement, MCTS construit itérativement un arbre des possibles de façon asymétrique en faisant des simulations de Monte-Carlo et dont le point de départ est l'observation courante de l'agent. L'agent alterne entre l'exploration du modèle en prenant de nouvelles décisions et l'exploitation des décisions qui obtiennent statistiquement une bonne récompense cumulée. Nous discutons de 2 moyens pour améliorer MCTS : la parallélisation et l'ajout de connaissances a priori. La parallélisation ne résout pas certaines faiblesses de MCTS ; notamment certains problèmes locaux restent des verrous. Nous proposons un algorithme (GoldenEye) qui se découpe en 2 parties : détection d'un problème local et ensuite sa résolution. L'algorithme de résolution réutilise des principes de MCTS et fait ses preuves sur une base classique de problèmes difficiles. L'ajout de connaissances à la main est laborieuse et ennuyeuse. Nous proposons une méthode appelée Racing-based Genetic Programming (RBGP) pour ajouter automatiquement de la connaissance. Le point fort de cet algorithme est qu'il valide rigoureusement l'ajout d'une connaissance a priori et il peut être utilisé non pas pour optimiser un algorithme mais pour construire une politique. Dans certaines applications telles que MASH, les simulations sont coûteuses en temps et il n'y a ni connaissance a priori ni modèle de l'environnement; l'algorithme Monte-Carlo Tree Search est donc inapplicable. Pour rendre MCTS applicable dans MASH, nous proposons une méthode pour apprendre des connaissances a priori (CluVo). Nous utilisons ensuite ces connaissances pour améliorer la rapidité de l'apprentissage de l'agent et aussi pour construire un modèle. A partir de ce modèle, nous utilisons une version adaptée de Monte-Carlo Tree Search (GMCTS). Cette méthode résout de difficiles problématiques MASH et donne de bons résultats dans une application dont le but est d'améliorer un tirage de lettres. 
/ My thesis is entitled "Contributions to Simulation-based High-dimensional Sequential Decision Making". The context of the thesis is games, planning and Markov Decision Processes. An agent interacts with its environment by successively making decisions, starting from an initial state and ending in a final state in which it can no longer act. At each timestep, the agent receives an observation of the state of the environment. From this observation and its knowledge, the agent makes a decision which modifies the state of the environment; it then receives a reward and a new observation. The goal is to maximize the sum of rewards obtained during a simulation from an initial state to a final state. The policy of the agent is the function which, from the history of observations, returns a decision. We work in a context where (i) the number of states is huge, (ii) rewards carry little information, (iii) the probability of quickly reaching a good final state is low and (iv) prior knowledge of the environment is either nonexistent or hard to exploit. Both applications described in this thesis present these constraints: the game of Go and the 3D simulator of the European project MASH (Massive Sets of Heuristics). In order to make satisfactory decisions in this context, several solutions are proposed: 1. simulating with the exploration/exploitation trade-off (MCTS); 2. reducing the complexity of the problem by local solving (GoldenEye); 3. building a policy which improves itself (RBGP); 4. learning prior knowledge (CluVo+GMCTS). Monte-Carlo Tree Search (MCTS) is the state of the art for the game of Go. From a model of the environment, MCTS incrementally and asymmetrically builds a tree of possible futures by performing Monte-Carlo simulations, rooted at the current observation of the agent. The agent alternates between exploring the model by making new decisions and exploiting decisions which statistically yield a good cumulative reward. 
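The exploration/exploitation trade-off mentioned in the abstract is usually realized in MCTS through the UCB1 selection rule (the UCT variant). A minimal sketch, assuming a simple dictionary representation for tree nodes; the function names, the child representation and the exploration constant are illustrative, not taken from the thesis:

```python
import math

def uct_score(child_visits, child_value, parent_visits, c=1.4):
    """UCB1 score: exploitation term (mean reward) plus exploration bonus."""
    if child_visits == 0:
        return float("inf")  # unvisited children are tried first
    return child_value / child_visits + c * math.sqrt(
        math.log(parent_visits) / child_visits
    )

def select_child(children, parent_visits):
    """During tree descent, pick the child maximizing the UCT score."""
    return max(
        children,
        key=lambda ch: uct_score(ch["visits"], ch["value"], parent_visits),
    )
```

With `c = 0` the rule degenerates to pure exploitation (mean reward); larger `c` pushes the search toward rarely visited branches, which is how the tree grows asymmetrically toward promising moves.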
We discuss two ways of improving MCTS: parallelization and the addition of prior knowledge. Parallelization does not fix some weaknesses of MCTS; in particular, certain local problems remain open challenges. We propose an algorithm (GoldenEye) composed of two parts: detection of a local problem, followed by its resolution. The resolution algorithm reuses concepts from MCTS and solves difficult problems from a classical benchmark. Adding prior knowledge by hand is laborious and tedious. We propose a method called Racing-based Genetic Programming (RBGP) to add prior knowledge automatically. Its strong point is that RBGP rigorously validates the addition of each piece of prior knowledge, and it can be used not merely to optimize an algorithm but to build a policy. In some applications such as MASH, simulations are time-consuming and there is neither prior knowledge nor a model of the environment, so Monte-Carlo Tree Search cannot be applied. To make MCTS usable in this context, we propose a method for learning prior knowledge (CluVo). We then use this knowledge both to speed up the agent's learning and to build a model of the environment, on which we run an adapted version of Monte-Carlo Tree Search (GMCTS). This method solves difficult MASH problems and gives good results in an application aimed at improving a letter draw in a word game.
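The "racing" idea behind RBGP, keeping a candidate piece of knowledge only once it is statistically validated against a baseline, can be sketched with a simple confidence-interval test. This is a generic Hoeffding-bound version under assumed win/loss outcomes in [0, 1]; the function names, the confidence level and the stopping rule are illustrative assumptions, not the thesis's exact procedure:

```python
import math

def hoeffding_radius(n, delta, value_range=1.0):
    """Hoeffding confidence radius after n samples, at confidence 1 - delta."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def racing_decision(candidate_wins, baseline_wins, n, delta=0.05):
    """Accept or reject the candidate only when the confidence
    intervals around the two empirical win rates no longer overlap;
    otherwise keep racing (collect more games)."""
    r = hoeffding_radius(n, delta)
    cand, base = candidate_wins / n, baseline_wins / n
    if cand - r > base + r:
        return "accept"
    if cand + r < base - r:
        return "reject"
    return "continue"
```

The point of racing is that clearly bad or clearly good candidates are decided early with statistical guarantees, so the expensive simulations are concentrated on the borderline ones.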
279

Sociálně pedagogická práce s minoritami / The socio-pedagogical work with minorities

Matuštíková, Hana January 2011 (has links)
This Diploma Thesis deals with the issues of social and pedagogical work with minorities. The opening chapters determine the basic theoretical terms and introduce basic information about selected national minorities. The core of the practical part describes the processes and specific activities which can help integrate pupils belonging to ethnic minorities into the educational system. The closing chapters include an empirical survey focused on professional orientation and vocational choice among Roma pupils at the elementary school in Obrnice. Attention is drawn to the influence of the family and to the role of the school and other institutions operating in this field.
280

Contribuições ao processo de tomada de decisão estratégica a partir dos conhecimentos da neurociência cognitiva / Contributions to the strategic decision making process from the cognitive neuroscience knowledges

Porto, Maria Cecilia Galante 06 October 2015 (has links)
Avanços recentes no tema de fronteira que exerce fascínio e curiosidade - a Neurociência - vêm explicitando conceitos sofisticados sobre um assunto emergente na Administração: o aumento do conhecimento na área da Neurociência Cognitiva e suas contribuições para a área de tomada de decisão. À luz desses avanços, a presente pesquisa possui natureza exploratória, cuja proposta contribui para integrar os conhecimentos em Neurociência Cognitiva e tomada de decisão estratégica em administração, sob a ótica comportamental. O objetivo principal do estudo foi propor contribuições ao processo de tomada de decisão estratégica a partir dos conhecimentos da Neurociência Cognitiva. Utilizou-se o método da revisão em profundidade da literatura, com o objetivo de apoiar a análise do conteúdo nas dimensões-alvo do estudo: processo de tomada de decisão estratégica, pensamento estratégico sob a ótica da racionalidade limitada, Neurociência Cognitiva e neurociência da decisão. As contribuições obtidas estão alicerçadas em três vertentes: (1) contribuições para a pesquisa, (2) contribuições para as práticas de gestão e (3) contribuições para a didática e ensino. Na perspectiva da pesquisa, a Neurociência Cognitiva possibilita evidências confirmatórias sobre fatores subjetivos, sobretudo os emocionais, que guiam o comportamento do decisor durante as fases do processo decisório, mediante o fornecimento de metodologias para testar teorias e novos conceitos. 
Na perspectiva das contribuições para a gestão, a ampliação da consciência dos gestores sobre as emoções, heurísticas e vieses presentes no processo decisório estratégico permite: (a) o alinhamento de expectativas sobre os resultados da decisão estratégica; (b) estimular as atitudes da liderança para uma postura mais protagonista no decorrer do processo, resultando em maior inovação nas práticas de gestão; (c) o reconhecimento da intuição associada à criatividade como competência importante para a decisão estratégica, assegurando maior precisão sobre o futuro da decisão; (d) o aceite das heurísticas da mente, possibilitando simplicidade, facilitando o entendimento de todos os envolvidos e gerando transparência no processo decisório; (e) considerar os objetivos individuais dos decisores não declarados no nível da organização, otimizando a implementação do plano estratégico; (f) o fornecimento de informações sobre a política nas decisões estratégicas, mediante a aplicação de técnicas neurocientíficas que possam trazer maior conhecimento sobre o peso da evidência na tomada de decisão estratégica. Há de se considerar, ainda, que o reforço da aprendizagem, acarretando possíveis mudanças biológicas nas sinapses cerebrais, contribui para o exercício do pensamento estratégico e, consequentemente, maior precisão nas decisões futuras. 
A incorporação da abordagem neurocientífica na didática do ensino sobre tomada de decisão estratégica contribui para: (a) preparo do aluno a fim de superar fatores de ordem cognitiva no nível individual e em grupo que encontrarão no processo decisório estratégico; (b) facilitação do embasamento das constatações da teoria de decisão comportamental; (c) reforço da aprendizagem, sugerindo-se a inserção das técnicas de cenários e a análise ambiental com vistas à prática de avaliações prévias sobre eventos incertos que possam afetar o processo decisório estratégico; (d) incorporação do ensino de decisão das competências analíticas e intuitivas encontradas, por exemplo, nos cursos de criatividade e inovação, alinhando-se as técnicas formais de ensino com a prática da gestão. Além da relevância dos pontos citados, a pesquisa da temática é inédita, o que possibilita uma nova abordagem de pesquisas em decisão estratégica que incorpore as contribuições da Neurociência Cognitiva. / Recent advances in a frontier field that attracts fascination and curiosity, Neuroscience, have clarified sophisticated concepts in an emerging topic in Administration: the growing knowledge in the field of Cognitive Neuroscience and its contributions to decision-making studies. In light of these advances, this research is exploratory in nature; its proposal contributes to integrating Cognitive Neuroscience knowledge and strategic decision making in administration from a behavioral viewpoint. The main goal of this study is to propose contributions to the strategic decision-making process based on Cognitive Neuroscience knowledge. An in-depth literature review was used to support the content analysis of the study's target dimensions: the strategic decision-making process, strategic thinking from the perspective of bounded rationality, Cognitive Neuroscience and decision neuroscience. 
The contributions achieved rest on three strands: (1) contributions to research, (2) contributions to management practices and (3) contributions to teaching and learning. From the research perspective, Cognitive Neuroscience provides confirmatory evidence on subjective factors, especially emotional ones, that guide the decision maker's behavior during the phases of the decision process, by supplying methodologies to test theories and new concepts. From the management perspective, expanding managers' awareness of the emotions, heuristics and biases present in the strategic decision-making process makes it possible to: (a) align expectations about the results of the strategic decision; (b) encourage leadership to assume a more protagonist posture throughout the process, resulting in greater innovation in management practices; (c) recognize intuition associated with creativity as an important competence for strategic decisions, ensuring greater precision about the future of the decision; (d) accept the heuristics of the mind, enabling simplicity, facilitating the understanding of all those involved and creating transparency in the decision-making process; (e) consider decision makers' individual goals not declared at the organizational level, optimizing the implementation of the strategic plan; (f) provide information about politics in strategic decisions, by applying neuroscientific techniques that can bring better insight into the weight of evidence in strategic decision making. One must also consider that reinforcing learning, with its possible biological changes in brain synapses, contributes to the exercise of strategic thinking and, consequently, to more accurate future decisions. 
Incorporating the neuroscientific approach into the teaching of strategic decision making contributes to: (a) preparing students to overcome the cognitive factors, at the individual and group level, that they will encounter in the strategic decision process; (b) facilitating the grounding of the findings of behavioral decision theory; (c) reinforcing learning, by suggesting the use of scenario techniques and environmental analysis aimed at practicing prior assessments of uncertain events that may affect the strategic decision process; (d) incorporating into decision teaching the analytical and intuitive competences found, for example, in creativity and innovation courses, aligning formal teaching techniques with management practice. Besides the relevance of the points above, research on this theme is unprecedented, which enables a new approach to strategic decision research that incorporates the contributions of Cognitive Neuroscience.
