About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD).
Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
171

Online Learning for Linearly Parametrized Control Problems

Abbasi-Yadkori, Yasin Unknown Date
No description available.
172

Complying with norms : a neurocomputational exploration

Colombo, Matteo January 2012 (has links)
The subject matter of this thesis can be summarized by a triplet of questions and answers. Showing what these questions and answers mean is, in essence, the goal of my project. The triplet goes like this:

Q: How can we make progress in our understanding of social norms and norm compliance?
A: Adopting a neurocomputational framework is one effective way to make progress in our understanding of social norms and norm compliance.
Q: What could the neurocomputational mechanism of social norm compliance be?
A: The mechanism of norm compliance probably consists of Bayesian reinforcement learning algorithms implemented by activity in certain neural populations.
Q: What could information about this mechanism tell us about social norms and social norm compliance?
A: Information about this mechanism tells us that:
a1: Social norms are uncertainty-minimizing devices.
a2: Social norm compliance is one trick that agents employ to interact coadaptively and smoothly in their social environment.

Most of the existing treatments of norms and norm compliance (e.g. Bicchieri 2006; Binmore 1993; Elster 1989; Gintis 2010; Lewis 1969; Pettit 1990; Sugden 1986; Ullmann-Margalit 1977) consist in what Cristina Bicchieri (2006) refers to as "rational reconstructions." A rational reconstruction of the concept of social norm "specifies in which sense one may say that norms are rational, or compliance with a norm is rational" (ibid., pp. 10-11). What sets my project apart from these types of treatments is that it aims, first and foremost, at providing a description of some core aspects of the mechanism of norm compliance. The single most original idea put forth in my project is to bring an alternative explanatory framework to bear on social norm compliance: the framework of computational cognitive neuroscience. The chapters of this thesis describe some ways in which central issues concerning social norms can be fruitfully addressed within a neurocomputational framework.
In order to qualify and articulate the triplet above, my strategy consists firstly in laying down the beginnings of a model of the mechanism of norm compliance behaviour, and then zooming in on specific aspects of the model. Such a model, the chapters of this thesis argue, explains apparently important features of the psychology and neuroscience of norm compliance, and helps us to understand the nature of the social norms we live by.
173

A study of learning models for analyzing prisoners' dilemma game data

賴宜祥, Lai, Yi Hsiang Unknown Date (has links)
How people choose strategies in a finitely repeated prisoners' dilemma game is of interest in game theory, where so-called game learning theory predicts which strategies the players will choose. The objective of this study is to find a proper learning model for the prisoners' dilemma game data collected at National Cheng-Chi University. The data consist of three experiments with different game settings and matching rules; all participants were undergraduate students. Four learning models are considered: the Reinforcement Learning model, the Belief Learning model, the Experience-Weighted Attraction (EWA) model, and a proposed extension of the Reinforcement Learning model. The data analysis was divided into two parts: training (in-sample) and testing (out-of-sample). Although it adds one parameter, the proposed extended model performs slightly better than the original Reinforcement Learning model in both training and testing, and the fitted models all predict better than guessing decisions with equal chance.
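The abstract does not give the model equations; as a rough illustration of the reinforcement-learning family it compares, here is a minimal Roth-Erev style attraction learner for a repeated prisoners' dilemma. The payoff values, the recency parameter, and all names are assumptions for illustration, not taken from the thesis.

```python
import random

# Hypothetical payoff table for a prisoners' dilemma, indexed by
# (my action, opponent action); 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}

class RothErevLearner:
    """Attraction-based reinforcement learner: received payoffs reinforce actions."""
    def __init__(self, phi=0.9):
        self.phi = phi                 # assumed recency (forgetting) parameter
        self.attraction = [1.0, 1.0]   # initial attractions for C and D

    def choose(self):
        # Choose an action with probability proportional to its attraction.
        total = sum(self.attraction)
        return 0 if random.random() < self.attraction[0] / total else 1

    def update(self, action, payoff):
        # Decay all attractions, then reinforce the chosen action.
        self.attraction = [self.phi * a for a in self.attraction]
        self.attraction[action] += payoff

random.seed(0)
p1, p2 = RothErevLearner(), RothErevLearner()
for _ in range(200):
    a1, a2 = p1.choose(), p2.choose()
    p1.update(a1, PAYOFF[(a1, a2)])
    p2.update(a2, PAYOFF[(a2, a1)])
print("final attractions (C, D):", p1.attraction)
```

Belief learning and EWA replace the update step with expected payoffs against opponents' empirical play (EWA mixes both), which is where the extra parameters the abstract mentions come in.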
174

Model-based active learning in hierarchical policies

Cora, Vlad M. 05 1900 (has links)
Hierarchical task decompositions play an essential role in the design of complex simulation and decision systems, such as the ones that arise in video games. Game designers find it very natural to adopt a divide-and-conquer philosophy of specifying hierarchical policies, where decision modules can be constructed somewhat independently. The process of choosing the parameters of these modules manually is typically lengthy and tedious. The hierarchical reinforcement learning (HRL) field has produced elegant ways of decomposing policies and value functions using semi-Markov decision processes. However, there is still a lack of demonstrations in larger nonlinear systems with discrete and continuous variables. To narrow this gap between industrial practices and academic ideas, we address the problem of designing efficient algorithms to facilitate the deployment of HRL ideas in more realistic settings. In particular, we propose Bayesian active learning methods to learn the relevant aspects of either policies or value functions by focusing on the most relevant parts of the parameter and state spaces respectively. To demonstrate the scalability of our solution, we have applied it to The Open Racing Car Simulator (TORCS), a 3D game engine that implements complex vehicle dynamics. The environment is a large topological map roughly based on downtown Vancouver, British Columbia. Higher level abstract tasks are also learned in this process using a model-based extension of the MAXQ algorithm. Our solution demonstrates how HRL can be scaled to large applications with complex, discrete and continuous non-linear dynamics.
175

A service-oriented approach to topology formation and resource discovery in wireless ad-hoc networks

Gonzalez Valenzuela, Sergio 05 1900 (has links)
The past few years have witnessed a significant evolution in mobile computing and communications, in which new trends and applications have transformed the traditional role of computer networks into that of distributed service providers. In this thesis we explore an alternative way to form wireless ad-hoc networks whose topologies can be customized as required by the users’ software applications. In particular, we investigate the applicability of mobile codes to networks created by devices equipped with Bluetooth technology. Computer simulation results suggest that our proposed approach can achieve this task effectively, while matching the level of efficiency seen in other salient proposals in this area. This thesis also addresses the issue of service discovery in mobile ad-hoc networks. We propose the use of a directory whose network location varies in an attempt to reduce traffic overhead driven by users’ hosts looking for service information. We refer to this scheme as the Service Directory Placement Algorithm, or SDPA. We formulate the directory relocation problem as a Markov Decision Process that is solved by using Q-learning. Performance evaluations through computer simulations reveal bandwidth overhead reductions that range between 40% and 48% when compared with a basic broadcast flooding approach for networks comprising hosts moving at pedestrian speeds. We then extend our proposed approach and introduce a multi-directory service discovery system called the Service Directory Placement Protocol, or SDPP. Our findings reveal bandwidth overhead reductions typically ranging from 15% to 75% in networks comprising slow-moving hosts with restricted memory availability. In the fourth and final part of this work, we present the design foundations and architecture of a middleware system called WISEMAN – WIreless Sensors Employing Mobile Agents. We employ WISEMAN to dispatch and process mobile programs in Wireless Sensor Networks (WSNs).
Our proposed system enables the dynamic creation of semantic relationships between network nodes that cooperate to provide an aggregate service. We present discussions on the advantages of our proposed approach, and in particular, how WISEMAN facilitates the realization of service-oriented tasks in WSNs.
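The thesis formulates directory relocation as a Markov Decision Process solved with Q-learning; a toy tabular sketch of that idea follows. The host count, the distance-based cost model, and the traffic-drift schedule are invented for illustration and are not the formulation used in the thesis.

```python
import random

# Toy SDPA-like setup: tabular Q-learning chooses which host should hold
# the service directory as the query-traffic pattern drifts.
random.seed(1)
N_HOSTS = 5
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = [[0.0] * N_HOSTS for _ in range(N_HOSTS)]   # Q[current host][next host]

def reward(host, centroid):
    # Assumed cost model: lookup overhead grows with the distance between
    # the directory host and the current centre of query traffic.
    return -abs(host - centroid)

state, centroid = 0, 2
for step in range(3000):
    # Epsilon-greedy choice of the next directory placement.
    if random.random() < EPS:
        action = random.randrange(N_HOSTS)
    else:
        action = max(range(N_HOSTS), key=lambda a: Q[state][a])
    r = reward(action, centroid)
    best_next = max(Q[action])
    Q[state][action] += ALPHA * (r + GAMMA * best_next - Q[state][action])
    state = action
    if step % 1000 == 999:          # traffic pattern drifts occasionally
        centroid = random.randrange(N_HOSTS)

greedy = max(range(N_HOSTS), key=lambda a: Q[state][a])
print("preferred directory host:", greedy)
```

In the thesis the state and reward would instead be derived from observed lookup traffic and mobility, but the relocation-as-MDP structure is the same.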
176

A Biologically Inspired Four Legged Walking Robot

Peng, Shiqi January 2006 (has links)
This Ph.D. thesis presents the design and implementation of a biologically inspired four-phase walking strategy, built from behaviours, for a four-legged walking robot. In particular, the walking strategy addresses the balance issue, including both static and dynamic balance, triggered non-deterministically by the robot’s real-time interaction with the environment. Four parallel Subsumption Architectures (SA) and a simple Central Pattern Producer (CPP) are employed in the physical implementation of the walking strategy. An implementation framework for such a parallel Subsumption Architecture is also proposed to facilitate the reusability of the system. A Reinforcement Learning (RL) method was integrated into the CPP to allow the robot to learn the optimal walking cycle interval (OWCI) appropriate for walking on various terrain conditions. Experimental results demonstrate that the robot, employing the proposed walking strategy, can successfully carry out its walking behaviours under various experimental terrain conditions, such as flat ground, incline, decline and uneven ground. Interactions of all the behaviours of the robot enable it to exhibit a combination of both preset and emergent walking behaviours.
177

Reinforcement-learning based output-feedback controller for nonlinear discrete-time system with application to spark ignition engines operating lean and EGR

Shih, Peter, January 2007 (has links) (PDF)
Thesis (M.S.)--University of Missouri--Rolla, 2007. / Vita. The entire thesis text is included in the file. Title from title screen of thesis/dissertation PDF file (viewed May 16, 2007). Includes bibliographical references.
178

Mobilized ad-hoc networks: A reinforcement learning approach

Chang, Yu-Han, Ho, Tracey, Kaelbling, Leslie Pack 04 December 2003 (has links)
Research in mobile ad-hoc networks has focused on situations in which nodes have no control over their movements. We investigate an important but overlooked domain in which nodes do have control over their movements. Reinforcement learning methods can be used to control both packet routing decisions and node mobility, dramatically improving the connectivity of the network. We first motivate the problem by presenting theoretical bounds for the connectivity improvement of partially mobile networks and then present superior empirical results under a variety of different scenarios in which the mobile nodes in our ad-hoc network are embedded with adaptive routing policies and learned movement policies.
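The abstract does not spell out the routing learner; a minimal sketch in the spirit of Q-routing (Boyan and Littman), where each node learns estimated delivery cost per next hop, could look like the following. The line topology, learning rate, and hop cap are assumptions, and the learned movement policies the abstract also covers are omitted.

```python
import random

random.seed(2)
N = 5                                   # nodes 0..4 on an assumed line topology
NEIGHBOURS = {i: [j for j in (i - 1, i + 1) if 0 <= j < N] for i in range(N)}
ALPHA = 0.5
# Q[node][dest][next_hop] = estimated hops to dest when forwarding to next_hop
Q = {i: {d: {n: 1.0 for n in NEIGHBOURS[i]} for d in range(N)} for i in range(N)}

def route(src, dst):
    """Greedily forward a packet, updating estimates along the way."""
    node, hops = src, 0
    while node != dst and hops < 50:
        nxt = min(Q[node][dst], key=Q[node][dst].get)
        # The chosen neighbour "reports" its best remaining estimate.
        remaining = 0.0 if nxt == dst else min(Q[nxt][dst].values())
        Q[node][dst][nxt] += ALPHA * (1 + remaining - Q[node][dst][nxt])
        node, hops = nxt, hops + 1
    return hops

for _ in range(200):                     # train on random traffic
    route(random.randrange(N), random.randrange(N))
print("hops from 0 to 4 after training:", route(0, 4))
```

Controlled node mobility would add a second learner that moves relay nodes to improve the connectivity these estimates depend on.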
179

Computational Modeling of the Basal Ganglia : Functional Pathways and Reinforcement Learning

Berthet, Pierre January 2015 (has links)
We perceive the environment via sensor arrays and interact with it through motor outputs. The work of this thesis concerns how the brain selects actions given the information about the perceived state of the world and how it learns and adapts these selections to changes in this environment. Reinforcement learning theories suggest that an action will be more or less likely to be selected if the outcome has been better or worse than expected. A group of subcortical structures, the basal ganglia (BG), is critically involved in both the selection and the reward prediction. We developed and investigated a computational model of the BG. We implemented a Bayesian-Hebbian learning rule, which computes the weights between two units based on the probability of their activations. We were able to test how various configurations of the represented pathways impacted the performance in several reinforcement learning and conditioning tasks. Then, following the development of a more biologically plausible version with spiking neurons, we simulated lesions in the different pathways and assessed how they affected learning and selection. We observed that the evolution of the weights and the performance of the models qualitatively resembled experimental data. The absence of a unique best way to configure the model across all the learning paradigms tested indicates that an agent could dynamically configure its action selection mode, mainly by including or not including the reward prediction values in the selection process. We present hypotheses on possible biological substrates for the reward prediction pathway. We base these on the functional requirements for successful learning and on an analysis of the experimental data. We further simulate a loss of dopaminergic neurons similar to that reported in Parkinson’s disease. We suggest that the associated motor symptoms are mostly caused by an impairment of the pathway promoting actions, while the pathway suppressing them seems to remain functional.
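The Bayesian-Hebbian rule described here computes weights from activation probabilities; a minimal BCPNN-style sketch, in which the weight estimates log P(pre, post) / (P(pre) P(post)) from running averages, is shown below. The time constant, the floor value, and the alternating input patterns are illustrative assumptions.

```python
import math

TAU, EPS = 0.05, 1e-4                 # assumed trace time constant and floor
p_pre = p_post = p_joint = 0.5        # running probability estimates

def update(pre, post):
    """Fold in one binary (0/1) activation sample; return the weight."""
    global p_pre, p_post, p_joint
    p_pre += TAU * (pre - p_pre)
    p_post += TAU * (post - p_post)
    p_joint += TAU * (pre * post - p_joint)
    # Bayesian-Hebbian weight: log of joint over product of marginals.
    return math.log((p_joint + EPS) / ((p_pre + EPS) * (p_post + EPS)))

# Units firing together make the weight positive...
for t in range(400):
    x = t % 2
    w_corr = update(x, x)
# ...units never firing together make it negative.
for t in range(400):
    x = t % 2
    w_anti = update(x, 1 - x)
print(w_corr > 0, w_anti < 0)  # → True True
```

The sign and magnitude of such weights track how strongly two units' activations co-occur relative to chance, which is what lets reward-modulated versions of the rule encode action values.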
At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 3: Manuscript.
180

Using the GTSOM network for mobile robot navigation with reinforcement learning

Menegaz, Mauricio January 2009 (has links)
This work describes an architecture for an autonomous robotic agent that is capable of creating a state representation of its environment and learning how to execute simple tasks using this representation. The GTSOM network (BASTOS, 2007) was chosen as the method for state clustering. It transforms the multidimensional, continuous sensor readings into a discrete representation, allowing the use of conventional reinforcement learning techniques. Some modifications to the network’s algorithm were necessary so that it could be applied in this context. The network is used together with a grid map that allows the model to associate sensory experiences with the places where they occurred. While the GTSOM network is the main component of the state clustering system, the Q-Learning reinforcement learning method was chosen for task execution. Using the compact state representation created by the self-organizing network, the agent learns which actions to execute at each state in order to achieve its objectives. The model was tested in an experiment that consists in finding an object in a maze. The results show that the model segments the state space in a useful way and learns the task: the agent learns to avoid collisions and memorizes the location of the target, reaching it regardless of its initial position. Furthermore, the representation is expanded whenever the agent faces an unknown situation, while states associated with experiences that do not recur are gradually forgotten.
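The GTSOM network itself is considerably more elaborate (it grows, prunes, and adapts prototypes over time); this sketch only illustrates the pipeline the abstract describes: a growing set of prototypes quantizes continuous readings into discrete state indices that tabular Q-learning could consume. The growth threshold and the two-dimensional inputs are assumptions.

```python
import math
import random

GROW_THRESHOLD = 0.3   # assumed distance beyond which a new state is created

class GrowingQuantizer:
    """Maps continuous readings to discrete state indices, growing as needed."""
    def __init__(self):
        self.prototypes = []

    def state(self, reading):
        # Return the index of the nearest prototype; grow a new one if
        # no existing prototype is close enough (the "unknown situation"
        # case from the abstract; pruning of stale states is omitted).
        if self.prototypes:
            d, idx = min((math.dist(p, reading), i)
                         for i, p in enumerate(self.prototypes))
            if d < GROW_THRESHOLD:
                return idx
        self.prototypes.append(list(reading))
        return len(self.prototypes) - 1

random.seed(3)
quant = GrowingQuantizer()
states = [quant.state((random.random(), random.random())) for _ in range(100)]
print(len(quant.prototypes), "discrete states from 100 continuous readings")
```

Each returned index would then serve as the state in a standard Q-table, exactly as in the Q-Learning step the abstract describes.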
