About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Mathematical Description of Differential Hebbian Plasticity and its Relation to Reinforcement Learning / Mathematische Beschreibung Hebb'scher Plastizität und deren Beziehung zu Bestärkendem Lernen

Kolodziejski, Christoph Markus 13 February 2009 (has links)
No description available.
12

A formal investigation of dopamine’s role in Attention-Deficit/Hyperactive Disorder: evidence for asymmetrically effective reinforcement learning signals

Cockburn, Jeffrey 14 January 2010 (has links)
Attention-Deficit/Hyperactive Disorder is a well-studied but poorly understood disorder. Given that the underlying neurological mechanisms involved in the disorder have yet to be established, diagnosis depends on behavioural markers. However, recent research has begun to associate a dopamine system dysfunction with ADHD, though consensus on the nature of dopamine's role in ADHD has yet to be reached. Here, I use a computational modelling approach to investigate two opposing theories of the dopaminergic dysfunction in ADHD. The hyper-active dopamine theory posits that ADHD is associated with a midbrain dopamine system that produces abnormally large prediction error signals, whereas the dynamic developmental theory argues that abnormally small prediction errors give rise to ADHD. Given that these two theories center on the size of prediction errors encoded by the midbrain dopamine system, I formally investigated the implications of each theory within the framework of temporal-difference learning, a reinforcement learning algorithm demonstrated to model midbrain dopamine activity. The results presented in this thesis suggest that neither theory provides a good account of the behaviour of children and animal models of ADHD. Instead, my results suggest that ADHD results from asymmetrically effective reinforcement learning signals encoded by the midbrain dopamine system. More specifically, the model presented here reproduced behaviours associated with ADHD when positive prediction errors were more effective than negative prediction errors. The biological sources of this asymmetry are considered, as are other computational models of ADHD.
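As a rough illustration of the asymmetry described above, the following sketch implements a TD(0) value update in which positive and negative prediction errors are weighted differently. It is not taken from the thesis: the gain parameters, the learning rate, and the toy two-state task are assumptions used only for illustration.

    import numpy as np

    def asymmetric_td_update(V, s, r, s_next, alpha=0.1, gamma=0.95,
                             pos_gain=1.0, neg_gain=0.5):
        # One TD(0) update where positive and negative prediction errors
        # have different effectiveness (hypothetical gain parameters).
        delta = r + gamma * V[s_next] - V[s]        # prediction error
        gain = pos_gain if delta > 0 else neg_gain  # asymmetric weighting
        V[s] += alpha * gain * delta
        return delta

    # Toy usage: a two-state chain with a terminal reward of 1.
    V = np.zeros(3)
    for _ in range(100):
        asymmetric_td_update(V, s=0, r=0.0, s_next=1)
        asymmetric_td_update(V, s=1, r=1.0, s_next=2)  # state 2 is terminal
    print(V)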
13

Deep Reinforcement Learning for the Optimization of Combining Raster Images in Forest Planning

Wen, Yangyang January 2021 (has links)
Raster images represent treatment options describing how the forest will be cut. Economic benefits from cutting the forest are generated after a treatment is selected and executed. Existing raster images contain many small clusters, which is the principal cause of overhead. If we can fully explore the relationships among the raster images and combine the old data sets with an optimization algorithm to generate a new raster image, the result can surpass the existing raster images and create higher economic benefits. The question of this project is whether we can create a dynamic model in which the pixel being updated acts as an agent that selects a treatment option for an empty raster image in response to neighborhood, environmental, and landscape parameters. The project explores whether it is realistic to use deep reinforcement learning to generate new and superior raster images, and aims to assess the feasibility, usefulness, and effectiveness of deep reinforcement learning algorithms in optimizing existing treatment options. The problem was modelled as a Markov decision process in which the pixel to be updated is an agent of the empty raster image and determines the choice of treatment option for the current empty pixel. The project used a deep Q-learning neural network model to calculate the Q-values, and a temporal-difference reinforcement learning algorithm to predict future rewards and update the model parameters. After the modelling was completed, a model-usefulness experiment tested the usefulness of the model, a parameter-correlation experiment tested the correlation between the parameters and the benefit of the model, and finally the trained model was used to generate a larger raster image to test its effectiveness.
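As a minimal sketch of the kind of deep Q-learning update described above, the snippet below scores treatment options for a single pixel from a feature vector and applies one temporal-difference step. The layer sizes, feature dimension, number of treatments, and reward convention are assumptions; the thesis's actual model is not reproduced here.

    import torch
    import torch.nn as nn

    n_features, n_treatments = 16, 5   # hypothetical sizes
    gamma = 0.99

    # Q-network: neighborhood/landscape features -> value of each treatment.
    q_net = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                          nn.Linear(64, n_treatments))
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

    def td_step(state, action, reward, next_state, done):
        # One Q-learning (temporal-difference) update for a single pixel.
        q_value = q_net(state)[action]
        with torch.no_grad():
            bootstrap = 0.0 if done else gamma * q_net(next_state).max().item()
            target = torch.tensor(reward + bootstrap)
        loss = nn.functional.mse_loss(q_value, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Toy usage with random features standing in for real neighborhood data.
    s, s_next = torch.randn(n_features), torch.randn(n_features)
    td_step(s, action=2, reward=1.0, next_state=s_next, done=False)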
14

Hraní nedeterministických her s učením / Playing of Nondeterministic Games with Learning

Bukovský, Marek January 2011 (has links)
The thesis is dedicated to the study and implementation of methods for learning from the course of play. The game chosen for this thesis is Backgammon. The algorithm used for training the neural networks is temporal difference learning with eligibility traces, also known as TD(lambda). The theoretical part describes algorithms for playing games without learning, an introduction to reinforcement learning and temporal difference learning, and an introduction to artificial neural networks. The practical part deals with the application of the combination of neural networks and the TD(lambda) algorithm.
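For readers unfamiliar with TD(lambda), the sketch below shows the core update with accumulating eligibility traces for a linear value function. The thesis trains a neural network evaluator for Backgammon; this linear version is only an assumed simplification to make the trace mechanics explicit.

    import numpy as np

    def td_lambda_episode(features, rewards, w, alpha=0.05, gamma=1.0, lam=0.7):
        # TD(lambda) with accumulating eligibility traces over one episode.
        # features[t] is the feature vector of the position after move t.
        e = np.zeros_like(w)                       # eligibility trace
        for t in range(len(rewards)):
            v = w @ features[t]
            v_next = 0.0 if t == len(rewards) - 1 else w @ features[t + 1]
            delta = rewards[t] + gamma * v_next - v
            e = gamma * lam * e + features[t]      # decay and accumulate traces
            w += alpha * delta * e                 # update toward the TD target
        return w

    # Toy usage: 10 positions with 4 features each, reward only at game end.
    phi = np.random.rand(10, 4)
    r = np.zeros(10)
    r[-1] = 1.0                                    # win = 1 at the final position
    w = td_lambda_episode(phi, r, np.zeros(4))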
15

Modèle informatique du coapprentissage des ganglions de la base et du cortex : l'apprentissage par renforcement et le développement de représentations / A computational model of co-learning in the basal ganglia and cortex: reinforcement learning and the development of representations

Rivest, François 12 1900 (has links)
Throughout its lifetime, the brain develops abstract representations of its environment that allow the individual to maximize its benefits. How these representations develop while the individual seeks rewards remains a mystery. It is reasonable to assume that these representations arise in the cortex and that the basal ganglia play an important role in reward maximization. In particular, dopaminergic neurons appear to encode a reward prediction error signal. This thesis studies the problem by constructing, using machine learning tools, a computational model that incorporates a number of relevant neurophysiological findings. After an introduction to the machine learning framework and to some of its algorithms, an overview of learning in psychology and neuroscience, and a review of models of learning in the basal ganglia, the thesis comprises three papers. The first paper shows that it is possible to learn a better representation of the inputs while learning to maximize reward. The second paper addresses the important and still unresolved problem of the representation of time in the brain. It shows that a time representation can be acquired automatically in an artificial neural network acting as a working memory. The representation learned by the model closely resembles the activity of cortical neurons in similar tasks. Moreover, the model shows that the reward prediction error signal could accelerate the development of the temporal representation. Finally, it shows that if such a learned representation exists in the cortex, it could provide the necessary information to the basal ganglia to explain the dopaminergic signal. The third paper evaluates the explanatory and predictive power of the model on the effects of differences in task conditions, such as the presence or absence of a stimulus (classical versus trace conditioning) while waiting for the reward. Beyond making interesting predictions relevant to the timing literature, the paper reveals some shortcomings of the model that will need to be resolved. In summary, this thesis extends current models of reinforcement learning in the basal ganglia and the dopaminergic system to the concurrent development of representations in the cortex and to the interactions between these two structures.
17

MP-Draughts - Um Sistema Multiagente de Aprendizagem Automática para Damas Baseado em Redes Neurais de Kohonen e Perceptron Multicamadas / MP-Draughts: A Multiagent Machine Learning System for Draughts Based on Kohonen Neural Networks and Multilayer Perceptrons

Duarte, Valquíria Aparecida Rosa 17 July 2009 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / The goal of this work is to present MP-Draughts (MultiPhase-Draughts), a multiagent environment for Draughts in which one agent, named IIGA, is built and trained to be specialized in the initial and intermediate phases of the game, while the remaining agents specialize in the final phases. Each agent of MP-Draughts is a neural network that learns almost without human supervision (unlike the world-champion agent Chinook). MP-Draughts grew out of a continuous line of research whose previous product was the efficient agent VisionDraughts. Despite its good general performance, VisionDraughts frequently fails in the final phases of a game, even when in an advantageous position compared to its opponent (for instance, getting into endgame loops). To reduce this misbehavior during endgames, MP-Draughts counts on 25 agents specialized in endgame phases, each one trained to deal with a particular cluster of endgame board states. These 25 clusters are mined by a Kohonen-SOM network from a database containing a large quantity of endgame board states. Once trained, MP-Draughts operates in the following way: first, an optimized version of VisionDraughts is used as the IIGA; next, the endgame agent representing the cluster that best fits the current endgame board state replaces it until the end of the game. This work shows that such a strategy significantly improves the general performance of the playing agents. / Master's degree in Computer Science
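The clustering step described above can be illustrated with a small Kohonen self-organizing map. The sketch below is not the thesis's implementation: the 5x5 grid (giving 25 clusters), the 32-value board encoding, and the training schedule are assumptions chosen only to show how endgame board states could be grouped and later routed to a specialized agent.

    import numpy as np

    def train_som(data, grid=(5, 5), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
        # Minimal Kohonen SOM: maps board-state vectors onto a small grid,
        # so each grid cell becomes one endgame cluster (5 x 5 -> 25 clusters).
        rng = np.random.default_rng(seed)
        rows, cols = grid
        weights = rng.random((rows, cols, data.shape[1]))
        coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                      indexing="ij"), axis=-1)
        n_steps = epochs * len(data)
        step = 0
        for _ in range(epochs):
            for x in rng.permutation(data):
                lr = lr0 * (1 - step / n_steps)          # decaying learning rate
                sigma = sigma0 * (1 - step / n_steps) + 1e-3
                # Best-matching unit for this board state.
                bmu = np.unravel_index(
                    np.argmin(((weights - x) ** 2).sum(-1)), (rows, cols))
                dist2 = ((coords - np.array(bmu)) ** 2).sum(-1)
                h = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
                weights += lr * h * (x - weights)        # pull neighborhood toward x
                step += 1
        return weights

    def cluster_of(board_vec, weights):
        # Grid cell whose prototype best matches the given board state.
        return np.unravel_index(
            np.argmin(((weights - board_vec) ** 2).sum(-1)), weights.shape[:2])

    # Toy usage: 200 random 32-value board encodings.
    boards = np.random.rand(200, 32)
    W = train_som(boards)
    print(cluster_of(boards[0], W))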
18

Accelerated algorithms for temporal difference learning methods

Rankawat, Anushree 12 1900 (has links)
The central idea of this thesis is to understand the notion of acceleration in stochastic approximation algorithms. Specifically, we attempt to answer the question: how does acceleration naturally show up in stochastic approximation algorithms? We adopt a dynamical systems approach and propose new accelerated methods for temporal difference (TD) learning with linear function approximation: Polyak TD(0) and Nesterov TD(0). In contrast to earlier works, our methods do not rely on viewing TD methods as gradient descent methods. We study the interplay between acceleration, stability, and convergence of the proposed accelerated methods in continuous time. To establish the convergence of the underlying dynamical system, we analyze continuous-time models of the proposed accelerated stochastic approximation methods by deriving the conservation law in a dilated coordinate system. We show that the underlying dynamical system of our proposed algorithms converges at an accelerated rate. This framework also provides recommendations for the choice of the damping parameters to obtain this convergent behavior. Finally, we discretize these convergent ODEs using two different discretization schemes, explicit Euler and symplectic Euler, and analyze their performance on small linear prediction tasks.
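As a loose, discrete-time illustration of adding momentum to TD learning, the sketch below applies a heavy-ball (Polyak-style) term to TD(0) with linear function approximation. It is an assumption-laden toy, not the thesis's Polyak TD(0) or Nesterov TD(0); the step size, damping parameter, and synthetic data are purely illustrative.

    import numpy as np

    def heavy_ball_td0(transitions, dim, alpha=0.05, beta=0.9, gamma=0.95):
        # TD(0) with linear function approximation plus a heavy-ball
        # momentum term on the parameter updates.
        # transitions: iterable of (phi_s, reward, phi_s_next) tuples.
        w = np.zeros(dim)
        velocity = np.zeros(dim)
        for phi, r, phi_next in transitions:
            delta = r + gamma * (w @ phi_next) - (w @ phi)    # TD error
            velocity = beta * velocity + alpha * delta * phi  # accumulate momentum
            w = w + velocity                                  # momentum step
        return w

    # Toy usage: synthetic transitions with 3 features.
    rng = np.random.default_rng(0)
    data = [(rng.random(3), rng.random(), rng.random(3)) for _ in range(500)]
    w = heavy_ball_td0(data, dim=3)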
19

Uma nova abordagem de aprendizagem de máquina combinando elicitação automática de casos, aprendizagem por reforço e mineração de padrões sequenciais para agentes jogadores de damas / A new machine learning approach combining automatic case elicitation, reinforcement learning and sequential pattern mining for Checkers-playing agents

Castro Neto, Henrique de 21 November 2016 (has links)
Fundação de Amparo a Pesquisa do Estado de Minas Gerais / For agents that operate in environments where decision-making needs to take into account, in addition to the environment, the minimizing action of an opponent (such as in games), it is fundamental that the agent has the ability to progressively trace a profile of its adversary that aids it in the process of selecting appropriate actions. However, it would be unsuitable to construct an agent with a decision-making system based only on the elaboration of this profile, as this would prevent the agent from having its own identity, which would leave it at the mercy of its opponent. Following this direction, this work proposes an automatic hybrid Checkers player, called ACE-RL-Checkers, equipped with a dynamic decision-making mechanism that adapts to the profile of its opponent over the course of the game. In such a system, the action selection process (moves) is conducted through a composition of a multi-layer perceptron neural network and a case library. The neural network represents the identity of the agent, i.e., it is an already trained static decision-making module that makes use of the TD(λ) reinforcement learning technique. The case library, on the other hand, represents the dynamic decision-making module of the agent, which is generated by the Automatic Case Elicitation technique (a particular type of Case-Based Reasoning). This technique has a pseudo-random exploratory behavior, which causes the dynamic decision-making of the agent to be directed either by the game profile of the opponent or randomly. However, when devising such an architecture, it is necessary to avoid the following problem: due to the inherent characteristics of the Automatic Case Elicitation technique, in the initial phases of the game, in which the quantity of cases available in the library is extremely low owing to the scant knowledge of the adversary's profile, the frequency of random decisions would be extremely high, which would be detrimental to the performance of the agent. To attack this problem, this work also proposes to incorporate into the ACE-RL-Checkers architecture a third module composed of a base of experience rules extracted from games played by human experts using a Sequential Pattern Mining technique. The objective of using such a base is to refine and accelerate the adaptation of the agent to the profile of its opponent in the initial phases of their confrontations. Experimental results from tournaments involving ACE-RL-Checkers and other agents related to this work confirm the superiority of the dynamic architecture proposed herein. / Doctoral thesis
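The three-module selection described above can be pictured as a simple priority scheme. The sketch below is only one plausible ordering under stated assumptions; the data structures, lookup functions, and move notation are hypothetical placeholders, not the thesis's actual interfaces or exploration behavior.

    import random

    def select_move(board, legal_moves, case_library, expert_rules, evaluate):
        # Hybrid choice sketched from the description above (hypothetical API):
        # 1) a case matching the opponent's profile, 2) an expert rule mined
        # from human games, 3) otherwise the TD-trained evaluation function.
        move = case_library.get(board)
        if move in legal_moves:
            return move
        move = expert_rules.get(board)
        if move in legal_moves:
            return move
        return max(legal_moves, key=lambda m: evaluate(board, m))

    # Toy usage with stand-in data structures (dicts keyed by board state).
    case_library = {"start": "c3-d4"}
    expert_rules = {"start": "e3-f4"}
    evaluate = lambda board, move: random.random()   # placeholder for the MLP
    print(select_move("start", ["c3-d4", "e3-f4", "g3-h4"],
                      case_library, expert_rules, evaluate))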
