181

A Biologically Inspired Four Legged Walking Robot

Peng, Shiqi January 2006 (has links)
This Ph.D. thesis presents the design and implementation of a biologically inspired four-phase walking strategy, built from behaviours, for a four-legged walking robot. In particular, the walking strategy addresses both static and dynamic balance, with balance behaviours triggered non-deterministically by the robot's real-time interaction with the environment. Four parallel Subsumption Architectures (SA) and a simple Central Pattern Producer (CPP) are employed in the physical implementation of the walking strategy. An implementation framework for such a parallel Subsumption Architecture is also proposed to facilitate the reusability of the system. A Reinforcement Learning (RL) method was integrated into the CPP to allow the robot to learn the optimal walking cycle interval (OWCI) appropriate for walking on various terrains. Experimental results demonstrate that, using the proposed walking strategy, the robot can successfully carry out its walking behaviours under various experimental terrain conditions, such as flat ground, incline, decline and uneven ground. Interactions among all of the robot's behaviours enable it to exhibit a combination of both preset and emergent walking behaviours.
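As a hedged illustration of the learning component described above, the sketch below shows a minimal tabular scheme for learning a per-terrain walking cycle interval. The candidate intervals, terrain labels, reward shaping and the simulate_walk() stub are assumptions invented for illustration; the thesis's actual CPP/RL integration is not reproduced here.

```python
import random
from collections import defaultdict

# Minimal sketch (assumptions throughout): learn a per-terrain walking cycle
# interval by treating each cycle as a one-step bandit problem. The candidate
# intervals, terrain labels, reward model and simulate_walk() stub are invented
# for illustration and do not come from the thesis.
INTERVALS = [0.4, 0.6, 0.8, 1.0, 1.2]                    # candidate cycle intervals (s)
TERRAINS = ["flat", "incline", "decline", "uneven"]
ALPHA, EPSILON = 0.1, 0.2

value = defaultdict(float)                               # value[(terrain, interval)]

def simulate_walk(terrain, interval):
    """Stand-in for one walking cycle; returns a stability/progress reward."""
    best = {"flat": 0.6, "incline": 1.0, "decline": 0.8, "uneven": 1.2}[terrain]
    return 1.0 - abs(interval - best) + random.gauss(0, 0.05)

def choose_interval(terrain):
    if random.random() < EPSILON:                        # explore occasionally
        return random.choice(INTERVALS)
    return max(INTERVALS, key=lambda a: value[(terrain, a)])

for _ in range(5000):
    terrain = random.choice(TERRAINS)
    a = choose_interval(terrain)
    r = simulate_walk(terrain, a)
    # Incremental update of the estimated return for this (terrain, interval) pair.
    value[(terrain, a)] += ALPHA * (r - value[(terrain, a)])

for t in TERRAINS:
    print(t, max(INTERVALS, key=lambda a: value[(t, a)]))
```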
182

Reinforcement-learning based output-feedback controller for nonlinear discrete-time system with application to spark ignition engines operating lean and EGR

Shih, Peter, January 2007 (has links) (PDF)
Thesis (M.S.)--University of Missouri--Rolla, 2007. / Vita. The entire thesis text is included in the file. Title from title screen of thesis/dissertation PDF file (viewed May 16, 2007). Includes bibliographical references.
183

Mobilized ad-hoc networks: A reinforcement learning approach

Chang, Yu-Han, Ho, Tracey, Kaelbling, Leslie Pack 04 December 2003 (has links)
Research in mobile ad-hoc networks has focused on situations in which nodes have no control over their movements. We investigate an important but overlooked domain in which nodes do have control over their movements. Reinforcement learning methods can be used to control both packet routing decisions and node mobility, dramatically improving the connectivity of the network. We first motivate the problem by presenting theoretical bounds for the connectivity improvement of partially mobile networks and then present superior empirical results under a variety of different scenarios in which the mobile nodes in our ad-hoc network are embedded with adaptive routing policies and learned movement policies.
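For readers unfamiliar with RL-based packet routing, the sketch below shows a Q-routing-style update on a toy static topology: each node keeps per-destination cost estimates through each neighbour and updates them from the chosen neighbour's own best estimate. This is a generic illustration under assumed parameters, not the authors' algorithm, and it omits the learned movement policies entirely.

```python
import random
from collections import defaultdict

# Illustrative Q-routing-style sketch (in the spirit of Boyan & Littman), not the
# authors' exact algorithm: Q[(node, dest, nbr)] estimates hops to dest via nbr.
ALPHA = 0.5
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}   # toy static topology
Q = defaultdict(lambda: 1.0)

def route_packet(src, dest, max_hops=20):
    node, hops = src, 0
    while node != dest and hops < max_hops:
        nbr = min(neighbors[node], key=lambda n: Q[(node, dest, n)])
        # Neighbour's best estimate of the remaining hops to the destination.
        remaining = 0.0 if nbr == dest else min(Q[(nbr, dest, m)] for m in neighbors[nbr])
        # Q-routing update: one hop of cost plus the neighbour's estimate.
        Q[(node, dest, nbr)] += ALPHA * (1.0 + remaining - Q[(node, dest, nbr)])
        node, hops = nbr, hops + 1
    return hops

for _ in range(2000):                      # train on random source/destination pairs
    s, d = random.sample(list(neighbors), 2)
    route_packet(s, d)
print(route_packet(0, 3))                  # should settle on a two-hop route
```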
184

Computational Modeling of the Basal Ganglia : Functional Pathways and Reinforcement Learning

Berthet, Pierre January 2015 (has links)
We perceive the environment via sensor arrays and interact with it through motor outputs. The work of this thesis concerns how the brain selects actions given information about the perceived state of the world and how it learns and adapts these selections to changes in the environment. Reinforcement learning theories suggest that an action will be more or less likely to be selected if its outcome has been better or worse than expected. A group of subcortical structures, the basal ganglia (BG), is critically involved in both selection and reward prediction. We developed and investigated a computational model of the BG. We implemented a Bayesian-Hebbian learning rule, which computes the weights between two units based on the probability of their activations. We were able to test how various configurations of the represented pathways impacted performance in several reinforcement learning and conditioning tasks. Then, following the development of a more biologically plausible version with spiking neurons, we simulated lesions in the different pathways and assessed how they affected learning and selection. We observed that the evolution of the weights and the performance of the models qualitatively resembled experimental data. The absence of a unique best way to configure the model across all the learning paradigms tested indicates that an agent could dynamically configure its action selection mode, mainly by including or excluding the reward prediction values in the selection process. We present hypotheses on possible biological substrates for the reward prediction pathway, based on the functional requirements for successful learning and on an analysis of the experimental data. We further simulate a loss of dopaminergic neurons similar to that reported in Parkinson's disease. We suggest that the associated motor symptoms are mostly caused by an impairment of the pathway promoting actions, while the pathway suppressing them seems to remain functional. / At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 3: Manuscript.
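The Bayesian-Hebbian rule mentioned above is, in spirit, a BCPNN-style rule in which weights are functions of estimated activation probabilities. A rough sketch under assumed time constants is shown below; the traces, epsilon floor and toy activity statistics are illustrative assumptions, not the model's actual equations.

```python
import numpy as np

# Hedged sketch of a Bayesian-Hebbian (BCPNN-style) weight update: weights are
# derived from running estimates of unit co-activation probabilities. The trace
# time constant TAU and the EPS floor are illustrative assumptions.
TAU, EPS = 0.01, 1e-4

def update_traces(p_i, p_j, p_ij, x, y):
    """Low-pass filter activation probabilities from graded/binary activity x, y."""
    p_i  += TAU * (x - p_i)
    p_j  += TAU * (y - p_j)
    p_ij += TAU * (np.outer(x, y) - p_ij)
    return p_i, p_j, p_ij

def bcpnn_weights(p_i, p_j, p_ij):
    """w_ij = log P(i,j) / (P(i) P(j)); bias_j = log P(j)."""
    w = np.log(np.maximum(p_ij, EPS) / np.outer(np.maximum(p_i, EPS), np.maximum(p_j, EPS)))
    bias = np.log(np.maximum(p_j, EPS))
    return w, bias

# Toy usage: two presynaptic and two postsynaptic units with correlated activity.
rng = np.random.default_rng(0)
p_i, p_j, p_ij = np.full(2, 0.5), np.full(2, 0.5), np.full((2, 2), 0.25)
for _ in range(5000):
    x = (rng.random(2) < np.array([0.8, 0.2])).astype(float)   # unit 0 active often
    y = x.copy()                                                # perfectly correlated target
    p_i, p_j, p_ij = update_traces(p_i, p_j, p_ij, x, y)
w, bias = bcpnn_weights(p_i, p_j, p_ij)
print(np.round(w, 2))    # positive on the diagonal (co-active pairs), near zero elsewhere
```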
185

Aplicação da rede GTSOM para navegação de robôs móveis utilizando aprendizado por reforço / Using the GTSOM network for mobile robot navigation with reinforcement learning

Menegaz, Mauricio January 2009 (has links)
This work describes an architecture for an autonomous robotic agent that is capable of creating a state representation of its environment and learning how to execute simple tasks using this representation. The GTSOM neural network (BASTOS, 2007) was chosen as the method for state clustering. It is used to transform the multidimensional, continuous state signal read from the sensors into a discrete representation, allowing the use of conventional reinforcement learning techniques. Some modifications to the network's algorithm were necessary so that it could be used in this context. This network is used together with a grid-map algorithm that allows the model to associate sensor readings with the places where they occurred. While the GTSOM network is the main component of the state-clustering system, the Q-Learning reinforcement learning method was chosen for task execution. Using the compact state representation created by the self-organizing network, the agent learns which actions to execute at each state in order to achieve its objectives. The model was tested in an experiment that consists of finding an object in a maze. The results show that it can segment the state space in a useful way and is capable of learning the task. The agent learns to avoid collisions and remembers the location of the target, reaching it even when the robot's initial position is changed. Furthermore, the representation is expanded whenever the agent faces an unknown situation, while states associated with experiences that do not recur are gradually forgotten.
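A minimal sketch of the discretise-then-learn idea described above is given below, assuming a fixed codebook of prototypes in place of the real GTSOM (which grows and prunes nodes) and a stubbed environment; every numeric choice here is an illustrative assumption.

```python
import numpy as np

# Minimal sketch: a continuous sensor vector is mapped to its nearest prototype
# (standing in for a GTSOM node), and tabular Q-learning runs on the resulting
# discrete state index. GTSOM node creation/pruning is omitted; all values are
# illustrative assumptions.
rng = np.random.default_rng(1)
prototypes = rng.random((16, 8))           # 16 fixed "nodes" over an 8-D sensor space
N_ACTIONS = 4                              # e.g. forward, back, turn left, turn right
Q = np.zeros((len(prototypes), N_ACTIONS))
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def discretise(sensors):
    """Return the index of the nearest prototype (the 'winning' node)."""
    return int(np.argmin(np.linalg.norm(prototypes - sensors, axis=1)))

def step(action):
    """Environment stub: returns the next sensor reading and a placeholder reward."""
    return rng.random(8), float(action == 0)

s = discretise(rng.random(8))
for _ in range(10000):
    a = int(rng.integers(N_ACTIONS)) if rng.random() < EPSILON else int(np.argmax(Q[s]))
    next_sensors, r = step(a)
    s2 = discretise(next_sensors)
    Q[s, a] += ALPHA * (r + GAMMA * Q[s2].max() - Q[s, a])   # standard Q-learning update
    s = s2
```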
186

Effects of Response Frequency Constraints on Learning in a Non-Stationary Multi-Armed Bandit Task

Racey, Deborah Elaine 01 December 2009 (has links)
An n-armed bandit task was used to investigate the trade-off between exploratory choice (choosing lesser-known options) and exploitative choice (choosing options with the greatest probability of reinforcement) in a trial-and-error human learning problem. In Experiment 1 a different probability of reinforcement was assigned to each of 8 response options using random ratios (RRs), and participants chose by clicking buttons in a circular display on a computer screen using a computer mouse. Relative-frequency thresholds (ranging from .10 to 1.0) were randomly assigned to each participant and acted as task constraints limiting the proportion of total responses that could be attributed to any response option. Preference for the richer keys was shown, and participants with greater constraints explored more and earned less reinforcement. Those with the highest constraints showed no preference, distributing their responses among the options with equal probability. In Experiment 2 the payoff probabilities changed partway through the task: for some participants the leanest options became the richest, and for others the richest became the leanest. When the RRs changed, participants in the decrease condition with moderate and low constraints showed immediate increases in exploration and shifted their preference to the new richest keys, while participants in the increase condition showed no increase in exploration and changed their preference more gradually. In Experiment 3 the constraint was held constant at .85, and the two richest options were decreased midway through the task by varying amounts (0 to .60). The decreases were detected early by participants in all but the smallest-decrease conditions, and exploration increased.
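The sketch below simulates only the task structure described above (an 8-option bandit with unequal payoff probabilities and a relative-frequency cap on any single option), using an illustrative value-tracking chooser; the payoff probabilities, cap and learner are assumptions, not the experimental procedure or the participants' data.

```python
import random

# Illustrative simulation of the task structure (not the human experiment): an
# 8-armed bandit with unequal payoff probabilities and a relative-frequency cap
# limiting the share of total responses any single option may receive.
PROBS = [0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.6, 0.8]   # assumed payoff probabilities
CAP = 0.25                                            # assumed max share per option
counts = [0] * 8
values = [0.0] * 8
total_reward = 0

for trial in range(1, 2001):
    # Options whose response share would stay under the cap after this choice.
    allowed = [i for i in range(8) if (counts[i] + 1) / trial <= CAP]
    if not allowed:
        allowed = list(range(8))
    if random.random() < 0.1:                         # occasional exploration
        choice = random.choice(allowed)
    else:
        choice = max(allowed, key=lambda i: values[i])
    reward = 1 if random.random() < PROBS[choice] else 0
    counts[choice] += 1
    total_reward += reward
    # Incremental sample-average value estimate for the chosen option.
    values[choice] += (reward - values[choice]) / counts[choice]

print(total_reward, [c / 2000 for c in counts])       # earnings and response shares
```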
187

Touchscreen assessment of motivation and reinforcement-related choice behaviour in mice

Phillips, Benjamin January 2018 (has links)
This dissertation describes the development and optimisation of a set of touchscreen tasks for the assessment of motivation and reinforcement processes in mice. Changes in motivational state and reinforcement sensitivity are characteristic of depressive symptoms in a number of neuropsychiatric and neurodegenerative conditions. However, the efficacy of current therapeutic strategies is limited. Thus, the development of methods for studying these processes is important both to facilitate investigation of underlying mechanisms and subsequently to develop targeted therapeutics. Part 1 describes the optimisation and application of fixed- and progressive-ratio tasks for the assessment of motivational state in mice. First, a comparison of multiple reinforcers under fixed- and progressive-ratio schedules is described. Given the limitations of measures derived from performance across the entire session, these experiments are complemented by response-rate analysis, which reveals dissociations between standard measures and rate of responding. Next, these tasks are applied to a model of maternal high-fat diet (HFD), revealing selective alterations in motivated behaviour. Finally, the motivational state of the TgR406W mouse model of frontotemporal dementia, a condition frequently characterised by depressive symptoms, is investigated. It is shown that the motivational state of TgR406W mice is age dependent, with apparent early apathy-like behaviour and a late disinhibition in responding. This is accompanied by deficits in attentional performance as measured by a 3-choice serial reaction time task. Thus, Part 1 demonstrates the applicability of the battery of touchscreen tasks for the assessment of motivation in diverse mouse models. Part 2 describes the development and validation of touchscreen tasks for the assessment of reinforcement-related choice in mice. First, it is demonstrated that C57BL/6 mice systematically discount a preferred reinforcer in a delay-discounting procedure. Subsequently, choice behaviour is modified by systemic treatment with dopaminergic drugs and a reinforcer pre-feeding procedure. Next, procedures for the assessment of learning and choice dependent on the integration of positive and negative feedback are described. Specifically, the valence-probe visual-discrimination (VPVD) task is an adapted version of the standard visual-discrimination task with the addition of a probabilistic stimulus that is reinforced 50% of the time. This modification allows assessment of the impact of positive and negative feedback on discrimination learning. In addition, a spatial within-session probabilistic reversal-learning procedure was adapted for the touchscreen apparatus. The effect of serotonin 2C receptor manipulation on these tasks is presented. Serotonin 2C receptor antagonism is shown to disrupt discrimination learning and result in diminished sensitivity to positive feedback. Conversely, 2C receptor agonism results in changes in feedback sensitivity consistent with heightened sensitivity to positive feedback and diminished sensitivity to negative feedback. Taken together, these results show that these tasks can be used to interrogate reinforcement-related behavioural profiles in experimental mice. Finally, the implications of these tasks for the investigation of motivation, reinforcement sensitivity and adaptive decision-making in rodent models, and their relevance to major depressive disorder (MDD) in humans, are discussed.
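As a hedged aside on the delay-discounting procedure mentioned above, indifference-point data of this kind are commonly summarised with the hyperbolic model V = A / (1 + kD). The dissertation does not state which analysis was used; the delays and indifference points below are placeholders for illustration only, and k is fitted by a simple grid search.

```python
import numpy as np

# Hedged illustration of hyperbolic delay discounting, V = A / (1 + k*D),
# fitted by grid search. All numbers below are illustrative placeholders,
# not data from the dissertation.
delays = np.array([0.0, 2.0, 4.0, 8.0, 16.0])            # delays in seconds (assumed)
indiff = np.array([1.0, 0.70, 0.55, 0.35, 0.20])         # illustrative indifference points

def hyperbolic(delay, k):
    return 1.0 / (1.0 + k * delay)                       # subjective value of a unit reward

ks = np.linspace(0.01, 2.0, 2000)
sse = [np.sum((indiff - hyperbolic(delays, k)) ** 2) for k in ks]
k_hat = ks[int(np.argmin(sse))]
print(f"estimated discount rate k = {k_hat:.3f}")        # steeper discounting -> larger k
```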
189

Aprendizado por reforço em ambientes não-estacionários / Reinforcement learning in non-stationary environments

Silva, Bruno Castro da January 2007 (has links)
In this work we introduce RL-CD (Reinforcement Learning with Context Detection), a novel method for solving reinforcement learning (RL) problems in non-stationary environments. In non-stationary scenarios, standard RL methods need to continually readapt themselves to the changing dynamics of the environment. This causes a performance drop during the readjustment phase and implies the need to relearn policies even for dynamics that have already been experienced. RL-CD overcomes these problems by implementing a mechanism for creating, updating and selecting one among several partial models of the environment. The partial models are incrementally built according to the system's capability of making predictions regarding a given sequence of observations. This capability is measured by a global quality computed for each model, which reflects the total adjustment needed to make the model consistent with the actual experience. First, we present the motivations and the theoretical basis needed to develop the conceptual framework of RL-CD. Afterwards, we propose, formalize and show the efficiency of RL-CD both in a simple non-stationary environment and in noisy scenarios. We show that RL-CD performs better than two standard reinforcement learning algorithms and that it has advantages over methods specifically designed to cope with non-stationarity. The results suggest that RL-CD is an efficient approach for a subclass of non-stationary environments, specifically those whose dynamics are correctly represented by a finite set of stationary Markov models. Finally, we present a theoretical examination of one of RL-CD's most important parameters, made possible by means of the analysis of probability distributions obtained via Monte Carlo methods. This analysis makes it possible to calculate optimal values for this parameter, so that its adjustment can be performed independently of the scenario being studied.
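A rough sketch of the context-detection mechanism summarised above is shown below: several partial transition models are scored by how well they predict recent transitions, the best-scoring one stays active, and a new model is spawned when none fits well. The quality decay, threshold and Dirichlet-style counts are assumptions for illustration, not the RL-CD equations.

```python
import numpy as np

# Rough sketch of context detection via competing partial models (not the exact
# RL-CD equations): each model keeps transition counts and a running quality
# score based on the log-likelihood of recent transitions under that model.
N_STATES, N_ACTIONS = 5, 2
RHO, QUALITY_MIN = 0.9, -1.5               # quality decay and creation threshold (assumed)

class PartialModel:
    def __init__(self):
        # Dirichlet-style transition counts and a running quality score.
        self.counts = np.ones((N_STATES, N_ACTIONS, N_STATES))
        self.quality = 0.0

    def predict(self, s, a):
        return self.counts[s, a] / self.counts[s, a].sum()

    def update(self, s, a, s2):
        self.counts[s, a, s2] += 1

    def score(self, s, a, s2):
        # Decayed average log-likelihood of observed transitions under this model.
        self.quality = RHO * self.quality + (1 - RHO) * np.log(self.predict(s, a)[s2])

models = [PartialModel()]
active = 0

def observe(s, a, s2):
    """Score all models on the new transition, switch or spawn, update the winner."""
    global active
    for m in models:
        m.score(s, a, s2)
    active = int(np.argmax([m.quality for m in models]))
    if models[active].quality < QUALITY_MIN:       # nothing explains the data well
        models.append(PartialModel())
        active = len(models) - 1
    models[active].update(s, a, s2)
```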
190

Efficient Methods for Prediction and Control in Partially Observable Environments

Hefny, Ahmed 01 April 2018 (has links)
State estimation and tracking (also known as filtering) is an integral part of any system performing inference in a partially observable environment, whether it is a robot gauging an environment through noisy sensors or a natural language processing system trying to model a sequence of characters without full knowledge of the syntactic or semantic state of the text. In this work, we develop a framework for constructing state estimators. The framework consists of a model class, referred to as predictive state models, and a learning algorithm, referred to as two-stage regression. Our framework is based on two key concepts: (1) the predictive state, in which our belief about the latent state of the environment is represented as a prediction of future observation features; and (2) instrumental regression, in which features of previous observations are used to remove sampling noise from future observation statistics, allowing for unbiased estimation of system dynamics. These two concepts allow us to develop efficient and tractable learning methods that reduce the unsupervised problem of learning an environment model to a supervised regression problem: first, a regressor is used to remove noise from future observation statistics; then another regressor uses the denoised observation features to estimate the dynamics of the environment. We show that our proposed framework enjoys a number of theoretical and practical advantages over existing methods, and we demonstrate its efficacy in a prediction setting, where the task is to predict future observations, as well as in a control setting, where the task is to optimize a control policy via reinforcement learning.
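A compact sketch of the two-stage regression idea using plain ridge regression is shown below: history features act as instruments to denoise future-observation features (stage 1), and a second regression maps denoised futures to denoised extended futures, yielding a linear dynamics model (stage 2). The toy system, window sizes and feature choices are assumptions for illustration, not the thesis's estimator.

```python
import numpy as np

# Hedged sketch of two-stage regression with linear ridge regressors. The toy
# system, window length k and regularisation are illustrative assumptions.
rng = np.random.default_rng(0)

def ridge(X, Y, lam=1e-2):
    """Return W minimising ||Y - W X||^2 + lam ||W||^2 (columns are examples)."""
    return Y @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(X.shape[0]))

# Toy partially observable system: a latent 2-D rotation observed as one noisy coordinate.
T, theta = 2000, 0.1
A = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
x = np.zeros((2, T))
x[:, 0] = [1.0, 0.0]
for t in range(1, T):
    x[:, t] = A @ x[:, t - 1]
obs = x[0:1, :] + 0.05 * rng.standard_normal((1, T))

k = 4                                                        # history / future window length
idx = np.arange(k, T - k - 1)
hist   = np.stack([obs[0, i - k:i]         for i in idx]).T  # past windows    (k x N)
future = np.stack([obs[0, i:i + k]         for i in idx]).T  # future windows  (k x N)
ext    = np.stack([obs[0, i + 1:i + k + 1] for i in idx]).T  # shifted futures (k x N)

# Stage 1: instrumental (denoising) regressions from history features.
W_f, W_e = ridge(hist, future), ridge(hist, ext)
q, p = W_f @ hist, W_e @ hist              # denoised future / extended-future statistics

# Stage 2: regress denoised extended futures on denoised futures -> linear dynamics.
W_sys = ridge(q, p)

# One-step prediction check: the first component of W_sys @ q_t approximates obs[t+1].
t0 = 100
print((W_sys @ q[:, t0])[0], obs[0, idx[t0] + 1])
```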
