11

Exploring how spatial learning can affect the firing of place cells and head direction cells : the influence of changes in landmark configuration and the development of goal-directed spatial behaviour

Huang, Yen-Chen Steven January 2010 (has links)
Rats learn to navigate to a specific location faster in a familiar environment (Keith and McVety 1988). It has been proposed that place learning does not require specific reward signals but rather occurs automatically. One of the strongest pieces of evidence for the automatic nature of place learning comes from the observation that place and head direction cells reference their receptive fields to prominent landmarks in an environment without needing a reward signal (O'Keefe and Conway 1978; Taube et al. 1990b). It has also been proposed that an allocentric representation of an environment would be bound to the landmarks with the greatest relative stability to guide its orientation (O'Keefe and Nadel 1978). The first two parts of this thesis explore whether place and head direction cells automatically use the most coherent landmarks for orientation. Head direction cells have been shown to orient their preferred firing directions coherently when exposed to conflicting landmarks in an environment (Yoganarasimha et al. 2006). A model of head direction cells was therefore used to explore the mechanisms required to implement an allocentric system that selects landmarks based on their relative stability. We found that simply adding Hebbian projections, together with units representing the orientation of landmarks, to the head direction cell system is sufficient for the system to exhibit such a capacity. We then recorded entorhinal head direction cells and CA1 place cells simultaneously while subjecting the rats to repeated landmark conflicts. During the conflicts, a subset of landmarks always maintained a fixed relative relationship with each other. We found that the visual landmarks retained their ability to control the place and head direction cells even after repeated experience of conflict, and that the simultaneously recorded place cells exhibited coherent representations between conflicts. However, the 'stable landmarks' did not show significantly greater control over the place and head direction cells compared to the unstable landmarks. This argues against the hypothesis that the relative stability between landmarks is encoded automatically. We did observe a trend that, with more conflict experience, the 'stable landmarks' appeared to exert greater control over the cells. The last part of the thesis explores whether the goal-sensitive cells discovered in hippocampal CA1 (Ainge et al. 2007a) develop through familiarity with the environment or through the demand on rats to perform a win-stay behaviour. We used the same win-stay task as Ainge et al. and found few or no goal-sensitive cells on the first day of training. The subsequent development of goal-sensitive activity correlated significantly with the rats' performance during the learning phase of the task. This correlation supports the hypothesis that the development of goal-sensitive cells is associated with learning the win-stay task, though it does not rule out the possibility that these cells develop through accumulated experience on the maze. In summary, this thesis explores what kind of spatial information is encoded by place and head direction cells and finds that, in the absence of a reward signal, the relative stability between landmarks is not automatically encoded. On the other hand, when additional information is required to solve a task, CA1 place cells adapt their spatial code to provide the information needed to guide successful navigation.
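The Hebbian mechanism described in this abstract can be illustrated with a toy model. The sketch below is not the thesis code; it is a minimal Python illustration, with assumed parameter values and an assumed scalar coupling per landmark, of how Hebbian plasticity between landmark-orientation units and a head direction estimate could let a landmark that consistently agrees with the internal sense of direction accumulate stronger control than one that conflicts unpredictably.

```python
import numpy as np

# Minimal sketch (assumed, not the thesis model): Hebbian coupling between
# landmark-orientation units and a head direction (HD) estimate. A landmark
# whose predicted heading keeps agreeing with the current HD estimate
# accumulates a stronger projection, so it dominates when landmarks conflict.

n_landmarks = 2
w = np.full(n_landmarks, 0.5)     # landmark -> HD coupling strengths
eta = 0.05                        # Hebbian learning rate (assumed)

def circ_diff(a, b):
    """Smallest signed angular difference, in radians."""
    return np.angle(np.exp(1j * (a - b)))

def update(hd_estimate, landmark_headings):
    """One step: each weight tracks the co-activity between that landmark's
    predicted heading and the HD estimate, then the estimate is re-anchored
    to the weighted circular mean of the landmark predictions."""
    for i, h in enumerate(landmark_headings):
        coactivity = np.cos(circ_diff(h, hd_estimate))   # ~1 aligned, -1 opposed
        w[i] = max(w[i] + eta * (coactivity - w[i]), 0.0)
    z = np.sum(w * np.exp(1j * np.asarray(landmark_headings)))
    return float(np.angle(z))

# Landmark 0 stays consistent with the internal sense of direction;
# landmark 1 jumps to a random bearing on every step (an unstable landmark).
rng = np.random.default_rng(0)
hd = 0.0
for _ in range(200):
    stable = hd + rng.normal(0.0, 0.05)
    unstable = rng.uniform(-np.pi, np.pi)
    hd = update(hd, [stable, unstable])

print("coupling weights (stable, unstable):", np.round(w, 2))
```

After repeated conflicts the stable landmark's weight approaches 1 while the unstable landmark's weight decays toward 0, so the re-anchoring step is dominated by the stable landmark; this is the qualitative capacity the abstract attributes to the Hebbian extension of the head direction model.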
12

Computational model-based functional magnetic resonance imaging of reinforcement learning in humans

Erdeniz, Burak January 2013 (has links)
The aim of this thesis is to determine the changes in the BOLD signal of the human brain during various stages of reinforcement learning. To accomplish that goal, two probabilistic reinforcement-learning tasks were developed and assessed with healthy participants using functional magnetic resonance imaging (fMRI). For both experiments, the participants' imaging data were analysed using a combination of univariate and model-based techniques. In Experiment 1 there were three types of stimulus-response pairs, each predicting a reward, a neutral outcome, or a monetary loss with a certain probability. Experiment 1 addressed the following research questions: Where in the brain does activity occur when expecting and receiving a monetary reward or a punishment? Does avoiding a loss outcome activate similar brain regions to gain outcomes and, conversely, does missing a reward outcome activate similar brain regions to loss outcomes? Where in the brain are prediction errors and predictions for rewards and losses calculated? What are the neural correlates of reward and loss predictions during early and late phases of learning? The results of Experiment 1 showed that the expectation of rewards and losses activates overlapping brain areas, mainly in the anterior cingulate cortex and basal ganglia, but that reward and loss outcomes activate separate regions: loss outcomes mainly activate the insula and amygdala, whereas reward outcomes activate the bilateral medial frontal gyrus. The model-based analysis also revealed changes between early and late learning: predicted value in early trials is coded in the ventromedial orbitofrontal cortex, but later in learning predicted-value activation was found in the putamen. The second experiment was designed to identify differences in the processing of novel versus familiar reward-predictive stimuli. The results revealed that the dorsolateral prefrontal cortex and several regions of the parietal cortex showed greater activation for novel stimuli than for familiar stimuli. As an extension of the fourth research question of Experiment 1, the reward predicted values of the conditioned stimuli and the prediction errors of the unconditioned stimuli were also assessed in Experiment 2. The results revealed that during learning there is significant prediction-error activation, mainly in the ventral striatum and extending to various cortical regions, but for familiar stimuli no prediction-error activity was observed. Moreover, predicted values for novel stimuli mainly activate the ventromedial orbitofrontal cortex and precuneus, whereas the predicted value of familiar stimuli activates the putamen. The predicted-value results of Experiment 2, reviewed together with the early versus late predicted values of Experiment 1, suggest that during learning of CS-US pairs brain activation shifts from ventromedial orbitofrontal structures to sensorimotor parts of the striatum.
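The model-based analysis described in this abstract rests on trial-by-trial value and prediction-error estimates. The following sketch is not the thesis analysis pipeline; it is a minimal Python illustration, with an assumed learning rate, of how a simple Rescorla-Wagner learner can generate per-trial predicted values and prediction errors of the kind typically convolved with a haemodynamic response function and entered as parametric regressors in a GLM.

```python
import numpy as np

def rescorla_wagner(stimuli, outcomes, alpha=0.2):
    """Per-trial predicted values and prediction errors for each stimulus.

    stimuli  : sequence of stimulus identifiers, one per trial
    outcomes : sequence of received outcomes, e.g. +1 (gain), 0, -1 (loss)
    alpha    : learning rate (assumed value, not from the thesis)
    """
    V = {}                        # current value estimate per stimulus
    values, errors = [], []
    for s, r in zip(stimuli, outcomes):
        v = V.get(s, 0.0)
        pe = r - v                # prediction error at outcome
        V[s] = v + alpha * pe     # value update
        values.append(v)          # expectation at cue onset
        errors.append(pe)
    return np.array(values), np.array(errors)

# Example: a mostly rewarding stimulus (0) and a mostly punishing stimulus (1),
# interleaved over 100 trials with an 80% outcome probability.
rng = np.random.default_rng(1)
stimuli = rng.integers(0, 2, size=100)
outcomes = np.where(stimuli == 0,
                    rng.choice([1, 0], size=100, p=[0.8, 0.2]),
                    rng.choice([-1, 0], size=100, p=[0.8, 0.2]))
V, PE = rescorla_wagner(stimuli, outcomes)
for s in (0, 1):
    print(f"stimulus {s}: late mean predicted value {V[stimuli == s][-10:].mean():+.2f}")
```

In an analysis of this kind, the value series time-locked to cue onset and the error series time-locked to outcome onset would serve as parametric modulators, and splitting them into early and late trials supports the early-versus-late comparisons reported in the abstract.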
13

Reinforcement learning with time perception

Liu, Chong January 2012 (has links)
Classical value estimation reinforcement learning algorithms do not perform very well in dynamic environments. Animal reinforcement learning, on the other hand, is quite flexible: animals can adapt to dynamic environments very quickly and deal with noisy inputs very effectively. One feature that may contribute to animals' good performance in dynamic environments is that they learn and perceive the time to reward. In this research, we attempt to learn and perceive the time to reward and explore situations in which the learned time information can be used to improve the performance of the learning agent in dynamic environments. The type of dynamic environment we are interested in is the switching environment, which stays the same for a long time, changes abruptly, and then holds for a long time before the next change. The dynamics we mainly focus on are changes in the time to reward, though we also extend the ideas to learning and perceiving other criteria of optimality, e.g. the discounted return, so that the methods still work when the amount of reward also changes. Specifically, both the mean and the variance of the time to reward are learned and then used to detect changes in the environment and to decide whether the agent should give up a suboptimal action. When a change in the environment is detected, the learning agent responds specifically to the change in order to recover from it quickly. When the current action is found to be worse than the optimal one, the agent abandons the current exploration of that action and re-makes its decision, avoiding longer-than-necessary exploration. The results of our experiments on two real-world problems show that these mechanisms effectively speed up learning, reduce the time taken to recover from environmental changes, and improve the agent's performance after learning converges in most of the test cases, compared with classical value estimation reinforcement learning algorithms. In addition, we have successfully used spiking neurons to implement various phenomena of classical conditioning, the simplest form of animal reinforcement learning in dynamic environments, and have pointed out a possible implementation of instrumental conditioning and general reinforcement learning using similar models.
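As a reading aid, here is a minimal sketch, not the thesis implementation, of the core mechanism this abstract describes: keeping running estimates of the mean and variance of the time to reward, flagging an abrupt environmental change when an observed time falls far outside that distribution, and giving up an attempt whose elapsed time already makes it look worse than the best known alternative. The class name, the exponential-averaging rate, and the z-score thresholds are assumptions for illustration.

```python
import math

class TimeToRewardStats:
    """Running estimate of the mean and variance of the time to reward."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha        # exponential-averaging rate (assumed)
        self.mean = None
        self.var = 1.0

    def update(self, t):
        """Incorporate one observed time-to-reward t."""
        if self.mean is None:
            self.mean = float(t)
            return
        delta = t - self.mean
        self.mean += self.alpha * delta
        self.var += self.alpha * (delta ** 2 - self.var)

    def change_detected(self, t, z=3.0):
        """Treat t as evidence of an abrupt switch if it lies more than
        z standard deviations from the learned mean."""
        if self.mean is None:
            return False
        return abs(t - self.mean) > z * math.sqrt(self.var)

    def should_give_up(self, elapsed, best_alternative_mean, z=2.0):
        """Abandon the current attempt once the time already spent exceeds
        the best alternative's mean by more than z standard deviations."""
        if self.mean is None:
            return False
        return elapsed > best_alternative_mean + z * math.sqrt(self.var)

# Example: after rewards that arrive in roughly 10 steps, a 30-step wait
# is flagged as a likely change in the environment.
stats = TimeToRewardStats()
for t in (10, 11, 9, 10, 12):
    stats.update(t)
print(stats.change_detected(30))   # True
```

On detection, an agent of the kind the abstract describes might reset or relearn the affected estimates so as to recover quickly; the give-up test plays the role of cutting short the exploration of an action that is still worse than the optimal one.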
