Global ETD Search

151	Towards a mechanistic understanding of the neurobiological mechanisms underlying psychosis Haarsma, Joost January 2018 Psychotic symptoms are prevalent in a wide variety of psychiatric and neurological disorders. Yet, despite decades of research, the neurobiological mechanisms via which these symptoms come to manifest themselves remain to be elucidated. I argue in this thesis that using a mechanistic approach towards understanding psychosis that borrows heavily from the predictive coding framework can help us understand the relationship between neurobiology and symptomology. In the first results chapter I present new data on a biomarker that has often been cited in relation to psychotic disorders, which is glutamate levels in the anterior cingulate cortex (ACC), as measured with magnetic resonance spectroscopy. In this chapter I aimed to replicate previous results that show differences in glutamate levels in psychosis and health. However, no statistically significant group differences and correlations with symptomology were found. In order to elucidate the potential mechanism underlying glutamate changes in the anterior cingulate cortex in psychosis, I tested whether a pharmacological challenge of Bromocriptine or Sulpiride altered glutamate levels in the anterior cingulate cortex. However, no significant differences were found between medication groups. In the second results chapter I aimed to address a long-standing question in the field of computational psychiatry, which is whether prior expectations have a stronger or weaker influence on inference in psychosis. I go on to show that this depends on the origin of the prior expectation and disease stage. That is, cognitive priors are stronger in first episode psychosis but not in people at risk for psychosis, whereas perceptual priors seem to be weakened in individuals at risk for psychosis compared to healthy individuals and individuals with first episode psychosis. 
Furthermore, there is some evidence that these alterations are correlated with glutamate levels. In the third results chapter I aimed to elucidate the nature of reward prediction error aberrancies in chronic schizophrenia. There has been some evidence suggesting that schizophrenia is associated with aberrant coding of reward prediction errors during reinforcement learning. However, it is unclear whether these aberrancies are related to disease years and medication use. Here I provide evidence for a small but significant alteration in the coding of reward prediction errors that is correlated with medication use. In the fourth results chapter I aimed to study the influence of uncertainty on the coding of unsigned prediction errors during learning. It has been hypothesized by predictive coding theorists that dopamine plays a role in the precision-weighting of unsigned prediction error. This theory is of particular relevance to psychosis research, as this might provide a mechanism via which dopamine aberrancies might lead to psychotic symptoms. I found that blocking dopamine using Sulpiride abolishes precision-weighting of unsigned prediction error, providing evidence for a dopamine-mediated precision-weighting mechanism. In the fifth results chapter I aimed to extend this research into early psychosis, to elucidate whether psychosis is indeed associated with a failure to precision-weight prediction error. I found that first episode psychosis is indeed associated with a failure to precision-weight prediction errors, an effect that is explained by the experience of positive symptoms. In the sixth results chapter I explore whether the degree of precision-weighting of unsigned prediction errors is correlated with glutamate levels in the anterior cingulate cortex. Such a correlation might be plausible given that psychosis has been associated with both. However, I did not find such a relationship, even in a sample of 137 individuals. 
Thus I concluded that anterior cingulate glutamate levels might be more related to non-positive symptoms associated with psychotic disorders. In summary, a mechanistic approach towards understanding psychosis can give us valuable insights into the disease mechanisms at play. I have shown here that the influence of expectations on perception is different across disease stage in psychosis. Furthermore, aberrancies in prediction error mechanisms might explain positive symptoms in psychosis, a process likely mediated by dopaminergic mechanisms, whereas evidence for glutamatergic mediation remains absent.
152	Bounding Box Improvement with Reinforcement Learning Cleland, Andrew Lewis 12 June 2018 In this thesis, I explore a reinforcement learning technique for improving bounding box localizations of objects in images. The model takes as input a bounding box already known to overlap an object and aims to improve the fit of the box through a series of transformations that shift the location of the box by translation, or change its size or aspect ratio. Over the course of these actions, the model adapts to new information extracted from the image. This active localization approach contrasts with existing bounding-box regression methods, which extract information from the image only once. I implement, train, and test this reinforcement learning model using data taken from the Portland State Dog-Walking image set. The model balances exploration with exploitation in training using an ε-greedy policy. I find that the performance of the model is sensitive to the ε-greedy configuration used during training, performing best when the epsilon parameter is set to very low values over the course of training. With ε = 0.01, I find the algorithm can improve bounding boxes in about 78% of test cases for the "dog" object category, and 76% for the "human" category. Reinforcement learning Machine learning Computer vision Computer Sciences
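The ε-greedy rule mentioned in the abstract above can be illustrated with a minimal Python sketch. It is a generic textbook version, not the thesis's implementation: the function name `epsilon_greedy`, the list of action values, and the injectable `rng` argument are all assumptions made for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy.

    q_values is a list of estimated action values (hypothetical here; in a
    bounding-box setting each index could stand for one box transformation).
    """
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# With a low epsilon such as 0.01, roughly 99% of choices are greedy.
action = epsilon_greedy([0.2, 0.7, 0.1], epsilon=0.01)
```

A small ε keeps the agent close to its current best estimate while still occasionally trying other transformations.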
153	On the Selection of Just-in-time Interventions Jaimes, Luis Gabriel 20 March 2015 A deeper understanding of human physiology, combined with improvements in sensing technologies, is fulfilling the vision of affective computing, where applications monitor and react to changes in affect. Further, the proliferation of commodity mobile devices is extending these applications into the natural environment, where they become a pervasive part of our daily lives. This work examines one such pervasive affective computing application with significant implications for long-term health and quality of life: adaptive just-in-time interventions (AJITIs). We discuss fundamental components needed to design AJITIs based on one kind of affective data, namely stress. Chronic stress has significant long-term behavioral and physical health consequences, including an increased risk of cardiovascular disease, cancer, anxiety and depression. This dissertation presents the state of the art of just-in-time interventions for stress. It includes a new architecture that is used to describe the most important issues in the design, implementation, and evaluation of AJITIs. Then, the most important mechanisms available in the literature are described and classified. The dissertation also presents a simulation model to study and evaluate different strategies and algorithms for intervention selection. Then, a new hybrid mechanism based on value iteration and Monte Carlo simulation is proposed. This semi-online algorithm dynamically builds a transition probability matrix (TPM) which is used to obtain a new policy for intervention selection. We present this algorithm in two different versions. The first version uses a pre-determined number of stress episodes as a training set to create a TPM, and then to generate the policy that will be used to select interventions in the future. 
In the second version, we use each new stress episode to update the TPM, and a pre-determined number of episodes to update our selection policy for interventions. We also present a completely online learning algorithm for intervention selection based on Q-learning with eligibility traces. We show that this algorithm could be used by an affective computing system to select and deliver interventions in mobile environments. Finally, we conduct post hoc experiments and simulations to demonstrate feasibility of both real-time stress forecasting and stress intervention adaptation and optimization. Affective Computing mHealth Reinforcement Learning Ubiquitous Computing Electrical and Computer Engineering
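The idea of building a transition probability matrix from observed episodes and then deriving a policy by value iteration can be sketched as follows. This is only a generic illustration under stated assumptions, not the dissertation's hybrid algorithm: the state and action encodings, the reward table, and the function names `estimate_tpm` and `value_iteration` are hypothetical, and the Monte Carlo simulation component is not modelled.

```python
def estimate_tpm(transitions, n_states, n_actions):
    """Estimate P(s' | s, a) from observed (s, a, s') triples by counting."""
    counts = [[[0] * n_states for _ in range(n_states)] for _ in range(n_actions)]
    for s, a, s_next in transitions:
        counts[a][s][s_next] += 1
    tpm = []
    for a in range(n_actions):
        rows = []
        for s in range(n_states):
            total = sum(counts[a][s])
            if total == 0:
                # Assumption: unseen (state, action) pairs get a uniform row.
                rows.append([1.0 / n_states] * n_states)
            else:
                rows.append([c / total for c in counts[a][s]])
        tpm.append(rows)
    return tpm


def value_iteration(tpm, rewards, gamma=0.9, tol=1e-8):
    """Return (state values, greedy policy) for rewards[s][a] and tpm[a][s][s']."""
    n_actions, n_states = len(tpm), len(tpm[0])
    values = [0.0] * n_states
    while True:
        # One Bellman backup for every (state, action) pair.
        q = [[rewards[s][a] + gamma * sum(p * v for p, v in zip(tpm[a][s], values))
              for a in range(n_actions)] for s in range(n_states)]
        new_values = [max(row) for row in q]
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:
            policy = [max(range(n_actions), key=lambda a: row[a]) for row in q]
            return values, policy


# Hypothetical toy problem: state 0 = stressed, state 1 = calm;
# action 0 = no intervention, action 1 = deliver an intervention.
episodes = [(0, 1, 1), (0, 1, 1), (0, 0, 0), (1, 0, 1), (1, 1, 1)]
tpm = estimate_tpm(episodes, n_states=2, n_actions=2)
values, policy = value_iteration(tpm, rewards=[[0.0, 0.0], [1.0, 1.0]])
```

In a semi-online variant, `estimate_tpm` would be re-run (or its counts incremented) as new episodes arrive, with the policy recomputed after a chosen number of episodes.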
154	Policy-Gradient Algorithms for Partially Observable Markov Decision Processes Aberdeen, Douglas Alexander, doug.aberdeen@anu.edu.au January 2003 Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example, robot navigation, driving a car, speech recognition, stock trading, and playing games. The downside of this generality is that exact algorithms are computationally intractable. Such computational complexity motivates approximate approaches. One such class of algorithms is the so-called policy-gradient methods from reinforcement learning. They seek to adjust the parameters of an agent in the direction that maximises the long-term average of a reward signal. Policy-gradient methods are attractive as a scalable approach for controlling partially observable Markov decision processes (POMDPs). ¶ In the most general case POMDP policies require some form of internal state, or memory, in order to act optimally. Policy-gradient methods have shown promise for problems admitting memory-less policies but have been less successful when memory is required. This thesis develops several improved algorithms for learning policies with memory in an infinite-horizon setting: directly when the dynamics of the world are known, and via Monte-Carlo methods otherwise. The algorithms simultaneously learn how to act and what to remember. ¶ Monte-Carlo policy-gradient approaches tend to produce gradient estimates with high variance. Two novel methods for reducing variance are introduced. The first uses high-order filters to replace the eligibility trace of the gradient estimator. The second uses a low-variance value-function method to learn a subset of the parameters and a policy-gradient method to learn the remainder. 
¶ The algorithms are applied to large domains including a simulated robot navigation scenario, a multi-agent scenario with 21,000 states, and the complex real-world task of large vocabulary continuous speech recognition. To the best of the author's knowledge, no other policy-gradient algorithms have performed well at such tasks. ¶ The high variance of Monte-Carlo methods requires lengthy simulation and hence a super-computer to train agents within a reasonable time. The ANU "Bunyip" Linux cluster was built with such tasks in mind. It was used for several of the experimental results presented here. One chapter of this thesis describes an application written for the Bunyip cluster that won the international Gordon-Bell prize for price/performance in 2001. POMDP Reinforcement Learning Policy gradient cluster high performance computing
155	Policy Gradient Methods: Variance Reduction and Stochastic Convergence Greensmith, Evan, evan.greensmith@gmail.com January 2005 In a reinforcement learning task an agent must learn a policy for performing actions so as to perform well in a given environment. Policy gradient methods consider a parameterized class of policies, and using a policy from the class, and a trajectory through the environment taken by the agent using this policy, estimate the performance of the policy with respect to the parameters. Policy gradient methods avoid some of the problems of value function methods, such as policy degradation, where inaccuracy in the value function leads to the choice of a poor policy. However, the estimates produced by policy gradient methods can have high variance.¶ In Part I of this thesis we study the estimation variance of policy gradient algorithms, in particular, when augmenting the estimate with a baseline, a common method for reducing estimation variance, and when using actor-critic methods. A baseline adjusts the reward signal supplied by the environment, and can be used to reduce the variance of a policy gradient estimate without adding any bias. We find the baseline that minimizes the variance. We also consider the class of constant baselines, and find the constant baseline that minimizes the variance. We compare this to the common technique of adjusting the rewards by an estimate of the performance measure. Actor-critic methods usually attempt to learn a value function accurate enough to be used in a gradient estimate without adding much bias. In this thesis we propose that in learning the value function we should also consider the variance. 
We show how considering the variance of the gradient estimate when learning a value function can be beneficial, and we introduce a new optimization criterion for selecting a value function.¶ In Part II of this thesis we consider online versions of policy gradient algorithms, where we update our policy for selecting actions at each step in time, and study the convergence of these online algorithms. For such online gradient-based algorithms, convergence results aim to show that the gradient of the performance measure approaches zero. Such a result has been shown for an algorithm which is based on observing trajectories between visits to a special state of the environment. However, the algorithm is not suitable in a partially observable setting, where we are unable to access the full state of the environment, and its variance depends on the time between visits to the special state, which may be large even when only few samples are needed to estimate the gradient. To date, convergence results for algorithms that do not rely on a special state are weaker. We show that, for a certain algorithm that does not rely on a special state, the gradient of the performance measure approaches zero. We show that this continues to hold when using certain baseline algorithms suggested by the results of Part I. reinforcement learning policy gradient stochastic convergence variance reduction
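The statement in the abstract above that a baseline adds no bias can be checked with a short standard derivation. The notation is generic and assumed here rather than taken from the thesis: a differentiable policy \pi_\theta(a \mid s) over a finite action set, and a baseline b(s) that does not depend on the chosen action.

```latex
\begin{aligned}
\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, b(s)\big]
&= b(s) \sum_a \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s) \\
&= b(s) \sum_a \nabla_\theta \pi_\theta(a \mid s) \\
&= b(s)\, \nabla_\theta \sum_a \pi_\theta(a \mid s) = b(s)\, \nabla_\theta 1 = 0 .
\end{aligned}
```

Because the subtracted term has zero expectation, the gradient estimate keeps the same mean, and the baseline can be chosen purely to reduce variance.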
156	All learning is local: Multi-agent learning in global reward games Chang, Yu-Han, Ho, Tracey, Kaelbling, Leslie P. 01 1900 In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms. We provide a simple and efficient algorithm that in part uses a linear system to model the world from a single agent’s limited perspective, and takes advantage of Kalman filtering to allow an agent to construct a good training signal and effectively learn a near-optimal policy in a wide variety of settings. A sequence of increasingly complex empirical tests verifies the efficacy of this technique. / Singapore-MIT Alliance (SMA) Kalman filtering multi-agent systems Q-learning reinforcement learning
157	Importance Sampling for Reinforcement Learning with Multiple Objectives Shelton, Christian Robert 01 August 2001 This thesis considers three complications that arise from applying reinforcement learning to a real-world application. In the process of using reinforcement learning to build an adaptive electronic market-maker, we find the sparsity of data, the partial observability of the domain, and the multiple objectives of the agent to cause serious problems for existing reinforcement learning algorithms. We employ importance sampling (likelihood ratios) to achieve good performance in partially observable Markov decision processes with few data. Our importance sampling estimator requires no knowledge about the environment and places few restrictions on the method of collecting data. It can be used efficiently with reactive controllers, finite-state controllers, or policies with function approximation. We present theoretical analyses of the estimator and incorporate it into a reinforcement learning algorithm. Additionally, this method provides a complete return surface which can be used to balance multiple objectives dynamically. We demonstrate the need for multiple goals in a variety of applications and natural solutions based on our sampling method. The thesis concludes with example results from employing our algorithm to the domain of automated electronic market-making. AI reinforcement learning RL importance sampling estimation market-making
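The likelihood-ratio idea in the abstract above can be sketched in a few lines of Python. This is the ordinary (unnormalized) importance sampling estimator, shown as a generic illustration and not necessarily the estimator developed in the thesis; the function name, the trajectory format, and the two example policies are assumptions. Note that only policy probabilities appear in the ratio, which is why no model of the environment is needed.

```python
def importance_sampling_return(trajectories, target_policy, behavior_policy):
    """Estimate the target policy's expected return from off-policy data.

    trajectories: list of (steps, total_return), where steps is a list of
        (observation, action) pairs collected under behavior_policy.
    target_policy, behavior_policy: functions (observation, action) -> probability.
    """
    total = 0.0
    for steps, episode_return in trajectories:
        ratio = 1.0
        for obs, act in steps:
            # Environment dynamics cancel; only the policy probabilities remain.
            ratio *= target_policy(obs, act) / behavior_policy(obs, act)
        total += ratio * episode_return
    return total / len(trajectories)


# Hypothetical example: data gathered by a uniform policy over two actions,
# evaluated for a policy that always chooses action 1.
def uniform(obs, act):
    return 0.5


def always_one(obs, act):
    return 1.0 if act == 1 else 0.0


data = [([(0, 1)], 1.0), ([(0, 0)], 0.0)]
estimate = importance_sampling_return(data, always_one, uniform)
```

Evaluating the same data under many candidate target policies is what allows a return estimate to be formed across the policy space from one batch of experience.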
158	The Essential Dynamics Algorithm: Essential Results Martin, Martin C. 01 May 2003 This paper presents a novel algorithm for learning in a class of stochastic Markov decision processes (MDPs) with continuous state and action spaces that trades speed for accuracy. A transform of the stochastic MDP into a deterministic one is presented which captures the essence of the original dynamics, in a sense made precise. In this transformed MDP, the calculation of values is greatly simplified. The online algorithm estimates the model of the transformed MDP and simultaneously does policy search against it. Bounds on the error of this approximation are proven, and experimental results in a bicycle riding domain are presented. The algorithm learns near optimal policies in orders of magnitude fewer interactions with the stochastic MDP, using less domain knowledge. All code used in the experiments is available on the project's web site. AI Reinforcement learning bicycle policy search markov decision processes
159	Mobilized ad-hoc networks: A reinforcement learning approach Chang, Yu-Han, Ho, Tracey, Kaelbling, Leslie Pack 04 December 2003 Research in mobile ad-hoc networks has focused on situations in which nodes have no control over their movements. We investigate an important but overlooked domain in which nodes do have control over their movements. Reinforcement learning methods can be used to control both packet routing decisions and node mobility, dramatically improving the connectivity of the network. We first motivate the problem by presenting theoretical bounds for the connectivity improvement of partially mobile networks and then present superior empirical results under a variety of different scenarios in which the mobile nodes in our ad-hoc network are embedded with adaptive routing policies and learned movement policies. AI reinforcement learning multi-agent learning ad-hoc networking
160	Reinforcement Learning by Policy Search Peshkin, Leonid 14 February 2003 One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations can be modeled as a Markov chain, whose state is partially observable to the agent and affected by its actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed to obey certain rules, the agent does not know them and must learn. In this dissertation we focus on the agent's adaptation as captured by the reinforcement learning framework. This means learning a policy (a mapping of observations into actions) based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. The set of policies is constrained by the architecture of the agent's controller. POMDPs require a controller to have a memory. We investigate controllers with memory, including controllers with external memory, finite state controllers and distributed controllers for multi-agent systems. For these various controllers we work out the details of the algorithms which learn by ascending the gradient of expected cumulative reinforcement. Building on statistical learning theory and experiment design theory, a policy evaluation algorithm is developed for the case of experience re-use. We address the question of sufficient experience for uniform convergence of policy evaluation and obtain sample complexity bounds for various estimators. Finally, we demonstrate the performance of the proposed algorithms on several domains, the most complex of which is simulated adaptive packet routing in a telecommunication network. AI POMDP policy search adaptive systems reinforcement learning adaptive behavior
