1. Computational modelling of the neural systems involved in schizophrenia. Thurnham, A. J., January 2008.
The aim of this thesis is to improve our understanding of the neural systems involved in schizophrenia by suggesting possible avenues for future computational modelling, in an attempt to make sense of the vast number of studies relating to the symptoms and cognitive deficits associated with the disorder. This multidisciplinary research has covered three different levels of analysis: abnormalities in microscopic brain structure, dopamine dysfunction at a neurochemical level, and interactions between cortical and subcortical brain areas connected by cortico-basal ganglia circuit loops; it has culminated in the production of five models that provide useful clarification in this difficult field. My thesis comprises three major modelling themes. Firstly, in Chapter 3 I looked at an existing neural network model addressing the Neurodevelopmental Hypothesis of Schizophrenia by Hoffman and McGlashan (1997). However, it soon became clear that such models were overly simplistic and brittle when it came to replication. While they focused on hallucinations and connectivity in the frontal lobes, they ignored other symptoms and the evidence of reductions in the volume of the temporal lobes in schizophrenia. No mention was made of the considerable evidence of dysfunction of the dopamine system and associated areas, such as the basal ganglia. This led to my second line of reasoning: dopamine dysfunction. Initially, I helped create a novel model of dopamine neuron firing based on the Computational Substrate for Incentive Salience by McClure, Daw and Montague (2003), incorporating temporal difference (TD) reward prediction errors (Chapter 5). I adapted this model in Chapter 6 to address the ongoing debate as to whether or not dopamine encodes uncertainty in the delay period between presentation of a conditioned stimulus and receipt of a reward, as demonstrated by sustained activation seen in single dopamine neuron recordings (Fiorillo, Tobler & Schultz 2003).
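The TD reward prediction error at the heart of such models can be sketched in a few lines. This is a generic TD(0) illustration of my own, not the McClure, Daw and Montague model itself:

```python
import numpy as np

def td_update(V, state, next_state, reward, alpha=0.1, gamma=0.95):
    """One TD(0) update of a state-value table.

    The prediction error delta = r + gamma*V(s') - V(s) is the quantity
    commonly identified with phasic dopamine firing.
    """
    delta = reward + gamma * V[next_state] - V[state]
    V[state] += alpha * delta
    return delta

# Toy conditioning experiment: a cue (state 0) is always followed by reward.
V = np.zeros(2)
errors = [td_update(V, 0, 1, reward=1.0) for _ in range(50)]
# The prediction error is large on early trials and shrinks as the cue's
# value is learned, mirroring dopamine responses to fully predicted rewards.
```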
An answer to this question could result in a better understanding of the nature of dopamine signalling, with implications for the psychopathology of cognitive disorders, like schizophrenia, for which dopamine is commonly regarded as having a primary role. Computational modelling enabled me to suggest that while sustained activation is common in single trials, there is the possibility that it increases with increasing reward probability, in which case dopamine may not be encoding uncertainty in this manner. Importantly, these predictions can be tested and verified against experimental data. My third modelling theme arose from the limitations of using TD alone to provide a reinforcement learning account of action control in the brain. In Chapter 8 I introduce a dual-weighted artificial neural network, originally designed by Hinton and Plaut (1987) to address the problem of catastrophic forgetting in multilayer artificial neural networks. I suggest an alternative use for a model with fast and slow weights: to address the problem of arbitration between two systems of control. This novel approach is capable of combining the benefits of model-free and model-based learning in one simple model, without the need for a homunculus, and may have important implications for how goal-directed and stimulus-response learning coexist. Modelling cortical-subcortical loops offers the potential to incorporate both the symptoms and the cognitive deficits associated with schizophrenia by taking into account the interactions between the midbrain/striatum and cortical areas.
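The fast-and-slow-weights idea can be illustrated with a single linear unit. This is a minimal sketch in the spirit of the Hinton and Plaut (1987) scheme described above; the class name and learning rates are my own illustrative choices:

```python
import numpy as np

class DualWeightUnit:
    """A linear unit whose effective weight is the sum of slow and fast weights.

    The fast weights learn quickly but decay toward zero, while the slow
    weights change gradually, so recent and long-term knowledge can coexist
    in a single set of connections.
    """
    def __init__(self, n_in):
        self.slow = np.zeros(n_in)
        self.fast = np.zeros(n_in)

    def predict(self, x):
        return x @ (self.slow + self.fast)

    def update(self, x, target, lr_slow=0.05, lr_fast=0.5, decay=0.9):
        error = target - self.predict(x)
        self.slow += lr_slow * error * x                      # slow, durable learning
        self.fast = decay * self.fast + lr_fast * error * x   # fast, transient learning
        return error

unit = DualWeightUnit(1)
x = np.array([1.0])
errors = [unit.update(x, 2.0) for _ in range(500)]
```

Because the fast weights decay, a brief change in the target is absorbed by them and then fades, leaving the slowly learned mapping largely intact; this is the property repurposed in the thesis for arbitrating between a quickly adjusting and a slowly acquired system of control.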
2. Towards adaptive deep model-based reinforcement learning. Rahimi-Kalahroudi, Ali, 08 1900.
One of the key behavioral characteristics used in neuroscience to determine whether the subject of study (be it a rodent or a human) exhibits model-based learning is effective adaptation to local changes in the environment. In reinforcement learning (RL), however, we demonstrate, using an improved version of the recently introduced Local Change Adaptation (LoCA) setup, that well-known model-based reinforcement learning (MBRL) methods such as PlaNet and DreamerV2 perform poorly in their ability to adapt to local environmental changes. Combined with prior work that made a similar observation about another popular model-based method, MuZero, a trend appears to emerge, suggesting that current deep MBRL methods have serious limitations. We dive deeper into the causes of this poor performance by identifying elements that hurt adaptive behavior and linking them to underlying techniques frequently used in deep model-based RL, both in learning the world model and in the planning routine. Our findings demonstrate that a particularly challenging requirement for deep MBRL methods is attaining a world model that is sufficiently accurate throughout the relevant parts of the state-space, which is difficult due to catastrophic forgetting.
And while a replay buffer can mitigate the effects of catastrophic forgetting, the traditional first-in-first-out replay buffer precludes effective adaptation because it maintains stale data. We show that a conceptually simple variation of this traditional replay buffer is able to overcome this limitation. By removing from the buffer only those samples that lie in the local neighbourhood of newly observed samples, deep world models can be built that maintain their accuracy across the state-space while also being able to adapt effectively to local changes in the reward function. We demonstrate this by applying our replay-buffer variation to a deep version of the classical Dyna method, as well as to recent methods such as PlaNet and DreamerV2, showing that deep model-based methods can likewise adapt effectively to local changes in the environment.
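The buffer variant described above can be sketched as follows; the class name, Euclidean distance metric, and radius parameter are my own illustrative choices, not the thesis implementation:

```python
import numpy as np
from collections import deque

class LocalForgettingBuffer:
    """Replay buffer that evicts by locality rather than purely first-in-first-out.

    Adding a new transition first removes stored transitions whose state lies
    within a small neighbourhood of the new state, so stale data about a
    locally changed region cannot linger while the rest of the buffer is kept.
    """
    def __init__(self, capacity, radius):
        self.capacity = capacity
        self.radius = radius
        self.data = deque()

    def add(self, state, action, reward, next_state):
        state = np.asarray(state, dtype=float)
        # Evict samples local to the new observation (potentially stale).
        self.data = deque(
            t for t in self.data
            if np.linalg.norm(t[0] - state) > self.radius
        )
        self.data.append((state, action, reward, np.asarray(next_state, dtype=float)))
        if len(self.data) > self.capacity:  # fall back to FIFO on overflow
            self.data.popleft()

    def sample(self, batch_size, rng=None):
        rng = rng or np.random.default_rng()
        idx = rng.choice(len(self.data), size=min(batch_size, len(self.data)), replace=False)
        return [self.data[i] for i in idx]

buf = LocalForgettingBuffer(capacity=100, radius=1.0)
buf.add([0.0, 0.0], 0, 0.0, [0.0, 0.0])
buf.add([5.0, 5.0], 1, 1.0, [5.0, 5.0])
buf.add([0.1, 0.0], 2, 2.0, [0.1, 0.0])  # evicts the stale sample near the origin
```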
3. AI-based Detection Against Cyberattacks in Cyber-Physical Distribution Systems. Sahani, Nitasha, 05 June 2024.
Integration of cyber and communication systems with the traditional power grid has enabled better monitoring and control of the smart grid, making it more reliable and resilient. This empowers the system operators to make informed decisions as a result of better system visibility. The grid has moved from a completely air-gapped structure to a well-connected network. However, this remote-control capability over distributed physical components in a distribution system can be exploited by adversaries with malicious intent to disrupt the power supply to the customers. Therefore, while taking advantage of the cyber-physical posture in the smart grid for improved controllability, there is a critical need for cybersecurity research to protect the critical power infrastructure from cyberattacks.
While the literature regarding cybersecurity in distribution systems has focused on detecting and mitigating the cyberattack impact on the physical system, there has been limited effort towards a preventive approach for detecting cyberattacks. With this in mind, this dissertation focuses on developing intelligent solutions to detect cyberattacks in the cyber layer of the distribution grid and prevent the attack from impacting the physical grid. There has been a particular emphasis on the impact of coordinated attacks and on the design of proactive defenses that detect the attacker's intent and predict the attack trajectory.
The vulnerability assessment of the cyber-physical system in this work identifies the key areas in the system that are prone to cyberattacks and where a failure to detect attacks in time can lead to cascading outages. A comprehensive cyber-physical system is developed to deploy different intrusion detection solutions and quantify the effect of proactive detection in the cyber layer. The attack detection approach is driven by artificial intelligence to learn attack patterns for effective attack path prediction in both a fully observable and a partially observable distribution system. The role of effective communication technology in attack detection is also realized through detailed modeling of 5G, and latency requirements are validated. / Doctor of Philosophy / The traditional power grid was designed to supply electricity from the utility side to the customers. This grid model has shifted from a one-directional supply of power to a bi-directional one, where customers with generation capacity can provide power to the grid. This is possible through bi-directional data flow, which ensures complete power system observability and allows the utility to monitor and control distributed power components remotely. This connectivity depends on the cyber system and efficient communication for ensuring stable and reliable system operations. However, this also makes the grid vulnerable to cyberattacks, as the traditional air-gapped grid has evolved into a highly connected network, thus increasing the attack surface for attackers. They may have the capability to intrude on the network by exploiting network vulnerabilities, move laterally through different parts of the network, and cause operational disruption. The type of disruption can range from minor voltage fluctuations to widespread power outages, depending on the ultimate malicious goal of such adversaries.
Therefore, cybersecurity measures for protecting critical power infrastructure are extremely important to ensure smooth system operations.
There has been recent research effort on detecting such attacks, isolating the attacked parts of the grid, and mitigating the impact of the attack; however, instead of a passive response, there is a need for a preventive or proactive detection mechanism. This can ensure that the attack is captured at the cyber layer before intruders can impact the physical grid. This is the primary motivation to design an intrusion detection system that can detect different coordinated attacks (where different attacks are related and directed towards a specific goal) and can predict the attack path.
This dissertation first identifies the vulnerabilities in the distribution system and develops a comprehensive cyber-physical system. Different detection algorithms are developed to detect cyberattacks in the distribution grid, with the intelligence to learn attack patterns and successfully predict the attack path. Additionally, the effectiveness of advanced communication technologies such as 5G is tested for different system operations in the distribution system.
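As a purely hypothetical illustration of the attack-path-prediction idea (not the dissertation's algorithms, and with made-up component names), logged sequences of compromised components can be turned into a first-order transition model:

```python
import numpy as np

# Hypothetical intrusion logs: each entry is an observed sequence of
# compromised components in a distribution-system network.
observed_paths = [
    ["firewall", "scada_server", "relay_1"],
    ["firewall", "scada_server", "relay_2"],
    ["firewall", "historian", "relay_1"],
]

nodes = sorted({n for path in observed_paths for n in path})
index = {n: i for i, n in enumerate(nodes)}

# Count observed transitions between consecutive compromised components.
counts = np.zeros((len(nodes), len(nodes)))
for path in observed_paths:
    for a, b in zip(path, path[1:]):
        counts[index[a], index[b]] += 1

# Row-normalise into transition probabilities (terminal rows stay zero).
row_sums = counts.sum(axis=1, keepdims=True)
probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def most_likely_next(node):
    """Predict the most probable next compromised component."""
    return nodes[int(np.argmax(probs[index[node]]))]
```

Real attack-path predictors would work over far richer features; this crude Markov sketch only shows how learned attack patterns can yield a forward prediction of the attack trajectory.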
4. Model-based hyperparameter optimization. Crouther, Paul, 04 1900.
The primary goal of this work is to propose a methodology for discovering hyperparameters. Hyperparameters aid systems in convergence when well-tuned and handcrafted; poorly chosen hyperparameters, however, leave practitioners in limbo, unsure whether a failure stems from the implementation or from an improper choice of hyperparameter and system configuration. We specifically analyze the choice of learning rate in stochastic gradient descent (SGD), a popular algorithm. As a secondary goal, we attempt the discovery of fixed points by smoothing the loss landscape, exploiting assumptions about its distribution to improve the update rule in SGD. Smoothing of the loss landscape has been shown to make convergence possible in large-scale systems and difficult black-box optimization problems. Here, we use stochastic value gradients (SVG) to smooth the loss landscape by learning a surrogate model, then backpropagate through this model to discover fixed points on the real task SGD is trying to solve. Additionally, we construct a gym environment for testing model-free algorithms, such as Proximal Policy Optimization (PPO), as hyperparameter optimizers for SGD. For tasks, we focus on a toy problem and analyze the convergence of SGD on MNIST using model-free and model-based reinforcement learning methods for control. The model is learned from the parameters of the true optimizer and is used specifically for learning rates rather than for prediction. Our experiments cover both an online and an offline setting. In the online setting, we learn a surrogate model alongside the true optimizer, where hyperparameters are tuned in real time for the true optimizer. In the offline setting, we show that there is more potential in the model-based methodology than in the model-free configuration, owing to the surrogate model that smooths out the loss landscape and yields more helpful gradients
during backpropagation.
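The gym environment for learning-rate control can be sketched with a toy stand-in. The environment below is my own minimal version (a quadratic loss with a gym-style reset/step API, written without the library dependency), not the thesis code:

```python
import numpy as np

class LRTuningEnv:
    """Gym-style environment where the action is the SGD learning rate.

    The hidden task is minimizing the toy loss L(theta) = 0.5 * ||theta||^2;
    the reward is the drop in loss after one SGD step, so a controller such
    as PPO can be trained to choose step sizes.
    """
    def __init__(self, dim=5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.dim = dim

    def reset(self):
        self.theta = self.rng.normal(size=self.dim)
        return self._obs()

    def _obs(self):
        # Observation: current loss value and gradient norm.
        loss = 0.5 * float(self.theta @ self.theta)
        return np.array([loss, float(np.linalg.norm(self.theta))])

    def step(self, action):
        lr = float(np.clip(action, 0.0, 1.0))  # the action sets the learning rate
        before = 0.5 * float(self.theta @ self.theta)
        grad = self.theta                      # gradient of the quadratic loss
        self.theta = self.theta - lr * grad    # one SGD step
        after = 0.5 * float(self.theta @ self.theta)
        done = after < 1e-6
        return self._obs(), before - after, done, {}

env = LRTuningEnv()
obs = env.reset()
for _ in range(30):
    obs, reward, done, _ = env.step(0.5)  # a fixed, reasonable learning rate
```

In the online setting described in the abstract, a controller like PPO would replace the fixed action here, emitting a learning rate at each step while the true optimizer runs.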