11

Reinforcement Learning in Problems with Continuous Action Spaces : a Comparative Study

Larsson, Axel January 2021 (has links)
Reinforcement learning (RL) is one of the three main areas in machine learning (ML), with a solid theoretical background and steady progress. RL can provide solutions to many real-world applications, such as self-driving cars and protein folding. A class of RL problems with an infinite number of actions available from each state has recently received significant attention, namely infinite action space RL problems. There are several standard algorithms for RL problems, and choosing a suitable algorithm for a given problem can be a challenging task. To compare RL algorithms, we carefully implement them on different tasks and store the relevant results. To have a fair comparison, we tune the algorithms and iteratively test and update them beforehand. This study compares four different RL algorithms. Our results show that the RL algorithms that store the steps of their path, or that have a model of the environment, have the highest rate of convergence. By updating the value of every step of the path after a reward, instead of looking back only a single step, the algorithms find a solution faster and more often. Having a model to help the algorithm plan ahead also contributes to faster and more stable learning. RL algorithms that use a deep neural network for evaluation are the least stable. Our results can provide a good basis for selecting appropriate algorithms for infinite action space RL problems, and they can be built upon, simplifying the development of improvements to the RL algorithms that exist today.
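The contrast drawn in this abstract between looking back a single step and updating every step of the path is the difference between one-step temporal-difference updates and eligibility-trace methods. The following is a minimal tabular sketch of that distinction, not code from the thesis; it uses a simplified Watkins-style Q(lambda) update that omits the usual trace reset after exploratory actions, and all hyperparameter values are placeholder assumptions.

```python
import numpy as np

def q_update_one_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One-step Q-learning: only the most recent state-action pair is updated."""
    td_error = r + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += alpha * td_error
    return Q

def q_update_with_traces(Q, E, s, a, r, s_next, alpha=0.1, gamma=0.99, lam=0.9):
    """Simplified Q(lambda): the same TD error is propagated to every visited
    state-action pair along the path, weighted by its eligibility trace."""
    td_error = r + gamma * np.max(Q[s_next]) - Q[s, a]
    E[s, a] += 1.0                      # mark the current pair as eligible
    Q += alpha * td_error * E           # update all eligible pairs at once
    E *= gamma * lam                    # decay eligibility of older pairs
    return Q, E
```

After a reward, the trace-based update credits the whole trajectory at once, which is why such methods tend to converge in fewer episodes than the one-step variant.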
12

Optimizing vertical farming : control and scheduling algorithms for enhanced plant growth

Vu, Cong Vinh 10 1900 (has links)
Vertical farming provides a way to have almost total control over agriculture, whether it be weather conditions, the nutrients necessary for plant growth, or even pest control. As such, it is possible to find and set parameters that increase crop yield and quality and minimize energy consumption where possible. To that end, this thesis presents optimization algorithms, such as an enhanced version of Simulated Annealing, that can be used to find and give guidelines for those parameters. We also present work on how real-time control algorithms such as Actor-Critic methods can be made to perform better through more efficient exploration, by taking epistemic uncertainty into account during action selection; this can also benefit control systems built for vertical farming. We show that our work is able to outperform some algorithms used for optimization and continuous control.
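One common way to make exploration sensitive to epistemic uncertainty in actor-critic methods is to use the disagreement of an ensemble of critics as an optimism bonus when picking an action. The sketch below illustrates that general idea only; it is not the thesis's algorithm, and the actor/critic networks, candidate-sampling scheme, and bonus weight `beta` are all assumptions introduced here for illustration.

```python
import torch

def select_action(actor, critics, state, num_candidates=16, beta=0.5):
    """Optimistic action selection: sample candidate actions around the actor's
    output and pick the one whose ensemble value is highest once an
    epistemic-uncertainty bonus (ensemble standard deviation) is added.
    Assumes `state` is a (1, state_dim) tensor, `actor(state)` returns a
    (1, action_dim) tensor, and each critic maps (state, action) to a value."""
    with torch.no_grad():
        mean_action = actor(state)
        noise = 0.1 * torch.randn(num_candidates, mean_action.shape[-1])
        candidates = (mean_action + noise).clamp(-1.0, 1.0)
        states = state.expand(num_candidates, -1)
        # Each critic scores every candidate; disagreement across the ensemble
        # approximates epistemic uncertainty about unexplored actions.
        q_values = torch.stack([c(states, candidates).squeeze(-1) for c in critics])
        score = q_values.mean(dim=0) + beta * q_values.std(dim=0)
        return candidates[score.argmax()]
```

With `beta > 0` the agent prefers actions the critics disagree about, which tends to direct exploration toward poorly known regions of the action space.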
13

REAL-TIME UPDATING AND NEAR-OPTIMAL ENERGY MANAGEMENT SYSTEM FOR MULTI-MODE ELECTRIFIED POWERTRAIN WITH REINFORCEMENT LEARNING CONTROL

Biswas, Atriya January 2021 (has links)
Energy management systems (EMSs), implemented in the electronic control unit (ECU) of an actual vehicle with an electrified powertrain, are much simpler versions of the theoretically developed EMS. Such simplification is done to accommodate the EMS within the given memory constraint and computational capacity of the ECU. The simplification should ensure reasonable performance compared to the theoretical EMS under real-life driving scenarios, and the simplification process must be effective to create a versatile and utilitarian EMS. Reinforcement learning-based controllers have attractive characteristics for optimizing the performance of controllable physical systems, as they do not necessarily require a mathematical model of the system dynamics (i.e., they are model-free). Quite naturally, one can aspire to test this capability of reinforcement learning-based controllers in achieving near-global-optimal performance for the supervisory energy management system of electrified powertrains. Before deployment as a mainstream controller, any supervisory controller should be scrutinized through various levels of virtual simulation platforms with an ascending order of physical-system-emulating capability. The controller evolves from a mathematical concept to a utilitarian embedded system through a series of these levels, where it undergoes gradual transformation to finally become suitable for a real physical system. Implementation of the control strategy in a Simulink-based forward simulation model can be the first stage of this evolution process. This thesis delineates all the steps required for implementing a reinforcement learning-based supervisory controller in a forward simulation model of a hybrid electric vehicle. A novel framework for a loss-minimization-based instantaneous optimal strategy is introduced for the energy management system of a multi-mode hybrid electric powertrain. The loss-minimization strategy is flexible enough to be implemented in any electrified powertrain architecture. It is mathematically proven that minimizing the overall system loss is equivalent to minimizing fuel consumption. An online simulation framework is developed to evaluate the performance of a multi-mode electrified powertrain equipped with more than one power source. An electrically variable transmission with two planetary gear-sets has been chosen as the centerpiece of the powertrain, considering the versatility and future prospects of such transmissions. The architecture topology selected for this dissertation emerged from a rigorous screening process whose workflow is presented here briefly. One legitimate concern with a multi-mode transmission is its proclivity to introduce power-flow discontinuity downstream in the powertrain, and mode-shift events are predominantly responsible for such discontinuity. Dynamic coordinated control as a technique for ameliorating such discontinuity has been substantiated by many scholars in the literature. Hence, a system-level coordinated control is employed within the energy management system, which governs the mode schedule of the multi-mode powertrain in real-time simulation. / Thesis / Doctor of Philosophy (PhD)
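The stated equivalence between system-loss minimization and fuel minimization can be motivated by an instantaneous power balance. The following is a hedged sketch of that reasoning under the usual charge-sustaining assumption; the symbols (P_fuel, P_dem, P_loss, E_batt) are illustrative notation introduced here, not the thesis's own formulation.

```latex
% Instantaneous power balance of the powertrain (illustrative notation):
%   P_fuel(t)  - chemical power drawn from the fuel
%   P_dem(t)   - traction power demanded by the drive cycle (fixed by the cycle)
%   P_loss(t)  - total losses in engine, motors, transmission, and battery
%   E_batt(t)  - stored battery energy
\[
P_{\mathrm{fuel}}(t) \;=\; P_{\mathrm{dem}}(t) + P_{\mathrm{loss}}(t) + \dot{E}_{\mathrm{batt}}(t)
\]
% Integrating over a charge-sustaining cycle, the net change in battery energy
% is approximately zero, so
\[
\int_0^T P_{\mathrm{fuel}}\,dt \;=\; \underbrace{\int_0^T P_{\mathrm{dem}}\,dt}_{\text{fixed by the cycle}} \;+\; \int_0^T P_{\mathrm{loss}}\,dt,
\]
% and minimizing cumulative losses is therefore equivalent to minimizing fuel energy.
```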
14

Complementary roles of the prefrontal cortex and the striatum in learning and switching between reward-based navigation strategies in the rat

Khamassi, Mehdi 26 September 2007 (has links) (PDF)
Mammals are able to follow different navigation behaviors, defined as "strategies" that do not necessarily involve conscious processes, depending on the specific task they have to solve. In some cases where a visual cue indicates the goal, they can follow a simple stimulus-response (S-R) strategy. In contrast, other tasks require the animal to employ a more complex strategy based on building some representation of space that allows it to localize itself and the goal within the environment. To behave efficiently, animals must not only be able to learn each of these strategies, but also be able to switch from one strategy to another when the demands of the environment change. The thesis presented here adopts a multidisciplinary approach (behavior, neurophysiology, computational neuroscience, and autonomous robotics) to study the role of the striatum and the prefrontal cortex in learning and switching between these navigation strategies in the rat, and their possible application to robotics. In particular, it aims to clarify the respective roles of the medial prefrontal cortex (mPFC) and of different parts of the striatum (DLS: dorsolateral; VS: ventral) in these processes, as well as the nature of their interactions. The experimental work consisted of: (1) studying the role of the striatum in S-R learning by (a) analyzing electrophysiological data recorded in the rat VS during a reward-seeking task in a plus-maze; (b) building an Actor-Critic model of S-R learning in which the VS is the Critic that guides learning, while the DLS is the Actor that stores the S-R associations. This model is extended to robotic simulation and its performance is compared with existing Actor-Critic models in a virtual plus-maze; (2) in a second phase, the role of the striatum in learning localization-based strategies being assumed known, we focused on the role of the mPFC in switching between navigation strategies, performing electrophysiological recordings in the rat mPFC during a task requiring this type of switching. The main results of this work suggest that: (1) in the S-R framework, (a) as in the monkey, the rat VS builds reward anticipations consistent with Actor-Critic theory; (b) these reward anticipations can be combined with self-organizing maps in an Actor-Critic model that achieves better performance than existing models in a virtual plus-maze and has generalization capabilities of interest for autonomous robotics; (2) the mPFC appears to play an important role when the animal's performance is low and a new strategy must be learned. Moreover, population activity in the mPFC changes rapidly, in correspondence with strategy transitions in the rat's behavior, suggesting a contribution of this part of the brain to the flexible selection of behavioral strategies. We conclude this manuscript with a discussion of our results in the context of previous behavioral, electrophysiological, and modeling work.
We propose a new architecture of the rat prefronto-striatal system in which sub-parts of the striatum learn different navigation strategies, and in which the medial prefrontal cortex decides at each moment which strategy should govern the rat's behavior.
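The abstract's mapping (VS as Critic computing a reward-prediction error, DLS as Actor storing S-R associations) corresponds to a standard temporal-difference Actor-Critic update. The sketch below is a minimal tabular illustration of that scheme, not the thesis's model; the softmax policy and all hyperparameters are assumptions chosen for readability.

```python
import numpy as np

class ActorCritic:
    """Tabular Actor-Critic sketch: the Critic (ventral striatum) learns state
    values and computes the reward-prediction error; the Actor (dorsolateral
    striatum) stores stimulus-response preferences driven by that same error."""

    def __init__(self, n_states, n_actions, alpha_v=0.1, alpha_p=0.1, gamma=0.95):
        self.V = np.zeros(n_states)                   # Critic: state values (VS)
        self.pref = np.zeros((n_states, n_actions))   # Actor: S-R preferences (DLS)
        self.alpha_v, self.alpha_p, self.gamma = alpha_v, alpha_p, gamma

    def act(self, s):
        # Softmax over preferences turns stored S-R strengths into a policy.
        p = np.exp(self.pref[s] - self.pref[s].max())
        return np.random.choice(len(p), p=p / p.sum())

    def update(self, s, a, r, s_next):
        # Reward-prediction error, the signal shared by Critic and Actor.
        delta = r + self.gamma * self.V[s_next] - self.V[s]
        self.V[s] += self.alpha_v * delta             # Critic update
        self.pref[s, a] += self.alpha_p * delta       # Actor update
        return delta
```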
15

Online Learning and Simulation Based Algorithms for Stochastic Optimization

Lakshmanan, K January 2012 (has links) (PDF)
In many optimization problems, the relationship between the objective and the parameters is not known. The objective function itself may be stochastic, such as a long-run average over some random cost samples. In such cases, finding the gradient of the objective is not possible. It is in this setting that stochastic approximation algorithms are used. These algorithms use some estimates of the gradient and are stochastic in nature. Amongst gradient estimation techniques, Simultaneous Perturbation Stochastic Approximation (SPSA) and the Smoothed Functional (SF) scheme are widely used. In this thesis we have proposed a novel multi-timescale quasi-Newton-based smoothed functional (QN-SF) algorithm for unconstrained as well as constrained optimization. The algorithm uses the smoothed functional scheme for estimating the gradient and the quasi-Newton method to solve the optimization problem. The algorithm is shown to converge with probability one. We have also provided experimental results on the problem of optimal routing in a multi-stage network of queues. Policies like Join the Shortest Queue or Least Work Left assume knowledge of the queue length values, which can change rapidly or be hard to estimate. If the only information available is the expected end-to-end delay, as in our case, such policies cannot be used. The QN-SF-based probabilistic routing algorithm uses only the total end-to-end delay for tuning the probabilities. We observe from the experiments that the QN-SF algorithm has better performance than the gradient and Jacobi versions of Newton-based smoothed functional algorithms. Next we consider constrained routing in a similar queueing network. We extend the QN-SF algorithm to this case. We study the convergence behavior of the algorithm and observe that the constraints are satisfied at the point of convergence. We provide experimental results for the constrained routing setup as well. Next we study reinforcement learning algorithms, which are useful for solving Markov Decision Processes (MDPs) when precise information on the transition probabilities is not known. When the state and action sets are very large, it is not possible to store all the state-action tuples. In such cases, function approximators like neural networks have been used. The popular Q-learning algorithm is known to diverge when used with linear function approximation due to the 'off-policy' problem. Hence developing learning algorithms that remain stable when used with function approximation is an important problem. We present in this thesis a variant of Q-learning with linear function approximation that is based on two-timescale stochastic approximation. The Q-value parameters for a given policy in our algorithm are updated on the slower timescale while the policy parameters themselves are updated on the faster scale. We perform a gradient search in the space of policy parameters. Since the objective function and hence the gradient are not analytically known, we employ efficient one-simulation simultaneous perturbation stochastic approximation (SPSA) gradient estimates that employ Hadamard matrix based deterministic perturbations. Our algorithm has the advantage that, unlike Q-learning, it does not suffer from high oscillations due to the off-policy problem when using function approximators. Whereas it is difficult to prove convergence of regular Q-learning with linear function approximation because of the off-policy problem, we prove that our algorithm, which is on-policy, is convergent.
Numerical results on a multi-stage stochastic shortest path problem show that our algorithm exhibits significantly better performance and is more robust compared to Q-learning. Future work would be to compare it with other policy-based reinforcement learning algorithms. Finally, we develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
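The core idea behind SPSA, perturbing all parameters simultaneously and estimating the gradient from very few noisy function evaluations, can be shown in a few lines. The sketch below uses the standard two-measurement SPSA form with random Bernoulli perturbations rather than the one-simulation, Hadamard-matrix variant described in the thesis; the quadratic test objective, step sizes, and perturbation constant are assumptions for illustration.

```python
import numpy as np

def spsa_gradient(loss_fn, theta, c=0.05, rng=np.random.default_rng()):
    """Two-measurement SPSA gradient estimate with Bernoulli perturbations:
    every coordinate is perturbed at once, so only two noisy loss evaluations
    are needed regardless of the parameter dimension."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)    # simultaneous perturbation
    loss_plus = loss_fn(theta + c * delta)
    loss_minus = loss_fn(theta - c * delta)
    return (loss_plus - loss_minus) / (2.0 * c * delta)  # per-coordinate estimate

# Usage sketch: minimize a noisy quadratic whose gradient is not available in closed form.
theta = np.ones(4)
noisy_loss = lambda x: float(np.sum(x**2) + 0.01 * np.random.randn())
for k in range(1, 501):
    a_k = 0.1 / k                                        # decaying step size
    theta -= a_k * spsa_gradient(noisy_loss, theta)
```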
16

[pt] ESTUDO DE TÉCNICAS DE APRENDIZADO POR REFORÇO APLICADAS AO CONTROLE DE PROCESSOS QUÍMICOS / [en] STUDY OF REINFORCEMENT LEARNING TECHNIQUES APPLIED TO THE CONTROL OF CHEMICAL PROCESSES

30 December 2021 (has links)
[en] Industry 4.0 boosted the development of new technologies to meet current market demands. One of these new technologies was the incorporation of computational intelligence techniques into the daily routine of the chemical industry. In this context, the present work evaluated the performance of controllers based on reinforcement learning in industrial chemical processes. The control strategy directly affects the safety and cost of the process: the better its performance, the lower the production of effluents and the consumption of inputs and energy. The reinforcement learning algorithms showed excellent results for the first case study, a CSTR with Van de Vusse kinetics. However, implementing these algorithms in the Tennessee Eastman Process chemical plant showed that further study is needed. The weak or non-existent Markov property, the high dimensionality, and the peculiarities of the plant made it difficult for the developed controllers to obtain satisfactory results. For case study 1, the algorithms Q-Learning, Actor-Critic TD, DQL, DDPG, SAC and TD3 were evaluated, and for case study 2 the algorithms CMA-ES, TRPO, PPO, DDPG, SAC and TD3 were evaluated.
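Applying RL controllers to a chemical process typically means wrapping the process model as an environment with a step/reset interface. The sketch below is an illustrative, isothermal Van de Vusse CSTR environment (reactions A to B to C and 2A to D); the rate constants, integration step, action bounds, and setpoint reward are placeholder assumptions and do not reproduce the models or rewards used in this work.

```python
import numpy as np

class VanDeVusseCSTR:
    """Illustrative gym-style environment for an isothermal Van de Vusse CSTR.
    State is [Ca, Cb]; the action is the dilution rate F/V."""

    def __init__(self, dt=0.001, k1=50.0, k2=100.0, k3=10.0, ca_in=10.0, cb_ref=1.0):
        self.dt, self.k1, self.k2, self.k3 = dt, k1, k2, k3
        self.ca_in, self.cb_ref = ca_in, cb_ref
        self.state = np.array([2.0, 1.0])

    def reset(self):
        self.state = np.array([2.0, 1.0])
        return self.state.copy()

    def step(self, action):
        d = float(np.clip(action, 0.0, 200.0))        # dilution rate, assumed bounds
        ca, cb = self.state
        dca = d * (self.ca_in - ca) - self.k1 * ca - self.k3 * ca**2
        dcb = -d * cb + self.k1 * ca - self.k2 * cb
        self.state = self.state + self.dt * np.array([dca, dcb])   # explicit Euler step
        reward = -(self.state[1] - self.cb_ref) ** 2  # track a desired Cb setpoint
        return self.state.copy(), reward, False, {}
```

An off-the-shelf continuous-control agent (e.g., DDPG, SAC, or TD3) can then interact with `step`/`reset` exactly as it would with any simulated plant.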
17

Prediction of Protein-Protein Interactions Using Deep Learning Techniques

Soleymani, Farzan 24 April 2023 (has links)
Proteins are considered the primary actors in living organisms. Proteins mainly perform their functions by interacting with other proteins. Protein-protein interactions (PPIs) underpin various biological activities such as metabolic cycles, signal transduction, and immune response. PPI identification has been addressed by various experimental methods such as yeast two-hybrid, mass spectrometry, and protein microarrays, to mention a few. However, due to the sheer number of proteins, experimental methods for finding interacting and non-interacting protein pairs are time-consuming and costly. Therefore, a sequence-based framework called ProtInteract is developed to predict protein-protein interactions. ProtInteract comprises two components: first, a novel autoencoder architecture that encodes each protein's primary structure into a lower-dimensional vector while preserving its underlying sequential pattern by extracting uncorrelated attributes and more expressive descriptors. This leads to faster training of the second component, a deep convolutional neural network (CNN) that receives the encoded proteins and predicts their interaction. Three different scenarios formulate the prediction task. In each scenario, the deep CNN predicts the class of a given encoded protein pair, where each class corresponds to a different range of confidence scores for whether the predicted interaction occurs. The proposed framework features significantly low computational complexity and a relatively fast response. The present study makes two significant contributions to the field of protein-protein interaction prediction. Firstly, it addresses the computational challenges posed by the high dimensionality of protein datasets through the use of dimensionality reduction techniques, which extract highly informative sequence attributes. Secondly, the proposed framework, ProtInteract, utilises this information to identify the interaction characteristics of a protein based on its amino acid configuration. ProtInteract encodes the protein's primary structure into a lower-dimensional vector space, thereby reducing the computational complexity of PPI prediction. Our results provide evidence of the proposed framework's accuracy and efficiency in predicting protein-protein interactions.
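The two-component pipeline described here (an autoencoder that compresses each sequence encoding, followed by a CNN that classifies a pair of compressed vectors) can be sketched compactly. The PyTorch code below is only a structural illustration under assumed layer sizes and input dimensions; it does not reproduce ProtInteract's actual architecture, sequence encoding, or class definitions.

```python
import torch
import torch.nn as nn

class SequenceAutoencoder(nn.Module):
    """Compresses a fixed-length numerical protein-sequence encoding into a
    low-dimensional latent vector and reconstructs it (illustrative sizes)."""
    def __init__(self, seq_len=512, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(seq_len, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, seq_len))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

class InteractionCNN(nn.Module):
    """Classifies a pair of encoded proteins into interaction-confidence classes."""
    def __init__(self, latent_dim=64, n_classes=3):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool1d(8))
        self.head = nn.Linear(16 * 8, n_classes)

    def forward(self, z_a, z_b):
        pair = torch.stack([z_a, z_b], dim=1)          # (batch, 2, latent_dim)
        return self.head(self.conv(pair).flatten(1))   # class logits

# Usage sketch with random tensors standing in for encoded sequences.
ae, clf = SequenceAutoencoder(), InteractionCNN()
x_a, x_b = torch.rand(4, 512), torch.rand(4, 512)
_, z_a = ae(x_a)
_, z_b = ae(x_b)
logits = clf(z_a, z_b)                                 # (4, 3) interaction-class scores
```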
