51 |
Learning Strategies in Multi-Agent Systems - Applications to the Herding Problem
Gadre, Aditya Shrikant, 14 December 2001
"Multi-Agent systems" is a topic for a lot of research, especially research involving strategy, evolution and cooperation among various agents. Various learning algorithm schemes have been proposed such as reinforcement learning and evolutionary computing.
In this thesis two solutions to a multi-agent herding problem are presented. One solution is based on Q-learning algorithm, while the other is based on modeling of artificial immune system.
Q-learning solution for the herding problem is developed, using region-based local learning for each individual agent. Individual and batch processing reinforcement algorithms are implemented for non-cooperative agents. Agents in this formulation do not share any information or knowledge. Issues such as computational requirements, and convergence are discussed.
An idiotopic artificial immune network is proposed that includes individual B-cell model for agents and T-cell model for controlling the interaction among these agents. Two network models are proposed--one for evolving group behavior/strategy arbitration and the other for individual action selection.
A comparative study of the Q-learning solution and the immune network solution is done on important aspects such as computation requirements, predictability, and convergence. / Master of Science
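The abstract does not include code, but the tabular update at the heart of such a Q-learning agent is simple. Below is a minimal sketch, assuming a grid world with four herder moves; the state encoding is a hypothetical stand-in for the thesis's region-based representation.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch; the herding-specific state encoding
# (e.g. which region the sheep occupies relative to the herder) is hypothetical.
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = ["N", "S", "E", "W"]  # herder moves on a grid

q = defaultdict(float)  # (state, action) -> estimated value

def choose_action(state):
    # Epsilon-greedy exploration over the discrete move set.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def update(state, action, reward, next_state):
    # Standard one-step Q-learning backup.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
```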
|
52 |
Joint radio and power resource optimal management for wireless cellular networks interconnected through smart grids / Optimisation conjointe d'une architecture de réseau cellulaire hétérogène et du réseau électrique intelligent associé
Mendil, Mouhcine, 08 October 2018
Face à l'explosion du trafic mobile entraînée par le succès des smartphones, les opérateurs de réseaux mobiles (MNOs) densifient leurs réseaux à travers le déploiement massif de stations de base à faible portée (SBS), capables d'offrir des services très haut débit et de remplir les exigences de capacité et de couverture. Cette nouvelle infrastructure, appelée réseau cellulaire hétérogène (HetNet), utilise un mix de stations de base hiérarchisées, comprenant des macro-cellules à forte puissance et des SBS à faible puissance. La prolifération des HetNets soulève une nouvelle préoccupation concernant leur consommation d'énergie et leur empreinte carbone. Dans ce contexte, l'utilisation de technologies de production d'énergie dans les réseaux mobiles a suscité un intérêt particulier. Les sources d'énergie respectueuses de l'environnement couplées à un système de stockage d'énergie ont le potentiel de réduire les émissions carbone ainsi que le coût opérationnel énergétique des MNOs. L'intégration des énergies renouvelables (panneau solaire) et du stockage d'énergie (batterie) dans un SBS gagne en efficacité grâce aux leviers technologiques et économiques apportés par le smart grid (SG). Cependant, l'architecture résultante, que nous appelons Green Small-Cell Base Station (GSBS), est complexe. Premièrement, la multitude de sources d'énergie, le phénomène de vieillissement du système et le prix dynamique de l'électricité dans le SG sont des facteurs qui nécessitent planification et gestion pour un fonctionnement plus efficace du GSBS. Deuxièmement, il existe une étroite dépendance entre le dimensionnement et le contrôle en temps réel du système, qui nécessite une approche commune capable de résoudre conjointement ces deux problèmes. Enfin, la gestion holistique d'un HetNet nécessite un schéma de contrôle à grande échelle pour optimiser simultanément les ressources énergétiques locales et la collaboration radio entre les SBSs. Par conséquent, nous avons élaboré un cadre d'optimisation pour le pré-déploiement et le post-déploiement du GSBS, afin de permettre aux MNOs de réduire conjointement leurs dépenses d'électricité et le vieillissement de leurs équipements. L'optimisation pré-déploiement consiste en un dimensionnement du GSBS qui tient compte du vieillissement de la batterie et de la stratégie de gestion des ressources énergétiques. Le problème associé est formulé et le dimensionnement optimal est approché en s'appuyant sur des profils moyens (production, consommation et prix de l'électricité) à travers une méthode itérative basée sur le solveur non-linéaire « fmincon ». Le schéma de post-déploiement repose sur des capacités d'apprentissage permettant d'ajuster dynamiquement la gestion énergétique du GSBS à son environnement (conditions météorologiques, charge de trafic et coût de l'électricité). La solution s'appuie sur le fuzzy Q-learning, qui consiste à combiner un système d'inférence floue avec l'algorithme Q-learning. Ensuite, nous formalisons un système d'équilibrage de charge capable d'étendre la gestion énergétique locale à une collaboration à l'échelle du réseau. Nous proposons à ce titre un algorithme en deux étapes, combinant des contrôleurs hiérarchiques au niveau du GSBS et au niveau du réseau. Les deux étapes s'alternent pour continuellement planifier et adapter la gestion de l'énergie à la collaboration radio dans le HetNet. Les résultats de simulation montrent que, en considérant le vieillissement de la batterie et l'impact mutuel de la conception du système sur la stratégie énergétique (et vice-versa), le dimensionnement optimal du GSBS est capable de maximiser le retour sur investissement. En outre, grâce à ses capacités d'apprentissage, le GSBS peut être déployé de manière plug-and-play, avec la possibilité de s'auto-organiser, d'améliorer le coût énergétique du système et de préserver la durée de vie de la batterie. /
Pushed by an unprecedented increase in data traffic, Mobile Network Operators (MNOs) are densifying their networks through the deployment of Small-cell Base Stations (SBSs), low-range radio-access transceivers that offer enhanced capacity and improved coverage. This new infrastructure, called a Heterogeneous cellular Network (HetNet), uses a hierarchy of high-power Macro-cell Base Stations overlaid with several low-power SBSs. The growing deployment and operation of HetNets raise a crucial new concern regarding their energy consumption and carbon footprint. In this context, the use of energy-harvesting technologies in mobile networks has gained particular interest. Environment-friendly power sources coupled with energy storage capabilities have the potential to reduce the carbon emissions as well as the electricity operating expenditures of MNOs. The integration of renewable energy (a solar panel) and energy storage (a battery) in SBSs gains in efficiency thanks to the technological and economic enablers brought by the Smart Grid (SG). However, the resulting architecture, which we call the Green Small-Cell Base Station (GSBS), is complex. First, the multitude of power sources, the system aging, and the dynamic electricity price in the SG are factors that require design and management for the GSBS to operate efficiently. Second, there is a close dependence between the system sizing and its control, which requires an approach that addresses both problems simultaneously. Finally, holistic management of a HetNet requires a network-level energy-aware scheme that jointly optimizes the local energy resources and the radio collaboration between the SBSs. Accordingly, we have elaborated pre-deployment and post-deployment optimization frameworks for GSBSs that allow MNOs to jointly reduce their electricity expenses and equipment degradation. The pre-deployment optimization consists of an effective sizing of the GSBS that accounts for battery aging and the associated management of the energy resources. The problem is formulated, and the optimal sizing is approximated using average profiles (production, consumption, and electricity price) through an iterative method based on the non-linear solver “fmincon”. The post-deployment scheme relies on learning capabilities to dynamically adjust the GSBS energy management to its environment (weather conditions, traffic load, and electricity cost). The solution is based on fuzzy Q-learning, which consists in tuning a fuzzy inference system (representing the energy arbitrage in the system) with the Q-learning algorithm. Then, we formalize an energy-aware load-balancing scheme to extend the local energy management to a network-level collaboration. We propose a two-stage algorithm to solve the formulated problem by combining hierarchical controllers at the GSBS level and at the network level. The two stages alternate to continuously plan and adapt the energy management to the radio collaboration in the HetNet. Simulation results show that, by considering battery aging and the mutual impact of the system design and the energy strategy, the optimal sizing of the GSBS maximizes the return on investment with respect to the technical and economic conditions of the deployment. Also, thanks to its learning capabilities, the GSBS can be deployed in a plug-and-play fashion, with the ability to self-organize, improve the system's operating energy cost, and preserve the battery lifespan.
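As an illustration of the fuzzy Q-learning idea this abstract describes (a fuzzy inference system whose consequents are tuned by Q-learning), here is a minimal sketch. The membership functions over battery state of charge, the action set, and all constants are assumptions, not values from the thesis.

```python
import random

# Fuzzy Q-learning sketch: rules fire with fuzzy strength, each rule holds
# Q-values for candidate actions, and the TD error is distributed to the
# fired rules in proportion to their firing strength. All names are illustrative.
ACTIONS = ["buy_grid", "idle", "use_battery"]  # hypothetical energy-arbitrage actions
ALPHA, GAMMA, EPS = 0.05, 0.9, 0.1

def memberships(battery_soc):
    # Three triangular fuzzy sets over state of charge in [0, 1].
    low = max(0.0, 1.0 - 2.0 * battery_soc)
    high = max(0.0, 2.0 * battery_soc - 1.0)
    mid = max(0.0, 1.0 - low - high)
    return {"low": low, "mid": mid, "high": high}

q = {(rule, a): 0.0 for rule in ("low", "mid", "high") for a in ACTIONS}

def act_and_update(soc, reward_fn, next_soc):
    mu = memberships(soc)
    # Each rule votes for an action (epsilon-greedy per rule).
    chosen = {r: (random.choice(ACTIONS) if random.random() < EPS
                  else max(ACTIONS, key=lambda a: q[(r, a)]))
              for r in mu}
    reward = reward_fn(chosen)  # caller-supplied, e.g. negative electricity cost
    # TD target uses the best consequent of each rule in the next state.
    mu2 = memberships(next_soc)
    v_next = sum(w * max(q[(r, a)] for a in ACTIONS) for r, w in mu2.items())
    v_now = sum(mu[r] * q[(r, chosen[r])] for r in mu)
    delta = reward + GAMMA * v_next - v_now
    for r, w in mu.items():  # credit each rule by its firing strength
        q[(r, chosen[r])] += ALPHA * w * delta
    return chosen
```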
|
53 |
深度增強學習在動態資產配置上之應用—以美國ETF為例 / The Application of Deep Reinforcement Learning on Dynamic Asset Allocation: A Case Study of U.S. ETFs
劉上瑋, Unknown Date
增強式學習(Reinforcement Learning)透過與環境不斷的互動來學習,以達到極大化每一期報酬的總和的目標,廣泛被運用於多期的決策過程。基於這些特性,增強式學習可以應用於建立需不斷動態調整投資組合配置比例的動態資產配置策略。
本研究應用Deep Q-Learning演算法建立動態資產配置策略，研究如何在每期不同的環境狀態之下，找出最佳的配置權重。採用2007年7月2日至2017年6月30日的美國中大型股的股票ETF及投資等級的債券ETF建立投資組合，以其日報酬率資料進行訓練，並與買進持有策略及固定比例投資策略比較績效，檢視深度增強式學習在動態資產配置適用性。 / Reinforcement learning learns by interacting with the environment continuously, aiming to maximize the sum of per-period returns, and has been widely used for multi-period decision-making problems. Because of these characteristics, reinforcement learning can be applied to build dynamic asset-allocation strategies that continually rebalance the portfolio mix.
In this study, we apply the deep Q-learning algorithm to build dynamic asset-allocation strategies, studying how to find the optimal weights under changing environment states. We use U.S. large- and mid-cap stock ETFs and investment-grade bond ETFs to build the portfolio, train the model on daily return data, and measure its performance against buy-and-hold and constant-mix strategies to assess the suitability of deep Q-learning for dynamic asset allocation.
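A minimal sketch of the discrete-action deep Q-learning setup such a study implies is given below: each action is a stock/bond weight pair, and a small PyTorch network maps recent return features to Q-values. The feature window, action grid, and architecture are assumptions, not the thesis's design.

```python
import random
import torch
import torch.nn as nn

# Discrete allocation grid: action w means (w/10) in stocks, the rest in bonds.
ACTIONS = [(w / 10, 1 - w / 10) for w in range(11)]  # (stock, bond) weights

class QNet(nn.Module):
    # Small MLP from a window of recent daily returns to one Q-value per action.
    def __init__(self, n_features=20, n_actions=len(ACTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def select_allocation(qnet, features, epsilon=0.1):
    # Epsilon-greedy over the discrete allocation grid.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    with torch.no_grad():
        qvals = qnet(features)
    return ACTIONS[int(qvals.argmax())]
```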
|
54 |
Bayesian Reinforcement Learning Methods for Network Intrusion Prevention
Nesti Lopes, Antonio Frederico, January 2021
A growing problem in network security stems from the fact that both attack methods and target systems constantly evolve. This problem makes it difficult for human operators to keep up with and manage the security problem. To deal with this challenge, a promising approach is to use reinforcement learning to adapt security policies to a changing environment. However, a drawback of this approach is that traditional reinforcement learning methods require a large amount of data in order to learn effective policies, which can be both costly and difficult to obtain. To address this problem, this thesis investigates ways to incorporate prior knowledge in learning systems for network security. Our goal is to be able to learn security policies with less data compared to traditional reinforcement learning algorithms. To investigate this question, we take a Bayesian approach and consider Bayesian reinforcement learning methods as a complement to current algorithms in reinforcement learning. Specifically, in this work, we study the following algorithms: Bayesian Q-learning, Bayesian REINFORCE, and Bayesian Actor-Critic. To evaluate our approach, we have implemented the mentioned algorithms and techniques and applied them to different simulation scenarios of intrusion prevention. Our results demonstrate that the Bayesian reinforcement learning algorithms are able to learn more efficiently compared to their non-Bayesian counterparts but that the Bayesian approach is more computationally demanding. Further, we find that the choice of prior and the kernel function have a large impact on the performance of the algorithms. / Ett växande problem inom cybersäkerhet är att både attackmetoder samt system är i en konstant förändring och utveckling: å ena sidan så blir attackmetoder mer och mer sofistikerade, och å andra sidan så utvecklas system via innovationer samt uppgraderingar. Detta problem gör det svårt för mänskliga operatörer att hantera säkerhetsproblemet. En lovande metod för att hantera denna utmaning är förstärkningslärande. Med förstärkningslärande kan en autonom agent automatiskt lära sig att anpassa säkerhetsstrategier till en föränderlig miljö. En utmaning med detta tillvägagångssätt är dock att traditionella förstärkningsinlärningsmetoder kräver en stor mängd data för att lära sig effektiva strategier, vilket kan vara både kostsamt och svårt att anskaffa. För att lösa detta problem så undersöker denna avhandling Bayesiska metoder för att inkorporera förkunskaper i inlärningsalgoritmen, vilket kan möjliggöra lärande med mindre data. Specifikt så studerar vi följande Bayesiska algoritmer: Bayesian Q-learning, Bayesian REINFORCE och Bayesian Actor-Critic. För att utvärdera vårt tillvägagångssätt har vi implementerat de nämnda algoritmerna och utvärderat deras prestanda i olika simuleringsscenarier för intrångsförebyggande samt analyserat deras komplexitet. Våra resultat visar att de Bayesiska förstärkningsinlärningsalgoritmerna kan användas för att lära sig strategier med mindre data än vad som krävs vid användande av icke-Bayesiska motsvarigheter, men att den Bayesiska metoden är mer beräkningskrävande. Vidare finner vi att metoden för att inkorporera förkunskap i inlärningsalgoritmen, samt val av kernelfunktion, har stor inverkan på algoritmernas prestanda.
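For illustration, a heavily simplified sketch of the Bayesian Q-learning idea follows: maintain a posterior over each Q-value and select actions by Thompson sampling. Proper Bayesian Q-learning (e.g. Dearden et al.) uses normal-gamma posteriors; fixing the observation variance, as done here, is a strong simplifying assumption.

```python
import math
import random
from collections import defaultdict

# Gaussian posterior over each Q(s, a) with known observation noise.
PRIOR_MEAN, PRIOR_VAR, OBS_VAR, GAMMA = 0.0, 1.0, 1.0, 0.95
posterior = defaultdict(lambda: (PRIOR_MEAN, PRIOR_VAR))  # (s, a) -> (mean, var)

def thompson_action(state, actions):
    # Sample a plausible Q-value from each posterior; act greedily on the samples.
    def sample(a):
        mean, var = posterior[(state, a)]
        return random.gauss(mean, math.sqrt(var))
    return max(actions, key=sample)

def update(state, action, reward, next_state, actions):
    # Treat the one-step return as a noisy observation of Q(s, a).
    mean_next = max(posterior[(next_state, a)][0] for a in actions)
    target = reward + GAMMA * mean_next
    mean, var = posterior[(state, action)]
    # Conjugate Gaussian update with known observation noise.
    new_var = 1.0 / (1.0 / var + 1.0 / OBS_VAR)
    new_mean = new_var * (mean / var + target / OBS_VAR)
    posterior[(state, action)] = (new_mean, new_var)
```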
|
55 |
A Deep Reinforcement Learning Approach for Dynamic Traffic Light Control with Transit Signal Priority
Nousch, Tobias; Zhou, Runhao; Adam, Django; Hirrle, Angelika; Wang, Meng, 23 June 2023
Traffic light control (TLC) with transit signal priority (TSP) is an effective way to deal with urban congestion and travel delay. The growing amount of available connected-vehicle data offers opportunities for signal control with transit priority, but conventional control algorithms fall short in fully exploiting those datasets. This paper proposes a novel approach for dynamic TLC with TSP at an urban intersection: a deep reinforcement learning based framework, JenaRL, designed to deal with complex real-world intersections. The optimisation focuses on TSP while balancing the delay of all vehicles. A two-layer state space is defined to capture the real-time traffic information, i.e. vehicle position, type and incoming lane. The discrete action space comprises the optimal phase and phase duration based on the real-time traffic situation. An intersection in the inner city of Jena is constructed in the open-source microscopic traffic simulator SUMO. A time-varying traffic demand of motorised individual traffic (MIT), the city's current TLC controller, and the original timetables of the public transport (PT) are implemented in the simulation to create a realistic traffic environment. Simulation results with the proposed framework indicate a significant enhancement in the performance of the traffic light controller, reducing the delay of all vehicles and especially minimising the loss time of PT.
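A sketch of the kind of two-layer state and discrete action space described above is given below; the field names, grid discretisation, and phase/duration values are illustrative assumptions, not the JenaRL implementation.

```python
from dataclasses import dataclass

@dataclass
class Vehicle:
    position_m: float   # distance to the stop line
    is_transit: bool    # buses/trams get signal priority
    incoming_lane: int

def encode_state(vehicles, cell_size_m=5.0, n_cells=20, n_lanes=8):
    # Layer 1: occupancy grid per lane; layer 2: transit-vehicle grid.
    occupancy = [[0] * n_cells for _ in range(n_lanes)]
    transit = [[0] * n_cells for _ in range(n_lanes)]
    for v in vehicles:
        cell = min(int(v.position_m / cell_size_m), n_cells - 1)
        occupancy[v.incoming_lane][cell] = 1
        if v.is_transit:
            transit[v.incoming_lane][cell] = 1
    return occupancy, transit

# Discrete actions: choose the next phase and how long to hold it.
PHASES = range(4)
DURATIONS_S = (5, 10, 15, 20)
ACTIONS = [(p, d) for p in PHASES for d in DURATIONS_S]
```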
|
56 |
Evaluation of Deep Q-Learning Applied to City Environment Autonomous Driving
Wedén, Jonas, January 2024
This project’s goal was to assess both the challenges of implementing the Deep Q-Learning algorithm to create an autonomous car in the CARLA simulator and the driving performance of the resulting model. An agent was trained to follow waypoints using two main approaches. The first was a camera-based approach, in which the agent gathered information about the environment from a camera sensor; the image, along with other driving features, was fed to a convolutional neural network. The second focused purely on following the waypoints without the camera sensor, which was replaced by an array containing the agent’s angles to the upcoming waypoints along with other driving features. Even though the camera-based approach performed best during evaluation, neither approach consistently followed the waypoints of a straight route. Increasing the performance of the camera-based approach would require more training episodes. Furthermore, both approaches would benefit greatly from experimentation with and optimization of the model’s neural network configuration and hyperparameters.
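A minimal sketch of the camera-free observation described above follows: the agent's signed heading errors to the upcoming waypoints plus a speed feature. The exact feature set and the number of waypoints are assumptions.

```python
import math

def waypoint_angles(vehicle_xy, vehicle_yaw_rad, waypoints_xy):
    # Signed angle between the vehicle's heading and each upcoming waypoint.
    angles = []
    for wx, wy in waypoints_xy:
        bearing = math.atan2(wy - vehicle_xy[1], wx - vehicle_xy[0])
        diff = (bearing - vehicle_yaw_rad + math.pi) % (2 * math.pi) - math.pi
        angles.append(diff)
    return angles

def build_observation(vehicle_xy, yaw, speed_mps, waypoints_xy):
    # Three upcoming waypoints plus current speed; a stand-in for the thesis's array.
    return waypoint_angles(vehicle_xy, yaw, waypoints_xy[:3]) + [speed_mps]
```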
|
57 |
Reinforcement Learning for Racecar Control
Cleland, Benjamin George, January 2006
This thesis investigates the use of reinforcement learning to learn to drive a racecar in the simulated environment of the Robot Automobile Racing Simulator. Real-life race driving is known to be difficult for humans, and expert human drivers use complex sequences of actions. There are a large number of variables, some of which change stochastically and all of which may affect the outcome. This makes driving a promising domain for testing and developing Machine Learning techniques that have the potential to be robust enough to work in the real world. Therefore the principles of the algorithms from this work may be applicable to a range of problems. The investigation starts by finding a suitable data structure to represent the information learnt. This is tested using supervised learning. Reinforcement learning is added and roughly tuned, and the supervised learning is then removed. A simple tabular representation is found satisfactory, and this avoids difficulties with more complex methods and allows the investigation to concentrate on the essentials of learning. Various reward sources are tested and a combination of three are found to produce the best performance. Exploration of the problem space is investigated. Results show exploration is essential but controlling how much is done is also important. It turns out the learning episodes need to be very long and because of this the task needs to be treated as continuous by using discounting to limit the size of the variables stored. Eligibility traces are used with success to make the learning more efficient. The tabular representation is made more compact by hashing and more accurate by using smaller buckets. This slows the learning but produces better driving. The improvement given by a rough form of generalisation indicates the replacement of the tabular method by a function approximator is warranted. These results show reinforcement learning can work within the Robot Automobile Racing Simulator, and lay the foundations for building a more efficient and competitive agent.
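As a rough illustration of the combination the thesis converges on (a hashed tabular representation with eligibility traces and discounting for a continuing task), here is a Sarsa(λ)-style sketch; the thesis does not specify the exact variant, and the action set, constants, and state features are placeholders.

```python
from collections import defaultdict

ALPHA, GAMMA, LAMBDA = 0.2, 0.99, 0.9
ACTIONS = ["left", "straight", "right"]
TABLE_SIZE = 2 ** 20  # hashing keeps the table compact, at some risk of collisions

q = defaultdict(float)
trace = defaultdict(float)

def key(state_features, action):
    # state_features must be hashable, e.g. a tuple of discretised sensor buckets.
    return hash((state_features, action)) % TABLE_SIZE

def step(state, action, reward, next_state, next_action):
    # Sarsa(lambda): compute the TD error, bump the visited pair's trace,
    # then update every traced entry and decay its eligibility.
    delta = reward + GAMMA * q[key(next_state, next_action)] - q[key(state, action)]
    trace[key(state, action)] += 1.0  # accumulating trace
    for k in list(trace):
        q[k] += ALPHA * delta * trace[k]
        trace[k] *= GAMMA * LAMBDA
        if trace[k] < 1e-4:
            del trace[k]  # drop negligible traces to bound memory
```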
|
58 |
Reinforcement Learning Control with Approximation of Time-Dependent Agent Dynamics
Kirkpatrick, Kenton, 03 October 2013
Reinforcement Learning has received a lot of attention over the years for systems ranging from static game playing to dynamic system control. Using Reinforcement Learning for control of dynamical systems provides the benefit of learning a control policy without needing a model of the dynamics. This opens the possibility of controlling systems for which the dynamics are unknown, but Reinforcement Learning methods like Q-learning do not explicitly account for time. In dynamical systems, time-dependent characteristics can have a significant effect on the control of the system, so it is necessary to account for system time dynamics while not having to rely on a predetermined model for the system.
In this dissertation, algorithms are investigated for expanding the Q-learning algorithm to account for the learning of sampling rates and dynamics approximations. For determining a proper sampling rate, it is desired to find the largest sample time that still allows the learning agent to control the system to goal achievement. An algorithm called Sampled-Data Q-learning is introduced for determining both this sample time and the control policy associated with that sampling rate. Results show that the algorithm is capable of achieving a desired sampling rate that allows for system control while not sampling “as fast as possible”.
Determining an approximation of an agent’s dynamics can be beneficial for the control of hierarchical multiagent systems by allowing a high-level supervisor to use the dynamics approximations for task allocation decisions. To this end, algorithms are investigated for learning first- and second-order dynamics approximations. These algorithms are respectively called First-Order Dynamics Learning and Second-Order Dynamics Learning. The dynamics learning algorithms are evaluated on several examples that show their capability to learn accurate approximations of state dynamics. All of these algorithms are then evaluated on hierarchical multiagent systems for determining task allocation. The results show that the algorithms successfully determine appropriate sample times and accurate dynamics approximations for the agents investigated.
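As a minimal illustration of first-order dynamics approximation from sampled data, the sketch below fits x_{k+1} = a·x_k + b·u_k by least squares and recovers a continuous-time time constant and gain. It illustrates the idea only, under these simplifying assumptions, and is not the dissertation's First-Order Dynamics Learning algorithm.

```python
import numpy as np

def fit_first_order(xs, us, dt):
    # Least-squares fit of the discrete model x_{k+1} = a*x_k + b*u_k.
    x_now, u_now, x_next = np.array(xs[:-1]), np.array(us[:-1]), np.array(xs[1:])
    A = np.column_stack([x_now, u_now])
    (a, b), *_ = np.linalg.lstsq(A, x_next, rcond=None)
    # For a sampled first-order system, a = exp(-dt/tau), so tau = -dt/ln(a),
    # and the steady-state gain is b / (1 - a).
    tau = -dt / np.log(a) if 0 < a < 1 else float("inf")
    gain = b / (1 - a) if a != 1 else float("inf")
    return tau, gain
```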
|
59 |
Coordinating transportation services in a hospital environment using Deep Reinforcement Learning
Lundström, Caroline; Hedberg, Sara, January 2018
Artificial Intelligence has become a popular subject in recent years, thanks largely to progress in Machine Learning and particularly to the achievements made using Deep Learning. By combining Reinforcement Learning and Deep Learning, an agent can learn a successful behavior for a given environment, which has opened up a new domain of optimization. This thesis evaluates whether a Deep Reinforcement Learning agent can learn to aid transportation services in a hospital environment. A Deep Q-learning Network (DQN) algorithm is implemented, and its performance is evaluated against a linear regression agent, a random agent, and a smart agent. The results indicate that an agent can learn to aid transportation services in a hospital environment, although it does not outperform linear regression on the most difficult task. For the more complex tasks, the agent's learning process is unstable, and implementing a Double Deep Q-learning Network may stabilize it. The overall conclusion is that Deep Reinforcement Learning can perform well on these types of problems, and further applied research may lead to greater innovations.
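For reference, the difference between the plain DQN target and the Double DQN target mentioned above is small; a sketch follows (terminal-state masking omitted for brevity, and the network interfaces are assumed, not taken from the thesis).

```python
import torch

# Both functions expect batched rewards of shape [B] and networks that map a
# batch of states to Q-values of shape [B, n_actions].
def dqn_target(reward, next_state, online, target, gamma=0.99):
    # Plain DQN: the target network both selects and evaluates the next action.
    return reward + gamma * target(next_state).max(dim=1).values

def double_dqn_target(reward, next_state, online, target, gamma=0.99):
    # Double DQN: the online network selects, the target network evaluates,
    # which reduces the overestimation that can destabilise learning.
    best = online(next_state).argmax(dim=1, keepdim=True)
    return reward + gamma * target(next_state).gather(1, best).squeeze(1)
```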
|
60 |
Soluções de coexistência LTE/Wi-Fi em banda não licenciada
Santana, Pedro Maia de, 07 December 2017
Este trabalho tem como objetivo realizar um estudo sobre a aplicação de redes LTE no espectro ISM (Industrial, Scientific and Medical) e seu consequente impacto sobre tecnologias comumente coexistentes na mesma faixa de frequência. Inicialmente, é realizada uma elucidação teórica sobre as regulamentações que envolvem o uso de espectro não-licenciado. Na sequência, são apresentadas as principais soluções de coexistência do LTE nesse meio, destacando-se o mecanismo recentemente padronizado pelo 3GPP, o LTE-LBT, e tecnologias específicas de empresas pioneiras na área, tais como a solução LTE-DC. Como elemento prático complementar à investigação teórica inicial, são desenvolvidas análises de desempenho das respectivas soluções utilizando o simulador ns-3. A novidade do trabalho é materializada pela apresentação de uma proposta de solução para o mecanismo Carrier-Sensing Adaptive Transmission (CSAT). Essa solução, baseada em aprendizado de máquina, visa melhorar o desempenho conjunto dos sistemas que coexistem na faixa ISM. Este trabalho também propõe uma solução de coexistência do LTE-DC consigo próprio a partir de uma abordagem utilizando teoria dos jogos. Essas soluções são comparadas com as soluções clássicas e os seus ganhos são evidenciados em cenários definidos por órgãos de padronização mundial. /
This work performs a study of the application of LTE networks in the ISM (Industrial, Scientific and Medical) spectrum and their impact on technologies that commonly coexist in the same frequency range. Initially, a theoretical overview is given of the regulations governing unlicensed spectrum use. Next, the main LTE coexistence solutions in this field are presented, highlighting the mechanism recently standardized by 3GPP, LTE-LBT, and specific technologies from pioneering companies in this domain, such as the LTE-DC solution. As a practical complement to the initial theoretical investigation, performance analyses of the respective solutions are developed using the ns-3 simulator. The novelty of the work lies in the proposal of a solution for the Carrier-Sensing Adaptive Transmission (CSAT) mechanism. This solution, based on machine learning, aims to improve the joint performance of the systems that coexist in the ISM band. This work also proposes a solution for LTE-DC self-coexistence based on a game-theoretic approach. These solutions are compared with the classical ones, and their gains are demonstrated in scenarios defined by global standardization bodies.
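As an illustration of a learning-based CSAT controller in the spirit described above, the sketch below has the LTE node sense Wi-Fi channel occupancy and learn a transmit duty cycle with tabular Q-learning; the states, actions, and joint-throughput reward are assumptions, not the dissertation's design.

```python
import random
from collections import defaultdict

# CSAT adapts the fraction of time LTE transmits (its duty cycle) based on
# sensed Wi-Fi activity; here a tabular Q-learner picks the duty cycle.
DUTY_CYCLES = [0.2, 0.4, 0.6, 0.8]
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
q = defaultdict(float)

def occupancy_bucket(wifi_busy_fraction):
    # Coarse sensing state: quantise observed Wi-Fi channel occupancy.
    return min(int(wifi_busy_fraction * 4), 3)

def pick_duty_cycle(state):
    if random.random() < EPS:
        return random.choice(DUTY_CYCLES)
    return max(DUTY_CYCLES, key=lambda d: q[(state, d)])

def update(state, duty, lte_tput, wifi_tput, next_state):
    # Joint-performance objective: reward both LTE and Wi-Fi throughput.
    reward = lte_tput + wifi_tput
    best_next = max(q[(next_state, d)] for d in DUTY_CYCLES)
    q[(state, duty)] += ALPHA * (reward + GAMMA * best_next - q[(state, duty)])
```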
|