  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

Complementary Layered Learning

Mondesire, Sean 01 January 2014 (has links)
Layered learning is a machine learning paradigm used to develop autonomous robotic-based agents by decomposing a complex task into simpler subtasks and learning each sequentially. Although the paradigm continues to have success in multiple domains, performance can be unexpectedly unsatisfactory. Using Boolean-logic problems and autonomous agent navigation, we show that poor performance is due to the learner either forgetting earlier learned subtasks too quickly (favoring plasticity) or having difficulty learning new things (favoring stability). We demonstrate that this imbalance can hinder learning so that task performance is no better than that of a suboptimal learning technique, monolithic learning, which does not use decomposition. Through the resulting analyses, we have identified factors that can lead to imbalance and their negative effects, providing a deeper understanding of stability and plasticity in decomposition-based approaches such as layered learning. To combat the negative effects of the imbalance, a complementary learning system is applied to layered learning. The new technique augments the original learning approach with dual storage region policies to prevent useful information from being removed from an agent's policy prematurely. Through multi-agent experiments, a 28% task performance increase is obtained with the proposed augmentations over the original technique.
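
To make the dual-store idea concrete, here is a hedged, illustrative sketch (not the dissertation's implementation) of a tabular learner that keeps a fast "plastic" Q store alongside a slowly consolidated "stable" store while subtasks are trained one after another, layered-learning style. The class name, blending weights and consolidation rate are hypothetical.

```python
# Illustrative sketch only: a dual-store tabular learner trained subtask by subtask.
import random
from collections import defaultdict

class ComplementaryLearner:
    """Tabular learner with a fast (plastic) and a slow (stable) Q store."""
    def __init__(self, actions, alpha=0.1, gamma=0.9, consolidation=0.05):
        self.actions = actions
        self.fast = defaultdict(float)   # updated every step (plasticity)
        self.slow = defaultdict(float)   # consolidated slowly (stability)
        self.alpha, self.gamma, self.tau = alpha, gamma, consolidation

    def q(self, s, a):
        # Acting uses a blend of both stores so earlier subtask skills persist.
        return 0.5 * self.fast[(s, a)] + 0.5 * self.slow[(s, a)]

    def act(self, s, eps=0.1):
        if random.random() < eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q(s, a))

    def update(self, s, a, r, s_next):
        target = r + self.gamma * max(self.q(s_next, b) for b in self.actions)
        self.fast[(s, a)] += self.alpha * (target - self.fast[(s, a)])
        # Slow consolidation protects older knowledge from abrupt overwriting.
        self.slow[(s, a)] += self.tau * (self.fast[(s, a)] - self.slow[(s, a)])

# Toy usage: two "subtasks" with conflicting rewards, trained sequentially.
learner = ComplementaryLearner(actions=[0, 1])
for subtask_reward in ([1.0, 0.0], [0.0, 1.0]):   # subtask 1, then subtask 2
    for _ in range(500):
        a = learner.act("start")
        learner.update("start", a, subtask_reward[a], "done")
```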
62

Control of an Inverted Pendulum Using Reinforcement Learning Methods

Kärn, Joel January 2021 (has links)
In this paper the two reinforcement learning algorithms Q-learning and deep Q-learning (DQN) are used to balance an inverted pendulum. In order to compare the two, both algorithms are optimized to some extent by evaluating different values for some of their parameters. Since the difference between Q-learning and DQN is a deep neural network (DNN), some benefits of a DNN are then discussed. The conclusion is that this particular problem is simple enough for the Q-learning algorithm to work well and is preferable, even though the DQN algorithm solves the problem in fewer episodes. This is due to the stability of the Q-learning algorithm and because more time is required to find a suitable DNN and evaluate appropriate parameters for the DQN algorithm than to find the proper parameters for the Q-learning algorithm. / Bachelor's thesis in electrical engineering (kandidatexjobb i elektroteknik) 2021, KTH, Stockholm
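
As a quick illustration of the tabular method this record compares against DQN, the sketch below implements a standard Q-learning update with an epsilon-greedy policy on a crudely discretised inverted pendulum. The dynamics, state bins, reward and hyperparameters are assumptions made for illustration, not the thesis's actual setup.

```python
# Minimal tabular Q-learning sketch for balancing an inverted pendulum.
import math, random
from collections import defaultdict

Q = defaultdict(float)
torques = [-2.0, 0.0, 2.0]            # assumed action set (applied torque)
alpha, gamma, eps = 0.1, 0.99, 0.1

def discretise(theta, omega):
    # Coarse bins over angle (rad) and angular velocity (rad/s).
    return (int(theta * 10), int(omega * 5))

def step(theta, omega, torque, dt=0.02):
    # Pendulum about the upright position (m = l = 1): theta'' = g*sin(theta) + torque.
    omega += (9.81 * math.sin(theta) + torque) * dt
    theta += omega * dt
    return theta, omega

for episode in range(2000):
    theta, omega = random.uniform(-0.1, 0.1), 0.0     # start near upright
    for t in range(200):
        s = discretise(theta, omega)
        a = random.randrange(3) if random.random() < eps else \
            max(range(3), key=lambda i: Q[(s, i)])
        theta, omega = step(theta, omega, torques[a])
        done = abs(theta) > 0.5                        # pendulum has fallen
        r = -1.0 if done else 1.0                      # reward for staying upright
        s2 = discretise(theta, omega)
        best_next = 0.0 if done else max(Q[(s2, i)] for i in range(3))
        # Q-learning update: Q(s,a) <- Q(s,a) + alpha * (r + gamma*max_a' Q(s',a') - Q(s,a))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        if done:
            break
```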
63

Energy Efficiency of 5G Radio Access Networks

Peesapati, Saivenkata Krishna Gowtam January 2020 (has links)
The roll-out of fifth-generation (5G) wireless networks alongside existing generations, characterized by a dense deployment of base stations (BSs) to serve an ever-increasing number of users and services, leads to a drastic increase in overall network energy consumption (EC). This can lead to an unprecedented rise in operational expenditure (OPEX) for network operators and an increased global carbon footprint. Present-day networks are dimensioned for peak traffic demand and hence are under-utilized due to daily traffic variations. Therefore, to save energy, BSs can be put into sleep modes of different depths following the daily load variations. Selecting the right sleep level at the right instant is important for adapting resource availability to the traffic load, maximizing energy savings without degrading network performance. Previous studies focused on the selection of sleep modes (SMs) to maximize energy savings or sleep duration for a given configuration and set of network resources; however, adaptive BS configuration together with SMs has not been investigated. In this thesis, the goal is to consider the design of wireless network resources to cover an area with a given traffic demand, in combination with sleep mode management. To achieve this, a novel EC model is proposed to capture the activity time of a 5G BS in a multi-cell environment. The activity factor of a BS is defined as the fraction of time the BS is transmitting over a fixed period and depends on the amount of BS resources. The new model captures the variation in power consumption when configuring three BS resources: 1) the active array size, 2) the bandwidth, and 3) the spatial multiplexing factor (the number of users scheduled simultaneously). We then implement a Q-learning algorithm that adapts these resources, and the selection of sleep levels, to the traffic demand. Our results show that the difference in the average daily EC of the BSs considered can be as high as 60% depending on the deployment area. Furthermore, the EC of a BS can be reduced by 57% during low-traffic hours by using deeper sleep levels, compared to the baseline scenario with no sleep modes. Implementing the resource adaptation algorithm further reduces the average EC of the BS by up to 20% compared to the case without resource adaptation. However, the energy-efficiency (EE) gain obtained by the algorithm depends on its convergence, which varies with the distribution of users in the cell, the peak traffic demand, and the BS resources available. Our results show that by combining resource adaptation with deep sleep levels, one can obtain significant energy savings under variable traffic load. However, to ensure the reliability of the results, we emphasize the need to guarantee the convergence of the algorithm before it is used for resource adaptation.
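
A hedged sketch of the kind of Q-learning loop described here: the state is a point in the daily load profile, the actions are either a BS configuration (array size, bandwidth, multiplexing factor) or a sleep level, and the reward trades energy against unserved traffic. The power and capacity figures, state encoding and reward weights are placeholders, not the thesis's energy-consumption model.

```python
# Illustrative Q-learning over BS resource configurations and sleep levels.
import random, itertools
from collections import defaultdict

antenna_sets   = [16, 32, 64]          # active array size (assumed options)
bandwidths_mhz = [20, 50, 100]
mux_factors    = [1, 2, 4]             # users multiplexed per slot
sleep_levels   = [1, 2, 3]             # deeper level = deeper sleep

actions = [("cfg",) + c for c in itertools.product(antenna_sets, bandwidths_mhz, mux_factors)]
actions += [("sleep", lvl) for lvl in sleep_levels]

def reward(load, action):
    """Negative energy minus a penalty for traffic the chosen action cannot serve."""
    if action[0] == "sleep":
        power = 50.0 / action[1]                 # deeper sleep -> lower power (toy numbers)
        unserved = load                          # a sleeping BS serves nothing
    else:
        _, ant, bw, mux = action
        capacity = ant * bw * mux / 640.0        # toy capacity proxy (normalised)
        power = 100.0 + 2.0 * ant + 1.0 * bw     # toy power model
        unserved = max(0.0, load - capacity)
    return -0.01 * power - 5.0 * unserved

Q = defaultdict(float)
alpha, gamma, eps = 0.2, 0.9, 0.1
load_profile = [0.1, 0.1, 0.3, 0.8, 1.0, 0.9, 0.6, 0.2]   # normalised daily load

for day in range(500):
    for hour, load in enumerate(load_profile):
        s = (hour,)                              # state: time of day as a load proxy
        a = random.choice(actions) if random.random() < eps else \
            max(actions, key=lambda x: Q[(s, x)])
        r = reward(load, a)
        s2 = ((hour + 1) % len(load_profile),)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in actions) - Q[(s, a)])
```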
64

Agentes-Q: um algoritmo de roteamento distribuído e adaptativo para redes de telecomunicações / Q-Agents: an adaptive and distributed routing algorithm for telecommunications networks

Vittori, Karla 14 April 2000 (has links)
Telecommunications networks are responsible for transmitting information between source and destination points in a fast, secure and reliable way, providing low-cost and high-quality services. Among the several components that take part in this process is the routing system, which selects the routes to be traversed by messages through the network and forwards them to the desired destination. Advances in the technologies used by telecommunications networks have created the need for new routing systems that can cope with the situations faced by current networks. Hence, this research project developed an adaptive and distributed routing algorithm resulting from the integration of three learning strategies and the addition of some extra mechanisms, with the goal of obtaining an algorithm that is efficient and robust to variations in network operating conditions. The approaches chosen were Q-learning, dual reinforcement learning and learning based on the collective behavior of ants. The developed algorithm was applied to two circuit-switching telecommunications networks and its performance was compared with that of two algorithms based on ant-colony behavior, which had been applied successfully to the routing problem. The experiments covered real situations faced by telecommunications networks, such as variations in traffic patterns, load level and topology. Moreover, tests were performed with noise present in the information used to select the routes to be traversed by calls. The proposed algorithm produced better results than the others, showing a greater capacity to adapt to the situations considered. The experiments also showed that new optimization mechanisms must be added to the algorithm to improve its exploratory behavior under permanent variations in network load and in the presence of noise in the data it uses.
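
For readers unfamiliar with Q-routing, the sketch below shows the per-node update that this family of algorithms builds on, including a backward ("dual reinforcement") update; the ant-inspired mechanisms of the Q-Agents algorithm are not reproduced, and the interface and learning rate are assumptions.

```python
# Sketch of a Q-routing style update (illustrative only).
from collections import defaultdict

class QRouter:
    """Per-node table Q[(dest, neighbour)] ~ estimated time to deliver via that neighbour."""
    def __init__(self, neighbours, alpha=0.5):
        self.neighbours = neighbours
        self.alpha = alpha
        self.Q = defaultdict(float)

    def best_neighbour(self, dest):
        return min(self.neighbours, key=lambda n: self.Q[(dest, n)])

    def forward_update(self, dest, via, link_delay, neighbour_estimate):
        # Forward exploration: after sending to `via`, it reports its own best
        # remaining-time estimate toward dest.
        target = link_delay + neighbour_estimate
        self.Q[(dest, via)] += self.alpha * (target - self.Q[(dest, via)])

    def backward_update(self, source, via, link_delay, neighbour_estimate):
        # Dual reinforcement: the same packet also carries information about the
        # path back to its source, so the reverse direction is updated for free.
        target = link_delay + neighbour_estimate
        self.Q[(source, via)] += self.alpha * (target - self.Q[(source, via)])

# Usage: node A with neighbours B and C learns that destination D is best reached via B.
a = QRouter(neighbours=["B", "C"])
a.forward_update(dest="D", via="B", link_delay=1.0, neighbour_estimate=2.0)
a.forward_update(dest="D", via="C", link_delay=1.0, neighbour_estimate=6.0)
print(a.best_neighbour("D"))   # -> "B"
```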
65

Synthèse de comportements par apprentissages par renforcement parallèles : application à la commande d'un micromanipulateur plan / Behavior synthesis through parallel reinforcement learning: application to the control of a planar micromanipulator

Laurent, Guillaume 18 December 2002 (has links) (PDF)
In microrobotics, controlling systems is difficult because the physical phenomena at the microscopic scale are complex. Reinforcement learning methods are an attractive approach because they make it possible to establish a control strategy without a priori knowledge of the system. Given the large state spaces of the systems studied, we developed a parallel approach inspired both by behavior-based architectures and by reinforcement learning. This architecture, based on parallelizing the Q-learning algorithm, reduces the complexity of the system and speeds up learning. On a simple maze application, the results obtained are good, but the learning time is too long to consider controlling a real system. Q-learning was therefore replaced by the Dyna-Q algorithm, which we adapted to the control of non-deterministic systems by adding a history of the most recent transitions. This architecture, named parallel Dyna-Q, not only improves the speed of convergence but also finds better control strategies. Experiments on the manipulation system show that learning then becomes possible in real time and without using simulation. The behavior-coordination function is effective when obstacles are relatively far apart. When they are not, this function can create local maxima that temporarily trap the system in a cycle. We therefore designed another coordination function that builds a more global model of the system from the transition model constructed by Dyna-Q. This new coordination function escapes local maxima very effectively, provided that the mapping function used by the architecture is robust.
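
As an illustration of the Dyna-Q variant described (planning from a learned model, with a history of recent transitions to cope with non-deterministic dynamics), here is a hedged sketch; the parallel behavior-based architecture and its coordination function are not shown, and all parameter values are assumptions.

```python
# Hedged sketch of Dyna-Q with a bounded history of recent transitions per (state, action).
import random
from collections import defaultdict, deque

class DynaQ:
    def __init__(self, actions, alpha=0.1, gamma=0.95, planning_steps=20, history=5):
        self.actions = actions
        self.alpha, self.gamma, self.n = alpha, gamma, planning_steps
        self.Q = defaultdict(float)
        # Model: for each (s, a) keep only the last `history` observed outcomes,
        # so non-deterministic transitions are replayed in proportion to recency.
        self.model = defaultdict(lambda: deque(maxlen=history))

    def act(self, s, eps=0.1):
        if random.random() < eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def _td_update(self, s, a, r, s2):
        best = max(self.Q[(s2, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best - self.Q[(s, a)])

    def learn(self, s, a, r, s2):
        self._td_update(s, a, r, s2)            # direct RL step
        self.model[(s, a)].append((r, s2))      # remember the observed outcome
        # Planning: replay sampled outcomes from previously visited pairs.
        for _ in range(self.n):
            ps, pa = random.choice(list(self.model.keys()))
            pr, ps2 = random.choice(self.model[(ps, pa)])
            self._td_update(ps, pa, pr, ps2)

# Hypothetical parallel use: one learner per elementary behavior.
agents = [DynaQ(actions=["left", "right", "stop"]) for _ in range(4)]
```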
66

Paramétrage Dynamique et Optimisation Automatique des Réseaux Mobiles 3G et 3G+ / Dynamic Parameterization and Automatic Optimization of 3G and Beyond-3G Mobile Networks

Nasri, Ridha 23 January 2009 (has links) (PDF)
Mobile radio telecommunications are currently undergoing a major evolution in terms of the diversity of technologies and of services delivered to the end user. This diversity makes cellular networks more complex, and manual optimization of their parameter settings is becoming increasingly complicated and costly. As a consequence, network operating costs rise correspondingly for operators. It is therefore essential to simplify and automate these tasks, which will reduce the resources devoted to manual network optimization. Moreover, by automatically optimizing deployed mobile networks in this way, it becomes possible to postpone network densification and the acquisition of new sites. Automatic, optimal parameterization will thus also spread out, or even reduce, network investment and maintenance costs. This thesis introduces new methods for the automatic parameterization (auto-tuning) of RRM (Radio Resource Management) algorithms in 3G and beyond-3G mobile networks. Auto-tuning is a process that uses control tools such as fuzzy-logic and reinforcement-learning controllers. It adjusts the parameters of the RRM algorithms in order to adapt the network to traffic fluctuations. Auto-tuning operates as an optimal regulation loop driven by a controller fed with the network's quality indicators. To find the optimal network parameter settings, the controller maximizes a utility function, also called a reinforcement function. Four case studies are described in this thesis. First, auto-tuning of the radio resource allocation algorithm is presented. To give priority to users of the real-time service (voice), a guard band is reserved for them. However, when real-time traffic is low, it is important to exploit this resource for other services. Auto-tuning thus achieves an optimal trade-off in the perceived quality of each service by adapting the reserved resources to the traffic of each service class. The second case is the automatic and dynamic optimization of the parameters of the soft handover algorithm in UMTS. For soft handover auto-tuning, a controller is logically implemented at the RNC and automatically adjusts the handover thresholds according to the radio load of each cell and of its neighbors. This approach balances the radio load between cells and thereby implicitly increases network capacity. Simulations show that adapting the soft handover thresholds in UMTS increases capacity by 30% compared with fixed settings. The mobility auto-tuning approach for UMTS is then extended to LTE (3GPP Long Term Evolution) systems, but in this case the auto-tuning relies on a pre-built auto-tuning function. Adapting the handover margins in LTE smooths inter-cell interference and thus increases the throughput perceived by each user of the network. Finally, an adaptive mobility algorithm between the UMTS and WLAN technologies is proposed. The algorithm is governed by two thresholds: the first is responsible for handover from UMTS to WLAN and the other for handover in the opposite direction. Adapting these two thresholds allows optimal joint exploitation of the resources available in the two technologies. Simulation results for a multi-system network also show a significant capacity gain.
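
A hedged sketch of the closed-loop auto-tuning idea: a controller observes a quality indicator, nudges a handover margin, and is driven by a reinforcement (utility) signal. For brevity this uses a simple Q-learning controller and a toy load model rather than the thesis's fuzzy-logic controllers; every number below is a placeholder.

```python
# Illustrative auto-tuning loop: a Q-learning agent adjusts a cell's handover margin
# from an observed load-imbalance indicator, maximising a utility (reinforcement) signal.
import random
from collections import defaultdict

margins = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # candidate handover margins (dB, assumed)
moves = [-1, 0, +1]                            # lower / keep / raise the margin index
Q = defaultdict(float)
alpha, gamma, eps = 0.2, 0.8, 0.1

def observe_imbalance(margin):
    """Placeholder network response: a larger margin keeps more traffic in the
    serving cell, so load imbalance with its neighbour grows (plus noise)."""
    return min(1.0, 0.1 * margin + random.uniform(-0.05, 0.05))

idx = 3
for step in range(5000):
    imbalance = observe_imbalance(margins[idx])
    state = (idx, round(imbalance, 1))
    a = random.randrange(3) if random.random() < eps else \
        max(range(3), key=lambda i: Q[(state, i)])
    idx = min(len(margins) - 1, max(0, idx + moves[a]))
    new_imbalance = observe_imbalance(margins[idx])
    utility = -new_imbalance                   # reinforcement: balanced load is good
    state2 = (idx, round(new_imbalance, 1))
    best = max(Q[(state2, i)] for i in range(3))
    Q[(state, a)] += alpha * (utility + gamma * best - Q[(state, a)])
```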
67

Melhoria na convergência do algoritmo Q-Learning na aplicação de sistemas tutores inteligentes / Improving the convergence of the Q-learning algorithm as applied to intelligent tutoring systems

Paiva, Éverton de Oliveira 16 August 2016
Using computer systems as a complement or replacement for the classroom experience is an increasingly common practice in education, and Intelligent Tutoring Systems (ITS) are one of these alternatives. It is therefore crucial to develop ITSs that are capable of both teaching and learning relevant information about the student through artificial intelligence techniques. This learning process occurs by means of direct, and generally slow, interaction between the ITS and the student. This dissertation presents the insertion of the Tabu search and GRASP meta-heuristics with the purpose of accelerating this learning. An ITS simulator was developed to evaluate the performance of this change. Computer simulations were conducted to compare the performance of the traditional random exploration policy with the proposed Tabu search and GRASP meta-heuristics. The results obtained from these simulations and the statistical tests applied strongly indicate that introducing suitable meta-heuristics into the exploration policy improves the performance of the learning algorithm in ITSs. / Dissertation (Professional Master's), Programa de Pós-Graduação em Educação, Universidade Federal dos Vales do Jequitinhonha e Mucuri, 2016.
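
To illustrate the idea of replacing random exploration with a tabu-guided choice, here is a hedged sketch of epsilon-greedy Q-learning whose exploration branch avoids recently tried (state, action) pairs; the ITS actions, tabu-list size and parameters are hypothetical and do not come from the dissertation's simulator.

```python
# Sketch: Q-learning in which the exploration branch is guided by a tabu list.
import random
from collections import defaultdict, deque

Q = defaultdict(float)
tabu = deque(maxlen=50)                  # recently visited (state, action) pairs
actions = ["easy_exercise", "medium_exercise", "hard_exercise"]  # hypothetical ITS actions
alpha, gamma, eps = 0.1, 0.9, 0.3

def choose(state):
    if random.random() < eps:
        # Exploration: prefer actions not in the tabu list for this state.
        candidates = [a for a in actions if (state, a) not in tabu] or actions
        return random.choice(candidates)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    tabu.append((state, action))
    best = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best - Q[(state, action)])

# Toy usage: one interaction step with a (hypothetical) student-model state.
a = choose("student_struggling")
update("student_struggling", a, reward=1.0, next_state="student_mastered_topic")
```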
68

Algoritmo Q-learning como estratégia de exploração e/ou explotação para metaheurísticas GRASP e algoritmo genético / The Q-learning algorithm as an exploration/exploitation strategy for the GRASP metaheuristic and the genetic algorithm

Lima Júnior, Francisco Chagas de 20 March 2009
Optimization techniques known as metaheuristics have achieved success in the resolution of many problems classified as NP-hard. These methods use non-deterministic approaches that reach very good solutions but do not guarantee the determination of the global optimum. Beyond the inherent difficulties related to the complexity that characterizes optimization problems, metaheuristics still face the dilemma of exploration/exploitation, which consists of choosing between a greedy search and a wider exploration of the solution space. A way to guide such algorithms during the search for better solutions is to supply them with more knowledge of the problem through the use of an intelligent agent able to recognize promising regions and to identify when the direction of the search should be diversified. In this way, this work proposes the use of a reinforcement learning technique, the Q-learning algorithm, as an exploration/exploitation strategy for the GRASP (Greedy Randomized Adaptive Search Procedure) and Genetic Algorithm metaheuristics. The GRASP metaheuristic uses Q-learning instead of the traditional greedy-random algorithm in the construction phase. This replacement has the purpose of improving the quality of the initial solutions that are used in the local search phase of GRASP, and it also provides the metaheuristic with an adaptive memory mechanism that allows the reuse of good previous decisions and avoids the repetition of bad ones. In the Genetic Algorithm, the Q-learning algorithm was used to generate an initial population of high fitness and, after a determined number of generations, whenever the diversity rate of the population falls below a certain limit L, it is also applied to supply one of the parents used in the genetic crossover operator. Another significant change in the hybrid genetic algorithm is the proposal of a mutually interactive cooperation process between the genetic operators and the Q-learning algorithm. In this interactive/cooperative process, the Q-learning algorithm receives an additional update to the matrix of Q-values based on the current best solution of the Genetic Algorithm. The computational experiments presented in this thesis compare the results obtained with traditional versions of the GRASP metaheuristic and Genetic Algorithm with those obtained using the proposed hybrid methods. Both algorithms were applied successfully to the symmetric Traveling Salesman Problem, which was modeled as a Markov decision process.
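
The sketch below illustrates, under stated assumptions, how Q-values can replace the greedy-random choice in GRASP's construction phase for the symmetric TSP: the state is simplified to the current city, the reward is the negative edge cost, and the local-search phase is omitted. It is not the thesis's implementation.

```python
# Illustrative Q-guided GRASP construction for a toy symmetric TSP instance.
import random
from collections import defaultdict

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def q_guided_construction(n_cities, Q, eps=0.2):
    tour = [0]
    while len(tour) < n_cities:
        current = tour[-1]
        unvisited = [c for c in range(n_cities) if c not in tour]
        if random.random() < eps:
            nxt = random.choice(unvisited)               # residual randomness
        else:
            nxt = max(unvisited, key=lambda c: Q[(current, c)])
        tour.append(nxt)
    return tour

def update_q(tour, dist, Q, alpha=0.1, gamma=0.9):
    n = len(tour)
    for i in range(n - 1):
        s, a = tour[i], tour[i + 1]
        remaining = [c for c in range(n) if c not in tour[:i + 2]]
        best_next = max((Q[(a, c)] for c in remaining), default=0.0)
        # State simplified to the current city; reward is the negative edge cost.
        Q[(s, a)] += alpha * (-dist[s][a] + gamma * best_next - Q[(s, a)])

# Toy instance: 5 random cities in the unit square.
pts = [(random.random(), random.random()) for _ in range(5)]
dist = [[((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5 for (x2, y2) in pts] for (x1, y1) in pts]
Q, best = defaultdict(float), None
for it in range(200):
    tour = q_guided_construction(5, Q)
    update_q(tour, dist, Q)                  # GRASP's local-search phase is omitted here
    if best is None or tour_length(tour, dist) < tour_length(best, dist):
        best = tour
```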
70

Protocolo de Negociação Baseado em Aprendizagem-Q para Bolsa de Valores / Negotiation Protocol Based on Q-Learning for the Stock Exchange

Cunha, Rafael de Souza 04 March 2013 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / In this work, we applied Multi-Agent Systems (MAS) technology to the capital market, that is, the stock exchange, specifically the Bolsa de Mercadorias e Futuros de São Paulo (BM&FBovespa). The research focused mainly on negotiation protocols and on the learning of investor agents. In the competitive setting of the stock exchange, an agent that learns how to negotiate could become a differentiator for investors who wish to increase their profits. Decision-making based on historical data motivates further research in this direction; however, we sought a different approach with regard to the representation of the states of the Q-learning algorithm. Reinforcement learning, and Q-learning in particular, has been shown to be effective in environments with abundant historical data, rewarding decisions that yield positive results. In this way, an algorithm that rewards profit and punishes loss can be applied to the purchase and sale of shares. Moreover, to achieve their goals the agents need to negotiate according to the specific protocols of the stock exchange, so the negotiation rules between agents that allow the purchase and sale of securities were also specified. Through the exchange of messages between agents, it is possible to determine how trading will occur and to facilitate communication between them, since the way it happens is standardized. Therefore, given the specification of negotiation protocols based on Q-learning, this research provides the modeling of the intelligent agents and the learning and negotiation models required for the decision-making of the entities involved.
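
A hedged sketch of the reward scheme described (profit is rewarded, loss is punished) for a tabular Q-learning trading agent; the state encoding, random-walk price model and parameters are illustrative assumptions, and the multi-agent negotiation protocol (the exchange of order messages between agents) is not reproduced.

```python
# Illustrative Q-learning trading agent whose reward is the realised profit or loss.
import random
from collections import defaultdict

Q = defaultdict(float)
actions = ["buy", "sell", "hold"]
alpha, gamma, eps = 0.1, 0.95, 0.1

def state_from(prices, position):
    # Very coarse state: direction of the last two price moves plus current position.
    return (prices[-1] > prices[-2], prices[-2] > prices[-3], position)

prices = [100.0, 100.5, 99.8]
position, entry_price = 0, 0.0
for t in range(10000):
    prices.append(prices[-1] * (1 + random.gauss(0, 0.01)))   # random-walk prices
    s = state_from(prices[:-1], position)
    a = random.choice(actions) if random.random() < eps else \
        max(actions, key=lambda x: Q[(s, x)])
    reward = 0.0
    if a == "buy" and position == 0:
        position, entry_price = 1, prices[-1]
    elif a == "sell" and position == 1:
        reward = prices[-1] - entry_price        # realised profit (+) or loss (-)
        position = 0
    s2 = state_from(prices, position)
    Q[(s, a)] += alpha * (reward + gamma * max(Q[(s2, x)] for x in actions) - Q[(s, a)])
```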
