11

All learning is local: Multi-agent learning in global reward games

Chang, Yu-Han, Ho, Tracey, Kaelbling, Leslie P. 01 1900 (has links)
In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms. We provide a simple and efficient algorithm that in part uses a linear system to model the world from a single agent’s limited perspective, and takes advantage of Kalman filtering to allow an agent to construct a good training signal and effectively learn a near-optimal policy in a wide variety of settings. A sequence of increasingly complex empirical tests verifies the efficacy of this technique. / Singapore-MIT Alliance (SMA)
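To make the filtering idea concrete, here is a minimal sketch of how a scalar Kalman filter could separate an agent's personal reward from the observed global signal, assuming the non-local contribution follows a random walk. The class interface and noise variances are hypothetical simplifications, not the authors' exact formulation.

```python
class RewardFilter:
    """Scalar Kalman filter tracking the non-local term b_t in the observed
    global reward g_t = r_t + b_t, where b_t is modeled as a random walk.
    The filtered signal g_t - b_hat can then train an ordinary Q-learner.
    (Illustrative sketch; interface and variances are hypothetical.)"""

    def __init__(self, process_var=0.1, obs_var=1.0):
        self.b_hat = 0.0          # estimate of the non-local reward term
        self.p = 1.0              # variance of that estimate
        self.q = process_var      # random-walk variance of b_t
        self.r = obs_var          # observation noise variance

    def local_reward(self, g, r_pred):
        # Predict: b_t follows a random walk, so only the variance grows.
        self.p += self.q
        # Update: the innovation is the part of g that r_pred cannot explain.
        k = self.p / (self.p + self.r)               # Kalman gain
        self.b_hat += k * (g - r_pred - self.b_hat)
        self.p *= 1.0 - k
        return g - self.b_hat                        # de-noised training signal
```

The agent would feed `local_reward(global_reward, predicted_reward)` into its usual Q-update in place of the raw global reward.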
12

Improving Computer Game Bots' behavior using Q-Learning

Patel, Purvag 01 December 2009 (has links)
In modern computer video games, the quality of artificial characters plays a prominent role in a game's success in the market. The aim of the intelligent techniques used in these games, termed game AI, is to provide interesting and challenging gameplay to the player. Being highly sophisticated, these games present game developers with requirements and challenges similar to those faced by the academic AI community. Game companies claim to use sophisticated game AI to model artificial characters such as computer game bots: intelligent, realistic AI agents. However, these bots work via simple routines pre-programmed to suit the game map, game rules, game type, and other parameters unique to each game. Mostly, illusory intelligent behaviors are programmed using simple conditional statements and are hard-coded into the bots' logic, and a game programmer has to spend considerable time configuring crisp inputs for these conditional statements. We therefore see a need for machine learning techniques that dynamically improve bots' behavior and save programmers' precious man-hours. We selected Q-learning, a reinforcement learning technique, to evolve dynamic intelligent bots, as it is a simple, efficient, online learning algorithm. Machine learning techniques such as reinforcement learning are known to become intractable when they use a detailed model of the world, and they also require tuning of various parameters to give satisfactory performance. Therefore, for this research we opted to examine Q-learning for evolving a few basic behaviors for computer game bots, viz. learning to fight and planting the bomb. Furthermore, we experimented with how bots can use knowledge learned in abstract models to evolve their behavior in a more detailed model of the world. Bots evolved using these techniques become more pragmatic, believable, and capable of showing human-like behavior, providing a more realistic feel to the game and giving game programmers an efficient learning technique for programming these bots.
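As a rough illustration of the kind of learner the thesis applies, here is a minimal tabular Q-learning sketch with an epsilon-greedy policy. The bot actions, state encoding, and parameter values are hypothetical placeholders, not the thesis's configuration.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1                    # illustrative tuning
ACTIONS = ["advance", "retreat", "shoot", "plant_bomb"]  # hypothetical bot actions

Q = defaultdict(float)   # Q[(state, action)] -> estimated return

def choose_action(state):
    # Epsilon-greedy: explore occasionally, otherwise exploit current estimates.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    # One-step Q-learning backup:
    # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

Because the table is keyed lazily, the same loop works whether `state` comes from an abstract model or a more detailed one, which is the transfer setting the abstract describes.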
13

Reinforcement Learning For Multiple Time Series

Singh, Isha January 2019 (has links)
No description available.
14

Reinforcement Programming: A New Technique in Automatic Algorithm Development

White, Spencer Kesson 03 July 2006 (has links) (PDF)
Reinforcement programming is a new technique for using computers to automatically create algorithms. By using the principles of reinforcement learning and Q-learning, reinforcement programming learns programs based on example inputs and outputs. State representations and actions are provided. A transition function and rewards are defined. The system is trained until the system converges on a policy that can be directly implemented as a computer program. The efficiency of reinforcement programming is demonstrated by comparing a generalized in-place iterative sort learned through genetic programming to a sorting algorithm of the same type created using reinforcement programming. The sort learned by reinforcement programming is a novel algorithm. Reinforcement programming is more efficient and provides a more effective solution than genetic programming in the cases attempted. As additional examples, reinforcement programming is used to learn three binary addition problems.
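To illustrate the idea of learning a sorting program from rewards, here is a toy Q-learning sketch in which states are permutations, actions are adjacent swaps, and the greedy policy that emerges acts as the learned sort. The scale, reward values, and hyperparameters are illustrative, not the thesis's setup.

```python
import random

N = 4                        # learn on permutations of 4 elements (toy scale)
ACTIONS = list(range(N - 1)) # action i = swap positions i and i+1
ALPHA, GAMMA, EPS = 0.2, 0.95, 0.1

Q = {}

def q(s, a):
    return Q.get((s, a), 0.0)

def step(state, a):
    s = list(state)
    s[a], s[a + 1] = s[a + 1], s[a]
    s = tuple(s)
    done = s == tuple(sorted(s))
    return s, (10.0 if done else -1.0), done   # step penalty, sortedness bonus

for _ in range(5000):
    state = tuple(random.sample(range(N), N))
    for _ in range(50):
        a = random.choice(ACTIONS) if random.random() < EPS \
            else max(ACTIONS, key=lambda x: q(state, x))
        nxt, r, done = step(state, a)
        target = r + (0.0 if done else GAMMA * max(q(nxt, x) for x in ACTIONS))
        Q[(state, a)] = q(state, a) + ALPHA * (target - q(state, a))
        state = nxt
        if done:
            break

# The greedy policy over Q is itself the learned "program": at each state it
# prescribes which adjacent swap to perform next.
```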
15

Application of Reinforcement Learning to Multi-Agent Production Scheduling

Wang, Yi-chi 13 December 2003 (has links)
Reinforcement learning (RL) has received attention in recent years from agent-based researchers because it can be applied to problems where autonomous agents learn to select proper actions for achieving their goals based on interactions with their environment. Each time an agent performs an action, the environment's response, as indicated by its new state, is used by the agent to reward or penalize its action. The agent's goal is to maximize the total amount of reward it receives over the long run. Although there have been several successful examples demonstrating the usefulness of RL, its application to manufacturing systems has not been fully explored. The objective of this research is to develop a set of guidelines for applying the Q-learning algorithm to enable an individual agent to develop a decision making policy for use in agent-based production scheduling applications such as dispatching rule selection and job routing. For the dispatching rule selection problem, a single machine agent employs the Q-learning algorithm to develop a decision-making policy on selecting the appropriate dispatching rule from among three given dispatching rules. In the job routing problem, a simulated job shop system is used for examining the implementation of the Q-learning algorithm for use by job agents when making routing decisions in such an environment. Two factorial experiment designs for studying the settings used to apply Q-learning to the single machine dispatching rule selection problem and the job routing problem are carried out. This study not only investigates the main effects of this Q-learning application but also provides recommendations for factor settings and useful guidelines for future applications of Q-learning to agent-based production scheduling.
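A minimal sketch of the dispatching-rule-selection loop described above, assuming a coarsely discretized machine state. The rule names (SPT, EDD, FIFO), state features, and reward are placeholders, since the abstract does not name the three rules it used.

```python
import random
from collections import defaultdict

RULES = ["SPT", "EDD", "FIFO"]    # hypothetical candidate dispatching rules
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

Q = defaultdict(float)

def bucket(queue_len, mean_slack):
    # Discretize the machine's status into a coarse state (hypothetical features).
    return (min(queue_len, 5), int(mean_slack > 0))

def select_rule(state):
    if random.random() < EPS:
        return random.choice(RULES)
    return max(RULES, key=lambda rule: Q[(state, rule)])

def learn(state, rule, reward, next_state):
    # Reward could be, e.g., negative mean tardiness over the decision interval.
    best = max(Q[(next_state, r_)] for r_ in RULES)
    Q[(state, rule)] += ALPHA * (reward + GAMMA * best - Q[(state, rule)])
```

Each time the machine becomes free, the agent calls `select_rule(bucket(...))`, applies the chosen rule to pick the next job, and later calls `learn` with the observed performance.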
16

Predicting Mutational Pathways of Influenza A H1N1 Virus using Q-learning

Aarathi Raghuraman, FNU 13 August 2021 (has links)
Influenza is a seasonal viral disease affecting over 1 billion people annually around the globe, as reported by the World Health Organization (WHO). The influenza virus has been around for decades, causing multiple pandemics and encouraging researchers to perform extensive analysis of its evolutionary patterns. Current research uses phylogenetic trees as the basis to guide population genetics and other phenotypic characteristics when describing the evolution of the influenza genome. Phylogenetic trees are one form of representing the evolutionary trends of sequenced genomes, but they do not capture the multidimensional complexity of mutational pathways. We suggest representing antigenic drifts within the influenza A/H1N1 hemagglutinin (HA) protein as a graph, $G = (V, E)$, where $V$ is the set of vertices representing each possible sequence and $E$ is the set of edges representing single amino acid substitutions. Each transition is characterized by a Malthusian fitness model incorporating genetic adaptation, vaccine similarity, and historical epidemiological response, using mortality as the metric where available. Applying reinforcement learning with the vertices as states, edges as actions, and fitness as the reward, we learn the high-likelihood mutational pathways and optimal policy without exploring the entire space of the graph, $G$. Our average predicted-versus-actual sequence distance of $3.6 \pm 1.2$ amino acids indicates that our novel approach of using naive Q-learning can assist with influenza strain prediction, thus improving vaccine selection for future disease seasons. / Master of Science / Influenza is a seasonal virus affecting over 1 billion people annually around the globe, as reported by the World Health Organization (WHO). The effectiveness of influenza vaccines varies tremendously by type (A, B, C, or D) and season. Of note is the pandemic of 2009, when the influenza A H1N1 virus mutants were significantly different from the chosen vaccine composition. It is pertinent to understand and predict the underlying genetic and environmental behavior of influenza virus mutants to be able to determine the vaccine composition for future seasons, preventing another pandemic. Given the recent 2020 COVID-19 pandemic, a disease also caused by a virus that affects the upper respiratory system, novel approaches to predicting viruses need to be investigated now more than ever. Thus, in this thesis, I develop a novel approach to predicting a portion of the influenza A H1N1 viruses using machine learning.
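The graph formulation above could be sketched as follows, with vertices generated lazily so the full space of $G$ is never enumerated. The fitness function here is a toy stand-in for the Malthusian model described in the abstract, and all hyperparameters are illustrative.

```python
import random
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2   # illustrative hyperparameters

Q = defaultdict(float)  # Q[(sequence, successor)] -> value of that substitution

def neighbors(seq):
    # Edges of G: every single amino-acid substitution of the current sequence.
    for i, old in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != old:
                yield seq[:i] + aa + seq[i + 1:]

def fitness(seq):
    # Toy stand-in for the Malthusian fitness model (genetic adaptation,
    # vaccine similarity, epidemiological response).
    return -seq.count("A")

def episode(start, steps=20):
    seq = start
    for _ in range(steps):
        succ = list(neighbors(seq))
        nxt = random.choice(succ) if random.random() < EPS \
              else max(succ, key=lambda s: Q[(seq, s)])
        reward = fitness(nxt)
        best_future = max(Q[(nxt, s)] for s in neighbors(nxt))
        Q[(seq, nxt)] += ALPHA * (reward + GAMMA * best_future - Q[(seq, nxt)])
        seq = nxt

episode("MKAILVVLLYT")   # short toy peptide; real HA sequences are much longer
```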
17

Machine Learning Simulation: Torso Dynamics of Robotic Biped

Renner, Michael Robert 22 August 2007 (has links)
Military, medical, exploratory, and commercial robots have much to gain from exchanging wheels for legs. However, the equations of motion of dynamic bipedal walker models are highly coupled and non-linear, making the selection of an appropriate control scheme difficult. A temporal-difference reinforcement learning method known as Q-learning develops complex control policies through environmental exploration and exploitation. As a proof of concept, Q-learning was applied in simulation to a benchmark single-pendulum swing-up/balance task; the value function was first approximated with a look-up table, and then with an artificial neural network. We then applied Evolutionary Function Approximation for Reinforcement Learning to effectively control the swing leg and torso of a 3-degree-of-freedom active dynamic bipedal walker in simulation. The model began each episode in a stationary vertical configuration. At each time step the learning agent was rewarded for horizontal hip displacement scaled by torso altitude, which promoted faster walking while maintaining an upright posture, and one of six coupled torque activations was applied through two first-order filters. Over the course of 23 generations, an approximation of the value function was evolved which enabled walking at an average speed of 0.36 m/s. The agent oscillated the torso forward then backward at each step, driving the walker forward for forty-two steps in thirty seconds without falling over. This work represents a foundation for improvements in anthropomorphic bipedal robots, exoskeleton mechanisms to assist in walking, and smart prosthetics. / Master of Science
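A minimal sketch of the neural-network stage described above: semi-gradient Q-learning with a one-hidden-layer network for a pendulum-style task. The network size, torque levels, and step sizes are hypothetical, not the thesis's settings.

```python
import numpy as np

N_IN, N_HID = 3, 32                 # state = (cos theta, sin theta, theta_dot)
ACTIONS = [-2.0, 0.0, 2.0]          # discretized torques (illustrative)
ALPHA, GAMMA = 1e-3, 0.98

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (N_HID, N_IN))
W2 = rng.normal(0.0, 0.1, (len(ACTIONS), N_HID))

def q_values(s):
    h = np.tanh(W1 @ s)             # hidden-layer activations
    return W2 @ h, h                # Q(s, .) for all actions

def td_update(s, a_idx, r, s_next, done):
    global W1, W2
    q, h = q_values(s)
    q_next, _ = q_values(s_next)
    target = r if done else r + GAMMA * np.max(q_next)
    delta = target - q[a_idx]       # TD error
    # Semi-gradient backprop: the bootstrap target is treated as a constant.
    W2[a_idx] += ALPHA * delta * h
    dh = delta * W2[a_idx] * (1.0 - h ** 2)
    W1 += ALPHA * np.outer(dh, s)
```

A look-up table replaces `q_values` in the tabular variant; the evolutionary method the thesis uses then searches over such network approximators rather than training a single one.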
18

Trajectory Tracking Control of Unmanned Ground Vehicles using an Intermittent Learning Algorithm

Gundu, Pavan Kumar 21 August 2019 (has links)
Traffic congestion and safety have become major issues in the modern world's commute. Congestion causes people to travel billions of extra hours and to purchase billions of extra gallons of fuel, amounting to a congestion cost of billions of dollars. Autonomous vehicles are one solution to this problem because of their large impact on efficiency, pollution, and human safety. Extensive research has also been carried out on control design for vehicular platoons, because traffic throughput can be improved further, without compromising safety, when the vehicles in the platoon are given better predictive abilities. Motion control is a key area of autonomous driving research that handles moving parts of vehicles in a deliberate and controlled manner. A widely studied problem in motion control, concerned with tracking a time-parameterized reference, is trajectory tracking. Having an efficient and effective tracking algorithm embedded in the autonomous driving system is key to better performance in terms of resources consumed and tracking error. Many tracking control algorithms in the literature rely on an accurate model of the vehicle, and deriving such a model while accounting for conditions like friction, heat effects, and ageing processes can be an intimidating task. Moreover, control algorithms typically rely on periodic execution of the tasks that update the control actions, but such updates are not always required, resulting in unnecessary actions that waste resources. The main focus of this work is to design an intermittent, model-free optimal control algorithm that enables autonomous vehicles to track trajectories at high speeds. To obtain a model-free solution, a Q-learning setup is considered, with an actor network to approximate the optimal intermittent controller and a critic network to approximate the optimal cost, resulting in the appropriate tuning laws. / Master of Science / Research effort in the area of autonomous vehicles has risen over the past few decades because these systems improve safety, comfort, transport time, and energy consumption, which are some of the main issues humans face in the modern world's highway systems. Systems like emergency braking, automatic parking, and blind-angle vehicle detection are creating a safer driving environment in populated areas; such systems are known as advanced driver assistance systems (ADAS). An extension of these partially automated ADAS is vehicles with fully automated driving abilities, which are able to drive by themselves without any human involvement. An extensively proposed approach for making traffic throughput more efficient on existing highways is to assemble autonomous vehicles into platoons: small inter-vehicle spacing and many vehicles in each platoon formation improve traffic throughput significantly. Lately, advancements in computational capabilities, in terms of both algorithms and hardware, communications, and navigation and sensing devices, have contributed greatly to the development of autonomous systems (both single- and multi-agent) that operate with high reliability in uncertain and dynamic operating conditions and environments. Motion control is an important area in autonomous vehicles research.
Trajectory tracking is a widely studied motion control scenario concerned with designing control laws that force a system to follow some time-dependent reference path, and it is important for an autonomous vehicle to have an effective and efficient trajectory-tracking control law to reduce the resources consumed and the tracking error. The goal of this work is to design an intermittent, model-free trajectory tracking control algorithm that needs no mathematical model of the vehicle being controlled and that can reduce controller updates by letting the system evolve in an open-loop fashion, closing the loop only when a user-defined triggering condition is satisfied. The approach is energy efficient in that control updates are limited to instances when they are needed rather than occurring periodically and unnecessarily. Q-learning, a model-free reinforcement learning technique, is used in the trajectory tracking motion control algorithm to make the vehicles track their respective reference trajectories without requiring their motion model, knowledge of which is generally needed when dealing with a motion control problem. The testing of the designed algorithm in simulations and experiments is presented in this work, along with the study and development of a vehicle platform used to perform the experiments. Different motion control and sensing techniques are presented and used. The vehicle platform is shown to track a reference trajectory autonomously without any human intervention, both in simulations and in experiments, proving the effectiveness of the proposed algorithm.
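To show what "closing the loop only when a triggering condition is satisfied" means in code, here is a toy event-triggered feedback loop. The plant, gain, and threshold are illustrative and unrelated to the thesis's learned controller.

```python
import numpy as np

# Intermittent (event-triggered) feedback: the control input is recomputed
# only when the state has drifted from the last sampled state by more than
# a threshold; otherwise the plant evolves in open loop on the held input.
A = np.array([[0.0, 1.0], [0.0, -0.5]])   # toy linear plant (assumed)
B = np.array([[0.0], [1.0]])
K = np.array([[2.0, 1.5]])                # stabilizing gain (assumed given)
DT, THRESHOLD = 0.01, 0.05

x = np.array([[1.0], [0.0]])
x_sampled = x.copy()
u = -K @ x_sampled
updates = 0

for _ in range(2000):
    # Trigger: close the loop only if the gap ||x - x_sampled|| is too large.
    if np.linalg.norm(x - x_sampled) > THRESHOLD:
        x_sampled = x.copy()
        u = -K @ x_sampled                # controller update (the costly step)
        updates += 1
    x = x + DT * (A @ x + B @ u)          # open-loop evolution otherwise

print(f"controller updates: {updates} of 2000 steps")
```

In the thesis, the held input and the triggering rule come from the learned actor and critic rather than a fixed gain, but the update-only-when-triggered structure is the same.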
19

A Contribution to the Solution of the k-Server Problem Using Reinforcement Learning

Lima Júnior, Manoel Leandro de 06 April 2005 (has links)
In this work, a new online algorithm for solving the k-Server Problem (PKS) is proposed. Its performance is compared with that of other algorithms in the literature, namely the Harmonic and Work Function algorithms, which have been shown to be competitive, making them meaningful baselines for comparison. An algorithm that performs efficiently relative to them tends to be competitive as well, although that fact would obviously have to be proven; such a proof, however, is beyond the scope of this work. The proposed algorithm for solving the PKS is based on reinforcement learning techniques. To that end, the problem was modeled as a multi-stage decision process, to which the Q-learning algorithm, one of the most popular methods for establishing optimal policies in this kind of decision problem, is applied. However, the size of the storage structure used by reinforcement learning to obtain the optimal policy grows with the number of states and actions, which in turn is proportional to the number n of nodes and k of servers. Analyzing this growth mathematically shows that it is exponential, limiting the application of the method to smaller problems, where the number of nodes and servers is small. This problem, known as the curse of dimensionality, was introduced by Bellman and implies that an algorithm cannot be executed for certain instances of a problem because the computational resources required to produce its output are exhausted. To prevent the proposed solution, based exclusively on reinforcement learning, from being restricted to small applications, an alternative solution is proposed for more realistic problems involving a larger number of nodes and servers. This alternative solution is hierarchical and uses two methods to solve the PKS: reinforcement learning, applied to a reduced number of nodes obtained through an aggregation process, and a greedy method, applied to the subsets of nodes resulting from the aggregation process, where the criterion for scheduling the servers is the smallest distance to the demand location.
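A toy sketch of the Q-learning formulation described above, with a state given by the servers' positions plus the current demand and an action choosing which server to move. The graph, metric, and parameters are illustrative; note how the table keyed by (positions, demand, server) exhibits the state-space growth the abstract warns about.

```python
import random
from collections import defaultdict

N, K = 6, 2                                                 # toy instance
DIST = [[abs(i - j) for j in range(N)] for i in range(N)]   # path-graph metric
ALPHA, GAMMA, EPS = 0.2, 0.9, 0.1

Q = defaultdict(float)   # Q[(servers, demand, which_server)] -> value

def act(servers, demand):
    if random.random() < EPS:
        return random.randrange(K)
    return max(range(K), key=lambda j: Q[(servers, demand, j)])

def run_episode(requests):
    servers = tuple(sorted(random.sample(range(N), K)))
    for t, demand in enumerate(requests):
        if demand in servers:                 # already covered: no cost
            continue
        j = act(servers, demand)
        cost = DIST[servers[j]][demand]
        nxt = tuple(sorted(servers[:j] + (demand,) + servers[j + 1:]))
        if t + 1 < len(requests):             # training on recorded sequences,
            nd = requests[t + 1]              # so the next demand is known
            best = max(Q[(nxt, nd, i)] for i in range(K))
        else:
            best = 0.0
        key = (servers, demand, j)
        Q[key] += ALPHA * (-cost + GAMMA * best - Q[key])   # reward = -distance
        servers = nxt

for _ in range(3000):
    run_episode([random.randrange(N) for _ in range(30)])
```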
20

SistEX - A Dynamic System to Detect the Student Experience

Possobom, Camila Cerezer 15 April 2014 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / The widespread use of virtual learning environments (VLEs) has great potential for the development of applications that meet needs in education. In u-learning environments, the goal is to gather information about users' needs and preferences in order to build their context and adapt the content to each user's profile; in most traditional VLEs, such as Moodle, this process is generally not considered. Given the importance of a dynamic application that can continually adapt to students' knowledge levels, this dissertation proposes a module called SistEX (A Dynamic System to Detect the Student Experience). The adaptations in the environment use adaptive hypermedia, and the information collected is the user's knowledge level, obtained through questionnaires. Furthermore, the Q-learning algorithm, drawn from intelligent tutoring systems (ITS), was used and adapted to contribute to the user's learning process. Software and user testing demonstrated that the SistEX environment performed satisfactorily, based on the assessments made by users who tested the module and its operation. The System Usability Scale (SUS) questionnaire was applied to the developed module and yielded a result within a range considered good, meeting the objectives proposed in this work, even though some limitations and difficulties were identified during development.
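As a purely hypothetical sketch of how Q-learning might drive content adaptation in an ITS-style module like SistEX, states could be knowledge levels and actions difficulty choices; none of the names below come from the dissertation.

```python
import random
from collections import defaultdict

LEVELS = ["basic", "intermediate", "advanced"]   # hypothetical knowledge states
ACTIONS = ["easier", "same", "harder"]           # hypothetical content choices
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

Q = defaultdict(float)

def next_content(level):
    # Epsilon-greedy choice of the next content difficulty for this student.
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(level, a)])

def record_quiz(level, action, score, new_level):
    # Reward the difficulty choice by the student's subsequent quiz score.
    best = max(Q[(new_level, a)] for a in ACTIONS)
    Q[(level, action)] += ALPHA * (score + GAMMA * best - Q[(level, action)])
```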
