11

All learning is local: Multi-agent learning in global reward games

Chang, Yu-Han, Ho, Tracey, Kaelbling, Leslie P. 01 1900 (has links)
In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms. We provide a simple and efficient algorithm that in part uses a linear system to model the world from a single agent’s limited perspective, and takes advantage of Kalman filtering to allow an agent to construct a good training signal and effectively learn a near-optimal policy in a wide variety of settings. A sequence of increasingly complex empirical tests verifies the efficacy of this technique. / Singapore-MIT Alliance (SMA)
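As a rough illustration of the filtering idea described above, the sketch below (a minimal assumed formulation, not the authors' exact model) tracks the slowly drifting contribution of the other agents to the global reward with a scalar Kalman filter and uses the residual as the agent's local training signal for an ordinary Q-learning update.

```python
class RewardFilter:
    """Scalar Kalman filter tracking the slowly drifting part of the global
    reward contributed by the other agents (random-walk state model)."""
    def __init__(self, process_var=0.1, obs_var=1.0):
        self.b = 0.0          # estimate of the other agents' contribution
        self.p = 1.0          # estimate variance
        self.q = process_var  # random-walk (process) noise variance
        self.r = obs_var      # observation noise variance

    def filtered_reward(self, global_reward):
        self.p += self.q                        # predict: variance grows
        k = self.p / (self.p + self.r)          # Kalman gain
        self.b += k * (global_reward - self.b)  # correct with the observation
        self.p *= (1.0 - k)
        return global_reward - self.b           # local training signal

# The residual then drives an ordinary tabular Q-learning update, e.g.
# q[s, a] += alpha * (local_r + gamma * max(q[s2]) - q[s, a])
```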
12

Improving Computer Game Bots' behavior using Q-Learning

Patel, Purvag 01 December 2009 (has links)
In modern video games, the quality of artificial characters plays a prominent role in the success of a game in the market. The aim of the intelligent techniques used in these games, termed game AI, is to provide interesting and challenging gameplay to the player. Being highly sophisticated, these games present game developers with the same kinds of requirements and challenges faced by the academic AI community. Game companies claim to use sophisticated game AI to model artificial characters such as computer game bots: intelligent, realistic AI agents. In practice, however, these bots work via simple routines pre-programmed to suit the game map, game rules, game type, and other parameters unique to each game. Mostly, the illusion of intelligent behavior is programmed using simple conditional statements hard-coded into the bots' logic. Moreover, a game programmer has to spend considerable time configuring crisp inputs for these conditional statements. We therefore see a need for machine learning techniques that dynamically improve bots' behavior and save programmers' valuable time. We selected Q-learning, a reinforcement learning technique, to evolve dynamic intelligent bots, as it is a simple, efficient, online learning algorithm. Machine learning techniques such as reinforcement learning are known to be intractable if they use a detailed model of the world, and they also require tuning of various parameters to give satisfactory performance. For this research we therefore examine Q-learning for evolving a few basic behaviors for computer game bots, namely learning to fight and planting the bomb. Furthermore, we experimented with how bots can use knowledge learned from abstract models to evolve their behavior in a more detailed model of the world. Bots evolved using these techniques become more pragmatic, believable, and capable of showing human-like behavior. This gives the game a more realistic feel and provides game programmers with an efficient learning technique for programming these bots.
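For reference, a minimal tabular Q-learning loop of the kind the abstract describes might look like the sketch below; the state encoding, action set, and parameters are illustrative assumptions, not the thesis's actual setup.

```python
import random
from collections import defaultdict

# Hypothetical bot actions; the real state space (map position, weapon,
# health, ...) would be richer than whatever hashable `state` is passed in.
ACTIONS = ["advance", "retreat", "strafe", "shoot"]

class BotQLearner:
    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)          # (state, action) -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy exploration keeps the bot trying new behaviors
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```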
13

Reinforcement Learning For Multiple Time Series

Singh, Isha January 2019 (has links)
No description available.
14

Reinforcement Programming: A New Technique in Automatic Algorithm Development

White, Spencer Kesson 03 July 2006 (has links) (PDF)
Reinforcement programming is a new technique for using computers to automatically create algorithms. By using the principles of reinforcement learning and Q-learning, reinforcement programming learns programs based on example inputs and outputs. State representations and actions are provided. A transition function and rewards are defined. The system is trained until the system converges on a policy that can be directly implemented as a computer program. The efficiency of reinforcement programming is demonstrated by comparing a generalized in-place iterative sort learned through genetic programming to a sorting algorithm of the same type created using reinforcement programming. The sort learned by reinforcement programming is a novel algorithm. Reinforcement programming is more efficient and provides a more effective solution than genetic programming in the cases attempted. As additional examples, reinforcement programming is used to learn three binary addition problems.
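A toy version of the idea, under assumptions of my own (permutation states, adjacent-swap actions, a step penalty plus a terminal bonus), can be sketched as follows; the thesis's state representation and reward definitions may differ.

```python
import random
from collections import defaultdict

# Toy "reinforcement programming" setup: states are permutations of a short
# array, actions are adjacent swaps, and the learned greedy policy is a sort.
N = 4
ACTIONS = list(range(N - 1))                 # swap positions (i, i+1)
q = defaultdict(float)
alpha, gamma, epsilon = 0.2, 0.9, 0.2

def step(state, i):
    s = list(state)
    s[i], s[i + 1] = s[i + 1], s[i]
    nxt = tuple(s)
    done = nxt == tuple(sorted(nxt))
    reward = 10.0 if done else -1.0          # penalize every extra swap
    return nxt, reward, done

for _ in range(20000):                       # training episodes
    state = tuple(random.sample(range(N), N))
    done = state == tuple(sorted(state))
    while not done:
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: q[(state, x)])
        nxt, r, done = step(state, a)
        best = 0.0 if done else max(q[(nxt, x)] for x in ACTIONS)
        q[(state, a)] += alpha * (r + gamma * best - q[(state, a)])
        state = nxt

# After training, the greedy policy sorts permutations of range(N) using
# adjacent swaps, which is the flavor of program the technique extracts.
```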
15

Application of Reinforcement Learning to Multi-Agent Production Scheduling

Wang, Yi-chi 13 December 2003 (has links)
Reinforcement learning (RL) has received attention in recent years from agent-based researchers because it can be applied to problems where autonomous agents learn to select proper actions for achieving their goals based on interactions with their environment. Each time an agent performs an action, the environment's response, as indicated by its new state, is used by the agent to reward or penalize its action. The agent's goal is to maximize the total amount of reward it receives over the long run. Although there have been several successful examples demonstrating the usefulness of RL, its application to manufacturing systems has not been fully explored. The objective of this research is to develop a set of guidelines for applying the Q-learning algorithm to enable an individual agent to develop a decision making policy for use in agent-based production scheduling applications such as dispatching rule selection and job routing. For the dispatching rule selection problem, a single machine agent employs the Q-learning algorithm to develop a decision-making policy on selecting the appropriate dispatching rule from among three given dispatching rules. In the job routing problem, a simulated job shop system is used for examining the implementation of the Q-learning algorithm for use by job agents when making routing decisions in such an environment. Two factorial experiment designs for studying the settings used to apply Q-learning to the single machine dispatching rule selection problem and the job routing problem are carried out. This study not only investigates the main effects of this Q-learning application but also provides recommendations for factor settings and useful guidelines for future applications of Q-learning to agent-based production scheduling.
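A hedged sketch of the single-machine setting described above: the agent's action is the choice among three dispatching rules, with a coarse state discretization and reward signal that are illustrative stand-ins rather than the factor settings studied in the thesis.

```python
import random
from collections import defaultdict

RULES = ["SPT", "EDD", "FIFO"]               # candidate dispatching rules

class DispatchAgent:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def state(self, queue_len, tardy_jobs):
        # Coarse discretization of shop status into a small state space
        return (min(queue_len // 5, 3), min(tardy_jobs, 3))

    def choose_rule(self, s):
        if random.random() < self.epsilon:
            return random.choice(RULES)
        return max(RULES, key=lambda r: self.q[(s, r)])

    def update(self, s, rule, reward, s_next):
        # Reward could be, e.g., negative tardiness accrued since last decision
        best = max(self.q[(s_next, r)] for r in RULES)
        self.q[(s, rule)] += self.alpha * (reward + self.gamma * best
                                           - self.q[(s, rule)])
```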
16

Trajectory Tracking Control of Unmanned Ground Vehicles using an Intermittent Learning Algorithm

Gundu, Pavan Kumar 21 August 2019 (has links)
Traffic congestion and safety have become major issues in the modern world's commute. Congestion causes people to travel billions of additional hours and to purchase billions of extra gallons of fuel, amounting to a congestion cost of billions of dollars. Autonomous driving vehicles are one solution to this problem because of their large impact on efficiency, pollution, and human safety. Extensive research has also been carried out on control design of vehicular platoons, because a further improvement in traffic throughput, without compromising safety, is possible when the vehicles in the platoon are provided with better predictive abilities. Motion control is a key area of autonomous driving research that handles moving parts of vehicles in a deliberate and controlled manner. A widely studied problem in motion control, concerned with tracking a time-parameterized reference, is trajectory tracking. Having an efficient and effective tracking algorithm embedded in the autonomous driving system is key to better performance in terms of resources consumed and tracking error. Many tracking control algorithms in the literature rely on an accurate model of the vehicle, and it can be a daunting task to come up with an accurate model that accounts for conditions such as friction, heat effects, and ageing processes. Control algorithms also typically rely on periodic execution of the tasks that update the control actions, but such updates might not be required, resulting in unnecessary actions that waste resources. The main focus of this work is to design an intermittent, model-free optimal control algorithm that enables autonomous vehicles to track trajectories at high speeds. To obtain a model-free solution, we consider a Q-learning setup with an actor network to approximate the optimal intermittent controller and a critic network to approximate the optimal cost, resulting in the appropriate tuning laws. / Master of Science / A rising research effort in the area of autonomous vehicles has been witnessed in the past few decades, because these systems improve safety, comfort, transport time, and energy consumption, some of the main issues humans face in the modern world's highway systems. Systems such as emergency braking, automatic parking, and blind-spot vehicle detection are creating a safer driving environment in populated areas. Such systems are known as advanced driver assistance systems (ADAS). An extension of these partially automated ADAS are vehicles with fully automated driving abilities, which are able to drive by themselves without any human involvement. An extensively proposed approach for making traffic throughput more efficient on existing highways is to assemble autonomous vehicles into platoons. Small inter-vehicle spacing and many vehicles constituting each platoon formation improve traffic throughput significantly. Lately, advances in computational capabilities, in terms of both algorithms and hardware, communications, and navigation and sensing devices, have contributed substantially to the development of autonomous systems (both single and multi-agent) that operate with high reliability in uncertain and dynamic operating conditions and environments. Motion control is an important area in autonomous vehicles research.
Trajectory tracking is a widely studied motion control scenario concerned with designing control laws that force a system to follow a time-dependent reference path, and it is important to have an effective and efficient trajectory-tracking control law in an autonomous vehicle to reduce the resources consumed and the tracking error. The goal of this work is to design an intermittent, model-free trajectory-tracking control algorithm that needs no mathematical model of the vehicle system being controlled and that can reduce the controller updates by allowing the system to evolve in an open-loop fashion, closing the loop only when a user-defined triggering condition is satisfied. The approach is energy efficient in that the control updates are limited to instances when they are needed rather than being unnecessary periodic updates. Q-learning, a model-free reinforcement learning technique, is used in the trajectory-tracking motion control algorithm to make the vehicles track their respective reference trajectories without any requirement of their motion model, knowledge of which is generally needed when dealing with a motion control problem. The testing of the designed algorithm in simulations and experiments is presented in this work. The study and development of a vehicle platform in order to perform the experiments is also discussed. Different motion control and sensing techniques are presented and used. The vehicle platform is shown to track a reference trajectory autonomously without any human intervention, both in simulations and experiments, proving the effectiveness of the proposed algorithm.
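The intermittent-update idea can be sketched as follows; the triggering condition, threshold, and actor call are illustrative placeholders rather than the thesis's actor-critic tuning laws.

```python
import numpy as np

def run_episode(plant_step, actor, x0, ref, horizon, sigma=0.5):
    """Hold the control action and recompute it only when a user-defined
    triggering condition on the tracking error is violated."""
    x = x0
    e_at_update = x - ref(0)                 # error when u was last computed
    u = actor(e_at_update)                   # initial control update
    updates = 0
    for k in range(horizon):
        e = x - ref(k)
        # Trigger: recompute u only if the error has drifted too far from
        # the error seen at the last update (a simple static condition).
        if np.linalg.norm(e - e_at_update) > sigma * np.linalg.norm(e):
            u = actor(e)
            e_at_update = e
            updates += 1
        x = plant_step(x, u)                 # otherwise evolve in open loop
    return updates
```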
17

Predicting Mutational Pathways of Influenza A H1N1 Virus using Q-learning

Aarathi Raghuraman, FNU 13 August 2021 (has links)
Influenza is a seasonal viral disease affecting over 1 billion people annually around the globe, as reported by the World Health Organization (WHO). The influenza virus has been around for decades, causing multiple pandemics and encouraging researchers to perform extensive analysis of its evolutionary patterns. Current research uses phylogenetic trees as the basis to guide population genetics and other phenotypic characteristics when describing the evolution of the influenza genome. Phylogenetic trees are one form of representing the evolutionary trends of sequenced genomes, but they do not capture the multidimensional complexity of mutational pathways. We suggest representing antigenic drifts within the influenza A/H1N1 hemagglutinin (HA) protein as a graph, $G = (V, E)$, where $V$ is the set of vertices representing each possible sequence and $E$ is the set of edges representing single amino acid substitutions. Each transition is characterized by a Malthusian fitness model incorporating the genetic adaptation, vaccine similarity, and historical epidemiological response using mortality as the metric where available. Applying reinforcement learning with the vertices as states, edges as actions, and fitness as the reward, we learn the high-likelihood mutational pathways and an optimal policy without exploring the entire space of the graph, $G$. Our average predicted versus actual sequence distance of $3.6 \pm 1.2$ amino acids indicates that our novel approach of using naive Q-learning can assist with influenza strain predictions, thus improving vaccine selection for future disease seasons. / Master of Science / Influenza is a seasonal virus affecting over 1 billion people annually around the globe, as reported by the World Health Organization (WHO). The effectiveness of influenza vaccines varies tremendously by the type (A, B, C, or D) and season. Of note is the pandemic of 2009, where the influenza A H1N1 virus mutants were significantly different from the chosen vaccine composition. It is pertinent to understand and predict the underlying genetic and environmental behavior of influenza virus mutants to be able to determine the vaccine composition for future seasons, preventing another pandemic. Given the recent 2020 COVID-19 pandemic, caused by a virus that also affects the upper respiratory system, novel approaches to predicting viruses need to be investigated now more than ever. Thus, in this thesis, I develop a novel approach to predicting a portion of the influenza A H1N1 viruses using machine learning.
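A toy sketch of the graph formulation: vertices are sequences, edges are single amino-acid substitutions, and Q-learning samples mutational walks. The fitness function below is a stand-in for the Malthusian model (genetic adaptation, vaccine similarity, epidemiology) described in the abstract, and the alphabet and sequence length are deliberately tiny.

```python
import random
from collections import defaultdict

def neighbors(seq, alphabet="ACDE"):
    # All sequences reachable by a single substitution (one graph edge)
    for i, aa in enumerate(seq):
        for sub in alphabet:
            if sub != aa:
                yield seq[:i] + sub + seq[i + 1:]

def fitness(parent, child):
    return random.random()       # placeholder: plug in the Malthusian model

q = defaultdict(float)
alpha, gamma, epsilon = 0.1, 0.9, 0.2
start = "AAAA"

for _ in range(5000):                        # sampled mutational walks
    seq = start
    for _ in range(10):                      # bounded walk length per episode
        succ = list(neighbors(seq))
        if random.random() < epsilon:
            nxt = random.choice(succ)
        else:
            nxt = max(succ, key=lambda s: q[(seq, s)])
        r = fitness(seq, nxt)
        best = max(q[(nxt, s)] for s in neighbors(nxt))
        q[(seq, nxt)] += alpha * (r + gamma * best - q[(seq, nxt)])
        seq = nxt

# Greedy rollouts from `start` under q trace out high-fitness pathways.
```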
18

A review of Q-learning methods for Markov decision processes

Blizzard, Christopher, Wiktorsson, Emil January 2024 (has links)
This paper discusses how Q-learning and Deep Q-Networks (DQN) can be applied to state-action problems described by a Markov decision process (MDP). These are machine learning methods for finding the optimal choice of action at each time step, resulting in the optimal policy. The limitations and advantages of the two methods are discussed, the main limitation being that Q-learning cannot be used on problems with infinite state spaces. Q-learning, however, has an advantage in the simplicity of the algorithm, leading to a better understanding of what the algorithm is actually doing. Q-learning did manage to find the optimal policy for the simple problem studied in this paper, but was unable to do so for the advanced problem. The Deep Q-Network (DQN) approach was able to solve both problems, with the drawback that it is harder to understand what the algorithm is actually doing.
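Where the state space is too large for a table, the DQN approach approximates Q with a neural network. The PyTorch sketch below shows the core temporal-difference update only; the layer sizes are arbitrary, and the replay buffer and target network used in practice are omitted for brevity.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions))        # one Q-value per action

    def forward(self, s):
        return self.net(s)

def td_step(qnet, opt, batch, gamma=0.99):
    # batch: (states, int64 actions, rewards, next states, done flags) tensors
    s, a, r, s2, done = batch
    q_sa = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * qnet(s2).max(dim=1).values * (1 - done)
    loss = nn.functional.mse_loss(q_sa, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Typical usage: opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
```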
19

Machine Learning Simulation: Torso Dynamics of Robotic Biped

Renner, Michael Robert 22 August 2007 (has links)
Military, medical, exploratory, and commercial robots have much to gain from exchanging wheels for legs. However, the equations of motion of dynamic bipedal walker models are highly coupled and non-linear, making the selection of an appropriate control scheme difficult. A temporal difference reinforcement learning method known as Q-learning develops complex control policies through environmental exploration and exploitation. As a proof of concept, Q-learning was applied through simulation to a benchmark single pendulum swing-up/balance task; the value function was first approximated with a look-up table, and then with an artificial neural network. We then applied Evolutionary Function Approximation for Reinforcement Learning to effectively control the swing-leg and torso of a 3 degree-of-freedom active dynamic bipedal walker in simulation. The model began each episode in a stationary vertical configuration. At each time step the learning agent was rewarded for horizontal hip displacement scaled by torso altitude (which promoted faster walking while maintaining an upright posture), and one of six coupled torque activations was applied through two first-order filters. Over the course of 23 generations, an approximation of the value function was evolved which enabled walking at an average speed of 0.36 m/s. The agent oscillated the torso forward and then backward at each step, driving the walker forward for forty-two steps in thirty seconds without falling over. This work represents the foundation for improvements in anthropomorphic bipedal robots, exoskeleton mechanisms to assist in walking, and smart prosthetics. / Master of Science
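For the benchmark pendulum swing-up with a look-up table, the continuous state has to be binned before tabular Q-learning applies. The sketch below uses assumed bin counts, torque levels, and update parameters, not the thesis's exact configuration.

```python
import numpy as np

N_TH, N_OM = 31, 31                          # angle / angular-velocity bins
TORQUES = [-2.0, 0.0, 2.0]                   # bang-zero-bang action set
Q = np.zeros((N_TH, N_OM, len(TORQUES)))     # look-up-table value function

def discretize(theta, omega):
    # Wrap the angle into [0, 2*pi) and clip angular velocity to [-8, 8]
    i = int((theta % (2 * np.pi)) / (2 * np.pi) * N_TH) % N_TH
    j = int(np.clip((omega + 8.0) / 16.0 * N_OM, 0, N_OM - 1))
    return i, j

def q_update(s, a, reward, s2, alpha=0.1, gamma=0.98):
    i, j = s
    i2, j2 = s2
    td = reward + gamma * Q[i2, j2].max() - Q[i, j, a]
    Q[i, j, a] += alpha * td
```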
20

A Contribution to the Solution of the k-Server Problem Using Reinforcement Learning (Uma contribuição à solução do problema dos k-servos usando aprendizagem por reforço)

Lima Júnior, Manoel Leandro de 06 April 2005 (has links)
This work proposes a new online algorithm for solving the k-Server Problem (KSP). Its performance is compared with that of other algorithms in the literature, namely the Harmonic and Work Function algorithms, which have been shown to be competitive, making them meaningful benchmarks for comparison. An algorithm that performs efficiently relative to them also tends to be competitive, although this would obviously have to be proven; such a proof, however, is beyond the scope of this work. The proposed algorithm for the KSP is based on reinforcement learning techniques. To that end, the problem was modeled as a multi-stage decision process, to which the Q-learning algorithm, one of the most popular methods for establishing optimal policies in this type of decision problem, is applied. It should be noted, however, that the size of the storage structure used by reinforcement learning to obtain the optimal policy grows with the number of states and actions, which in turn is proportional to the number n of nodes and k of servers. When this growth is analyzed (mathematically, ), one sees that it is exponential, limiting the application of the method to smaller problems, where the number of nodes and servers is small. This problem, known as the curse of dimensionality, was introduced by Bellman and means that an algorithm cannot be executed for certain instances of a problem because the computational resources required to obtain its output are exhausted. To keep the proposed solution, based exclusively on reinforcement learning, from being restricted to small applications, an alternative solution is proposed for more realistic problems involving a larger number of nodes and servers. This alternative solution is hierarchical and uses two methods for solving the KSP: reinforcement learning, applied to a reduced number of nodes obtained through an aggregation process, and a greedy method, applied to the subsets of nodes resulting from the aggregation process, where the criterion for scheduling the servers is the smallest distance to the demand location.
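A rough sketch of the hierarchical scheme: Q-learning selects which aggregated group of nodes answers a demand, and a greedy rule moves the closest server within that group. The clusters, distance function, and reward below are illustrative placeholders, not the thesis's aggregation procedure.

```python
import random
from collections import defaultdict

def greedy_serve(servers, demand, dist):
    # Greedy rule inside a cluster: move the nearest server to the demand
    best = min(range(len(servers)), key=lambda i: dist(servers[i], demand))
    cost = dist(servers[best], demand)
    servers[best] = demand
    return cost

q = defaultdict(float)
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def choose_cluster(state, clusters):
    # Q-learning level: pick which aggregated group of nodes serves the demand
    if random.random() < epsilon:
        return random.choice(clusters)
    return max(clusters, key=lambda c: q[(state, c)])

def q_update(state, cluster, cost, next_state, clusters):
    best = max(q[(next_state, c)] for c in clusters)
    # Negative movement cost as reward: shorter server moves are better
    q[(state, cluster)] += alpha * (-cost + gamma * best - q[(state, cluster)])
```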
