91

[en] COMPUTED-TORQUE CONTROL OF A SIMULATED BIPEDAL ROBOT WITH LOCOMOTION BY REINFORCEMENT LEARNING / [pt] CONTROLE POR TORQUE COMPUTADO DE UM ROBÔ BÍPEDE SIMULADO COM LOCOMOÇÃO VIA APRENDIZADO POR REFORÇO

CARLOS MAGNO CATHARINO OLSSON VALLE 27 October 2016
[en] This dissertation presents the development of a hybrid control for an Atlas humanoid robot moving forward in a static locomotion regime. The Gazebo simulation environment used in the experiments allows precise modeling of the robot. The developed system consists of the modeling of the robot's mechanics, including the dynamical equations that allow the joints to be controlled by computed torque, and of the determination of the positions the joints should take. The latter is accomplished by agents that use the approximate Q-learning reinforcement learning algorithm to plan the robot's locomotion. The definition of the state space that makes up each agent differs from the traditional Cartesian one and is based on the concept of cardinal points to establish the directions to be followed toward the goal and to avoid obstacles. This allows the use of a reduced simulated environment for training, providing the agents with knowledge prior to application in a real environment and thus facilitating convergence to a so-called optimal action in a few iterations. Three agents are used: one to control the displacement of the center of mass while both legs are planted on the floor, and two others to keep the center of mass within a tolerance area of each foot when only one foot is on the ground. The hybrid control also employs a number of constraints, both in the reinforcement learning component and in the robot's kinematic model, to reduce the chance of the robot falling while walking. The proposed approach allows effective training in a few iterations, achieves good results, and ensures the integrity of the robot.
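The hybrid scheme described above pairs a model-based computed-torque joint controller with Q-learning-based gait planning. As a rough illustration of the computed-torque law only, here is a minimal Python sketch with made-up single-joint dynamics; the dissertation's actual Atlas model, joint layout, and gains are not given here, and every numeric value below is a placeholder.

```python
import numpy as np

def computed_torque(q, dq, q_des, dq_des, ddq_des, M, C, g, Kp, Kd):
    """Computed-torque (inverse dynamics) control law:
    tau = M(q) * (ddq_des + Kd*(dq_des - dq) + Kp*(q_des - q)) + C(q, dq)*dq + g(q).
    M, C, g are callables supplied by the robot model (hypothetical here)."""
    e, de = q_des - q, dq_des - dq
    v = ddq_des + Kd @ de + Kp @ e          # stabilized reference acceleration
    return M(q) @ v + C(q, dq) @ dq + g(q)  # joint torques

# Toy single-joint example with made-up dynamics (not the Atlas model).
M = lambda q: np.array([[1.2]])                 # inertia
C = lambda q, dq: np.array([[0.05]])            # damping/Coriolis term
g = lambda q: np.array([9.81 * np.sin(q[0])])   # gravity torque
Kp, Kd = np.diag([40.0]), np.diag([12.0])

tau = computed_torque(np.array([0.1]), np.array([0.0]),
                      np.array([0.3]), np.array([0.0]), np.array([0.0]),
                      M, C, g, Kp, Kd)
print(tau)
```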
92

Deep Reinforcement Learning in Cart Pole and Pong

Kuurne Uussilta, Dennis, Olsson, Viktor January 2020
In this project, we aim to reproduce previous results achieved with Deep Reinforcement Learning. We present the Markov Decision Process model as well as the algorithms Q-learning and Deep Q-learning Network (DQN). We implement a DQN agent, first in an environment called CartPole, and later in the game Pong. Our agent was able to solve the CartPole environment in less than 300 episodes. We assess the impact some of the parameters had on the agent's performance. The performance of the agent is particularly sensitive to the learning rate and seemingly proportional to the dimension of the neural network. The DQN agent implemented in Pong was unable to learn, performing at the same level as an agent picking actions at random, despite introducing various modifications to the algorithm. We discuss possible sources of error, including the RAM used as input possibly not containing sufficient information. Furthermore, we discuss the possibility of needing additional modifications to the algorithm in order to achieve convergence, as it is not guaranteed for DQN. / Bachelor's degree project in electrical engineering 2020, KTH, Stockholm
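For context on the DQN agent described above, the following sketch shows the core of a DQN update: a small Q-network, a replay buffer, and a temporal-difference target computed with a frozen target network. The network size, learning rate, and buffer capacity are illustrative guesses, not the project's actual hyperparameters.

```python
import random
from collections import deque
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP Q-network; layer sizes are illustrative."""
    def __init__(self, n_obs=4, n_act=2, hidden=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(n_obs, hidden), nn.ReLU(),
                               nn.Linear(hidden, n_act))
    def forward(self, x):
        return self.f(x)

q_net, target_net = QNet(), QNet()
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)   # learning rate is a guess
buffer, gamma = deque(maxlen=50_000), 0.99

# Transitions are appended as (state, action, reward, next_state, done) tuples, e.g.
# buffer.append(([0.0, 0.1, 0.0, -0.1], 1, 1.0, [0.0, 0.2, 0.0, -0.2], False)).

def dqn_update(batch_size=32):
    """One gradient step on the DQN loss from a replay mini-batch."""
    if len(buffer) < batch_size:
        return
    s, a, r, s2, done = map(torch.tensor, zip(*random.sample(buffer, batch_size)))
    q = q_net(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                          # frozen target network
        target = r.float() + gamma * target_net(s2.float()).max(1).values * (1 - done.float())
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad(); loss.backward(); opt.step()
```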
93

Deep Q-Learning for Lane Localization : Exploring Reinforcement Learning for Accurate Lane Detection / Djupinlärning med Q-lärande för fillokalisation : Utforskning av förstärkningsinlärning för noggrann filavkänning

Ganesan, Aishwarya January 2024
In autonomous driving, achieving fast and reliable lane detection is essential. This project explores a two-step lane detection and localization approach, diverging from relying solely on end-to-end deep learning methods, which often struggle with curved or occluded lanes. Specifically, we investigate the feasibility of training a deep reinforcement learning-based agent to adjust the detected lane by manipulating either the lane points or the parameters of a Bézier curve. However, the study found that reinforcement learning-based localization, particularly on datasets like TuSimple, did not perform as well as anticipated, despite efforts to enhance performance using various metrics. Introducing curves to expand the localizer's scope did not surpass the point-based approach, indicating that further refinement is needed for Deep Q-learning localization to be feasible. Although optimization techniques like Double Deep Q-Network showed improvements, the study did not support the hypothesis that curves with Deep Q-learning offer superior performance. This highlights the need for further research into alternative methods to achieve more accurate lane detection and localization in autonomous driving systems using reinforcement learning.
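The localization step above lets an agent adjust a detected lane by moving lane points or Bézier-curve parameters. The thesis' exact state, action, and reward definitions are not reproduced here; the sketch below only illustrates one plausible discrete action encoding, in which an action nudges a single control point of a cubic Bézier lane by a fixed step.

```python
import numpy as np

def bezier_points(ctrl, n=50):
    """Sample a cubic Bezier curve defined by 4 control points (x, y)."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * ctrl[0] + 3 * (1 - t) ** 2 * t * ctrl[1]
            + 3 * (1 - t) * t ** 2 * ctrl[2] + t ** 3 * ctrl[3])

def apply_action(ctrl, action, step=2.0):
    """Hypothetical discrete action space of 16 moves:
    (control point index, axis, direction), each shifting one coordinate by `step`."""
    idx, axis, sign = action // 4, (action % 4) // 2, 1 if action % 2 else -1
    ctrl = ctrl.copy()
    ctrl[idx, axis] += sign * step
    return ctrl

# Made-up control points for an initial lane guess (image coordinates).
ctrl = np.array([[0, 0], [30, 5], [60, 12], [90, 20]], dtype=float)
lane = bezier_points(apply_action(ctrl, action=5))  # shift control point 1 along +x
print(lane.shape)
```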
94

Online Learning and Simulation Based Algorithms for Stochastic Optimization

Lakshmanan, K January 2012
In many optimization problems, the relationship between the objective and parameters is not known. The objective function itself may be stochastic, such as a long-run average over some random cost samples. In such cases finding the gradient of the objective is not possible, and it is in this setting that stochastic approximation algorithms are used. These algorithms use estimates of the gradient and are stochastic in nature. Amongst gradient estimation techniques, Simultaneous Perturbation Stochastic Approximation (SPSA) and the Smoothed Functional (SF) scheme are widely used. In this thesis we propose a novel multi-timescale quasi-Newton based smoothed functional (QN-SF) algorithm for unconstrained as well as constrained optimization. The algorithm uses the smoothed functional scheme for estimating the gradient and the quasi-Newton method to solve the optimization problem. The algorithm is shown to converge with probability one. We also provide experimental results on the problem of optimal routing in a multi-stage network of queues. Policies like Join the Shortest Queue or Least Work Left assume knowledge of the queue length values, which can change rapidly or be hard to estimate. If the only information available is the expected end-to-end delay, as in our case, such policies cannot be used. The QN-SF based probabilistic routing algorithm uses only the total end-to-end delay for tuning the probabilities. We observe from the experiments that the QN-SF algorithm performs better than the gradient and Jacobi versions of Newton based smoothed functional algorithms. Next we consider constrained routing in a similar queueing network and extend the QN-SF algorithm to this case. We study the convergence behavior of the algorithm and observe that the constraints are satisfied at the point of convergence. We provide experimental results for the constrained routing setup as well. Next we study reinforcement learning algorithms, which are useful for solving Markov Decision Processes (MDPs) when precise information on transition probabilities is not known. When the state and action sets are very large, it is not possible to store all the state-action tuples, and function approximators like neural networks have been used in such cases. The popular Q-learning algorithm is known to diverge when used with linear function approximation due to the 'off-policy' problem. Hence developing learning algorithms that are stable when used with function approximation is an important problem. We present in this thesis a variant of Q-learning with linear function approximation that is based on two-timescale stochastic approximation. The Q-value parameters for a given policy are updated on the slower timescale while the policy parameters themselves are updated on the faster timescale. We perform a gradient search in the space of policy parameters. Since the objective function and hence the gradient are not analytically known, we employ efficient one-simulation simultaneous perturbation stochastic approximation (SPSA) gradient estimates that use Hadamard matrix based deterministic perturbations. Our algorithm has the advantage that, unlike Q-learning, it does not suffer from high oscillations due to the off-policy problem when using function approximators. Whereas it is difficult to prove convergence of regular Q-learning with linear function approximation because of the off-policy problem, we prove that our algorithm, which is on-policy, is convergent.
Numerical results on a multi-stage stochastic shortest path problem show that our algorithm exhibits significantly better performance and is more robust compared to Q-learning. Future work would be to compare it with other policy-based reinforcement learning algorithms. Finally, we develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multistage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
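The thesis above builds on smoothed-functional (SF) gradient estimates, in which the gradient of a noisy objective is estimated from function evaluations at randomly perturbed parameter values. The quasi-Newton and two-timescale machinery is beyond a short sketch, but the basic two-sided Gaussian SF estimator can be written compactly; the objective below is a stand-in, not the queueing network from the thesis.

```python
import numpy as np

def sf_gradient(f, theta, beta=0.1, n_samples=64, rng=np.random.default_rng(0)):
    """Two-sided Gaussian smoothed-functional gradient estimate:
    grad f(theta) ~ E[ eta * (f(theta + beta*eta) - f(theta - beta*eta)) / (2*beta) ],
    with eta ~ N(0, I). Only noisy evaluations of f are needed, no analytic gradient."""
    g = np.zeros_like(theta)
    for _ in range(n_samples):
        eta = rng.standard_normal(theta.shape)
        g += eta * (f(theta + beta * eta) - f(theta - beta * eta)) / (2.0 * beta)
    return g / n_samples

# Stand-in noisy objective (quadratic plus noise), not the routing cost from the thesis.
f = lambda th: np.sum((th - 1.0) ** 2) + 0.01 * np.random.randn()

theta = np.zeros(5)
for k in range(200):                        # plain gradient descent on the SF estimate
    theta -= 0.05 * sf_gradient(f, theta)
print(theta)                                # should approach the optimum near 1.0
```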
95

Reinforcement Learning for Market Making / Förstärkningsinlärningsbaserad likviditetsgarantering

Carlsson, Simon, Regnell, August January 2022
Market making, the process of simultaneously and continuously providing buy and sell prices in a financial asset, is rather complicated to optimize. Applying reinforcement learning (RL) to infer optimal market making strategies is a relatively uncharted and novel research area. Most published articles in the field are notably opaque concerning most aspects, including precise methods, parameters, and results. This thesis attempts to explore and shed some light on the techniques, problem formulations, algorithms, and hyperparameters used to construct RL-derived strategies for market making. First, a simple probabilistic model of a limit order book is used to compare analytical and RL-derived strategies. Second, a market making agent is trained on a more complex Markov chain model of a limit order book using tabular Q-learning and deep reinforcement learning with double deep Q-learning. Results and strategies are analyzed, compared, and discussed. Finally, we propose some exciting extensions and directions for future work in this research field.
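Both the tabular and deep agents above ultimately rest on the Q-learning update. A minimal tabular sketch with a market-making flavour follows; the state and action encodings (an inventory bucket and a pair of quote depths) are hypothetical placeholders rather than the limit order book models used in the thesis.

```python
import random
from collections import defaultdict

Q = defaultdict(float)          # Q[(state, action)] -> value
actions = [(bid, ask) for bid in range(1, 4) for ask in range(1, 4)]  # quote depths in ticks
alpha, gamma, eps = 0.1, 0.99, 0.1

def choose(state):
    """Epsilon-greedy selection over quote-depth pairs."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """Standard tabular Q-learning update; in a market-making setting the reward
    would typically be mark-to-market P&L net of an inventory penalty."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# One illustrative transition: flat inventory (state 0), small positive P&L.
a = choose(0)
q_update(state=0, action=a, reward=0.5, next_state=1)
```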
96

Reinforcement Learning Techniques for Adaptive Routing in Telecommunication Networks with Irregular Traffic / Techniques d'Apprentissage par Renforcement pour le Routage Adaptatif dans les Réseaux de Télécommunication à Trafic Irrégulier

HOCEINI, SAID 23 November 2004
The objective of this thesis is to propose algorithmic approaches for addressing the problem of adaptive routing (AR) in a communication network with irregular traffic. An analysis of existing algorithms led us to adopt the Q-Routing (QR) algorithm as a working basis; it relies on a reinforcement learning technique based on Markov models. The effectiveness of this type of routing depends strongly on information about the load and the nature of the traffic on the network; this information must be sufficient, relevant, and reflect the real load of the network at the moment a routing decision is made. To remedy the drawbacks of techniques based on QR, we propose two AR algorithms. The first, called Q-Neural Routing, relies on a stochastic neural model to estimate and update the parameters needed for AR. To speed up convergence, a second approach is proposed: K-Shortest path Q-Routing. It is based on multipath routing combined with the QR algorithm, with the exploration space reduced to the k best paths. Both proposed algorithms are validated and compared with traditional approaches using the OPNET simulation platform, and their effectiveness for AR is clearly demonstrated: they take the state of the network into account better than classical approaches do.
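For reference, the Q-Routing rule that both proposed algorithms extend updates, at node x, the estimated delivery time to destination d via neighbour y using y's own best estimate. The sketch below follows that standard update; the toy network, delays, and learning rate are illustrative.

```python
from collections import defaultdict

def q_routing_update(Q, x, d, y, q_delay, s_delay, neighbours, alpha=0.5):
    """Q-Routing update at node x for destination d via neighbour y:
    Q[x][(d, y)] <- Q[x][(d, y)] + alpha * (q + s + min_z Q[y][(d, z)] - Q[x][(d, y)]),
    where q_delay is the time spent in x's queue, s_delay the transmission delay to y,
    and min_z Q[y][(d, z)] is y's best estimate of the remaining delivery time to d."""
    t_y = min(Q[y][(d, z)] for z in neighbours[y])
    Q[x][(d, y)] += alpha * (q_delay + s_delay + t_y - Q[x][(d, y)])
    return Q[x][(d, y)]

# Toy 3-node line network A - B - C, routing a packet from A towards destination 'C'.
neighbours = {'A': ['B'], 'B': ['A', 'C'], 'C': ['B']}
Q = {n: defaultdict(float) for n in neighbours}
print(q_routing_update(Q, 'A', 'C', 'B', q_delay=0.2, s_delay=1.0, neighbours=neighbours))
```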
97

Derivation of high-frequency interest rate trading models using reinforcement learning / Derivação de modelos de trading de alta frequência em juros utilizando aprendizado por reforço

Castro, Uirá Caiado de 24 August 2017
The present study proposes the use of a reinforcement learning model to derive an interest rate trading strategy directly from historical high-frequency order book data. No assumption about market dynamics is made, but it requires creating a simulator with which the learning agent can interact to gain experience. Different variables related to the microstructure of the market are tested to compose the state of the environment. Functions based on P&L and/or on the consistency of the agent's order placement are tested to evaluate the actions taken. The results suggest some success in bringing the proposed techniques to trading. However, it is concluded that obtaining consistently profitable strategies depends heavily on the constraints placed on the learning task.
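The study above evaluates actions with reward functions based on P&L and/or the consistency of the agent's order placement, without giving their exact form here. As an illustration only, a common P&L-style reward for an order-book agent marks the inventory to the mid-price and penalises large positions; the penalty coefficient below is an assumption, not a value from the study.

```python
def pnl_reward(cash, inventory, mid_price, prev_cash, prev_inventory, prev_mid,
               inventory_penalty=0.01):
    """Change in mark-to-market portfolio value minus an inventory penalty.
    The penalty coefficient is an illustrative choice, not a value from the study."""
    value = cash + inventory * mid_price
    prev_value = prev_cash + prev_inventory * prev_mid
    return (value - prev_value) - inventory_penalty * abs(inventory)

# Example: bought 5 contracts, mid price moved up one tick since the last step.
print(pnl_reward(cash=-500.0, inventory=5, mid_price=100.01,
                 prev_cash=0.0, prev_inventory=0, prev_mid=100.00))
```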
98

Algorithms for Product Pricing and Energy Allocation in Energy Harvesting Sensor Networks

Sindhu, P R January 2014
In this thesis, we consider stochastic systems that arise in different real-world application contexts. The first problem we consider is based on product adoption and pricing. A monopolist selling a product has to price the product appropriately over time in order to maximize the aggregated profit. The demand for a product is uncertain and is influenced by a number of factors, some of which are price, advertising, and product technology. We study the influence of price on the demand for a product and also how demand affects future prices. Our approach involves mathematically modelling the variation in demand as a function of price and current sales. We present a simulation-based algorithm for computing the optimal price path of a product over a given period of time. The algorithm we propose uses a smoothed-functional based performance gradient descent method to find a price sequence that maximizes the total profit over a planning horizon. The second system we consider is in the domain of sensor networks. A sensor network is a collection of autonomous nodes, each of which senses the environment. Sensor nodes use energy for sensing and communication related tasks. We consider the problem of finding optimal energy sharing policies that maximize the network performance of a system comprising multiple sensor nodes and a single energy harvesting (EH) source. Nodes periodically sense a random field and generate data, which is stored in their respective data queues. The EH source harnesses energy from ambient energy sources, and the generated energy is stored in a buffer. The nodes require energy for the transmission of data, and they receive this energy from the EH source. The stored energy in the EH source needs to be shared efficiently among the nodes in the system in order to minimize the average delay of data transmission over the long run. We formulate this problem in the framework of average-cost infinite-horizon Markov Decision Processes [3], [7] and provide algorithms for the same.
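The energy-sharing problem above is cast as an average-cost infinite-horizon MDP. The thesis' own simulation-based algorithms are not reproduced here; as a small illustration of that framework, the sketch below runs relative value iteration on a generic average-cost MDP with known transition probabilities, using a made-up two-state model rather than the sensor-network formulation.

```python
import numpy as np

def relative_value_iteration(P, c, n_iter=500):
    """Relative value iteration for an average-cost MDP.
    P[a] is the transition matrix under action a, c[s, a] the one-step cost.
    Returns the estimated optimal average cost and a greedy policy."""
    n_states, n_actions = c.shape
    h = np.zeros(n_states)
    for _ in range(n_iter):
        q = np.stack([c[:, a] + P[a] @ h for a in range(n_actions)], axis=1)
        rho = q[0].min()              # value at the reference state tracks the average cost
        h = q.min(axis=1) - rho       # keep the relative value function bounded
    policy = q.argmin(axis=1)
    return rho, policy

# Toy 2-state, 2-action problem (transition matrices and costs are made up).
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),    # action 0
     np.array([[0.5, 0.5], [0.6, 0.4]])]    # action 1
c = np.array([[1.0, 0.3], [2.0, 1.5]])      # c[state, action]
print(relative_value_iteration(P, c))
```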
99

Řízení entit ve strategické hře založené na multiagentních systémech / Strategic Game Based on Multiagent Systems

Knapek, Petr January 2019
This thesis is focused on designing and implementing a system that adds learning and planning capabilities to agents designed for playing real-time strategy games such as StarCraft. It explains the problems of controlling game entities and bots by computer and introduces some commonly used solutions. Based on this analysis, a new system has been designed and implemented. It uses multi-agent systems to control the game, utilizes machine learning methods, and is capable of overcoming opponents and adapting to new challenges.
100

Reinforcement Learning for Grid Voltage Stability with FACTS

Oldeen, Joakim, Sharma, Vishnu January 2020
With increased penetration of renewable energy sources, maintaining equilibrium between production and consumption in the world's electrical power systems (EPS) becomes more and more challenging. One way to increase stability and efficiency in an EPS is to use flexible alternating current transmission systems (FACTS). However, an EPS containing multiple FACTS devices with overlapping areas of influence can suffer negative effects if the reference values they operate around are not updated with sufficient temporal resolution. The reference values are usually set manually by a system operator. The work in this master's thesis investigated how three different reinforcement learning (RL) algorithms can be used to set reference values automatically, with higher temporal resolution than a system operator, with the aim of increased voltage stability. The three RL algorithms, Q-learning, Deep Q-learning (DQN), and Twin-delayed deep deterministic policy gradient (TD3), were implemented in Python together with a 2-bus EPS test network acting as the environment. The 2-bus EPS test network contains two FACTS devices: one for shunt compensation and one for series compensation. The results show that, with respect to reward, DQN performed as well as or better than the non-RL cases 98.3 % of the time on the simulation test set, while the corresponding figures for TD3 and Q-learning were 87.3 % and 78.5 %, respectively. DQN was able to achieve increased voltage stability on the test network, while TD3 showed similar results except at lower loading levels. Q-learning decreased voltage stability on a substantial portion of the test set, even compared to a case without FACTS devices. To support continued research and possible future real-life implementation, a list of suggestions for future work has been established.
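The agents above set FACTS reference values with the aim of increased voltage stability. The thesis' exact environment and reward are not reproduced here; the sketch below shows one commonly used style of voltage-stability reward, a penalty on per-bus deviations from the nominal 1.0 p.u. voltage, with band limits that are assumptions rather than values from the thesis.

```python
def voltage_reward(bus_voltages_pu, nominal=1.0, limit=0.05, violation_penalty=10.0):
    """Illustrative reward: negative sum of per-unit voltage deviations,
    with an extra penalty for buses outside an assumed +/- 5% band."""
    reward = 0.0
    for v in bus_voltages_pu:
        dev = abs(v - nominal)
        reward -= dev
        if dev > limit:
            reward -= violation_penalty
    return reward

# Example: a 2-bus snapshot where one bus sags below the allowed band.
print(voltage_reward([0.93, 1.01]))
```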
