121

Deep Reinforcement Learning Adaptive Traffic Signal Control / Reinforcement Learning Traffic Signal Control

Genders, Wade 22 November 2018 (has links)
Sub-optimal automated transportation control systems incur high mobility, human health and environmental costs. With society reliant on its transportation systems for the movement of individuals, goods and services, minimizing these costs benefits many. Intersection traffic signal controllers are an important element of modern transportation systems that govern how vehicles traverse road infrastructure. Many types of traffic signal controllers exist: fixed time, actuated and adaptive. Adaptive traffic signal controllers seek to minimize transportation costs through dynamic control of the intersection. However, many existing adaptive traffic signal controllers rely on heuristic or expert knowledge and were not originally designed for scalability or for transportation's big data future. This research addresses these challenges by developing a scalable system for adaptive traffic signal control model development using deep reinforcement learning in traffic simulation. Traffic signal control can be modelled as a sequential decision-making problem; reinforcement learning can solve sequential decision-making problems by learning an optimal policy. Deep reinforcement learning makes use of deep neural networks, powerful function approximators which benefit from large amounts of data. Distributed, parallel computing techniques are used to provide scalability, with the proposed methods validated on a simulation of the City of Luxembourg, Luxembourg, consisting of 196 intersections. This research contributes to the body of knowledge by successfully developing a scalable system for adaptive traffic signal control model development and validating it on the largest traffic microsimulation in the literature. The proposed system reduces delay, queues, vehicle stopped time and travel time compared to conventional traffic signal controllers. Findings from this research include that reinforcement learning methods which explicitly develop the policy offer improved performance over purely value-based methods. The developed methods are expected to mitigate the problems caused by sub-optimal automated transportation signal control systems, improving mobility and human health and reducing environmental costs. / Thesis / Doctor of Philosophy (PhD) / Inefficient transportation systems negatively impact mobility, human health and the environment. The goal of this research is to mitigate these negative impacts by improving automated transportation control systems, specifically intersection traffic signal controllers. This research presents a system for developing adaptive traffic signal controllers that can efficiently scale to the size of cities by using machine learning and parallel computation techniques. The proposed system is validated by developing adaptive traffic signal controllers for 196 intersections in a simulation of the City of Luxembourg, Luxembourg, successfully reducing delay, queues, vehicle stopped time and travel time.
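The abstract frames signal control as a sequential decision-making problem solved with policy-learning methods. As a rough illustration only, the sketch below trains a linear softmax policy with REINFORCE on an invented single-intersection queue model; the state, arrival dynamics, reward and learning rate are all assumptions and stand in for the deep policy networks and the Luxembourg microsimulation used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_phases = 4                              # one signal phase per approach
theta = np.zeros((n_phases, n_phases))    # linear softmax policy: queue lengths -> phase logits

def policy(queues):
    logits = queues @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

def step(queues, phase):
    # The served approach discharges up to 3 vehicles; every approach gets random arrivals.
    served = min(queues[phase], 3.0)
    queues = queues + rng.poisson(1.0, size=n_phases)
    queues[phase] -= served
    return queues, -queues.sum()          # reward: negative total queue, a crude delay proxy

for episode in range(200):
    queues = rng.poisson(2.0, size=n_phases).astype(float)
    grads, rewards = [], []
    for t in range(50):
        probs = policy(queues)
        a = rng.choice(n_phases, p=probs)
        one_hot = np.zeros(n_phases)
        one_hot[a] = 1.0
        grads.append(np.outer(queues, one_hot - probs))    # gradient of log pi(a|s)
        queues, r = step(queues, a)
        rewards.append(r)
    returns = np.cumsum(rewards[::-1])[::-1]               # undiscounted returns-to-go
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    for g, G in zip(grads, returns):
        theta += 1e-3 * g * G                              # REINFORCE ascent step
```

The choice of a policy-gradient update here simply echoes the abstract's finding that methods which explicitly develop the policy outperformed purely value-based ones.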
122

Reinforcement Learning for Market Making / Förstärkningsinlärningsbaserad likviditetsgarantering

Carlsson, Simon, Regnell, August January 2022 (has links)
Market making – the process of simultaneously and continuously providing buy and sell prices in a financial asset – is rather complicated to optimize. Applying reinforcement learning (RL) to infer optimal market making strategies is a relatively uncharted and novel research area. Most published articles in the field are notably opaque concerning most aspects, including precise methods, parameters, and results. This thesis attempts to explore and shed some light on the techniques, problem formulations, algorithms, and hyperparameters used to construct RL-derived strategies for market making. First, a simple probabilistic model of a limit order book is used to compare analytical and RL-derived strategies. Second, a market making agent is trained on a more complex Markov chain model of a limit order book using tabular Q-learning and deep reinforcement learning with double deep Q-learning. Results and strategies are analyzed, compared, and discussed. Finally, we propose some exciting extensions and directions for future work in this research field.
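As a loose illustration of the tabular Q-learning ingredient mentioned above, the sketch below learns a symmetric quoting policy against a deliberately crude fill model; the inventory discretization, fill probabilities and reward are invented for illustration and are far simpler than the limit order book models used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
inventories = np.arange(-5, 6)                 # discretized signed inventory states
spreads = np.array([1, 2, 3])                  # candidate half-spreads, in ticks
Q = np.zeros((len(inventories), len(spreads)))
alpha, gamma, eps = 0.1, 0.99, 0.1

def fill_prob(half_spread):
    return np.exp(-0.7 * half_spread)          # deeper quotes are filled less often

for episode in range(2000):
    inv = 0
    for t in range(100):
        s = inv + 5                            # index into the inventory grid
        a = rng.integers(len(spreads)) if rng.random() < eps else Q[s].argmax()
        h = spreads[a]
        pnl = 0.0
        if rng.random() < fill_prob(h):        # our bid is hit -> buy one unit
            inv, pnl = min(inv + 1, 5), pnl + h
        if rng.random() < fill_prob(h):        # our ask is lifted -> sell one unit
            inv, pnl = max(inv - 1, -5), pnl + h
        reward = pnl - 0.01 * inv ** 2         # spread capture minus an inventory penalty
        s_next = inv + 5
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
```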
123

Parameter, experience, and compute efficient deep reinforcement learning

Nikishin, Evgenii 08 1900 (has links)
This thesis presents three contributions that improve separate axes of the efficiency of deep reinforcement learning (RL) algorithms. Our first contribution begins with the premise that standard model-based RL algorithms typically minimize the next-state prediction error when training a world model. Despite being a natural approach, this error penalizes mispredictions equally for components of the state space that are relevant to decision making and for those that are not. To overcome this limitation, we propose an alternative way to train a model by directly differentiating expected returns, the objective that an agent ultimately seeks to optimize. Our algorithm outperforms the standard approach when the capacity of the network powering the model is limited, leading to a more parameter-efficient agent. The second contribution focuses on how efficiently deep RL algorithms utilize experience. We identify the primacy bias phenomenon in deep RL, a tendency to learn excessively from the first interactions an agent has with an environment. The negative consequences of this tendency propagate to the rest of training, impairing the ability to learn efficiently from subsequent interactions. As a simple remedy to the primacy bias, we propose to periodically re-initialize the agent's network parameters while preserving the experience buffer. Applying this technique consistently improves returns across algorithms and domains. Lastly, we make a contribution that improves the computational efficiency of deep RL training. Numerous prior papers have observed that neural networks employed in deep RL gradually lose plasticity, the ability to learn from new experiences. An immediate strategy for mitigating this issue is to employ a larger network that has more plasticity to begin with; however, this increases the computational cost of training. We propose an intervention called plasticity injection that gradually grows the network. Agents that start from a smaller network and use plasticity injection during training save computation without compromising the final returns.
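A minimal sketch of the periodic-reset remedy described above, assuming a generic replay-buffer setup: the agent's network weights are re-initialized on a fixed schedule while the buffer is left untouched. The environment, loss and schedule below are placeholders; only the reset mechanism itself mirrors the idea.

```python
import collections
import random

import torch
import torch.nn as nn

def make_q_network():
    return nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

def reset_parameters(net):
    # Re-initialize every submodule that knows how to reset itself (the Linear layers here).
    for module in net.modules():
        if hasattr(module, "reset_parameters"):
            module.reset_parameters()

q_net = make_q_network()
optimizer = torch.optim.Adam(q_net.parameters(), lr=3e-4)
buffer = collections.deque(maxlen=100_000)   # the experience survives every reset
RESET_EVERY = 20_000                         # placeholder schedule, not taken from the thesis

for step in range(100_000):
    transition = (torch.randn(4), 0, 0.0, torch.randn(4))   # dummy (s, a, r, s') tuple
    buffer.append(transition)

    if len(buffer) >= 32:
        idx = random.sample(range(len(buffer)), 32)
        states = torch.stack([buffer[i][0] for i in idx])
        loss = q_net(states).pow(2).mean()   # stand-in for the real TD loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if (step + 1) % RESET_EVERY == 0:
        reset_parameters(q_net)              # forget the weights, keep the data
        optimizer = torch.optim.Adam(q_net.parameters(), lr=3e-4)
```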
124

Deep Reinforcement Learning for Multi-Agent Path Planning in 2D Cost Map Environments : using Unity Machine Learning Agents toolkit

Persson, Hannes January 2024 (has links)
Multi-agent path planning is applied in a wide range of applications in robotics and autonomous vehicles, including aerial vehicles such as drones and other unmanned aerial vehicles (UAVs), to solve tasks in areas like surveillance, search and rescue, and transportation. With today's rapidly evolving technology in automation and artificial intelligence, multi-agent path planning is becoming increasingly relevant. The main problems encountered in multi-agent path planning are collision avoidance with other agents, obstacle evasion, and pathfinding from a starting point to an endpoint. In this project, the objectives were to create intelligent agents capable of navigating through two-dimensional eight-agent cost map environments to a static target, while avoiding collisions with other agents and simultaneously minimizing the path cost. Reinforcement learning was applied using the Unity development platform and the open-source ML-Agents toolkit, which enables the development of intelligent agents with reinforcement learning inside Unity. Perlin noise was used to generate the cost maps, and the Proximal Policy Optimization algorithm was used to train the agents. The training was structured as a curriculum with two lessons: the first was designed to teach the agents to reach the target without colliding with other agents or moving out of bounds, and the second to minimize the path cost. The project successfully achieved its objectives, which could be determined from visual inspection and by comparing the final model with a baseline model. The baseline model was trained only to reach the target while avoiding collisions, without minimizing the path cost. A comparison of the models showed that the final model outperformed the baseline model, achieving a path cost that was on average 27.6% lower.
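As an illustration of the cost-map setup, the sketch below builds a smooth pseudo-random cost field (a simplified stand-in for the Perlin noise used in the thesis) and sums cell costs along a trajectory, the quantity the second curriculum lesson asks agents to minimize. The grid size, the noise construction and the diagonal route are assumptions; the thesis trains PPO agents inside Unity ML-Agents rather than scoring fixed paths.

```python
import numpy as np

rng = np.random.default_rng(2)

def smooth_cost_map(size=64, octaves=4):
    # A smooth pseudo-random field in [0, 1]; a simplified stand-in for Perlin noise.
    ys, xs = np.meshgrid(np.linspace(0, 1, size), np.linspace(0, 1, size), indexing="ij")
    field = np.zeros((size, size))
    for k in range(1, octaves + 1):
        phase = rng.uniform(0, 2 * np.pi, size=2)
        field += np.sin(2 * np.pi * k * xs + phase[0]) * np.sin(2 * np.pi * k * ys + phase[1]) / k
    return (field - field.min()) / (field.max() - field.min())

def path_cost(cost_map, path):
    # Total cost of a trajectory given as a list of (row, col) grid cells.
    rows, cols = zip(*path)
    return cost_map[list(rows), list(cols)].sum()

cost_map = smooth_cost_map()
diagonal = [(i, i) for i in range(cost_map.shape[0])]   # a naive straight-line route
print("cost of the naive diagonal route:", path_cost(cost_map, diagonal))
```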
