  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Model-Based versus Data-Driven Control Design for LEACH-based WSN

Karlsson, Axel, Zhou, Bohan January 2020
In relation to the increasing interest in implementing smart cities, the deployment of widespread wireless sensor networks (WSNs) has become a hot topic. Among the application's greatest challenges, there is still progress to be made concerning energy consumption and quality of service. Consequently, this project explores a series of feasible solutions to improve the energy efficiency of data aggregation in a WSN, by strategically adjusting the position of the receiving base station and the packet rate of the WSN nodes. Additionally, the low-energy adaptive clustering hierarchy (LEACH) protocol is coupled with the WSN state of charge (SoC). For this thesis, a WSN was defined as a two-dimensional area that contains sensor nodes and a mobile sink, i.e. a movable base station. Following rigorous analysis of the WSN data clustering principles and system-wide dynamics, two design strategies, model-based and data-driven, were employed to develop two corresponding control approaches for WSN energy management: model predictive control and reinforcement learning. To test their performance, a simulation environment was developed in Python, including the extended LEACH protocol. The amount of data transmitted per unit of energy is adopted as the index of control performance. The simulation results show that the model-based controller was able to aggregate over 22% more bits than the LEACH protocol alone, whereas the data-driven controller performed worse than the LEACH network but showed potential for smaller WSNs containing fewer nodes. Nonetheless, the extension of the LEACH protocol did not yield an obvious improvement in energy efficiency, because the results varied widely. / [Translated from Swedish] With the growing interest in implementing so-called smart cities, the use of widespread wireless sensor networks (WSNs) has become an area of interest. Among the application's greatest challenges, there is still room for improvement in energy consumption and quality of service. This project therefore explores a range of possible solutions for improving the energy efficiency of data aggregation in WSNs. This was done by strategically adjusting the position of the receiving base station and the packet rate of each node. In addition, the low-energy adaptive clustering hierarchy (LEACH) protocol was extended with the WSN's state of charge. For this thesis, a WSN was defined as a two-dimensional plane containing sensor nodes and a mobile base station, i.e. a base station that can be moved. After rigorous analysis of clustering methods and the dynamics of a WSN, two control methods based on different control strategies were developed: a model-based MPC controller and a data-driven reinforcement learning controller, both implemented to improve energy efficiency in the WSN. To test the performance of the two control methods, a simulation platform based on Python was developed, together with the extension of the LEACH protocol. The amount of data sent per unit of energy was used as the index for approximating control performance. The simulation results show that the model-based controller could increase the number of data packets sent by 22% compared with using the LEACH protocol. The data-driven controller performed worse than the LEACH protocol alone, but showed potential for smaller WSNs. The extension of the LEACH protocol gave no clear increase in energy efficiency, owing to a number of deviating results.
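As a rough illustration of the clustering step named in the abstract above, the following Python sketch shows the standard LEACH cluster-head election threshold, T = p / (1 - p (r mod 1/p)). The thesis's actual SoC extension is not described in the abstract, so the multiplication by `soc` below is purely a hypothetical coupling, and the node dictionary layout and function names are assumptions made for this example.

```python
import random


def leach_threshold(p, round_index):
    """Standard LEACH threshold for nodes not yet elected in the current epoch."""
    period = round(1 / p)  # an epoch lasts 1/p rounds
    return p / (1 - p * (round_index % period))


def elect_cluster_heads(nodes, p, round_index, rng):
    """Return the ids of nodes that elect themselves cluster head this round."""
    threshold = leach_threshold(p, round_index)
    heads = []
    for node in nodes:
        if node["was_head_this_epoch"]:
            continue  # already served as head in this epoch
        # Hypothetical SoC coupling (an assumption, not the thesis's method):
        # scale the threshold by the node's state of charge in [0, 1].
        if rng.random() < threshold * node["soc"]:
            node["was_head_this_epoch"] = True
            heads.append(node["id"])
    return heads


# Illustrative network: 20 fully charged nodes, desired head fraction p = 0.1.
nodes = [{"id": i, "soc": 1.0, "was_head_this_epoch": False} for i in range(20)]
heads = elect_cluster_heads(nodes, p=0.1, round_index=9, rng=random.Random(0))
```

In the last round of an epoch the threshold reaches 1, so every remaining eligible node becomes a cluster head; a node with zero state of charge would never be elected under this hypothetical weighting.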
2

Dynamic Maze Puzzle Navigation Using Deep Reinforcement Learning

Chiu, Luisa Shu Yi 01 September 2024
The implementation of deep reinforcement learning in mobile robotics offers a promising solution for developing autonomous mobile robots that efficiently complete tasks and transport objects. Reinforcement learning continues to show impressive potential in robotics applications through self-learning and biological plausibility. Despite its advancements, challenges remain in applying these machine learning techniques in dynamic environments. This thesis explores the performance of Deep Q-Networks (DQN), using images as input, for mobile robot navigation in dynamic maze puzzles and aims to contribute to advancements in deep reinforcement learning applications for simulated and real-life robotic systems. This project is a step towards implementation in a hardware-based system. The proposed approach uses a DQN algorithm with experience replay and an epsilon-greedy annealing schedule. Experiments are conducted to train DQN agents in static and dynamic maze environments, and various reward functions and training strategies are explored to optimize learning outcomes. In this context, the dynamic aspect involves training the agent on fixed mazes and then testing its performance on modified mazes, where obstacles like walls alter previously optimal paths to the goal. In game play, the agent achieved a 100% win rate in both 4x4 and 10x10 static mazes, successfully making it to the goal regardless of slip conditions. The rewards obtained during the game-play episodes indicate that the agent took the optimal path in all 100 episodes of the 4x4 maze without the slip condition, whereas it took the shortest, optimal path in 99 out of 100 episodes in the 4x4 maze with the slip condition. Compared to the 4x4 maze, the agent more frequently chose sub-optimal paths in the larger 10x10 maze, as indicated by the number of times the agent maximized the rewards obtained.
In the 10x10 static maze game play, the agent took the optimal path in 96 out of 100 episodes for the no-slip condition, while it took the shortest path in 93 out of 100 episodes for the slip condition. In the dynamic maze experiment, the agent successfully solved 7 out of 8 mazes with a 100% win rate in both original and modified maze environments. The results indicate that adequate exploration, well-designed reward functions, and diverse training data significantly impacted both training performance and game play outcomes. The findings suggest that DQN approaches are plausible solutions to stochastic outcomes, but the proposed method needs to be extended and more research is needed to improve this methodology. This study highlights the need for further efforts in improving deep reinforcement learning applications in dynamic environments.
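The two DQN ingredients named in the abstract above, experience replay and an annealed epsilon-greedy policy, can be sketched in a few lines of standard-library Python. This is a minimal sketch, not the thesis's implementation: the linear schedule, the default values (`eps_start`, `eps_end`, `anneal_steps`), and all class and function names are assumptions chosen for illustration.

```python
import random
from collections import deque


def epsilon_at(step, eps_start=1.0, eps_end=0.05, anneal_steps=10_000):
    """Linearly anneal epsilon from eps_start to eps_end, then hold it constant."""
    fraction = min(step / anneal_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)


class ReplayBuffer:
    """Fixed-capacity store of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng):
        # Uniform sampling breaks the correlation between consecutive steps.
        return rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def select_action(q_values, epsilon, rng):
    """With probability epsilon explore uniformly; otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


rng = random.Random(0)
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(t, 0, 0.0, t + 1, False)
batch = buffer.sample(2, rng)
greedy_action = select_action([0.1, 0.9, 0.3], 0.0, rng)
```

In a full agent, `q_values` would come from the Q-network evaluated on the current maze image, and each sampled batch would feed one gradient step.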
3

Scalable Deep Reinforcement Learning for a Multi-Agent Warehouse System

Khan, Akib, Loberg, Marcus January 2022
This report presents an application of reinforcement learning to the problem of controlling multiple robots performing the task of moving boxes in a warehouse environment. The robots make autonomous decisions individually and avoid colliding with each other and the walls of the warehouse. The problem is defined as a dynamical multi-agent system and a solution is reached by applying the DQN algorithm. The solution is designed for scalability, meaning that the trained robots are flexible enough to be deployed in simulated environments of different sizes and alongside a different number of robots. This was successfully achieved by feature engineering. / [Translated from Swedish] This report presents an implementation of reinforcement learning that solves the problem of controlling multiple robots that perform the task of moving boxes in a warehouse environment. The robots make autonomous decisions individually and try to avoid colliding with each other and the walls of the warehouse. The problem is defined as a dynamic multi-agent system and a solution is reached by applying the DQN algorithm. The solution is designed for scalability, meaning that the robots should be flexible enough to act in environments of different sizes and with different numbers of robots. This was successfully achieved by implementing feature extraction. / Bachelor's degree project in electrical engineering 2022, KTH, Stockholm
4

Simulated Fixed-Wing Aircraft Attitude Control using Reinforcement Learning Methods

David Jona Richter (11820452) 20 December 2021
Autonomous transportation is a research field that has gained huge interest in recent years, with autonomous electric or hydrogen cars coming ever closer to seeing everyday use. Cars are not the only subject of autonomy research, though; aviation is also being explored for fully autonomous flight. One very important aspect of making autonomous flight a reality is attitude control, the control of roll, pitch, and sometimes yaw. Traditional approaches for automated attitude control use PID (proportional-integral-derivative) controllers, which use hand-tuned parameters to fulfill the task. In this work, however, the use of Reinforcement Learning algorithms for attitude control will be explored. With the surge of more and more powerful artificial neural networks, which have proven to be universally usable function approximators, Deep Reinforcement Learning also becomes an intriguing option. A software toolkit will be developed and used to allow for the use of multiple flight simulators to train agents with Reinforcement Learning as well as Deep Reinforcement Learning. Experiments will be run using different hyperparameters, algorithms, state representations, and reward functions to explore possible options for autonomous attitude control using Reinforcement Learning.
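For readers unfamiliar with the PID baseline that the abstract above contrasts with reinforcement learning, here is a minimal textbook PID loop in Python. The gains, the time step, and the toy roll model (commanded roll rate integrated directly into roll angle) are assumptions made for illustration only; they are not taken from the thesis or from any flight simulator.

```python
class PID:
    """Textbook discrete PID controller with hand-tuned, illustrative gains."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first call, since there is no previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate_roll(target_deg=10.0, dt=0.02, steps=1000):
    """Toy model (an assumption): the controller output is a roll-rate command."""
    pid = PID(kp=2.0, ki=0.5, kd=0.1)
    roll = 0.0
    for _ in range(steps):
        roll_rate_cmd = pid.update(target_deg, roll, dt)
        roll += roll_rate_cmd * dt
    return roll


final_roll = simulate_roll()
```

The three gains are exactly the hand-tuned parameters the abstract refers to; a learned policy would replace `PID.update` with a mapping from observed state to control command.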
5

Reinforcement Learning for Multi-Agent Strategy Synthesis Using Higher-Order Knowledge

Forsell, Gustav, Gergi, Shamoun January 2023
Imagine for a moment we are living in the distant future where autonomous robots are patrolling the streets as police officers. Two such robots are chasing a robber through the city streets. Fearing the thief might listen in to any potential transmission, both robots remain radio silent and are thus limited to a strictly visual pursuit. Since the robots cannot see the robber the entire time, they have to deduce the potential location of the robber. What would the best strategy be for these robots to achieve their objective? This bachelor's thesis investigated the above example by creating strategies through reinforcement learning. The thesis also investigated the performance of the players when they have different abilities of deduction. This was tested by creating a suitable game and corresponding reinforcement learning algorithm and running the simulations for different degrees of knowledge. The study showed that reinforcement learning is a viable method for strategy construction, reaching nearly guaranteed victory for cases when the agent knows everything about the environment and a slightly lower win ratio when uncertainty is introduced. The implementation yielded only a small gain in win ratio when the agents could deduce even more about each other. / [Translated from Swedish] Imagine for a moment that we live in a distant future where autonomous robots patrol the streets as police officers. Two such robots are chasing a robber through the city streets. Because they fear that the thief can listen in on any transmission, both robots stay radio silent and are therefore limited to a strictly visual pursuit. Since the robots cannot see the robber the whole time, they must deduce the robber's potential location. What would be the best strategy for these robots to achieve their goal? This bachelor's thesis investigated the above example by creating strategies through reinforcement learning. The thesis also investigated the players' performance when they have different abilities of deduction. This was tested by creating a suitable game and a corresponding reinforcement learning algorithm and running the simulations for different degrees of knowledge. The study showed that reinforcement learning is a useful method for strategy construction, reaching nearly guaranteed victory in cases where the agent knows everything about the environment and a somewhat lower win ratio when there is uncertainty. The implementation gave only a small gain in win ratio when the agents could deduce even more about each other. / Bachelor's degree project in electrical engineering 2023, KTH, Stockholm
6

FLEXPOOL: A DISTRIBUTED MODEL-FREE DEEP REINFORCEMENT LEARNING ALGORITHM FOR JOINT PASSENGERS & GOODS TRANSPORTATION

Kaushik Bharadwaj Manchella (9706697) 15 December 2020
The growth in online goods delivery is causing a dramatic surge in urban vehicle traffic from last-mile deliveries. On the other hand, ride-sharing has been on the rise with the success of ride-sharing platforms and increased research on using autonomous vehicle technologies for routing and matching. The future of urban mobility for passengers and goods relies on leveraging new methods that minimize operational costs and environmental footprints of transportation systems. This paper considers combining passenger transportation with goods delivery to improve vehicle-based transportation. Even though the problem has been studied with model-based approaches where the dynamic model of the transportation system environment is defined, model-free approaches where the dynamics of the environment are learned by interaction have been demonstrated to be adaptable to new or erratic environment dynamics. FlexPool is a distributed model-free deep reinforcement learning algorithm that jointly serves passengers & goods workloads by learning optimal dispatch policies from its interaction with the environment. A model-free algorithm (as opposed to a model-based one) does not use the transition probability distribution (and the reward function) associated with the Markov decision process (MDP). The proposed algorithm pools passengers for a ride-sharing service and delivers goods using a multi-hop routing method. These flexibilities decrease the fleet's operational cost and environmental footprint while maintaining service levels for passengers and goods. The dispatching algorithm based on deep reinforcement learning is integrated with an efficient matching algorithm for passengers and goods. Through simulations on a realistic urban mobility platform, we demonstrate that FlexPool outperforms other model-free settings in serving the demands from passengers & goods. FlexPool achieves 30% higher fleet utilization and 35% higher fuel efficiency in comparison to (i) model-free approaches where vehicles transport a combination of passengers & goods without the use of multi-hop transit, and (ii) model-free approaches where vehicles exclusively transport either passengers or goods.
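To make the model-free definition in the abstract above concrete, the sketch below shows a tabular Q-learning update in Python. Note what is swapped in: FlexPool itself is a deep, distributed algorithm, whereas this is the simplest tabular stand-in. The point is only that the update uses a sampled transition (state, action, reward, next state) and never the MDP's transition probabilities. The state and action names and the values of `alpha` and `gamma` are hypothetical.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One model-free update from a single observed transition."""
    # Bootstrap from the best estimated value in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical dispatch example: two zones, two possible dispatch decisions.
q = defaultdict(float)
actions = ["serve_passenger", "deliver_goods"]
new_value = q_learning_update(q, "zone_a", "serve_passenger", 1.0, "zone_b", actions)
```

A model-based method would instead plan with an explicit transition model; here the environment is treated as a black box that only returns samples.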
7

Investigating Multi-Objective Reinforcement Learning for Combinatorial Optimization and Scheduling Problems : Feature Identification for multi-objective Reinforcement Learning models / Undersökning av förstärkningsinlärning av flera mål för kombinatorisk optimering och schemaläggningsproblem : Funktionsidentifiering för förstärkningsinlärning av flera mål för kombinatorisk optimering och schemaläggningsproblem

Fridsén Skogsberg, Rikard January 2022
Reinforcement Learning (RL) has in recent years become a core method for sequential decision making in complex dynamical systems, and is of great interest for improving scheduling problems. This could prove important to areas in the newer generation of cellular networks. One such area is the base station scheduler, which allocates radio resources to users. This is posed as a large-scale optimization problem that needs to be solved in millisecond intervals, while at the same time accounting for multiple, sometimes conflicting, objectives like latency or Quality of Service requirements. In this thesis, multi-objective RL (MORL) solutions are proposed and evaluated in order to identify desired features for novel applications to the scheduling problem. The posed solution classes were tested in common MORL benchmark environments such as Deep Sea Treasure for efficient and informative evaluation of features. The solution was ultimately tested in environments that pose combinatorial optimization and scheduling problems. The results indicate that outer-loop multi-policy solutions are able to produce models that comply with desired features for scheduling. A multi-policy multi-objective deep Q-network was implemented and shown to produce an adaptive-at-run-time discrete model, based on an outer-loop approach that calls a single-policy algorithm. The presented approach does not increase in complexity when adding objectives but generally requires larger sampling quantities for convergence. Differing scalarization techniques for the reward were tested, indicating an effect on variance that could affect performance under certain environment characteristics. / [Translated from Swedish] Reinforcement learning as a viable method for sequential decision making in complex dynamical systems has grown in recent years thanks to improved hardware capabilities. Parties interested in this technology include the telecom industry, whose actors aim to develop the new generation of mobile networks. One of the fundamental functions of a base station is the scheduler, whose task is to allocate radio resources to users in the network. This is advantageously posed as an optimization problem that must be solved at the millisecond level while taking several types of objectives into account, such as QoS requirements and latency. In this thesis, multi-objective reinforcement learning algorithms within several solution classes are presented and evaluated in order to identify desirable features for new applications in radio resource scheduling. The presented solution classes of algorithms were tested in commonly used benchmark environments for this type of technique, such as Deep Sea Treasure, in order to efficiently evaluate the qualities and features of each algorithm. Finally, the solution was tested in environments within combinatorial optimization and scheduling. The results indicate that multi-policy solutions have the capacity to produce models that meet the requirements of the problem. Multi-policy models based on multi-objective deep Q-networks were able to produce adaptive, discrete real-time models. This solution does not increase in complexity when more objectives are added but generally needs larger numbers of sampled preferences to converge. Different scalarization techniques for the reward were tested and indicated that they affected the variance, which in certain types of environment configurations affected the results.
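The abstract above mentions comparing scalarization techniques for the reward without naming them. As a hedged illustration, the Python sketch below shows two scalarizations that are common in the multi-objective RL literature, a linear weighted sum and a weighted Chebyshev distance to a utopia point. Whether the thesis used these particular forms is an assumption, as are the objective names, weights, and utopia values.

```python
def linear_scalarization(rewards, weights):
    """Weighted sum of the per-objective rewards."""
    return sum(w * r for w, r in zip(weights, rewards))


def chebyshev_scalarization(rewards, weights, utopia):
    """Negative weighted Chebyshev distance to a utopia point (higher is better)."""
    return -max(w * abs(z - r) for w, r, z in zip(weights, rewards, utopia))


# Hypothetical two-objective reward: (negative latency, throughput).
reward_vector = (-2.0, 5.0)
weights = (0.5, 0.5)
utopia = (0.0, 10.0)  # assumed best achievable value per objective

linear_value = linear_scalarization(reward_vector, weights)
chebyshev_value = chebyshev_scalarization(reward_vector, weights, utopia)
```

The linear form averages the objectives, while the Chebyshev form is driven by the single worst objective, which is one reason different scalarizations can produce different reward variance.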
8

Offline Reinforcement Learning for Downlink Link Adaption : A study on dataset and algorithm requirements for offline reinforcement learning. / Offline Reinforcement Learning för nedlänksanpassning : En studie om krav på en datauppsättning och algoritm för offline reinforcement learning

Dalman, Gabriella January 2024
This thesis studies offline reinforcement learning as an optimization technique for downlink link adaptation, which is one of many control loops in radio access networks. The work studies the impact of the quality of pre-collected datasets, in terms of how much the data covers the state-action space and whether it is collected by an expert policy or not. The data quality is evaluated by training three different algorithms: Deep Q-networks, Critic regularized regression, and Monotonic advantage re-weighted imitation learning. The performance is measured for each combination of algorithm and dataset, and their need for hyperparameter tuning and their sample efficiency are studied. The results showed Critic regularized regression to be the most robust because it could learn well from any of the datasets that were used in the study and did not require extensive hyperparameter tuning. Deep Q-networks required careful hyperparameter tuning, but paired with the expert data it managed to reach rewards as high as the agents trained with Critic regularized regression. Monotonic advantage re-weighted imitation learning needed data from an expert policy to reach a high reward. In summary, offline reinforcement learning can perform successfully in a telecommunication use case such as downlink link adaptation. Critic regularized regression was the preferred algorithm because it could perform well with all three datasets presented in the thesis. / [Translated from Swedish] This thesis studies offline reinforcement learning as an optimization technique for downlink link adaptation, which is one of many control loops in radio access networks. The work examines the impact of the quality of pre-collected datasets, in terms of how much the data covers the state-action space and whether or not it was collected by an expert policy. Data quality is evaluated by training three different algorithms: Deep Q-networks, Critic regularized regression, and Monotonic advantage re-weighted imitation learning. Performance is measured for each combination of algorithm and dataset, and their need for hyperparameter tuning and their data efficiency are studied. The results showed that Critic regularized regression was the most robust, because it managed to learn well from all the datasets used in the study and did not require extensive hyperparameter tuning. Deep Q-networks required careful hyperparameter tuning and, together with expert data, reached the highest performance of all agents in the study. Monotonic advantage re-weighted imitation learning needed data from an expert policy to manage to learn the problem. The most successful dataset was the expert data. In summary, offline reinforcement learning can be successful in telecommunications, specifically downlink link adaptation. Critic regularized regression was the preferred algorithm because it was stable and could perform well with all three datasets presented in the thesis.
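The abstract above describes dataset quality partly in terms of how much of the state-action space the data covers. One simple way to quantify that for a discrete problem is the fraction of distinct (state, action) pairs that appear in the logged transitions. The Python sketch below is a hypothetical metric written for illustration; the thesis may measure coverage differently, and the tuple layout and sizes are assumptions.

```python
def state_action_coverage(transitions, n_states, n_actions):
    """Fraction of the discrete state-action space that appears in the dataset."""
    visited = {(state, action) for state, action, *_ in transitions}
    return len(visited) / (n_states * n_actions)


# Hypothetical logged transitions as (state, action, reward) tuples.
logged = [
    (0, 0, 1.0),
    (0, 1, 0.5),
    (0, 0, 0.2),  # repeated pair, counted once
    (2, 1, 0.0),
]
coverage = state_action_coverage(logged, n_states=3, n_actions=2)
```

A dataset collected by a single expert policy would typically score low on this measure even when its rewards are high, which is the trade-off between coverage and expertise that the abstract examines.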
