261 |
Autonoma drönare : modifiering av belöningsfunktionen i AirSim / Autonomous Drones : modification of the reward function in AirSim
Dzeko, Elvir, Carlsson, Markus January 2018 (has links)
Inom det heta forskningsområdet med självflygande drönare sker det en kontinuerlig utveckling både inom forskningen och inom industrin. Det finns flera forskningsproblem kring autonoma fordon, inklusive autonom styrning av drönare. Ett intressant spår för autonom styrning av drönare är via deep reinforcement learning, dvs. en kombination av djupa neuronnät och reinforcement learning. Problem som ofta uppkommer är tidskrävande träning, ineffektiv manövrering samt problem med oförutsägbarhet och säkerhet. Även höga kostnader kan vara ett problem. Med hjälp av simuleringsprogrammet AirSim har vi fått en möjlighet att testa aktuella algoritmer utan hänsyn till kostnader och andra begränsande faktorer som kan utgöra svårigheter för att arbeta inom detta område. Microsofts egenutvecklade simulator AirSim tillåter användare att via dess applikationsprogrammeringsgränssnitt kommunicera med drönaren i programmet, vilket gör det möjligt att testa olika algoritmer. Frågeställningen som berörs är hur den existerande belöningsfunktionen i AirSim-simulatorn kan förbättras med avseende på att undvika hinder och förflytta drönaren från start till mål. Målet med undersökningen är att studera och förbättra AirSims existerande Deep Q-Network-algoritm med fokus på belöningsfunktionen och testa den i olika simulerade miljöer. Med hjälp av två olika experiment som utförts i två olika miljöer observerades belöningen, antalet kollisioner och beteendet som agenten hade i simulatorn. Vi lyckades inte få fram tillräckligt med data för att kunna mäta en tydlig förbättring av den modifierade belöningsfunktionens utvärderingsmått; dock kan vi säga att vi lyckades utveckla en belöningsfunktion som presterar bra genom att den undviker hinder och tar sig till mål. För att kunna jämföra vilken av belöningsfunktionerna som är bättre behövs mer forskning inom ämnet.
Med de problem som fanns med att samla in data är slutsatsen att vi inte lyckades förbättra algoritmen, då vi inte vet om den presterar bättre eller sämre än den existerande belöningsfunktionen. / Drones are growing popular, and so is the research within the field of autonomous drones. There are several research problems around autonomous vehicles overall, but one interesting problem covered by this study is the autonomous manoeuvring of drones. One interesting path for autonomous drones is deep reinforcement learning, which is a combination of deep neural networks and reinforcement learning. Problems that researchers often encounter in the field range from time-consuming training and inefficient manoeuvring to problems with unpredictability and safety. Even the high cost of testing can be an issue. With the help of simulation programs, we are able to test algorithms without any concern for cost or other real-world factors that could limit our work. Microsoft’s own simulator AirSim lets users control the vehicle in the simulator through an application programming interface, which makes it possible to test a variety of algorithms. The research question addressed in this study is how the pre-existing reward function can be improved with respect to avoiding obstacles and moving the drone from start to goal. The goal of this study is to find improvements to AirSim’s pre-existing Deep Q-Network algorithm’s reward function and test it in two different simulated environments. By conducting several experiments and storing the evaluation metrics produced by the agents, it was possible to observe a result. The observed evaluation metrics included the average reward the agent received over time, the number of collisions, and overall performance in the respective environment. We were not able to gather enough data to measure an improvement in the evaluation metrics for the modified reward function.
The modified function that was created performed well but did not display any substantially improved performance. To be able to compare whether one reward function is better than the other, more research needs to be done. Given the difficulties of gathering data, the conclusion is that we created a reward function but cannot tell whether it is better or worse than the benchmark reward function.
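The abstract above centers on modifying a reward function so the drone avoids obstacles and reaches the goal. As a minimal sketch of what such a shaped reward can look like, the function below combines a dense progress term with sparse collision and goal terms; all names, constants, and weightings are illustrative assumptions, not the thesis's or AirSim's actual reward.

```python
import math

def shaped_reward(position, goal, prev_distance, collided,
                  goal_radius=1.0, collision_penalty=-100.0, goal_bonus=100.0):
    """Illustrative shaped reward for goal-directed, collision-averse flight.

    Returns (reward, distance); the caller feeds `distance` back in as
    `prev_distance` on the next step. All constants are hypothetical.
    """
    distance = math.dist(position, goal)
    if collided:
        return collision_penalty, distance   # sparse penalty on any collision
    if distance < goal_radius:
        return goal_bonus, distance          # sparse bonus for reaching the goal
    # Dense shaping term: positive when the drone moved closer this step.
    return prev_distance - distance, distance
```

A dense progress term like this tends to speed up early learning relative to a sparse goal-only reward, at the risk of shaping the agent toward greedy straight-line paths.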
|
262 |
Solution Of Delayed Reinforcement Learning Problems Having Continuous Action Spaces
Ravindran, B 03 1900 (has links) (PDF)
No description available.
|
263 |
Using Reinforcement Learning in Partial Order Plan Space
Ceylan, Hakan 05 1900 (has links)
Partial order planning is an important approach that solves planning problems without completely specifying the orderings between the actions in the plan. This property provides greater flexibility in executing plans, making partial order planners a preferred choice over other planning methodologies. However, in order to find partially ordered plans, partial order planners perform a search in plan space rather than in the space of world states, and an uninformed search in plan space leads to poor efficiency. In this thesis, I discuss applying a reinforcement learning method, called the First-visit Monte Carlo method, to partial order planning in order to design agents that need no training data or heuristics but are still able to make informed decisions in plan space based on experience. Communicating effectively with the agent is crucial in reinforcement learning. I address how this task was accomplished in plan space and present the results from an evaluation on a blocks-world test bed.
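The First-visit Monte Carlo method named above estimates a state's value as the average return observed after its first occurrence in each episode. The sketch below shows the generic textbook method on plain (state, reward) trajectories; it is not the thesis's plan-space implementation, whose state and action encodings are specific to partial order planning.

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """First-visit Monte Carlo value estimation.

    `episodes` is a list of [(state, reward), ...] trajectories; a state's
    value is the average return following its *first* occurrence per episode.
    """
    returns = defaultdict(list)
    for episode in episodes:
        # Compute the return G following every time step, working backwards.
        G = 0.0
        gains = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            G = episode[t][1] + gamma * G
            gains[t] = G
        seen = set()
        for t, (state, _) in enumerate(episode):
            if state not in seen:       # count the first visit only
                seen.add(state)
                returns[state].append(gains[t])
    return {s: sum(g) / len(g) for s, g in returns.items()}
```

Averaging over first visits keeps the per-episode samples independent, which is what makes the estimator's convergence analysis straightforward.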
|
264 |
Nuclear Renewable Integrated Energy System Power Dispatch Optimization for Tightly Coupled Co-Simulation Environment using Deep Reinforcement Learning
Sah, Suba January 2021 (has links)
No description available.
|
265 |
Domain Transfer for End-to-end Reinforcement Learning
Olsson, Anton, Rosberg, Felix January 2020 (has links)
In this master thesis project, LiDAR-based, depth image-based, and semantic segmentation image-based reinforcement learning agents are investigated and compared for learning in simulation and performing in real time. The project utilizes the Deep Deterministic Policy Gradient architecture for learning continuous actions and was designed to control an RC car. It is one of the first projects to deploy an agent in a real scenario after training in a similar simulation. The project demonstrated that, with a proper reward function and by tuning driving parameters such as restricting steering, maximum velocity, and minimum velocity, and by performing input data scaling, a LiDAR-based agent could drive indefinitely on a simple but completely unseen track in real time.
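The driving restrictions mentioned above (restricted steering, velocity bounds, input scaling) can be sketched as a small pre/post-processing step around the agent; the limits below are illustrative placeholders, not the values used in the project.

```python
def preprocess_and_clamp(lidar_ranges, raw_steer, raw_throttle,
                         max_range=10.0, steer_limit=0.5,
                         v_min=0.2, v_max=0.6):
    """Scale LiDAR readings to [0, 1] and clamp the agent's continuous
    actions. All limits are hypothetical, for illustration only.
    """
    # Saturate at the sensor's assumed maximum range, then normalize.
    obs = [min(r, max_range) / max_range for r in lidar_ranges]
    # Restrict steering to a symmetric band and velocity to [v_min, v_max].
    steer = max(-steer_limit, min(steer_limit, raw_steer))
    throttle = max(v_min, min(v_max, raw_throttle))
    return obs, steer, throttle
```

Clamping the action space this way shrinks the set of behaviours the policy can express, which often narrows the sim-to-real gap at the cost of a lower performance ceiling.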
|
266 |
Entwickeln eines Reinforcement Learning Agenten zur Realisierung eines Schifffolgemodells / Developing a Reinforcement Learning Agent to Realize a Ship-Following Model
Ziebarth, Paul 23 November 2021 (links)
This thesis is part of a current research project developing a dynamic two-dimensional traffic-flow simulator to describe inland shipping on a roughly 220 km stretch of the Lower Rhine. The goal of this work is to implement a ship-following model using deep learning approaches and, by means of suitable acceleration, to realize collision-free following. The legal constraints (traffic rules, minimum distances) as well as hydrodynamic and physical laws such as minimum and maximum accelerations and velocities must be taken into account.
After analyzing the system and the necessary parameters, a model is designed and the model parameters are determined. Taking the model parameters into account, an agent is selected and the system is implemented in MATLAB. The parameters are designed so that a general following model results, which could, for example, also realize a car-following model.
1 Introduction
1.1 Goal of the thesis
1.2 Structure of the thesis
2 State of the Art
2.1 Traditional following models
2.2 Reinforcement Learning
2.2.1 Model
2.2.2 State-value function
2.3 Deep Reinforcement Learning
2.3.1 Artificial neural network
3 Mathematical Foundations
3.1 Artificial neurons
3.1.1 Activation functions
3.2 Normalization
3.3 Function types
4 Analysis
4.1 Analysis of the system functions of the software
5 Model
5.1 Structure
5.2 Approximators
5.3 Parameters
5.4 Scenarios
6 Agent
6.1 Selection of the agent
6.2 Twin-Delayed Deterministic Policy Gradient (TD3)
7 Implementation
7.1 Environment
7.1.1 Reward function
7.2 Agent
7.2.1 Network architecture
7.2.1.1 Actor network
7.2.1.2 Critic network
7.2.1.3 Noise processes
7.3 Hyperparameters
7.4 Other parameters
8 Training Process
8.1 Ornstein-Uhlenbeck process
8.2 Algorithm
9 Validation
9.1 Driving behaviour with different characteristics
9.2 Comparison with the Intelligent Driver Model
10 Summary and Outlook
Bibliography
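Section 8.1 of the outline lists an Ornstein-Uhlenbeck process for the training stage. As a hedged sketch, a discretized OU process produces temporally correlated exploration noise for continuous actions; the parameter values below are common defaults, not those chosen in the thesis.

```python
import random

class OrnsteinUhlenbeckNoise:
    """Temporally correlated exploration noise for continuous-action agents.

    Discretized OU process: dx = theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1).
    Parameter values here are common defaults, not those from the thesis.
    """
    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = random.Random(seed)
        self.x = mu

    def reset(self):
        """Restart the process at its mean, typically at episode boundaries."""
        self.x = self.mu

    def sample(self):
        """Advance one step and return the current noise value."""
        self.x += (self.theta * (self.mu - self.x) * self.dt
                   + self.sigma * self.dt ** 0.5 * self.rng.gauss(0.0, 1.0))
        return self.x
```

The mean-reverting drift keeps consecutive samples correlated, which gives smoother exploratory accelerations than independent Gaussian noise; TD3 implementations often use plain Gaussian noise instead, so the choice here follows the thesis outline.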
|
267 |
MARS: Multi-Scalable Actor-Critic Reinforcement Learning Scheduler
Baheri, Betis 24 July 2020 (links)
No description available.
|
268 |
Deep reinforcement learning for automated building climate control
Snällfot, Erik, Hörnberg, Martin January 2024 (links)
The building sector is the single largest contributor to greenhouse gas emissions, making it a natural focal point for reducing energy consumption. More efficient use of energy is also becoming increasingly important for property managers as global energy prices are skyrocketing. This report is conducted on behalf of Sustainable Intelligence, a Swedish company that specializes in building automation solutions. It investigates whether deep reinforcement learning (DRL) algorithms can be implemented in a building control environment, whether they can be more effective than traditional solutions, and whether this can be achieved in reasonable time. The algorithms tested were Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO). They were implemented in a simulated BOPTEST environment in Brussels, Belgium, along with a traditional heating curve and a PI controller as benchmarks. DDPG never converged, but PPO managed to reduce energy consumption compared to the best benchmark while incurring only slightly worse thermal discomfort. The results indicate that DRL algorithms can be implemented in a building environment and reduce greenhouse gas emissions within a reasonable training time. This might be especially interesting in a complex building, where DRL can adapt and scale better than traditional solutions. Further research, along with implementations in physical buildings, needs to be done in order to determine whether DRL is the superior option.
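A reward for this kind of building control task typically trades energy use against thermal discomfort. The function below is a minimal sketch of such a trade-off under assumed weights and a comfort deadband; it is not the reward used in the thesis or defined by the BOPTEST benchmark.

```python
def climate_reward(power_kw, indoor_temp, setpoint=21.0, deadband=1.0,
                   energy_weight=0.1, comfort_weight=1.0):
    """Illustrative reward trading off energy use against thermal discomfort.

    Discomfort is the temperature deviation beyond a comfort deadband around
    the setpoint. All weights and bounds are assumptions for this sketch.
    """
    discomfort = max(0.0, abs(indoor_temp - setpoint) - deadband)
    return -(energy_weight * power_kw + comfort_weight * discomfort)
```

The relative weighting decides which benchmark the agent can beat: a large `comfort_weight` pushes the policy toward PI-like setpoint tracking, while a large `energy_weight` rewards coasting inside the deadband.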
|
269 |
Towards Provable Guarantees for Learning-based Control Paradigms
Shanelle Gertrude Clarke (14247233) 12 December 2022 (links)
<p> Within recent years, there has been a renewed interest in developing data-driven, learning-based algorithms for solving longstanding, challenging control problems. This interest is primarily motivated by the availability of ubiquitous data and an increase in the computational resources of modern machines. However, there is a prevailing concern about the lack of provable performance guarantees for data-driven/model-free learning-based control algorithms. This dissertation focuses on the following key aspects: i) with what facility can state-of-the-art learning-based control methods eke out successful performance for challenging flight control applications such as aerobatic maneuvering?; and ii) can we leverage well-established tools and techniques in control theory to provide some provable guarantees for different types of learning-based algorithms? </p>
<p>To these ends, a deep RL-based controller is implemented, via high-fidelity simulations, for fixed-wing aerobatic maneuvering, which shows the facility with which learning-based control methods can eke out successful performances and further encourages the development of learning-based control algorithms with an eye towards providing provable guarantees.<br>
</p>
<p>Two learning-based algorithms are also developed: i) a model-free algorithm that learns a stabilizing optimal control policy for the bilinear biquadratic regulator (BBR), which solves the regulator problem with a biquadratic performance index given an unknown bilinear system; and ii) a model-free inverse reinforcement learning algorithm, called the Model-Free Stochastic inverse LQR (iLQR) algorithm, which solves a well-posed semidefinite programming optimization problem to obtain unique solutions for the linear control gain and the parameters of the quadratic performance index, given zero-mean noisy optimal trajectories generated by a linear time-invariant dynamical system. Theoretical analysis and numerical results are provided to validate the effectiveness of all proposed algorithms.</p>
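The inverse LQR algorithm described above recovers the cost parameters and the control gain from optimal trajectories. For orientation, the sketch below solves the corresponding forward problem in the scalar case: given the cost weights, iterate the discrete Riccati recursion to the gain that an inverse method would have to recover. The scalar restriction and fixed-point iteration are simplifications for illustration, not the dissertation's semidefinite-programming formulation.

```python
def scalar_dlqr(a, b, q, r, iters=1000, tol=1e-12):
    """Discrete-time LQR gain for the scalar system x' = a*x + b*u with
    stage cost q*x**2 + r*u**2; returns k such that u = -k*x is optimal.

    Iterates the Riccati recursion to a fixed point, a simplification of
    the matrix case the abstract's algorithms operate in.
    """
    p = q  # value-function coefficient, initialized at the terminal cost
    for _ in range(iters):
        k = (b * p * a) / (r + b * p * b)
        p_next = q + a * p * (a - b * k)   # Riccati update under gain k
        if abs(p_next - p) < tol:
            p = p_next
            break
        p = p_next
    return (b * p * a) / (r + b * p * b)
```

For a = b = q = r = 1 the fixed point is the golden ratio, giving k = (√5 - 1)/2 ≈ 0.618; an inverse LQR method fed trajectories from this closed loop should return cost weights with exactly this q/r ratio.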
|
270 |
Individual differences in structure learning
Newlin, Philip 13 May 2022 (links)
Humans have a tendency to impute structure spontaneously, even in simple learning tasks; however, the way they approach structure learning can vary drastically. The present study sought to determine why individuals learn structure differently. One hypothesized explanation for differences in structure learning is individual differences in cognitive control. Cognitive control allows individuals to maintain representations of a task and may interact with reinforcement learning systems. It was expected that individual differences in the propensity to apply cognitive control, which shares component processes with hierarchical reinforcement learning, might explain how individuals learn structure differently in a simple structure learning task. Results showed that proactive control and model-based control explained differences in the rate at which individuals applied structure learning.
|