11

Autonomous UAV Path Planning using RSS signals in Search and Rescue Operations

Anhammer, Axel, Lundeberg, Hugo January 2022 (has links)
Unmanned aerial vehicles (UAVs) have emerged as a promising technology in search and rescue (SAR) operations. UAVs have the ability to provide more timely localization, thus decreasing the crucial duration of SAR operations. Previous work has demonstrated proof of concept for localizing missing people by utilizing received signal strength (RSS) and UAVs. The localization system is based on the assumption that the missing person carries a smartphone with Wi-Fi enabled, whose signal can be intercepted. This thesis proposes a two-stage path planner for UAVs, utilizing RSS signals and an initial belief regarding the missing person's location. The objective of the first stage is to locate an RSS signal. By dividing the search area into a grid, a hierarchical solution based on several Markov decision processes (MDPs) can be formulated that takes the probabilities of different areas into consideration. The objective of the second stage is to isolate the RSS signal and provide a location estimate. The environment is deemed to be partially observable, and the problem is formulated as a partially observable Markov decision process (POMDP). Two different filters, a point mass filter (PMF) and a particle filter (PF), are evaluated with regard to their ability to correctly estimate the state of the environment. The state estimate then acts as input to a deep Q-network (DQN), which selects appropriate actions for the UAV. The DQN thus becomes a path planner for the UAV, and the trajectory it generates is compared to trajectories generated by, among others, a greedy policy. Results for Stage 1 demonstrate that the path generated by the MDPs prioritizes areas with higher probability and is intuitively reasonable. The results also illustrate potential drawbacks of a hierarchical solution, which could be addressed by incorporating more factors into the problem formulation. Simulation results for Stage 2 show that both a PMF and a PF can successfully be used to estimate the state of the environment and provide an accurate localization estimate, with the PMF producing slightly more accurate estimates than the PF. The DQN successfully isolates the missing person's probable location using relatively few actions. However, it performs only marginally better than the greedy policy, indicating that it may be a complicated solution to a simpler problem.
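As an illustration of the second-stage filtering described above, the sketch below shows one particle-filter update for RSS-based localization. It assumes a log-distance path-loss measurement model; the function names, parameter values, and resampling rule are illustrative assumptions rather than the thesis' actual implementation.

```python
import numpy as np

def rss_model(particles, uav_pos, p0=-40.0, n=2.5):
    """Expected RSS (dBm) at the UAV for each particle (candidate person position),
    assuming a log-distance path-loss model."""
    d = np.linalg.norm(particles - uav_pos, axis=1) + 1e-6
    return p0 - 10.0 * n * np.log10(d)

def pf_update(particles, weights, uav_pos, rss_measured, sigma=4.0):
    """Reweight particles with a Gaussian RSS likelihood, then resample if needed."""
    likelihood = np.exp(-0.5 * ((rss_measured - rss_model(particles, uav_pos)) / sigma) ** 2)
    weights = weights * likelihood
    weights /= weights.sum()
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):      # low effective sample size
        idx = np.random.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

# Usage: particles spread uniformly over a 500 m x 500 m search area,
# updated with one RSS measurement taken at the UAV's current position.
particles = np.random.uniform(0.0, 500.0, size=(2000, 2))
weights = np.full(2000, 1.0 / 2000)
particles, weights = pf_update(particles, weights,
                               uav_pos=np.array([250.0, 250.0]), rss_measured=-70.0)
estimate = np.average(particles, axis=0, weights=weights)    # location estimate
```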
12

Reinforcement Learning for Grid Voltage Stability with FACTS

Oldeen, Joakim, Sharma, Vishnu January 2020 (has links)
With increased penetration of renewable energy sources, maintaining equilibrium between production and consumption in the world's electrical power systems (EPS) becomes more and more challenging. One way to increase stability and efficiency in an EPS is to use flexible alternating current transmission systems (FACTS). However, an EPS containing multiple FACTS devices with overlapping areas of influence can suffer negative effects if the reference values they operate around are not updated with sufficient temporal resolution. The reference values are usually set manually by a system operator. This master's thesis investigates how three different reinforcement learning (RL) algorithms can be used to set reference values automatically, with higher temporal resolution than a system operator, with the aim of increasing voltage stability. The three RL algorithms, Q-learning, Deep Q-learning (DQN), and Twin-delayed deep deterministic policy gradient (TD3), were implemented in Python together with a 2-bus EPS test network acting as the environment. The test network contains two FACTS devices: one for shunt compensation and one for series compensation. The results show that, with respect to reward, DQN performed equally well or better than the non-RL cases 98.3 % of the time on the simulation test set, while the corresponding values for TD3 and Q-learning were 87.3 % and 78.5 % respectively. DQN achieved increased voltage stability on the test network, and TD3 showed similar results except at lower loading levels. Q-learning decreased voltage stability on a substantial portion of the test set, even compared to a case without FACTS devices. To support continued research and possible future real-life implementation, a list of suggestions for future work is provided.
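As a concrete illustration of the simplest of the three algorithms, the sketch below shows a minimal tabular Q-learning loop for choosing discrete reference-value adjustments. The state/action discretization, the placeholder environment, the reward, and the hyperparameters are illustrative assumptions, not the thesis' 2-bus test-network setup.

```python
import numpy as np

n_states, n_actions = 50, 5      # e.g. binned bus voltage; {large down, down, hold, up, large up}
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.99, 0.1

def step(state, action):
    """Placeholder environment: the thesis instead simulates a 2-bus EPS with two FACTS devices."""
    next_state = np.random.randint(n_states)
    reward = -abs(next_state - n_states // 2) / n_states     # penalize deviation from the nominal voltage bin
    return next_state, reward

state = np.random.randint(n_states)
for _ in range(10_000):
    # Epsilon-greedy selection over reference-value adjustments.
    action = np.random.randint(n_actions) if np.random.rand() < eps else int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # Standard Q-learning temporal-difference update.
    Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
    state = next_state
```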
13

Simulating market maker behaviour using Deep Reinforcement Learning to understand market microstructure / A simulation of the stock market's microstructure using self-learning financial agents

Marcus, Elwin January 2018 (has links)
Market microstructure studies the process of exchanging assets under explicit trading rules. With algorithmic trading and high-frequency trading, modern financial markets have seen profound changes in market microstructure in the last 5 to 10 years. As a result, previously established methods in the field of market microstructure often become faulty or insufficient. Machine learning and, in particular, reinforcement learning have become more ubiquitous in both finance and other fields today, with applications in trading and optimal execution. This thesis uses reinforcement learning to understand market microstructure by simulating a stock market based on NASDAQ Nordics and training market maker agents on this stock market. Simulations are run on both a dealer market and a limit order book market, differentiating this work from previous studies, and they use the DQN and PPO algorithms in these simulated environments, where stochastic optimal control theory has mainly been used before. The market maker agents successfully reproduce stylized facts of historical trade data in each simulation, such as mean-reverting prices and the absence of linear autocorrelations in price changes, and they beat random policies employed on these markets with a positive profit and loss of up to 200%. Other trading dynamics observed in real-world markets are also exhibited through the agents' interactions, mainly bid-ask spread clustering, optimal inventory management, declining spreads, and independence of inventory and spreads, indicating that reinforcement learning with PPO and DQN are relevant choices when modelling market microstructure. / Market microstructure studies how the exchange of financial assets takes place according to explicit rules. Algorithmic and high-frequency trading have changed the structure of modern financial markets over the last 5 to 10 years. This has also affected the reliability of previously used methods, for example from econometrics, for studying market microstructure. Machine learning and reinforcement learning have become more popular, with many different applications both in finance and in other fields. Within finance, these types of methods have mainly been used for trading and optimal execution of orders. This thesis combines reinforcement learning and market microstructure to simulate a stock market based on NASDAQ Nordics, where market maker agents are trained via reinforcement learning with the goal of understanding the market microstructure that arises through the agents' interactions. The agents are evaluated and tested on a dealer market together with a limit order book, which, together with the two algorithms DQN and PPO, distinguishes this study from previous ones, where stochastic optimization has mainly been used for similar problems. The agents successfully reproduce properties of financial time series such as mean reversion and the absence of linear autocorrelation, and they beat random strategies with a maximum profit of 200%. Finally, the agents also exhibit other trading dynamics expected in a real market, mainly clustering of spreads, optimal inventory management, and declining spreads over the simulations. This shows that reinforcement learning with PPO or DQN is a relevant choice when modelling market microstructure.
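One common way to train such market maker agents is to reward the mark-to-market profit and loss of each step while penalizing large inventories. The sketch below illustrates that idea; the price process, parameter values, and penalty form are illustrative assumptions, not the simulator used in the thesis.

```python
def mark_to_market(cash, inventory, mid):
    """Current wealth: cash plus inventory valued at the mid-price."""
    return cash + inventory * mid

def mm_reward(wealth_prev, wealth_now, inventory, penalty=0.01):
    """One-step mark-to-market P&L minus a quadratic inventory penalty."""
    return (wealth_now - wealth_prev) - penalty * inventory ** 2

# Example step: the agent quotes around the mid-price and a sell order hits its bid.
mid, spread = 100.0, 0.10
cash, inventory = 0.0, 0
wealth_prev = mark_to_market(cash, inventory, mid)
bid = mid - spread / 2.0     # the agent buys one share half a spread below mid
inventory += 1
cash -= bid
mid = 100.05                 # the mid-price then drifts up
reward = mm_reward(wealth_prev, mark_to_market(cash, inventory, mid), inventory)
# reward = (-99.95 + 100.05) - 0.01 * 1**2 = 0.09
```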
14

Modelling Cyber Security of Networks as a Reinforcement Learning Problem using Graphs : An Application of Reinforcement Learning to the Meta Attack Language / Cyber Security of Computer Networks Represented as a Reinforcement Learning Problem with Graphs : Reinforcement Learning Applied to the Meta Attack Language

Berglund, Sandor January 2022 (has links)
ICT systems are part of the vital infrastructure in today's society. These systems are under constant threat, and cyber security experts continually put forth efforts to protect them. By applying modern AI methods, these efforts can both be improved and relieved of some of the cost of expert work. This thesis examines whether a reinforcement learning (RL) algorithm can be applied to a cyber security model of ICT systems. The research question answered is how well an RL algorithm can optimise the resource cost of successful cyber attacks, as represented by a cyber security model. The modelling language, the Meta Attack Language (MAL), is a meta language for attack graphs that details the individual steps to be taken in a cyber attack. Previous work, Manuel Rickli's thesis, presented a method for automatically generating attack graphs according to MAL, aimed at modelling industry-level computer networks. This method was used to generate different distributions of attack graphs, which were used to train deep Q-learning (DQN) agents. The agents' results were then compared with a random agent and a greedy method based on the A∗ search algorithm. The results show that DQN achieves higher performance in attack step selection than the uninformed choices of the random agent. However, DQN was unable to achieve higher performance than the A∗ method. This may be due to the simplicity of the attack graph generation or the fact that the A∗ method has access to the complete attack graph, among other factors. The thesis also raises questions about the general representation of MAL attack graphs as RL problems and how RL algorithms should be applied to them. The source code of this thesis is available at: https://github.com/KTH-SSAS/sandor-berglund-thesis. / IT systems are an essential part of the infrastructure of today's society and are under constant threat from various individuals and organisations. IT security experts put in constant work to keep these systems secure and to ward off malicious actions against them. Modern AI methods can be used to improve this work and to ease its cost. This thesis examines how a reinforcement learning algorithm can be applied to a cyber security model. It does so by answering the question: How well can a reinforcement learning algorithm optimise a cyber attack represented by a cyber security model? The Meta Attack Language (MAL) is a meta language for attack graphs that describes each step of a cyber attack. In this thesis, Manuel Rickli's implementation of MAL and attack graph generation was used to define a reinforcement learning problem. The reinforcement learning algorithm deep Q-learning (DQN) was used to train an attention-based neural network on different distributions of attack graphs, and it was compared with a random agent and a greedy method based on the A∗ search algorithm. The results show that DQN could produce an agent that performs better than the uninformed random agent. However, the agent did not perform better than the greedy A∗ method, which may be because A∗ has access to the full attack graph, among other contributing factors. The work presented here raises questions about how MAL attack graphs are represented as reinforcement learning problems and how reinforcement learning algorithms are applied to them. The source code of this thesis is available at: https://github.com/KTH-SSAS/sandor-berglund-thesis.
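Treating an attack graph as an RL environment can be sketched as follows: the state is the set of compromised attack steps, the actions are the reachable next steps, and the reward is the negative resource cost of each step. The tiny graph, costs, and random-agent baseline below are illustrative assumptions, not the MAL-based generator used in the thesis.

```python
import random

GRAPH = {                     # attack step -> (resource cost, child steps)
    "internet":     (0.0, ["phishing", "exposed_ssh"]),
    "phishing":     (3.0, ["workstation"]),
    "exposed_ssh":  (5.0, ["workstation"]),
    "workstation":  (2.0, ["domain_admin"]),
    "domain_admin": (4.0, []),
}
GOAL = "domain_admin"

def available_actions(compromised):
    """Attack steps reachable from any already-compromised step."""
    return sorted({child for step in compromised for child in GRAPH[step][1]} - compromised)

def take_step(compromised, action):
    """Compromise one more step; the reward is the negative resource cost."""
    compromised = compromised | {action}
    return compromised, -GRAPH[action][0], GOAL in compromised

# Rollout with the uninformed random baseline the DQN agent is compared against.
state, total_reward, done = {"internet"}, 0.0, False
while not done and available_actions(state):
    action = random.choice(available_actions(state))
    state, reward, done = take_step(state, action)
    total_reward += reward
```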
15

Evaluation of Deep Q-Learning Applied to City Environment Autonomous Driving

Wedén, Jonas January 2024 (has links)
This project's goal was to assess both the challenges of implementing the Deep Q-Learning algorithm to create an autonomous car in the CARLA simulator and the driving performance of the resulting model. An agent was trained to follow waypoints using two main approaches. The first was a camera-based approach, which allowed the agent to gather information about the environment from a camera sensor; the image, along with other driving features, was fed to a convolutional neural network. The second approach focused purely on following the waypoints without the camera sensor, which was substituted with an array containing the agent's angle with respect to the upcoming waypoints along with other driving features. Even though the camera-based approach performed best during evaluation, neither approach was successful in consistently following the waypoints of a straight route. To increase the performance of the camera-based approach, more training episodes would need to be provided. Furthermore, both approaches would benefit greatly from experimentation with and optimization of the model's neural network configuration and its hyperparameters.
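The waypoint-only observation described above can be sketched as the signed angle between the vehicle's heading and the direction to each upcoming waypoint, concatenated with other driving features such as speed. The geometry and names below are illustrative assumptions and do not use the CARLA API.

```python
import numpy as np

def waypoint_angles(vehicle_pos, vehicle_yaw, waypoints):
    """Signed angle (radians, wrapped to [-pi, pi)) from the vehicle heading to each waypoint."""
    angles = []
    for wx, wy in waypoints:
        bearing = np.arctan2(wy - vehicle_pos[1], wx - vehicle_pos[0])
        angles.append((bearing - vehicle_yaw + np.pi) % (2.0 * np.pi) - np.pi)
    return np.array(angles)

# Observation vector fed to the Q-network in the non-camera approach:
# angles to the three upcoming waypoints plus the current speed (m/s).
obs = np.concatenate([
    waypoint_angles((10.0, 5.0), 0.3, [(20.0, 6.0), (30.0, 8.0), (40.0, 12.0)]),
    [8.5],
])
```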
16

Exploring feasibility of reinforcement learning flight route planning / An investigation of the use of reinforcement learning for flight route planning

Wickman, Axel January 2021 (has links)
This thesis explores and compares traditional and reinforcement learning (RL) methods of performing 2D flight path planning in 3D space. A broad overview of natural, classic, and learning approaches to planning is given, in conjunction with a review of some general recurring problems and tradeoffs that appear within planning. This general background then serves as a basis for motivating different possible solutions to this specific problem. These solutions are implemented together with a testbed in the form of a parallelizable simulation environment. This environment makes use of random world generation and physics combined with an aerodynamic model. An A* planner, a local RL planner, and a global RL planner are developed and compared against each other in terms of performance, speed, and general behavior. An autopilot model is also trained and used both to measure flight feasibility and to constrain the planners to followable paths. All planners were partially successful, with the global planner exhibiting the highest overall performance. The RL planners were also found to be more reliable in terms of both speed and followability because of their ability to leave difficult decisions to the autopilot. From this it is concluded that machine learning in general, and reinforcement learning in particular, is a promising future avenue for solving the problem of flight route planning in dangerous environments.
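The classic baseline mentioned above can be sketched as A* search over a 2D cost grid, where the cell cost encodes how dangerous it is to fly through that area. The grid, costs, 4-connectivity, and Manhattan heuristic below are illustrative assumptions, not the thesis' testbed.

```python
import heapq

def a_star(grid, start, goal):
    """grid[y][x] is the cost of entering cell (x, y); returns the cheapest path as a list of cells."""
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])   # Manhattan heuristic (admissible for costs >= 1)
    frontier = [(h(start), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, g, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        x, y = cell
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]):
                ng = g + grid[ny][nx]
                if ng < best.get((nx, ny), float("inf")):
                    best[(nx, ny)] = ng
                    heapq.heappush(frontier, (ng + h((nx, ny)), ng, (nx, ny), path + [(nx, ny)]))
    return None

# A 4x3 grid where the middle cells are dangerous; the returned path routes around them.
path = a_star([[1, 1, 1, 1],
               [1, 9, 9, 1],
               [1, 1, 1, 1]], start=(0, 0), goal=(3, 2))
```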
