1

Extracting Behaviour Trees from Deep Q-Networks : Using learning from demonstration to transfer knowledge between models. / Extraktion av beteendeträd från djupa Q-nätverk

Nordström, Zacharias January 2020 (has links)
In recent years, advances in machine learning have solved increasingly complex problems, yet these techniques are still not commonly used in industry. One obstacle is that many of them are black boxes: it is hard to analyse them to make sure their behaviour is safe, which makes them unsuitable for safety-critical systems. The goal of this thesis is to examine whether the deep learning technique Deep Q-network (DQN) can be used to create a behaviour tree that solves the same problem. A behaviour tree is a tree-shaped flow structure for representing behaviours, often used in video games and robotics. Two simulators are used: cart pole, which models a cart that must balance a pole, and grid world, a static world that must be navigated. Taking inspiration from the learning-from-demonstration field, the Deep Q-network serves as a teacher from which a decision tree is created. During creation, the decision tree is pruned on one of two attributes: its accuracy or its performance. The thesis then compares three techniques, called Naive, BT Espresso, and BT Espresso Simplified, for transforming the extracted decision tree into a behaviour tree. All of the created behaviour trees complete the simulator scenarios at the same, or close to the same, capacity as the trained Deep Q-network. Trees created from the performance-pruned decision tree are generally smaller and less complex, but have worse accuracy. For cart pole, the trees created from the accuracy-pruned tree have around 10 000 nodes, while the performance-pruned trees have around 10-20 nodes. The difference in grid world is smaller, going from 35-45 nodes to 40-50 nodes. To get the smallest tree with the best performance, the performance-pruned tree should be used with the BT Espresso Simplified algorithm. This thesis has shown that it is possible to use knowledge from a trained Deep Q-network model to create a behaviour tree that completes the same task.
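The core pipeline this abstract describes (use a trained DQN as a teacher, record its state-action decisions, and fit a prunable decision tree to them) can be illustrated in a few lines. The sketch below is a minimal stand-in under stated assumptions: `teacher_policy` is a hand-written rule in place of the trained network, and scikit-learn's cost-complexity pruning (`ccp_alpha`) stands in for the accuracy/performance pruning the thesis compares.

```python
# Teacher-student sketch: a "DQN" teacher labels states, a decision tree
# imitates it and can later be converted into a behaviour tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def teacher_policy(state: np.ndarray) -> int:
    # Stand-in for the trained Deep Q-network (normally: argmax over Q-values).
    # A hand-written rule keeps the example self-contained.
    return int(state[2] + 0.5 * state[3] > 0.0)  # push right if pole tips right

# 1. Collect demonstrations (states would normally come from env rollouts).
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(5000, 4))  # cart pos/vel, pole angle/vel
actions = np.array([teacher_policy(s) for s in states])

# 2. Fit a decision tree to the teacher's behaviour; ccp_alpha plays the role
#    of the pruning step discussed in the abstract.
tree = DecisionTreeClassifier(ccp_alpha=1e-3).fit(states, actions)
print("nodes:", tree.tree_.node_count,
      "agreement with teacher:", tree.score(states, actions))
```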
2

Quantile Regression Deep Q-Networks for Multi-Agent System Control

Howe, Dustin 05 1900 (has links)
Training autonomous agents that are capable of performing their assigned job without fail is the ultimate goal of deep reinforcement learning. This thesis introduces a dueling Quantile Regression Deep Q-network, in which the network learns the state value quantile function and the advantage quantile function separately. With this network architecture the agent is able to learn to control simulated robots in the Gazebo simulator. Carefully crafted reward functions and state spaces must be designed for the agent to learn in complex non-stationary environments. When trained for only 100,000 timesteps, the agent is able to reach asymptotic performance in environments with moving and stationary obstacles, using only the data from the inertial measurement unit, LIDAR, and positional information. Through the use of transfer learning, the agents are also capable of formation control and flocking patterns. The performance of agents with frozen networks is improved through advice giving in Deep Q-networks, by use of normalized Q-values and majority voting.
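A minimal sketch of the dueling quantile-regression architecture the abstract describes: one stream outputs quantiles of the state value, the other quantiles of each action's advantage, and they are combined quantile-wise. The layer sizes and quantile count here are illustrative assumptions, not the thesis's configuration.

```python
# Dueling QR-DQN head: V-quantiles plus mean-centred A-quantiles per action.
import torch
import torch.nn as nn

class DuelingQRDQN(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, n_quantiles: int = 51):
        super().__init__()
        self.n_actions, self.n_quantiles = n_actions, n_quantiles
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        # Value stream: quantiles of V(s); advantage stream: quantiles of A(s, a).
        self.value = nn.Linear(128, n_quantiles)
        self.advantage = nn.Linear(128, n_actions * n_quantiles)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.trunk(obs)
        v = self.value(h).unsqueeze(1)                             # (B, 1, Nq)
        a = self.advantage(h).view(-1, self.n_actions, self.n_quantiles)
        return v + a - a.mean(dim=1, keepdim=True)                 # (B, A, Nq)

net = DuelingQRDQN(obs_dim=8, n_actions=4)
q_quantiles = net(torch.randn(2, 8))
greedy = q_quantiles.mean(dim=2).argmax(dim=1)  # act on the quantile mean
print(q_quantiles.shape, greedy)
```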
3

Federated Machine Learning for Resource Allocation in Multi-domain Fog Ecosystems

Zhang, Weilin January 2023 (has links)
The proliferation of the Internet of Things (IoT) has increasingly demanded proximity between cloud services and end-users. This has incentivised extending cloud resources to the edge, in what is deemed fog computing. The latter is manifesting as an ecosystem of connected clouds, geo-dispersed and of diverse capacities. In such conditions, workload allocation to fog services becomes a non-trivial challenge due to the complexity of trade-offs. Users' demand at the edge is highly diverse, which does not lend itself to straightforward resource planning. Conversely, running services at the edge may leverage proximity, but it comes at higher operational cost and rapidly increases the risk of straining sparse resources. Consequently, there is a need for intelligent yet scalable allocation solutions that counter the adversity of demand at the edge while efficiently distributing load between the edge and farther clouds. Machine learning is increasingly adopted in resource planning, but besides privacy concerns, central learning is highly demanding, both computationally and in data supply. Instead, this thesis proposes a federated deep reinforcement learning system, based on a deep Q-learning network (DQN), for workload distribution in a fog ecosystem. The proposed solution adapts a DQN to optimize the local workload allocations made by single gateways. Federated learning is incorporated to allow multiple gateways in a network to collaboratively build knowledge of users' demand. This knowledge is leveraged to establish consensus on the fraction of workload allocated to different fog nodes, using a lower data supply and fewer computation resources. System performance is evaluated using a realistic demand set from the Google Cluster Workload Traces 2019. Evaluation results show over 50% reduction in failed allocations when distributing users over a larger number of gateways, given a fixed number of fog nodes. The results further illustrate the trade-offs between performance and cost under different conditions.
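The federated step the abstract outlines, in which multiple gateways train local DQN copies and pool them into a consensus model, can be sketched as FedAvg-style parameter averaging. Everything below (network size, sample-count weighting) is an illustrative assumption rather than the thesis's specification.

```python
# FedAvg-style aggregation over per-gateway DQN parameters.
import copy
import torch
import torch.nn as nn

def make_dqn(obs_dim: int = 6, n_actions: int = 3) -> nn.Module:
    # Tiny stand-in for each gateway's local allocation DQN.
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

def fed_avg(models, weights):
    # Weighted average of parameters; weights sum to 1 (e.g. by sample count).
    avg = copy.deepcopy(models[0].state_dict())
    for key in avg:
        avg[key] = sum(w * m.state_dict()[key] for w, m in zip(weights, models))
    return avg

gateways = [make_dqn() for _ in range(4)]
# ... each gateway would run local DQN updates on its own demand trace here ...

n_samples = torch.tensor([120.0, 80.0, 200.0, 100.0])  # assumed local data sizes
weights = (n_samples / n_samples.sum()).tolist()
global_state = fed_avg(gateways, weights)
for g in gateways:
    g.load_state_dict(global_state)  # broadcast the consensus model back
```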
4

Spectrum Management in Dynamic Spectrum Access: A Deep Reinforcement Learning Approach

Song, Hao January 2019 (has links)
Dynamic spectrum access (DSA) is a promising technology to mitigate spectrum shortage and improve spectrum utilization. However, DSA users face two fundamental issues: interference coordination between DSA users and protection of primary users (PUs). These issues are very challenging, since generally there is no powerful infrastructure in DSA networks to support centralized control. As a result, DSA users have to perform spectrum management, including spectrum access and power allocation, independently and without accurate channel state information. In this thesis, a novel spectrum management approach is proposed, in which Q-learning, a type of reinforcement learning, is utilized to enable DSA users to carry out effective spectrum management individually and intelligently. For greater efficiency, neural networks (NNs) are employed to implement the Q-learning process, a combination known as a deep Q-network (DQN). Furthermore, I also investigate the optimal way to construct the DQN, considering both the performance of wireless communications and the difficulty of NN training. Finally, extensive simulation studies are conducted to demonstrate the effectiveness of the proposed spectrum management approach. / Generally, in dynamic spectrum access (DSA) networks, cooperation and centralized control are unavailable, and DSA users have to carry out wireless transmissions individually. DSA users have to learn other users' behaviour by sensing and analysing the wireless environment, so that they can adjust their parameters properly and carry out effective wireless transmissions. In this thesis, machine learning and deep learning technologies are leveraged in DSA networks to enable appropriate and intelligent spectrum management, including both spectrum access and power allocation. Accordingly, a novel spectrum management framework utilizing deep reinforcement learning is proposed, in which deep reinforcement learning is employed to accurately learn wireless environments and generate optimal spectrum management strategies that adapt to the variations of those environments. Due to the model-free nature of reinforcement learning, DSA users only need to interact directly with the environment to obtain optimal strategies, rather than relying on accurate channel estimation. In this thesis, Q-learning, a type of reinforcement learning, is adopted to design the spectrum management framework. For more efficient and accurate learning, neural networks (NNs) are employed to combine Q-learning and deep learning, also referred to as a deep Q-network (DQN). The selection of NN is crucial for the performance of the DQN, since different types of NNs possess different properties and suit different application scenarios. Therefore, the optimal way to construct the DQN is also analysed and studied. Finally, extensive simulation studies demonstrate that the proposed spectrum management framework enables users to perform proper spectrum management and achieve better performance.
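A hedged sketch of the joint spectrum-access and power-allocation decision such a framework learns: the (channel, power level) pair is flattened into one discrete action set over which a DQN acts epsilon-greedily. The observation and network shapes here are assumptions for illustration, not the thesis's design.

```python
# Joint (channel, power) action selection with an epsilon-greedy DQN.
import random
import torch
import torch.nn as nn

N_CHANNELS, N_POWERS = 5, 4
n_actions = N_CHANNELS * N_POWERS

q_net = nn.Sequential(nn.Linear(N_CHANNELS, 64), nn.ReLU(), nn.Linear(64, n_actions))

def select_action(obs: torch.Tensor, eps: float = 0.1):
    # Epsilon-greedy over the flattened (channel, power) action space.
    if random.random() < eps:
        a = random.randrange(n_actions)
    else:
        with torch.no_grad():
            a = int(q_net(obs).argmax())
    return divmod(a, N_POWERS)  # -> (channel index, power index)

obs = torch.rand(N_CHANNELS)  # e.g. sensed per-channel occupancy
channel, power = select_action(obs)
print(f"transmit on channel {channel} at power level {power}")
```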
5

[en] ENABLING AUTONOMOUS DATA ANNOTATION: A HUMAN-IN-THE-LOOP REINFORCEMENT LEARNING APPROACH / [pt] HABILITANDO ANOTAÇÕES DE DADOS AUTÔNOMOS: UMA ABORDAGEM DE APRENDIZADO POR REFORÇO COM HUMANO NO LOOP

LEONARDO CARDIA DA CRUZ 10 November 2022 (has links)
Deep learning techniques have made significant contributions in various fields, including image analysis. The vast majority of work in computer vision focuses on proposing and applying new machine learning models and algorithms. For supervised learning tasks, the performance of these techniques depends on a large amount of labeled training data. However, labeling is an expensive and time-consuming process. A recent area of exploration is reducing the effort spent on data preparation, removing inconsistencies and noise so that current models can achieve greater performance. This new field of study is called Data-Centric AI. We present a new approach based on Deep Reinforcement Learning (DRL) focused on preparing datasets for object detection problems, where bounding-box annotations are produced autonomously and economically. Our approach is a methodology for training a virtual agent to label data automatically, with a human acting as the agent's teacher. We implemented the Deep Q-Network algorithm to create the virtual agent and developed an advising approach to facilitate communication between the human teacher and the virtual agent student. We used active learning to select the cases where the agent is most uncertain, requiring human intervention in the annotation process during training. Our approach was evaluated and compared with other reinforcement learning and human-computer interaction methods on several datasets in which the virtual agent had to create new annotations in the form of bounding boxes. The results show that our methodology has a positive impact on obtaining new annotations from datasets with scarce labels, surpassing existing methods. We thus contribute to the field of Data-Centric AI with a teaching methodology that yields an autonomous, human-advised approach for creating economical annotations from scarce ones.
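The human-in-the-loop gate the abstract describes, routing a case to the human teacher only when the agent is uncertain, can be sketched with a Q-value margin test. The network, feature size, and threshold below are illustrative assumptions; the thesis does not specify that its uncertainty measure is margin-based.

```python
# Uncertainty gate: query the human teacher when top Q-values are nearly tied.
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 5))  # 5 box moves

def annotate(features: torch.Tensor, margin_threshold: float = 0.05):
    with torch.no_grad():
        q = q_net(features)
    top2 = torch.topk(q, 2).values
    margin = float(top2[0] - top2[1])
    if margin < margin_threshold:
        return "ask_human"   # active-learning query to the human teacher
    return int(q.argmax())   # agent adjusts the bounding box on its own

print(annotate(torch.rand(16)))
```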
6

Det som är Roligt, är Roligt / If It’s Fun, It’s Fun : Deep Reinforcement Learning In Unreal Tournament 2004

Berg, Anton January 2019 (has links)
This thesis explores the perceived enjoyability of a Deep Reinforcement Learning AI agent (DeepRL agent) that strives towards optimality within the first-person shooter Unreal Tournament 2004 (UT2004). The DeepRL agent used in the experiments was created and trained within the game against the AI agent that ships with UT2004 (referred to here as the trivial UT2004 agent). Participants played UT2004 deathmatches against both the DeepRL agent and the trivial UT2004 agent, and the data collected in two participant surveys show that the DeepRL agent is more enjoyable to face. By striving towards optimality, the DeepRL agent developed a behaviour that, despite making it a great deal worse at UT2004 than the trivial agent, was more enjoyable to face. Given this outcome, the data suggest that DeepRL agents in UT2004 that are encouraged to strive towards optimality during training are "enjoyable enough" to be considered "good enough" by game developers building non-trivial opponents for games similar to UT2004. If the development time of a DeepRL agent is lower than or equal to that of a trivial agent, the DeepRL agent could hypothetically be preferable.
7

Offline Reinforcement Learning for Optimization of Therapy Towards a Clinical Endpoint / Offline förstärkningsinlärning för optimering av terapi mot ett kliniskt slutmål

Jenner, Simon January 2022 (has links)
The improvement of data acquisition and compute-heavy methods in recent years has paved the way for completely digital healthcare solutions. Digital therapeutics (DTx) are such solutions and are often provided as mobile applications that must undergo clinical trials. A common method for such applications is to utilize cognitive behavioural therapy (CBT) in order to provide patients with tools for self-improvement. The Swedish-based company Alex Therapeutics is such a provider: they develop state-of-the-art applications that utilize CBT to help patients, among them an app that aims to help users quit smoking. From this app they have collected user data with the goal of continuously improving their services through machine learning (ML). The current application uses multiple ML methods to personalize the care, but has opened up possibilities for the use of reinforcement learning (RL). Often the desired behaviour is known, such as quitting smoking, but the optimal path within the app for reaching that goal is not. By formalizing the problem as a Markov decision process, where the transition probabilities have to be inferred from user data, such an optimal policy can be found. Standard RL methods rely on direct access to an environment for sampling data, whereas here the user data sampled from the application must be treated as those samples. This thesis thus explores the possibility of using RL on a static dataset in order to infer an optimal policy. A double deep Q-network (DDQN) was chosen as the reinforcement learning agent. The agent was trained on two different datasets and showed good convergence on both, using a custom metric for the task. Using SHAP values, the strategy of the agent is visualized and discussed, together with the methodological challenges. Lastly, future work for the proposed methods is discussed.
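A minimal sketch of the double-DQN update computed purely from a logged batch, matching the offline setting the thesis describes: the online network selects the next action and the target network evaluates it. The random tensors stand in for real app transitions, and the network sizes and discount factor are assumptions.

```python
# One offline double-DQN update step on a static batch (s, a, r, s', done).
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 10, 4, 0.99
online = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target.load_state_dict(online.state_dict())

# A logged batch drawn from the static dataset (random stand-ins here).
s = torch.randn(32, obs_dim); a = torch.randint(0, n_actions, (32,))
r = torch.randn(32); s2 = torch.randn(32, obs_dim); done = torch.zeros(32)

with torch.no_grad():
    best_a = online(s2).argmax(dim=1)                              # online picks...
    q_next = target(s2).gather(1, best_a.unsqueeze(1)).squeeze(1)  # ...target evaluates
    y = r + gamma * (1 - done) * q_next

q_sa = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
loss = nn.functional.smooth_l1_loss(q_sa, y)
loss.backward()  # repeat over the logged dataset; no environment needed
print(float(loss))
```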
8

Autonomous UAV Path Planning using RSS signals in Search and Rescue Operations

Anhammer, Axel, Lundeberg, Hugo January 2022 (has links)
Unmanned aerial vehicles (UAVs) have emerged as a promising technology in search and rescue (SAR) operations. UAVs can provide more timely localization, thus decreasing the crucial duration of SAR operations. Previous work has demonstrated proof-of-concept for localizing missing people by utilizing received signal strength (RSS) and UAVs. The localization system is based on the assumption that the missing person carries an enabled smartphone whose Wi-Fi signal can be intercepted. This thesis proposes a two-stage path planner for UAVs, utilizing RSS signals and an initial belief regarding the missing person's location. The objective of the first stage is to locate an RSS signal. By dividing the search area into grids, a hierarchical solution based on several Markov decision processes (MDPs) can be formulated that takes the probabilities of different areas into consideration. The objective of the second stage is to isolate the RSS signal and provide a location estimate. The environment is deemed partially observable, and the problem is formulated as a partially observable Markov decision process (POMDP). Two different filters, a point mass filter (PMF) and a particle filter (PF), are evaluated with regard to their ability to correctly estimate the state of the environment. The state estimate then acts as input to a deep Q-network (DQN) that selects appropriate actions for the UAV. Thus, the DQN becomes a path planner for the UAV, and the trajectory it generates is compared to trajectories generated by, among others, a greedy policy. Results for Stage 1 demonstrate that the path generated by the MDPs prioritizes areas with higher probability and intuitively seems very reasonable. The results also illustrate potential drawbacks of the hierarchical solution, which can potentially be addressed by incorporating more factors into the problem. Simulation results for Stage 2 show that both a PMF and a PF can successfully be used to estimate the state of the environment and provide an accurate localization estimate, with the PMF generating slightly more accurate estimates than the PF. The DQN succeeds in isolating the missing person's probable location with relatively few actions. However, it performs only marginally better than the greedy policy, indicating that it may be a complicated solution to a simpler problem.
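The Stage-2 machinery the abstract describes, maintaining a belief over the missing person's position with a filter and feeding a summary of it to a DQN planner, can be sketched with a particle filter driven by a log-distance RSS model. The path-loss exponent, noise levels, and propagation jitter below are all illustrative assumptions, not the thesis's parameters.

```python
# Particle-filter belief update from one noisy RSS measurement.
import numpy as np

rng = np.random.default_rng(1)
true_pos = np.array([30.0, 40.0])
particles = rng.uniform(0, 100, size=(2000, 2))  # initial belief over the area
weights = np.full(len(particles), 1 / len(particles))

def rss(dist, tx_power=-30.0, n=2.5):
    # Log-distance path-loss model (assumed), dBm at `dist` metres.
    return tx_power - 10 * n * np.log10(np.maximum(dist, 1.0))

def pf_update(uav_pos, measurement, sigma=4.0):
    global particles, weights
    d = np.linalg.norm(particles - uav_pos, axis=1)
    lik = np.exp(-0.5 * ((measurement - rss(d)) / sigma) ** 2)
    weights = weights * lik
    weights /= weights.sum()
    # Resampling with small jitter keeps the particle set healthy.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx] + rng.normal(0, 0.5, particles.shape)
    weights = np.full(len(particles), 1 / len(particles))

uav = np.array([10.0, 10.0])
z = rss(np.linalg.norm(true_pos - uav)) + rng.normal(0, 4.0)
pf_update(uav, z)
# Belief summary (mean, spread) would be the DQN planner's state input.
print("estimate:", particles.mean(axis=0), "spread:", particles.std(axis=0))
```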
