Global ETD Search

1	Starcraft Resurshantering med Q-Nätverk / Starcraft Resource Management With Q-Network
Miranda Cortes, Luis; Karlsson, Mathias. January 2016.
Artificial intelligence is a field of computer science that tries to create intelligent systems or systems that simulate intelligence. Such systems are attractive to consumers because they can perform tasks that would otherwise require human intervention. The gaming industry helps drive the development of AI forward as players continue to expect more immersive and lifelike experiences. Academically, games are useful for the study of artificial intelligence because they are relatively simple. But even though games are simpler than reality, it is still difficult to create an artificial intelligence that can match a human opponent. A popular genre of games is strategy, examples of which are Age of Empires and StarCraft. This report examines a different approach to the problem of resource management for this type of game, using an artificial neural network that classifies the game state. This has not been explored previously, and the goal is to find out whether the approach is feasible. To train the network, backpropagation is used in conjunction with Q-learning, which makes the learning unsupervised. The authors' research question is therefore: can a Q-network be used to manage resource allocation for a bot in StarCraft: Brood War?
To see how well the Q-network solves the problem, an experiment was conducted with two different bots, one playing StarCraft with the same opportunities a human player would have and the other playing a simplified version. The experiment consists of collecting data from the bots' training to see whether they improve. As a control, two additional bots with a completely random policy are used. The results of the experiment were several graphs showing the bots' performance in different ways, most importantly the number of games won and the probability of winning. The results show no real improvement in the bots' gameplay, with a 0.7% and 0.4% chance, respectively, of winning against the default AI. The results also show that one of the neural-network bots performed much worse than a bot choosing random actions. Moreover, training took far too long. Given more time, the experiment might have shown that the technique works, but only after an unreasonably long time, which would make it useless in practice. Had it succeeded, it could have enabled better AI for strategy games that adapts to the player and generalizes when faced with situations the developers did not plan for. In this study, however, the bots' behavior remained more or less stochastic, so the answer to the research question is that it is not possible.
Keywords: ANN; Q-Network; Starcraft
Subject: Computer and Information Sciences

2	Extracting Behaviour Trees from Deep Q-Networks : Using learning from demostration to transfer knowledge between models. / Extraktion av beteendeträd från djupa Q-nätverk
Nordström, Zacharias. January 2020.
In recent years, advances in machine learning have solved increasingly complex problems, yet these techniques are still not commonly used in industry. One problem is that many of the techniques are black boxes: it is hard to analyse them to make sure that their behaviour is safe, which makes them unsuitable for safety-critical systems. The goal of this thesis is to examine whether the deep learning technique Deep Q-network can be used to create a behaviour tree that solves the same problem. A behaviour tree is a tree representation of a flow structure that is used for representing behaviours, often in video games or robotics. Two simulators are used: cart pole, which models a cart that must balance a pole, and grid world, a static world in which the agent must navigate to a defined goal location. Inspiration is taken from the field of learning from demonstration: instead of a human teacher, the Deep Q-network acts as the teacher and is used to create a decision tree. The decision tree is then pruned on one of two attributes, either the tree's accuracy or its performance (the reward it obtains). The thesis then compares three techniques, called Naive, BT Espresso, and BT Espresso Simplified, for transforming the extracted decision tree into a behaviour tree.
All of the created behaviour trees complete the simulator scenarios at the same, or close to the same, level as the trained Deep Q-network. The trees created from the performance-pruned decision tree are generally smaller and less complex, but they reproduce the Deep Q-network's decisions less accurately. For cart pole, the trees created from the accuracy-pruned decision tree have around 10 000 nodes, whereas those created from the performance-pruned tree have around 10-20 nodes. In grid world the difference is smaller: around 40-50 nodes for trees created from the accuracy-pruned decision tree and 35-45 nodes for those created from the performance-pruned one. To get the smallest tree that still completes the scenario, the performance-pruned decision tree should be used with the BT Espresso Simplified algorithm. This thesis has shown that it is possible to use knowledge from a trained Deep Q-network model to create a behaviour tree that can complete the same task.
Keywords: Behaviour tree; deep Q-network; extraction
Subject: Computer Sciences

3	A review of Q-learning methods for Markov decision processes
Blizzard, Christopher; Wiktorsson, Emil. January 2024.
This paper discusses how Q-Learning and Deep Q-Networks (DQN) can be applied to state-action problems described by a Markov decision process (MDP). These are machine learning methods for finding the optimal choice of action at each time step, resulting in the optimal policy. The limitations and advantages of the two methods are discussed, with the main limitation being that Q-learning cannot be used on problems with infinite state spaces. Q-learning, however, has an advantage in the simplicity of the algorithm, leading to a better understanding of what the algorithm is actually doing. Q-Learning did manage to find the optimal policy for the simple problem studied in this paper, but was unable to do so for the advanced problem. The Deep Q-Network (DQN) approach was able to solve both problems, with the drawback that it is harder to understand what the algorithm is actually doing.
Keywords: Q-Learning; Deep Q-Network; Markov Decision Process
Subject: Mathematics
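
As an illustration of the tabular Q-learning that the abstract above contrasts with DQN, here is a minimal sketch. It is not taken from the paper: the five-state chain environment, the function names `q_learning` and `greedy_policy`, and all hyperparameter values are assumptions chosen only to show the update rule on a small finite state space.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain MDP.

    The agent starts in state 0 and receives reward 1 on reaching the
    rightmost state (terminal); every other step gives reward 0.
    """
    rng = random.Random(seed)
    moves = (-1, +1)                      # action 0 = left, action 1 = right
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy selection, breaking ties at random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(2) if q[s][i] == best])

            s_next = min(max(s + moves[a], 0), goal)
            r = 1.0 if s_next == goal else 0.0

            # Q-learning update: bootstrap from the best next-state value,
            # except at the terminal state where there is no future value.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Best action index for each non-terminal state."""
    return [max(range(2), key=lambda i: row[i]) for row in q[:-1]]


if __name__ == "__main__":
    table = q_learning()
    print(greedy_policy(table))
```

With these assumed settings the learned greedy policy should pick action 1 (right) in every non-terminal state. The table has one row per state, which is why this form of the method does not extend to infinite state spaces.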

4	MULTI-AGENT REINFORCEMENT LEARNING WITH APPLICATION ON TRAFFIC FLOW CONTROL
Jurvelin Olsson, Mikael. January 2021.
Traffic congestion diminishes the driving experience and increases CO2 emissions. With the rise of 5G and machine learning, the possibilities to reduce traffic congestion are endless. This thesis studies whether speed recommendations issued at the vehicle level by multi-agent reinforcement learning can reduce congestion and thus control traffic flow. This is done by simulating a highway with an obstacle blocking one side of the lanes, forcing all vehicles to pass the obstacle in the same lane and causing congestion. A game-theoretic element, in which drivers may not obey the speed recommendations, was implemented to simulate real traffic more closely. Three Deep Q-network based models were trained on the highway and the best model was tested. The tests showed that multi-agent reinforcement learning speed recommendations can reduce congestion, measured in vehicle hours, by up to 21%, and that if 1/3 of the vehicles use the system, the total congestion can be significantly reduced. In addition, the tests showed that the model achieves a success rate of 80%. Two ways to improve the success rate would be more training and a non-reinforcement-learning mechanism for the autonomous driving part.
Keywords: Deep Q-Network; Game Theory; Shielding; Simulations
Subject: Probability Theory and Statistics

5	Federated Machine Learning for Resource Allocation in Multi-domain Fog Ecosystems
Zhang, Weilin. January 2023.
The proliferation of the Internet of Things (IoT) has increasingly demanded intimacy between cloud services and end-users. This has incentivised extending cloud resources to the edge in what is termed fog computing. The latter is manifesting as an ecosystem of connected clouds, geo-dispersed and of diverse capacities. In such conditions, workload allocation to fog services becomes a non-trivial challenge due to the complexity of the trade-offs. Users' demand at the edge is highly diverse, which does not lend itself to straightforward resource planning. Running services at the edge may leverage proximity, but it comes at a higher operational cost and rapidly increases the risk of straining sparse resources. Consequently, there is a need for intelligent yet scalable allocation solutions that counter the adversity of demand at the edge while efficiently distributing load between the edge and farther clouds. Machine learning is increasingly adopted in resource planning. However, besides raising privacy concerns, central learning is highly demanding, both computationally and in data supply. Instead, this paper proposes a federated deep reinforcement learning system, based on a deep Q-learning network (DQN), for workload distribution in a fog ecosystem. The proposed solution adapts a DQN to optimize local workload allocations made by single gateways. Federated learning is incorporated to allow multiple gateways in a network to collaboratively build knowledge of users' demand. This is leveraged to establish consensus on the fraction of workload allocated to different fog nodes, using less data and fewer computation resources. The system performance is evaluated using a realistic demand set from Google Cluster Workload Traces 2019. Evaluation results show over 50% reduction in failed allocations when distributing users over a larger number of gateways, given a fixed number of fog nodes. The results further illustrate the trade-offs between performance and cost under different conditions.
Keywords: Workload Allocation; Federated Learning; Deep Q-network; Fog networks; Federated Average Aggregation
Subject: Engineering and Technology

6	Spectrum Management in Dynamic Spectrum Access: A Deep Reinforcement Learning Approach
Song, Hao. January 2019.
Dynamic spectrum access (DSA) is a promising technology to mitigate spectrum shortage and improve spectrum utilization. However, DSA users face two fundamental issues: interference coordination between DSA users and protection of primary users (PUs). These two issues are very challenging, since there is generally no powerful infrastructure in DSA networks to support centralized control. As a result, DSA users have to perform spectrum management, including spectrum access and power allocation, independently and without accurate channel state information. In this thesis, a novel spectrum management approach is proposed, in which Q-learning, a type of reinforcement learning, is used to enable DSA users to carry out effective spectrum management individually and intelligently. For more efficient processing, powerful neural networks (NNs) are employed to implement the Q-learning process, a combination known as a deep Q-network (DQN). Furthermore, I investigate the optimal way to construct the DQN, considering both the performance of wireless communications and the difficulty of NN training. Finally, extensive simulation studies are conducted to demonstrate the effectiveness of the proposed spectrum management approach.
Generally, in dynamic spectrum access (DSA) networks, cooperation and centralized control are unavailable, and DSA users have to carry out wireless transmissions individually. DSA users have to learn other users' behaviors by sensing and analyzing the wireless environment, so that they can adjust their parameters properly and carry out effective wireless transmissions. In this thesis, machine learning and deep learning technologies are leveraged in DSA networks to enable appropriate and intelligent spectrum management, including both spectrum access and power allocation. Accordingly, a novel spectrum management framework utilizing deep reinforcement learning is proposed, in which deep reinforcement learning is employed to accurately learn the wireless environment and generate optimal spectrum management strategies that adapt to its variations. Due to the model-free nature of reinforcement learning, DSA users only need to interact directly with the environment to obtain optimal strategies rather than relying on accurate channel estimation. In this thesis, Q-learning, a type of reinforcement learning, is adopted to design the spectrum management framework. For more efficient and accurate learning, powerful neural networks (NNs) are employed to combine Q-learning and deep learning, also referred to as a deep Q-network (DQN). The selection of NNs is crucial for the performance of the DQN, since different types of NNs have different properties and suit different application scenarios. Therefore, the optimal way to construct the DQN is also analyzed and studied. Finally, extensive simulation studies demonstrate that the proposed spectrum management framework enables users to perform proper spectrum management and achieve better performance.
Keywords: Dynamic spectrum access; spectrum management; reinforcement learning; deep Q-network; echo state networks

7	Quantile Regression Deep Q-Networks for Multi-Agent System Control
Howe, Dustin. 05 1900.
Training autonomous agents that are capable of performing their assigned job without fail is the ultimate goal of deep reinforcement learning. This thesis introduces a dueling Quantile Regression Deep Q-network, where the network learns the state value quantile function and the advantage quantile function separately. With this network architecture the agent is able to learn to control simulated robots in the Gazebo simulator. Carefully crafted reward functions and state spaces must be designed for the agent to learn in complex non-stationary environments. When trained for only 100,000 timesteps, the agent is able to reach asymptotic performance in environments with moving and stationary obstacles using only data from the inertial measurement unit, LIDAR, and positional information. Through the use of transfer learning, the agents are also capable of formation control and flocking patterns. The performance of agents with frozen networks is improved through advice giving in Deep Q-networks by use of normalized Q-values and majority voting.
Keywords: reinforcement learning; deep Q-network; machine learning; deep learning; transfer learning; multi-agent formation control; ROS

8	[en] ENABLING AUTONOMOUS DATA ANNOTATION: A HUMAN-IN-THE-LOOP REINFORCEMENT LEARNING APPROACH / [pt] HABILITANDO ANOTAÇÕES DE DADOS AUTÔNOMOS: UMA ABORDAGEM DE APRENDIZADO POR REFORÇO COM HUMANO NO LOOP
LEONARDO CARDIA DA CRUZ. 10 November 2022.
Deep learning techniques have shown significant contributions in various fields, including image analysis. The vast majority of work in computer vision focuses on proposing and applying new machine learning models and algorithms. For supervised learning tasks, the performance of these techniques depends on a large amount of labeled training data. However, labeling is an expensive and time-consuming process. A recent area of exploration is reducing the effort of data preparation, leaving the data free of inconsistencies and noise so that current models can achieve better performance. This new field of study is called Data-Centric AI. We present a new approach based on Deep Reinforcement Learning (DRL), aimed at preparing datasets for object detection problems, in which the bounding box annotations are made autonomously and economically. Our approach consists of a methodology for training a virtual agent to label the data automatically, with a human acting as the agent's teacher. We implemented the Deep Q-Network algorithm to create the virtual agent and developed an advice-giving approach to facilitate communication between the human teacher and the virtual agent student. To complete our implementation, we used active learning to select cases where the agent has greater uncertainty and therefore needs human intervention in the annotation process during training. Our approach was evaluated and compared with other reinforcement learning and human-computer interaction methods on several datasets, where the virtual agent had to create new annotations in the form of bounding boxes. The results show that our methodology has a positive impact on obtaining new annotations from a dataset with scarce labels, surpassing existing methods. In this way, we contribute to the field of Data-Centric AI with a teaching methodology for building an autonomous approach, guided by human advice, that creates economical annotations from scarce annotations.
Keywords: [en] DEEP REINFORCEMENT LEARNING; [en] ANNOTATIONS; [en] VIRTUAL AGENT; [en] DEEP Q-NETWORK; [en] ADVICES; [en] DATASET; [en] BOUNDING BOX DATASETS

9	Det som är Roligt, är Roligt / If It’s Fun, It’s Fun : Deep Reinforcement Learning In Unreal Tournament 2004
Berg, Anton. January 2019.
This thesis explores the perceived enjoyability of Deep Reinforcement Learning AI agents (DeepRL agents) that strive towards optimality within the first-person shooter game Unreal Tournament 2004 (UT2004). The DeepRL agent used in the experiments was created and then trained within this game against the AI agent that comes with UT2004 (known here as a trivial UT2004 agent). Participants played UT2004 deathmatches against both the DeepRL agent and the trivial UT2004 agent, and the data collected in two participant surveys show that the DeepRL agent is more enjoyable to face than the trivial UT2004 agent. By striving towards optimality, the DeepRL agent developed a behaviour which, despite making it a great deal worse at UT2004 than the trivial UT2004 agent, was more enjoyable to face. Considering this outcome, the data suggest that DeepRL agents in UT2004 that are encouraged to strive towards optimality during training are “enjoyable enough” to be considered “good enough” by game developers when developing non-trivial opponents for games similar to UT2004. If the development time of a DeepRL agent is less than or equal to the development time of a trivial agent, then the DeepRL agent could hypothetically be preferable.
Keywords: Artificial Intelligence; Reinforcement Learning; Deep Learning; Deep Q-Network; Enjoyability; Video Game; First Person Shooter; Unreal Tournament 2004; Optimality
Subject: Computer and Information Sciences

10	Offline Reinforcement Learning for Optimization of Therapy Towards a Clinical Endpoint / Offline förstärkningsinlärning för optimering av terapi mot ett kliniskt slutmål
Jenner, Simon. January 2022.
The improvement of data acquisition and compute-heavy methods in recent years has paved the way for completely digital healthcare solutions. Digital therapeutics (DTx) are such solutions and are often provided as mobile applications that must undergo clinical trials. A common method for such applications is to use cognitive behavioral therapy (CBT) to provide patients with tools for self-improvement. The Swedish-based company Alex Therapeutics is such a provider. They develop state-of-the-art applications that use CBT to help patients. Among their applications is one that aims to help users quit smoking. From this app, they have collected user data with the goal of continuously improving their services through machine learning (ML). In their current application, they use multiple ML methods to personalize the care, but have opened up possibilities for the use of reinforcement learning (RL). Often the desired behavior is known, such as quitting smoking, but the optimal path within the app for reaching such a goal is not. By formalizing the problem as a Markov decision process, where the transition probabilities have to be inferred from user data, such an optimal policy can be found. Standard RL methods rely on direct access to an environment for sampling data, whereas here the user data collected from the application take the place of such an environment. This thesis thus explores the possibilities of using RL on a static dataset in order to infer an optimal policy.
A double deep Q-network (DDQN) was chosen as the reinforcement learning agent. The agent was trained on two different datasets and showed good convergence on both, using a custom metric for the task. Using SHAP values, the strategy of the agent is visualized and discussed, together with the methodological challenges. Lastly, future work for the proposed methods is discussed.
Keywords: Offline Reinforcement learning; Double Deep Q-Network; Cognitive behavior therapy; Digital therapeutics; Optimization
Subject: Medical Engineering
