Spelling suggestions: "subject:"[een] DEEP Q-LEARNING"" "subject:"[enn] DEEP Q-LEARNING""
1 |
Learning medical triage by using a reinforcement learning approachSundqvist, Niklas January 2022 (has links)
Many emergency departments are today suffering from a overcrowding of people seeking care. The first stage in seeking care is being prioritised in different orders depending on symptoms by a doctor or nurse called medical triage. This is a cumbersome process that could be subject of automatisation. This master thesis investigates the possibility of using reinforcement learning for performing medical triage of patients. A deep Q-learning approach is taken for designing the agent for the environment together with the two extensions of using double Q-learning and a duelling network architecture. The agent is deployed to train in two different environments. The goal for the agent in the first environment is to ask questions to a patient and then decide, when enough information has been collected, how the patient should be prioritised. The second environment makes the agent decide which questions should be asked to the patient and then a separate classifier is used with the information gained to perform the actual triage decision of the patient. The training and testing process of the agent in the two environments reveal difficulties in exploring the environment efficiently and thoroughly. It was also shown that defining a reward function for the environments that guides the agent into asking valuable questions and learninga stopping condition for asking questions is a complicated task. Suitable future work is discussed that would, in combination with the work performed in this paper, create a better reinforcement learning model that could potentially show more promising results in the task of performing medical triage of patients.
|
2 |
AI-driven admission control : with Deep Reinforcement Learning / AI-driven antagningskontroll : med Djup FörstärkningslärandeAi, Lingling January 2021 (has links)
5G is expected to provide a high-performance and highly efficient network to prominent industry verticals with ubiquitous access to a wide range of services with orders of magnitude of improvement over 4G. Network slicing, which allocates network resources according to users’ specific requirements, is a key feature to fulfil the diversity of requirements in 5G network. However, network slicing also brings more orchestration and difficulty in monitoring and admission control. Although the problem of admission control has been extensively studied, those research take measurements for granted. Fixed high monitoring frequency can waste system resources, while low monitoring frequency (low level of observability) can lead to insufficient information for good admission control decisions. To achieve efficient admission control in 5G, we consider the impact of configurable observability, i.e. control observed information by configuring measurement frequency, is worth investigating. Generally, we believe more measurements provide more information about the monitored system, thus enabling a capable decision-maker to have better decisions. However, more measurements also bring more monitoring overhead. To study the problem of configurable observability, we can dynamically decide what measurements to monitor and their frequencies to achieve efficient admission control. In the problem of admission control with configurable observability, the objective is to minimize monitoring overhead while maintaining enough information to make proper admission control decisions. In this thesis, we propose using the Deep Reinforcement Learning (DRL) method to achieve efficient admission control in a simulated 5G end-to-end network, including core network, radio access network and four dynamic UEs. The proposed method is evaluated by comparing with baseline methods using different performance metrics, and then the results are discussed. With experiments, the proposed method demonstrates the ability to learn from interaction with the simulated environment and have good performance in admission control and used low measurement frequencies. After 11000 steps of learning, the proposed DRL agents generally achieve better performance than the threshold-based baseline agent, which takes admission decisions based on combined threshold conditions on RTT and throughput. Furthermore, the DRL agents that take non-zero measurement costs into consideration uses much lower measurement frequencies than DRL agents that take measurement costs as zero. / 5G förväntas ge ett högpresterande och högeffektivt nätverk till framstående industrivertikaler genom allmän tillgång till ett brett utbud av tjänster, med förbättringar i storleksordningar jämfört med 4G. Network slicing, som allokerar nätverksresurser enligt specifika användarkrav, är en nyckelfunktion för att uppfylla mångfalden av krav i 5G-nätverk. Network slicing kräver däremot också mer orkestrering och medför svårigheter med övervakning och tillträdeskontroll. Även om problemet med tillträdeskontroll har studerats ingående, tar de studierna mätfrekvenser för givet. Detta trots att hög övervakningsfrekvens kan slösa systemresurser, medan låg övervakningsfrekvens (låg nivå av observerbarhet) kan leda till otillräcklig information för att ta bra beslut om antagningskontroll. För att uppnå effektiv tillträdeskontroll i 5G anser vi att effekten av konfigurerbar observerbarhet, det vill säga att kontrollera observerad information genom att konfigurera mätfrekvens, är värt att undersöka. Generellt tror vi att fler mätningar ger mer information om det övervakade systemet, vilket gör det möjligt för en kompetent beslutsfattare att fatta bättre beslut. Men fler mätningar ger också högre övervakningskostnader. För att studera problemet med konfigurerbar observerbarhet kan vi dynamiskt bestämma vilka mätningar som ska övervakas och deras frekvenser för att uppnå effektiv tillträdeskontroll. I problemet med tillträdeskontroll med konfigurerbar observerbarhet är målet att minimera övervakningskostnader samtidigt som tillräckligt med information bibehålls för att fatta korrekta beslut om tillträdeskontroll. I denna avhandling föreslår vi att använda Deep Reinforcement Learning (DRL)-metoden för att uppnå effektiv tillträdeskontroll i ett simulerat 5G-änd-till-änd-nätverk, inklusive kärnnät, radioaccessnätverk och fyra dynamiska användarenheter. Den föreslagna metoden utvärderas genom att jämföra med standardmetoder som använder olika prestationsmått, varpå resultaten diskuteras. I experiment visar den föreslagna metoden förmågan att lära av interaktion med den simulerade miljön och ha god prestanda i tillträdeskontroll och använda låga mätfrekvenser. Efter 11 000 inlärningssteg uppnår de föreslagna DRL-agenterna i allmänhet bättre prestanda än den tröskelbaserade standardagenten, som fattar tillträdesbeslut baserat på kombinerade tröskelvillkor för RTT och throughput. Dessutom använder de DRL-agenter som tar hänsyn till nollskilda mätkostnader, mycket lägre mätfrekvenser än DRL-agenter som tar mätkostnaderna som noll.
|
3 |
Deep Reinforcement Learning for the Popular Game tagSöderlund, August, von Knorring, Gustav January 2021 (has links)
Reinforcement learning can be compared to howhumans learn – by interaction, which is the fundamental conceptof this project. This paper aims to compare three differentlearning methods by creating two adversarial reinforcementlearning models and simulate them in the game tag. The threefundamental learning methods are ordinary Q-learning, Deep Qlearning(DQN), and Double Deep Q-learning (DDQN).The models for ordinary Q-learning are built using a table andthe models for both DQN and DDQN are constructed by using aPython module called TensorFlow. The environment is composedof a bounded square with two obstacles and two agents withadversarial objectives. The rewards are given primarily basedon the distance between the agents.By comparing the trained models it was established that onlyDDQN could solve the task well and generalize, whilst both theQ-model and DQN had more serious flaws. A comparison ofthe DDQN model against its average reward trends establishedthat the model still improved regardless of the constant averagereward.Conclusively, DDQN is the appropriate choice for this adversarialproblem whilst Q-learning and DQN should be avoided.Finally, a constant average reward can be caused by bothagents improving at a similar rate rather than a stagnation inperformance. / Förstärkande inlärning kan jämföras medsättet vi människor lär oss, genom interaktion, vilket är denfundamentala idéen med detta projekt. Syftet med denna rapportär att jämföra tre olika inlärningsmetoder genom att skapatvå förstärkande motståndarinlärningsagenter och simulera demi spelet kull. De tre fundamentala inlärningsmetoderna är Qlearning,Deep Q-learning (DQN) och Double Deep Q-learning(DDQN).Modellerna för vanlig Q-learning är konstruerade med hjälpav en tabell och modellerna för både DQN och DDQN är byggdamed en Python modul, TensorFlow. Miljön är uppbyggd av enbegränsad kvadrat med två hinder och två agenter med motsattamål. Belöningarna ges baserat på avståndet mellan agenterna.En jämförelse mellan de tränade modelerna visade på attenbart DDQN kunde spela bra och generalisera sig, medan bådeQ-modellen och DQN-modellen hade mer allvarliga problem.Genom en jämförelse för DDQN-modellerna och deras genomsnittligabelöning visade det sig att DDQN-modellen fortfarandeförbättrade sig, oavsett det konstanta genomsnittet.Sammanfattningsvis, DDQN är det bäst lämpade valet fördenna motpart simulering medan vanlig Q-learning och DQNborde undvikas. Slutligen, ett konstant belöningsgenomsnitt orsakasav att agenterna förbättras i samma takt snarare än attde stagnerar i prestanda. / Kandidatexjobb i elektroteknik 2021, KTH, Stockholm
|
4 |
A Learning based Adaptive Cruise and Lane Control SystemXu, Peng 31 August 2018 (has links)
No description available.
|
5 |
A Deep Reinforcement Learning Approach for Dynamic Traffic Light Control with Transit Signal PriorityNousch, Tobias, Zhou, Runhao, Adam, Django, Hirrle, Angelika, Wang, Meng 23 June 2023 (has links)
Traffic light control (TLC) with transit signal priority (TSP) is an effective way to deal with urban congestion and travel delay. The growing amount of available connected vehicle data offers opportunities for signal control with transit priority, but the conventional control algorithms fall short in fully exploiting those datasets. This paper proposes a novel approach for dynamic TLC with TSP at an urban intersection. We propose a deep reinforcement learning based framework JenaRL to deal with the complex real-world intersections. The optimisation focuses on TSP while balancing the delay of all vehicles. A two-layer state space is defined to capture the real-time traffic information, i.e. vehicle position, type and incoming lane. The discrete action space includes the optimal phase and phase duration based on the real-time traffic situation. An intersection in the inner city of Jena is constructed in an open-source microscopic traffic simulator SUMO. A time-varying traffic demand of motorised individual traffic (MIT), the current TLC controller of the city, as well as the original timetables of the public transport (PT) are implemented in simulation to construct a realistic traffic environment. The results of the simulation with the proposed framework indicate a significant enhancement in the performance of traffic light controller by reducing the delay of all vehicles, and especially minimising the loss time of PT.
|
6 |
Deep Reinforcement Learning for Card GamesTegnér Mohringe, Oscar, Cali, Rayan January 2022 (has links)
This project aims to investigate how reinforcement learning (RL) techniques can be applied to the card game LimitTexas Hold’em. RL is a type of machine learning that can learn to optimally solve problems that can be formulated according toa Markov Decision Process.We considered two different RL algorithms, Deep Q-Learning(DQN) for its popularity within the RL community and DeepMonte-Carlo (DMC) for its success in other card games. With the goal of investigating how different parameters affect their performance and if possible achieve human performance.To achieve this, a subset of the parameters used by these methods were varied and their impact on the overall learning performance was investigated. With both DQN and DMC we were able to isolate parameters that had a significant impact on the performance.While both methods failed to reach human performance, both showed obvious signs of learning. The DQN algorithm’s biggest flaw was that it tended to fall into simplified strategies where it would stick to using only one action. The pitfall for DMC was the fact that the algorithm has a high variance and therefore needs a lot of samples to train. However, despite this fallacy,the algorithm has seemingly developed a primitive strategy. We believe that with some modifications to the methods, better results could be achieved. / Detta projekt strävar efter att undersöka hur olika Förstärkningsinlärning (RL) tekniker kan implementeras för kortspelet Limit Texas Hold’Em. RL är en typ av maskininlärning som kan lära sig att optimalt lösa problem som kan formuleras enligt en markovbeslutsprocess. Vi betraktade två olika algoritmer, Deep Q-Learning (DQN) som valdes för sin popularitet och Deep Monte-Carlo (DMC) valdes för dess tidigare framgång i andra kortspel. Med målet att undersöka hur olika parametrar påverkar inlärningsprocessen och om möjligt uppnå mänsklig prestanda. För att uppnå detta så valdes en delmängd av de parametrar som används av dessa metoder. Dessa ändrades successivt för att sedan mäta dess påverkan på den övergripande inlärningsprestandan. Med både DQN och DMC så lyckades vi isolera parametrar som hade en signifikant påverkan på prestandan. Trots att båda metoderna misslyckades med att uppnå mänsklig prestanda så visade båda tecken på upplärning. Det största problemet med DQN var att metoden tenderade att fastna i enkla strategier där den enbart valde ett drag. För DMC så låg problemet i att metoden har en hög varians vilket innebär att metoden behöver mycket tid för att tränas upp. Dock så lyckades ändå metoden utveckla en primitiv strategi. Vi tror att metoder med ett par modifikationer skulle kunna nå ett bättre resultat. / Kandidatexjobb i elektroteknik 2022, KTH, Stockholm
|
7 |
Deep Reinforcement Learning for the Optimization of Combining Raster Images in Forest PlanningWen, Yangyang January 2021 (has links)
Raster images represent the treatment options of how the forest will be cut. Economic benefits from cutting the forest will be generated after the treatment is selected and executed. Existing raster images have many clusters and small sizes, this becomes the principal cause of overhead. If we can fully explore the relationship among the raster images and combine the old data sets according to the optimization algorithm to generate a new raster image, then this result will surpass the existing raster images and create higher economic benefits. The question of this project is can we create a dynamic model that treats the updating pixel’s status as an agent selecting options for an empty raster image in response to neighborhood environmental and landscape parameters. This project is trying to explore if it is realistic to use deep reinforcement learning to generate new and superior raster images. Finally, this project aims to explore the feasibility, usefulness, and effectiveness of deep reinforcement learning algorithms in optimizing existing treatment options. The problem was modeled as a Markov decision process, in which the pixel to be updated was an agent of the empty raster image, which would determine the choice of the treatment option for the current empty pixel. This project used the Deep Q learning neural network model to calculate the Q values. The temporal difference reinforcement learning algorithm was applied to predict future rewards and to update model parameters. After the modeling was completed, this project set up the model usefulness experiment to test the usefulness of the model. Then the parameter correlation experiment was set to test the correlation between the parameters and the benefit of the model. Finally, the trained model was used to generate a larger size raster image to test its effectiveness.
|
8 |
A Comparative Study of Reinforcement-based and Semi-classical Learning in Sensor FusionBodén, Johan January 2021 (has links)
Reinforcement learning has proven itself very useful in certain areas, such as games. However, the approach has been seen as quite limited. Reinforcement-based learning has for instance not been commonly used for classification tasks as it is receiving feedback on how well it did for an action performed on a specific input. This slows the performance convergence rate as compared to other classification approaches which has the input and the corresponding output to train on. Nevertheless, this thesis aims to investigate whether reinforcement-based learning could successfully be employed on a classification task. Moreover, as sensor fusion is an expanding field which can for instance assist autonomous vehicles in understanding its surroundings, it is also interesting to see how sensor fusion, i.e., fusion between lidar and RGB images, could increase the performance in a classification task. In this thesis, a reinforcement-based learning approach is compared to a semi-classical approach. As an example of a reinforcement learning model, a deep Q-learning network was chosen, and a support vector machine classifier built on top of a deep neural network, was chosen as an example of a semi-classical model. In this work, these frameworks are compared with and without sensor fusion to see whether fusion improves their performance. Experiments show that the evaluated reinforcement-based learning approach underperforms in terms of metrics but mainly due to its slow learning process, in comparison to the semi-classical approach. However, on the other hand using reinforcement-based learning to carry out a classification task could still in some cases be advantageous, as it still performs fairly well in terms of the metrics presented in this work, e.g. F1-score, or for instance imbalanced datasets. As for the impact of sensor fusion, a notable improvement can be seen, e.g. when training the deep Q-learning model for 50 episodes, the F1-score increased with 0.1329; especially, when taking into account that the most of the lidar data used in the fusion is lost since this work projects the 3D lidar data onto the same 2D plane as the RGB images.
|
9 |
Distributed Optimisation in Multi-Agent Systems Through Deep Reinforcement LearningEriksson, Andreas, Hansson, Jonas January 2019 (has links)
The increased availability of computing power have made reinforcement learning a popular field of science in the most recent years. Recently, reinforcement learning has been used in applications like decreasing energy consumption in data centers, diagnosing patients in medical care and in text-tospeech software. This project investigates how well two different reinforcement learning algorithms, Q-learning and deep Qlearning, can be used as a high-level planner for controlling robots inside a warehouse. A virtual warehouse was created, and the two different algorithms were tested. The reliability of both algorithms where found to be insufficient for real world applications but the deep Q-learning algorithm showed great potential and further research is encouraged.
|
10 |
Deep Reinforcement Learning in Cart Pole and PongKuurne Uussilta, Dennis, Olsson, Viktor January 2020 (has links)
In this project, we aim to reproduce previous resultsachieved with Deep Reinforcement Learning. We present theMarkov Decision Process model as well as the algorithms Q-learning and Deep Q-learning Network (DQN). We implement aDQN agent, first in an environment called CartPole, and later inthe game Pong.Our agent was able to solve the CartPole environment in lessthan 300 episodes. We assess the impact some of the parametershad on the agents performance. The performance of the agentis particularly sensitive to the learning rate and seeminglyproportional to the dimension of the neural network. The DQNagent implemented in Pong was unable to learn, performing atthe same level as an agent picking actions at random, despiteintroducing various modifications to the algorithm. We discusspossible sources of error, including the RAM used as input,possibly not containing sufficient information. Furthermore, wediscuss the possibility of needing additional modifications to thealgorithm in order to achieve convergence, as it is not guaranteedfor DQN. / Målet med detta projekt är att reproducera tidigare resultat som uppnåtts med Deep Reinforcement Learning. Vi presenterar Markov Decision Process-modellen samt algoritmerna Q-learning och Deep Q-learning Network (DQN). Vi implementerar en DQN agent, först i miljön CartPole, sedan i spelet Pong. Vår agent lyckades lösa CartPole på mindre än 300 episoder. Vi gör en bedömning av vissa parametrars påverkan på agentens prestanda. Agentens prestanda är särskilt känslig för värdet på ”learning rate” och verkar vara proportionell mot dimensionen av det neurala nätverket. DQN-agenten som implementerades i Pong var oförmögen att lära sig och spelade på samma nivå som en agent som agerar slumpmässigt, trots att vi introducerade diverse modifikationer. Vi diskuterar möjliga felkällor, bland annat att RAM, som används som indata till agenten, eventuellt saknar tillräcklig information. Dessutom diskuterar vi att ytterligare modifikationer kan vara nödvändiga för uppnå konvergens eftersom detta inte är garanterat för DQN. / Kandidatexjobb i elektroteknik 2020, KTH, Stockholm
|
Page generated in 0.0946 seconds