1

Learning medical triage by using a reinforcement learning approach

Sundqvist, Niklas January 2022 (has links)
Many emergency departments today suffer from overcrowding of people seeking care. The first stage in seeking care is medical triage, in which a doctor or nurse prioritises patients depending on their symptoms. This is a cumbersome process that could be a candidate for automation. This master thesis investigates the possibility of using reinforcement learning for performing medical triage of patients. A deep Q-learning approach is taken for designing the agent, together with the two extensions of double Q-learning and a duelling network architecture. The agent is trained in two different environments. In the first environment, the goal of the agent is to ask questions to a patient and then decide, once enough information has been collected, how the patient should be prioritised. In the second environment, the agent decides which questions should be asked, and a separate classifier then uses the gathered information to make the actual triage decision. The training and testing of the agent in the two environments reveal difficulties in exploring the environment efficiently and thoroughly. It was also shown that defining a reward function that guides the agent into asking valuable questions, and learning a stopping condition for asking questions, is a complicated task. Suitable future work is discussed that would, in combination with the work performed in this thesis, create a better reinforcement learning model that could potentially show more promising results in the task of performing medical triage of patients.
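As a minimal sketch of the two extensions named above (not the thesis code), a duelling head and a double Q-learning target can be combined as follows; the layer sizes and discount factor are illustrative assumptions:

```python
import tensorflow as tf

class DuelingQNet(tf.keras.Model):
    """Duelling architecture: a shared trunk splits into a state-value
    stream V(s) and an advantage stream A(s, a); Q = V + A - mean(A)."""
    def __init__(self, n_actions):
        super().__init__()
        self.trunk = tf.keras.layers.Dense(64, activation="relu")
        self.value = tf.keras.layers.Dense(1)
        self.advantage = tf.keras.layers.Dense(n_actions)

    def call(self, obs):
        h = self.trunk(obs)
        v, a = self.value(h), self.advantage(h)
        return v + a - tf.reduce_mean(a, axis=1, keepdims=True)

def double_q_targets(online, target, rewards, next_obs, dones, gamma=0.99):
    """Double Q-learning: the online net picks the argmax action and the
    slowly-updated target net evaluates it, curbing the overestimation
    bias of plain deep Q-learning."""
    next_actions = tf.argmax(online(next_obs), axis=1)
    next_q = tf.gather(target(next_obs), next_actions, batch_dims=1)
    return rewards + gamma * (1.0 - dones) * next_q
```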
2

AI-driven admission control: with Deep Reinforcement Learning

Ai, Lingling January 2021 (has links)
5G is expected to provide a high-performance and highly efficient network to prominent industry verticals, with ubiquitous access to a wide range of services and orders-of-magnitude improvements over 4G. Network slicing, which allocates network resources according to users’ specific requirements, is a key feature for fulfilling the diversity of requirements in a 5G network. However, network slicing also demands more orchestration and makes monitoring and admission control more difficult. Although the problem of admission control has been studied extensively, that research takes measurements for granted. A fixed high monitoring frequency can waste system resources, while a low monitoring frequency (a low level of observability) can leave too little information for good admission control decisions. To achieve efficient admission control in 5G, we therefore consider the impact of configurable observability, i.e. controlling the observed information by configuring the measurement frequency. In general, more measurements provide more information about the monitored system, enabling a capable decision-maker to make better decisions, but they also incur more monitoring overhead. To study the problem of configurable observability, we dynamically decide which measurements to monitor and at which frequencies. The objective is to minimize monitoring overhead while retaining enough information to make proper admission control decisions. In this thesis, we propose using Deep Reinforcement Learning (DRL) to achieve efficient admission control in a simulated 5G end-to-end network, including a core network, a radio access network and four dynamic UEs. The proposed method is evaluated against baseline methods using different performance metrics, and the results are discussed. In the experiments, the proposed method demonstrates the ability to learn from interaction with the simulated environment, performs well in admission control and uses low measurement frequencies. After 11000 steps of learning, the proposed DRL agents generally achieve better performance than the threshold-based baseline agent, which takes admission decisions based on combined threshold conditions on RTT and throughput. Furthermore, the DRL agents that take non-zero measurement costs into consideration use much lower measurement frequencies than DRL agents that treat measurement costs as zero.
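A hedged sketch of how the monitoring/decision trade-off described above can be folded into a scalar reward; the signal names, weights, and per-measurement cost are illustrative assumptions rather than the thesis implementation:

```python
def step_reward(correct_admission, sla_violation, meas_per_step,
                w_admit=1.0, w_violate=5.0, cost_per_meas=0.01):
    """Reward correct admission decisions, penalise SLA violations,
    and charge each measurement taken, so an agent that maximises
    this signal learns to monitor only as often as it needs to."""
    return (w_admit * float(correct_admission)
            - w_violate * float(sla_violation)
            - cost_per_meas * meas_per_step)

# e.g. a correct admission made with 4 measurements this step:
print(step_reward(True, False, meas_per_step=4))  # 0.96
```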
3

Deep Reinforcement Learning for the Popular Game Tag

Söderlund, August, von Knorring, Gustav January 2021 (has links)
Reinforcement learning can be compared to how humans learn: by interaction, which is the fundamental concept of this project. This paper aims to compare three different learning methods by creating two adversarial reinforcement learning models and simulating them in the game tag. The three fundamental learning methods are ordinary Q-learning, Deep Q-learning (DQN), and Double Deep Q-learning (DDQN). The models for ordinary Q-learning are built using a table, and the models for both DQN and DDQN are constructed using the Python module TensorFlow. The environment is composed of a bounded square with two obstacles and two agents with adversarial objectives. The rewards are given primarily based on the distance between the agents. By comparing the trained models it was established that only DDQN could solve the task well and generalize, whilst both the Q-model and DQN had more serious flaws. A comparison of the DDQN model against its average reward trends established that the model still improved regardless of the constant average reward. Conclusively, DDQN is the appropriate choice for this adversarial problem, whilst Q-learning and DQN should be avoided. Finally, a constant average reward can be caused by both agents improving at a similar rate rather than by a stagnation in performance. / Bachelor's degree project in electrical engineering 2021, KTH, Stockholm
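A small sketch of the distance-based reward described above, under assumed scales and a hypothetical catch radius; the roughly zero-sum shaping rewards the chaser for closing the gap and the runner for keeping it:

```python
import numpy as np

def tag_rewards(chaser_pos, runner_pos, catch_radius=0.5):
    """Distance-based adversarial shaping for the game of tag."""
    d = float(np.linalg.norm(np.asarray(chaser_pos) - np.asarray(runner_pos)))
    caught = d < catch_radius
    chaser_reward = 10.0 if caught else -0.1 * d
    runner_reward = -10.0 if caught else 0.1 * d
    return chaser_reward, runner_reward, caught

print(tag_rewards((0.0, 0.0), (3.0, 4.0)))  # (-0.5, 0.5, False)
```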
4

A Learning based Adaptive Cruise and Lane Control System

Xu, Peng 31 August 2018 (has links)
No description available.
5

A Deep Reinforcement Learning Approach for Dynamic Traffic Light Control with Transit Signal Priority

Nousch, Tobias, Zhou, Runhao, Adam, Django, Hirrle, Angelika, Wang, Meng 23 June 2023 (has links)
Traffic light control (TLC) with transit signal priority (TSP) is an effective way to deal with urban congestion and travel delay. The growing amount of available connected-vehicle data offers opportunities for signal control with transit priority, but conventional control algorithms fall short in fully exploiting those datasets. This paper proposes a novel approach for dynamic TLC with TSP at an urban intersection. We propose a deep reinforcement learning based framework, JenaRL, to deal with complex real-world intersections. The optimisation focuses on TSP while balancing the delay of all vehicles. A two-layer state space is defined to capture the real-time traffic information, i.e. vehicle position, type and incoming lane. The discrete action space comprises the optimal phase and phase duration based on the real-time traffic situation. An intersection in the inner city of Jena is constructed in the open-source microscopic traffic simulator SUMO. A time-varying traffic demand of motorised individual traffic (MIT), the city's current TLC controller, and the original timetables of the public transport (PT) are implemented in the simulation to construct a realistic traffic environment. The simulation results with the proposed framework indicate a significant enhancement in the performance of the traffic light controller, reducing the delay of all vehicles and especially minimising the loss time of PT.
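One way to realise the discrete action space described above is to flatten (phase, duration) pairs into a single categorical index, as a value-based agent requires; the phase names and duration grid below are illustrative assumptions, not the JenaRL configuration:

```python
PHASES = ["NS_through", "NS_left", "EW_through", "EW_left"]  # illustrative
DURATIONS = [5, 10, 15, 20]  # candidate green times in seconds, illustrative

N_ACTIONS = len(PHASES) * len(DURATIONS)  # one Q-value per (phase, duration)

def decode_action(index):
    """Map a flat action index back to a (phase, duration) pair."""
    phase = PHASES[index // len(DURATIONS)]
    duration = DURATIONS[index % len(DURATIONS)]
    return phase, duration

print(decode_action(6))  # ('NS_left', 15)
```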
6

Deep Q Learning with a Multi-Level Vehicle Perception for Cooperative Automated Highway Driving

Hamilton, Richard January 2021 (has links)
Autonomous vehicles, commonly known as “self-driving cars”, are of increasing interest to researchers due to their potential to mitigate traffic accidents and congestion. Using reinforcement learning, previous research has demonstrated that a DQN agent can be trained to effectively navigate a simulated two-lane environment via cooperative driving, in which a model of V2X technology allows an AV to receive information from surrounding vehicles (termed Primary Perceived Vehicles) to make driving decisions. Results have demonstrated that the DQN agent can learn to navigate longitudinally and laterally, but with a prohibitively high collision rate of 1.5% - 4.8% and an average speed of 13.4 m/s. In this research, the impact of including information from traffic vehicles outside of those that immediately surround the AV (termed Secondary Perceived Vehicles) as inputs to a DQN agent is investigated. Results indicate that while including velocity and distance information from SPVs does not improve the collision rate and average speed of the driving algorithm, it does yield a lower standard deviation of speed during episodes, indicating lower acceleration. This effect, however, is lost when the agent is tested under constant traffic flow scenarios (as opposed to fluctuating driving conditions). Taken together, it is concluded that while the SPV inclusion does not have an impact on collision rate and average speed, its ability to achieve the same performance with lower acceleration can significantly improve fuel economy and drive quality. These findings give a better understanding of how additional vehicle information during cooperative driving affects automated driving. / Thesis / Master of Applied Science (MASc)
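A hedged sketch of how PPV and SPV information might be packed into a fixed-size DQN input; the slot counts and feature choice (distance and relative velocity, zero-padded) are assumptions for illustration:

```python
import numpy as np

def build_observation(ego_speed, ppvs, spvs, n_ppv=6, n_spv=4):
    """Concatenate ego speed with (distance, relative velocity) features
    for Primary and Secondary Perceived Vehicles, zero-padding unused
    slots so the network always sees the same input width."""
    def pad(vehicles, n_slots):
        feats = [(v["dist"], v["rel_vel"]) for v in vehicles[:n_slots]]
        feats += [(0.0, 0.0)] * (n_slots - len(feats))
        return np.asarray(feats, dtype=np.float32).ravel()
    return np.concatenate(([ego_speed], pad(ppvs, n_ppv), pad(spvs, n_spv)))

obs = build_observation(13.4, [{"dist": 20.0, "rel_vel": -1.2}], [])
print(obs.shape)  # (21,) = 1 + 2 * 6 + 2 * 4
```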
7

Deep Reinforcement Learning for Card Games

Tegnér Mohringe, Oscar, Cali, Rayan January 2022 (has links)
This project aims to investigate how reinforcement learning (RL) techniques can be applied to the card game Limit Texas Hold'em. RL is a type of machine learning that can learn to optimally solve problems that can be formulated as a Markov decision process. We considered two different RL algorithms: Deep Q-Learning (DQN), for its popularity within the RL community, and Deep Monte-Carlo (DMC), for its success in other card games. The goal was to investigate how different parameters affect their performance and, if possible, to achieve human performance. To achieve this, a subset of the parameters used by these methods was varied and their impact on the overall learning performance was investigated. With both DQN and DMC we were able to isolate parameters that had a significant impact on performance. While both methods failed to reach human performance, both showed obvious signs of learning. The DQN algorithm's biggest flaw was that it tended to fall into simplified strategies where it would stick to using only one action. The pitfall for DMC was that the algorithm has a high variance and therefore needs a lot of samples to train. However, despite this flaw, the algorithm seemingly developed a primitive strategy. We believe that with some modifications to the methods, better results could be achieved. / Bachelor's degree project in electrical engineering 2022, KTH, Stockholm
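To make the variance point concrete, here is a minimal sketch of the Deep Monte-Carlo target: Q(s, a) is regressed toward the full episode return rather than a bootstrapped one-step target, which is unbiased but needs many sampled hands:

```python
def monte_carlo_targets(rewards, gamma=1.0):
    """Discounted return-to-go for every step of one finished episode;
    these returns are the regression targets for the DMC Q-network."""
    g, targets = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        targets.append(g)
    return list(reversed(targets))

# e.g. a four-action hand that ends with winning 3 big blinds:
print(monte_carlo_targets([0.0, 0.0, 0.0, 3.0]))  # [3.0, 3.0, 3.0, 3.0]
```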
8

Deep Reinforcement Learning for the Optimization of Combining Raster Images in Forest Planning

Wen, Yangyang January 2021 (has links)
Raster images represent the treatment options for how the forest will be cut. Economic benefits from cutting the forest are generated after a treatment is selected and executed. Existing raster images have many clusters and small patch sizes, which is the principal cause of overhead. If we can fully explore the relationships among the raster images and combine the old data sets according to an optimization algorithm to generate a new raster image, then this result will surpass the existing raster images and create higher economic benefits. The question of this project is whether we can create a dynamic model that treats the pixel being updated as an agent selecting options for an empty raster image in response to neighbourhood environmental and landscape parameters. This project explores whether it is realistic to use deep reinforcement learning to generate new and superior raster images. Finally, this project aims to explore the feasibility, usefulness, and effectiveness of deep reinforcement learning algorithms in optimizing existing treatment options. The problem was modeled as a Markov decision process, in which the pixel to be updated was an agent of the empty raster image that determines the choice of treatment option for the current empty pixel. This project used a deep Q-learning neural network model to calculate the Q values. The temporal-difference reinforcement learning algorithm was applied to predict future rewards and to update the model parameters. After the modeling was completed, this project set up a model-usefulness experiment to test the usefulness of the model. Then a parameter-correlation experiment was set up to test the correlation between the parameters and the benefit of the model. Finally, the trained model was used to generate a larger raster image to test its effectiveness.
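A minimal sketch of the temporal-difference target used to train such a deep Q-network; the discount factor is an illustrative assumption, and the reward stands in for the economic benefit of the chosen treatment:

```python
import numpy as np

def td_target(reward, next_q_values, done, gamma=0.95):
    """One-step TD target: immediate reward plus the discounted best
    Q-value of the next empty pixel (zero once the raster is full)."""
    return reward + (0.0 if done else gamma * float(np.max(next_q_values)))

# e.g. treatment benefit 2.0, best next-pixel value 1.5:
print(td_target(2.0, np.array([0.3, 1.5, -0.2]), done=False))  # 3.425
```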
9

A Comparative Study of Reinforcement-based and Semi-classical Learning in Sensor Fusion

Bodén, Johan January 2021 (has links)
Reinforcement learning has proven itself very useful in certain areas, such as games. However, the approach has been seen as quite limited. Reinforcement-based learning has, for instance, not been commonly used for classification tasks, as it only receives feedback on how well it did for an action performed on a specific input. This slows the convergence rate compared to other classification approaches, which have both the input and the corresponding output to train on. Nevertheless, this thesis aims to investigate whether reinforcement-based learning can successfully be employed on a classification task. Moreover, as sensor fusion is an expanding field which can, for instance, assist autonomous vehicles in understanding their surroundings, it is also interesting to see how sensor fusion, i.e., fusion between lidar and RGB images, can increase the performance on a classification task. In this thesis, a reinforcement-based learning approach is compared to a semi-classical approach. As an example of a reinforcement learning model, a deep Q-learning network was chosen, and a support vector machine classifier built on top of a deep neural network was chosen as an example of a semi-classical model. These frameworks are compared with and without sensor fusion to see whether fusion improves their performance. Experiments show that the evaluated reinforcement-based learning approach underperforms in terms of metrics, mainly due to its slow learning process, in comparison to the semi-classical approach. On the other hand, using reinforcement-based learning to carry out a classification task can still be advantageous in some cases, as it performs fairly well in terms of the metrics presented in this work, e.g. F1-score, or for instance on imbalanced datasets. As for the impact of sensor fusion, a notable improvement can be seen: for example, when training the deep Q-learning model for 50 episodes, the F1-score increased by 0.1329. This is especially notable considering that most of the lidar data used in the fusion is lost, since this work projects the 3D lidar data onto the same 2D plane as the RGB images.
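As a hedged illustration of the fusion step described last, a pinhole projection of lidar points onto the image plane followed by early channel fusion might look as follows; the camera intrinsics are an assumed input, and the rasterisation of projected points into a depth map is omitted:

```python
import numpy as np

def project_lidar_to_image(points_xyz, K):
    """Project 3D lidar points (already in the camera frame) onto the
    2D image plane with pinhole intrinsics K, mirroring the reduction
    to the RGB plane described above."""
    pts = points_xyz[points_xyz[:, 2] > 0]  # keep points in front of the camera
    uvw = (K @ pts.T).T                     # homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]         # perspective divide -> (u, v)

def fuse(rgb, depth_map):
    """Early fusion: stack a rasterised lidar depth channel onto the RGB
    image, giving an H x W x 4 input for either classifier."""
    return np.concatenate([rgb, depth_map[..., None]], axis=-1)
```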
10

Distributed Optimisation in Multi-Agent Systems Through Deep Reinforcement Learning

Eriksson, Andreas, Hansson, Jonas January 2019 (has links)
The increased availability of computing power has made reinforcement learning a popular field of science in recent years. Recently, reinforcement learning has been used in applications such as decreasing energy consumption in data centers, diagnosing patients in medical care, and text-to-speech software. This project investigates how well two different reinforcement learning algorithms, Q-learning and deep Q-learning, can be used as a high-level planner for controlling robots inside a warehouse. A virtual warehouse was created, and the two algorithms were tested. The reliability of both algorithms was found to be insufficient for real-world applications, but the deep Q-learning algorithm showed great potential, and further research is encouraged.
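For contrast with the deep variant, the tabular Q-learning update that such a comparison starts from fits in a few lines; the grid coordinates, action count, and learning rate below are illustrative assumptions:

```python
import numpy as np
from collections import defaultdict

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward the reward plus
    the discounted value of the best action in the next warehouse cell."""
    Q[s][a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s][a])

# states are warehouse grid cells, four actions: up/down/left/right
Q = defaultdict(lambda: np.zeros(4))
q_update(Q, s=(2, 3), a=1, r=-1.0, s_next=(2, 4))
print(Q[(2, 3)])  # [ 0.  -0.1  0.   0. ]
```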
