81 |
[en] PESSIMISTIC Q-LEARNING: AN ALGORITHM TO CREATE BOTS FOR TURN-BASED GAMES / [pt] Q-LEARNING PESSIMISTA: UM ALGORITMO PARA GERAÇÃO DE BOTS DE JOGOS EM TURNOS
Adriano Brito Pereira, 25 January 2017
[pt, translated] This document presents a new reinforcement learning algorithm, Pessimistic Q-Learning. Our motivation is to solve the problem of generating bots capable of playing turn-based games and to contribute better results through this extension of the Q-Learning algorithm. Pessimistic Q-Learning exploits the flexibility of the values computed by traditional Q-Learning without resorting to brute force. To measure the quality of the generated bot, we define quality as the sum of its potential to win and to draw in a game. Our fundamental purpose is to generate good-quality bots for different games, so the algorithm can be applied to families of turn-based games. We developed a framework called Wisebots and ran experiments with several scenarios applied to the traditional games TicTacToe, Connect-4, and CardPoints. Comparing the quality of Pessimistic Q-Learning with that of traditional Q-Learning, we observed gains of 0.8 per cent in TicTacToe, obtaining an algorithm that never loses. We also observed gains of 35 per cent in Connect-4 and 27 per cent in CardPoints, raising both from the 50-60 per cent range to the 90-100 per cent quality range. These results illustrate the potential for improvement offered by Pessimistic Q-Learning and suggest its application to many kinds of turn-based games. / [en] This document presents a new reinforcement learning algorithm, Pessimistic Q-Learning. Our motivation is to solve the problem of generating bots able to play turn-based games and to achieve better results through this extension of the Q-Learning algorithm. Pessimistic Q-Learning explores the flexibility of the calculations produced by traditional Q-Learning without the use of brute force. To measure the quality of the generated bot, we consider quality as the sum of the potential to win and to tie in a game. Our fundamental purpose is to generate bots of good quality for different games; thus, the algorithm can be applied to families of turn-based games. We developed a framework called Wisebots and conducted experiments with several scenarios applied to the traditional games TicTacToe, Connect-4, and CardPoints. Comparing the quality of Pessimistic Q-Learning with that of traditional Q-Learning, we observed gains up to 100 per cent quality in TicTacToe, obtaining an algorithm that never loses. We also observed gains of 35 per cent in Connect-4 and 27 per cent in CardPoints, increasing both from the 60-80 per cent range to the 90-100 per cent quality range. These results illustrate the potential for improvement with the use of Pessimistic Q-Learning, suggesting its application to various types of turn-based games.
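As context for the extension described in this abstract, the sketch below shows the standard tabular Q-Learning update that a pessimistic variant would build on; the state encoding, reward scheme, and hyperparameters are illustrative assumptions, not the thesis's Wisebots implementation.

```python
import random
from collections import defaultdict

# Minimal tabular Q-Learning baseline for a turn-based game.
# States are hashable board encodings (e.g. a tuple of cells);
# actions are legal moves. All constants here are illustrative.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

Q = defaultdict(float)  # Q[(state, action)] -> estimated return

def choose_action(state, legal_actions):
    """Epsilon-greedy action selection over the current Q estimates."""
    if random.random() < EPSILON:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_legal_actions):
    """Standard Q-Learning backup; a pessimistic variant would replace
    the max over next actions with a more conservative estimate."""
    best_next = max((Q[(next_state, a)] for a in next_legal_actions), default=0.0)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
```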
|
82 |
Deep Reinforcement Learning for the Optimization of Combining Raster Images in Forest Planning
Wen, Yangyang, January 2021
Raster images represent the treatment options for how the forest will be cut. Economic benefits from cutting the forest are generated after a treatment is selected and executed. Existing raster images contain many small clusters, which is the principal cause of overhead. If we can fully explore the relationships among the raster images and combine the old data sets through an optimization algorithm to generate a new raster image, the result will surpass the existing raster images and create higher economic benefits. The question of this project is whether we can create a dynamic model that treats the pixel being updated as an agent selecting options for an empty raster image in response to neighborhood environmental and landscape parameters. The project explores whether it is realistic to use deep reinforcement learning to generate new and superior raster images, and aims to evaluate the feasibility, usefulness, and effectiveness of deep reinforcement learning algorithms in optimizing existing treatment options. The problem was modeled as a Markov decision process in which the pixel to be updated acts as an agent of the empty raster image and determines the choice of treatment option for the current empty pixel. A deep Q-learning neural network was used to calculate the Q-values, and a temporal-difference reinforcement learning algorithm was applied to predict future rewards and update the model parameters. After the modeling was completed, a usefulness experiment was set up to test the model, followed by a parameter-correlation experiment to test the correlation between the parameters and the benefit of the model. Finally, the trained model was used to generate a larger raster image to test its effectiveness.
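As a rough illustration of the framing described above, the sketch below treats each empty pixel as a deep Q-network agent that selects a treatment option from neighborhood features and is trained with a temporal-difference update; the feature and action counts, network sizes, and constants are assumptions for illustration, not the thesis's actual model.

```python
import torch
import torch.nn as nn

# Illustrative sizes: each empty pixel is described by features of its
# neighborhood and landscape, and chooses one of N treatment options.
N_FEATURES, N_TREATMENTS, GAMMA = 16, 5, 0.95

class PixelQNetwork(nn.Module):
    """Maps a pixel's neighborhood/landscape features to Q-values,
    one per candidate treatment option."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FEATURES, 64), nn.ReLU(),
            nn.Linear(64, N_TREATMENTS),
        )

    def forward(self, x):
        return self.net(x)

q_net = PixelQNetwork()
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def td_step(features, action, reward, next_features, done):
    """One temporal-difference update for a single pixel transition."""
    q_value = q_net(features)[action]
    with torch.no_grad():
        target = reward + (0.0 if done else GAMMA * q_net(next_features).max())
    loss = (q_value - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```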
|
83 |
Simulated Fixed-Wing Aircraft Attitude Control using Reinforcement Learning Methods
David Jona Richter (11820452), 20 December 2021
Autonomous transportation is a research field that has gained huge interest in recent years, with autonomous electric or hydrogen cars coming ever closer to everyday use. Cars are not the only subject of autonomy research, though; the field of aviation is also being explored for fully autonomous flight. One very important aspect of making autonomous flight a reality is attitude control: the control of roll, pitch, and sometimes yaw. Traditional approaches to automated attitude control use PID (proportional-integral-derivative) controllers, which rely on hand-tuned parameters to fulfill the task. In this work, however, the use of reinforcement learning algorithms for attitude control is explored. With the surge of ever more powerful artificial neural networks, which have proven to be universally usable function approximators, deep reinforcement learning also becomes an intriguing option. A software toolkit is developed and used to allow multiple flight simulators to train agents with reinforcement learning as well as deep reinforcement learning. Experiments are run using different hyperparameters, algorithms, state representations, and reward functions to explore possible options for autonomous attitude control using reinforcement learning.
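To make the attitude-control objective concrete, the sketch below shows one plausible shaped reward that penalizes deviation from commanded roll and pitch; the function, weights, and angle conventions are assumptions for illustration and are not taken from the thesis's toolkit.

```python
import math

def attitude_reward(roll, pitch, target_roll, target_pitch,
                    roll_weight=1.0, pitch_weight=1.0):
    """Illustrative shaped reward for attitude control: the closer the
    aircraft is to the commanded roll/pitch, the closer the reward is to
    zero; larger angular errors are penalized proportionally.
    Angles are in radians."""
    roll_error = abs(math.remainder(roll - target_roll, 2 * math.pi))
    pitch_error = abs(math.remainder(pitch - target_pitch, 2 * math.pi))
    return -(roll_weight * roll_error + pitch_weight * pitch_error)
```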
|
84 |
Policy-based Reinforcement Learning Control for Window Opening and Closing in an Office Building
Kaisaravalli Bhojraj, Gokul; Markonda, Yeswanth Surya Achyut, January 2020
The level of indoor comfort can be strongly influenced by the window opening and closing behavior of the occupant in an office building. If not properly managed, this behavior affects not only comfort but also energy consumption. Such occupant behavior is not easy to predict and control in a conventional way. Nowadays, for a system to be called smart it must learn user behavior, as this gives valuable information to the controlling system. To control a window efficiently, we propose reinforcement learning (RL) in this thesis, which should be able to learn user behavior and maintain an optimal indoor climate. The model-free nature of RL gives flexibility in developing an intelligent control system in a simpler way than conventional techniques. The data in this thesis are taken from an office building in Beijing. Value-based reinforcement learning has been implemented before for controlling the window, but in this thesis we apply policy-based RL (the REINFORCE algorithm) and compare our results with a value-based method (Q-learning), thereby getting a better idea of which suits the task at hand and exploring how they behave. Based on our work, we find that policy-based RL provides a good trade-off between maintaining optimal indoor temperature and learning the occupant's behavior, which is important for a system to be called smart.
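For readers unfamiliar with the policy-based method mentioned above, the sketch below outlines a minimal REINFORCE update for a binary open/close decision; the state features, network sizes, and constants are illustrative assumptions rather than the thesis's actual configuration.

```python
import torch
import torch.nn as nn

# Illustrative REINFORCE sketch for a binary open/close decision.
# The state features (e.g. indoor/outdoor temperature, CO2, time of day)
# and their number are assumptions, not the thesis's actual inputs.
N_FEATURES, GAMMA = 6, 0.99

policy = nn.Sequential(nn.Linear(N_FEATURES, 32), nn.ReLU(),
                       nn.Linear(32, 2), nn.Softmax(dim=-1))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_update(states, actions, rewards):
    """Monte-Carlo policy-gradient update over one episode:
    maximize sum_t log pi(a_t|s_t) * G_t, where G_t is the
    discounted return-to-go from step t."""
    returns, g = [], 0.0
    for r in reversed(rewards):          # compute returns-to-go backwards
        g = r + GAMMA * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    loss = 0.0
    for s, a, g in zip(states, actions, returns):
        prob = policy(s)[a]              # probability of the taken action
        loss = loss - torch.log(prob) * g
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```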
|
85 |
A Comparative Study of Reinforcement-based and Semi-classical Learning in Sensor Fusion
Bodén, Johan, January 2021
Reinforcement learning has proven itself very useful in certain areas, such as games. However, the approach has been seen as quite limited. Reinforcement-based learning has, for instance, not been commonly used for classification tasks, as it only receives feedback on how well it did for an action performed on a specific input. This slows the convergence rate compared to other classification approaches, which have both the input and the corresponding output to train on. Nevertheless, this thesis aims to investigate whether reinforcement-based learning can successfully be employed on a classification task. Moreover, as sensor fusion is an expanding field which can, for instance, assist autonomous vehicles in understanding their surroundings, it is also interesting to see how sensor fusion, i.e., fusion between lidar and RGB images, can increase the performance in a classification task. In this thesis, a reinforcement-based learning approach is compared to a semi-classical approach. As an example of a reinforcement learning model, a deep Q-learning network was chosen, and a support vector machine classifier built on top of a deep neural network was chosen as an example of a semi-classical model. These frameworks are compared with and without sensor fusion to see whether fusion improves their performance. Experiments show that the evaluated reinforcement-based learning approach underperforms in terms of metrics compared to the semi-classical approach, mainly due to its slow learning process. On the other hand, using reinforcement-based learning to carry out a classification task could still be advantageous in some cases, as it performs fairly well in terms of the metrics presented in this work, e.g. F1-score, and for instance on imbalanced datasets. As for the impact of sensor fusion, a notable improvement can be seen: for example, when training the deep Q-learning model for 50 episodes, the F1-score increased by 0.1329, which is especially notable given that most of the lidar data used in the fusion is lost, since this work projects the 3D lidar data onto the same 2D plane as the RGB images.
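As an illustration of the fusion step described above (projecting 3D lidar onto the 2D image plane of the RGB camera), the sketch below stacks the projected depth as a fourth image channel; the calibration format and function name are assumptions for illustration, not the thesis's pipeline.

```python
import numpy as np

def fuse_rgb_lidar(rgb_image, lidar_points, calibration):
    """Illustrative early fusion: project 3D lidar points onto the image
    plane and stack the resulting sparse depth map as a fourth channel.
    `rgb_image` is (h, w, 3), `lidar_points` is (N, 3), and `calibration`
    is assumed to be a 3x4 camera projection matrix."""
    h, w, _ = rgb_image.shape
    depth = np.zeros((h, w), dtype=np.float32)
    homogeneous = np.hstack([lidar_points, np.ones((len(lidar_points), 1))])
    projected = homogeneous @ calibration.T          # (N, 3) image-plane coords
    z = projected[:, 2]
    valid = z > 0                                     # keep points in front of camera
    u = (projected[valid, 0] / z[valid]).astype(int)
    v = (projected[valid, 1] / z[valid]).astype(int)
    in_image = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth[v[in_image], u[in_image]] = z[valid][in_image]
    return np.dstack([rgb_image, depth])              # (h, w, 4) fused input
```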
|
86 |
Offline Reinforcement Learning for Remote Electrical Tilt Optimization: An application of Conservative Q-Learning / Offline förstärkningsinlärning för fjärran antennlutningsoptimering: En tillämpning av konservativ Q-inlärning
Kastengren, Marcus, January 2021
In telecom networks, adjusting the tilt of antennas in an optimal manner, the so-called remote electrical tilt (RET) optimization, is a method to ensure quality of service (QoS) for network users. Tilt adjustments made during operations in real-world networks are usually executed through a suboptimal policy, and a significant amount of data is collected during the execution of such a policy. The policy collecting the data is known as the behavior policy and can be used to learn improved tilt update policies in an offline manner. In this thesis the RET optimization problem is formulated in an offline reinforcement learning (RL) setting, where the objective is to learn an optimal policy from batches of data collected by the behavior policy. Offline RL is a challenging problem where traditional RL algorithms can fail to learn policies that perform well when evaluated online. In this thesis, Conservative Q-Learning (CQL) is applied to tackle the challenges of offline RL, with the purpose of learning improved policies for tilt adjustment from data in a simulated environment. Experiments are made with different types of function approximators to model the Q-function; specifically, an artificial neural network (ANN) and a linear model are employed. With linear function approximation, two novel algorithms are proposed that combine the properties of CQL and the classic Least Squares Policy Iteration (LSPI) algorithm, and they are also used for learning RET adjustment policies. In online evaluation in the simulator, one of the proposed algorithms with simple linear function approximation achieves results similar to CQL with the more complex artificial neural network function approximator. These versions of CQL outperform both the behavior policy and the naive Deep Q-Networks (DQN) method. / [sv, translated] In telecom networks, adjusting the tilt of antennas, called remote electrical tilt (RET) optimization, is a method to ensure quality of service for the users of the network. Adjustments during operation are made with suboptimal policies, but in a safe manner, and data is collected during operation. This data can potentially be used to derive better policies for adjusting the antenna tilt. The antenna tilt problem can be formulated as an offline reinforcement learning problem, where the goal is to derive optimal policies from a dataset. Offline reinforcement learning is a challenging problem where naive implementations of traditional reinforcement learning algorithms can fail. In this master's thesis, the Conservative Q-Learning (CQL) method is used to tackle the challenges of offline reinforcement learning and to find improved policies for antenna tilt adjustments in a simulated environment. The properties of the problem setting mean that Q-learning methods such as CQL need function approximators to model the Q-function. Experiments are made with both expressive artificial neural networks and linear combinations of simple basis functions as function approximators. In the case of linear function approximators, two new algorithms are proposed that combine the properties of CQL with the classic reinforcement learning algorithm Least Squares Policy Iteration (LSPI), and these are also used to create policies for antenna tilt adjustment. The results show that CQL with artificial neural networks and one of the proposed algorithms can learn policies that outperform both the policy that collected the training data and the classic deep Q-network method applied offline.
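The conservative penalty at the heart of CQL, referred to above, can be sketched for discrete actions as a regularizer added to the usual Bellman error; the formulation below follows the commonly cited discrete-action form of CQL, and its weighting is an illustrative assumption, not the thesis's exact objective.

```python
import torch

def cql_loss(q_net, states, actions, td_targets, alpha=1.0):
    """Illustrative discrete-action CQL objective: the standard TD error
    plus a penalty that pushes down Q-values over all actions (via a
    logsumexp term) while pushing up the Q-values of the actions actually
    taken in the offline dataset."""
    q_all = q_net(states)                                    # (batch, n_actions)
    q_taken = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)
    bellman_error = ((q_taken - td_targets) ** 2).mean()
    conservative_penalty = (torch.logsumexp(q_all, dim=1) - q_taken).mean()
    return bellman_error + alpha * conservative_penalty
```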
|
87 |
Model-Free Reinforcement Learning for Hierarchical OO-MDPs
Goldblatt, John Dallan, 23 May 2022
No description available.
|
88 |
Energy Efficient Communication Scheduling for IoT-based Waterbirds Monitoring: Decentralized Strategies
Sobirov, Otabek, January 2022
Monitoring waterbirds has several benefits, including analyzing the number of endangered species and giving a reliable indication of public health. Monitoring waterbirds in their habitat is a challenging task, since the locations are remote and the collection of monitoring data requires large bandwidth. A promising technology to tackle these challenges is Wireless Multimedia Sensor Networks (WMSNs). These networks are composed of small, energy-constrained IoT devices that communicate together to collect data or monitor a given location. Performance in such networks is impacted not only by upper-layer protocols (transmission, routing, application layer) but also by the Medium Access Control (MAC) layer; therefore, improvement in this layer can increase performance considerably. Traditional contention-based MAC modes like CSMA have large energy expenditure even though they have a good network performance profile, so energy-constrained devices cannot have a long lifespan with this type of MAC layer technology. The IEEE 802.15.4e amendment therefore proposed the TSCH MAC mode, which takes advantage of time-slotted access and channel-hopping techniques. The IETF integrated the TSCH protocol into IPv6-based wireless sensor networks and standardized it as 6TiSCH, a protocol stack for Low-Power and Lossy Networks (LLNs). WMSN applications (e.g. a waterbirds monitoring application) generate heterogeneous traffic, which can be defined as a mixture of different traffic types (light: temperature, humidity, etc.; heavy: audio, pictures, video, etc.). TSCH-based WMSNs are considered a fit for this kind of traffic since they provide better performance and low power usage. Yet the 6TiSCH Working Group left the scheduling of TSCH communication open to industry, to make TSCH more easily adaptable to any kind of application. Until now, there have been a large number of scheduling algorithms from industry and academia, each with a different objective that maximizes the network performance of a specific application. This thesis studies recent state-of-the-art scheduling algorithms (protocols) and compares them in a common simulation environment with heterogeneous traffic, to find out which protocol performs well while maintaining low energy consumption. In particular, this work studies a new approach to TSCH scheduling: reinforcement-learning-based scheduling. We implemented one of the state-of-the-art RL-based schedulers in Contiki-NG and included it in our comparison of TSCH schedulers. The experiment results showed that the RL-based scheduler implemented in this work demonstrated better performance in PDR and latency compared to the other scheduling protocols, but presented high energy usage. Orchestra, on the other hand, performed well while keeping the energy expenditure of nodes at a low level.
|
89 |
Energy Sustainable Reinforcement Learning-based Adaptive Duty-Cycling in Wireless Sensor Networks-based Internet of Things Networks
Charef, Nadia, January 2023
The Internet of Things (IoT) is widely adopted across various fields due to its flexibility and low cost. Energy-harvesting Wireless Sensor Networks (WSNs) are becoming a building block of many IoT applications and provide a perpetual source of energy to power energy-constrained IoT devices. However, the dynamic and stochastic nature of the available harvested energy drives the need for adaptive energy management solutions. Duty cycling is among the most prominent adaptive approaches that help consolidate the effort of energy management solutions at the routing and application layers to ensure energy sustainability and, hence, continuous network operation. The IEEE 802.15.4 standard defines the physical layer and the Medium Access Control (MAC) sub-layer of low-data-rate wireless devices with limited energy consumption requirements. The MAC sub-layer's functionalities include the scheduling of the duty cycle of individual devices; however, how the duty cycle is scheduled is left open to industry. Various computational mechanisms are used to compute the duty cycle of IoT nodes to ensure optimal performance in energy sustainability and Quality of Service (QoS), and Reinforcement Learning (RL) is the most employed mechanism in this context. The literature describes various RL-based solutions that adjust the duty cycle of IoT devices to adapt to changes in the IoT environment. However, these solutions are usually tailored to specific scenarios or focus mainly on one aspect of the problem, namely QoS performance or energy limitation. This work proposes a generic adaptive duty-cycling solution and evaluates its performance under different energy generation and traffic conditions. Moreover, it emphasizes energy sustainability while taking QoS performance into account. While different approaches exist to achieve energy sustainability, Energy Neutral Operation (ENO)-based solutions are the most prominent approach to ensure energy-sustainable performance; nevertheless, they do not necessarily guarantee optimal QoS performance. This work adopts a Markov Decision Process (MDP) model from the literature that aims to minimize the distance from energy neutrality given the energy harvesting and ENO conditions, and introduces QoS penalties into the reward formulation to improve QoS performance. We start by examining the QoS performance against the benchmarking solution. We then analyze the performance using different energy harvesting and consumption profiles to further assess QoS performance and determine whether energy sustainability is still maintained under different conditions. The results show more efficient utilization of harvested energy when it is available in abundance. However, one limitation of our solution occurs when energy demand is high or harvested energy is scarce: in such cases, we observe degradation in QoS because IoT nodes adopt a low duty cycle to avoid energy depletion. We further study the effect this limitation has on the solution's scalability, and we attempt to address the problem by assessing performance with a routing solution that balances load distribution and, hence, energy demand across the network.
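The reward idea described above (distance from energy neutrality combined with QoS penalties) can be sketched as follows; the penalty terms and weights are illustrative assumptions, not the exact formulation used in this work.

```python
def duty_cycle_reward(harvested_energy, consumed_energy,
                      dropped_packets, delay,
                      qos_drop_weight=1.0, qos_delay_weight=0.1):
    """Illustrative reward combining the ENO objective (minimize the
    distance from energy neutrality, i.e. |harvested - consumed|) with
    QoS penalties for dropped packets and delay. The weights and the
    choice of QoS terms are assumptions for illustration."""
    energy_neutrality_gap = abs(harvested_energy - consumed_energy)
    qos_penalty = qos_drop_weight * dropped_packets + qos_delay_weight * delay
    return -(energy_neutrality_gap + qos_penalty)
```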
|
90 |
Reinforcement Learning-Based Test Case Generation with Test Suite Prioritization for Android Application Testing
Khan, Md Khorrom, 07 1900
This dissertation introduces a hybrid strategy for automated testing of Android applications that combines reinforcement learning and test suite prioritization. These approaches aim to improve the effectiveness of the testing process by employing reinforcement learning algorithms, namely Q-learning and SARSA (State-Action-Reward-State-Action), for automated test case generation. The studies provide compelling evidence that reinforcement learning techniques hold great potential for generating test cases that consistently achieve high code coverage; however, the generated test cases may not always be in the optimal order. In this study, novel test case prioritization methods are developed that leverage pairwise event interaction coverage, application state coverage, and application activity coverage, so as to optimize the rate of code coverage specifically for SARSA-generated test cases. Additionally, test suite prioritization techniques based on UI element coverage, test case cost, and test case complexity are introduced to further enhance the ordering of SARSA-generated test cases. Empirical investigations demonstrate that applying the proposed test suite prioritization techniques to the test suites generated by the SARSA reinforcement learning algorithm improved the rates of code coverage over the original and random orderings of the test cases.
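For reference, the SARSA backup named above differs from Q-learning in that it bootstraps from the action actually taken next; a minimal tabular sketch is shown below, with the state/event encoding and constants as illustrative assumptions rather than the dissertation's implementation.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)  # Q[(app_state, ui_event)] -> estimated value

def sarsa_update(state, action, reward, next_state, next_action):
    """On-policy SARSA backup: unlike Q-learning, the bootstrap term uses
    the action actually chosen next (e.g. the next UI event fired on the
    app under test), not the greedy maximum over all actions."""
    td_target = reward + GAMMA * Q[(next_state, next_action)]
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
```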
|