81 |
Deep Reinforcement Learning for the Optimization of Combining Raster Images in Forest Planning
Wen, Yangyang January 2021 (has links)
Raster images represent treatment options for how the forest will be cut. Economic benefits from cutting the forest are generated once a treatment is selected and executed. Existing raster images contain many small, fragmented clusters, which is the principal cause of overhead. If the relationships among the raster images can be fully explored and the old data sets combined by an optimization algorithm into a new raster image, the result can surpass the existing raster images and create higher economic benefits. The question of this project is whether we can create a dynamic model in which the pixel being updated acts as an agent that selects a treatment option for an empty raster image in response to neighborhood environmental and landscape parameters. The project explores whether it is realistic to use deep reinforcement learning to generate new and superior raster images and, more broadly, the feasibility, usefulness, and effectiveness of deep reinforcement learning algorithms in optimizing existing treatment options. The problem was modeled as a Markov decision process in which the pixel to be updated is an agent of the empty raster image and determines the treatment option for the current empty pixel. A Deep Q-learning neural network was used to compute the Q-values, and a temporal-difference reinforcement learning algorithm was applied to predict future rewards and update the model parameters. After modeling was completed, a usefulness experiment tested the usefulness of the model, a parameter-correlation experiment tested the correlation between the parameters and the benefit of the model, and finally the trained model was used to generate a larger raster image to test its effectiveness.
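The abstract does not spell out the state encoding or the number of treatment options, so the sketch below only illustrates the kind of temporal-difference update a Deep Q-learning model performs when assigning a treatment to one pixel; the feature size, number of treatments, and reward value are hypothetical.

```python
# A minimal sketch of the TD(0) update behind a Deep Q-Network. The state
# encoding (neighbourhood pixel values and landscape parameters) and the
# number of treatment options are assumptions for illustration only.
import torch
import torch.nn as nn

N_FEATURES = 16      # assumed size of the neighbourhood/landscape feature vector
N_TREATMENTS = 5     # assumed number of treatment options per pixel
GAMMA = 0.95

q_net = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(), nn.Linear(64, N_TREATMENTS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def td_update(state, action, reward, next_state, done):
    """One TD(0) step: pull Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    q_sa = q_net(state)[action]
    with torch.no_grad():
        target = reward + (0.0 if done else GAMMA * q_net(next_state).max().item())
    loss = (q_sa - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: update after assigning a treatment to one pixel.
s = torch.randn(N_FEATURES)        # features of the current pixel's neighbourhood
a, r = 2, 1.7                      # chosen treatment and its (hypothetical) economic reward
s_next = torch.randn(N_FEATURES)   # features seen at the next empty pixel
td_update(s, a, r, s_next, done=False)
```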
|
82 |
Simulated Fixed-Wing Aircraft Attitude Control using Reinforcement Learning Methods
David Jona Richter (11820452) 20 December 2021 (has links)
Autonomous transportation is a research field that has gained huge interest in recent years, with autonomous electric or hydrogen cars coming ever closer to everyday use. Cars are not the only subject of autonomy research, though; the field of aviation is also being explored for fully autonomous flight. One very important aspect of making autonomous flight a reality is attitude control: the control of roll, pitch, and sometimes yaw. Traditional approaches to automated attitude control use PID (proportional-integral-derivative) controllers, which rely on hand-tuned parameters to fulfill the task. In this work, however, the use of Reinforcement Learning algorithms for attitude control will be explored. With the surge of ever more powerful artificial neural networks, which have proven to be universally usable function approximators, Deep Reinforcement Learning also becomes an intriguing option.

A software toolkit will be developed and used to allow multiple flight simulators to train agents with Reinforcement Learning as well as Deep Reinforcement Learning. Experiments will be run using different hyperparameters, algorithms, state representations, and reward functions to explore possible options for autonomous attitude control using Reinforcement Learning.
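For context on the baseline being replaced, a minimal PID controller for a single attitude axis might look like the sketch below; the gains, time step, and setpoint are illustrative assumptions, not values from the thesis or its simulator toolkit.

```python
# A minimal sketch of a PID attitude controller, here for the pitch axis.
# Gains and the simulator interface are hypothetical, hand-tuned values.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # Control output, e.g. an elevator deflection command.
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pitch_pid = PID(kp=2.0, ki=0.1, kd=0.5, dt=0.02)          # hand-tuned per aircraft
command = pitch_pid.step(setpoint=5.0, measurement=3.2)    # pitch angles in degrees
```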
|
83 |
Policy-based Reinforcement learning control for window opening and closing in an office building
Kaisaravalli Bhojraj, Gokul, Markonda, Yeswanth Surya Achyut January 2020 (has links)
The level of indoor comfort in an office building can be strongly influenced by the occupant's window opening and closing behavior. If not properly managed, this behavior affects not only the comfort level but also the energy consumption. Occupant behavior is not easy to predict and control with conventional methods. Nowadays, for a system to be called smart it must learn user behavior, as this gives valuable information to the controlling system. To control a window efficiently, we propose Reinforcement Learning (RL) in this thesis, which should be able to learn user behavior and maintain an optimal indoor climate. The model-free nature of RL gives the flexibility to develop an intelligent control system in a simpler way than conventional techniques. The data in this thesis is taken from an office building in Beijing. Value-based reinforcement learning has previously been implemented for controlling the window; in this thesis we apply policy-based RL (the REINFORCE algorithm) and compare our results with the value-based approach (Q-learning), thereby getting a better idea of which suits the task at hand and exploring how they behave. Based on our work, we find that policy-based RL provides a good trade-off between maintaining an optimal indoor temperature and learning the occupant's behavior, which is important for a system to be called smart.
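As a rough illustration of the policy-based approach, a minimal REINFORCE update for a binary window decision could look like the following sketch; the state features, network size, and reward signal are assumptions, not the thesis's actual implementation.

```python
# A minimal sketch of the REINFORCE (Monte Carlo policy gradient) update for a
# binary open/close decision. State features (e.g. indoor/outdoor temperature,
# CO2, time of day) and rewards are illustrative assumptions.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))  # logits: [keep, toggle]
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
GAMMA = 0.99

def reinforce_update(states, actions, rewards):
    """Policy-gradient step over one episode (e.g. one day of operation)."""
    returns, g = [], 0.0
    for r in reversed(rewards):                 # discounted returns, computed backwards
        g = r + GAMMA * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    log_probs = torch.log_softmax(policy(torch.stack(states)), dim=-1)
    chosen = log_probs[torch.arange(len(actions)), torch.tensor(actions)]
    loss = -(chosen * returns).mean()           # ascend the expected return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# One illustrative "episode" of three decisions.
states = [torch.randn(4) for _ in range(3)]
actions = [0, 1, 0]                             # keep, toggle, keep
rewards = [0.5, -0.2, 1.0]                      # e.g. comfort minus energy penalty
reinforce_update(states, actions, rewards)
```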
|
84 |
A Comparative Study of Reinforcement-based and Semi-classical Learning in Sensor Fusion
Bodén, Johan January 2021 (has links)
Reinforcement learning has proven itself very useful in certain areas, such as games. However, the approach has been seen as quite limited. Reinforcement-based learning has, for instance, not been commonly used for classification tasks, as it only receives feedback on how well it did for an action performed on a specific input. This slows the convergence rate compared to other classification approaches, which have both the input and the corresponding output to train on. Nevertheless, this thesis aims to investigate whether reinforcement-based learning can successfully be employed on a classification task. Moreover, as sensor fusion is an expanding field which can, for instance, assist autonomous vehicles in understanding their surroundings, it is also interesting to see how sensor fusion, i.e., fusion between lidar and RGB images, can increase the performance in a classification task. In this thesis, a reinforcement-based learning approach is compared to a semi-classical approach. A deep Q-learning network was chosen as an example of a reinforcement learning model, and a support vector machine classifier built on top of a deep neural network was chosen as an example of a semi-classical model. These frameworks are compared with and without sensor fusion to see whether fusion improves their performance. Experiments show that the evaluated reinforcement-based learning approach underperforms in terms of metrics compared to the semi-classical approach, mainly due to its slow learning process. On the other hand, using reinforcement-based learning for a classification task can still be advantageous in some cases, as it performs fairly well in terms of the metrics presented in this work, e.g. the F1-score, and for instance on imbalanced datasets. As for the impact of sensor fusion, a notable improvement can be seen: for example, when training the deep Q-learning model for 50 episodes, the F1-score increased by 0.1329, which is especially noteworthy considering that most of the lidar data used in the fusion is lost, since this work projects the 3D lidar data onto the same 2D plane as the RGB images.
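The projection-based fusion described above can be sketched roughly as follows: lidar points are projected into the image plane and kept as a depth channel alongside the RGB channels. The camera intrinsics, image size, and stand-in data are assumptions for illustration only; the collapse of many 3D points onto few pixels shows where lidar information is lost.

```python
# A minimal sketch of early lidar/RGB fusion: project 3D points into the
# image plane and stack the resulting depth map as a fourth channel.
import numpy as np

def project_lidar_to_depth(points_xyz, K, height, width):
    """Project Nx3 lidar points (camera frame, z forward) into a depth image."""
    depth = np.zeros((height, width), dtype=np.float32)
    pts = points_xyz[points_xyz[:, 2] > 0]         # keep points in front of the camera
    uvw = (K @ pts.T).T                             # homogeneous pixel coordinates
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth[v[ok], u[ok]] = pts[ok, 2]                # many 3D points collapse onto few pixels
    return depth

H, W = 240, 320
K = np.array([[300.0, 0, W / 2], [0, 300.0, H / 2], [0, 0, 1.0]])   # assumed intrinsics
rgb = np.random.rand(H, W, 3).astype(np.float32)                     # stand-in camera image
lidar = np.random.randn(5000, 3) + np.array([0.0, 0.0, 10.0])        # stand-in point cloud
fused = np.dstack([rgb, project_lidar_to_depth(lidar, K, H, W)])     # H x W x 4 network input
```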
|
85 |
Offline Reinforcement Learning for Remote Electrical Tilt Optimization : An application of Conservative Q-Learning / Offline förstärkningsinlärning för fjärran antennlutningsoptimering : En tillämpning av konservativ Q-inlärning
Kastengren, Marcus January 2021 (has links)
In telecom networks, adjusting the tilt of antennas in an optimal manner, so-called remote electrical tilt (RET) optimization, is a method to ensure quality of service (QoS) for network users. Tilt adjustments made during operation in real-world networks are usually executed through a suboptimal policy, and a significant amount of data is collected during the execution of such a policy. The policy collecting the data is known as the behavior policy, and its data can be used to learn improved tilt update policies in an offline manner. In this thesis the RET optimization problem is formulated in an offline Reinforcement Learning (RL) setting, where the objective is to learn an optimal policy from batches of data collected by the behavior policy. Offline RL is a challenging problem where traditional RL algorithms can fail to learn policies that perform well when evaluated online. In this thesis Conservative Q-learning (CQL) is applied to tackle the challenges of offline RL, with the purpose of learning improved policies for tilt adjustment from data in a simulated environment. Experiments are made with different types of function approximators to model the Q-function; specifically, an Artificial Neural Network (ANN) and a linear model are employed. With linear function approximation, two novel algorithms are proposed that combine the properties of CQL and the classic Least Squares Policy Iteration (LSPI) algorithm; these are also used for learning RET adjustment policies. In online evaluation in the simulator, one of the proposed algorithms with simple linear function approximation achieves results similar to CQL with the more complex artificial neural network function approximator. These versions of CQL outperform both the behavior policy and the naive Deep Q-Networks (DQN) method. / In telecom networks, adjusting the tilt of antennas, known as Remote Electrical Tilt (RET) optimization, is a method to ensure quality of service for network users. Adjustments during operation are made with suboptimal policies but in a safe manner, and data is collected during operation. This data can potentially be used to obtain better policies for adjusting the antenna tilt. The tilt problem can be formulated as an offline reinforcement learning problem, where the goal is to derive optimal policies from a dataset. Offline reinforcement learning is a challenging problem where naive implementations of traditional reinforcement learning algorithms can fail. In this master's thesis, Conservative Q-learning (CQL) is used to tackle the challenges of offline reinforcement learning and to find improved policies for antenna tilt adjustments in a simulated environment. The structure of the problem means that Q-learning methods such as CQL require function approximators to model the Q-function. Experiments are carried out with both expressive artificial neural networks and linear combinations of simple basis functions as function approximators. In the case of linear function approximators, two new algorithms are proposed that combine the properties of CQL with the classical reinforcement learning algorithm Least Squares Policy Iteration (LSPI); these are then also used to create policies for antenna tilt adjustment. The results show that CQL with artificial neural networks and one of the proposed algorithms can learn policies with better performance than both the policy that collected the training data and the classical Deep Q-Network method applied offline.
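A minimal sketch of the conservative regularizer that CQL adds on top of a standard temporal-difference loss is shown below, here with a small neural-network Q-function; the network sizes, batch format, and the weight alpha are assumptions, and the thesis's linear-approximation variants and LSPI combinations are not reproduced here.

```python
# A minimal sketch of the CQL objective: push down Q-values under the learned
# policy (log-sum-exp over all actions) while pushing up Q-values of actions
# actually present in the offline (behavior-policy) data, added to a TD loss.
import torch
import torch.nn as nn

N_STATE, N_ACTIONS, GAMMA, ALPHA = 8, 5, 0.99, 1.0          # assumed dimensions and weight
q_net = nn.Sequential(nn.Linear(N_STATE, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=3e-4)

def cql_loss(s, a, r, s_next, done):
    q = q_net(s)                                              # (batch, actions)
    q_sa = q.gather(1, a.unsqueeze(1)).squeeze(1)             # Q of logged actions
    with torch.no_grad():
        target = r + GAMMA * (1 - done) * q_net(s_next).max(dim=1).values
    td_loss = ((q_sa - target) ** 2).mean()
    conservative = (torch.logsumexp(q, dim=1) - q_sa).mean()  # CQL regularizer
    return td_loss + ALPHA * conservative

# One gradient step on a batch drawn from the offline dataset (stand-in data).
batch = (torch.randn(32, N_STATE), torch.randint(0, N_ACTIONS, (32,)),
         torch.randn(32), torch.randn(32, N_STATE), torch.zeros(32))
loss = cql_loss(*batch)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```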
|
86 |
Model-Free Reinforcement Learning for Hierarchical OO-MDPs
Goldblatt, John Dallan 23 May 2022 (has links)
No description available.
|
87 |
Energy Efficient Communication Scheduling for IoT-based Waterbirds Monitoring: Decentralized Strategies
Sobirov, Otabek January 2022 (has links)
Monitoring waterbirds has several benefits, including analyzing the number of endangered species and giving a reliable indication of public health. Monitoring waterbirds in their habitat is a challenging task, since the locations are distant and the collection of monitoring data requires large bandwidth. A promising technology to tackle these challenges is Wireless Multimedia Sensor Networks (WMSN). These networks are composed of small, energy-constrained IoT devices that communicate together to collect data or monitor a given location. Performance in such networks is affected not only by the upper-layer protocols (transmission, routing, and application layers) but also by the Medium Access Control (MAC) layer, so improvements in this layer can increase performance considerably. Traditional contention-based MAC modes like CSMA have a large energy expenditure even though they have a good network performance profile, and energy-constrained devices cannot have a long lifespan with this type of MAC layer technology. The IEEE 802.15.4e amendment therefore proposed the TSCH MAC mode, which takes advantage of time-slotted access and channel hopping. The IETF integrated the TSCH protocol into IPv6-based wireless sensor networks and standardized it as 6TiSCH, a protocol stack for Low-Power and Lossy Networks (LLN). WMSN applications (e.g. a waterbirds monitoring application) generate heterogeneous traffic, i.e., a mixture of different traffic types (light: temperature, humidity, etc.; heavy: audio, picture, video, etc.). TSCH-based WMSNs are considered a fit for this kind of traffic since they provide better performance and low power usage. Yet, the 6TiSCH Working Group left the scheduling of TSCH communication open to industry, to make TSCH more easily adaptable to any kind of application. Until now, a large number of scheduling algorithms have been proposed by industry and academia, each with a different objective that maximizes the network performance of a specific application. This thesis studies the most recent state-of-the-art scheduling algorithms (protocols) and compares them in a common simulation environment with heterogeneous traffic to find out which protocol performs well while maintaining low energy consumption. In particular, this work studies a new approach in TSCH scheduling: Reinforcement Learning based scheduling. We implemented one of the state-of-the-art RL-based schedulers in Contiki-NG and included it in our comparison of TSCH schedulers. The experiment results showed that the RL-based scheduler implemented in this work demonstrated better performance in PDR and latency than the other scheduling protocols, but at the cost of high energy usage. Orchestra, on the other hand, performed well while keeping the energy expenditure of nodes at a low level.
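The specific RL-based scheduler evaluated here is not described in the abstract, so the sketch below only illustrates, in generic terms, how a node-level TSCH scheduling decision can be cast as tabular Q-learning over queue occupancy; the state buckets, action set, and reward weighting are purely hypothetical and are not the scheduler implemented in this thesis.

```python
# A generic sketch of node-level RL scheduling for TSCH: a tabular Q-learning
# agent observes its queue occupancy and chooses how many transmit slots to
# keep active in the next slotframe. All constants below are illustrative.
import random
from collections import defaultdict

ACTIONS = [1, 2, 4, 8]                 # candidate number of active TX slots per slotframe
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = defaultdict(lambda: [0.0] * len(ACTIONS))

def queue_state(queue_len):
    return min(queue_len // 4, 3)      # bucket queue occupancy into 4 coarse states

def choose(state):
    if random.random() < EPS:          # epsilon-greedy exploration
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda i: Q[state][i])

def update(state, action, reward, next_state):
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])

# One slotframe: reward trades delivered packets against energy spent on slots.
s = queue_state(queue_len=9)
a = choose(s)
delivered, energy_cost = 7, 0.02 * ACTIONS[a]      # stand-in measurements
update(s, a, reward=delivered - 10 * energy_cost, next_state=queue_state(queue_len=3))
```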
|
88 |
Energy Sustainable Reinforcement Learning-based Adaptive Duty-Cycling in Wireless Sensor Networks-based Internet of Things Networks
Charef, Nadia January 2023 (has links)
The Internet of Things (IoT) is widely adopted across various fields due to its flexibility and low cost. Energy-harvesting Wireless Sensor Networks (WSNs) are becoming a building block of many IoT applications and provide a perpetual source of energy to power energy-constrained IoT devices. However, the dynamic and stochastic nature of the available harvested energy drives the need for adaptive energy management solutions. Duty cycling is among the most prominent adaptive approaches that help consolidate the effort of energy management solutions at the routing and application layers to ensure energy sustainability and, hence, continuous network operation. The IEEE 802.15.4 standard defines the physical layer and the Medium Access Control (MAC) sub-layer of low-data-rate wireless devices with limited energy consumption requirements. The MAC sub-layer's functionalities include the scheduling of the duty cycle of individual devices; however, how the duty cycle is scheduled is left open to the industry. Various computational mechanisms are used to compute the duty cycle of IoT nodes to ensure optimal performance in energy sustainability and Quality of Service (QoS), and Reinforcement Learning (RL) is the most commonly employed mechanism in this context. The literature describes various RL-based solutions that adjust the duty cycle of IoT devices to adapt to changes in the IoT environment. However, these solutions are usually tailored to specific scenarios or focus mainly on one aspect of the problem, namely QoS performance or energy limitation. This work proposes a generic adaptive duty-cycling solution and evaluates its performance under different energy generation and traffic conditions. Moreover, it emphasizes the energy sustainability aspect while taking the QoS performance into account. While different approaches exist to achieve energy sustainability, Energy Neutral Operation (ENO)-based solutions provide the most prominent approach to ensure energy-sustainable performance; nevertheless, these approaches do not necessarily guarantee optimal QoS performance. This work adopts a Markov Decision Process (MDP) model from the literature that aims to minimize the distance from energy neutrality given the energy harvesting and ENO conditions, and introduces QoS penalties to the reward formulation to improve QoS performance. We start by examining the QoS performance against the benchmarking solution. Then, we analyze the performance using different energy harvesting and consumption profiles to further assess QoS performance and determine whether energy sustainability is still maintained under different conditions. The results show more efficient utilization of harvested energy when it is available in abundance. However, one limitation of our solution occurs when energy demand is high or harvested energy is scarce: in such cases, we observe degradation in QoS because IoT nodes adopt a low duty cycle to avoid energy depletion. We further study the effect this limitation has on the solution's scalability, and we attempt to address the problem by assessing the performance with a routing solution that balances load distribution and, hence, energy demand across the network.
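As a rough illustration of how energy neutrality and QoS can be combined in a reward, consider the sketch below; the penalty weights, latency budget, and units are assumptions, not the exact MDP reward formulation adopted in this work.

```python
# A minimal sketch of an ENO-style reward with QoS penalties: the agent is
# penalized for drifting away from energy neutrality and additionally for QoS
# violations such as dropped or delayed packets. All weights are hypothetical.
def reward(harvested_j, consumed_j, dropped_packets, latency_ms,
           w_energy=1.0, w_drop=0.5, w_delay=0.01, latency_budget_ms=500):
    energy_drift = abs(harvested_j - consumed_j)            # distance from energy neutrality
    qos_penalty = (w_drop * dropped_packets
                   + w_delay * max(0.0, latency_ms - latency_budget_ms))
    return -(w_energy * energy_drift + qos_penalty)

# Example: evaluating one duty-cycle decision for a single time slot.
print(reward(harvested_j=0.8, consumed_j=1.1, dropped_packets=2, latency_ms=620))
```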
|
89 |
Reinforcement Learning-Based Test Case Generation with Test Suite Prioritization for Android Application Testing
Khan, Md Khorrom 07 1900 (has links)
This dissertation introduces a hybrid strategy for automated testing of Android applications that combines reinforcement learning and test suite prioritization. These approaches aim to improve the effectiveness of the testing process by employing reinforcement learning algorithms, namely Q-learning and SARSA (State-Action-Reward-State-Action), for automated test case generation. The studies provide compelling evidence that reinforcement learning techniques hold great potential for generating test cases that consistently achieve high code coverage; however, the generated test cases may not always be in the optimal order. In this study, novel test case prioritization methods are developed, leveraging pairwise event interaction coverage, application state coverage, and application activity coverage, so as to optimize the rate of code coverage specifically for SARSA-generated test cases. Additionally, test suite prioritization techniques are introduced based on UI element coverage, test case cost, and test case complexity to further enhance the ordering of SARSA-generated test cases. Empirical investigations demonstrate that applying the proposed test suite prioritization techniques to the test suites generated by the reinforcement learning algorithm SARSA improves the rate of code coverage over the original and random orderings of test cases.
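A coverage-based prioritization of the kind described above can be sketched with a simple additional-greedy ordering, where each pick maximizes the number of not-yet-covered entities (event pairs, states, or activities). The toy test cases and coverage sets below are illustrative assumptions, not data from the dissertation.

```python
# A minimal sketch of coverage-based test suite prioritization using the
# additional-greedy strategy: repeatedly pick the test adding the most new
# coverage until every test has been ordered.
def prioritize(test_cases):
    """test_cases: dict of test name -> set of covered entities. Returns an ordering."""
    remaining = dict(test_cases)
    covered, order = set(), []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order

# Hypothetical SARSA-generated tests with their covered event pairs/activities.
suite = {
    "t1": {"A->B", "B->C", "activity:Main"},
    "t2": {"A->B", "activity:Settings"},
    "t3": {"C->D", "B->C", "activity:Main", "activity:Detail"},
}
print(prioritize(suite))   # ['t3', 't2', 't1'] for this toy suite
```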
|
90 |
Reinforcement Learning in Problems with Continuous Action Spaces : a Comparative Study
Larsson, Axel January 2021 (has links)
Reinforcement learning (RL) is one of the three main areas of machine learning (ML), with a solid theoretical background and steady progress. RL can provide solutions to many real-world applications, such as self-driving cars and protein folding. A class of RL problems with an infinite number of actions from each state has recently received significant attention, namely infinite action space RL problems. There are several standard algorithms for RL problems, and choosing a proper RL algorithm for the nature of the problem at hand can be a challenging task. To compare RL algorithms, we carefully implement them on different tasks and store the relevant results. To ensure a fair comparison, we tune the algorithms and iteratively test and update them beforehand. This study compares four different RL algorithms. Our results show that the RL algorithms that store the steps of their path, or have a model of the environment, have the highest rate of convergence. By updating the value of every step of the path after a reward, instead of only looking backward a single step, the algorithms find a solution faster and more often. Having a model to help the algorithm plan ahead also contributes to faster and more stable learning. RL algorithms that use a deep neural network for evaluation are the least stable. Our results can provide a good basis for selecting appropriate algorithms for infinite action space RL problems, and they can be built upon, simplifying the development of improvements to the RL algorithms that exist today. / Reinforcement learning is one of the three main areas of machine learning, with a strong theoretical background and substantial development. In general, reinforcement learning can provide solutions for many applications used in practice, such as self-driving cars and protein folding. A class of reinforcement learning problems with an infinite number of actions from each state has recently received significant attention, namely reinforcement learning problems with infinite action spaces. There are several standard algorithms for reinforcement learning problems, and a challenging task is therefore to choose a suitable reinforcement learning algorithm depending on the nature of the problem. To compare the algorithms, we carefully implement them on different tasks and store the relevant results. To obtain a fair comparison, we tune and test the algorithms iteratively and update them beforehand. This study compares four different reinforcement learning algorithms. Our results show that the algorithms that store every step along the path, or have a model of the environment, have the highest rate of convergence. By updating the value of every step along the path after a reward, instead of only looking one step backwards, the algorithms find a solution faster and more often. Having a model to help the algorithm plan its actions also contributes to faster and more stable learning. Reinforcement learning algorithms that use a deep neural network for evaluation are the least stable. Our results can provide a good basis for choosing appropriate algorithms for reinforcement learning problems with infinite action spaces. This can be built upon, simplifying the development of improvements by researchers to the reinforcement learning algorithms that exist today.
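The "update every step of the path" behavior credited with faster convergence can be illustrated with a small tabular sketch: after a reward is received, the discounted return is propagated back through all stored (state, action) pairs instead of only the most recent one. The constants and state/action encodings are chosen purely for illustration; the thesis itself works with continuous action spaces and function approximation, where the same idea applies to the approximator's targets.

```python
# A minimal sketch of backing up a return along the full stored path rather
# than a single step. All values below are illustrative stand-ins.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95
Q = defaultdict(float)

def backup_path(path, final_reward):
    """path: list of (state, action) pairs visited before the reward arrived."""
    g = final_reward
    for state, action in reversed(path):
        Q[(state, action)] += ALPHA * (g - Q[(state, action)])
        g *= GAMMA                       # earlier steps see a discounted return

backup_path([("s0", 0.2), ("s1", -0.1), ("s2", 0.4)], final_reward=1.0)
print(dict(Q))
```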
|