51 |
Deep Reinforcement Learning for Building Control : A comparative study for applying Deep Reinforcement Learning to Building Energy Management / Djup förstärkningsinlärning för byggnadskontroll : En jämförande studie för att tillämpa djup förstärkningsinlärning på byggnadsenergihushållning
Zheng, Wanfu January 2022
Energy and the environment have become prominent global concerns. The building sector accounts for a large share of energy consumption, over one-third of energy use globally. A variety of optimization methods have been proposed for building energy management; they fall mainly into two types: model-based and model-free. Model Predictive Control is a model-based method, but it is not widely adopted by the building industry because developing a model requires too much expertise and time. Model-free Deep Reinforcement Learning (DRL) has been applied successfully in game-playing and robotics control. We therefore explored the effectiveness of DRL algorithms applied to building control and investigated which DRL algorithm performs best. Three DRL algorithms were implemented: Deep Deterministic Policy Gradient (DDPG), Double Deep Q-Learning (DDQN) and Soft Actor-Critic (SAC). We used the Building Optimization Testing (BOPTEST) framework, a standardized virtual testbed, to test the DRL algorithms. Performance is evaluated by two Key Performance Indicators (KPIs): thermal discomfort and operational cost. The results show that the DDPG agent performs best and outperforms the baseline, reducing thermal discomfort by 91.5% and 18.3% and operational cost by 11.0% and 14.6% during the peak and typical heating periods, respectively. The DDQN and SAC agents do not show a clear performance advantage over the baseline. This research highlights the excellent control performance of the DDPG agent, suggesting that applying DRL to building control can achieve better performance than the conventional control method.
/ Swedish abstract (translated): Energy and the environment are becoming hot topics worldwide. The building sector accounts for a large share of energy consumption, over one-third of energy use globally. A variety of optimization methods have been proposed for Building Energy Management, mainly divided into two types: model-based and model-free. Model Predictive Control is a model-based method but is not widely adopted by the building industry because it requires too much expertise and time to develop a model. Model-free Deep Reinforcement Learning (DRL) has successful applications in games and robot control. We therefore examined the effectiveness of DRL algorithms applied to building control and investigated which DRL algorithm performs best. Three DRL algorithms were implemented, namely Deep Deterministic Policy Gradient (DDPG), Double Deep Q Learning (DDQN) and Soft Actor Critic (SAC). We used the Building Optimization Testing (BOPTEST) framework, a standardized virtual testbed, to test the DRL algorithms. Performance is evaluated by two Key Performance Indicators (KPIs): thermal discomfort and operational cost. The results show that the DDPG agent performs best and outperforms the baseline, reducing thermal discomfort by 91.5% and 18.3% and operational cost by 11.0% and 14.6% during the peak and typical heating periods, respectively. The DDQN and SAC agents do not show a clear performance advantage over the baseline. This research highlights the excellent performance of the DDPG agent, which suggests that applying DRL to building control can achieve better performance than the conventional method.
|
52 |
Reinforcement learning for EV charging optimization : A holistic perspective for commercial vehicle fleets
Cording, Enzo Alexander January 2023
Recent years have seen an unprecedented uptake of electric vehicles (EVs), driven by the global push to reduce carbon emissions. At the same time, intermittent renewables are increasingly being deployed. These developments are putting flexibility measures such as dynamic load management in the spotlight of the energy transition. Flexibility measures must consider EV charging, as it can introduce grid constraints: in Germany, the cumulative power of all EV onboard chargers amounts to ca. 120 GW, while the German peak load only amounts to 80 GW. Commercial operations have strong incentives to optimize charging and flatten peak loads in real time, given that the highest quarter-hour can determine the power-related energy bill, and that a blown fuse due to overloading can halt operations. Increasing research effort has therefore gone into real-time-capable optimization methods. Reinforcement Learning (RL) has gained particular attention due to its versatility, performance and real-time capabilities. This thesis implements such an approach and introduces FleetRL as a realistic RL environment for EV charging, with a focus on commercial vehicle fleets. Through its implementation, it was found that RL saved up to 83% compared to static benchmarks, and that grid overloading was entirely avoided in some scenarios by sacrificing small portions of state of charge (SOC), or by delaying the charging process. Linear optimization with one year of perfect knowledge outperformed RL, but reached its practical limits in one use case, where the solver could not find a feasible solution. Overall, this thesis makes a strong case for RL-based EV charging. It further provides a foundation that can be built upon: a modular, open-source software framework that integrates an MDP model, schedule generation, and non-linear battery degradation.
/ Swedish abstract (translated): The electrification of the transport sector is a necessary but challenging task. Combined with growing solar power production and renewable energy sources, it creates a dilemma for the electricity grid that requires extensive flexibility measures. These measures must include EV charging, a phenomenon that has led to unprecedented load peaks. From a commercial perspective, the incentive is to optimize the charging process and ensure uptime. Research has focused on real-time optimization methods such as Deep Reinforcement Learning (DRL). This thesis introduces FleetRL as a new RL environment for EV charging of commercial fleets. Applying the framework showed that RL saved up to 83% compared to static benchmarks, and that grid overloading could be avoided entirely in most scenarios. Linear optimization outperformed RL but reached its limits in tightly constrained use cases. Having found a positive business case for every commercial use case, this thesis makes a strong argument for RL-based charging and provides a foundation for future work through practical insights and a modular open-source software framework.
|
53 |
Robust Deep Reinforcement Learning for Portfolio Management
Masoudi, Mohammad Amin 27 September 2021
In finance, the use of Automated Trading Systems (ATS) on markets is growing every year, and trades generated by algorithms now account for most of the orders that arrive at stock exchanges (Kissell, 2020). Historically, these systems were based on advanced statistical methods and signal processing designed to extract trading signals from financial data. The recent success of Machine Learning has attracted the interest of the financial community. Reinforcement Learning is a subcategory of machine learning and has been broadly applied by investors and researchers in building trading systems (Kissell, 2020). In this thesis, we address the issue that deep reinforcement learning may be susceptible to sampling errors and over-fitting, and we propose a robust deep reinforcement learning method that integrates techniques from reinforcement learning and robust optimization. We back-test the developed algorithm, Robust DDPG, and compare its performance with the UBAH (Uniform Buy and Hold) benchmark and other RL algorithms. We show that the robust algorithm can significantly reduce the downside risk of an investment strategy and can ensure a safer path for the investor’s portfolio value.
|
54 |
Nuclear Renewable Integrated Energy System Power Dispatch Optimization for Tightly Coupled Co-Simulation Environment using Deep Reinforcement Learning
Sah, Suba January 2021
No description available.
|
55 |
Slice-Aware Radio Resource Management for Future Mobile Networks
Khodapanah, Behnam 05 June 2023
The concept of network slicing has been introduced in order to enable mobile networks to accommodate multiple heterogeneous use cases that are anticipated to be served within a single physical infrastructure. The slices are end-to-end virtual networks that share the resources of a physical network, spanning the core network (CN) and the radio access network (RAN). RAN slicing can be more challenging than CN slicing as the former deals with the distribution of radio resources, where the capacity is not constant over time and is hard to extend. The main challenge in RAN slicing is to simultaneously improve multiplexing gains while assuring enough isolation between slices, meaning one of the slices cannot negatively influence other slices. In this work, a flexible and configurable framework for RAN slicing is provided, where diverse requirements of slices are taken into account, and slice management algorithms adjust the control parameters of different radio resource management (RRM) mechanisms to satisfy the slices' service level agreements (SLAs). A new entity that translates the key performance indicator (KPI) targets of the SLAs to the control parameters is introduced and is called RAN slice orchestrator. Diverse algorithms governing this entity are introduced, which range from heuristics-based to model-free methods. Besides, a protection mechanism is constructed to prevent the negative influences of slices on each other's performances. The simulation-based analysis demonstrates the feasibility of slicing the RAN with multiplexing gains and slice isolation.
|
56 |
Towards provably safe and robust learning-enabled systems
Fan, Jiameng 26 August 2022
Machine learning (ML) has demonstrated great success in numerous complicated tasks. Fueled by these advances, many real-world systems like autonomous vehicles and aircraft are adopting ML techniques by adding learning-enabled components. Unfortunately, ML models widely used today, like neural networks, lack the necessary mathematical framework to provide formal guarantees on safety, causing growing concerns over these learning-enabled systems in safety-critical settings. In this dissertation, we tackle this problem by combining formal methods and machine learning to bring provable safety and robustness to learning-enabled systems.
We first study the robustness verification problem of neural networks on classification tasks. We focus on providing provable safety guarantees on the absence of failures under arbitrarily strong adversaries. We propose an efficient neural network verifier, LayR, to compute a guaranteed and overapproximated range for the output of a neural network given an input set that contains all possible adversarially perturbed inputs. LayR relaxes nonlinear units in neural networks using linear bounds and refines such relaxations with mixed integer linear programming (MILP) to iteratively improve the approximation precision, which achieves tighter output range estimations compared to prior neural network verifiers. However, the neural network verifier focuses more on analyzing a trained neural network and less on learning provably safe neural networks. To tackle this problem, we study verifiable training, which incorporates verification into training procedures to train provably safe neural networks and to scale to larger models and datasets. We propose a novel framework, AdvIBP, for combining adversarial training and provable robustness verification. We show that the proposed framework can learn provably robust neural networks at a sublinear convergence rate.
In the second part of the dissertation, we study the verification of system-level properties in neural-network controlled systems (NNCS). We focus on proving bounded-time safety properties by computing reachable sets. We first introduce two efficient NNCS verifiers ReachNN* and POLAR that construct polynomial-based overapproximations of neural-network controllers. We transfer NNCSs to tractable closed-loop systems with approximated polynomial controllers for computing reachable sets using existing reachability analysis tools of dynamical systems. The combination of polynomial overapproximations and reachability analysis tools opens promising directions for NNCS verification. We also include a survey and experimental study of existing NNCS verification methods, including combining state-of-the-art neural network verifiers with reachability analysis tools, to discuss what overapproximation is suitable for NNCS reachability analysis. While these verifiers enable proving safety properties of NNCS, the nonlinearity of neural-network controllers is the main bottleneck that limits their efficiency and scalability. We propose a novel framework of knowledge distillation to control “the degree of nonlinearity” of NN controllers to ease NNCS verification which improves provable safety of NNCSs especially when they are safe but cannot be verified due to their complexity. For the verification community, this opens up the possibility of reducing verification complexity by influencing how a system is trained.
Though NNCS verification can prove safety when system models are known, modern deep learning, e.g., deep reinforcement learning (DRL), often targets tasks with unknown system models, also known as the model-free setting. To tackle this issue, we first focus on safe exploration of DRL and propose a novel Lyapunov-inspired method. Our method uses Gaussian Process models to provide probabilistic guarantees on the policies, and guide the exploration of the unknown environment in a safe fashion. Then, we study learning robust visual control policies in DRL to enhance the robustness against visual changes that were not seen during training. We propose a method DRIBO, which can learn robust state representations for RL via a novel contrastive version of the Multi-View Information Bottleneck (MIB). This approach enables us to train high-performance visual policies that are robust to visual distractions, and can generalize well to unseen environments.
|
57 |
Voltage-Based Multi-step Prediction : Data Labeling, Software Evaluation, and Contrasting DRL with Traditional Prediction Methods
Svensson, Joakim January 2023
In this project, three primary problems were addressed to improve battery data management and software performance evaluation. All solutions used voltage values over time together with various device characteristics. Battery replacement labeling was performed using Hidden Markov Models. Both deep reinforcement learning, specifically TD3 with an LSTM layer, and traditional models were employed to predict future battery voltages. These predictions subsequently informed a novel method developed for early evaluation of software impact on battery performance. A baseline model was also introduced for optimal battery replacement timing. Results indicated that the TD3-LSTM model achieved a mean absolute percentage error below 5%, on par with traditional methods. The battery replacement labeling correctly labeled more than 85% of replacements, and the evaluation of impact on battery performance was more than 90% correct in software comparisons. TD3-LSTM proved a viable choice for multi-step predictions requiring online learning, although it potentially requires more tuning.
/ Swedish abstract (translated): In this project, three primary problems were addressed with the aim of improving battery data management and the evaluation of software performance. All solutions used voltage values over time together with various device characteristics. Battery replacement labeling was performed using Hidden Markov Models. Both deep reinforcement learning, specifically TD3 with an LSTM layer, and traditional models were used to predict future battery voltages. These predictions were then used in a newly developed method for early evaluation of the software's impact on battery performance. A baseline model was also introduced for optimal battery replacement timing. The results indicated that the TD3-LSTM model achieved a mean absolute percentage error below 5%, on par with traditional methods. The battery replacement labeling correctly labeled more than 85% of battery replacements, and the evaluation of impact on battery performance was more than 90% correct in software comparisons. TD3-LSTM proved to be a viable choice for multi-step predictions that require online learning, although it potentially required more tuning.
|
58 |
Deep reinforcement learning for automated building climate control
Snällfot, Erik, Hörnberg, Martin January 2024
The building sector is the single largest contributor to greenhouse gas emissions, making it a natural focal point for reducing energy consumption. More efficient use of energy is also becoming increasingly important for property managers as global energy prices are skyrocketing. This report was conducted on behalf of Sustainable Intelligence, a Swedish company that specializes in building automation solutions. It investigates whether deep reinforcement learning (DRL) algorithms can be implemented in a building control environment, whether they can be more effective than traditional solutions, and whether this can be achieved in reasonable time. The algorithms tested were Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO). They were implemented in a simulated BOPTEST environment in Brussels, Belgium, along with a traditional heating curve and a PI controller as benchmarks. DDPG never converged, but PPO managed to reduce energy consumption compared to the best benchmark, with only slightly worse thermal discomfort. The results indicate that DRL algorithms can be implemented in a building environment and reduce greenhouse gas emissions within a reasonable training time. This may be especially interesting in a complex building, where DRL can adapt and scale better than traditional solutions. Further research, along with implementations in physical buildings, is needed to determine whether DRL is the superior option.
|
59 |
Heterogeneous IoT Network Architecture Design for Age of Information Minimization
Xia, Xiaohao 01 February 2023
Timely data collection and execution in heterogeneous Internet of Things (IoT) networks, in which different protocols and spectrum bands such as WiFi, RFID, Zigbee, and LoRa coexist, requires further investigation. This thesis studies the problem of age-of-information (AoI) minimization in heterogeneous IoT networks consisting of heterogeneous IoT devices, an intermediate layer of multi-protocol mobile gateways (M-MGs) that collect and relay data from IoT objects and perform computing tasks, and heterogeneous access points (APs). A federated matching framework is presented to model the collaboration between different service providers (SPs) to deploy and share M-MGs and minimize the average weighted sum of the age-of-information and energy consumption. Further, we develop a two-level multi-protocol multi-agent actor-critic (MP-MAAC) to solve the optimization problem, where M-MGs and SPs can learn collaborative strategies through their own observations. The M-MGs' strategies include selecting IoT objects for data collection, execution, relaying, and/or offloading to SPs’ access points, while SPs decide on spectrum allocation. Finally, to improve the convergence of the learning process, we incorporate federated learning into the multi-agent collaborative framework. The numerical results show that our Fed-Match algorithm reduces the AoI by a factor of four, collects twice as many packets as existing approaches, and reduces the penalty by a factor of five when relaying is enabled; they also establish design principles for the stability of the training process.
|
60 |
Adversarial Reinforcement Learning for Control System Design: A Deep Reinforcement Learning Approach
Yang, Zhaoyuan, Yang 15 August 2018
No description available.
|