91 |
Deep Reinforcement Learning on Social Environment Aware Navigation based on Maps
Sanchez, Victor January 2023
Reinforcement learning (RL) has expanded rapidly in recent years through its successful application to a range of decision-making and complex control tasks. Moreover, deep learning offers RL the opportunity to extend its reach to complex domains. Social robotics is a field involving challenges, such as human-robot interaction, that inspire developments in deep RL. Autonomous systems demand fast and efficient environment perception to guarantee safety. However, while staying attentive to its surroundings, a robot must make decisions to navigate optimally and avoid potential obstacles. In this thesis, we investigate a deep RL method for end-to-end mobile robot navigation in a social environment. Using observations collected in a simulation environment, a convolutional neural network is trained to predict an appropriate set of discrete angular and linear velocities for a robot based on its egocentric local occupancy grid map. We compare a randomly ordered training scheme with a curriculum learning approach to improve convergence speed during training. We decompose the main problem by separately analysing end-to-end navigation and obstacle avoidance in static and dynamic environments. For each problem, we propose an adaptation aimed at improving the agent's awareness of its surroundings. Qualitative and quantitative evaluations of the investigated approach were performed in simulation. The results show that the end-to-end map-based navigation model is easy to set up and performs comparably to a Model Predictive Control approach. However, we find that obstacle avoidance is harder to translate to a deep RL framework. Despite this difficulty, exploring different RL methods and configurations should help and suggest improvements for future work.
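As a rough illustration of the setup described above, the sketch below shows a small CNN that maps an egocentric occupancy grid to values over a discrete set of velocity commands, in a DQN-style policy head. This is a minimal sketch, not the thesis code: the grid size, network widths, and the particular velocity discretization are all illustrative assumptions.

```python
# Hedged sketch: CNN over an egocentric local occupancy grid producing
# Q-values for discrete (linear, angular) velocity pairs. All sizes assumed.
import torch
import torch.nn as nn

LINEAR_VELS = [0.0, 0.3, 0.6]        # m/s, assumed discretization
ANGULAR_VELS = [-0.8, 0.0, 0.8]      # rad/s, assumed discretization
N_ACTIONS = len(LINEAR_VELS) * len(ANGULAR_VELS)

class GridNavPolicy(nn.Module):
    def __init__(self, grid_size: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer flattened feature size from a dummy grid
            n_feat = self.conv(torch.zeros(1, 1, grid_size, grid_size)).shape[1]
        self.head = nn.Sequential(nn.Linear(n_feat, 128), nn.ReLU(),
                                  nn.Linear(128, N_ACTIONS))

    def forward(self, occupancy_grid: torch.Tensor) -> torch.Tensor:
        return self.head(self.conv(occupancy_grid))  # one Q-value per action

policy = GridNavPolicy()
q_values = policy(torch.rand(1, 1, 64, 64))          # one egocentric grid
action = q_values.argmax(dim=1).item()
v = LINEAR_VELS[action // len(ANGULAR_VELS)]         # decode velocity pair
w = ANGULAR_VELS[action % len(ANGULAR_VELS)]
```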
|
92 |
Remembering how to walk - Using Active Dendrite Networks to Drive Physical Animations / Att minnas att gå - användning av Active Dendrite Nätverk för att driva fysiska animeringar
Henriksson, Klas January 2023
Creating embodied agents capable of performing a wide range of tasks in different types of environments has been a longstanding challenge in deep reinforcement learning. A novel network architecture introduced in 2021, the Active Dendrite Network [A. Iyer et al., “Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments”], designed to create sparse subnetworks for different tasks, showed promising multi-tasking performance on the Meta-World [T. Yu et al., “Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning”] benchmark. This thesis further explores the performance of this architecture in a multi-tasking setting focused on physical animations and locomotion. Specifically, we implement and compare the architecture to the commonly used Multi-Layer Perceptron (MLP) architecture on a multi-task reinforcement learning problem in a video-game setting: training a hexapedal agent on a set of locomotion tasks involving moving at different speeds, turning, and standing still. The evaluation focused on two areas: (1) assessing the average overall performance of the Active Dendrite Network relative to the MLP on a set of locomotion scenarios featuring our behaviour sets and environments; (2) assessing the relative impact Active Dendrite Networks have on transfer learning between related tasks by comparing their performance on novel behaviours shortly after training a related behaviour. Our findings suggest that the Active Dendrite Network can make better use of limited network capacity than the MLP: it outperformed the MLP by ∼18% on our benchmark under limited network capacity. When both networks have sufficient capacity, however, there is not much difference between the two. We further find that Active Dendrite Networks have transfer-learning capabilities very similar to those of the MLP in our benchmarks.
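To make the mechanism concrete, here is a hedged sketch of an active-dendrite layer in the spirit of Iyer et al.: each unit's feedforward activation is gated by a sigmoid of its strongest dendritic segment's response to a context vector, followed by a k-winner-take-all sparsity step. Layer sizes, the exact gating form, and the k value are illustrative assumptions, not the thesis's configuration.

```python
# Hedged sketch of an "active dendrite" layer: context-gated units plus
# k-winner-take-all sparsity. Dimensions and gating details are assumptions.
import torch
import torch.nn as nn

class ActiveDendriteLayer(nn.Module):
    def __init__(self, d_in, d_out, d_context, n_segments=4, k_winners=10):
        super().__init__()
        self.ff = nn.Linear(d_in, d_out)
        # One weight vector per (unit, dendritic segment) over the context.
        self.segments = nn.Parameter(torch.randn(d_out, n_segments, d_context) * 0.01)
        self.k = k_winners

    def forward(self, x, context):
        y = self.ff(x)                                   # feedforward drive
        seg = torch.einsum("osc,bc->bos", self.segments, context)
        gate = torch.sigmoid(seg.max(dim=2).values)      # strongest segment gates unit
        y = y * gate
        # k-winner-take-all: keep only the k most active units per sample.
        topk = y.topk(self.k, dim=1).indices
        mask = torch.zeros_like(y).scatter_(1, topk, 1.0)
        return y * mask

layer = ActiveDendriteLayer(d_in=32, d_out=64, d_context=8)
out = layer(torch.randn(5, 32), torch.randn(5, 8))       # context = task signal
```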
|
93 |
REAL-TIME UPDATING AND NEAR-OPTIMAL ENERGY MANAGEMENT SYSTEM FOR MULTI-MODE ELECTRIFIED POWERTRAIN WITH REINFORCEMENT LEARNING CONTROL
Biswas, Atriya January 2021
An energy management system (EMS) implemented in the electronic control unit (ECU) of an actual vehicle with an electrified powertrain is a much simpler version of the theoretically developed EMS. Such simplification is done to accommodate the EMS within the given memory constraints and computational capacity of the ECU. The simplification should ensure reasonable performance compared to the theoretical EMS under real-life driving scenarios, and the simplification process must be effective to yield a versatile and utilitarian EMS. Reinforcement-learning-based controllers have profitable characteristics for optimizing the performance of controllable physical systems, as they do not necessarily require a mathematical model of the system dynamics (i.e., they are model-free). Quite naturally, one may aspire to test this prowess of reinforcement-learning-based controllers in achieving near-global-optimal performance for the supervisory energy management system of electrified powertrains. Before deployment as a mainstream controller, any supervisory controller should be scrutinized on a series of virtual simulation platforms of ascending physical-system-emulating capability. The controller evolves from a mathematical concept to a utilitarian embedded system through these levels, undergoing gradual transformation to finally become apposite for a real physical system. Implementation of the control strategy in a Simulink-based forward simulation model can be the first stage of this evolution process. This dissertation delineates all the steps required for implementing a reinforcement-learning-based supervisory controller in a forward simulation model of a hybrid electric vehicle. A novel loss-minimization-based instantaneous optimal strategy is then introduced for the energy management system of a multi-mode hybrid electric powertrain. The loss-minimization strategy is flexible enough to be implemented in any electrified powertrain architecture, and it is mathematically proven that minimizing the overall system loss is equivalent to minimizing fuel consumption. An online simulation framework is developed to evaluate the performance of a multi-mode electrified powertrain equipped with more than one power source. An electrically variable transmission with two planetary gear-sets has been chosen as the centerpiece of the powertrain, considering the versatility and future prospects of such transmissions. Notably, the novel architecture topology selected for this dissertation emerged from a rigorous screening process whose workflow is presented here briefly.
One of the legitimate concerns with a multi-mode transmission is its proclivity to introduce power-flow discontinuity downstream of the powertrain, and mode-shift events are predominantly responsible for such discontinuity. The merit of dynamic coordinated control as a technique for ameliorating such discontinuity has been substantiated by many scholars in the literature. Hence, a system-level coordinated control is employed within the energy management system, which governs the mode schedule of the multi-mode powertrain in real-time simulation. / Thesis / Doctor of Philosophy (PhD)
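As a toy illustration of such an instantaneous loss-minimization strategy, the sketch below enumerates candidate engine/battery power splits at one time step and picks the split with minimal total loss, which, per the equivalence proved in the dissertation, serves as a proxy for minimal fuel consumption. The loss curves, power grid, and units are stand-in placeholders, not the dissertation's models.

```python
# Hedged sketch: pick the operating point minimizing total system loss while
# meeting the driver's power demand. Loss models below are placeholders.
def engine_loss(p_engine):       # placeholder convex engine loss curve (kW)
    return 0.08 * p_engine + 0.002 * p_engine ** 2

def electric_path_loss(p_batt):  # placeholder battery + machine losses (kW)
    return 0.05 * abs(p_batt) + 0.001 * p_batt ** 2

def best_operating_point(p_demand, engine_grid):
    candidates = []
    for p_eng in engine_grid:
        p_batt = p_demand - p_eng              # power balance: battery fills the gap
        total_loss = engine_loss(p_eng) + electric_path_loss(p_batt)
        candidates.append((total_loss, p_eng, p_batt))
    return min(candidates)                     # minimal loss ~ minimal fuel

loss, p_eng, p_batt = best_operating_point(p_demand=40.0,
                                           engine_grid=range(0, 81, 5))
```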
|
94 |
DEEP LEARNING BASED MODELS FOR NOVELTY ADAPTATION IN AUTONOMOUS MULTI-AGENT SYSTEMS
Marina Wagdy Wadea Haliem (13121685) 20 July 2022
Autonomous systems are often deployed in dynamic environments and are challenged with unexpected changes (novelties) in those environments, where they receive novel data that was not seen during training. Given this uncertainty, they should be able to operate without (or with limited) human intervention, and they are expected to: (1) adapt to such changes while remaining effective and efficient in performing their multiple tasks, providing continuous availability of critical functionalities; (2) make informed decisions independently of any central authority; (3) be cognitive: learn the new context and its possible actions, and be rich in knowledge discovery through mining and pattern recognition; (4) be reflexive: react to novel unknown data as well as to security threats without terminating ongoing critical missions. These characteristics combine to create the workflow of the autonomous decision-making process in multi-agent environments; that is, any action taken by the system must go through these characteristic models to autonomously make an ideal decision based on the situation.
In this dissertation, we propose novel learning-based models to enhance the decision-making process in autonomous multi-agent systems, where agents are able to detect novelties (i.e., unexpected changes in the environment) and adapt to them in a timely manner. For this purpose, we explore two complex and highly dynamic domains: (1) transportation networks (e.g., a ridesharing application), for which we develop AdaPool, a novel distributed diurnal-adaptive decision-making framework for multi-agent autonomous vehicles using model-free deep reinforcement learning and change point detection; and (2) multi-agent games (e.g., Monopoly), for which we propose a hybrid approach that combines deep reinforcement learning (for frequent but complex decisions) with a fixed-policy approach (for infrequent but straightforward decisions) to facilitate decision-making while remaining adaptive to novelties. (3) Further, we present a domain-agnostic approach for decision-making without prior knowledge in dynamic environments using Bootstrapped DQN (see the sketch after this paragraph). To enhance the security of autonomous multi-agent systems, (4) we develop machine-learning-based resilience testing of address-randomization moving target defense. Finally, to further improve the decision-making process, we present (5) a novel framework for multi-agent deep covering option discovery, designed to accelerate exploration (the first step of decision-making for autonomous agents) by identifying potential collaborative agents and encouraging visits to under-represented states in their joint observation space.
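A hedged sketch of the Bootstrapped DQN idea referenced in item (3): several Q-heads share one torso, and a head sampled at the start of each episode drives deep exploration without prior knowledge of the environment. Layer sizes and the head count are illustrative assumptions.

```python
# Hedged sketch of Bootstrapped DQN: shared torso, multiple Q-heads,
# one head sampled per episode for exploration. Sizes are assumptions.
import random
import torch
import torch.nn as nn

class BootstrappedDQN(nn.Module):
    def __init__(self, obs_dim, n_actions, n_heads=10):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(64, n_actions)
                                    for _ in range(n_heads)])

    def forward(self, obs, head_idx):
        return self.heads[head_idx](self.torso(obs))     # Q-values of one head

net = BootstrappedDQN(obs_dim=8, n_actions=4)
head = random.randrange(len(net.heads))                  # sample once per episode
action = net(torch.randn(1, 8), head).argmax(dim=1).item()
```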
|
95 |
Physical Layer Security with Unmanned Aerial Vehicles for Advanced Wireless Networks
Abdalla, Aly Sabri 08 August 2023
Unmanned aerial vehicles (UAVs) are emerging as enablers for supporting many applications and services, such as precision agriculture, search and rescue, temporary network deployment, coverage extension, and security. UAVs are being considered for integration into emerging wireless networks as aerial users, aerial relays (ARs), or aerial base stations (ABSs). This dissertation proposes employing UAVs to contribute to physical layer techniques that enhance the security performance of advanced wireless networks and services in terms of availability, resilience, and confidentiality. The focus is on securing terrestrial cellular communications against eavesdropping with a cellular-connected UAV that is dispatched as an AR or ABS. The research develops mathematical tools and applies machine learning algorithms to jointly optimize UAV trajectory and advanced communication parameters to improve the secrecy rate of wireless links, covering various communication scenarios: static and mobile users, single and multiple users, and single and multiple eavesdroppers, with and without knowledge of the attackers' locations and channel state information. The analysis is based on established air-to-ground and air-to-air channel models for single- and multiple-antenna systems, while taking into consideration the limited on-board energy resources of cellular-connected UAVs. Simulation results show fast algorithm convergence and significant improvements in channel secrecy capacity when UAVs assist terrestrial cellular networks as proposed here, compared to state-of-the-art solutions. In addition, numerical results demonstrate that the proposed methods scale well with the number of users to be served and with different eavesdropper distributions. The presented solutions are wireless-protocol agnostic, can complement traditional security principles, and can be extended to address other communication security and performance needs.
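For context, the quantity such designs optimize is the wiretap secrecy rate: the legitimate link's achievable rate minus the eavesdropper's, floored at zero. The snippet below illustrates the metric only; the SNR values are made-up inputs, not results from this work.

```python
# Hedged illustration of the secrecy-rate metric; SNRs are made-up inputs.
import math

def secrecy_rate(snr_user, snr_eve):
    # Rate of the legitimate link minus the eavesdropper's, floored at zero.
    return max(0.0, math.log2(1 + snr_user) - math.log2(1 + snr_eve))

# An AR/ABS placement that raises the user's SNR or degrades the
# eavesdropper's (e.g., via trajectory or beamforming) raises this quantity.
print(secrecy_rate(snr_user=20.0, snr_eve=3.0))   # ~2.39 bits/s/Hz
```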
|
96 |
PhD Thesis
Junghoon Kim (15348493) 26 April 2023
In order to advance next-generation communication systems, it is critical to enhance state-of-the-art communication architectures, such as device-to-device (D2D), multiple-input multiple-output (MIMO), and intelligent reflecting surface (IRS), in terms of achieving high data rate, low latency, and high energy efficiency. In the first part of this dissertation, we address joint learning and optimization methodologies on cutting-edge network architectures. First, we consider D2D networks equipped with MIMO systems. In particular, we address the problem of minimizing the network overhead in D2D networks, defined as the sum of time and energy required for processing tasks at devices, through the design of MIMO beamforming and communication/computation resource allocation. Second, we address IRS-assisted communication systems. Specifically, we study an adaptive IRS control scheme considering realistic IRS reflection behavior and channel environments, and propose a novel adaptive codebook-based limited feedback protocol and learning-based solutions for codebook updates.
Furthermore, in order for revolutionary innovations to emerge for future generations of communications, it is crucial to explore and address fundamental, long-standing open problems, such as the design of practical codes for a variety of important channel models. In the later part of this dissertation, we study the design of practical codes for feedback-enabled communication channels, i.e., feedback codes. Existing feedback codes, developed over the past six decades, have been demonstrated to be vulnerable to high forward/feedback noise, owing to the non-triviality of feedback-code design. We propose a novel recurrent neural network (RNN) autoencoder-based architecture that mitigates this susceptibility to high channel noise by incorporating domain knowledge into the design of the deep learning architecture. Using this architecture, we suggest a new class of non-linear feedback codes that increase robustness to forward/feedback noise in additive white Gaussian noise (AWGN) channels with feedback.
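The following is a hedged, minimal sketch of the kind of RNN-based feedback encoder described above: at each channel use, a GRU cell consumes the message bits together with the noisy feedback of its previous transmission and emits the next channel symbol. Dimensions, noise levels, and the tanh power normalization are assumptions for illustration, not the dissertation's architecture.

```python
# Hedged sketch: RNN feedback-code encoder over an AWGN channel with a
# noisy feedback link. All dimensions and noise levels are assumptions.
import torch
import torch.nn as nn

class RNNFeedbackEncoder(nn.Module):
    def __init__(self, n_bits=3, hidden=32):
        super().__init__()
        self.hidden = hidden
        self.rnn = nn.GRUCell(n_bits + 1, hidden)  # +1 input: fed-back symbol
        self.out = nn.Linear(hidden, 1)

    def forward(self, bits, n_uses=9, fwd_std=0.1, fb_std=0.1):
        h = bits.new_zeros(bits.shape[0], self.hidden)
        fb = bits.new_zeros(bits.shape[0], 1)
        received = []
        for _ in range(n_uses):
            h = self.rnn(torch.cat([bits, fb], dim=1), h)
            x = torch.tanh(self.out(h))             # soft transmit-power constraint
            y = x + fwd_std * torch.randn_like(x)   # forward AWGN channel
            fb = y + fb_std * torch.randn_like(y)   # noisy feedback of the reception
            received.append(y)
        return torch.cat(received, dim=1)           # a decoder would act on this

encoder = RNNFeedbackEncoder()
channel_outputs = encoder(torch.randint(0, 2, (4, 3)).float())
```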
|
97 |
[pt] APRENDIZADO POR REFORÇO PROFUNDO PARA CONTROLE DE TRAJETÓRIA DE UM QUADROTOR EM AMBIENTES VIRTUAIS / [en] DEEP REINFORCEMENT LEARNING FOR QUADROTOR TRAJECTORY CONTROL IN VIRTUAL ENVIRONMENTS
GUILHERME SIQUEIRA EDUARDO 12 August 2021
[en] With recent advances in computational power, the use of novel, complex control models has become viable for controlling quadrotors. One such method is Deep Reinforcement Learning (DRL), which can devise a control policy that better addresses non-linearities in the quadrotor model than traditional control methods. An important non-linearity present in payload-carrying air vehicles is the set of inherent time-varying properties, such as size and mass, caused by the addition and removal of cargo. The general, domain-agnostic approach of the DRL controller also allows it to handle visual navigation, in which position estimation data is unreliable. In this work, we employ a Soft Actor-Critic algorithm to design controllers for a quadrotor to carry out tasks reproducing the mentioned challenges in a virtual environment. First, we develop two waypoint guidance controllers: a low-level controller that acts directly on motor commands and a high-level controller that interacts in cascade with a velocity PID controller. The controllers are then evaluated on the proposed payload pickup and drop task, thereby introducing a time-varying variable. The controllers conceived are able to outperform a traditional positional PID controller with optimized gains on the proposed course, while remaining agnostic to a set of simulation parameters. Finally, we employ the same DRL algorithm to develop a controller that can leverage visual data to complete a racing course in simulation. With this controller, the quadrotor is able to localize gates using an RGB-D camera and devise a trajectory that drives it to traverse as many gates in the racing course as possible.
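As a rough, hedged sketch of the training setup (not the thesis code), the snippet below pairs a Soft Actor-Critic learner from Stable-Baselines3 with a toy waypoint-guidance environment; the 2-D point-mass dynamics, reward shaping, and bounds stand in for the quadrotor simulator and its command interfaces.

```python
# Hedged sketch: SAC on a toy waypoint environment standing in for the
# quadrotor simulator. Dynamics, rewards, and bounds are assumptions.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC

class WaypointEnv(gym.Env):
    def __init__(self):
        self.observation_space = spaces.Box(-10, 10, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Box(-1, 1, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.pos = self.np_random.uniform(-5, 5, size=2).astype(np.float32)
        self.goal = self.np_random.uniform(-5, 5, size=2).astype(np.float32)
        return np.concatenate([self.pos, self.goal]), {}

    def step(self, action):
        self.t += 1
        self.pos = np.clip(self.pos + 0.1 * action, -10, 10).astype(np.float32)
        dist = float(np.linalg.norm(self.goal - self.pos))
        done = dist < 0.2                       # reached the waypoint
        reward = -dist + (10.0 if done else 0.0)  # progress plus arrival bonus
        truncated = self.t >= 200
        return np.concatenate([self.pos, self.goal]), reward, done, truncated, {}

model = SAC("MlpPolicy", WaypointEnv(), verbose=0)
model.learn(total_timesteps=5_000)
```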
|
98 |
Simulating market maker behaviour using Deep Reinforcement Learning to understand market microstructure / En simulering av aktiemarknadens mikrostruktur via självlärande finansiella agenter
Marcus, Elwin January 2018
Market microstructure studies the process of exchanging assets under explicit trading rules. With algorithmic trading and high-frequency trading, modern financial markets have seen profound changes in market microstructure in the last 5 to 10 years. As a result, previously established methods in the field of market microstructure often become faulty or insufficient. Machine learning and, in particular, reinforcement learning have become more ubiquitous in both finance and other fields today, with applications in trading and optimal execution. This thesis uses reinforcement learning to understand market microstructure by simulating a stock market based on NASDAQ Nordics and training market maker agents on this stock market. Simulations are run on both a dealer market and a limit order book market, differentiating this work from previous studies: DQN and PPO algorithms are applied in these simulated environments, where stochastic optimal control theory has mainly been used before. The market maker agents successfully reproduce stylized facts of historical trade data from each simulation, such as mean-reverting prices and absence of linear autocorrelations in price changes, as well as beating random policies employed on these markets with a positive profit and loss of up to 200%. Other trading dynamics seen in real-world markets are also exhibited via the agents' interactions, mainly bid-ask spread clustering, optimal inventory management, declining spreads, and independence of inventory and spreads, indicating that reinforcement learning with PPO and DQN is a relevant choice when modelling market microstructure.
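One stylized-fact check mentioned above, the absence of linear autocorrelation in price changes, can be verified as sketched below; the price series here is synthetic noise standing in for simulation output, and the estimator is a simple sample autocorrelation.

```python
# Hedged sketch: lag-k sample autocorrelation of returns; synthetic prices
# stand in for the simulated market's trade data.
import numpy as np

def lag_autocorr(x, lag=1):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

prices = 100 + np.cumsum(np.random.default_rng(0).normal(0, 0.1, 10_000))
returns = np.diff(prices)
print(lag_autocorr(returns, lag=1))   # expected to be close to 0
```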
|
99 |
Access Point Selection and Clustering Methods with Minimal Switching for Green Cell-Free Massive MIMO Networks
He, Qinglong January 2022
As a novel beyond fifth-generation (5G) concept, cell-free massive MIMO (multiple-input multiple-output) has recently become a promising physical-layer technology in which a large number of distributed access points (APs), coordinated by a central processing unit (CPU), cooperate to coherently serve many user equipments (UEs) in the same time/frequency resource. However, denser AP deployment in cell-free networks, together with an exponentially growing number of mobile UEs, leads to higher power consumption. Moreover, as with conventional cellular networks, cell-free massive MIMO networks are dimensioned to provide the required quality of service (QoS) to the UEs under heavy traffic load, and thus they may be underutilized during low-traffic periods, leading to inefficient use of both spectral and energy resources. Aiming at energy-efficient cell-free networks, several approaches have been proposed in the literature that consider different AP switch ON/OFF (ASO) strategies for power minimization. Different from prior works, this thesis focuses on additional factors beyond ASO that adversely affect not only total power consumption but also implementation complexity and operating cost. For instance, overly frequent ON/OFF switching of an AP can erode the potential power saving of ASO by incurring extra power consumption from the switching itself. Indeed, frequent switching of APs may also cause thermal fatigue and serious lifetime degradation. Moreover, time variations in the AP-UE association made in favor of energy saving in a dynamic network bring additional signaling and implementation complexity. Thus, in the first part of the thesis, we propose a multi-objective optimization problem that minimizes the total power consumption together with AP switching and AP-UE association variations relative to the network state in the previous decision period. The proposed problem is cast in mixed-integer quadratic programming form and solved optimally. Our simulation results show that by limiting AP switching (node switching) and AP-UE association reformation (link switching), the total power consumption of the APs increases only slightly, while the average number of switchings drops significantly for both node and link switching. This achieves a good balance in the trade-off between radio power consumption and the side effects of excessive switching. In the second part of the thesis, we consider a larger cell-free massive MIMO network, dividing the total area into disjoint network-centric clusters in which the APs of each cluster are connected to a separate CPU. In each cluster, cell-free joint transmission is implemented locally to achieve a scalable network implementation. Motivated by the outcomes of the first part, we reshape our dynamic network simulator to keep the active APs for a given spatial traffic pattern unchanged as long as the mean arrival rates of the UEs are constant. Moreover, the initially formed AP-UE association for a particular UE is not allowed to change. In this way, the number of node and link switchings is zero throughout the considered time interval. For this dynamic network, we propose a deep reinforcement learning (DRL) framework that learns a policy maximizing long-term energy efficiency (EE) for a given spatially-varying traffic density.
The active AP density of each network-centric cluster and the boundaries of the clusters are learned by the trained agent to maximize the EE. The DRL algorithm is shown to learn a non-trivial joint cluster geometry and AP density with at least a 7% improvement in EE over heuristically developed benchmarks.
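To ground the first part's objective, here is a hedged toy sketch of the trade-off being optimized: total AP power plus penalties for node switching and link switching relative to the previous network state. The weights, power figures, and array encodings are illustrative assumptions; the thesis solves the corresponding problem as a mixed-integer quadratic program over binary activation and association variables rather than by direct evaluation like this.

```python
# Hedged sketch of the multi-objective cost: AP power + switching penalties.
# Weights and power values are illustrative, not the thesis's numbers.
import numpy as np

def network_cost(active, active_prev, assoc, assoc_prev,
                 p_on=10.0, p_sleep=1.0, w_node=5.0, w_link=2.0):
    power = p_on * active.sum() + p_sleep * (1 - active).sum()
    node_switches = np.abs(active - active_prev).sum()   # APs toggled on/off
    link_switches = (assoc != assoc_prev).sum()          # changed AP-UE links
    return power + w_node * node_switches + w_link * link_switches

active_prev = np.array([1, 1, 0, 1]); active = np.array([1, 0, 0, 1])
assoc_prev = np.array([0, 0, 3, 3]);  assoc = np.array([0, 3, 3, 3])
print(network_cost(active, active_prev, assoc, assoc_prev))
```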
|
100 |
[en] ENABLING AUTONOMOUS DATA ANNOTATION: A HUMAN-IN-THE-LOOP REINFORCEMENT LEARNING APPROACH / [pt] HABILITANDO ANOTAÇÕES DE DADOS AUTÔNOMOS: UMA ABORDAGEM DE APRENDIZADO POR REFORÇO COM HUMANO NO LOOP
LEONARDO CARDIA DA CRUZ 10 November 2022
[en] Deep learning techniques have shown significant contributions in various fields, including image analysis. The vast majority of work in computer vision focuses on proposing and applying new machine learning models and algorithms. For supervised learning tasks, the performance of these techniques depends on a large amount of training data and labeled data. However, labeling is an expensive and time-consuming process. A recent area of exploration is the reduction of effort in data preparation, leaving the data without inconsistencies and noise so that current models can obtain greater performance. This new field of study is called Data-Centric AI. We present a new approach based on Deep Reinforcement Learning (DRL) focused on preparing a dataset for object detection problems, where the bounding-box annotations are produced autonomously and economically. Our approach consists of a methodology for training a virtual agent to label the data automatically, with a human acting as the agent's teacher. We implemented the Deep Q-Network algorithm to create the virtual agent and developed a counseling approach to facilitate communication between the human teacher and the virtual agent student. To complete our implementation, we used an active learning method to select cases where the agent has greater uncertainty and requires human intervention in the annotation process during training. Our approach was evaluated and compared with other reinforcement learning and human-computer interaction methods on several datasets, where the virtual agent had to create new annotations in the form of bounding boxes. The results show that our methodology has a positive impact on obtaining new annotations from a dataset with scarce labels, surpassing existing methods. In this way, we present a contribution to the field of Data-Centric AI: the development of a teaching methodology that creates an autonomous approach with human counseling to produce economical annotations from scarce ones.
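A hedged sketch of the uncertainty-gated human-in-the-loop step described above: when the Q-network's preference between its two best actions falls below a margin, the agent defers to the human teacher instead of acting on its own. The margin threshold and the `ask_human` oracle are illustrative assumptions, not the dissertation's interface.

```python
# Hedged sketch: defer to the human teacher when the Q-value margin between
# the two best actions is small (an active-learning query). Threshold assumed.
import torch

def choose_or_ask(q_values: torch.Tensor, ask_human, margin=0.1):
    top2 = q_values.topk(2).values
    if (top2[0] - top2[1]).item() < margin:   # low confidence between best actions
        return ask_human()                    # query the human teacher's label
    return int(q_values.argmax())             # act autonomously otherwise

q = torch.tensor([0.52, 0.49, 0.10])
action = choose_or_ask(q, ask_human=lambda: 1)   # stub human oracle
```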
|