Global ETD Search

81	DEEP LEARNING BASED MODELS FOR NOVELTY ADAPTATION IN AUTONOMOUS MULTI-AGENT SYSTEMS Marina Wagdy Wadea Haliem (13121685) 20 July 2022 (has links) Autonomous systems are often deployed in dynamic environments, where they face unexpected changes (novelties) and receive data that was not seen during training. Given this uncertainty, they should be able to operate with little or no human intervention, and they are expected to: (1) Adapt to such changes while remaining effective and efficient in performing their multiple tasks, so that critical functionality stays continuously available. (2) Make informed decisions independently of any central authority. (3) Be cognitive: learn the new context and its possible actions, and discover knowledge through mining and pattern recognition. (4) Be reflexive: react to novel, unknown data as well as to security threats without terminating ongoing critical missions. Together, these characteristics define the workflow of autonomous decision-making in multi-agent environments; i.e., any action taken by the system must pass through these characteristic models so that the system can autonomously make an ideal decision for the situation.
In this dissertation, we propose novel learning-based models to enhance the decision-making process in autonomous multi-agent systems, where agents are able to detect novelties (i.e., unexpected changes in the environment) and adapt to them in a timely manner. For this purpose, we explore two complex and highly dynamic domains: (1) Transportation networks (e.g., ridesharing applications), where we develop AdaPool, a novel distributed diurnal-adaptive decision-making framework for multi-agent autonomous vehicles using model-free deep reinforcement learning and change point detection. (2) Multi-agent games (e.g., Monopoly), for which we propose a hybrid approach that combines deep reinforcement learning (for frequent but complex decisions) with a fixed-policy approach (for infrequent but straightforward decisions) to facilitate decision-making while also adapting to novelties. (3) Further, we present a domain-agnostic approach for decision-making without prior knowledge in dynamic environments using Bootstrapped DQN. Finally, to enhance the security of autonomous multi-agent systems, (4) we develop machine-learning-based resilience testing of address-randomization moving target defense. Additionally, to further improve the decision-making process, we present (5) a novel framework for multi-agent deep covering option discovery that is designed to accelerate exploration (the first step of decision-making for autonomous agents) by identifying potential collaborative agents and encouraging visits to under-represented states in their joint observation space.
Autonomous agents and multiagent systems; Adversarial machine learning; Deep learning; Reinforcement learning; Deep Reinforcement Learning (DRL); Autonomous Systems; Transportation Networks; Multi-agent Systems; Adversarial Machine Learning; Multi-agent Games; Ridesharing; Novelty Adaptation; Novelty detection; Deep Learning; Open World
82	Physical Layer Security with Unmanned Aerial Vehicles for Advanced Wireless Networks Abdalla, Aly Sabri 08 August 2023 (has links) (PDF) Unmanned aerial vehicles (UAVs) are emerging as enablers for supporting many applications and services, such as precision agriculture, search and rescue, temporary network deployment, coverage extension, and security. UAVs are being considered for integration into emerging wireless networks as aerial users, aerial relays (ARs), or aerial base stations (ABSs). This dissertation proposes employing UAVs to contribute to physical layer techniques that enhance the security performance of advanced wireless networks and services in terms of availability, resilience, and confidentiality. The focus is on securing terrestrial cellular communications against eavesdropping with a cellular-connected UAV that is dispatched as an AR or ABS. The research develops mathematical tools and applies machine learning algorithms to jointly optimize UAV trajectory and advanced communication parameters for improving the secrecy rate of wireless links, covering various communication scenarios: static and mobile users, single and multiple users, and single and multiple eavesdroppers with and without knowledge of the location of attackers and their channel state information. The analysis is based on established air-to-ground and air-to-air channel models for single and multiple antenna systems while taking into consideration the limited on-board energy resources of cellular-connected UAVs. Simulation results show fast algorithm convergence and significant improvements in terms of channel secrecy capacity that can be achieved when UAVs assist terrestrial cellular networks as proposed here over state-of-the-art solutions. In addition, numerical results demonstrate that the proposed methods scale well with the number of users to be served and with different eavesdropping distributions. 
The presented solutions are wireless protocol agnostic, can complement traditional security principles, and can be extended to address other communication security and performance needs.
Unmanned Aerial Vehicle (UAV); Physical Layer Security (PLS); Mobile Networks; Secrecy Capacity; Deep Reinforcement Learning (DRL); Multi-Variable Optimization; Beamforming; Multiuser Multiple Input Single Output (MISO); Trajectory Optimization; Digital Communications and Networking; Systems and Communications
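The secrecy rate referred to in the abstract above is conventionally defined as the positive part of the difference between the capacity of the legitimate link and that of the eavesdropper link. The following is a minimal sketch of that textbook definition only, assuming a single legitimate user, a single eavesdropper and known linear SNRs; the function and parameter names are illustrative and are not taken from the dissertation.

```python
import math


def secrecy_rate(snr_user: float, snr_eve: float) -> float:
    """Secrecy rate in bit/s/Hz: max(0, log2(1 + snr_user) - log2(1 + snr_eve)).

    snr_user and snr_eve are linear (not dB) signal-to-noise ratios of the
    legitimate link and the eavesdropper link. Both names are hypothetical.
    """
    if snr_user < 0 or snr_eve < 0:
        raise ValueError("SNRs are linear power ratios and must be non-negative")
    legitimate = math.log2(1.0 + snr_user)
    eavesdropper = math.log2(1.0 + snr_eve)
    # The rate is clipped at zero: no secrecy when the eavesdropper's link is better.
    return max(0.0, legitimate - eavesdropper)
```

In a trajectory-optimization setting, a quantity of this kind would typically be evaluated at each candidate UAV position, since both SNRs depend on the air-to-ground channel geometry.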
83	PhD Thesis Junghoon Kim (15348493) 26 April 2023 (has links) In order to advance next-generation communication systems, it is critical to enhance state-of-the-art communication architectures, such as device-to-device (D2D), multiple-input multiple-output (MIMO), and intelligent reflecting surface (IRS), in terms of achieving high data rates, low latency, and high energy efficiency. In the first part of this dissertation, we address joint learning and optimization methodologies for cutting-edge network architectures. First, we consider D2D networks equipped with MIMO systems. In particular, we address the problem of minimizing the network overhead in D2D networks, defined as the sum of the time and energy required for processing tasks at devices, through the design of MIMO beamforming and communication/computation resource allocation. Second, we address IRS-assisted communication systems. Specifically, we study an adaptive IRS control scheme that accounts for realistic IRS reflection behavior and channel environments, and propose a novel adaptive codebook-based limited feedback protocol and learning-based solutions for codebook updates.
Furthermore, for revolutionary innovations to emerge in future generations of communications, it is crucial to explore and address fundamental, long-standing open problems, such as the design of practical codes for a variety of important channel models. In the latter part of this dissertation, we study the design of practical codes for feedback-enabled communication channels, i.e., feedback codes. Existing feedback codes, developed over the past six decades, have been shown to be vulnerable to high forward/feedback noise, owing to the non-triviality of designing feedback codes. We propose a novel recurrent neural network (RNN) autoencoder-based architecture that mitigates this susceptibility to high channel noise by incorporating domain knowledge into the design of the deep learning architecture. Using this architecture, we suggest a new class of non-linear feedback codes that increase robustness to forward/feedback noise in additive white Gaussian noise (AWGN) channels with feedback.
Data communications; beamforming; Device-to-Device (D2D); intelligent reflecting surface (IRS); Reconfigurable intelligent surface (RIS); Deep Reinforcement Learning (DRL); feedback systems; Channel coding; Recurrent Neural Network (RNN)
84	[en] A SIMULATION STUDY OF TRANSFER LEARNING IN DEEP REINFORCEMENT LEARNING FOR ROBOTICS / [pt] UM ESTUDO DE TRANSFER LEARNING EM DEEP REINFORCEMENT LEARNING EM AMBIENTES ROBÓTICOS SIMULADOS EVELYN CONCEICAO SANTOS BATISTA 05 August 2020 (has links) [en] This master's thesis presents an advanced study of visual deep reinforcement learning for autonomous robots through transfer learning techniques. The simulation environments tested in this study are complex, highly realistic environments in which the robot's challenge was to learn and transfer knowledge across different contexts, taking advantage of experience from previous environments in future ones. Besides adding knowledge to the autonomous robot, this type of approach reduces the number of training epochs required by the algorithm, even in complex environments, which justifies the use of transfer learning techniques.
[en] CONVOLUTIONAL NEURAL NETWORK; [en] COMPLEX ENVIRONMENTS; [en] DEEP REINFORCEMENT LEARNING; [en] AUTONOMOUS ROBOT; [en] TRANSFER LEARNING
85	[pt] APRENDIZADO POR REFORÇO PROFUNDO PARA CONTROLE DE TRAJETÓRIA DE UM QUADROTOR EM AMBIENTES VIRTUAIS / [en] DEEP REINFORCEMENT LEARNING FOR QUADROTOR TRAJECTORY CONTROL IN VIRTUAL ENVIRONMENTS GUILHERME SIQUEIRA EDUARDO 12 August 2021 (has links) [en] With recent advances in computational power, the use of novel, complex control models has become viable for controlling quadrotors. One such method is Deep Reinforcement Learning (DRL), which can devise a control policy that better addresses non-linearities in the quadrotor model than traditional control methods. Important non-linearities in payload-carrying aerial vehicles are their time-varying properties, such as size and mass, caused by the addition and removal of cargo. The general, domain-agnostic approach of the DRL controller also allows it to handle visual navigation, in which position estimation data is uncertain. In this work, we employ a Soft Actor-Critic algorithm to design controllers for a quadrotor to carry out tasks that reproduce these challenges in a virtual environment. First, we develop two waypoint guidance controllers: a low-level controller that acts directly on motor commands and a high-level controller that interacts in cascade with a velocity PID controller. The controllers are then evaluated on the proposed payload pickup and drop task, which introduces a time-varying variable. The resulting controllers are able to outperform a traditional positional PID controller with optimized gains on the proposed course, while remaining agnostic to a set of simulation parameters. Finally, we employ the same DRL algorithm to develop a controller that leverages visual data to complete a racing course in simulation. With this controller, the quadrotor is able to localize gates using an RGB-D camera and devise a trajectory that drives it through as many gates in the racing course as possible.
[en] UNMANNED AERIAL VEHICLE; [en] VISUAL NAVIGATION; [en] SOFT ACTOR-CRITIC-SAC; [en] DEEP REINFORCEMENT LEARNING; [en] QUADROTOR CONTROL
86	Simulating market maker behaviour using Deep Reinforcement Learning to understand market microstructure / En simulering av aktiemarknadens mikrostruktur via självlärande finansiella agenter Marcus, Elwin January 2018 (has links) Market microstructure studies the process of exchanging assets under explicit trading rules. With algorithmic trading and high-frequency trading, modern financial markets have seen profound changes in market microstructure in the last 5 to 10 years. As a result, previously established methods in the field of market microstructure, for example from econometrics, have often become faulty or insufficient. Machine learning and, in particular, reinforcement learning have become more ubiquitous in both finance and other fields, with applications in trading and optimal order execution. This thesis uses reinforcement learning to understand market microstructure by simulating a stock market based on NASDAQ Nordics and training market maker agents on this stock market. Simulations are run on both a dealer market and a limit order book market, which differentiates this work from previous studies. The DQN and PPO algorithms are applied to these simulated environments, where stochastic optimal control theory has mainly been used before. The market maker agents successfully reproduce stylized facts in historical trade data from each simulation, such as mean-reverting prices and the absence of linear autocorrelations in price changes, and they beat random policies employed on these markets with a positive profit & loss of up to 200%. Other trading dynamics of real-world markets have also been exhibited through the agents' interactions, mainly bid-ask spread clustering, optimal inventory management, declining spreads and independence of inventory and spreads, indicating that reinforcement learning with PPO and DQN is a relevant choice when modelling market microstructure.
Deep Reinforcement Learning; Machine Learning; Market Microstructure; Market Maker; Financial Agent; Agent Based Modelling; Financial Artificial Markets; Complex Systems; Algorithmic Trading; Tensorforce; keras-RL; PPO; DQN; Dealer Market; Limit Order book; Computer Sciences; Datavetenskap (datalogi)
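One of the stylized facts mentioned in the abstract above, the absence of linear autocorrelation in price changes, can be checked with the standard sample autocorrelation estimator. The sketch below is an illustration under stated assumptions, not code from the thesis: the function names are hypothetical, and it uses plain price differences rather than any particular return definition the author may have used.

```python
def price_changes(prices):
    """First differences of a price series: p[t+1] - p[t]."""
    return [b - a for a, b in zip(prices, prices[1:])]


def autocorrelation(xs, lag=1):
    """Sample autocorrelation of xs at the given lag (biased estimator).

    r_k = sum_i (x_i - m)(x_{i+k} - m) / sum_i (x_i - m)^2, with m the sample mean.
    Values near zero at all lags are consistent with no linear autocorrelation.
    """
    n = len(xs)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(xs)")
    mean = sum(xs) / n
    variance_sum = sum((x - mean) ** 2 for x in xs)
    if variance_sum == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance_sum = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return covariance_sum / variance_sum
```

A typical use would be `autocorrelation(price_changes(prices), lag=k)` for a few small values of `k`, applied to the simulated mid-price series.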
87	Job shop smart manufacturing scheduling by deep reinforcement learning for Industry 4.0 Serrano Ruiz, Julio César 24 January 2025 (has links) Thesis by compendium / [EN] The Industry 4.0 (I4.0) paradigm relies, to a large extent, on the potential of information and communication technologies (ICT) to improve the competitiveness and sustainability of industries. The smart manufacturing scheduling (SMS) concept arises and draws inspiration from this potential. As a digital transformation strategy, SMS aims to optimise industrial processes through the application of technologies such as the digital twin (DT), the zero-defect manufacturing (ZDM) management model and deep reinforcement learning (DRL), with the ultimate purpose of guiding operations scheduling processes towards real-time adaptive automation and reducing disturbances in production systems. SMS is based on four design principles of the I4.0 spectrum: automation, autonomy, real-time capability and interoperability. Based on these key principles, SMS combines the capabilities of DT technology to simulate, analyse and predict; of the ZDM model to prevent disturbances in production planning and control systems; and of the DRL modelling approach to improve real-time decision making.
This joint approach orients operations scheduling processes towards greater efficiency and, with it, a better performing and more resilient production system. This research first undertakes a comprehensive review of the state of the art on SMS. Taking the review as a reference, the research proposes a conceptual model of SMS as a digital transformation strategy in the context of the job shop scheduling process. Finally, it proposes a DRL-based model to address the implementation of the key elements of the conceptual model: the job shop DT and the scheduling agent. The algorithms that make up this model have been programmed in Python and validated against several of the best-known heuristic priority rules. The development of the model and algorithms is an academic and managerial contribution to the production planning and control area. / This thesis was developed with the support of the Research Centre on Production Management and Engineering (CIGIP) of the Universitat Politècnica de València and received funding from: the European Union H2020 programme under grant agreement No. 825631, "Zero Defect Manufacturing Platform (ZDMP)"; the European Union H2020 programme under grant agreement No. 872548, "Fostering DIHs for Embedding Interoperability in Cyber-Physical Systems of European SMEs (DIH4CPS)"; the European Union H2020 programme under grant agreement No. 958205, "Industrial Data Services for Quality Control in Smart Manufacturing (i4Q)"; the European Union Horizon Europe programme under grant agreement No. 101057294, "AI Driven Industrial Equipment Product Life Cycle Boosting Agility, Sustainability and Resilience" (AIDEAS); the Spanish Ministry of Science, Innovation and Universities under grant agreement RTI2018-101344-B-I00, "Optimisation of zero-defects production technologies enabling supply chains 4.0 (CADS4.0)"; the Valencian Regional Government, in turn funded from grant RTI2018-101344-B-I00 by MCIN/AEI/10.13039/501100011033 and by "ERDF A way of making Europe", "Industrial Production and Logistics optimization in Industry 4.0" (i4OPT) (Ref. PROMETEO/2021/065); and the grant PDC2022-133957-I00, "Validation of transferable results of optimisation of zero-defect enabling production technologies for supply chain 4.0" (CADS4.0-II) funded by MCIN/AEI/10.13039/501100011033 and by European Union NextGenerationEU/PRTR. / Serrano Ruiz, JC. (2024). Job shop smart manufacturing scheduling by deep reinforcement learning for Industry 4.0 [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/202871 / Compendium
Smart manufacturing; Smart scheduling; Smart manufacturing scheduling; Industry 4.0; Job shop; Deep reinforcement learning; Digital twin; Zero-defect manufacturing; ORGANIZACION DE EMPRESAS (business organisation)
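The abstract above states that the DRL-based scheduler was validated against well-known heuristic priority rules, without naming them here. As an illustration of what such a baseline looks like, the sketch below compares first-in-first-out with the shortest-processing-time (SPT) rule on a single machine. It is a simplified example under stated assumptions (one machine, all jobs available at time zero, made-up processing times); the function names are hypothetical and it is not the thesis's job shop model.

```python
from itertools import accumulate


def mean_flow_time(processing_times):
    """Mean completion time when jobs run back to back in the given order.

    Assumes a single machine and that every job is available at time zero.
    """
    completions = list(accumulate(processing_times))
    return sum(completions) / len(completions)


def spt_order(processing_times):
    """Shortest-processing-time priority rule: run the shortest waiting job first."""
    return sorted(processing_times)


# Hypothetical processing times, listed in arrival (FIFO) order.
queue = [5, 2, 8, 1]
fifo = mean_flow_time(queue)             # completions 5, 7, 15, 16
spt = mean_flow_time(spt_order(queue))   # completions 1, 3, 8, 16
print(f"FIFO mean flow time: {fifo}, SPT mean flow time: {spt}")
```

On a single machine SPT minimises mean flow time, which is why it is a common reference point; in a full job shop with several machines and routing constraints it is only a heuristic.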
88	Access Point Selection and Clustering Methods with Minimal Switching for Green Cell-Free Massive MIMO Networks He, Qinglong January 2022 (has links) As a novel beyond fifth-generation (5G) concept, cell-free massive MIMO (multiple-input multiple-output) recently has become a promising physical-layer technology where an enormous number of distributed access points (APs), coordinated by a central processing unit (CPU), cooperate to coherently serve a large number of user equipments (UEs) in the same time/frequency resource. However, denser AP deployment in cell-free networks as well as an exponentially growing number of mobile UEs lead to higher power consumption. What is more, similar to conventional cellular networks, cell-free massive MIMO networks are dimensioned to provide the required quality of service (QoS) to the UEs under heavy traffic load conditions, and thus they might be underutilized during low traffic load periods, leading to inefficient use of both spectral and energy resources. Aiming at the implementation of energy-efficient cell-free networks, several approaches have been proposed in the literature, which consider different AP switch ON/OFF (ASO) strategies for power minimization. Different from prior works, this thesis focuses on additional factors other than ASO that have an adverse effect not only on total power consumption but also on implementation complexity and operation cost. For instance, too frequent ON/OFF switching in an AP can lead to tapering off the potential power saving of ASO by incurring extra power consumption due to excessive switching. Indeed, frequent switching of APs might also result in thermal fatigue and serious lifetime degeneration. Moreover, time variations in the AP-UE association in favor of energy saving in a dynamic network bring additional signaling and implementation complexity. 
Thus, in the first part of the thesis, we propose a multi-objective optimization problem that aims to minimize the total power consumption together with AP switching and AP-UE association variations relative to the previous network state. The proposed problem is cast in mixed-integer quadratic programming form and solved optimally. Our simulation results show that by limiting AP switching (node switching) and AP-UE association reformation (link switching), the total power consumption of the APs increases only slightly, while the average number of switching events drops significantly for both node switching and link switching. This achieves a good balance in the trade-off between radio power consumption and the side effects of excessive switching. In the second part of the thesis, we consider a larger cell-free massive MIMO network by dividing the total area into disjoint network-centric clusters, where the APs in each cluster are connected to a separate CPU. In each cluster, cell-free joint transmission is locally implemented to achieve a scalable network implementation. Motivated by the outcomes of the first part, we reshape our dynamic network simulator to keep the active APs for a given spatial traffic pattern the same as long as the mean arrival rates of the UEs are constant. Moreover, the initially formed AP-UE association for a particular UE is not allowed to change. In that way, we reduce the number of node and link switching events to zero throughout the considered time interval. For this dynamic network, we propose a deep reinforcement learning (DRL) framework that learns the policy of maximizing long-term energy efficiency (EE) for a given spatially-varying traffic density. The active AP density of each network-centric cluster and the boundaries of the clusters are learned by the trained agent to maximize the EE.
The DRL algorithm is shown to learn a non-trivial joint cluster geometry and AP density with at least 7% improvement in terms of EE compared to the heuristically-developed benchmarks. / Som ett nytt koncept bortom den femte generationen (5G) har cellfri massiv MIMO (multiple input multiple output) nyligen blivit en lovande teknik för det fysiska lagret där ett enormt antal distribuerade åtkomstpunkter (AP), som samordnas av en central processorenhet (CPU), samarbetar för att på ett sammanhängande sätt betjäna ett stort antal användarutrustningar (UE) i samma tids- och frekvensresurs. En tätare utplacering av AP:er i cellfria nät samt ett exponentiellt växande antal mobila användare leder dock till högre energiförbrukning. Dessutom är cellfria massiva MIMO-nät, i likhet med konventionella cellulära nät, dimensionerade för att ge den erforderliga tjänstekvaliteten (QoS) till enheterna under förhållanden med hög trafikbelastning, och därför kan de vara underutnyttjade under perioder med låg trafikbelastning, vilket leder till ineffektiv användning av både spektral- och energiresurser. För att genomföra energieffektiva cellfria nät har flera metoder föreslagits i litteraturen, där olika ASO-strategier (AP switch ON/OFF) beaktas för att minimera energiförbrukningen. Till skillnad från tidigare arbeten fokuserar den här avhandlingen på andra faktorer än ASO som har en negativ effekt inte bara på den totala energiförbrukningen utan också på komplexiteten i genomförandet och driftskostnaden. Till exempel kan alltför frekventa ON/OFF-omkopplingar i en AP leda till att ASO:s potentiella energibesparingar avtar genom extra energiförbrukning på grund av överdriven omkoppling. Frekventa omkopplingar av AP:er kan också leda till termisk trötthet och allvarlig försämring av livslängden. Dessutom medför tidsvariationer i AP-UE-associationen till förmån för energibesparingar i ett dynamiskt nät ytterligare signalering och komplexitet i genomförandet. 
I den första delen av avhandlingen föreslår vi därför ett optimeringsproblem med flera mål som syftar till att minimera den totala energiförbrukningen tillsammans med växling av AP och variationer i AP-UE-associationen i jämförelse med nätets tillstånd i det föregående läget. Det föreslagna problemet formuleras som ett blandat kvadratiskt heltalsprogram och löses optimalt. Våra simuleringsresultat visar att genom att begränsa växling av AP (node switching) och växling av AP-UE-association (link switching) ökar den totala energiförbrukningen från AP:erna endast något, men det genomsnittliga antalet växlingar minskar avsevärt, oavsett om det rör sig om node switching eller link switching. Det ger en bra balans mellan radiokraftförbrukning och de bieffekter som överdriven växling medför. I den andra delen av avhandlingen tar vi hänsyn till ett större cellfritt massivt MIMO-nätverk genom att dela upp det totala området i disjunkta nätverkscentrerade kluster, där AP:erna i varje kluster är anslutna till en separat CPU. I varje kluster genomförs cellfri gemensam överföring lokalt för att uppnå en skalbar nätverksimplementering. Motiverat av resultaten i den första delen omformar vi vår dynamiska nätverkssimulator så att de aktiva AP:erna för ett givet rumsligt trafikmönster är desamma så länge som den genomsnittliga ankomsthastigheten för de enskilda enheterna är konstant. Dessutom tillåts inte den ursprungligen bildade AP-UE-associationen för en viss användare att förändras. På så sätt gör vi antalet nod- och länkbyten till noll under hela det aktuella tidsintervallet. För detta dynamiska nätverk föreslår vi ett ramverk för djup förstärkningsinlärning (DRL) som lär sig en strategi för att maximera energieffektiviteten på lång sikt för en given rumsligt varierande trafiktäthet. Den aktiva AP-tätheten i varje nätverkscentrerat kluster och klustrens gränser lärs av den utbildade agenten för att maximera EE. 
Det visas att DRL-algoritmen lär sig en icke-trivial gemensam klustergeometri och AP-täthet med minst 7% förbättring av EE jämfört med de heuristiskt utvecklade riktmärkena. Cell-free massive MIMO multi-objective optimization deep reinforcement learning AP switch ON/OFF energy efficiency Cellfri massiv MIMO multiobjektiv optimering djup förstärkningsinlärning AP switch ON/OFF energieffektivitet Computer and Information Sciences Data- och informationsvetenskap
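The trade-off described in the first part of the abstract above can be illustrated with a small scoring function. This is only a minimal sketch, assuming a simple weighted sum of active-AP power, node switches (APs toggled ON/OFF), and link switches (AP-UE associations changed) relative to the previous network state. The function names, the weights `w_node` and `w_link`, and the example numbers are hypothetical; the thesis itself formulates and optimally solves a mixed integer quadratic program, which this sketch does not reproduce.

```python
from typing import Sequence


def count_changes(prev: Sequence[int], curr: Sequence[int]) -> int:
    """Count positions where two equal-length 0/1 vectors differ."""
    if len(prev) != len(curr):
        raise ValueError("vectors must have equal length")
    return sum(p != c for p, c in zip(prev, curr))


def switching_aware_cost(
    power_w: Sequence[float],          # assumed per-AP power when ON (watts)
    ap_prev: Sequence[int],            # previous AP ON/OFF state (1 = ON)
    ap_curr: Sequence[int],            # candidate AP ON/OFF state
    link_prev: Sequence[Sequence[int]],  # previous AP-UE association matrix
    link_curr: Sequence[Sequence[int]],  # candidate AP-UE association matrix
    w_node: float = 1.0,               # hypothetical penalty per node switch
    w_link: float = 1.0,               # hypothetical penalty per link switch
) -> float:
    """Weighted sum of active power and switching counts (illustrative only)."""
    active_power = sum(p for p, on in zip(power_w, ap_curr) if on)
    node_switches = count_changes(ap_prev, ap_curr)
    link_switches = sum(
        count_changes(row_prev, row_curr)
        for row_prev, row_curr in zip(link_prev, link_curr)
    )
    return active_power + w_node * node_switches + w_link * link_switches


if __name__ == "__main__":
    cost = switching_aware_cost(
        power_w=[10.0, 10.0, 10.0],
        ap_prev=[1, 1, 0],
        ap_curr=[1, 0, 1],
        link_prev=[[1, 0], [0, 1], [0, 0]],
        link_curr=[[1, 1], [0, 0], [0, 0]],
        w_node=2.0,
        w_link=0.5,
    )
    print(cost)
```

Raising `w_node` or `w_link` makes a candidate state that differs from the previous one more expensive, which mirrors the abstract's observation that limiting switching costs only a little extra power.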
89	[en] ENABLING AUTONOMOUS DATA ANNOTATION: A HUMAN-IN-THE-LOOP REINFORCEMENT LEARNING APPROACH / [pt] HABILITANDO ANOTAÇÕES DE DADOS AUTÔNOMOS: UMA ABORDAGEM DE APRENDIZADO POR REFORÇO COM HUMANO NO LOOP LEONARDO CARDIA DA CRUZ 10 November 2022 (has links) [pt] As técnicas de aprendizado profundo têm mostrado contribuições significativas em vários campos, incluindo a análise de imagens. A grande maioria dos trabalhos em visão computacional concentra-se em propor e aplicar novos modelos e algoritmos de aprendizado de máquina. Para tarefas de aprendizado supervisionado, o desempenho dessas técnicas depende de uma grande quantidade de dados de treinamento, bem como de dados rotulados. No entanto, a rotulagem é um processo caro e demorado. Uma recente área de exploração são as reduções dos esforços na preparação de dados, deixando-os sem inconsistências, ruídos, para que os modelos atuais possam obter um maior desempenho. Esse novo campo de estudo é chamado de Data-Centric IA. Apresentamos uma nova abordagem baseada em Deep Reinforcement Learning (DRL), cujo trabalho é voltado para a preparação de um conjunto de dados em problemas de detecção de objetos, onde as anotações de caixas delimitadoras são feitas de modo autônomo e econômico. Nossa abordagem consiste na criação de uma metodologia para treinamento de um agente virtual a fim de rotular automaticamente os dados, a partir do auxílio humano como professor desse agente. Implementamos o algoritmo Deep Q-Network para criar o agente virtual e desenvolvemos uma abordagem de aconselhamento para facilitar a comunicação do humano professor com o agente virtual estudante. Para completar nossa implementação, utilizamos o método de aprendizado ativo para selecionar casos onde o agente possui uma maior incerteza, necessitando da intervenção humana no processo de anotação durante o treinamento. 
Nossa abordagem foi avaliada e comparada com outros métodos de aprendizado por reforço e interação humano-computador, em diversos conjuntos de dados, onde o agente virtual precisou criar novas anotações na forma de caixas delimitadoras. Os resultados mostram que o emprego da nossa metodologia impacta positivamente para obtenção de novas anotações a partir de um conjunto de dados com rótulos escassos, superando métodos existentes. Desse modo, apresentamos a contribuição no campo de Data-Centric IA, com o desenvolvimento de uma metodologia de ensino para criação de uma abordagem autônoma com aconselhamento humano para criar anotações econômicas a partir de anotações escassas. / [en] Deep learning techniques have made significant contributions in various fields, including image analysis. The vast majority of work in computer vision focuses on proposing and applying new machine learning models and algorithms. For supervised learning tasks, the performance of these techniques depends on a large amount of training data as well as labeled data. However, labeling is an expensive and time-consuming process. A recent area of exploration is reducing the effort of data preparation, which removes inconsistencies and noise from the data so that current models can achieve better performance. This new field of study is called Data-Centric AI. We present a new approach based on Deep Reinforcement Learning (DRL) for preparing datasets for object detection problems, in which bounding box annotations are produced autonomously and economically. Our approach consists of creating a methodology for training a virtual agent to label the data automatically, with a human acting as the agent's teacher. We implemented the Deep Q-Network algorithm to create the virtual agent and developed an advice-giving approach to facilitate communication between the human teacher and the virtual agent student. 
To complete our implementation, we used active learning to select cases where the agent is more uncertain and human intervention is required in the annotation process during training. Our approach was evaluated and compared with other reinforcement learning and human-computer interaction methods on several datasets, where the virtual agent had to create new annotations in the form of bounding boxes. The results show that our methodology has a positive impact on obtaining new annotations from a dataset with scarce labels, surpassing existing methods. We thus contribute to the field of Data-Centric AI a teaching methodology for building an autonomous, human-advised approach that creates economical annotations from scarce ones. [pt] APRENDIZADO POR REFORCO PROFUNDO [pt] ANOTACOES [pt] AGENTE VIRTUAL [pt] DEEP Q-NETWORK [pt] ACONSELHAMENTO [pt] CONJUNTO DE DADOS [pt] CAIXA DELIMITADORA [en] DEEP REINFORCEMENT LEARNING [en] ANNOTATIONS [en] VIRTUAL AGENT [en] DEEP Q-NETWORK [en] ADVICES [en] DATASET [en] BOUNDING BOX DATASETS
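The active learning step mentioned in the abstract above, asking the human teacher only about cases where the agent is uncertain, can be sketched as follows. The abstract does not say how uncertainty is measured. As an assumption, this sketch uses the gap between the two highest Q-values for a sample (a small gap means the agent barely prefers one action over another). The function names, the sample identifiers, and the `budget` parameter are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Sequence


def q_margin(q_values: Sequence[float]) -> float:
    """Gap between the best and second-best Q-value (assumed uncertainty proxy)."""
    if len(q_values) < 2:
        raise ValueError("need at least two actions to compute a margin")
    top, second = sorted(q_values, reverse=True)[:2]
    return top - second


def select_for_human(q_by_sample: Dict[str, Sequence[float]], budget: int) -> List[str]:
    """Return up to `budget` sample ids with the smallest margin, most uncertain first."""
    ranked = sorted(q_by_sample, key=lambda sample_id: q_margin(q_by_sample[sample_id]))
    return ranked[:budget]


if __name__ == "__main__":
    # Hypothetical Q-values for three images, one value per candidate action.
    q_values = {
        "img_a": [0.90, 0.10, 0.00],
        "img_b": [0.50, 0.48, 0.10],
        "img_c": [0.30, 0.20, 0.25],
    }
    print(select_for_human(q_values, budget=2))
```

Other uncertainty measures, such as the entropy of a softmax over the Q-values, would fit the same selection loop; the margin is used here only because it is easy to inspect by hand.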
90	Real-time adaptation of robotic knees using reinforcement control Daníel Sigurðarson, Leifur January 2023 (has links) Microprocessor-controlled knees (MPKs) allow amputees to walk with increasing ease and safety as technology progresses. When an amputee is fitted with a new MPK, the knee’s internal parameters are tuned to the user’s preferred settings in a controlled environment. These parameters determine various gait control settings, such as flexion target angle or swing extension resistance. Though these parameters may work well during the initial fitting, the MPK experiences various internal and external environmental changes throughout its life cycle, such as product wear, changes in the amputee’s muscle strength, and temperature changes. This work investigates the feasibility of using reinforcement learning (RL) control to adapt the MPK’s swing resistance so that it consistently induces the amputee’s preferred swing performance in real time. Three gait features were identified as swing performance indicators for the RL algorithm. Results show that the RL control is able to learn and improve its tuning performance in terms of Mean Absolute Error over two 40-45 minute training sessions with a human-in-the-loop. Additionally, results show promise in using transfer learning to reduce strenuous RL training times. / Mikroprocessorkontrollerade knän (MPK) gör att amputerade kan utföra fysiska aktiviteter med ökad lätthet och säkerhet allt eftersom tekniken fortskrider. När en ny MPK monteras på en amputerad person, anpassas knäets interna parametrar till användarens föredragna inställningar i en kontrollerad miljö. Dessa parametrar styr olika gångkontrollinställningar, såsom flexionsmålvinkel eller svängförlängningsmotstånd. Även om parametrarna kan fungera bra under den initiala anpassningen, upplever MPK:n olika interna och yttre miljöförändringar under hela sin livscykel, till exempel produktslitage, förändringar i den amputerades muskelstyrka, temperaturförändringar, etc. 
Detta arbete undersöker möjligheten av, med hjälp av en förstärkningsinlärningskontroll (RL), att anpassa MPK svängmotstånd för att konsekvent inducera den amputerades föredragna svängprestanda i realtid. Tre gångegenskaper identifierades som svingprestandaindikatorer för RL-algoritmen. Resultaten visar att RL-kontrollen kan lära sig och förbättra sin inställningsprestanda i termer av Mean Absolute Error under två 40-45 minuters träningspass med en människa-i-loopen. Dessutom är resultaten lovande när det gäller att använda överföringsinlärning för att minska ansträngande RL-träningstider. Machine learning deep reinforcement learning transfer learning medical device prosthetic prosthesis controls human-in-the-loop Maskininlärning djup förstärkningsinlärning överföringsinlärning medicinsk utrustning protes kontroller människa-i-loopen Computer and Information Sciences Data- och informationsvetenskap
