91

[en] A SIMULATION STUDY OF TRANSFER LEARNING IN DEEP REINFORCEMENT LEARNING FOR ROBOTICS / [pt] UM ESTUDO DE TRANSFER LEARNING EM DEEP REINFORCEMENT LEARNING EM AMBIENTES ROBÓTICOS SIMULADOS

EVELYN CONCEICAO SANTOS BATISTA 05 August 2020 (has links)
[en] This master's thesis is an advanced study of visual deep reinforcement learning for autonomous robots using transfer learning techniques. The simulation environments tested in this study are complex, highly realistic environments in which the robot was challenged to learn and transfer knowledge across different contexts, so that experience from previous environments could be exploited in future ones. Besides adding knowledge to the autonomous robot, this type of approach reduces the number of training epochs of the algorithm, even in complex environments, justifying the use of transfer learning techniques.
92

[pt] APRENDIZADO POR REFORÇO PROFUNDO PARA CONTROLE DE TRAJETÓRIA DE UM QUADROTOR EM AMBIENTES VIRTUAIS / [en] DEEP REINFORCEMENT LEARNING FOR QUADROTOR TRAJECTORY CONTROL IN VIRTUAL ENVIRONMENTS

GUILHERME SIQUEIRA EDUARDO 12 August 2021 (has links)
[en] With recent advances in computational power, the use of novel, complex control models has become viable for controlling quadrotors. One such method is Deep Reinforcement Learning (DRL), which can devise a control policy that better addresses non-linearities in the quadrotor model than traditional control methods. An important non-linearity in payload-carrying air vehicles is the time-varying nature of properties such as size and mass, caused by the addition and removal of cargo. The general, domain-agnostic approach of a DRL controller also allows it to handle visual navigation, in which position estimation data is unreliable. In this work, we employ a Soft Actor-Critic algorithm to design controllers for a quadrotor to carry out tasks reproducing these challenges in a virtual environment. First, we develop two waypoint guidance controllers: a low-level controller that acts directly on motor commands and a high-level controller that interacts in cascade with a velocity PID controller. The controllers are then evaluated on the proposed payload pickup and drop task, which introduces a time-varying variable. The controllers conceived are able to outperform a traditional positional PID controller with optimized gains on the proposed course, while remaining agnostic to a set of simulation parameters. Finally, we employ the same DRL algorithm to develop a controller that leverages visual data to complete a racing course in simulation. With this controller, the quadrotor is able to localize gates using an RGB-D camera and devise a trajectory that drives it to traverse as many gates in the racing course as possible.
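
The cascaded design described above lends itself to a compact illustration. Below is a minimal Python sketch of the high-level variant, assuming a trained policy (e.g. a Soft Actor-Critic actor) that maps observations to a velocity setpoint which an inner PID loop turns into a thrust command; all names and gains are illustrative, not the thesis's actual implementation.

```python
class VelocityPID:
    """Inner-loop PID tracking a commanded velocity; gains are illustrative."""
    def __init__(self, kp=1.0, ki=0.1, kd=0.05, dt=0.02):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, velocity_cmd, velocity_meas):
        error = velocity_cmd - velocity_meas
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def cascade_step(policy, obs, velocity_meas, pid):
    # The high-level DRL policy outputs a velocity setpoint; the inner PID
    # loop converts it into a low-level (thrust) command.
    velocity_cmd = policy(obs)  # assumption: policy maps obs -> velocity setpoint
    return pid.step(velocity_cmd, velocity_meas)
```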
93

Simulating market maker behaviour using Deep Reinforcement Learning to understand market microstructure / En simulering av aktiemarknadens mikrostruktur via självlärande finansiella agenter

Marcus, Elwin January 2018 (has links)
Market microstructure studies the process of exchanging assets under explicit trading rules. With algorithmic trading and high-frequency trading, modern financial markets have seen profound changes in market microstructure in the last 5 to 10 years. As a result, previously established methods in the field of market microstructure often become faulty or insufficient. Machine learning, and in particular reinforcement learning, has become more ubiquitous in both finance and other fields, with applications in trading and optimal execution. This thesis uses reinforcement learning to understand market microstructure by simulating a stock market based on NASDAQ Nordics and training market maker agents on this stock market. Simulations are run on both a dealer market and a limit order book market, differentiating this work from previous studies, where stochastic optimal control theory has mainly been used; here, DQN and PPO algorithms are applied to the simulated environments. The market maker agents successfully reproduce stylized facts of historical trade data in each simulation, such as mean-reverting prices and the absence of linear autocorrelation in price changes, and beat random policies employed on these markets with a positive profit and loss of up to 200%. Other trading dynamics seen in real-world markets are also exhibited through the agents' interactions, mainly bid-ask spread clustering, optimal inventory management, declining spreads, and independence of inventory and spreads, indicating that reinforcement learning with PPO and DQN are relevant choices when modelling market microstructure.
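
One of the stylized facts mentioned, the absence of linear autocorrelation in price changes, is straightforward to test on simulated trade data. A minimal sketch, assuming only NumPy and a synthetic mean-reverting price path rather than the thesis's actual simulator:

```python
import numpy as np

def lag1_autocorrelation(prices):
    """Lag-1 autocorrelation of price changes; near zero is the stylized fact."""
    changes = np.diff(prices)
    c = changes - changes.mean()
    return np.dot(c[:-1], c[1:]) / np.dot(c, c)

# Illustrative check on an Ornstein-Uhlenbeck (mean-reverting) price path.
rng = np.random.default_rng(0)
price, theta, mu, sigma = 100.0, 0.05, 100.0, 0.5
path = []
for _ in range(10_000):
    price += theta * (mu - price) + sigma * rng.standard_normal()
    path.append(price)

print(lag1_autocorrelation(np.array(path)))  # small in magnitude for this path
```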
94

Job shop smart manufacturing scheduling by deep reinforcement learning for Industry 4.0

Serrano Ruiz, Julio César 24 January 2025 (has links)
Thesis by compendium / [EN] The Industry 4.0 (I4.0) paradigm relies, to a large extent, on the potential of information and communication technologies (ICT) to improve the competitiveness and sustainability of industries. The smart manufacturing scheduling (SMS) concept arises and draws inspiration from this potential. As a digital transformation strategy, SMS aims to optimise industrial processes through the application of technologies such as the digital twin (DT), the zero-defect manufacturing (ZDM) management model and deep reinforcement learning (DRL), with the ultimate purpose of guiding operations scheduling processes towards real-time adaptive automation and reduced disturbances in production systems. SMS is based on four design principles of the I4.0 spectrum: automation, autonomy, real-time capability and interoperability. Building on these key principles, SMS combines the capabilities of DT technology to simulate, analyse and predict; of the ZDM model to prevent disturbances in production planning and control systems; and of the DRL modelling approach to improve real-time decision making. This joint approach orients operations scheduling processes towards greater efficiency and, with it, a better performing and more resilient production system. This research firstly undertakes a comprehensive review of the state of the art on SMS. Taking the review as a reference, it then proposes a conceptual model of SMS as a digital transformation strategy in the context of the job shop scheduling process. Finally, it proposes a DRL-based model to address the implementation of the key elements of the conceptual model: the job shop DT and the scheduling agent. The algorithms that make up this model have been programmed in Python and validated against several of the most well-known heuristic priority rules. The development of the model and algorithms is an academic and managerial contribution in the production planning and control area. / This thesis was developed with the support of the Research Centre on Production Management and Engineering (CIGIP) of the Universitat Politècnica de València and received funding from: the European Union H2020 programme under grant agreement No. 825631, “Zero Defect Manufacturing Platform (ZDMP)”; the European Union H2020 programme under grant agreement No. 872548, "Fostering DIHs for Embedding Interoperability in Cyber-Physical Systems of European SMEs (DIH4CPS)"; the European Union H2020 programme under grant agreement No. 958205, “Industrial Data Services for Quality Control in Smart Manufacturing (i4Q)”; the European Union Horizon Europe programme under grant agreement No. 101057294, “AI Driven Industrial Equipment Product Life Cycle Boosting Agility, Sustainability and Resilience” (AIDEAS); the Spanish Ministry of Science, Innovation and Universities under grant agreement RTI2018-101344-B-I00, "Optimisation of zero-defects production technologies enabling supply chains 4.0 (CADS4.0)"; the Valencian Regional Government, in turn funded from grant RTI2018-101344-B-I00 by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”, "Industrial Production and Logistics optimization in Industry 4.0" (i4OPT) (Ref. PROMETEO/2021/065); and the grant PDC2022-133957-I00, “Validation of transferable results of optimisation of zero-defect enabling production technologies for supply chain 4.0” (CADS4.0-II) funded by MCIN/AEI/10.13039/501100011033 and by European Union Next GenerationEU/PRTR. / Serrano Ruiz, JC. (2024). Job shop smart manufacturing scheduling by deep reinforcement learning for Industry 4.0 [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/202871
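
As a reference point for the kind of heuristic priority rules such DRL schedulers are commonly validated against, here is a minimal Python sketch of two of the best-known dispatching rules; the job representation is hypothetical, not taken from the thesis:

```python
def spt(queue):
    """Shortest Processing Time: pick the job whose next operation is shortest."""
    return min(queue, key=lambda job: job["proc_time"])

def fifo(queue):
    """First In, First Out: pick the job that arrived earliest."""
    return min(queue, key=lambda job: job["arrival"])

queue = [
    {"id": "J1", "proc_time": 5, "arrival": 0},
    {"id": "J2", "proc_time": 2, "arrival": 1},
]
assert spt(queue)["id"] == "J2"   # shortest operation first
assert fifo(queue)["id"] == "J1"  # earliest arrival first
```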
95

Access Point Selection and Clustering Methods with Minimal Switching for Green Cell-Free Massive MIMO Networks

He, Qinglong January 2022 (has links)
As a novel beyond-fifth-generation (5G) concept, cell-free massive MIMO (multiple-input multiple-output) has recently become a promising physical-layer technology in which an enormous number of distributed access points (APs), coordinated by a central processing unit (CPU), cooperate to coherently serve a large number of user equipments (UEs) in the same time/frequency resource. However, denser AP deployment in cell-free networks, together with an exponentially growing number of mobile UEs, leads to higher power consumption. Moreover, like conventional cellular networks, cell-free massive MIMO networks are dimensioned to provide the required quality of service (QoS) to the UEs under heavy traffic load, and thus may be underutilized during low-traffic periods, leading to inefficient use of both spectral and energy resources. Aiming at energy-efficient cell-free networks, several approaches proposed in the literature consider different AP switch ON/OFF (ASO) strategies for power minimization. Unlike prior works, this thesis focuses on factors beyond ASO that adversely affect not only total power consumption but also implementation complexity and operating cost. For instance, too-frequent ON/OFF switching of an AP can erode the potential power saving of ASO by incurring extra power consumption due to excessive switching; frequent switching of APs can also cause thermal fatigue and serious lifetime degradation. Moreover, time variations in the AP-UE association in favour of energy saving in a dynamic network bring additional signalling and implementation complexity. Thus, in the first part of the thesis, we propose a multi-objective optimization problem that minimizes the total power consumption together with AP switching and AP-UE association variations relative to the network state in the previous period. The proposed problem is cast in mixed-integer quadratic programming form and solved optimally. Our simulation results show that, by limiting AP switching (node switching) and AP-UE association reformation (link switching), the total power consumption of the APs increases only slightly while the average number of switchings drops significantly, for both node and link switching. This strikes a good balance between radio power consumption and the side effects of excessive switching. In the second part of the thesis, we consider a larger cell-free massive MIMO network by dividing the total area into disjoint network-centric clusters, where the APs in each cluster are connected to a separate CPU. In each cluster, cell-free joint transmission is implemented locally to achieve a scalable network implementation. Motivated by the outcomes of the first part, we reshape our dynamic network simulator to keep the active APs for a given spatial traffic pattern fixed as long as the mean arrival rates of the UEs are constant; the initially formed AP-UE association for a particular UE is also not allowed to change. In this way, the number of node and link switchings is zero throughout the considered time interval. For this dynamic network, we propose a deep reinforcement learning (DRL) framework that learns a policy maximizing long-term energy efficiency (EE) for a given spatially-varying traffic density. The active AP density of each network-centric cluster and the boundaries of the clusters are learned by the trained agent to maximize the EE. The DRL algorithm is shown to learn a non-trivial joint cluster geometry and AP density with at least 7% improvement in EE compared to heuristically-developed benchmarks.
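
The first part's multi-objective formulation can be illustrated compactly. The sketch below scores one decision period as total AP power plus penalties for node and link switching; the weights and the simplified linear power model are illustrative assumptions (the actual problem is a mixed-integer quadratic program solved optimally):

```python
import numpy as np

def period_cost(ap_power, on_prev, on_next, assoc_prev, assoc_next,
                w_node=1.0, w_link=0.5):
    """Cost of one decision period: power of active APs plus penalties for
    node switching (APs toggled ON/OFF) and link switching (reformed AP-UE
    associations)."""
    power = float(ap_power @ on_next)                       # power drawn by active APs
    node_switches = int(np.sum(on_prev != on_next))         # APs toggled since last period
    link_switches = int(np.sum(assoc_prev != assoc_next))   # AP-UE links changed
    return power + w_node * node_switches + w_link * link_switches

ap_power = np.array([10.0, 10.0, 12.0])                       # per-AP active power
on_prev, on_next = np.array([1, 1, 0]), np.array([1, 0, 1])   # ON/OFF states
assoc_prev, assoc_next = np.array([0, 0, 1]), np.array([0, 2, 1])  # serving AP per UE
print(period_cost(ap_power, on_prev, on_next, assoc_prev, assoc_next))  # 24.5
```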
96

[en] ENABLING AUTONOMOUS DATA ANNOTATION: A HUMAN-IN-THE-LOOP REINFORCEMENT LEARNING APPROACH / [pt] HABILITANDO ANOTAÇÕES DE DADOS AUTÔNOMOS: UMA ABORDAGEM DE APRENDIZADO POR REFORÇO COM HUMANO NO LOOP

LEONARDO CARDIA DA CRUZ 10 November 2022 (has links)
[en] Deep learning techniques have shown significant contributions in various fields, including image analysis. The vast majority of work in computer vision focuses on proposing and applying new machine learning models and algorithms. For supervised learning tasks, the performance of these techniques depends on a large amount of labeled training data. However, labeling is an expensive and time-consuming process. A recent area of exploration is the reduction of effort in data preparation, leaving the data free of inconsistencies and noise so that current models can achieve greater performance. This new field of study is called Data-Centric AI. We present a new approach based on Deep Reinforcement Learning (DRL), focused on preparing a dataset for object detection problems, in which the bounding-box annotations are produced autonomously and economically. Our approach consists of a methodology for training a virtual agent to label the data automatically, with a human acting as the agent's teacher. We implemented the Deep Q-Network algorithm to create the virtual agent and developed an advising approach to facilitate communication between the human teacher and the virtual agent student. To complete our implementation, we used active learning to select the cases where the agent has the greatest uncertainty, requiring human intervention in the annotation process during training. Our approach was evaluated and compared with other reinforcement learning and human-computer interaction methods on several datasets, where the virtual agent had to create new annotations in the form of bounding boxes. The results show that our methodology has a positive impact on obtaining new annotations from a dataset with scarce labels, surpassing existing methods. In this way, we present a contribution to the field of Data-Centric AI: a teaching methodology for creating an autonomous, human-advised approach that produces economical annotations from scarce ones.
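
The active-learning step described above reduces to a simple uncertainty gate. A minimal sketch, assuming the agent's uncertainty is read from the margin between its top two Q-values; the thesis does not specify this exact criterion, and the margin threshold is an assumed hyperparameter:

```python
import numpy as np

def needs_human(q_values, margin=0.1):
    """When the agent's top two Q-values are close, the action choice is
    uncertain: defer the annotation to the human teacher."""
    top2 = np.sort(q_values)[-2:]
    return (top2[1] - top2[0]) < margin

q = np.array([0.12, 0.55, 0.51])  # hypothetical Q-values for annotation actions
print(needs_human(q))             # True: agent is uncertain, ask the teacher
```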
97

Real-time adaptation of robotic knees using reinforcement control

Daníel Sigurðarson, Leifur January 2023 (has links)
Microprocessor-controlled knees (MPKs) allow amputees to walk with increasing ease and safety as technology progresses. When an amputee is fitted with a new MPK, the knee's internal parameters are tuned to the user's preferred settings in a controlled environment. These parameters determine various gait control settings, such as the flexion target angle or swing extension resistance. Though these parameters may work well during the initial fitting, the MPK experiences various internal and external environmental changes throughout its life-cycle, such as product wear, changes in the amputee's muscle strength, temperature changes, etc. This work investigates the feasibility of using reinforcement learning (RL) control to adapt the MPK's swing resistance so as to consistently induce the amputee's preferred swing performance in real time. Three gait features were identified as swing performance indicators for the RL algorithm. Results show that the RL control is able to learn and improve its tuning performance in terms of Mean Absolute Error over two 40-45 minute training sessions with a human in the loop. Additionally, the results show promise in using transfer learning to reduce strenuous RL training times.
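
The tuning objective can be made concrete with a short sketch: reward the controller by the negative Mean Absolute Error between measured gait features and the user's preferred values. The feature names and numbers below are assumptions for illustration only; the thesis identifies its own three indicators:

```python
import numpy as np

def swing_reward(features, targets):
    """Negative Mean Absolute Error between the measured gait features and
    the user's preferred values; higher (closer to zero) is better."""
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return -np.mean(np.abs(features - targets))

# Hypothetical indicators: [peak flexion angle (deg), swing time (s), impact]
preferred = [62.0, 0.45, 0.10]
measured = [64.5, 0.50, 0.12]
print(swing_reward(measured, preferred))
```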
98

Data Harvesting and Path Planning in UAV-aided Internet-of-Things Wireless Networks with Reinforcement Learning : KTH Thesis Report / Datainsamling och vägplanering i UAV-stödda Internet-of-Things trådlösa nätverk med förstärkningsinlärning : KTH Examensrapport

Zhang, Yuming January 2023 (has links)
In recent years, unmanned aerial vehicles (UAVs) have developed rapidly due to advances in aerospace technology and wireless communication systems. As a result of their versatility, cost-effectiveness, and flexibility of deployment, UAVs have been developed to accomplish a variety of large and complex tasks without terrain restrictions, such as battlefield operations, search and rescue under disaster conditions, monitoring, etc. Data collection and offloading missions in Internet of Things (IoT) networks can be accomplished by using UAVs as network edge nodes. The fundamental challenge in such scenarios is to develop a UAV movement policy that enhances the quality of mission completion and avoids collisions. Real-time learning based on neural networks has proven to be an effective method for solving decision-making problems in a dynamic, unknown environment. In this thesis, we assume a real-life scenario in which a UAV collects data from ground base stations (GBSs) without knowing any information about the environment. The UAV is responsible for a multi-objective optimization (MOO) problem that includes collecting data, avoiding obstacles, path planning, and conserving energy. Two deep reinforcement learning (DRL) approaches were implemented and compared in this thesis.
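
The multi-objective mission can be summarized in a one-step reward of the kind commonly used in such DRL formulations. A minimal sketch with illustrative weights; the thesis's exact reward shaping is not given here:

```python
def uav_reward(data_bits, collided, energy_used,
               w_data=1.0, w_collision=100.0, w_energy=0.1):
    """One-step reward for the multi-objective mission: reward harvested
    data, penalize collisions and energy expenditure. Weights are assumed."""
    return (w_data * data_bits
            - w_collision * float(collided)
            - w_energy * energy_used)

print(uav_reward(data_bits=5.0, collided=False, energy_used=2.0))  # 4.8
```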
99

Stabilizing Q-Learning for continuous control

Hui, David Yu-Tung 12 1900 (has links)
Deep Reinforcement Learning has produced decision makers that play Chess, Go, Shogi, Atari, and Starcraft with superhuman ability. However, unlike animals and humans, these algorithms struggle to navigate and control physical environments. Manipulating the physical world requires controlling continuous action spaces such as position, velocity, and acceleration, unlike the discrete action spaces of board and video games. Training deep neural networks for continuous control is unstable: agents struggle to learn and retain good behaviors, performance has high variance across hyperparameters, random seeds, and even multiple runs of the same task, and algorithms struggle to perform well outside the domains they were developed in. This thesis finds principles behind the success of deep neural networks in other learning paradigms and examines their impact on reinforcement learning for continuous control. Chapter 1 explains how the maximum-entropy principle produces supervised and unsupervised learning loss functions and derives some regularizers used to stabilize deep networks from the training dynamics of deep learning. Chapter 2 provides a maximum-entropy justification for the form of actor-critic algorithms and finds a configuration of an actor-critic algorithm that trains most stably. Finally, Chapter 3 considers the training dynamics of deep reinforcement learning to propose two improvements to target and twin networks that improve stability and convergence. Experiments are performed within the DeepMind Control, MuJoCo, and Box2D ideal-physics simulators.
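
For context, the two mechanisms Chapter 3 builds on are standard in continuous-control DRL: Polyak-averaged target networks and clipped twin Q-targets (as in TD3 and SAC). A minimal PyTorch sketch of both in their conventional form, not of the thesis's proposed variants:

```python
import torch

def soft_update(target_net, online_net, tau=0.005):
    """Polyak averaging: theta_target <- tau*theta + (1 - tau)*theta_target."""
    for tp, p in zip(target_net.parameters(), online_net.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * p.data)

def twin_q_target(q1_target, q2_target, next_state, next_action,
                  reward, done, gamma=0.99):
    """Bellman target using the minimum of two target critics, which curbs
    the overestimation bias of a single critic."""
    with torch.no_grad():
        q_next = torch.min(q1_target(next_state, next_action),
                           q2_target(next_state, next_action))
        return reward + gamma * (1.0 - done) * q_next
```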
100

Reinforcement learning applied to the real world : uncertainty, sample efficiency, and multi-agent coordination

Mai, Vincent 12 1900 (has links)
The immense potential of deep reinforcement learning (DRL) approaches for building autonomous agents has been proven repeatedly in the last decade. Its application to embodied agents, such as robots or automated power systems, however, faces several challenges. Among them, sample inefficiency, combined with the cost and risk of gathering experience in the real world, can deter any attempt to train embodied agents. In this thesis, I focus on the application of DRL to embodied agents. I first propose a probabilistic framework to improve sample efficiency in DRL. In the first article, I present batch inverse-variance (BIV) weighting, a loss function accounting for label noise variance in heteroscedastic noisy regression. BIV is a key element of the second article, where it is combined with state-of-the-art uncertainty prediction methods for deep neural networks in a Bayesian pipeline for temporal-difference DRL algorithms. This approach, named inverse-variance reinforcement learning (IV-RL), leads to significantly faster training as well as better performance in control tasks. In the third article, multi-agent reinforcement learning (MARL) is applied to the problem of fast-timescale demand response, a promising approach to managing the introduction of intermittent renewable energy sources in power grids. As MARL agents control the coordination of multiple air conditioners, they achieve significantly better performance than rule-based approaches. These results underline the potential role that DRL-trained embodied agents could play in the energy transition and the fight against global warming.
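
The BIV idea from the first article can be sketched in a few lines: weight each sample's squared error by the inverse of its label-noise variance, with a stabilizing floor. A minimal PyTorch sketch; the eps floor and the normalization details are assumptions rather than the article's exact formulation:

```python
import torch

def biv_weighted_mse(pred, target, noise_var, eps=1e-2):
    """Inverse-variance weighted MSE for heteroscedastic regression: samples
    with noisier labels contribute less to the loss."""
    weights = 1.0 / (noise_var + eps)
    weights = weights / weights.sum()  # normalize over the batch
    return (weights * (pred - target) ** 2).sum()

pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.1, 1.8, 3.5])
noise_var = torch.tensor([0.01, 0.01, 1.0])  # third label is much noisier
print(biv_weighted_mse(pred, target, noise_var))
```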
