101

Data Harvesting and Path Planning in UAV-aided Internet-of-Things Wireless Networks with Reinforcement Learning : KTH Thesis Report / Datainsamling och vägplanering i UAV-stödda Internet-of-Things trådlösa nätverk med förstärkningsinlärning : KTH Examensrapport

Zhang, Yuming January 2023 (has links)
In recent years, unmanned aerial vehicles (UAVs) have developed rapidly due to advances in aerospace technology and wireless communication systems. As a result of their versatility, cost-effectiveness, and flexibility of deployment, UAVs have been developed to accomplish a variety of large and complex tasks without terrain restrictions, such as battlefield operations, search and rescue under disaster conditions, and monitoring. Data collection and offloading missions in Internet of Things (IoT) networks can be accomplished by using UAVs as network edge nodes. The fundamental challenge in such scenarios is to develop a UAV movement policy that enhances the quality of mission completion and avoids collisions. Real-time learning based on neural networks has been proven to be an effective method for solving decision-making problems in a dynamic, unknown environment. In this thesis, we assume a real-life scenario in which a UAV collects data from ground base stations (GBSs) without prior knowledge of the environment. The UAV is responsible for a multi-objective optimization (MOO) task that includes collecting data, avoiding obstacles, planning its path, and conserving energy. Two deep reinforcement learning (DRL) approaches were implemented in this thesis and compared.
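The abstract does not specify the two DRL approaches, but the setting it describes can be illustrated with a minimal, self-contained sketch: a UAV on a grid learns to visit ground base stations while paying a small energy cost per move. A tabular Q-learning stand-in is used here for brevity; the grid size, rewards, and GBS positions are illustrative assumptions, not the thesis's setup.

```python
import numpy as np

# Hypothetical 5x5 grid: the UAV starts at (0, 0) and must visit ground
# base stations (GBSs) to collect their buffered data.
GRID = 5
GBS = {(1, 3), (4, 1), (3, 4)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(pos, visited, a):
    r, c = pos[0] + ACTIONS[a][0], pos[1] + ACTIONS[a][1]
    if not (0 <= r < GRID and 0 <= c < GRID):
        return pos, visited, -1.0                # leaving the grid is penalized
    if (r, c) in GBS and (r, c) not in visited:
        return (r, c), visited | {(r, c)}, 10.0  # fresh data collected
    return (r, c), visited, -0.1                 # small energy cost per move

# Q-table indexed by position only; a faithful state would also encode
# which GBSs remain unvisited (omitted here to keep the sketch short).
Q = np.zeros((GRID, GRID, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)
for _ in range(2000):
    pos, visited = (0, 0), frozenset()
    for _ in range(50):
        a = rng.integers(4) if rng.random() < eps else int(np.argmax(Q[pos]))
        nxt, visited, r = step(pos, visited, a)
        Q[pos][a] += alpha * (r + gamma * Q[nxt].max() - Q[pos][a])
        pos = nxt
```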
102

Stabilizing Q-Learning for continuous control

Hui, David Yu-Tung 12 1900 (has links)
Deep Reinforcement Learning has produced decision makers that play Chess, Go, Shogi, Atari, and Starcraft with superhuman ability. However, unlike animals and humans, these algorithms struggle to navigate and control physical environments. Manipulating the physical world requires controlling continuous action spaces such as position, velocity, and acceleration, unlike the discrete action spaces of board and video games. Training deep neural networks for continuous control is unstable: agents struggle to learn and retain good behaviors, performance is high-variance across hyperparameters, random seeds, and even multiple runs of the same task, and algorithms struggle to perform well outside the domains in which they were developed. This thesis finds principles behind the success of deep neural networks in other learning paradigms and examines their impact on reinforcement learning for continuous control. Chapter 1 explains how the maximum-entropy principle produces supervised and unsupervised learning loss functions and derives some regularizers used to stabilize deep networks from the training dynamics of deep learning. Chapter 2 provides a maximum-entropy justification for the form of actor-critic algorithms and finds a configuration of an actor-critic algorithm that trains most stably. Finally, Chapter 3 considers the training dynamics of deep reinforcement learning to propose two improvements to target and twin networks that improve stability and convergence. Experiments are performed within the DeepMind Control, MuJoCo, and Box2D ideal-physics simulators.
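Chapter 3's subject, target and twin networks, builds on a standard recipe that can be sketched briefly: a clipped twin-critic TD target plus Polyak-averaged target updates, in the form popularized by TD3/SAC. The tiny linear critics and the value of tau below are illustrative assumptions, not the thesis's proposed improvements.

```python
import copy
import torch

q1 = torch.nn.Linear(4, 1)  # stand-ins for deep critics Q(s, a)
q2 = torch.nn.Linear(4, 1)
q1_targ, q2_targ = copy.deepcopy(q1), copy.deepcopy(q2)
tau, gamma = 0.005, 0.99    # assumed hyperparameters

def td_target(next_sa, reward, done):
    with torch.no_grad():
        # min over twin targets curbs the overestimation bias of max-Q
        q_next = torch.min(q1_targ(next_sa), q2_targ(next_sa))
        return reward + gamma * (1.0 - done) * q_next

def polyak_update():
    # target networks track the critics slowly, stabilizing the TD target
    for net, targ in ((q1, q1_targ), (q2, q2_targ)):
        for p, p_targ in zip(net.parameters(), targ.parameters()):
            p_targ.data.mul_(1 - tau).add_(tau * p.data)
```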
103

Reinforcement learning applied to the real world : uncertainty, sample efficiency, and multi-agent coordination

Mai, Vincent 12 1900 (has links)
The immense potential of deep reinforcement learning (DRL) approaches to build autonomous agents has been proven repeatedly in the last decade. Its application to embodied agents, such as robots or automated power systems, however, faces several challenges. Among them, sample inefficiency, combined with the cost and risk of gathering experience in the real world, can deter any plan to train embodied agents. In this thesis, I focus on the application of DRL to embodied agents. I first propose a probabilistic framework to improve sample efficiency in DRL. In the first article, I present batch inverse-variance (BIV) weighting, a loss function accounting for label noise variance in heteroscedastic noisy regression. BIV is a key element of the second article, where it is combined with state-of-the-art uncertainty prediction methods for deep neural networks in a Bayesian pipeline for temporal-difference DRL algorithms. This approach, named inverse-variance reinforcement learning (IV-RL), leads to significantly faster training as well as better performance in control tasks. In the third article, multi-agent reinforcement learning (MARL) is applied to the problem of fast-timescale demand response, a promising approach to managing the introduction of intermittent renewable energy sources in power grids. As MARL agents control the coordination of multiple air conditioners, they achieve significantly better performance than rule-based approaches. These results underline the potential role that DRL-trained embodied agents could play in the energy transition and the fight against global warming.
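The BIV idea admits a compact sketch: given a per-sample estimate of the label-noise variance, weight each squared error by its inverse variance and normalize over the mini-batch. The epsilon floor and the normalization below are illustrative assumptions rather than the article's exact formulation.

```python
import torch

def biv_loss(pred, target, sigma2, eps=1e-2):
    """Batch inverse-variance weighted regression loss (sketch).

    sigma2: per-sample label-noise variance estimates.
    eps: assumed floor preventing near-noiseless samples from dominating.
    """
    w = 1.0 / (sigma2 + eps)      # inverse-variance weights
    w = w / w.sum()               # normalize over the mini-batch
    return (w * (pred - target) ** 2).sum()
```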
104

[en] A FRAMEWORK FOR AUTOMATED VISUAL INSPECTION OF UNDERWATER PIPELINES / [pt] UM FRAMEWORK PARA INSPEÇÃO VISUAL AUTOMATIZADA DE DUTOS SUBAQUÁTICOS

EVELYN CONCEICAO SANTOS BATISTA 30 January 2024 (has links)
[en] In aquatic environments, the traditional use of divers or manned underwater vehicles has been replaced by unmanned underwater vehicles (such as ROVs or AUVs), with advantages in terms of reduced safety risks, such as exposure to pressure, temperature, or shortness of breath. In addition, they are able to access areas of extreme depth that were previously inaccessible to humans. These unmanned vehicles are widely used for inspections, such as those required for the decommissioning of oil platforms. In this type of inspection, it is necessary to analyze the conditions of the soil and the pipeline and, especially, whether an ecosystem has formed close to the pipeline. Most of the work on automating these vehicles uses different types of sensors and GPS to perceive the environment. Due to the complexity of the navigation environment, different control and automation algorithms have been tested in this area. The interest of this work is to make the automaton take decisions through the analysis of visual events. This research method provides the advantage of cost reduction for the project, given that cameras are cheaper than sensors or GPS devices. The autonomous inspection task has several challenges: detecting the events, processing the images, and deciding to change the route in real time. It is a highly complex task and needs multiple algorithms working together to perform well. Artificial intelligence offers many algorithms for automation, such as those based on reinforcement learning, among others in the area of image detection and classification. This doctoral thesis is a study towards an advanced autonomous inspection system. The system is capable of performing inspections only by analyzing images from the AUV camera, using deep reinforcement learning to optimize viewpoint planning, together with novelty detection techniques. However, the framework can be adapted to many other inspection tasks. In this study, complex realistic environments were used, in which the agent has the challenge of reaching the object of interest in the best possible way so that it can classify the object. It is noteworthy, however, that the simulation environments used in this context exhibit a certain degree of simplicity, lacking features such as marine currents or collision dynamics. At the conclusion of this project, the Visual Inspection of Pipelines (VIP) framework was developed and tested, showing excellent results and illustrating the feasibility of reducing inspection time through the optimization of viewpoint planning. This type of approach, in addition to adding knowledge to the autonomous robot, means that underwater inspections require little human presence (human-in-the-loop), justifying the use of the techniques employed.
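The abstract does not detail its novelty-detection technique, but one common scheme consistent with it can be sketched: an autoencoder trained on ordinary pipeline imagery flags frames whose reconstruction error is unusually high, which would trigger a closer look by the viewpoint planner. The architecture and threshold rule are assumptions for illustration, not the VIP framework's implementation.

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    """Tiny autoencoder over flattened 64x64 grayscale frames (assumed size)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU())
        self.dec = nn.Linear(128, 64 * 64)

    def forward(self, x):
        return self.dec(self.enc(x)).view_as(x)

def is_novel(model, frame, threshold):
    # high reconstruction error suggests content unseen during training,
    # e.g. marine growth or damage near the pipeline
    with torch.no_grad():
        err = torch.mean((model(frame) - frame) ** 2).item()
    return err > threshold
```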
105

Generation and Detection of Adversarial Attacks for Reinforcement Learning Policies

Drotz, Axel, Hector, Markus January 2021 (has links)
In this project we investigate the susceptibility of reinforcement learning (RL) algorithms to adversarial attacks. Adversarial attacks have been proven to be very effective at reducing the performance of deep learning classifiers and, recently, have also been shown to reduce the performance of RL agents. The goal of this project is to evaluate adversarial attacks on agents trained using deep reinforcement learning (DRL), as well as to investigate how to detect these types of attacks. We first use DRL to solve two environments from OpenAI's Gym module, namely CartPole and LunarLander, using DQN and DDPG (DRL techniques). We then evaluate the performance of attacks, and finally we also train neural networks to detect attacks. The attacks were successful at reducing performance in both the LunarLander and CartPole environments. The attack detector was very successful at detecting attacks in the CartPole environment, but did not perform quite as well on LunarLander. We hypothesize that continuous action space environments may pose a greater difficulty for attack detectors to identify potential adversarial attacks. / Bachelor thesis in electrical engineering 2021, KTH, Stockholm
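The report does not name its attack here, but a hedged sketch of the standard FGSM-style observation perturbation used against DRL policies conveys the idea: nudge the observation a small step in the gradient direction that degrades the agent's preferred action. The epsilon value and the choice of loss below are assumptions.

```python
import torch

def fgsm_attack(policy, obs, eps=0.01):
    """Craft an adversarial observation against a Q-network policy (sketch).

    policy: assumed to map an observation to a vector of action values.
    eps: assumed perturbation budget per observation dimension.
    """
    obs = obs.clone().requires_grad_(True)
    q_values = policy(obs)
    # push the observation in the direction that lowers the value of the
    # currently preferred action
    loss = q_values.max(dim=-1).values.sum()
    loss.backward()
    return (obs - eps * obs.grad.sign()).detach()
```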
106

Towards Novelty-Resilient AI: Learning in the Open World

Trevor A Bonjour (18423153) 22 April 2024 (has links)
Current artificial intelligence (AI) systems are proficient at tasks in a closed-world setting where the rules are often rigid. However, in real-world applications, the environment is usually open and dynamic. In this work, we investigate the effects of such dynamic environments on AI systems and develop ways to mitigate those effects. Central to our exploration is the concept of novelties. Novelties encompass structural changes, unanticipated events, and environmental shifts that can confound traditional AI systems. We categorize novelties based on their representation, anticipation, and impact on agents, laying the groundwork for systematic detection and adaptation strategies. We explore novelties in the context of stochastic games. Decision-making in stochastic games exercises many aspects of the same reasoning capabilities needed by AI agents acting in the real world. A multi-agent stochastic game allows for infinitely many ways to introduce novelty. We propose an extension of the deep reinforcement learning (DRL) paradigm to develop agents that can detect and adapt to novelties in these environments. To address the sample efficiency challenge in DRL, we introduce a hybrid approach that combines fixed-policy methods with traditional DRL techniques, offering enhanced performance in complex decision-making tasks. We present a novel method for detecting anticipated novelties in multi-agent games, leveraging information theory to discern patterns indicative of collusion among players. Finally, we introduce DABLER, a pioneering deep reinforcement learning architecture that dynamically adapts to changing environmental conditions through broad learning approaches and environment recognition. Our findings underscore the importance of developing AI systems equipped to navigate the uncertainties of the open world, offering promising pathways for advancing AI research and application in real-world settings.
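As a rough illustration of the information-theoretic detection idea (the dissertation's exact statistic is not given here), one can compare an opponent's empirical action distribution against a baseline profile and flag a large KL divergence as a possible anticipated novelty such as collusion. The distributions and threshold below are invented for illustration.

```python
import numpy as np

def kl(p, q, eps=1e-8):
    """KL divergence D(p || q) between two discrete distributions."""
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

baseline = [0.25, 0.25, 0.25, 0.25]  # expected play under normal conditions
observed = [0.70, 0.10, 0.10, 0.10]  # empirical action frequencies this match
if kl(observed, baseline) > 0.5:     # illustrative threshold
    print("possible collusion/novelty detected")
```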
107

[en] A SIMULATION STUDY OF TRANSFER LEARNING IN DEEP REINFORCEMENT LEARNING FOR ROBOTICS / [pt] UM ESTUDO DE TRANSFER LEARNING EM DEEP REINFORCEMENT LEARNING EM AMBIENTES ROBÓTICOS SIMULADOS

EVELYN CONCEICAO SANTOS BATISTA 05 August 2020 (has links)
[en] This master's thesis is an advanced study on deep reinforcement learning from visual input for autonomous robots through transfer learning techniques. The simulation environments tested in this study are highly realistic environments in which the robot's challenge was to learn and transfer knowledge across different contexts, so as to take advantage of the experience of previous environments in future environments. This type of approach, besides adding knowledge to the autonomous robot, reduces the number of training epochs of the algorithm, even in complex environments, justifying the use of transfer learning techniques.
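A minimal sketch of the transfer mechanism the thesis studies, under the assumption that the transferable knowledge lives in the policy's visual encoder: copy the encoder weights from the source-environment agent into a fresh target-environment agent, optionally freezing them while the control head adapts. The network shapes are illustrative.

```python
import torch.nn as nn

class Policy(nn.Module):
    """Toy visual policy: a conv encoder followed by an action head."""
    def __init__(self, n_actions=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(), nn.Flatten())
        self.head = nn.LazyLinear(n_actions)

    def forward(self, x):
        return self.head(self.encoder(x))

source, target = Policy(), Policy()
# ... suppose `source` was trained to convergence in environment A ...
target.encoder.load_state_dict(source.encoder.state_dict())  # transfer perception
for p in target.encoder.parameters():
    p.requires_grad = False  # freeze transferred layers; fine-tune the head first
```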
108

Optimizing the Fronthaul in C-RAN by Deep Reinforcement Learning : Latency Constrained Fronthaul Optimization with Deep Reinforcement Learning / Optimering av Fronthaul i C-RAN med Djup Förstärknings Inlärning : Latens begränsad Fronthaul Optimering med Djup Förstärknings Inlärning

Grönland, Axel January 2023 (has links)
Centralized Radio Access Network, or C-RAN for short, is a type of network that performs some of its computation at centralized locations. Because much functionality is centralized, statistical multiplexing shows that centralization leads to lower operating costs. The drawback of C-RAN is the huge bandwidth requirement over the fronthaul. Scenarios in which all cells experience high load have very low probability, so the fronthaul can be dimensioned for less than the worst-case traffic. Since functions are centralized, the network can also adapt: the communication standard of each cell can be changed depending on the load scenario. In this thesis we set out to create such a controller using deep reinforcement learning. The problem is difficult overall, both because it is complex to model and because C-RAN is a relatively new concept in the telecom world. We solved it with two traditional reinforcement learning algorithms, DQN and SAC. We define a constrained optimization problem and phrase it in such a way that it can be solved with a deep reinforcement learning algorithm. We found that learning worked well, and we show that our trained policies satisfy the constraint. These results indicate that resource allocation problems can be solved close to optimality by a deep reinforcement learning controller.
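The thesis does not publish its exact reward here, but one standard way to phrase a latency-constrained problem so that DQN or SAC can optimize it is a Lagrangian penalty: subtract a multiplier times the constraint violation from the throughput reward, and adjust the multiplier by dual ascent. All numbers below are illustrative assumptions.

```python
lam, lam_lr, latency_budget = 1.0, 1e-3, 2.0  # budget in ms; illustrative values

def shaped_reward(throughput, latency):
    # scalarize the constrained objective: reward minus penalized violation
    return throughput - lam * max(0.0, latency - latency_budget)

def update_lambda(avg_latency):
    # dual ascent: tighten the penalty while the constraint is violated,
    # relax it (down to zero) once the policy satisfies the budget
    global lam
    lam = max(0.0, lam + lam_lr * (avg_latency - latency_budget))
```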
109

Beyond the status quo in deep reinforcement learning

Agarwal, Rishabh 05 1900 (has links)
Deep reinforcement learning (RL) has seen tremendous progress in recent years, but it is still difficult to apply RL to real-world decision-making problems. This thesis identifies three key challenges with how we do RL research itself that hinder its progress: (i) unreliable evaluation and comparison of RL algorithms, as current evaluation methods often lead to unreliable results; (ii) lack of prior information in RL research, as RL algorithms are often trained from scratch, which can require large amounts of data or computational resources; and (iii) lack of understanding of how deep neural networks interact with RL, making it hard to develop scalable RL methods. To tackle these challenges, this thesis makes the following contributions: (i) a more rigorous methodology for evaluating RL algorithms; (ii) an alternative research workflow that focuses on reusing existing progress on a task; and (iii) identification of an implicit capacity-loss phenomenon under prolonged offline RL training. Overall, this thesis challenges the status quo in deep reinforcement learning and shows that doing so can make RL more efficient and reliable, and can improve its real-world applicability.
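The first contribution, rigorous evaluation, can be sketched concretely: instead of a bare mean over a handful of seeds, report the interquartile mean (IQM) of per-run scores with a bootstrap confidence interval. The plain-NumPy approximation below assumes scalar per-run scores; the author's released tooling for this methodology is the rliable library.

```python
import numpy as np

def iqm(scores):
    # mean of the middle 50% of runs: robust to outlier seeds,
    # more statistically efficient than the median
    s = np.sort(np.asarray(scores))
    n = len(s)
    return s[n // 4 : n - n // 4].mean()

def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    stats = [iqm(rng.choice(scores, size=len(scores), replace=True))
             for _ in range(n_boot)]
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

runs = [0.6, 0.9, 0.7, 1.2, 0.8, 0.5, 1.0, 0.85, 0.75, 0.95]  # per-seed scores
print(iqm(runs), bootstrap_ci(runs))
```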
110

Job shop smart manufacturing scheduling by deep reinforcement learning for Industry 4.0

Serrano Ruiz, Julio César 24 January 2025 (has links)
Thesis by compendium / [EN] The Industry 4.0 (I4.0) paradigm relies, to a large extent, on the potential of information and communication technologies (ICT) to improve the competitiveness and sustainability of industries. The smart manufacturing scheduling (SMS) concept arises and draws inspiration from this potential. As a digital transformation strategy, SMS aims to optimise industrial processes through the application of technologies, such as the digital twin (DT), the zero-defect manufacturing (ZDM) management model and deep reinforcement learning (DRL), for the ultimate purpose of guiding operations scheduling processes towards real-time adaptive automation and reduced disturbances in production systems. SMS is based on four design principles of the I4.0 spectrum: automation, autonomy, real-time capability and interoperability. Based on these key principles, SMS combines the capabilities of DT technology, to simulate, analyse and predict; of the ZDM model, to prevent disturbances in production planning and control systems; and of the DRL modelling approach, to improve real-time decision making. This joint approach orients operations scheduling processes towards greater efficiency and, with it, a better-performing and more resilient production system. This research firstly undertakes a comprehensive review of the state of the art on SMS. Taking the review as a reference, the research proposes a conceptual model of SMS as a digital transformation strategy in the context of the job shop scheduling process. Finally, it proposes a DRL-based model to address the implementation of the key elements of the conceptual model: the job shop DT and the scheduling agent. The algorithms that make up this model have been programmed in Python and validated against several of the most well-known heuristic priority rules. The development of the model and algorithms is an academic and managerial contribution in the production planning and control area. / This thesis was developed with the support of the Research Centre on Production Management and Engineering (CIGIP) of the Universitat Politècnica de València and received funding from: the European Union H2020 programme under grant agreement No. 825631, “Zero Defect Manufacturing Platform (ZDMP)”; the European Union H2020 programme under grant agreement No. 872548, “Fostering DIHs for Embedding Interoperability in Cyber-Physical Systems of European SMEs (DIH4CPS)”; the European Union H2020 programme under grant agreement No. 958205, “Industrial Data Services for Quality Control in Smart Manufacturing (i4Q)”; the European Union Horizon Europe programme under grant agreement No. 101057294, “AI Driven Industrial Equipment Product Life Cycle Boosting Agility, Sustainability and Resilience” (AIDEAS); the Spanish Ministry of Science, Innovation and Universities under grant agreement RTI2018-101344-B-I00, “Optimisation of zero-defects production technologies enabling supply chains 4.0 (CADS4.0)”; the Valencian Regional Government, in turn funded from grant RTI2018-101344-B-I00 by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”, “Industrial Production and Logistics optimization in Industry 4.0” (i4OPT) (Ref. PROMETEO/2021/065); and the grant PDC2022-133957-I00, “Validation of transferable results of optimisation of zero-defect enabling production technologies for supply chain 4.0” (CADS4.0-II), funded by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR. / Serrano Ruiz, JC. (2024). Job shop smart manufacturing scheduling by deep reinforcement learning for Industry 4.0 [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/202871
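As a point of reference for the kind of heuristic priority rules the DRL scheduler is validated against, the sketch below dispatches a single-machine queue with the shortest-processing-time (SPT) rule; the job data are invented for illustration.

```python
jobs = [("J1", 5), ("J2", 2), ("J3", 8), ("J4", 3)]  # (job id, processing time)

def spt_schedule(jobs):
    """Dispatch jobs by the SPT priority rule: shortest job first."""
    order = sorted(jobs, key=lambda j: j[1])
    t, schedule = 0, []
    for name, p in order:
        t += p
        schedule.append((name, t))  # (job, completion time)
    return schedule

print(spt_schedule(jobs))  # [('J2', 2), ('J4', 5), ('J1', 10), ('J3', 18)]
```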
