Global ETD Search

81	Deep Reinforcement Learning on Social Environment Aware Navigation based on Maps Sanchez, Victor January 2023 Reinforcement learning (RL) has expanded rapidly in recent years through its successful application to a range of decision-making and complex control tasks. Moreover, deep learning allows RL to extend its reach to more complex fields. Social robotics is a domain that involves challenges such as human-robot interaction, which inspires developments in deep RL. Autonomous systems demand fast and efficient perception of the environment to guarantee safety. However, while remaining attentive to its surroundings, a robot must also make decisions to navigate optimally and avoid potential obstacles. In this thesis, we investigate a deep RL method for end-to-end mobile robot navigation in a social environment. Using observations collected in a simulation environment, a convolutional neural network is trained to predict an appropriate set of discrete angular and linear velocities for a robot based on its egocentric local occupancy grid map. We compare a random learning scheme with a curriculum learning approach to speed up convergence during training. We divide the main problem by separately analysing end-to-end navigation and obstacle avoidance in static and dynamic environments. For each problem, we propose an adaptation that aims to improve the agent's awareness of its surroundings. The qualitative and quantitative evaluations of the investigated approach were performed in simulation. The results show that the end-to-end map-based navigation model is easy to set up and performs similarly to a Model Predictive Control approach. However, we find that obstacle avoidance is harder to translate into a deep RL framework. Despite this difficulty, exploring different RL methods and configurations should help and provide ideas for improvement in future work.
/ Reinforcement learning (RL) has expanded rapidly in recent years thanks to its fruitful application to a range of decision-making and complex control tasks. In addition, deep learning offers RL the opportunity to extend its scope to complex domains. Social robotics is a domain involving challenges such as human-robot interaction, which inspires developments in deep RL. Autonomous systems require fast and efficient perception of the environment to guarantee safety. However, while remaining attentive to its surroundings, a robot must make decisions to navigate optimally and avoid potential obstacles. In this thesis, we investigate a deep RL method for end-to-end mobile robot navigation in a social environment. Using observations collected in a simulation environment, a convolutional neural network is trained to predict an appropriate set of discrete angular and linear velocities for a robot based on its egocentric local occupancy grid map. We compare a random learning scheme with a curriculum learning method to improve convergence speed. We break down the main problem by separately analysing end-to-end navigation and obstacle avoidance in static and dynamic environments. For each problem, we propose an adaptation intended to help the agent better understand its surroundings. The qualitative and quantitative evaluations of the investigated approach were performed only in simulation. The results show that the end-to-end map-based navigation model is easy to deploy and performs similarly to a model predictive control approach. However, we find that obstacle avoidance is harder to translate into a deep RL framework. Despite this difficulty, using different RL methods and configurations will definitely help and provide ideas for improvement in future work.
/ Reinforcement learning (RL) has expanded rapidly in recent years owing to its applications to a range of decision-making and complex control tasks. Deep learning offers RL the opportunity to extend its scope to complex domains. Social robotics is a domain involving challenges such as human-robot interaction, a source of inspiration for developments in deep RL. Autonomous systems require fast and efficient perception of the environment to guarantee safety. However, while remaining attentive to its surroundings, a robot must make decisions to navigate optimally and avoid potential obstacles. In this thesis, we study a deep RL method for end-to-end mobile robot navigation in a social environment. Using observations collected in a simulation environment, a convolutional neural network predicts a suitable set of discrete angular and linear velocities for a robot based on its egocentric local occupancy grid map. We compare a random learning method with a curriculum learning approach to accelerate convergence during training. We divide the main problem by separately analysing end-to-end navigation and obstacle avoidance in static and dynamic environments. For each problem, we propose an adaptation intended to help the agent better understand its surroundings. The qualitative and quantitative evaluations of the studied approach were carried out only in simulation. The results show that the end-to-end map-based navigation model is easy to deploy and performs similarly to a model predictive control approach. However, we find that obstacle avoidance is harder to translate into a deep RL framework.
Despite this difficulty, using different RL methods and configurations will certainly help and provide ideas for improvement in future work. Deep Reinforcement Learning Environment-aware navigation Robotics Artificial Intelligence Human-aware navigation Electrical Engineering and Electronics
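The abstract describes a network that outputs one of a discrete set of (linear, angular) velocity pairs, and a comparison between random and curriculum-based task sampling. The Python sketch below illustrates both ideas under stated assumptions: the velocity values, the success-rate threshold and the window size are hypothetical and not taken from the thesis, "difficulty level" is a stand-in for whatever the curriculum actually varies (e.g. goal distance or obstacle count), and the CNN itself is omitted.

```python
import itertools
import random
from collections import deque

# Hypothetical discretisation (illustrative values, not the thesis's).
LINEAR_SPEEDS = [0.0, 0.25, 0.5]                # m/s
ANGULAR_SPEEDS = [-1.0, -0.5, 0.0, 0.5, 1.0]    # rad/s

# Each discrete action index maps to one (linear, angular) pair, so a network
# with len(ACTIONS) outputs can select a velocity command via argmax.
ACTIONS = list(itertools.product(LINEAR_SPEEDS, ANGULAR_SPEEDS))


def action_to_command(index):
    """Return the (linear, angular) velocity pair for a discrete action index."""
    return ACTIONS[index]


class CurriculumSampler:
    """Promote to a harder level once the recent success rate passes a threshold."""

    def __init__(self, max_level, threshold=0.8, window=20):
        self.level = 0
        self.max_level = max_level
        self.threshold = threshold
        self.results = deque(maxlen=window)

    def sample_level(self):
        return self.level

    def report(self, success):
        self.results.append(bool(success))
        window_full = len(self.results) == self.results.maxlen
        if window_full and sum(self.results) / len(self.results) >= self.threshold:
            if self.level < self.max_level:
                self.level += 1
                self.results.clear()  # re-measure success at the new level


class RandomSampler:
    """Baseline: draw a difficulty level uniformly at random every episode."""

    def __init__(self, max_level, seed=None):
        self.max_level = max_level
        self.rng = random.Random(seed)

    def sample_level(self):
        return self.rng.randint(0, self.max_level)

    def report(self, success):
        pass  # the random baseline ignores training progress
```

With `window=5`, five consecutive successes at one level promote the agent to the next; the random baseline never reacts to progress, which is the contrast the thesis evaluates.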
82	Remembering how to walk - Using Active Dendrite Networks to Drive Physical Animations Henriksson, Klas January 2023 Creating embodied agents capable of performing a wide range of tasks in different types of environments has been a longstanding challenge in deep reinforcement learning. The Active Dendrite Network [A. Iyer et al., “Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments”], a novel architecture introduced in 2021 and designed to create sparse subnetworks for different tasks, showed promising multi-task performance on the Meta-World multi-task benchmark [T. Yu et al., “Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning”]. This thesis further explores the performance of this architecture in a multi-task environment focused on physical animation and locomotion. Specifically, we implement the architecture and compare it with the commonly used Multi-Layer Perceptron (MLP) on a multi-task reinforcement learning problem in a video-game setting: training a hexapedal agent on a set of locomotion tasks that involve moving at different speeds, turning and standing still. The evaluation focused on two areas: (1) assessing the average overall performance of the Active Dendrite Network relative to the MLP on a set of locomotion scenarios featuring our behaviour sets and environments; (2) assessing the relative impact of Active Dendrite Networks on transfer learning between related tasks, by comparing their performance on novel behaviours shortly after training a related behaviour. Our findings suggest that the Active Dendrite Network makes better use of limited network capacity than the MLP; with limited capacity, it outperformed the MLP by ∼18% on our benchmark.
When both networks have sufficient capacity, however, there is little difference between them. We further find that Active Dendrite Networks have transfer-learning capabilities very similar to those of the MLP in our benchmarks. reinforcement learning deep learning physical animation deep reinforcement learning multi-task learning multi-task reinforcement learning machine learning neural networks Computer Sciences
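To make the Active Dendrite mechanism concrete, here is a minimal NumPy sketch of one layer in the spirit of Iyer et al.: each neuron's feedforward activation is scaled by the sigmoid of its strongest dendritic-segment response to a task context vector, and a k-winners-take-all step keeps only the k most active neurons, giving a sparse, context-dependent subnetwork. Layer sizes, initialisation, the one-hot contexts and the plain-max segment selection are assumptions; this is not the thesis implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def k_winners(y, k):
    """k-winners-take-all: keep the k largest activations, zero the rest."""
    out = np.zeros_like(y)
    winners = np.argsort(y)[-k:]
    out[winners] = y[winners]
    return out


class ActiveDendriteLayer:
    """Fully connected layer whose neurons carry context-driven dendritic segments."""

    def __init__(self, in_dim, out_dim, num_segments, context_dim, k, rng):
        self.W = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(out_dim, in_dim))
        self.b = np.zeros(out_dim)
        # One weight vector per segment per neuron: shape (out, segments, context).
        self.U = rng.normal(0.0, 1.0 / np.sqrt(context_dim),
                            size=(out_dim, num_segments, context_dim))
        self.k = k

    def forward(self, x, context):
        feedforward = self.W @ x + self.b          # (out,)
        segments = self.U @ context                # (out, segments)
        strongest = segments.max(axis=1)           # best-matching segment per neuron
        gated = feedforward * sigmoid(strongest)   # context modulates each neuron
        return k_winners(gated, self.k)            # sparse, task-dependent output


rng = np.random.default_rng(0)
layer = ActiveDendriteLayer(in_dim=8, out_dim=16, num_segments=4,
                            context_dim=3, k=4, rng=rng)
x = rng.normal(size=8)
out_task_a = layer.forward(x, np.array([1.0, 0.0, 0.0]))  # assumed one-hot task context
out_task_b = layer.forward(x, np.array([0.0, 1.0, 0.0]))
```

Because the same input `x` can activate different winners under different contexts, each task effectively trains its own sparse subnetwork, which is the property credited with reducing interference between tasks.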
83	REAL-TIME UPDATING AND NEAR-OPTIMAL ENERGY MANAGEMENT SYSTEM FOR MULTI-MODE ELECTRIFIED POWERTRAIN WITH REINFORCEMENT LEARNING CONTROL Biswas, Atriya January 2021 Energy management systems (EMSs) implemented in the electronic control unit (ECU) of an actual vehicle with an electrified powertrain are much simpler versions of the theoretically developed EMS. This simplification is needed to fit the EMS within the memory constraints and computational capacity of the ECU. The simplification should still ensure reasonable performance compared with the theoretical EMS under real-life driving scenarios, and it must be effective to yield a versatile and utilitarian EMS. Reinforcement learning-based controllers have profitable characteristics for optimizing the performance of controllable physical systems, as they do not necessarily require a mathematical model of the system dynamics (i.e., they are model-free). It is therefore natural to test whether reinforcement learning-based controllers can achieve near-globally optimal performance as the (supervisory) energy management system of electrified powertrains. Before any supervisory controller is deployed as a mainstream controller, it should be scrutinized on a series of virtual simulation platforms of increasing fidelity in emulating the physical system. Through these levels, the controller evolves from a mathematical concept into a utilitarian embedded system, undergoing gradual transformation until it is suitable for a real physical system. Implementing the control strategy in a Simulink-based forward simulation model could be the first stage of this evolution. This brief delineates all the steps required to implement a reinforcement learning-based supervisory controller in a forward simulation model of a hybrid electric vehicle.
A novel framework for a loss-minimization-based instantaneous optimal strategy is introduced in this brief for the energy management system of a multi-mode hybrid electric powertrain. The loss-minimization strategy is flexible enough to be implemented in any electrified powertrain architecture. It is mathematically proven that minimizing the overall system loss is equivalent to minimizing fuel consumption. An online simulation framework is developed to evaluate the performance of a multi-mode electrified powertrain equipped with more than one power source. An electrically variable transmission with two planetary gear sets has been chosen as the centerpiece of the powertrain, considering the versatility and future prospects of such transmissions. Notably, the novel architecture topology selected for this dissertation was obtained through a rigorous screening process, whose workflow is presented briefly here. One legitimate concern with multi-mode transmissions is their tendency to cause power-flow discontinuities downstream in the powertrain, predominantly as a result of mode-shift events. Many studies in the literature have substantiated dynamic coordinated control as a technique for mitigating such discontinuities. Hence, a system-level coordinated control is employed within the energy management system, which governs the mode schedule of the multi-mode powertrain in real-time simulation. / Thesis / Doctor of Philosophy (PhD) Energy management system Reinforcement learning Real-time Hybrid electric vehicle Deep reinforcement learning Actor-Critic Asynchronous training High-fidelity transmission Multi-mode electrified powertrain Charge sustainability Unfamiliar driving scenario Markov decision process
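A loss-minimization-based instantaneous strategy can be pictured as a per-time-step search over the power split between the engine and the electric path. The sketch below is a toy illustration, not the dissertation's framework: the loss models `engine_loss` and `electric_loss`, the power limits and the 1 kW search grid are hypothetical, demand is assumed non-negative (no regenerative braking), and no charge-sustainability constraint is enforced.

```python
def engine_loss(p_eng_kw):
    """Hypothetical engine loss (kW): fuel power minus shaft power."""
    if p_eng_kw <= 0.0:
        return 0.0  # engine off
    # Illustrative efficiency map: 38 % peak at 60 kW, floored at 15 %.
    efficiency = max(0.38 - 0.0002 * (p_eng_kw - 60.0) ** 2, 0.15)
    return p_eng_kw * (1.0 / efficiency - 1.0)


def electric_loss(p_elec_kw):
    """Hypothetical motor + battery loss (kW): fixed overhead plus resistive term."""
    if p_elec_kw == 0.0:
        return 0.0
    return 0.5 + 0.002 * p_elec_kw ** 2


def best_split(p_demand_kw, p_eng_max=120.0, p_elec_max=60.0, step_kw=1.0):
    """Return (engine kW, electric kW, total loss kW) minimising instantaneous loss.

    Returns None if no split within the power limits meets the demand.
    """
    best = None
    n_steps = int(min(p_demand_kw, p_eng_max) // step_kw)
    for i in range(n_steps + 1):
        p_eng = i * step_kw
        p_elec = p_demand_kw - p_eng
        if p_elec > p_elec_max:
            continue  # electric path cannot cover the remainder
        loss = engine_loss(p_eng) + electric_loss(p_elec)
        if best is None or loss < best[2]:
            best = (p_eng, p_elec, loss)
    return best
```

Under these toy maps, a 10 kW demand is served electrically, while a 100 kW demand with a 50 kW electric limit keeps the engine at the smallest feasible power (50 kW). A real EMS would replace the analytic maps with component data and add battery state-of-charge terms.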
84	DEEP LEARNING BASED MODELS FOR NOVELTY ADAPTATION IN AUTONOMOUS MULTI-AGENT SYSTEMS Marina Wagdy Wadea Haliem (13121685) 20 July 2022 Autonomous systems are often deployed in dynamic environments and are challenged by unexpected changes (novelties), in which they receive novel data not seen during training. Given this uncertainty, they should be able to operate without (or with limited) human intervention, and they are expected to: (1) Adapt to such changes while remaining effective and efficient in performing their multiple tasks, providing continuous availability of their critical functionalities. (2) Make informed decisions independently of any central authority. (3) Be cognitive: learn the new context and its possible actions, and be rich in knowledge discovery through mining and pattern recognition. (4) Be reflexive: react to novel unknown data as well as to security threats without terminating ongoing critical missions. Together, these characteristics form the workflow of the autonomous decision-making process in multi-agent environments, i.e., any action taken by the system must pass through these characteristic models so that the system autonomously makes an ideal decision for the situation. In this dissertation, we propose novel learning-based models to enhance the decision-making process in autonomous multi-agent systems, in which agents are able to detect novelties (i.e., unexpected changes in the environment) and adapt to them in a timely manner. For this purpose, we explore two complex and highly dynamic domains: (1) Transportation networks (e.g., ridesharing applications), for which we develop AdaPool, a novel distributed diurnal-adaptive decision-making framework for multi-agent autonomous vehicles using model-free deep reinforcement learning and change point detection.
(2) Multi-agent games (e.g., Monopoly), for which we propose a hybrid approach that combines deep reinforcement learning (for frequent but complex decisions) with a fixed-policy approach (for infrequent but straightforward decisions) to facilitate decision-making while adapting to novelties. (3) Further, we present a domain-agnostic approach for decision-making without prior knowledge in dynamic environments using Bootstrapped DQN. Moreover, to enhance the security of autonomous multi-agent systems, (4) we develop machine learning-based resilience testing of address-randomization moving target defense. Additionally, to further improve the decision-making process, we present (5) a novel framework for multi-agent deep covering option discovery, designed to accelerate exploration (the first step of decision-making for autonomous agents) by identifying potential collaborative agents and encouraging visits to under-represented states in their joint observation space. Autonomous agents and multiagent systems Adversarial machine learning Deep learning Reinforcement learning Deep Reinforcement Learning (DRL) Autonomous Systems Transportation Networks Multi-agent Systems Adversarial Machine Learning Multi-agent Games Ridesharing Novelty Adaptation Novelty detection Deep Learning Open World
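Bootstrapped DQN explores by keeping several value heads, sampling one head per episode and acting greedily with it, and training each head only on the transitions selected by a random bootstrap mask. The Python sketch below swaps the shared neural network for independent Q-tables to stay self-contained; the class name `BootstrappedQ`, the mask probability and the learning rate are assumptions, and target networks are omitted. It illustrates the exploration idea, not the dissertation's implementation.

```python
import numpy as np


class BootstrappedQ:
    """Tabular stand-in for Bootstrapped DQN (hypothetical class, for illustration).

    Each of the K heads is an independent Q-table; the original method instead
    shares a neural-network torso and uses target networks.
    """

    def __init__(self, n_states, n_actions, n_heads=5, mask_prob=0.5,
                 alpha=0.1, gamma=0.99, seed=0):
        self.rng = np.random.default_rng(seed)
        # Small random initial values make the heads disagree from the start.
        self.q = self.rng.normal(0.0, 0.01, size=(n_heads, n_states, n_actions))
        self.n_heads = n_heads
        self.mask_prob = mask_prob
        self.alpha = alpha
        self.gamma = gamma
        self.active_head = 0

    def start_episode(self):
        # Commit to one randomly chosen head for the whole episode.
        self.active_head = int(self.rng.integers(self.n_heads))

    def act(self, state):
        # Act greedily with respect to the active head only.
        return int(np.argmax(self.q[self.active_head, state]))

    def update(self, s, a, r, s_next, done):
        # Bootstrap mask: each head sees this transition with probability mask_prob.
        mask = self.rng.random(self.n_heads) < self.mask_prob
        for k in np.flatnonzero(mask):
            target = r if done else r + self.gamma * self.q[k, s_next].max()
            self.q[k, s, a] += self.alpha * (target - self.q[k, s, a])
        return mask
```

Because heads are trained on different subsets of experience, they disagree most about rarely visited states; following one head for a full episode then yields temporally consistent exploration without prior knowledge of the environment.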
85	Physical Layer Security with Unmanned Aerial Vehicles for Advanced Wireless Networks Abdalla, Aly Sabri 08 August 2023 Unmanned aerial vehicles (UAVs) are emerging as enablers for supporting many applications and services, such as precision agriculture, search and rescue, temporary network deployment, coverage extension, and security. UAVs are being considered for integration into emerging wireless networks as aerial users, aerial relays (ARs), or aerial base stations (ABSs). This dissertation proposes employing UAVs to contribute to physical layer techniques that enhance the security performance of advanced wireless networks and services in terms of availability, resilience, and confidentiality. The focus is on securing terrestrial cellular communications against eavesdropping with a cellular-connected UAV that is dispatched as an AR or ABS. The research develops mathematical tools and applies machine learning algorithms to jointly optimize UAV trajectory and advanced communication parameters for improving the secrecy rate of wireless links, covering various communication scenarios: static and mobile users, single and multiple users, and single and multiple eavesdroppers, with and without knowledge of the attackers' locations and channel state information. The analysis is based on established air-to-ground and air-to-air channel models for single- and multiple-antenna systems while taking into consideration the limited on-board energy resources of cellular-connected UAVs. Simulation results show fast algorithm convergence and significant improvements in channel secrecy capacity over state-of-the-art solutions when UAVs assist terrestrial cellular networks as proposed here. In addition, numerical results demonstrate that the proposed methods scale well with the number of users to be served and with different eavesdropping distributions.
The presented solutions are wireless protocol agnostic, can complement traditional security principles, and can be extended to address other communication security and performance needs. Unmanned Aerial Vehicle (UAV) Physical Layer Security (PLS) Mobile Networks Secrecy Capacity Deep Reinforcement Learning (DRL) Multi-Variable Optimization Beamforming Multiuser Multiple Input Single Output (MISO) Trajectory Optimization Digital Communications and Networking Systems and Communications
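The secrecy rate optimized here is, in its standard form, C_s = max(0, log2(1 + SNR_B) - log2(1 + SNR_E)) for a legitimate user B and an eavesdropper E. The sketch below evaluates it for a hovering UAV under a simple line-of-sight path-loss model (channel power gain beta0/d^2) and grid-searches the hover position. The transmit power, reference gain, noise power and grid are hypothetical, and the dissertation's joint trajectory, beamforming and energy-aware optimization is not reproduced.

```python
import math


def snr(uav, node, p_tx=0.1, beta0=1e-4, noise=1e-13):
    """Received SNR under a line-of-sight model with power gain beta0 / d**2.

    p_tx (W), beta0 (gain at 1 m) and noise (W) are illustrative values.
    """
    d2 = sum((u - n) ** 2 for u, n in zip(uav, node))
    return p_tx * beta0 / (d2 * noise)


def secrecy_rate(uav, bob, eve):
    """Achievable secrecy rate in bit/s/Hz, clipped at zero."""
    rate = math.log2(1 + snr(uav, bob)) - math.log2(1 + snr(uav, eve))
    return max(0.0, rate)


def best_hover_position(bob, eve, altitude, xs, ys):
    """Grid-search the horizontal UAV position that maximizes the secrecy rate."""
    best_pos, best_rate = None, -1.0
    for x in xs:
        for y in ys:
            rate = secrecy_rate((x, y, altitude), bob, eve)
            if rate > best_rate:
                best_pos, best_rate = (x, y), rate
    return best_pos, best_rate


bob, eve = (0.0, 0.0, 0.0), (200.0, 0.0, 0.0)   # ground positions in metres
grid = range(-300, 301, 50)
pos, rate = best_hover_position(bob, eve, altitude=100.0, xs=grid, ys=grid)
print(pos, round(rate, 2))
```

With Bob at the origin and Eve 200 m away, the best grid point lies slightly beyond Bob on the side away from Eve: when both SNRs are high, the rate grows roughly with the ratio of the squared distances to Eve and to Bob, and hovering over Eve drives the rate to zero.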
86	PhD Thesis Junghoon Kim (15348493) 26 April 2023 To advance next-generation communication systems, it is critical to enhance state-of-the-art communication architectures, such as device-to-device (D2D), multiple-input multiple-output (MIMO), and intelligent reflecting surface (IRS) systems, in terms of achieving high data rates, low latency, and high energy efficiency. In the first part of this dissertation, we address joint learning and optimization methodologies for cutting-edge network architectures. First, we consider D2D networks equipped with MIMO systems. In particular, we address the problem of minimizing the network overhead in D2D networks, defined as the sum of the time and energy required for processing tasks at devices, through the design of MIMO beamforming and communication/computation resource allocation. Second, we address IRS-assisted communication systems. Specifically, we study an adaptive IRS control scheme that considers realistic IRS reflection behavior and channel environments, and we propose a novel adaptive codebook-based limited feedback protocol and learning-based solutions for codebook updates. Furthermore, for revolutionary innovations to emerge in future generations of communications, it is crucial to explore and address fundamental, long-standing open problems in communications, such as the design of practical codes for a variety of important channel models. In the latter part of this dissertation, we study the design of practical codes for feedback-enabled communication channels, i.e., feedback codes. Existing feedback codes, developed over the past six decades, have been shown to be vulnerable to high forward/feedback noise, owing to the difficulty of designing feedback codes.
We propose a novel recurrent neural network (RNN) autoencoder-based architecture that mitigates susceptibility to high channel noise by incorporating domain knowledge into the design of the deep learning architecture. Using this architecture, we suggest a new class of non-linear feedback codes that increase robustness to forward/feedback noise in additive white Gaussian noise (AWGN) channels with feedback. Data communications beamforming Device-to-Device (D2D) intelligent reflecting surface (IRS) Reconfigurable intelligent surface (RIS) Deep Reinforcement Learning (DRL) feedback systems Channel coding Recurrent Neural Network (RNN)
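Codebook-based limited feedback for an IRS can be sketched as follows: the receiver evaluates each candidate phase configuration in a codebook shared with the IRS controller and feeds back only the index of the best one, costing ceil(log2 M) bits for M codewords. The NumPy sketch below assumes a fixed random codebook and a single-antenna narrowband model (a direct channel plus per-element cascaded channels); the adaptive, learning-based codebook update proposed in the dissertation is not shown. The closed-form optimum with continuous phases is included as an upper bound.

```python
import numpy as np


def make_codebook(n_elements, n_codewords, rng):
    """Hypothetical fixed codebook: each row is one phase configuration in [0, 2*pi)."""
    return rng.uniform(0.0, 2.0 * np.pi, size=(n_codewords, n_elements))


def received_gain(h_direct, cascaded, phases):
    """|h_d + sum_n c_n * exp(j*theta_n)|^2; broadcasts over rows of `phases`."""
    return np.abs(h_direct + np.sum(cascaded * np.exp(1j * phases), axis=-1)) ** 2


def select_codeword(h_direct, cascaded, codebook):
    """Receiver side: pick the best codeword and report the feedback cost in bits."""
    gains = received_gain(h_direct, cascaded, codebook)
    index = int(np.argmax(gains))
    feedback_bits = int(np.ceil(np.log2(len(codebook))))
    return index, float(gains[index]), feedback_bits


def optimal_gain(h_direct, cascaded):
    """Upper bound with continuous phases: all paths co-phased with h_direct."""
    return float((np.abs(h_direct) + np.sum(np.abs(cascaded))) ** 2)


rng = np.random.default_rng(0)
n = 32  # number of IRS elements (assumed)
h_d = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
c = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
codebook = make_codebook(n, 16, rng)
idx, gain, bits = select_codeword(h_d, c, codebook)
```

The gap between `gain` and `optimal_gain(h_d, c)` is what a better-adapted codebook aims to close without increasing the number of feedback bits.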
87	[en] A SIMULATION STUDY OF TRANSFER LEARNING IN DEEP REINFORCEMENT LEARNING FOR ROBOTICS / [pt] A STUDY OF TRANSFER LEARNING IN DEEP REINFORCEMENT LEARNING IN SIMULATED ROBOTIC ENVIRONMENTS EVELYN CONCEICAO SANTOS BATISTA 05 August 2020 [pt] This master's dissertation is an advanced study of visual deep reinforcement learning for autonomous robots using transfer learning techniques. The simulation environments tested in this study are complex, realistic environments in which the robot's challenge was to learn and transfer knowledge across different contexts, leveraging experience from previous environments in future ones. Besides adding knowledge to the autonomous robot, this type of approach reduces the number of training epochs of the algorithm, even in complex environments, which justifies the use of transfer learning techniques. / [en] This master's thesis is an advanced study of visual deep reinforcement learning for autonomous robots through transfer learning techniques. The simulation environments tested in this study are highly realistic environments in which the robot's challenge was to learn and transfer knowledge across different contexts, taking advantage of the experience of previous environments in future environments. Besides adding knowledge to the autonomous robot, this type of approach reduces the number of training epochs of the algorithm, even in complex environments, justifying the use of transfer learning techniques. [en] CONVOLUTIONAL NEURAL NETWORK [en] COMPLEX ENVIRONMENTS [en] DEEP REINFORCEMENT LEARNING [en] AUTONOMOUS ROBOT [en] TRANSFER LEARNING
88	[en] DEEP REINFORCEMENT LEARNING FOR QUADROTOR TRAJECTORY CONTROL IN VIRTUAL ENVIRONMENTS GUILHERME SIQUEIRA EDUARDO 12 August 2021 [pt] With recent advances in computational power, the use of new, complex control models has become viable for controlling quadrotors. One of these methods is deep reinforcement learning (DRL), which can produce a control policy that handles the non-linearities of the quadrotor model better than a traditional control method. Among the important non-linearities in cargo-carrying aerial vehicles are time-varying properties, such as size and mass, caused by the addition and removal of cargo. The general, domain-agnostic approach of a DRL controller also allows it to handle visual navigation, in which the estimation of position data is uncertain. In this work, we apply a Soft Actor-Critic algorithm to design controllers for a quadrotor that perform tasks reproducing these challenges in a virtual environment. First, we develop two waypoint guidance controllers: a low-level controller that acts directly on motor commands and a high-level controller that interacts in cascade with a PID velocity controller. The controllers are then evaluated on the proposed cargo pickup and drop task, which thereby introduces a time-varying variable. The designed controllers outperform the classical PID position controller with optimized gains on the proposed course, while remaining agnostic to a set of simulation parameters. Finally, we apply the same DRL algorithm to develop a controller that uses visual data to complete a racing course in simulation.
With this controller, the quadrotor is able to locate gates using an RGB-D camera and find a trajectory that takes it through as many of the gates on the course as possible. / [en] With recent advances in computational power, the use of novel, complex control models has become viable for controlling quadrotors. One such method is deep reinforcement learning (DRL), which can devise a control policy that addresses non-linearities in the quadrotor model better than traditional control methods. Important non-linearities in payload-carrying air vehicles are the inherent time-varying properties, such as size and mass, caused by the addition and removal of cargo. The general, domain-agnostic approach of the DRL controller also allows it to handle visual navigation, in which position estimation data is unreliable. In this work, we employ a Soft Actor-Critic algorithm to design controllers for a quadrotor to carry out tasks reproducing these challenges in a virtual environment. First, we develop two waypoint guidance controllers: a low-level controller that acts directly on motor commands and a high-level controller that interacts in cascade with a velocity PID controller. The controllers are then evaluated on the proposed payload pickup and drop task, thereby introducing a time-varying variable. The resulting controllers outperform a traditional positional PID controller with optimized gains on the proposed course, while remaining agnostic to a set of simulation parameters. Finally, we employ the same DRL algorithm to develop a controller that leverages visual data to complete a racing course in simulation. With this controller, the quadrotor is able to localize gates using an RGB-D camera and devise a trajectory that drives it through as many gates in the racing course as possible.
[en] UNMANNED AERIAL VEHICLE [en] VISUAL NAVIGATION [en] SOFT ACTOR-CRITIC-SAC [en] DEEP REINFORCEMENT LEARNING [en] QUADROTOR CONTROL
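The controllers above are designed with Soft Actor-Critic. As a reference point, the sketch below shows the standard SAC critic target with clipped double-Q, y = r + gamma * (1 - d) * (min(Q1', Q2') - alpha * log pi(a'|s')), and a tanh-squashed Gaussian policy sample with its log-probability correction. NumPy arrays stand in for the neural networks; the values of gamma and alpha and the function names are assumptions, not the thesis configuration.

```python
import numpy as np


def sac_critic_target(reward, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Soft Bellman target with clipped double-Q (standard SAC form).

    q1_next, q2_next: target-critic values at (s', a') with a' ~ pi(.|s');
    logp_next: log pi(a'|s'). Arrays broadcast elementwise over the batch.
    """
    soft_value = np.minimum(q1_next, q2_next) - alpha * logp_next
    return reward + gamma * (1.0 - done) * soft_value


def tanh_gaussian_sample(mu, log_std, rng):
    """Sample a squashed action in (-1, 1) and its log-probability."""
    std = np.exp(log_std)
    u = mu + std * rng.standard_normal(np.shape(mu))
    action = np.tanh(u)
    # Diagonal Gaussian log-density of the pre-squash sample u.
    logp = np.sum(-0.5 * ((u - mu) / std) ** 2 - log_std - 0.5 * np.log(2.0 * np.pi),
                  axis=-1)
    # Change-of-variables correction for the tanh squashing.
    logp -= np.sum(np.log(1.0 - action ** 2 + 1e-6), axis=-1)
    return action, logp
```

For a low-level controller such as the one described, the squashed action in (-1, 1) would presumably be rescaled to the motor command range; that mapping is an assumption here and depends on the simulator.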
89	Simulating market maker behaviour using Deep Reinforcement Learning to understand market microstructure / A simulation of stock market microstructure via self-learning financial agents Marcus, Elwin January 2018 Market microstructure studies the process of exchanging assets under explicit trading rules. With algorithmic trading and high-frequency trading, modern financial markets have seen profound changes in market microstructure in the last 5 to 10 years. As a result, previously established methods in the field of market microstructure often become faulty or insufficient. Machine learning, and in particular reinforcement learning, has become more ubiquitous in finance and other fields today, with applications in trading and optimal execution. This thesis uses reinforcement learning to understand market microstructure by simulating a stock market based on NASDAQ Nordics and training market maker agents on this stock market. Simulations are run on both a dealer market and a limit order book market, which differentiates this study from previous ones. DQN and PPO algorithms are applied to these simulated environments, where stochastic optimal control theory has mainly been used before. The market maker agents successfully reproduce stylized facts in the historical trade data from each simulation, such as mean-reverting prices and the absence of linear autocorrelation in price changes, and they beat random policies employed on these markets, with a maximum positive profit & loss of 200%. Other trading dynamics of real-world markets are also exhibited through the agents' interactions, mainly bid-ask spread clustering, optimal inventory management, declining spreads, and independence of inventory and spreads, indicating that reinforcement learning with PPO and DQN is a relevant choice when modelling market microstructure. / Market microstructure studies how financial assets are exchanged according to explicit rules.
Algorithmic and high-frequency trading have changed the structures of modern financial markets over the last 5 to 10 years. This has also affected the reliability of previously used methods, for example from econometrics, for studying market microstructure. Machine learning and reinforcement learning have become more popular, with many different applications both in finance and in other fields. In finance, these types of methods have mainly been used in trading and optimal order execution. In this thesis, reinforcement learning and market microstructure are combined to simulate a stock market based on NASDAQ Nordic. In this market, market maker agents are trained via reinforcement learning with the goal of understanding the market microstructure that emerges from the agents' interactions. The agents are evaluated and tested on a dealer market together with a limit order book, which, together with the use of the two algorithms DQN and PPO, distinguishes this study from previous ones. In previous studies, stochastic optimization has mainly been used for similar problems. The agents successfully reproduce properties of financial time series such as mean reversion and the absence of linear autocorrelation. The agents also beat random strategies, with a maximum profit of 200%. Finally, the agents exhibit other trading dynamics expected in a real market, mainly clustering of spreads, optimal inventory management, and a decrease in spreads during the simulations. This shows that reinforcement learning with PPO or DQN is a relevant choice for modelling market microstructure. Deep Reinforcement Learning Machine Learning Market Microstructure Market Maker Financial Agent Agent Based Modelling Financial Artificial Markets Complex Systems Algorithmic Trading Tensorforce keras-RL PPO DQN Dealer Market Limit Order book Computer Sciences
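One of the stylized facts the agents reproduce is the absence of linear autocorrelation in price changes. A simple way to check this on a simulated price series is to compute the sample autocorrelation of the price differences at several lags and compare it with the approximate ±1.96/sqrt(n) band for white noise. The helper names below are hypothetical, the random-walk series is only a stand-in for simulated mid-prices, and the check is a rough diagnostic rather than the thesis's evaluation procedure.

```python
import numpy as np


def lag_autocorrelation(series, lag=1):
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    x = np.asarray(series, dtype=float)
    return float(np.corrcoef(x[:-lag], x[lag:])[0, 1])


def price_change_acf(prices, max_lag=5):
    """Autocorrelations of price changes at lags 1..max_lag, plus a ~95% white-noise band."""
    changes = np.diff(np.asarray(prices, dtype=float))
    acf = [lag_autocorrelation(changes, k) for k in range(1, max_lag + 1)]
    band = 1.96 / np.sqrt(len(changes))
    return acf, band


# Example on a synthetic random walk (a stand-in for simulated mid-prices).
rng = np.random.default_rng(42)
prices = 100.0 + np.cumsum(rng.standard_normal(20_000))
acf, band = price_change_acf(prices)
inside_band = [abs(r) < band for r in acf]
print([round(r, 4) for r in acf], round(band, 4), inside_band)
```

At a 95% band, roughly one lag in twenty is expected to fall outside by chance even for truly uncorrelated changes, so isolated excursions are not by themselves evidence of linear autocorrelation.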
90	Job shop smart manufacturing scheduling by deep reinforcement learning for Industry 4.0 Serrano Ruiz, Julio César 24 January 2025 (has links) Thesis by compendium of publications / The Industry 4.0 (I4.0) paradigm relies, to a large extent, on the potential of information and communication technologies (ICT) to improve the competitiveness and sustainability of industries. The smart manufacturing scheduling (SMS) concept arises from, and draws inspiration from, this potential. As a digital transformation strategy, SMS aims to optimise industrial processes through the application of technologies such as the digital twin (DT), the zero-defect manufacturing (ZDM) management model and deep reinforcement learning (DRL), with the ultimate purpose of guiding operations scheduling processes towards real-time adaptive automation and reducing disturbances in production systems. SMS is based on four design principles of the I4.0 spectrum: automation, autonomy, real-time capability and interoperability. Based on these key principles, SMS combines the capabilities of DT technology to simulate, analyse and predict; those of the ZDM model to prevent disturbances in production planning and control systems; and those of the DRL modelling approach to improve real-time decision making.
This joint approach orients operations scheduling processes towards greater efficiency and, with it, towards a better-performing and more resilient production system. This research first undertakes a comprehensive review of the state of the art on SMS. Taking this review as a reference, the research proposes a conceptual model of SMS as a digital transformation strategy in the context of the job shop scheduling process. Finally, it proposes a DRL-based model to implement the key elements of the conceptual model: the job shop DT and the scheduling agent. The algorithms that make up this model have been programmed in Python and validated against several of the best-known heuristic priority rules. The development of the model and algorithms constitutes an academic and managerial contribution to the area of production planning and control. / This thesis was developed with the support of the Research Centre on Production Management and Engineering (CIGIP) of the Universitat Politècnica de València and received funding from: the European Union H2020 programme under grant agreement No. 825631, “Zero Defect Manufacturing Platform (ZDMP)”; the European Union H2020 programme under grant agreement No. 872548, "Fostering DIHs for Embedding Interoperability in Cyber-Physical Systems of European SMEs (DIH4CPS)"; the European Union H2020 programme under grant agreement No. 958205, “Industrial Data Services for Quality Control in Smart Manufacturing (i4Q)”; the European Union Horizon Europe programme under grant agreement No. 
101057294, “AI Driven Industrial Equipment Product Life Cycle Boosting Agility, Sustainability and Resilience” (AIDEAS); the Spanish Ministry of Science, Innovation and Universities under grant agreement RTI2018-101344-B-I00, "Optimisation of zero-defects production technologies enabling supply chains 4.0 (CADS4.0)"; the Valencian Regional Government, in turn funded from grant RTI2018-101344-B-I00 by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”, "Industrial Production and Logistics optimization in Industry 4.0" (i4OPT) (Ref. PROMETEO/2021/065); and the grant PDC2022-133957-I00, “Validation of transferable results of optimisation of zero-defect enabling production technologies for supply chain 4.0” (CADS4.0-II) funded by MCIN/AEI/10.13039/501100011033 and by European Union Next GenerationEU/PRTR. / Serrano Ruiz, JC. (2024). Job shop smart manufacturing scheduling by deep reinforcement learning for Industry 4.0 [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/202871 / Compendium Smart manufacturing scheduling Industry 4.0 Job shop Deep reinforcement learning Digital twin Zero-defect manufacturing Business Organization (ORGANIZACION DE EMPRESAS)
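The abstract above states that the DRL scheduling agent was validated against several well-known heuristic priority rules, without naming them. As an illustration only, the sketch below assumes three common textbook dispatching rules (SPT, MWKR and FIFO approximated by job index), a hypothetical toy instance of 3 jobs on 3 machines, and a simple non-delay dispatcher. It is not the thesis's Python code, and every function name here is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data model: a job is an ordered list of (machine, processing_time).
Operation = Tuple[int, int]
Job = List[Operation]
# A priority rule maps (jobs, job index, operation index) to a sort key; lower wins.
Rule = Callable[[List[Job], int, int], float]


def spt(jobs: List[Job], j: int, k: int) -> float:
    """Shortest Processing Time: prefer the operation with the smallest duration."""
    return jobs[j][k][1]


def mwkr(jobs: List[Job], j: int, k: int) -> float:
    """Most Work Remaining: prefer the job with the most unprocessed time left."""
    return -sum(p for _, p in jobs[j][k:])


def fifo(jobs: List[Job], j: int, k: int) -> float:
    """First In, First Out, approximated here by job index (arrival order)."""
    return j


def dispatch(jobs: List[Job], rule: Rule) -> List[Tuple[int, int, int, int, int]]:
    """Non-delay dispatching; returns (job, op_index, machine, start, end) tuples."""
    next_op = [0] * len(jobs)          # index of each job's next unscheduled operation
    job_ready = [0] * len(jobs)        # time each job's previous operation finishes
    machine_ready: Dict[int, int] = {}  # time each machine becomes free
    schedule = []
    for _ in range(sum(len(job) for job in jobs)):
        # Candidates: the next operation of every unfinished job, with its earliest start.
        cands = []
        for j, job in enumerate(jobs):
            k = next_op[j]
            if k < len(job):
                m, p = job[k]
                cands.append((max(job_ready[j], machine_ready.get(m, 0)), j, k, m, p))
        # Non-delay: only operations that can start at the earliest possible time compete.
        t_min = min(c[0] for c in cands)
        conflict = [c for c in cands if c[0] == t_min]
        # Apply the priority rule; break ties by job index for determinism.
        start, j, k, m, p = min(conflict, key=lambda c: (rule(jobs, c[1], c[2]), c[1]))
        end = start + p
        schedule.append((j, k, m, start, end))
        next_op[j] += 1
        job_ready[j] = end
        machine_ready[m] = end
    return schedule


def makespan(schedule) -> int:
    """Completion time of the last operation in the schedule."""
    return max(entry[4] for entry in schedule)


# Hypothetical toy instance: 3 jobs on 3 machines (machine ids 0..2).
TOY_JOBS: List[Job] = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3)],
]

for name, rule in [("SPT", spt), ("MWKR", mwkr), ("FIFO", fifo)]:
    print(name, makespan(dispatch(TOY_JOBS, rule)))
```

Because each rule only decides among the operations in the current conflict set, one way to frame a learned scheduling agent is as a drop-in replacement for `rule` that picks from that same set; the abstract does not say whether the thesis frames its agent this way.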
