41

[pt] ESTUDO DE TÉCNICAS DE APRENDIZADO POR REFORÇO APLICADAS AO CONTROLE DE PROCESSOS QUÍMICOS / [en] STUDY OF REINFORCEMENT LEARNING TECHNIQUES APPLIED TO THE CONTROL OF CHEMICAL PROCESSES

30 December 2021 (has links)
[pt] A indústria 4.0 impulsionou o desenvolvimento de novas tecnologias para atender às demandas atuais do mercado. Uma dessas novas tecnologias foi a incorporação de técnicas de inteligência computacional no cotidiano da indústria química. Neste âmbito, este trabalho avaliou o desempenho de controladores baseados em aprendizado por reforço em processos químicos industriais. A estratégia de controle interfere diretamente na segurança e no custo do processo. Quanto melhor for o desempenho dessa estratégia, menor será a produção de efluentes e o consumo de insumos e energia. Os algoritmos de aprendizado por reforço apresentaram excelentes resultados para o primeiro estudo de caso, o reator CSTR com a cinética de Van de Vusse. Entretanto, para a implementação destes algoritmos na planta química do Tennessee Eastman Process, mostrou-se que mais estudos são necessários. A fraca ou inexistente propriedade de Markov, a alta dimensionalidade e as peculiaridades da planta foram fatores dificultadores para os controladores desenvolvidos obterem resultados satisfatórios. Foram avaliados, para o estudo de caso 1, os algoritmos Q-Learning, Actor Critic TD, DQL, DDPG, SAC e TD3; para o estudo de caso 2, foram avaliados os algoritmos CMA-ES, TRPO, PPO, DDPG, SAC e TD3. / [en] Industry 4.0 boosted the development of new technologies to meet current market demands. One of these new technologies was the incorporation of computational intelligence techniques into the daily life of the chemical industry. In this context, the present work evaluated the performance of controllers based on reinforcement learning in industrial chemical processes. The control strategy directly affects the safety and cost of the process: the better the performance of this strategy, the lower the production of effluents and the consumption of inputs and energy. The reinforcement learning algorithms showed excellent results for the first case study, a CSTR with Van de Vusse kinetics. However, implementing these algorithms in the Tennessee Eastman Process chemical plant showed that more studies are needed. The weak or nonexistent Markov property, the high dimensionality, and the peculiarities of the plant made it difficult for the developed controllers to obtain satisfactory results. For case study 1, the algorithms Q-Learning, Actor Critic TD, DQL, DDPG, SAC and TD3 were evaluated; for case study 2, the algorithms CMA-ES, TRPO, PPO, DDPG, SAC and TD3 were evaluated.
42

Využití opakovaně posilovaného učení pro řízení čtyřnohého robotu / Using of Reinforcement Learning for Four Legged Robot Control

Ondroušek, Vít January 2011 (has links)
This Ph.D. thesis focuses on using reinforcement learning for four-legged robot control. The main aim is to create an adaptive control system for the walking robot that can plan its walking gait with the Q-learning algorithm. This aim is achieved with a complex three-layered architecture based on the DEDS paradigm. A small set of elementary reactive behaviors forms the basis of the proposed solution, and a set of composite control laws is designed by activating these behaviors simultaneously. Both types of controllers are able to operate on flat terrain as well as on rugged terrain. A model of all behaviors that can be achieved by activating these controllers is built using an appropriate discretization of the continuous state space. This model is used by the Q-learning algorithm to find optimal robot control strategies. The capabilities of the control unit are demonstrated on three complex tasks: rotating the robot, walking in a straight line, and walking on an inclined plane. These tasks are solved using spatial dynamic simulations of the four-legged robot with three degrees of freedom on each leg. The resulting walking gaits are evaluated using standardized quantitative indicators. Video files showing the behavior of the elementary and composite controllers, as well as the resulting walking gaits of the robot, are an integral part of this thesis.
43

Essays on Reinforcement Learning with Decision Trees and Accelerated Boosting of Partially Linear Additive Models

Dinger, Steven 01 October 2019 (has links)
No description available.
44

Board Game AI Using Reinforcement Learning

Strömberg, Linus, Lind, Viktor January 2022 (has links)
The purpose of this thesis is to develop an agent that learns to play an interpretation of the popular game Ticket To Ride. The project was done in collaboration with Piktiv AB. The thesis presents how an agent based on the Double Deep Q-network algorithm learns to play a version of Ticket To Ride through self-play, and documents how the game and the agent were developed, as well as how the agent was evaluated.
45

Automated touch-less customer order and robot deliver system design at Kroger

Shan, Xingjian 22 August 2022 (has links)
No description available.
46

Model Checked Reinforcement Learning For Multi-Agent Planning

Wetterholm, Erik January 2023 (has links)
Autonomous systems, or agents as they are sometimes called, can be anything from drones and self-driving cars to autonomous construction equipment. These systems are often tasked with accomplishing missions in groups, which may require that they work within the same area without colliding or disturbing other agents' tasks. There are several tools for planning and designing such systems, one of them being UPPAAL STRATEGO. Multi-agent planning (MAP) is about planning actions in optimal ways so that the agents can accomplish their mission efficiently. One method for doing this, named MCRL, uses Q-learning as the algorithm for finding an optimal plan. Because a Q-learning algorithm carries no correctness guarantee, the resulting plans need to be verified to ensure that they accomplish what the user intended within the allowed time, something that UPPAAL STRATEGO can do. Using this method alleviates the state-explosion problem that arises as the number of agents grows. With UPPAAL STRATEGO it is also possible to obtain the best- and worst-case execution times (BCET and WCET) and their corresponding traces. This thesis aims to obtain the BCET and WCET and their corresponding traces in the model.
47

Reinforcement Learning Based Resource Allocation for Network Slicing in O-RAN

Cheng, Nien Fang 06 July 2023 (has links)
Fifth Generation (5G) introduces technologies that expedite the adoption of mobile networks, such as densely connected devices, ultra-fast data rates, and low latency. With those visions for 5G, and for 6G as the next step, the demand for higher transmission rates and lower latency keeps growing, possibly outpacing Moore's law. With Artificial Intelligence (AI) techniques maturing over the past decade, optimizing resource allocation in the network has become a pressing problem for Mobile Network Operators (MNOs) seeking to provide better Quality of Service (QoS) at lower cost. This thesis proposes a Reinforcement Learning (RL) solution for bandwidth allocation for network slicing integrated into the disaggregated Open Radio Access Network (O-RAN) architecture. O-RAN redefines traditional Radio Access Network (RAN) elements as smaller components with detailed functional specifications, and this open modularization creates greater potential for managing the resources of different network slices. In 5G mobile networks there are three major types of network slices: Enhanced Mobile Broadband (eMBB), Ultra-Reliable Low Latency Communications (URLLC), and Massive Machine Type Communications (mMTC). Each network slice has different characteristics in the 5G network; therefore, resources can be reallocated depending on different needs. The virtualization of O-RAN divides the RAN into smaller function groups, which allows the shared resources to be divided further among the network slices. Compared to traditional sequential signal processing, allocating dedicated resources to each network slice can improve performance individually, and shared resources can be customized statically based on the requirements of each slice. To further enhance bandwidth utilization in the disaggregated O-RAN, this thesis proposes an RL algorithm for allocating the midhaul bandwidth shared between the Centralized Unit (CU) and the Distributed Units (DUs). A Python-based simulator considering several types of mobile User Equipment (UE) was implemented for this thesis and later integrated with the proposed Q-learning model. The RL model optimizes the midhaul bandwidth allocation between the Edge Open Clouds (O-Clouds, hosting the DUs) and the Regional O-Cloud (hosting the CU). The results show up to a 50% improvement in the throughput of the targeted slice, fairness toward the other slices, and better overall bandwidth utilization on the O-Clouds. In addition, UE QoS improves significantly in terms of transmission time.
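As a rough illustration of the kind of Q-learning controller described above, a tabular agent could shift discrete fractions of the midhaul bandwidth between the three slice types and be rewarded for served traffic and fairness. The state, action, and reward definitions in this sketch are assumptions made for illustration, not the thesis's actual formulation.

```python
import random
from collections import defaultdict

# Hedged sketch of a slice-aware midhaul bandwidth allocator; the state, action
# and reward definitions are illustrative assumptions, not the thesis's model.
SLICES = ("eMBB", "URLLC", "mMTC")
ACTIONS = [None] + [(i, j) for i in SLICES for j in SLICES if i != j]  # move 10% from i to j, or hold

def step(alloc, action):
    """Apply an action: move up to 10% of the midhaul bandwidth from one slice to another."""
    alloc = dict(alloc)
    if action is not None:
        src, dst = action
        moved = min(0.1, alloc[src])
        alloc[src] -= moved
        alloc[dst] += moved
    return alloc

def reward(alloc, demand):
    """Served traffic minus a penalty for starving any slice (a crude fairness proxy)."""
    served = sum(min(alloc[s], demand[s]) for s in SLICES)
    unmet = sum(max(demand[s] - alloc[s], 0.0) for s in SLICES)
    return served - 0.5 * unmet

Q = defaultdict(float)
alloc = {"eMBB": 0.5, "URLLC": 0.3, "mMTC": 0.2}
for _ in range(500):  # epsilon-greedy tabular Q-learning loop with random per-step demand
    demand = {s: random.random() for s in SLICES}
    s = tuple(round(alloc[x], 1) for x in SLICES)
    a = random.choice(ACTIONS) if random.random() < 0.2 else max(ACTIONS, key=lambda x: Q[(s, x)])
    nxt = step(alloc, a)
    s2 = tuple(round(nxt[x], 1) for x in SLICES)
    Q[(s, a)] += 0.1 * (reward(nxt, demand) + 0.9 * max(Q[(s2, x)] for x in ACTIONS) - Q[(s, a)])
    alloc = nxt
```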
48

A Learning based Adaptive Cruise and Lane Control System

Xu, Peng 31 August 2018 (has links)
No description available.
49

Reinforcement Learning Methods for OpenAI Environments

Winberg, Andreas, Öhrstam Lindström, Oliver January 2020 (has links)
Using the powerful methods developed in the field of reinforcement learning requires an understanding of the advantages and drawbacks of different methods as well as the effects of the different adjustable parameters. This paper highlights the differences in performance and applicability between three different Q-learning methods: Q-table, deep Q-network and double deep Q-network, where Q refers to the value assigned to a given state-action pair. The performance of these algorithms is evaluated on the two OpenAI gym environments MountainCar-v0 and CartPole-v0. The implementations are done in Python using the Tensorflow toolkit with Keras. The results show that the Q-table was the best to use in the MountainCar environment because it was the easiest to implement and was much faster to compute, but it was also shown that the network methods required far less training data. No significant difference in performance was found between the deep Q-network and the double deep Q-network. In the end, there is a trade-off between the number of episodes required and the computation time for each episode. The network parameters were also harder to tune since much more time was needed to compute and visualize the result. / Att använda de kraftfulla metoderna som utvecklats inom området reinforcement learning kräver en förståelse av fördelar och nackdelar mellan olika metoder samt effekterna av de olika justerbara parametrarna. Denna artikel belyser skillnaderna i prestanda och funktionalitet mellan tre olika metoder: Q-table, deep Q-network och double deep Q-network. Prestandan för dessa algoritmer utvärderas i de två OpenAI gym-miljöerna MountainCar-v0 samt Cartpole-v0. Implementeringarna görs i python med hjälp av programvarubiblioteket Tensorflow tillsammans med Keras. Resultaten visar att Q-table var lättast att implementera och tränade snabbast i båda miljöerna. Nätverksmetoderna krävde dock mindre träningsdata även om det tog lång tid att träna på den data som fanns. Inga stora skillnader i prestanda hittades mellan deep Q-network och double deep Q-network. I slutändan kommer det alltid vara en balansgång mellan mängden träningsdata som krävs och tiden det tar att träna på den data som finns.
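For context, the main difference between the deep Q-network and double deep Q-network compared in the paper lies in how the bootstrap target is computed. The sketch below is a generic illustration of that difference, with NumPy vectors standing in for the Q-values the online and target networks would predict; it is not code from the paper.

```python
import numpy as np

# Illustrative sketch (not from the paper): DQN vs. double DQN bootstrap targets.
def dqn_target(reward, q_target_next, gamma=0.99, done=False):
    # Standard DQN: the target network both selects and evaluates the next action.
    return reward + (0.0 if done else gamma * np.max(q_target_next))

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    # Double DQN: the online network selects the action, the target network evaluates it,
    # which reduces the overestimation bias introduced by the max operator.
    best_action = int(np.argmax(q_online_next))
    return reward + (0.0 if done else gamma * q_target_next[best_action])

# Hypothetical Q-values for a 3-action next state (illustrative numbers only):
q_online_next = np.array([1.2, 0.7, 0.9])
q_target_next = np.array([1.0, 0.8, 1.1])
print(dqn_target(1.0, q_target_next), double_dqn_target(1.0, q_online_next, q_target_next))
```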
50

A Distributed Q-learning Classifier System for task decomposition in real robot learning problems

Chapman, Kevin L. 04 March 2009 (has links)
A distributed reinforcement-learning system is designed and implemented on a mobile robot for the study of complex task decomposition in real robot learning environments. The Distributed Q-learning Classifier System (DQLCS) is evolved from the standard Learning Classifier System (LCS) proposed by J.H. Holland. Two of the limitations of the standard LCS are its monolithic nature and its complex apportionment of credit scheme, the bucket brigade algorithm (BBA). The DQLCS addresses both of these problems as well as the inherent difficulties faced by learning systems operating in real environments. We introduce Q-learning as the apportionment of credit component of the DQLCS, and we develop a distributed learning architecture to facilitate complex task decomposition. Based upon dynamic programming, the Q-learning update equation is derived and its advantages over the complex BBA are discussed. The distributed architecture is implemented to provide for faster learning by allowing the system to effectively decrease the size of the problem space it must explore. Holistic and monolithic shaping approaches are used to distribute reward among the learning modules of the DQLCS in a variety of real robot learning experiments. The results of these experiments support the DQLCS as a useful reinforcement learning paradigm and suggest future areas of study in distributed learning systems. / Master of Science
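For reference, the Q-learning update referred to above, derived from the dynamic-programming Bellman equation and used in place of the bucket brigade algorithm, has the general form shown in the minimal sketch below (a generic illustration, not code from the thesis).

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return Q

# Hypothetical usage with a defaultdict-backed Q-table (illustrative values only):
Q = defaultdict(float)
Q = q_update(Q, state=0, action=1, reward=1.0, next_state=2, actions=[0, 1])
```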
