31

Návrh simulátoru autonomního dopravního prostředku / Design of autonomous vehicle simulator

Machač, Petr January 2020 (has links)
This thesis deals with simulation tools for developing control algorithms for autonomous cars. It can essentially be divided into two parts: a theoretical survey part and a practical development part. The former gives an overview of available tools for simulating autonomous vehicles, covering both open-source and commercial tools. The theoretical part also describes the principles and the tools, i.e. engines, used for solving dynamic equations on a computer. Emphasis is placed on the Box2D physics engine, which, in accordance with the thesis assignment, is used in the second part to develop a custom environment simulating an autonomous car.
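Although the thesis' own simulation environment is not reproduced here, the basic pattern of driving a vehicle body with the Box2D engine can be sketched in a few lines. The sketch below uses the pybox2d Python bindings; the body dimensions, applied force and time step are illustrative assumptions rather than values from the thesis.

```python
from Box2D import b2World

# Top-down driving plane: no gravity acts in-plane (assumed setup).
world = b2World(gravity=(0, 0))
car = world.CreateDynamicBody(position=(0.0, 0.0), angle=0.0)
car.CreatePolygonFixture(box=(2.0, 1.0), density=1.0, friction=0.3)  # 4 m x 2 m body

for _ in range(60):
    # Push the body forward and advance the dynamics by one 60 Hz step.
    car.ApplyForce(force=(50.0, 0.0), point=car.worldCenter, wake=True)
    world.Step(1.0 / 60.0, 6, 2)   # dt, velocity iterations, position iterations

print(car.position, car.linearVelocity)
```

A control algorithm would read the body's pose and velocity after each step and decide which force or steering torque to apply next; a custom simulation environment essentially wraps this loop.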
32

Training reinforcement learning model with custom OpenAI gym for IIoT scenario

Norman, Pontus January 2022 (has links)
This study consists of an experiment to see, as a proof of concept, how well it would work to implement an industrial gym environment for training a reinforcement learning model. To determine this, the reinforcement learning model is trained repeatedly and tested. If the model completes the training scenario, that training iteration counts as a success. The time it takes to train for a certain number of game episodes is measured, as are the number of episodes the reinforcement learning model needs to reach an acceptable outcome of 80% of the maximum score and the time it takes to train those episodes. These measurements are evaluated, and conclusions are drawn about how well the reinforcement learning models worked. The tools used are a Q-learning algorithm implemented from scratch and deep Q-learning with TensorFlow. The conclusion showed that the manually implemented Q-learning algorithm gave varying results depending on the environment design and on how long the agent was trained, with success rates ranging from 100% to 0%. The times it took to train the agent to an acceptable level were 0.116, 0.571 and 3.502 seconds depending on which environment was tested (see the results chapter for more information on the environments). The TensorFlow implementation gave either a 100% or a 0% success rate; because I believe these polarized results were due to an issue with the implementation, I chose not to take measurements for more than one environment. And since that model never reached a stable outcome of more than 80%, no training time to an acceptable level was measured for this implementation.
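The abstract does not spell out the industrial scenario, so the sketch below is only a hypothetical stand-in: a tiny custom OpenAI Gym environment (here called ConveyorEnv, with an invented state and reward) trained with the same kind of hand-written tabular Q-learning the study describes. It assumes the classic pre-0.26 gym API in which step() returns four values.

```python
import numpy as np
import gym
from gym import spaces

class ConveyorEnv(gym.Env):
    """Toy stand-in for an industrial scenario: push an item to the end of a line."""

    def __init__(self, length=5):
        super().__init__()
        self.length = length
        self.observation_space = spaces.Discrete(length)
        self.action_space = spaces.Discrete(2)        # 0 = move left, 1 = move right
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.01               # small step penalty
        return self.state, reward, done, {}

# Hand-written tabular Q-learning on the toy environment.
env = ConveyorEnv()
q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.95, 0.1
for _ in range(500):
    s, done = env.reset(), False
    while not done:
        a = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(q[s]))
        s2, r, done, _ = env.step(a)
        q[s, a] += alpha * (r + gamma * np.max(q[s2]) * (not done) - q[s, a])
        s = s2
print(q)
```

A deep Q-learning variant would replace the table q with a small TensorFlow network approximating Q(s, a) and train it from a replay buffer.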
33

Reinforcement Learning with Auxiliary Memory

Suggs, Sterling 08 June 2021 (has links)
Deep reinforcement learning algorithms typically require vast amounts of data to train to a useful level of performance. Each time new data is encountered, the network must inefficiently update all of its parameters. Auxiliary memory units can help deep neural networks train more efficiently by separating computation from storage, and providing a means to rapidly store and retrieve precise information. We present four deep reinforcement learning models augmented with external memory, and benchmark their performance on ten tasks from the Arcade Learning Environment. Our discussion and insights will be helpful for future RL researchers developing their own memory agents.
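The abstract does not describe the four memory-augmented architectures in detail, so the following is only a generic sketch of the underlying idea of separating storage from computation: an external key-value memory that is written explicitly and read by content-based (cosine-similarity) addressing. The slot count, dimensions and softmax read are assumptions, not the thesis' design.

```python
import numpy as np

class ExternalMemory:
    """Key-value memory with content-based (cosine-similarity) reads."""

    def __init__(self, slots=128, key_dim=32, value_dim=32):
        self.keys = np.zeros((slots, key_dim))
        self.values = np.zeros((slots, value_dim))
        self.next_slot = 0

    def write(self, key, value):
        i = self.next_slot % len(self.keys)           # overwrite the oldest slot
        self.keys[i], self.values[i] = key, value
        self.next_slot += 1

    def read(self, query, temperature=1.0):
        norms = np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-8
        sims = self.keys @ query / norms              # cosine similarity per slot
        weights = np.exp(sims / temperature)
        weights /= weights.sum()
        return weights @ self.values                  # soft read over all slots
```

An agent can write a (key, value) pair for each salient observation and later retrieve precise information with a single read, rather than re-encoding it into all network weights.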
34

[pt] ESTUDO DE TÉCNICAS DE APRENDIZADO POR REFORÇO APLICADAS AO CONTROLE DE PROCESSOS QUÍMICOS / [en] STUDY OF REINFORCEMENT LEARNING TECHNIQUES APPLIED TO THE CONTROL OF CHEMICAL PROCESSES

30 December 2021 (has links)
[en] Industry 4.0 has boosted the development of new technologies to meet current market demands. One of these new technologies is the incorporation of computational intelligence techniques into the daily operation of the chemical industry. In this context, this work evaluated the performance of controllers based on reinforcement learning in industrial chemical processes. The control strategy directly affects the safety and cost of the process: the better its performance, the lower the production of effluents and the consumption of inputs and energy. The reinforcement learning algorithms showed excellent results for the first case study, a CSTR reactor with Van de Vusse kinetics. However, implementing these algorithms in the Tennessee Eastman Process chemical plant showed that further study is needed. The weak or nonexistent Markov property, the high dimensionality and the peculiarities of the plant made it difficult for the developed controllers to obtain satisfactory results. For case study 1, the algorithms Q-Learning, Actor-Critic TD, DQL, DDPG, SAC and TD3 were evaluated; for case study 2, the algorithms CMA-ES, TRPO, PPO, DDPG, SAC and TD3 were evaluated.
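For reference, the continuous-control actor-critic methods evaluated here (DDPG and TD3 in particular) all bootstrap the critic from target networks; the standard update targets, written in the usual notation rather than taken from the thesis itself, are:

```latex
% DDPG critic target
y = r + \gamma\, Q_{\theta'}\!\bigl(s',\, \mu_{\phi'}(s')\bigr)

% TD3 adds clipped double Q-learning and target-policy smoothing
y = r + \gamma \min_{i=1,2} Q_{\theta'_i}\!\bigl(s',\, \mu_{\phi'}(s') + \epsilon\bigr),
\qquad \epsilon \sim \operatorname{clip}\bigl(\mathcal{N}(0,\sigma),\, -c,\, c\bigr)

% Target networks are updated softly
\theta' \leftarrow \tau\theta + (1-\tau)\theta', \qquad \phi' \leftarrow \tau\phi + (1-\tau)\phi'
```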
35

Využití opakovaně posilovaného učení pro řízení čtyřnohého robotu / Using of Reinforcement Learning for Four Legged Robot Control

Ondroušek, Vít January 2011 (has links)
This Ph.D. thesis focuses on using reinforcement learning for four-legged robot control. The main aim is to create an adaptive control system for the walking robot that is able to plan the walking gait with the Q-learning algorithm. This aim is achieved through the design of a complex three-layered architecture based on the DEDS paradigm. A small set of elementary reactive behaviors forms the basis of the proposed solution, and a set of composite control laws is designed using simultaneous activations of these behaviors. Both types of controllers are able to operate on flat terrain as well as on rugged terrain. A model of all behaviors that can be achieved by activating these controllers is built using an appropriate discretization of the continuous state space; this model is used by the Q-learning algorithm to find optimal robot control strategies. The capabilities of the control unit are demonstrated on three complex tasks: rotating the robot, walking in a straight line, and walking on an inclined plane. These tasks are solved using spatial dynamic simulations of a four-legged robot with three degrees of freedom on each leg. The resulting walking gaits are evaluated using standardized quantitative indicators. Video files showing the behavior of the elementary and composite controllers, as well as the resulting walking gaits of the robot, are an integral part of this thesis.
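The abstract notes that the continuous state space is discretized so that tabular Q-learning can operate on it; a minimal sketch of such a discretization is shown below. The choice of state variables, ranges and bin counts is purely illustrative and not taken from the thesis.

```python
import numpy as np

# Illustrative discretization of a continuous robot state (e.g. body pitch, roll and
# one joint angle per leg); ranges and bin counts are assumptions, not the thesis'.
BINS = (9, 9, 5, 5, 5, 5)                       # bins per state dimension
LOW  = np.array([-0.5, -0.5, -1.0, -1.0, -1.0, -1.0])
HIGH = np.array([ 0.5,  0.5,  1.0,  1.0,  1.0,  1.0])

def discretize(state):
    """Map a continuous state vector to a tuple of bin indices."""
    ratios = (np.clip(state, LOW, HIGH) - LOW) / (HIGH - LOW)
    return tuple((ratios * (np.array(BINS) - 1)).round().astype(int))

n_actions = 8                                    # e.g. one action per composite behavior
q_table = np.zeros(BINS + (n_actions,))

s = discretize(np.array([0.1, -0.2, 0.3, 0.0, -0.4, 0.2]))
print(q_table[s])                                # Q-values of every action in that state
```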
36

Essays on Reinforcement Learning with Decision Trees and Accelerated Boosting of Partially Linear Additive Models

Dinger, Steven 01 October 2019 (has links)
No description available.
37

Board Game AI Using Reinforcement Learning

Strömberg, Linus, Lind, Viktor January 2022 (has links)
The purpose of this thesis is to develop an agent that learns to play an interpretation of the popular game Ticket To Ride. This project is done in collaboration with Piktiv AB. This thesis presents how an agent based on the Double Deep Q-network algorithm learns to play a version of Ticket To Ride using self-play. This is the documentation of how the game and the agent were developed, as well as how the agent was evaluated.
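The defining detail of the Double Deep Q-network is how bootstrap targets are computed: the online network selects the next action while the target network evaluates it. A minimal sketch of that target rule is given below; the batch layout and function names are assumptions, not the thesis' implementation.

```python
import numpy as np

def double_dqn_targets(rewards, next_states, dones, gamma, q_online, q_target):
    """Double DQN bootstrap targets for a batch of transitions.

    q_online / q_target are callables returning an array of shape
    (batch, n_actions) of Q-values for a batch of states.
    """
    best = np.argmax(q_online(next_states), axis=1)                 # select with the online net
    next_q = q_target(next_states)[np.arange(len(rewards)), best]   # evaluate with the target net
    return rewards + gamma * next_q * (1.0 - dones)
```

Decoupling action selection from action evaluation reduces the over-estimation bias of plain DQN, which matters in a long game with sparse terminal rewards such as Ticket To Ride.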
38

Automated touch-less customer order and robot deliver system design at Kroger

Shan, Xingjian 22 August 2022 (has links)
No description available.
39

Model Checked Reinforcement Learning For Multi-Agent Planning

Wetterholm, Erik January 2023 (has links)
Autonomous systems, or agents as they are sometimes called, can be anything from drones and self-driving cars to autonomous construction equipment. These systems are often given the task of accomplishing missions in groups, which may require that they work within the same area without colliding with or disturbing other agents' tasks. There are several tools for planning and designing such systems, one of them being UPPAAL STRATEGO. Multi-agent planning (MAP) is about planning actions in optimal ways such that the agents can accomplish their mission efficiently. A method for doing this, named MCRL, uses Q-learning as the algorithm for finding an optimal plan. Because a Q-learning algorithm does not come with a correctness guarantee, these plans then need to be verified to ensure that they accomplish what the user intended within the allowed time, something that UPPAAL STRATEGO can do. Using this method alleviates the state-explosion problem that arises with an increasing number of agents. With UPPAAL STRATEGO it is also possible to obtain the best- and worst-case execution times (BCET and WCET) and their corresponding traces. This thesis aims to obtain the BCET and WCET and their corresponding traces in the model.
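As a rough illustration of the MCRL-style workflow described above (and not of the thesis' actual code), the snippet below extracts a greedy plan from a learned Q-table and hands it to a verification step; the verification function is only a placeholder for exporting the plan to a UPPAAL STRATEGO model and querying it.

```python
import numpy as np

def greedy_plan(q_table, start, goal, step_fn, max_len=50):
    """Extract a greedy action sequence (a candidate plan) from a learned Q-table."""
    plan, s = [], start
    for _ in range(max_len):
        a = int(np.argmax(q_table[s]))
        plan.append(a)
        s = step_fn(s, a)          # deterministic transition model of the agents
        if s == goal:
            break
    return plan

def verify_plan(plan):
    # Placeholder: Q-learning gives no correctness guarantee, so the plan would be
    # encoded in the UPPAAL STRATEGO model and checked against timing/safety queries
    # (e.g. mission completed within the allowed time) before deployment.
    raise NotImplementedError("export the plan to the model checker and run the queries")
```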
40

Reinforcement Learning Based Resource Allocation for Network Slicing in O-RAN

Cheng, Nien Fang 06 July 2023 (has links)
Fifth Generation (5G) mobile networks introduce technologies that expedite their adoption, such as densely connected devices, ultra-fast data rates, and low latency. With these visions for 5G, and for 6G as the next step, the demand for higher transmission rates and lower latency keeps growing, possibly outpacing Moore's law. With Artificial Intelligence (AI) techniques maturing over the past decade, optimizing resource allocation in the network has become a pressing problem for Mobile Network Operators (MNOs) seeking to provide better Quality of Service (QoS) at lower cost. This thesis proposes a Reinforcement Learning (RL) solution for bandwidth allocation for network slicing integrated in the disaggregated Open Radio Access Network (O-RAN) architecture. O-RAN redefines traditional Radio Access Network (RAN) elements as smaller components with detailed functional specifications, and this open modularization offers greater potential for managing the resources of different network slices. In 5G mobile networks there are three major types of network slices: Enhanced Mobile Broadband (eMBB), Ultra-Reliable Low Latency Communications (URLLC), and Massive Machine Type Communications (mMTC). Each network slice plays a different role in the 5G network, so resources can be reallocated depending on their different needs. The virtualization of O-RAN divides the RAN into smaller functional groups, which helps the network slices divide the shared resources further. Compared to traditional sequential signal processing, allocating dedicated resources to each network slice can improve its performance individually, and shared resources can be customized statically based on the feature requirements of each slice. To further enhance bandwidth utilization in the disaggregated O-RAN, this thesis proposes an RL algorithm for allocating the midhaul bandwidth shared between the Centralized Unit (CU) and the Distributed Unit (DU). A Python-based simulator considering several types of mobile User Equipment (UE) has been implemented and integrated with the proposed Q-learning model. The RL model optimizes the bandwidth allocation in the midhaul between the Edge Open Cloud (O-Cloud) nodes (DUs) and the Regional O-Cloud (CU). The results show up to a 50% improvement in the throughput of the targeted slice, fairness towards the other slices, and better overall bandwidth utilization on the O-Clouds. In addition, UE QoS improves significantly in terms of transmission time.
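The thesis' simulator, state features and reward are not given in the abstract, so the sketch below only illustrates one plausible way a tabular Q-learning agent could represent the midhaul allocation decision: discrete bandwidth splits across the eMBB, URLLC and mMTC slices, chosen ε-greedily from per-slice load levels. All numbers are assumptions.

```python
import numpy as np

# Actions: every way to split the midhaul bandwidth between eMBB, URLLC and mMTC in 10% steps.
splits = [(e, u, 100 - e - u)
          for e in range(0, 101, 10)
          for u in range(0, 101 - e, 10)]

n_load_levels = 4                        # assumed discretized load level per slice
n_states = n_load_levels ** 3
q = np.zeros((n_states, len(splits)))

def state_index(loads):
    """Encode an (eMBB, URLLC, mMTC) load-level triple as a single table index."""
    e, u, m = loads
    return (e * n_load_levels + u) * n_load_levels + m

def choose_split(loads, eps=0.1):
    """Epsilon-greedy choice of a bandwidth split for the current slice loads."""
    s = state_index(loads)
    if np.random.rand() < eps:
        return splits[np.random.randint(len(splits))]
    return splits[int(np.argmax(q[s]))]

print(choose_split((3, 1, 0)))           # e.g. heavy eMBB load, light URLLC, idle mMTC
```

A full agent would update q after each simulator step from a reward combining slice throughput, fairness and overall utilization; those ingredients are only named, not specified, in the abstract.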
