61

Learning sensori-motor mappings using little knowledge: application to manipulation robotics

De La Bourdonnaye, François 18 December 2018 (has links)
The thesis is focused on learning a complex manipulation robotics task using little prior knowledge. More precisely, the task consists in reaching an object with a serial arm, and the objective is to learn it without camera calibration parameters, forward kinematics, handcrafted features, or expert demonstrations. Deep reinforcement learning algorithms are well suited to this objective: reinforcement learning makes it possible to learn sensori-motor mappings without dynamics models, and deep learning removes the need for handcrafted features in the state representation. However, it is difficult to specify the objectives of the learned task without human supervision.
Some solutions rely on expert demonstrations or shaping rewards to guide the robot towards its objective; the latter are generally computed using forward kinematics and handcrafted visual modules. Another class of solutions decomposes the complex task. Learning from easy missions can be used, but this requires knowledge of a goal state. Decomposing the whole complex task into simpler sub-tasks (hierarchical learning) can also be used, but it does not necessarily remove the need for human supervision on those sub-tasks. Approaches that run several agents in parallel to increase the probability of success are a further option, but they require costly hardware. In our approach, we decompose the whole reaching task into three simpler sub-tasks, taking inspiration from human behavior: humans first look at an object before reaching for it. The first learned task is an object fixation task aimed at localizing the object in 3D space; it is learned using deep reinforcement learning and a weakly supervised reward function. The second task jointly learns end-effector binocular fixations and a hand-eye coordination function, using a similar set-up, and is aimed at localizing the end-effector in 3D space from the joint coordinates. The third task uses the two previously learned skills to learn to reach an object under the same requirements as the two prior tasks: it hardly requires supervision. In addition, without using additional priors, an object reachability predictor is learned in parallel. The main contribution of this thesis is the learning of a complex robotic task with weak supervision.
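A minimal sketch of the kind of weakly supervised fixation reward described above (the function name, image size, and pixel tolerance are illustrative assumptions, not the thesis code) rewards the agent when the object's projection lies near the image center in both camera views:

```python
import numpy as np

def fixation_reward(obj_px_left, obj_px_right, img_size=(240, 320), tol=10.0):
    """Weakly supervised reward sketch: returns 1 when the object's pixel
    coordinates lie close to the image center in both camera views, else 0.
    obj_px_* are (row, col) pixel coordinates; img_size and the pixel
    tolerance `tol` are illustrative assumptions, not thesis values."""
    center = np.array(img_size, dtype=float) / 2.0
    err_left = np.linalg.norm(np.asarray(obj_px_left, dtype=float) - center)
    err_right = np.linalg.norm(np.asarray(obj_px_right, dtype=float) - center)
    return 1.0 if (err_left < tol and err_right < tol) else 0.0

# Example: object almost centered in both views -> reward 1.0
print(fixation_reward((118, 162), (121, 159)))
```

A sparse signal of this kind only says "the object is fixated", without hand-crafted visual features or calibration, which is the sense in which the supervision is weak.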
62

Knowledge reuse for deep reinforcement learning.

Glatt, Ruben 12 June 2019 (has links)
With the rise of Deep Learning, the field of Artificial Intelligence (AI) research has entered a new era. Together with an increasing amount of data and vastly improved computing capabilities, Machine Learning builds the backbone of AI, providing many of the tools and algorithms that drive development and applications. While many successes have already been achieved in image recognition, language processing, recommendation engines, robotics, and autonomous systems, most progress was made when algorithms focused on learning a single task, with little regard to effort and reusability. Since learning a new task from scratch often involves an expensive learning process, in this work we consider the use of previously acquired knowledge to speed up the learning of a new task. To that end, we investigate the application of Transfer Learning methods to Deep Reinforcement Learning (DRL) agents and propose a novel framework for knowledge preservation and reuse. We show that knowledge transfer can make a big difference if the source knowledge is chosen carefully and systematically. To get to this point, we provide an overview of the existing literature on methods that realize knowledge transfer for DRL, a topic that has only begun to appear frequently in the relevant literature in the last two years. We then formulate the Case-based Reasoning methodology, which describes a framework for knowledge reuse in general terms, in Reinforcement Learning terminology to facilitate adaptation and communication between the respective communities. Building on this framework, we propose Deep Case-based Policy Inference (DECAF) and demonstrate in an experimental evaluation the usefulness of our approach for sequential task learning with knowledge preservation and reuse. Our results highlight the benefits of knowledge transfer while also drawing attention to the challenges that come with it. We consider the work in this area an important step towards more stable general learning agents capable of dealing with the most complex tasks, which would be a key achievement on the way to Artificial General Intelligence.
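The case-based reuse idea can be illustrated with a small sketch (illustrative only; DECAF's actual similarity measure and policy representation are not reproduced here): a library stores past policies together with a task signature, and the most similar stored case biases exploration on the new task.

```python
import numpy as np

rng = np.random.default_rng(0)

class PolicyCase:
    """A stored case: a task signature (e.g. a summary of early observations)
    and the policy learned for that source task (here a toy Q-table)."""
    def __init__(self, signature, q_table):
        self.signature = np.asarray(signature, dtype=float)
        self.q_table = np.asarray(q_table, dtype=float)

def retrieve_most_similar(library, new_signature):
    """Retrieve the case whose signature is closest to the new task's signature."""
    new_signature = np.asarray(new_signature, dtype=float)
    dists = [np.linalg.norm(c.signature - new_signature) for c in library]
    return library[int(np.argmin(dists))]

def reuse_action(case, state, n_actions, epsilon=0.3):
    """Epsilon-greedy reuse: mostly follow the retrieved source policy,
    but keep exploring so the new task can still deviate from the case."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(case.q_table[state]))

# Example usage: two toy source cases over 3 states and 4 actions.
library = [PolicyCase([0.1, 0.9], rng.random((3, 4))),
           PolicyCase([0.8, 0.2], rng.random((3, 4)))]
case = retrieve_most_similar(library, [0.75, 0.25])
print(reuse_action(case, state=1, n_actions=4))
```

The design choice here mirrors the retrieve-and-reuse steps of Case-based Reasoning expressed in RL terms: retrieval picks a source policy, reuse turns it into an exploration bias rather than a hard constraint.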
63

Reinforcement learning and reward estimation for dialogue policy optimisation

Su, Pei-Hao January 2018 (has links)
Modelling dialogue management as a reinforcement learning task enables a system to learn to act optimally by maximising a reward function. This reward function is designed to induce the system behaviour required for goal-oriented applications, which usually means fulfilling the user’s goal as efficiently as possible. However, in real-world spoken dialogue systems, the reward is hard to measure, because the goal of the conversation is often known only to the user. Certainly, the system can ask the user if the goal has been satisfied, but this can be intrusive. Furthermore, in practice, the reliability of the user’s response has been found to be highly variable. In addition, due to the sparsity of the reward signal and the large search space, reinforcement learning-based dialogue policy optimisation is often slow. This thesis presents several approaches to address these problems. To better evaluate a dialogue for policy optimisation, two methods are proposed. First, a recurrent neural network-based predictor pre-trained from off-line data is proposed to estimate task success during subsequent on-line dialogue policy learning to avoid noisy user ratings and problems related to not knowing the user’s goal. Second, an on-line learning framework is described where a dialogue policy is jointly trained alongside a reward function modelled as a Gaussian process with active learning. This mitigates the noisiness of user ratings and minimises user intrusion. It is shown that both off-line and on-line methods achieve practical policy learning in real-world applications, while the latter provides a more general joint learning system directly from users. To enhance the policy learning speed, the use of reward shaping is explored and shown to be effective and complementary to the core policy learning algorithm. Furthermore, as deep reinforcement learning methods have the potential to scale to very large tasks, this thesis also investigates the application to dialogue systems. Two sample-efficient algorithms, trust region actor-critic with experience replay (TRACER) and episodic natural actor-critic with experience replay (eNACER), are introduced. In addition, a corpus of demonstration data is utilised to pre-train the models prior to on-line reinforcement learning to handle the cold start problem. Combining these two methods, a practical approach is demonstrated to effectively learn deep reinforcement learning-based dialogue policies in a task-oriented information seeking domain. Overall, this thesis provides solutions which allow truly on-line and continuous policy learning in spoken dialogue systems.
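The reward shaping that the thesis reports as effective is commonly written in the potential-based form of Ng et al. (1999); the sketch below is a generic version under an assumed potential function, not the thesis's exact shaping term.

```python
def shaped_reward(reward, phi_s, phi_s_next, gamma=0.99, done=False):
    """Potential-based reward shaping: r' = r + gamma * Phi(s') - Phi(s).
    Phi is any heuristic potential over dialogue states, e.g. the number of
    user goal constraints already confirmed (an assumption for illustration,
    not the thesis's potential). This form leaves the optimal policy unchanged."""
    phi_next = 0.0 if done else phi_s_next   # terminal states carry no potential
    return reward + gamma * phi_next - phi_s

# Example: a turn that confirms one more constraint (potential 2 -> 3)
# softens the usual per-turn penalty of -1.
print(shaped_reward(reward=-1.0, phi_s=2.0, phi_s_next=3.0))
```

Because the shaping term telescopes along a trajectory, it densifies the sparse task-success signal without changing which dialogue policy is optimal, which is why it complements rather than replaces the core policy learning algorithm.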
64

Quoting behaviour of a market-maker under different exchange fee structures

Kiseľ, Rastislav January 2018 (has links)
During the last few years, market micro-structure research has been active in analysing the dependence of market efficiency on different market characteristics. Make-take fees are one such topic, as they may modify the incentives for participating agents, e.g. broker-dealers or market-makers. In this thesis, we propose a Hawkes process-based model that captures statistical differences arising from different fee regimes, and we estimate the differences on limit order book data. We then use these estimates in an attempt to measure execution quality from the perspective of a market-maker. We adapt existing theoretical market frameworks; however, for the purpose of finding optimal market-making policies we apply a novel method of deep reinforcement learning. Our results suggest, firstly, that maker-taker exchanges provide better liquidity to the markets, and secondly, that deep reinforcement learning methods may be successfully applied to the domain of optimal market-making.
JEL Classification: C32, C45, C61, C63
Keywords: make-take fees, Hawkes process, limit order book, market-making, deep reinforcement learning
Author's e-mail: kiselrastislav@gmail.com
Supervisor's e-mail: barunik@fsv.cuni.cz
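The Hawkes-process modelling mentioned above rests on a self-exciting conditional intensity; a minimal sketch with an exponential kernel (the parameter values are placeholders, not the thesis's fitted estimates) is:

```python
import numpy as np

def hawkes_intensity(t, event_times, mu=0.5, alpha=0.8, beta=1.5):
    """Conditional intensity of a univariate Hawkes process with an exponential
    kernel: lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i)).
    mu is the baseline order-arrival rate; alpha and beta control how strongly
    and how long past events (e.g. trades or quote updates) excite future ones.
    Stationarity requires alpha / beta < 1, which holds for these placeholders."""
    event_times = np.asarray(event_times, dtype=float)
    past = event_times[event_times < t]
    return mu + alpha * np.sum(np.exp(-beta * (t - past)))

# Example: intensity shortly after a burst of three order-book events
print(hawkes_intensity(t=1.2, event_times=[0.4, 0.9, 1.1]))
```

Fitting separate parameter sets per fee regime is one way such a model can expose the statistical differences between regimes that the abstract refers to.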
65

Deep Reinforcement Learning for the Optimization of Combining Raster Images in Forest Planning

Wen, Yangyang January 2021 (has links)
Raster images represent treatment options describing how the forest will be cut; economic benefits are generated once a treatment is selected and executed. Existing raster images contain many small clusters, and this fragmentation is the principal cause of overhead. If we can fully explore the relationships among the raster images and combine the old data sets with an optimization algorithm to generate a new raster image, the result can surpass the existing raster images and create higher economic benefits.

The question of this project is whether we can create a dynamic model that treats the pixel being updated as an agent selecting treatment options for an empty raster image in response to neighborhood environmental and landscape parameters. The project explores whether it is realistic to use deep reinforcement learning to generate new and superior raster images, and aims to assess the feasibility, usefulness, and effectiveness of deep reinforcement learning algorithms in optimizing existing treatment options.

The problem was modeled as a Markov decision process in which the pixel to be updated acts as an agent of the empty raster image and determines the treatment option for the current empty pixel. A Deep Q-learning neural network was used to compute the Q-values, and a temporal-difference reinforcement learning algorithm was applied to predict future rewards and update the model parameters.

After the modeling was completed, an experiment was set up to test the usefulness of the model, followed by a parameter-correlation experiment testing the correlation between the parameters and the benefit of the model. Finally, the trained model was used to generate a larger raster image to test its effectiveness.
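The Deep Q-learning and temporal-difference update described above can be sketched in a few lines. This is a generic one-step Q-learning target with hypothetical tensor shapes and action count; the thesis's network architecture and raster encoding are not reproduced.

```python
import torch
import torch.nn as nn

# Toy Q-network over a flattened neighborhood of pixel features;
# the sizes (16 features, 5 treatment options) are illustrative assumptions.
q_net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 5))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def td_update(state, action, reward, next_state, done):
    """One-step temporal-difference (Q-learning) update:
    target = r + gamma * max_a' Q(s', a') for non-terminal transitions."""
    q_sa = q_net(state)[action]
    with torch.no_grad():
        target = reward + gamma * q_net(next_state).max() * (1.0 - done)
    loss = (q_sa - target).pow(2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example transition with random features for a single pixel-agent step
s, s_next = torch.randn(16), torch.randn(16)
print(td_update(s, action=2, reward=1.0, next_state=s_next, done=0.0))
```

In the project's framing, each call would correspond to one empty pixel choosing a treatment option given its neighborhood features, with the reward reflecting the resulting economic benefit.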
66

Simulated Fixed-Wing Aircraft Attitude Control using Reinforcement Learning Methods

David Jona Richter (11820452) 20 December 2021 (has links)
Autonomous transportation is a research field that has gained huge interest in recent years, with autonomous electric or hydrogen cars coming ever closer to everyday use. Cars are not the only subject of autonomy research, though: the field of aviation is also being explored for fully autonomous flight. One very important aspect of making autonomous flight a reality is attitude control, the control of roll, pitch, and sometimes yaw. Traditional approaches to automated attitude control use PID (proportional-integral-derivative) controllers, which rely on hand-tuned parameters to fulfill the task. In this work, however, the use of Reinforcement Learning algorithms for attitude control will be explored. With the surge of ever more powerful artificial neural networks, which have proven to be universally usable function approximators, Deep Reinforcement Learning also becomes an intriguing option.

A software toolkit will be developed and used to allow multiple flight simulators to train agents with Reinforcement Learning as well as Deep Reinforcement Learning. Experiments will be run using different hyperparameters, algorithms, state representations, and reward functions to explore possible options for autonomous attitude control using Reinforcement Learning.
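One common way to express the attitude-control objective as an RL reward is to penalize the deviation of roll and pitch from their commanded values. The sketch below is a generic example under assumed angle conventions and weights; the thesis compares several reward functions rather than prescribing this one.

```python
import numpy as np

def attitude_reward(roll, pitch, roll_ref, pitch_ref, w_roll=1.0, w_pitch=1.0):
    """Negative weighted attitude error, angles in radians.
    A larger (less negative) reward means the aircraft is closer to the
    commanded roll/pitch attitude. The weights are illustrative assumptions."""
    err = w_roll * abs(roll - roll_ref) + w_pitch * abs(pitch - pitch_ref)
    return -err

# Example: 5 deg roll error and 2 deg pitch error relative to a level command
print(attitude_reward(np.deg2rad(5), np.deg2rad(-2), 0.0, 0.0))
```

Reward functions of this shape contrast with a PID controller's hand-tuned gains: the shaping weights define what "good attitude tracking" means, and the learning algorithm, rather than manual tuning, produces the control behavior.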
67

Deep reinforcement learning and snake-like robot locomotion design

Kočí, Jakub January 2020 (has links)
This master's thesis discusses the application of reinforcement learning to deep learning tasks. The theoretical part covers the basics of artificial neural networks and reinforcement learning, and describes the theoretical model of the reinforcement learning process: Markov decision processes. Some interesting techniques are demonstrated on conventional reinforcement learning algorithms, and several widely used deep reinforcement learning algorithms are described as well. The practical part consists of implementing a model of the robot and its environment, together with the deep reinforcement learning system itself.
68

Optimizing Power Consumption, Resource Utilization, and Performance for Manycore Architectures using Reinforcement Learning

Fettes, Quintin 23 May 2022 (has links)
No description available.
69

Research on Dynamic Offloading Strategy of Satellite Edge Computing Based on Deep Reinforcement Learning

Geng, Rui January 2021 (has links)
Nowadays more and more data is generated at the edge of the network, and people are beginning to consider decentralizing computing tasks to the edge. The network architecture of edge computing differs from the traditional architecture: its distributed configuration can make up for some shortcomings of traditional networks, such as data congestion, increased delay, and limited capacity. With the continuous development of 5G technology, satellite communication networks are also facing many new business challenges. Using idle computing power and storage space on satellites and integrating edge computing technology into satellite communication networks greatly improves satellite communication service quality and enhances satellite task-processing capabilities, thereby improving the performance of the satellite edge computing system. The primary problem limiting the computing performance of satellite edge networks is how to obtain a more effective dynamic service offloading strategy. To study this problem, this thesis monitors the status information of satellite nodes in different periods, such as service load and distance to the ground, uses a Markov decision process to model the dynamic offloading problem of the satellite edge computing system, and finally obtains service offloading strategies whose deployment plan is based on deep reinforcement learning algorithms. We mainly study the performance of the Deep Q-Network (DQN) algorithm and two improved DQN algorithms, Double DQN (DDQN) and Dueling DQN (DuDQN), under different service request types and different system scenarios. Compared with existing service deployment algorithms, the deep reinforcement learning algorithms take the long-term service quality of the system into account and form more reasonable offloading strategies.
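The difference between the DQN and Double DQN targets compared in the thesis can be written compactly. This is a generic sketch in which numpy arrays stand in for the two networks' Q-value outputs over the offloading actions; it is not the thesis implementation.

```python
import numpy as np

def dqn_target(reward, q_target_next, gamma=0.99, done=False):
    """Standard DQN target: the target network both selects and evaluates
    the next action, which tends to over-estimate action values."""
    return reward + (0.0 if done else gamma * np.max(q_target_next))

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Double DQN target: the online network selects the next action and the
    target network evaluates it, reducing the over-estimation bias."""
    a_star = int(np.argmax(q_online_next))
    return reward + (0.0 if done else gamma * q_target_next[a_star])

# Example with Q-values over three offloading actions (illustrative numbers)
q_online_next = np.array([1.0, 2.5, 0.3])
q_target_next = np.array([1.2, 1.8, 0.5])
print(dqn_target(0.5, q_target_next),
      double_dqn_target(0.5, q_online_next, q_target_next))
```

Dueling DQN, the third variant studied, changes the network architecture (separate value and advantage streams) rather than the target itself, so it is not shown here.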
70

Automatic game-testing with personality: Multi-task reinforcement learning for automatic game-testing

Canal Anton, Oleguer January 2021 (has links)
This work presents a scalable solution to automate game-testing. Traditionally, game-testing has been performed either by human players or by scripted Artificial Intelligence (AI) agents. While the former produces the most reliable results, organizing testing sessions is time-consuming. Scripted AI, on the other hand, dramatically speeds up the process, but the insights it provides are far less useful: these agents' behaviors are highly predictable. The presented solution takes the best of both worlds, the automation of scripted AI and the richness of human testing, by framing the problem within the Deep Reinforcement Learning (DRL) paradigm. Reinforcement Learning (RL) agents are trained to adapt to unseen levels and to exhibit customizable human personality traits such as aggressiveness, greed, and fear. This is achieved by approaching the problem from a multi-task RL setting: each personality trait is understood as a different task, and the tasks can be linearly combined by the proposed algorithm. Furthermore, since Artificial Neural Networks (ANNs) are used to model the agent's policies, the solution is highly adaptable and scalable. This thesis reviews the state of the art in both automatic game-testing and RL, and proposes a solution to the above-mentioned problem. Finally, promising results are obtained by evaluating the solution in two different environments: a simple environment used to quantify the quality of the designed algorithm, and a generic game environment used to showcase its applicability. In particular, the results show that the designed agent performs well on game levels never seen before, can display any convex combination of the trained behaviors, and performs as well as if it had been specifically trained on that particular combination.
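The convex combination of trained behaviors described above can be illustrated with a small sketch (the trait names and Q-values are toy placeholders; the thesis combines learned multi-task value estimates rather than these numbers):

```python
import numpy as np

def combined_action(q_per_trait, weights):
    """Pick the action that maximizes a convex combination of trait-specific
    Q-values. q_per_trait: dict trait -> array of Q-values over actions;
    weights: dict trait -> non-negative weight, summing to 1 (the personality
    mix requested by the tester)."""
    traits = list(q_per_trait)
    w = np.array([weights[t] for t in traits], dtype=float)
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0), "weights must be convex"
    q_stack = np.stack([q_per_trait[t] for t in traits])   # (traits, actions)
    q_mix = w @ q_stack                                     # (actions,)
    return int(np.argmax(q_mix))

# Example: a 70% aggressive / 30% fearful tester persona over four actions
q = {"aggressive": np.array([0.2, 0.9, 0.1, 0.4]),
     "fearful":    np.array([0.8, 0.1, 0.6, 0.3])}
print(combined_action(q, {"aggressive": 0.7, "fearful": 0.3}))
```

Because the mixture is applied at action-selection time, a tester can dial in a new personality blend without retraining, which is what makes the approach attractive for large-scale automated playtesting.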
