• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 80
  • 4
  • 3
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 110
  • 110
  • 110
  • 36
  • 24
  • 22
  • 22
  • 19
  • 19
  • 19
  • 19
  • 18
  • 18
  • 17
  • 17
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Towards provably safe and robust learning-enabled systems

Fan, Jiameng 26 August 2022 (has links)
Machine learning (ML) has demonstrated great success in numerous complicated tasks. Fueled by these advances, many real-world systems like autonomous vehicles and aircraft are adopting ML techniques by adding learning-enabled components. Unfortunately, ML models widely used today, like neural networks, lack the necessary mathematical framework to provide formal guarantees on safety, causing growing concerns over these learning-enabled systems in safety-critical settings. In this dissertation, we tackle this problem by combining formal methods and machine learning to bring provable safety and robustness to learning-enabled systems. We first study the robustness verification problem of neural networks on classification tasks. We focus on providing provable safety guarantees on the absence of failures under arbitrarily strong adversaries. We propose an efficient neural network verifier LayR to compute a guaranteed and overapproximated range for the output of a neural network given an input set which contains all possible adversarially perturbed inputs. LayR relaxes nonlinear units in neural networks using linear bounds and refines such relaxations with mixed integer linear programming (MILP) to iteratively improve the approximation precision, which achieves tighter output range estimations compared to prior neural network verifiers. However, the neural network verifier focuses more on analyzing a trained neural network but less on learning provably safe neural networks. To tackle this problem, we study verifiable training that incorporates verification into training procedures to train provably safe neural networks and scale to larger models and datasets. We propose a novel framework, AdvIBP, for combining adversarial training and provable robustness verification. We show that the proposed framework can learn provable robust neural networks at a sublinear convergence rate. In the second part of the dissertation, we study the verification of system-level properties in neural-network controlled systems (NNCS). We focus on proving bounded-time safety properties by computing reachable sets. We first introduce two efficient NNCS verifiers ReachNN* and POLAR that construct polynomial-based overapproximations of neural-network controllers. We transfer NNCSs to tractable closed-loop systems with approximated polynomial controllers for computing reachable sets using existing reachability analysis tools of dynamical systems. The combination of polynomial overapproximations and reachability analysis tools opens promising directions for NNCS verification. We also include a survey and experimental study of existing NNCS verification methods, including combining state-of-the-art neural network verifiers with reachability analysis tools, to discuss what overapproximation is suitable for NNCS reachability analysis. While these verifiers enable proving safety properties of NNCS, the nonlinearity of neural-network controllers is the main bottleneck that limits their efficiency and scalability. We propose a novel framework of knowledge distillation to control “the degree of nonlinearity” of NN controllers to ease NNCS verification which improves provable safety of NNCSs especially when they are safe but cannot be verified due to their complexity. For the verification community, this opens up the possibility of reducing verification complexity by influencing how a system is trained. Though NNCS verification can prove safety when system models are known, modern deep learning, e.g., deep reinforcement learning (DRL), often targets tasks with unknown system models, also known as the model-free setting. To tackle this issue, we first focus on safe exploration of DRL and propose a novel Lyapunov-inspired method. Our method uses Gaussian Process models to provide probabilistic guarantees on the policies, and guide the exploration of the unknown environment in a safe fashion. Then, we study learning robust visual control policies in DRL to enhance the robustness against visual changes that were not seen during training. We propose a method DRIBO, which can learn robust state representations for RL via a novel contrastive version of the Multi-View Information Bottleneck (MIB). This approach enables us to train high-performance visual policies that are robust to visual distractions, and can generalize well to unseen environments.
52

Voltage-Based Multi-step Prediction : Data Labeling, Software Evaluation, and Contrasting DRL with Traditional Prediction Methods

Svensson, Joakim January 2023 (has links)
In this project, three primary problems were addressed to improve battery data management and software performance evaluation. All solutions used voltage values in time together with various device characteristics. Battery replacement labeling was performed using Hidden Markov Models. Both deep reinforcement learning, specifically TD3 with an LSTM layer, and traditional models were employed to predict future battery voltages. These predictions subsequently informed a developed novel method for early evaluation of software impact on battery performance. A baseline model was also introduced for optimal battery replacement timing. Results indicated that the TD3-LSTM model achieved a mean absolute percentage error below 5%, on par with traditional methods. The battery replacement labeling had above 85% correctly labeled replacements, impact on battery performance was above 90% correct in software comparisons. TD3-LSTM proved a viable choice for multi-step predictions requiring online learning, albeit requiring potentially more tuning. / I detta projekt behandlades tre primära problem i syfte att förbättra batteridatahantering och utvärdering av mjukvaruprestanda. Alla lösningar använde spänningsvärden i tid tillsammans med olika enhetsegenskaper. Batteribytesmärkning utfördes med hjälp av Hidden Markov Models. Både deep reinforcement learning, särskilt TD3 med ett LSTM-lager, och traditionella modeller användes för att förutsäga framtida batterispänningar. Dessa förutsägelser användes sedan i en framtagen ny metod för tidig utvärdering av mjukvarans påverkan på batteriprestanda. En basmodell introducerades också för optimal batteribytestid. Resultaten indikerade att TD3-LSTM modellen uppnådde ett genomsnittligt absolut procentfel under 5%, i nivå med traditionella metoder. Batteribytesmärkningen hade över 85% korrekt märkta batteribyten, inverkan på batteriprestanda var över 90% korrekt i mjukvarujämförelser. TD3-LSTM visade sig vara ett hållbart val för flerstegsförutsägelser som kräver onlineinlärning, även om det krävde potentiellt mer justering.
53

Deep reinforcement learning for automated building climate control

Snällfot, Erik, Hörnberg, Martin January 2024 (has links)
The building sector is the single largest contributor to greenhouse gas emissions, making it a natural focal point for reducing energy consumption. More efficient use of energy is also becoming increasingly important for property managers as global energy prices are skyrocketing. This report is conducted on behalf of Sustainable Intelligence, a Swedish company that specializes in building automation solutions. It investigates whether deep reinforcement learning (DLR) algorithms can be implemented in a building control environment, if it can be more effective than traditional solutions, and if it can be achieved in reasonable time. The algorithms that were tested were Deep Deterministic Policy Gradient, DDPG, and Proximal Policy Optimization, PPO. They were implemented in a simulated BOPTEST environment in Brussels, Belgium, along with a traditional heating curve and a PI-controller for benchmarks. DDPG never converged, but PPO managed to reduce energy consumption compared to the best benchmark, while only having slightly worse thermal discomfort. The results indicate that DRL algorithms can be implemented in a building environment and reduce green house gas emissions in a reasonable training time. This might especially be interesting in a complex building where DRL can adapt and scale better than traditional solutions. Further research along with implementations on physical buildings need to be done in order to determine if DRL is the superior option.
54

Heterogeneous IoT Network Architecture Design for Age of Information Minimization

Xia, Xiaohao 01 February 2023 (has links) (PDF)
Timely data collection and execution in heterogeneous Internet of Things (IoT) networks in which different protocols and spectrum bands coexist such as WiFi, RFID, Zigbee, and LoRa, requires further investigation. This thesis studies the problem of age-of-information minimization in heterogeneous IoT networks consisting of heterogeneous IoT devices, an intermediate layer of multi-protocol mobile gateways (M-MGs) that collects and relays data from IoT objects and performs computing tasks, and heterogeneous access points (APs). A federated matching framework is presented to model the collaboration between different service providers (SPs) to deploy and share M-MGs and minimize the average weighted sum of the age-of-information and energy consumption. Further, we develop a two-level multi-protocol multi-agent actor-critic (MP-MAAC) to solve the optimization problem, where M-MGs and SPs can learn collaborative strategies through their own observations. The M-MGs' strategies include selecting IoT objects for data collection, execution, relaying, and/or offloading to SPs’ access points while SPs decide on spectrum allocation. Finally, to improve the convergence of the learning process we incorporate federated learning into the multi-agent collaborative framework. The numerical results show that our Fed-Match algorithm reduces the AoI by factor four, collects twice more packets than existing approaches, reduces the penalty by factor five when enabling relaying, and establishes design principles for the stability of the training process.
55

Adversarial Reinforcement Learning for Control System Design: A Deep Reinforcement Learning Approach

Yang, Zhaoyuan, Yang 15 August 2018 (has links)
No description available.
56

Deep Reinforcement Learning for Open Multiagent System

Zhu, Tianxing 20 September 2022 (has links)
No description available.
57

Learning sensori-motor mappings using little knowledge : application to manipulation robotics / Apprentissage de couplages sensori-moteur en utilisant très peu d'informations : application à la robotique de manipulation

De La Bourdonnaye, François 18 December 2018 (has links)
La thèse consiste en l'apprentissage d'une tâche complexe de robotique de manipulation en utilisant très peu d'aprioris. Plus précisément, la tâche apprise consiste à atteindre un objet avec un robot série. L'objectif est de réaliser cet apprentissage sans paramètres de calibrage des caméras, modèles géométriques directs, descripteurs faits à la main ou des démonstrations d'expert. L'apprentissage par renforcement profond est une classe d'algorithmes particulièrement intéressante dans cette optique. En effet, l'apprentissage par renforcement permet d’apprendre une compétence sensori-motrice en se passant de modèles dynamiques. Par ailleurs, l'apprentissage profond permet de se passer de descripteurs faits à la main pour la représentation d'état. Cependant, spécifier les objectifs sans supervision humaine est un défi important. Certaines solutions consistent à utiliser des signaux de récompense informatifs ou des démonstrations d'experts pour guider le robot vers les solutions. D'autres consistent à décomposer l'apprentissage. Par exemple, l'apprentissage "petit à petit" ou "du simple au compliqué" peut être utilisé. Cependant, cette stratégie nécessite la connaissance de l'objectif en termes d'état. Une autre solution est de décomposer une tâche complexe en plusieurs tâches plus simples. Néanmoins, cela n'implique pas l'absence de supervision pour les sous tâches mentionnées. D'autres approches utilisant plusieurs robots en parallèle peuvent également être utilisés mais nécessite du matériel coûteux. Pour notre approche, nous nous inspirons du comportement des êtres humains. Ces derniers généralement regardent l'objet avant de le manipuler. Ainsi, nous décomposons la tâche d'atteinte en 3 sous tâches. La première tâche consiste à apprendre à fixer un objet avec un système de deux caméras pour le localiser dans l'espace. Cette tâche est apprise avec de l'apprentissage par renforcement profond et un signal de récompense faiblement supervisé. Pour la tâche suivante, deux compétences sont apprises en parallèle : la fixation d'effecteur et une fonction de coordination main-oeil. Comme la précédente tâche, un algorithme d'apprentissage par renforcement profond est utilisé avec un signal de récompense faiblement supervisé. Le but de cette tâche est d'être capable de localiser l'effecteur du robot à partir des coordonnées articulaires. La dernière tâche utilise les compétences apprises lors des deux précédentes étapes pour apprendre au robot à atteindre un objet. Cet apprentissage utilise les mêmes aprioris que pour les tâches précédentes. En plus de la tâche d'atteinte, un predicteur d'atteignabilité d'objet est appris. La principale contribution de ces travaux est l'apprentissage d'une tâche de robotique complexe en n'utilisant que très peu de supervision. / The thesis is focused on learning a complex manipulation robotics task using little knowledge. More precisely, the concerned task consists in reaching an object with a serial arm and the objective is to learn it without camera calibration parameters, forward kinematics, handcrafted features, or expert demonstrations. Deep reinforcement learning algorithms suit well to this objective. Indeed, reinforcement learning allows to learn sensori-motor mappings while dispensing with dynamics. Besides, deep learning allows to dispense with handcrafted features for the state spacerepresentation. However, it is difficult to specify the objectives of the learned task without requiring human supervision. Some solutions imply expert demonstrations or shaping rewards to guiderobots towards its objective. The latter is generally computed using forward kinematics and handcrafted visual modules. Another class of solutions consists in decomposing the complex task. Learning from easy missions can be used, but this requires the knowledge of a goal state. Decomposing the whole complex into simpler sub tasks can also be utilized (hierarchical learning) but does notnecessarily imply a lack of human supervision. Alternate approaches which use several agents in parallel to increase the probability of success can be used but are costly. In our approach,we decompose the whole reaching task into three simpler sub tasks while taking inspiration from the human behavior. Indeed, humans first look at an object before reaching it. The first learned task is an object fixation task which is aimed at localizing the object in the 3D space. This is learned using deep reinforcement learning and a weakly supervised reward function. The second task consists in learning jointly end-effector binocular fixations and a hand-eye coordination function. This is also learned using a similar set-up and is aimed at localizing the end-effector in the 3D space. The third task uses the two prior learned skills to learn to reach an object and uses the same requirements as the two prior tasks: it hardly requires supervision. In addition, without using additional priors, an object reachability predictor is learned in parallel. The main contribution of this thesis is the learning of a complex robotic task with weak supervision.
58

Knowledge reuse for deep reinforcement learning. / Reutilização do conhecimento para aprendizado por reforço profundo.

Glatt, Ruben 12 June 2019 (has links)
With the rise of Deep Learning the field of Artificial Intelligence (AI) Research has entered a new era. Together with an increasing amount of data and vastly improved computing capabilities, Machine Learning builds the backbone of AI, providing many of the tools and algorithms that drive development and applications. While we have already achieved many successes in the fields of image recognition, language processing, recommendation engines, robotics, or autonomous systems, most progress was achieved when the algorithms were focused on learning only a single task with little regard to effort and reusability. Since learning a new task from scratch often involves an expensive learning process, in this work, we are considering the use of previously acquired knowledge to speed up the learning of a new task. For that, we investigated the application of Transfer Learning methods for Deep Reinforcement Learning (DRL) agents and propose a novel framework for knowledge preservation and reuse. We show, that the knowledge transfer can make a big difference if the source knowledge is chosen carefully in a systematic approach. To get to this point, we provide an overview of existing literature of methods that realize knowledge transfer for DRL, a field which has been starting to appear frequently in the relevant literature only in the last two years. We then formulate the Case-based Reasoning methodology, which describes a framework for knowledge reuse in general terms, in Reinforcement Learning terminology to facilitate the adaption and communication between the respective communities. Building on this framework, we propose Deep Case-based Policy Inference (DECAF) and demonstrate in an experimental evaluation the usefulness of our approach for sequential task learning with knowledge preservation and reuse. Our results highlight the benefits of knowledge transfer while also making aware of the challenges that come with it. We consider the work in this area as an important step towards more stable general learning agents that are capable of dealing with the most complex tasks, which would be a key achievement towards Artificial General Intelligence. / Com a evolução da Aprendizagem Profunda (Deep Learning), o campo da Inteligência Artificial (IA) entrou em uma nova era. Juntamente com uma quantidade crescente de dados e recursos computacionais cada vez mais aprimorados, o Aprendizado de Máquina estabelece a base para a IA moderna, fornecendo muitas das ferramentas e algoritmos que impulsionam seu desenvolvimento e aplicações. Apesar dos muitos sucessos nas áreas de reconhecimento de imagem, processamento de linguagem natural, sistemas de recomendação, robótica e sistemas autônomos, a maioria dos avanços foram feitos focando no aprendizado de apenas uma única tarefa, sem muita atenção aos esforços dispendidos e reusabilidade da solução. Como o aprendizado de uma nova tarefa geralmente envolve um processo de aprendizado despendioso, neste trabalho, estamos considerando o reúso de conhecimento para acelerar o aprendizado de uma nova tarefa. Para tanto, investigamos a aplicação dos métodos de Transferência de Aprendizado (Transfer Learning) para agentes de Aprendizado por Reforço profundo (Deep Reinforcement Learning - DRL) e propomos um novo arcabouço para preservação e reutilização de conhecimento. Mostramos que a transferência de conhecimento pode fazer uma grande diferença no aprendizado se a origem do conhecimento for escolhida cuidadosa e sistematicamente. Para chegar a este ponto, nós fornecemos uma visão geral da literatura existente de métodos que realizam a transferência de conhecimento para DRL, um campo que tem despontado com frequência na literatura relevante apenas nos últimos dois anos. Em seguida, formulamos a metodologia Raciocínio baseado em Casos (Case-based Reasoning), que descreve uma estrutura para reutilização do conhecimento em termos gerais, na terminologia de Aprendizado por Reforço, para facilitar a adaptação e a comunicação entre as respectivas comunidades. Com base nessa metodologia, propomos Deep Casebased Policy Inference (DECAF) e demonstramos, em uma avaliação experimental, a utilidade de nossa proposta para a aprendizagem sequencial de tarefas, com preservação e reutilização do conhecimento. Nossos resultados destacam os benefícios da transferência de conhecimento e, ao mesmo tempo, conscientizam os desafios que a acompanham. Consideramos o trabalho nesta área como um passo importante para agentes de aprendizagem mais estáveis, capazes de lidar com as tarefas mais complexas, o que seria um passo fundamental para a Inteligência Geral Artificial.
59

Reinforcement learning and reward estimation for dialogue policy optimisation

Su, Pei-Hao January 2018 (has links)
Modelling dialogue management as a reinforcement learning task enables a system to learn to act optimally by maximising a reward function. This reward function is designed to induce the system behaviour required for goal-oriented applications, which usually means fulfilling the user’s goal as efficiently as possible. However, in real-world spoken dialogue systems, the reward is hard to measure, because the goal of the conversation is often known only to the user. Certainly, the system can ask the user if the goal has been satisfied, but this can be intrusive. Furthermore, in practice, the reliability of the user’s response has been found to be highly variable. In addition, due to the sparsity of the reward signal and the large search space, reinforcement learning-based dialogue policy optimisation is often slow. This thesis presents several approaches to address these problems. To better evaluate a dialogue for policy optimisation, two methods are proposed. First, a recurrent neural network-based predictor pre-trained from off-line data is proposed to estimate task success during subsequent on-line dialogue policy learning to avoid noisy user ratings and problems related to not knowing the user’s goal. Second, an on-line learning framework is described where a dialogue policy is jointly trained alongside a reward function modelled as a Gaussian process with active learning. This mitigates the noisiness of user ratings and minimises user intrusion. It is shown that both off-line and on-line methods achieve practical policy learning in real-world applications, while the latter provides a more general joint learning system directly from users. To enhance the policy learning speed, the use of reward shaping is explored and shown to be effective and complementary to the core policy learning algorithm. Furthermore, as deep reinforcement learning methods have the potential to scale to very large tasks, this thesis also investigates the application to dialogue systems. Two sample-efficient algorithms, trust region actor-critic with experience replay (TRACER) and episodic natural actor-critic with experience replay (eNACER), are introduced. In addition, a corpus of demonstration data is utilised to pre-train the models prior to on-line reinforcement learning to handle the cold start problem. Combining these two methods, a practical approach is demonstrated to effectively learn deep reinforcement learning-based dialogue policies in a task-oriented information seeking domain. Overall, this thesis provides solutions which allow truly on-line and continuous policy learning in spoken dialogue systems.
60

Quoting behaviour of a market-maker under different exchange fee structures / Quoting behaviour of a market-maker under different exchange fee structures

Kiseľ, Rastislav January 2018 (has links)
During the last few years, market micro-structure research has been active in analysing the dependence of market efficiency on different market character­ istics. Make-take fees are one of those topics as they might modify the incen­ tives for participating agents, e.g. broker-dealers or market-makers. In this thesis, we propose a Hawkes process-based model that captures statistical differences arising from different fee regimes and we estimate the differences on limit order book data. We then use these estimates in an attempt to measure the execution quality from the perspective of a market-maker. We appropriate existing theoretical market frameworks, however, for the pur­ pose of hireling optimal market-making policies we apply a novel method of deep reinforcement learning. Our results suggest, firstly, that maker-taker exchanges provide better liquidity to the markets, and secondly, that deep reinforcement learning methods may be successfully applied to the domain of optimal market-making. JEL Classification Keywords Author's e-mail Supervisor's e-mail C32, C45, C61, C63 make-take fees, Hawkes process, limit order book, market-making, deep reinforcement learn­ ing kiselrastislavSgmail.com barunik@f sv.cuni.cz

Page generated in 0.1187 seconds