  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Counter Autonomy Defense for Aerial Autonomous Systems

Mark E Duntz (8747724) 22 April 2020 (has links)
Here, we explore methods of counter autonomy defense for aerial autonomous multi-agent systems. First, the case is made for vast capabilities made possible by these systems. Recognizing that widespread use is likely on the horizon, we assert that it will be necessary for system designers to give appropriate attention to the security and vulnerabilities of such systems. We propose a method of learning-based resilient control for the multi-agent formation tracking problem, which uses reinforcement learning and neural networks to attenuate adversarial inputs and ensure proper operation. We also devise a learning-based method of cyber-physical attack detection for UAVs, which requires no formal system dynamics model yet learns to recognize abnormal behavior. We also utilize similar techniques for time signal analysis to achieve epileptic seizure prediction. Finally, a blockchain-based method for network security in the presence of Byzantine agents is explored.
42

Regret Minimization in Structured Reinforcement Learning

Tranos, Damianos January 2021 (has links)
We consider a class of sequential decision making problems in the presence of uncertainty, which belongs to the field of Reinforcement Learning (RL). Specifically, we study discrete Markov Decision Processes (MDPs), which model a decision maker or agent that interacts with a stochastic and dynamic environment and receives feedback from it in the form of a reward. The agent seeks to maximize a notion of cumulative reward. Because the environment (both the system dynamics and the reward function) is unknown, the agent faces an exploration-exploitation dilemma, where it must balance exploring its available actions against exploiting the one it believes to be best. This dilemma is captured by the notion of regret, which compares the rewards that the agent has accumulated thus far with those that would have been obtained by an optimal policy. The agent is then said to behave optimally if it minimizes its regret. This thesis investigates the fundamental regret limits that can be achieved by any agent. We derive general asymptotic and problem-specific regret lower bounds for the cases of ergodic and deterministic MDPs. We make these explicit for ergodic MDPs that are unstructured, for MDPs with Lipschitz transitions and rewards, as well as for deterministic MDPs that satisfy a decoupling property. Furthermore, we propose DEL, an algorithm that is valid for any ergodic MDP with any structure and whose regret upper bound matches the associated regret lower bounds, thus being truly optimal. For this algorithm, we present theoretical regret guarantees as well as a numerical demonstration that verifies its ability to exploit the underlying structure. / QC 20210603
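The notion of regret described in this abstract can be illustrated with a minimal sketch. It is not the thesis's DEL algorithm: it assumes the simplest possible setting (a single-state MDP, i.e. a Bernoulli bandit) and a plain epsilon-greedy agent, and the names `cumulative_regret` and `run_epsilon_greedy` are hypothetical.

```python
import random


def cumulative_regret(optimal_mean, chosen_means):
    """Running regret: accumulated gap between the best expected reward
    and the expected reward of the action chosen at each step."""
    regret, total = [], 0.0
    for mean in chosen_means:
        total += optimal_mean - mean
        regret.append(total)
    return regret


def run_epsilon_greedy(arm_means, steps, epsilon, seed=0):
    """Illustrative agent (an assumption, not DEL): explore a random arm
    with probability epsilon, otherwise exploit the best current estimate."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    estimates = [0.0] * len(arm_means)
    chosen_means = []
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_means))  # explore
        else:
            arm = max(range(len(arm_means)), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample-mean reward estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        chosen_means.append(arm_means[arm])
    return cumulative_regret(max(arm_means), chosen_means)


if __name__ == "__main__":
    curve = run_epsilon_greedy([0.2, 0.5, 0.8], steps=1000, epsilon=0.1)
    print(f"regret after 1000 steps: {curve[-1]:.1f}")
```

With a fixed exploration rate the regret of this toy agent keeps growing roughly linearly, which is the kind of behaviour that regret lower and upper bounds are meant to characterize and improve upon.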
43

Distributed Online Learning in Cognitive Radar Networks

Howard, William Waddell 21 December 2023 (has links)
Cognitive radar networks (CRNs) were first proposed in 2006 by Simon Haykin, shortly after the introduction of cognitive radar. In order for CRNs to benefit from many of the optimization techniques developed for cognitive radar, they must have some method of coordination and control. Both centralized and distributed architectures have been proposed, and both have drawbacks. This work addresses gaps in the literature by providing the first consideration of the problems that appear when typical cognitive radar tools are extended into networks. This work first examines the online learning techniques available to distributed CRNs, enabling optimal resource allocation without requiring a dedicated communication resource. While this problem has been addressed for single-node cognitive radar, we provide the first consideration of mutual interference in such networks. We go on to propose the first hybrid cognitive radar network structure which takes advantage of central feedback while maintaining the benefits of distributed networks. Then, we go on to investigate a novel problem of timely updating in CRNs, addressing questions of target update frequency and node updating methods. We draw from the Age of Information literature to propose Bellman-optimal solutions. Finally, we introduce the notion of mode control, and develop a way to select between active and passive target observation. / Doctor of Philosophy / Cognitive radar was inspired by biological models, where animals such as dolphins or bats use vocal pulses to form a model of their environment. As these animals seek after prey, they use information they observe to modify their vocal pulses. Cognitive radar networks are an extension of this model to a group of radar devices, which must work together cooperatively to detect and track targets. As the scene changes in time, the radar nodes in the cognitive radar network must change their operating parameters to continue performing well. 
This networked problem has issues not present in the single-node cognitive radar problem. In particular, as each node in the network changes operating parameters, it risks degrading the performance of the other nodes. In the first contribution of this dissertation, we investigate the techniques that a cognitive radar network can use to avoid these cases of mutual performance degradation, and in particular, we investigate how this can be done without advance coordination between the nodes. In the second contribution, we go on to explore what performance improvements are available as central control is introduced. The third and fourth contributions investigate further efficiencies available to a cognitive radar network. The third contribution discusses how a resource-constrained network should communicate updates to a central aggregator. Lastly, the fourth contribution investigates additional estimation tools available to such a network, and how the network should choose between these modes.
44

Deep Reinforcement Learning of IoT System Dynamics for Optimal Orchestration and Boosted Efficiency

Haowei Shi (16636062) 30 August 2023 (has links)
This thesis targets the orchestration challenge of Wearable Internet of Things (IoT) systems, seeking optimal system configurations in terms of energy efficiency, computing, and data transmission activities. We first investigated reinforcement learning in simulated IoT environments to demonstrate its effectiveness, and then studied the algorithm on real-world wearable motion data to show its practical promise. More specifically, the first challenge arises in complex massive-device orchestration: it is essential to configure and manage both the massive number of devices and the gateway/server. For the wearable IoT devices, the complexity lies in their diverse energy budgets, computing efficiency, etc. On the phone or server side, it lies in how global diversity can be analyzed and how the system configuration can be optimized. We therefore propose a new reinforcement learning architecture, called boosted deep deterministic policy gradient, with enhanced actor-critic co-learning and multi-view state-transformation. The proposed actor-critic co-learning allows for enhanced dynamics abstraction through the shared neural network component. Evaluated on a simulated massive-device task, the proposed deep reinforcement learning framework achieved much more efficient system configurations, with enhanced computing capabilities and improved energy efficiency. Secondly, we leveraged real-world motion data to demonstrate the potential of reinforcement learning to optimally configure the motion sensors. We used paradigms in sequential data estimation to obtain estimated data for some sensors, allowing energy savings since these sensors no longer need to be activated to collect data during estimation intervals. We then introduced the Deep Deterministic Policy Gradient algorithm to learn to control the estimation timing.
This study provides a real-world demonstration of maximizing the energy efficiency of wearable IoT applications while maintaining data accuracy. Overall, this thesis aims to advance wearable IoT system orchestration toward optimal system configurations.
45

Offline Reinforcement Learning from Imperfect Human Guidance / 不完全な人間の誘導からのオフライン強化学習

Zhang, Guoxi 24 July 2023 (has links)
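Kyoto University / New-system course doctorate / Doctor of Informatics / Degree No. Kō 24856 / Jōhaku No. 838 / 新制||情||140 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Hisashi Kashima, Professor Tatsuya Kawahara, Professor Jun Morimoto / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM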
46

Multi-Task Reinforcement Learning: From Single-Agent to Multi-Agent Systems

Trang, Matthew Luu 06 January 2023 (has links)
Generalized collaborative drones are a technology that has many potential benefits. General purpose drones that can handle exploration, navigation, manipulation, and more without having to be reprogrammed would be an immense breakthrough for usability and adoption of the technology. The ability to develop these multi-task, multi-agent drone systems is limited by the lack of available training environments, as well as deficiencies of multi-task learning due to a phenomenon known as catastrophic forgetting. In this thesis, we present a set of simulation environments for exploring the abilities of multi-task drone systems and provide a platform for testing agents in incremental single-agent and multi-agent learning scenarios. The multi-task platform is an extension of an existing drone simulation environment written in Python using the PyBullet Physics Simulation Engine, with these environments incorporated. Using this platform, we present an analysis of Incremental Learning and detail the beneficial impacts of using the technique for multi-task learning, with respect to multi-task learning speed and catastrophic forgetting. Finally, we introduce a novel algorithm, Incremental Learning with Second-Order Approximation Regularization (IL-SOAR), to mitigate some of the effects of catastrophic forgetting in multi-task learning. We show the impact of this method and contrast the performance relative to a multi-agent multi-task approach using a centralized policy sharing algorithm. / Master of Science / Machine Learning techniques allow drones to be trained to achieve tasks which are otherwise time-consuming or difficult. The goal of this thesis is to facilitate the work of creating these complex drone machine learning systems by exploring Reinforcement Learning (RL), a field of machine learning which involves learning the correct actions to take through experience. Currently, RL methods are effective in the design of drones which are able to solve one particular task. 
The next step in this technology is to develop RL systems which are able to handle generalization and perform well across multiple tasks. In this thesis, simulation environments for drones to learn complex tasks are created, and algorithms which are able to train drones in multiple hard tasks are developed and tested. We explore the benefits of using a specific multi-task training technique known as Incremental Learning. Additionally, we consider one of the prohibitive factors of multi-task machine learning-based solutions, the degradation problem of agent performance on previously learned tasks, known as catastrophic forgetting. We create an algorithm that aims to prevent the impact of forgetting when training drones sequentially on new tasks. We contrast this approach with a multi-agent solution, where multiple drones learn simultaneously across the tasks.
47

Action selection in modular reinforcement learning

Zhang, Ruohan 16 September 2014 (has links)
Modular reinforcement learning is an approach to resolving the curse of dimensionality in traditional reinforcement learning. We design and implement a modular reinforcement learning algorithm based on three major components: Markov decision process decomposition, module training, and global action selection. We define and formalize the concepts of module class and module instance in the decomposition step. Under our decomposition framework, we train each module efficiently using the SARSA($\lambda$) algorithm. We then design, implement, test, and compare three action selection algorithms based on different heuristics: Module Combination, Module Selection, and Module Voting. For the last two algorithms, we propose a method to calculate module weights efficiently, using the standard deviation of the Q-values of each module. We show that the Module Combination and Module Voting algorithms produce satisfactory performance in our test domain.
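The three action selection heuristics named in this abstract can be sketched as follows. The abstract gives only their names and the use of the standard deviation of Q-values as a module weight, so the concrete rules below (summing Q-values, letting the highest-weight module decide, and weighted voting) are assumptions for illustration, and all function names are hypothetical. Each module is represented as a list of Q-values, one per action, for the current state.

```python
import statistics


def greedy_action(q_values):
    """Index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def module_weights(q_tables):
    """Weight of each module: standard deviation of its Q-values.
    A module that sees little difference between actions gets a low weight."""
    return [statistics.pstdev(q) for q in q_tables]


def module_combination(q_tables):
    """Assumed rule: pick the action with the largest summed Q-value."""
    n_actions = len(q_tables[0])
    totals = [sum(q[a] for q in q_tables) for a in range(n_actions)]
    return greedy_action(totals)


def module_selection(q_tables):
    """Assumed rule: the highest-weight module picks its own greedy action."""
    weights = module_weights(q_tables)
    best_module = max(range(len(q_tables)), key=lambda m: weights[m])
    return greedy_action(q_tables[best_module])


def module_voting(q_tables):
    """Assumed rule: each module casts a weighted vote for its greedy action."""
    votes = [0.0] * len(q_tables[0])
    for q, weight in zip(q_tables, module_weights(q_tables)):
        votes[greedy_action(q)] += weight
    return greedy_action(votes)


if __name__ == "__main__":
    # Three modules, three actions; values are made up for illustration.
    q_tables = [[10.0, 0.0, 0.0], [0.0, 6.0, 5.0], [0.0, 6.0, 5.0]]
    print(module_combination(q_tables))  # 1
    print(module_selection(q_tables))    # 0
    print(module_voting(q_tables))       # 1
```

In this made-up example, one strongly opinionated module wins under selection, while the two weaker modules that agree with each other prevail under combination and voting.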
48

Design of optimal neural network control strategies with minimal a priori knowledge

Paraskevopoulos, Vasileios January 2000 (has links)
No description available.
49

Learning user modelling strategies for adaptive referring expression generation in spoken dialogue systems

Janarthanam, Srinivasan Chandrasekaran January 2011 (has links)
We address the problem of dynamic user modelling for referring expression generation in spoken dialogue systems, i.e., how a spoken dialogue system should choose referring expressions to refer to domain entities for users with different levels of domain expertise, whose domain knowledge is initially unknown to the system. We approach this problem using a statistical planning framework: Reinforcement Learning techniques in Markov Decision Processes (MDPs). We present a new reinforcement learning framework to learn user modelling strategies for adaptive referring expression generation (REG) in resource-scarce domains (i.e., where no large corpus exists for learning). As a part of the framework, we present novel user simulation models that are sensitive to the referring expressions used by the system and are able to simulate users with different levels of domain knowledge. Such models are shown to simulate real user behaviour more closely than baseline user simulation models. In contrast to previous approaches to user-adaptive systems, we do not assume that the user’s domain knowledge is available to the system before the conversation starts. We show that, using a small corpus of non-adaptive dialogues, it is possible to learn an adaptive user modelling policy in resource-scarce domains using our framework. We also show that the learned user modelling strategies performed better in terms of adaptation than hand-coded baseline policies on both simulated and real users. With real users, the learned policy produced around a 20% increase in adaptation in comparison to the best-performing hand-coded adaptive baseline. We also show that adaptation to the user’s domain knowledge improves task success (99.47% for the learned policy vs 84.7% for the hand-coded baseline) and reduces the dialogue time of the conversation (11% relative difference). This is because users found it easier to identify domain objects when the system used adaptive referring expressions during the conversations.
50

Neural mechanisms of suboptimal decisions

Chau, Ka Hung Bolton January 2014 (has links)
Making good decisions and adapting flexibly to environmental change are critical to the survival of animals. In this thesis, I investigated the neural mechanisms underlying suboptimal decision making in humans and behavioural adaptation in monkeys, using functional magnetic resonance imaging (fMRI) in both species. In recent decades, the neuroscience of decision making has focused prominently on binary decisions. Whether the presence of an additional third option could have an impact on behaviour and neural signals has been largely overlooked. I designed an experiment in which decisions were made between two options in the presence of a third option. A biophysical model simulation made the surprising prediction that more suboptimal decisions would be made in the presence of a very poor third alternative. Subsequent human behavioural testing showed results consistent with these predictions. In the ventromedial prefrontal cortex (vmPFC), I found that a value comparison signal that is critical for decision making became weaker in the presence of a poor-value third option. The effect contrasts with another prominent potential mechanism during multi-alternative decision making, divisive normalization, the signatures of which were observed in the posterior parietal cortex. It has long been thought that the orbitofrontal cortex (OFC) and amygdala mediate reward-guided behavioural adaptation. However, this viewpoint has recently been challenged. I recorded whole-brain activity in macaques using fMRI while they performed an object discrimination reversal task over multiple testing sessions. I identified a lateral OFC (lOFC) region in which activity predicted adaptive win-stay/lose-shift behaviour. In contrast, anterior cingulate cortex (ACC) activity predicted future exploratory decisions regardless of reward outcome. Amygdala and lOFC activity was more strongly coupled for adaptive choice shifting and decoupled for task-irrelevant reward memory.
Day-to-day fluctuations in signals and signal coupling were correlated with day-to-day fluctuations in performance. These data demonstrate that the OFC, ACC, and amygdala each make unique contributions to flexible behaviour and credit assignment.
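For readers unfamiliar with the term, win-stay/lose-shift behaviour can be quantified with a short sketch. It is an illustration only, not the analysis used in the thesis; the function name `wsls_fraction` and the trial encoding (a choice label and a 0/1 reward per trial) are assumptions.

```python
def wsls_fraction(choices, rewards):
    """Fraction of trials consistent with win-stay/lose-shift:
    repeat the previous choice after a reward, switch after no reward."""
    if len(choices) != len(rewards) or len(choices) < 2:
        raise ValueError("need two or more trials with matching rewards")
    consistent = 0
    # Pair each trial (choice, reward) with the choice made on the next trial.
    for prev_choice, prev_reward, choice in zip(choices, rewards, choices[1:]):
        stayed = choice == prev_choice
        if stayed == bool(prev_reward):
            consistent += 1
    return consistent / (len(choices) - 1)


if __name__ == "__main__":
    # Hypothetical session: object 0 or 1 chosen, reward 1 or 0 received.
    print(wsls_fraction([0, 0, 1, 1, 0], [1, 0, 1, 0, 1]))  # 1.0
```

A value near 1.0 indicates choices that closely follow the previous outcome, while a value near 0.5 in a two-option task indicates choices that are unrelated to it.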
