41. QoE-Fair Video Streaming over DASH. Altamimi, Sadi, 19 December 2018.
Video streaming has become, and is expected to remain, the dominant type of traffic over the Internet. With this high demand for multimedia streaming, there is always a question of how to provide acceptable and fair Quality of Experience (QoE) to consumers of over-the-top video services, despite the best-effort nature of the Internet and the limited network resources shared by concurrent users. MPEG-DASH, one of the most widely used standards for HTTP-based adaptive streaming, uses client-side rate adaptation algorithms, which are known to suffer from two practical challenges. On the one hand, clients use fixed heuristics that have been fine-tuned under strict assumptions about deployment environments, which limits their ability to generalize across network conditions. On the other hand, the absence of collaboration among DASH clients leads to unfair bandwidth allocation and typically ends in an unbalanced equilibrium point. We believe that augmenting client-side adaptation with server-side rate adaptation significantly improves the fairness of network bandwidth allocation among concurrent users. We formulated the problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and used reinforcement learning (RL) to train two neural networks that find an optimal solution to the proposed Dec-POMDP in a distributed way. We showed that our proposed client-server collaboration outperforms state-of-the-art schemes in terms of QoE-efficiency, QoE-fairness, and social welfare by as much as 16%, 21%, and 24%, respectively.
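As a rough illustration of the kind of objective such a server-side agent could optimize, the sketch below scores a set of concurrent clients by mean QoE and by a standard-deviation-based fairness index; the function names, the 1-to-5 QoE scale, and the specific index are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def qoe_fairness(qoe_scores, qoe_min=1.0, qoe_max=5.0):
    """QoE-fairness index in [0, 1]; 1 means perfectly fair.

    Standard-deviation-based F-index; whether the thesis uses this
    exact index is an assumption."""
    sigma = np.std(qoe_scores)
    return 1.0 - 2.0 * sigma / (qoe_max - qoe_min)

def social_welfare_reward(qoe_scores, alpha=0.5):
    """Hypothetical scalar reward trading off efficiency (mean QoE)
    against fairness, as a server-side RL agent might receive."""
    efficiency = np.mean(qoe_scores) / 5.0  # normalize to [0, 1]
    return alpha * efficiency + (1 - alpha) * qoe_fairness(qoe_scores)

# Example: three concurrent DASH clients with unequal QoE
print(social_welfare_reward([4.2, 3.9, 2.1]))
```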
42. Counter Autonomy Defense for Aerial Autonomous Systems. Mark E Duntz (8747724), 22 April 2020.
Here, we explore methods of counter autonomy defense for aerial autonomous multi-agent systems. First, the case is made for the vast capabilities these systems enable. Recognizing that widespread use is likely on the horizon, we assert that system designers must give appropriate attention to the security and vulnerabilities of such systems. We propose a method of learning-based resilient control for the multi-agent formation tracking problem, which uses reinforcement learning and neural networks to attenuate adversarial inputs and ensure proper operation. We also devise a learning-based method of cyber-physical attack detection for UAVs, which requires no formal system dynamics model yet learns to recognize abnormal behavior. We also utilize similar time-signal analysis techniques to achieve epileptic seizure prediction. Finally, a blockchain-based method for network security in the presence of Byzantine agents is explored.
43. Regret Minimization in Structured Reinforcement Learning. Tranos, Damianos, January 2021.
We consider a class of sequential decision-making problems under uncertainty, which belongs to the field of Reinforcement Learning (RL). Specifically, we study discrete Markov Decision Processes (MDPs), which model a decision maker or agent that interacts with a stochastic, dynamic environment and receives feedback from it in the form of a reward. The agent seeks to maximize a notion of cumulative reward. Because the environment (both the system dynamics and the reward function) is unknown, the agent faces an exploration-exploitation dilemma: it must balance exploring its available actions against exploiting what it believes to be the best one. This dilemma is captured by the notion of regret, which compares the rewards the agent has accumulated so far with those that an optimal policy would have obtained. The agent is said to behave optimally if it minimizes its regret. This thesis investigates the fundamental regret limits that any agent can achieve. We derive general asymptotic and problem-specific regret lower bounds for ergodic and deterministic MDPs. We make these explicit for ergodic MDPs that are unstructured, for MDPs with Lipschitz transitions and rewards, and for deterministic MDPs that satisfy a decoupling property. Furthermore, we propose DEL, an algorithm valid for any ergodic MDP with any structure, whose regret upper bound matches the associated regret lower bounds, making it truly optimal. For this algorithm, we present theoretical regret guarantees as well as a numerical demonstration that verifies its ability to exploit the underlying structure.
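For concreteness, in the average-reward setting the regret of a policy $\pi$ over $T$ steps is usually defined against the optimal long-run average reward (the gain $g^\star$); the following is a sketch of the standard convention, not necessarily the thesis's exact notation:

$$R^\pi(T) \;=\; T\,g^\star \;-\; \mathbb{E}^\pi\!\left[\sum_{t=1}^{T} r(s_t, a_t)\right],$$

and problem-specific asymptotic lower bounds in this literature typically read $\liminf_{T\to\infty} R^\pi(T)/\log T \ge K(\phi)$, where the constant $K(\phi)$ is the value of an optimization problem determined by the MDP $\phi$ and the structure it is known to satisfy.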
44. Distributed Online Learning in Cognitive Radar Networks. Howard, William Waddell, 21 December 2023.
Cognitive radar networks (CRNs) were first proposed in 2006 by Simon Haykin, shortly after the introduction of cognitive radar. In order for CRNs to benefit from many of the optimization techniques developed for cognitive radar, they must have some method of coordination and control. Both centralized and distributed architectures have been proposed, and both have drawbacks. This work addresses gaps in the literature by providing the first consideration of the problems that appear when typical cognitive radar tools are extended to networks. It first examines the online learning techniques available to distributed CRNs, enabling optimal resource allocation without requiring a dedicated communication resource. While this problem has been addressed for single-node cognitive radar, we provide the first consideration of mutual interference in such networks. We go on to propose the first hybrid cognitive radar network structure, which takes advantage of central feedback while maintaining the benefits of distributed networks. We then investigate the novel problem of timely updating in CRNs, addressing questions of target update frequency and node updating methods, and draw on the Age of Information literature to propose Bellman-optimal solutions. Finally, we introduce the notion of mode control and develop a way to select between active and passive target observation. / Doctor of Philosophy / Cognitive radar was inspired by biological models, in which animals such as dolphins or bats use vocal pulses to form a model of their environment. As these animals pursue prey, they use the information they observe to modify their vocal pulses. Cognitive radar networks extend this model to a group of radar devices, which must work together cooperatively to detect and track targets. As the scene changes over time, the radar nodes in the cognitive radar network must change their operating parameters to keep performing well. This networked problem has issues not present in the single-node cognitive radar problem. In particular, as each node in the network changes operating parameters, it risks degrading the performance of the other nodes. In the first contribution of this dissertation, we investigate the techniques a cognitive radar network can use to avoid such mutual performance degradation, and in particular how this can be done without advance coordination between the nodes. In the second contribution, we explore what performance improvements become available as central control is introduced. The third and fourth contributions investigate further efficiencies available to a cognitive radar network. The third contribution discusses how a resource-constrained network should communicate updates to a central aggregator. Lastly, the fourth contribution investigates additional estimation tools available to such a network and how the network should choose between active and passive observation modes.
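As a sketch of the distributed online learning involved, each radar node could run an independent bandit learner over candidate frequency bands, treating the other nodes' choices as part of its environment; UCB1 below is a generic textbook choice and an assumption, not necessarily the algorithm developed in the dissertation.

```python
import math
import random

class UCB1:
    """Generic UCB1 bandit for one radar node choosing among bands."""
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms  # running mean reward per band
        self.t = 0

    def select(self):
        self.t += 1
        for a in range(len(self.counts)):  # play each band once first
            if self.counts[a] == 0:
                return a
        ucb = [self.values[a] + math.sqrt(2 * math.log(self.t) / self.counts[a])
               for a in range(len(self.counts))]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Each node runs its own learner; the reward could be a normalized
# SINR or tracking-quality measurement.
node = UCB1(n_arms=5)
arm = node.select()
node.update(arm, reward=random.random())
```

Mutual interference shows up here as nonstationarity in each node's reward stream, which is precisely what makes the networked problem harder than the single-node one.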
45. Beats, Bots, and Bananas: Modeling reinforcement learning of sensorimotor synchronization. Ommi, Yassaman, January 2024.
This thesis investigates the computational principles underlying sensorimotor synchronization (SMS) through the novel application of deep reinforcement learning (RL). SMS, the coordination of rhythmic movement with external stimuli, is essential for human activities like music performance and social interaction, yet its neural mechanisms and learning processes are not fully understood.
We present a computational framework utilizing recurrent neural networks with Long Short-Term Memory (LSTM) units, trained via RL, to model SMS behavior. This approach allows for the exploration of how different reward structures shape the acquisition and execution of synchronization skills. Our model is evaluated on both steady-state synchronization and perturbation response tasks, paralleling human SMS studies.
Key findings reveal that agents trained with a combined reward (minimizing next-beat asynchrony and maintaining interval accuracy) exhibit human-like adaptive behaviors. Notably, these agents showed asymmetric error correction, making larger adjustments for late taps than for early ones, a phenomenon documented in human subjects. This suggests that such asymmetry may arise from the inherent reward structure of the task rather than from specific neural architectures.
While our model did not consistently reproduce the negative mean asynchrony observed in human steady-state tapping, it demonstrated anticipatory behavior in response to perturbations. This offers new insights into how the brain might learn and execute rhythmic tasks, indicating that anticipatory strategies in human synchronization could naturally arise from processing rewards and timing errors.
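A minimal sketch of the combined reward described above, assuming a simple additive penalty on tap asynchrony and inter-tap-interval error (the weights and functional form are illustrative, not the thesis's exact specification):

```python
def tapping_reward(tap_time, beat_time, interval, target_interval,
                   w_async=1.0, w_interval=1.0):
    """Hypothetical combined reward: penalize asynchrony to the next
    beat and deviation of the produced inter-tap interval from the
    metronome period."""
    asynchrony = tap_time - beat_time          # positive = late tap
    interval_error = interval - target_interval
    return -(w_async * abs(asynchrony) + w_interval * abs(interval_error))

# A tap 30 ms late with a 20 ms-short inter-tap interval (500 ms period):
print(tapping_reward(0.53, 0.50, 0.48, 0.50))  # -> -0.05
```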
Our work contributes to the growing integration of machine learning techniques with cognitive neuroscience, offering new computational insights into the acquisition of timing skills. It establishes a flexible framework, which can be extended for future investigations in studying more complex rhythms, coordination between individuals, and even the neural basis of rhythm perception and production. / Thesis / Master of Science (MSc) / Have you ever wondered how we naturally tap our foot in time with music? This thesis investigates this human ability, known as sensorimotor synchronization, using artificial intelligence. By creating artificial agents that learn to tap along with a steady beat through reinforcement learning—like a person tapping to a metronome—we aimed to understand how the brain acquires this skill.
Our experiments showed that how we define success significantly affects how the agents learn the skill. Notably, when we rewarded both precise timing and consistent tapping, the agents' behavior closely resembled that of humans. They even exhibited a human-like pattern of error correction, making larger adjustments when tapping too late than too early.
This research offers new insights into how our brains process and learn rhythm and timing. It also lays the groundwork for developing AI systems capable of replicating human-like timing behaviors, with potential applications in music technology and robotics.
46. Deep Reinforcement Learning of IoT System Dynamics for Optimal Orchestration and Boosted Efficiency. Haowei Shi (16636062), 30 August 2023.
This thesis targets the orchestration challenge of wearable Internet of Things (IoT) systems: finding optimal configurations of the system in terms of energy efficiency, computing, and data transmission activities. We first investigated reinforcement learning in simulated IoT environments to demonstrate its effectiveness, and then studied the algorithm on real-world wearable motion data to show its practical promise. More specifically, the first challenge arises in complex massive-device orchestration: it is essential to configure and manage the massive devices as well as the gateway/server. On the massive wearable IoT devices, the complexity lies in their diverse energy budgets, computing efficiencies, and so on; on the phone or server side, it lies in how global diversity can be analyzed and how the system configuration can be optimized. We therefore propose a new reinforcement learning architecture, called boosted deep deterministic policy gradient, with enhanced actor-critic co-learning and multi-view state transformation. The proposed actor-critic co-learning allows for enhanced dynamics abstraction through a shared neural network component. Evaluated on a simulated massive-device task, the proposed deep reinforcement learning framework achieved much more efficient system configurations, with enhanced computing capabilities and improved energy efficiency. Second, we leveraged real-world motion data to demonstrate the potential of reinforcement learning to optimally configure the motion sensors. We used paradigms in sequential data estimation to obtain estimated data for some sensors, allowing energy savings since these sensors no longer need to be activated to collect data during estimation intervals. We then introduced the Deep Deterministic Policy Gradient algorithm to learn to control the estimation timing. This study provides a real-world demonstration of maximizing the energy efficiency of wearable IoT applications while maintaining data accuracy. Overall, this thesis advances wearable IoT system orchestration toward optimal system configurations.
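A minimal sketch of the actor-critic co-learning idea, assuming a PyTorch implementation in which the actor and critic share a state encoder so that dynamics abstraction learned by one benefits the other; the layer sizes and names are illustrative, not the thesis's architecture:

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """State encoder shared by actor and critic (the co-learning component)."""
    def __init__(self, state_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
    def forward(self, s):
        return self.net(s)

class Actor(nn.Module):
    def __init__(self, encoder, hidden, action_dim):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(nn.Linear(hidden, action_dim), nn.Tanh())
    def forward(self, s):
        return self.head(self.encoder(s))  # continuous configuration action

class Critic(nn.Module):
    def __init__(self, encoder, hidden, action_dim):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden + action_dim, 1)
    def forward(self, s, a):
        return self.head(torch.cat([self.encoder(s), a], dim=-1))

enc = SharedEncoder(state_dim=10)
actor, critic = Actor(enc, 128, 4), Critic(enc, 128, 4)
s = torch.randn(32, 10)
q = critic(s, actor(s))  # gradients flow through the shared encoder
```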
47. Offline Reinforcement Learning from Imperfect Human Guidance. Zhang, Guoxi, 24 July 2023.
Kyoto University / New-system doctoral program / Doctor of Informatics / Degree No. 24856 (Kō) / Informatics Doctorate No. 838 / Call number 新制||情||140 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Hisashi Kashima; examiners Professor Tatsuya Kawahara and Professor Jun Morimoto / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
48. Multi-Task Reinforcement Learning: From Single-Agent to Multi-Agent Systems. Trang, Matthew Luu, 06 January 2023.
Generalized collaborative drones are a technology with many potential benefits. General-purpose drones that can handle exploration, navigation, manipulation, and more without being reprogrammed would be an immense breakthrough for the usability and adoption of the technology. The ability to develop these multi-task, multi-agent drone systems is limited by the lack of available training environments, as well as by a deficiency of multi-task learning known as catastrophic forgetting. In this thesis, we present a set of simulation environments for exploring the abilities of multi-task drone systems and provide a platform for testing agents in incremental single-agent and multi-agent learning scenarios. The multi-task platform extends an existing drone simulation environment, written in Python using the PyBullet physics simulation engine, to incorporate these environments. Using this platform, we present an analysis of incremental learning and detail its beneficial impact on multi-task learning, with respect to learning speed and catastrophic forgetting. Finally, we introduce a novel algorithm, Incremental Learning with Second-Order Approximation Regularization (IL-SOAR), to mitigate some of the effects of catastrophic forgetting in multi-task learning. We show the impact of this method and contrast its performance with a multi-agent, multi-task approach that uses a centralized policy-sharing algorithm. / Master of Science / Machine learning techniques allow drones to be trained to achieve tasks that are otherwise time-consuming or difficult. The goal of this thesis is to facilitate the creation of these complex drone machine learning systems by exploring reinforcement learning (RL), a field of machine learning that involves learning the correct actions to take through experience. Currently, RL methods are effective in the design of drones able to solve one particular task. The next step for this technology is to develop RL systems that generalize and perform well across multiple tasks. In this thesis, simulation environments in which drones learn complex tasks are created, and algorithms able to train drones on multiple hard tasks are developed and tested. We explore the benefits of a specific multi-task training technique known as incremental learning. Additionally, we consider one of the prohibitive factors of multi-task machine learning solutions: the degradation of agent performance on previously learned tasks, known as catastrophic forgetting. We create an algorithm that aims to prevent the impact of forgetting when training drones sequentially on new tasks, and we contrast this approach with a multi-agent solution in which multiple drones learn simultaneously across the tasks.
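The abstract does not spell out IL-SOAR's regularizer, but second-order approximation regularizers in the continual-learning literature (elastic weight consolidation being the best-known example) add a quadratic penalty that anchors weights near their values from earlier tasks, scaled by a curvature estimate. The sketch below shows that generic pattern, under the assumption that IL-SOAR takes a similar form:

```python
import torch

def soar_penalty(model, anchor_params, curvature, strength=1.0):
    """Quadratic penalty keeping weights near those learned on earlier
    tasks, weighted by a diagonal second-order (curvature) estimate.
    Generic EWC-style form; IL-SOAR's exact regularizer is an assumption."""
    loss = 0.0
    for p, p_old, f in zip(model.parameters(), anchor_params, curvature):
        loss = loss + (f * (p - p_old) ** 2).sum()
    return strength * loss

# Hypothetical usage when training on a new task:
# total_loss = task_loss + soar_penalty(policy, old_params, fisher_diag)
```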
49. Action selection in modular reinforcement learning. Zhang, Ruohan, 16 September 2014.
Modular reinforcement learning is an approach to resolving the curse of dimensionality in traditional reinforcement learning. We design and implement a modular reinforcement learning algorithm based on three major components: Markov decision process decomposition, module training, and global action selection. We define and formalize the module class and module instance concepts in the decomposition step. Under our decomposition framework, we train each module efficiently using the SARSA($\lambda$) algorithm. We then design, implement, test, and compare three action selection algorithms based on different heuristics: Module Combination, Module Selection, and Module Voting. For the last two algorithms, we propose a method to compute module weights efficiently, using the standard deviation of each module's Q-values. We show that the Module Combination and Module Voting algorithms produce satisfactory performance in our test domain.
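As an illustration of the weighting heuristic, the sketch below scores each module by the spread of its Q-values in the current state (a module whose Q-values differ sharply has more at stake in the choice) and combines modules accordingly; the array shapes and the exact combination rule are assumptions, not the thesis's implementation:

```python
import numpy as np

def module_weights(q_tables, state):
    """Weight each module by the standard deviation of its Q-values
    in the current state (hedged reading of the thesis's heuristic)."""
    stds = np.array([np.std(q[state]) for q in q_tables])
    total = stds.sum()
    return stds / total if total > 0 else np.full(len(q_tables), 1 / len(q_tables))

def module_combination(q_tables, state):
    """Global action = argmax of the weight-summed module Q-values."""
    w = module_weights(q_tables, state)
    combined = sum(wi * q[state] for wi, q in zip(w, q_tables))
    return int(np.argmax(combined))

# Two modules, each with a 3-state x 4-action Q-table
rng = np.random.default_rng(0)
q_tables = [rng.random((3, 4)) for _ in range(2)]
print(module_combination(q_tables, state=1))
```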
50. Design of optimal neural network control strategies with minimal a priori knowledge. Paraskevopoulos, Vasileios, January 2000.
No description available.