Global ETD Search

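51	Regret analysis of constrained irreducible MDPs with reset action / リセット行動が存在する制約付き既約MDPに対するリグレット解析 / Watanabe, Takashi / 23 March 2020 / Kyoto University / 0048 / New system, course-based doctorate / Doctor of Human and Environmental Studies / Kō No. 22535 / Jinhaku No. 938 / 新制||人||223 (Main Library) / 2019||人博||938 (Yoshida-South Library) / Kyoto University Graduate School of Human and Environmental Studies, Department of Human Coexistence / (Chief examiner) Associate Professor 櫻川貴司, Professor 立木秀樹, Professor 日置尋久 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Human and Environmental Studies / Kyoto University / DGAM / reinforcement learning; long-term average reward; constrained Markov decision processes; regret; online-learning 361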
52	Innovative Simulation and Tree Models and Reinforcement Learning Methods with Applications in Cybersecurity / Liu, Enhao / January 2021 / No description available. / Industrial Engineering; Optimal Trees; Discrete Event Simulation; Cybersecurity
53	Solving Large MDPs Quickly with Partitioned Value Iteration / Wingate, David / 14 June 2004 / Value iteration is not typically considered a viable algorithm for solving large-scale MDPs because it converges too slowly. However, its performance can be dramatically improved by eliminating redundant or useless backups, and by backing up states in the right order. We present several methods designed to help structure value dependency, and present a systematic study of companion prioritization techniques which focus computation in useful regions of the state space. In order to scale to solve ever larger problems, we evaluate all enhancements and methods in the context of parallelizability. Using the enhancements, we discover that in many instances the limiting factor of the algorithms is no longer time, but space. We thus evaluate all metrics and decisions with respect to cache performance. We generate a family of algorithms by combining several of the methods discussed, and present empirical evidence demonstrating that performance can improve by several orders of magnitude for real-world problems, while preserving accuracy and convergence guarantees. / Machine learning; reinforcement learning; value iteration; Markov Decision Processes; Computer Sciences
54	Exploiting Structure in Coordinating Multiple Decision Makers / Mostafa, Hala / 01 September 2011 / This thesis is concerned with sequential decision making by multiple agents, whether they are acting cooperatively to maximize team reward or selfishly trying to maximize their individual rewards. The practical intractability of this general problem led to efforts in identifying special cases that admit efficient computation, yet still represent a wide enough range of problems. In our work, we identify the class of problems with structured interactions, where actions of one agent can have non-local effects on the transitions and/or rewards of another agent. We address the following research questions: 1) How can we compactly represent this class of problems? 2) How can we efficiently calculate agent policies that maximize team reward (for cooperative agents) or achieve equilibrium (for self-interested agents)? 3) How can we exploit structured interactions to make reasoning about communication offline tractable? For representing our class of problems, we developed a new decision-theoretic model, Event-Driven Interactions with Complex Rewards (EDI-CR), that explicitly represents structured interactions. EDI-CR is a compact yet general representation capable of capturing problems where the degree of coupling among agents ranges from complete independence to complete dependence. For calculating agent policies, we draw on several techniques from the field of mathematical optimization and adapt them to exploit the special structure in EDI-CR. We developed a Mixed Integer Linear Program formulation of EDI-CR with cooperative agents that results in programs much more compact and faster to solve than formulations ignoring structure. We also investigated the use of homotopy methods as an optimization technique, as well as the formulation of self-interested EDI-CR as a system of non-linear equations.
We looked at the issue of communication in both cooperative and self-interested settings. For the cooperative setting, we developed heuristics that assess the impact of potential communication points and add the ones with highest impact to the agents' decision problems. Our heuristics successfully pick communication points that improve team reward while keeping problem size manageable. Also, by controlling the amount of communication introduced by a heuristic, our approach allows us to control the tradeoff between solution quality and problem size. For self-interested agents, we look at an example setting where communication is an integral part of problem solving, but where the self-interested agents have a reason to be reticent (e.g. privacy concerns). We formulate this problem as a game of incomplete information and present a general algorithm for calculating an approximate equilibrium profile in this class of games. / coordination; decision processes; decision theory; game-theory; multi-agent systems; Computer Sciences
55	DYNAMIC DECISION APPROXIMATE EMPIRICAL REWARD (DDAER) PROCESSES / Xie, Chen / 29 September 2014 / No description available. / Engineering; Industrial Engineering
56	Data-Driven Cyber Vulnerability Maintenance of Network Vulnerabilities with Markov Decision Processes / Jiang, Tianyu / 23 October 2017 / No description available. / Operations Research; Markov Decision Processes
57	Optimal and Simulation-Based Approximate Dynamic Programming Approaches for the Control of Re-Entrant Line Manufacturing Models / Ramirez, Jose A. / 22 November 2010 / No description available. / Electrical Engineering; approximate dynamic programming; re-entrant lines; queueing networks; optimal control; Markov decision processes
58	Dynamic Programming under Parametric Uncertainty with Applications in Cyber Security and Project Management / Hou, Chengjun / 01 October 2015 / No description available. / Industrial Engineering; Operations Research; Cyber security; Dynamic programming; Markov decision processes; Parametric uncertainty; Project management
59	Robust Optimal Maintenance Policies and Charts for Cyber Vulnerability Management / Afful-Dadzi, Anthony / 18 December 2012 / No description available. / Industrial Engineering; Cyber Attack; Value function; Markov Decision Processes; Control Charts
60	Cross-layer Control for Adaptive Video Streaming over Wireless Access Networks / Abdallah AbouSheaisha, Abdallah Sabry / 17 March 2016 / Over the last decade, the wide deployment of wireless access technologies (e.g. WiFi, 3G, and LTE) and the remarkable growth in the volume of streaming video content have significantly altered the telecommunications field. These developments introduce new challenges to the research community including the need to develop new solutions (e.g. traffic models and transport protocols) to address changing traffic patterns and the characteristics of wireless links and the need for new evaluation methods that generate higher fidelity results under more realistic scenarios. Unfortunately, for the last two decades, simulation studies have been the main tool for researchers in wireless networks. In spite of the advantages of simulation studies, overall they have had a negative influence on the credibility of published results. In partial response to this simulation crisis, the research community has adopted testing and evaluation using implementation-based experiments. Implementation-based experiments include field experiments, prototypes, emulations, and testbeds. An example of an implementation-based experiment is the MANIAC Challenge, a wireless networking competition that we designed and hosted, which included creation and operation of ad hoc networks using commodity hardware. However, the lack of software tools to facilitate these sorts of experiments has created new challenges. Currently, researchers must practice kernel programming in order to implement networking experiments, and there is an urgent need to lower the barriers to entry to wireless network experimentation.
With respect to the growth in video traffic over wireless networks, the main challenge is a mismatch between the design concepts of current internet protocols (e.g. the Transmission Control Protocol (TCP)) and the reality of modern wireless networks and streaming video techniques. Internet protocols were designed to be deployed over wired networks and often perform poorly over wireless links; video encoding is highly loss tolerant and delay-constrained and yet, for reasons of expedience, is carried using protocols that emphasize reliable delivery at the cost of potentially high delay. This dissertation addresses the lack of software tools to support implementation-based networking experiments and the need to improve the performance of video streaming over wireless access networks. We propose a new software tool that allows researchers to implement experiments without a need to become kernel programmers. The new tool, called the Flexible Internetwork Stack (FINS) Framework, is available under an open source license. With our tool, researchers can implement new network layers, protocols, and algorithms, and redesign the interconnections between the protocols. It offers logging and monitoring capabilities as well as dynamic reconfigurability of the modules' attributes and interconnections during runtime. We present details regarding the architecture, design, and implementation of the FINS Framework and provide an assessment of the framework including both qualitative and quantitative comparison with significant previous tools. We also address the problem of HTTP-based adaptive video streaming (HAVS) over WiFi access networks. We focus on the negative influence of wireless last-hop connections on network utilization and the end-user quality of experience (QoE). We use a cross-layer approach to design three controllers. The first and second controllers adopt a heuristic cross-layer design, while the third controller formulates the HAVS problem as a Markov decision process (MDP).
By solving the model using reinforcement learning, we achieved a 20% performance improvement (after enough training) with respect to the performance of the best heuristic controller under unstable channel conditions. Our simulation results are backed by a system prototype using the FINS Framework. Although it may seem predictable to achieve more gain in performance and in QoE by using cross-layer design, this dissertation not only presents a new technique that improves performance, but also suggests that it is time to move cross-layer and machine-learning-based approaches from the research field to actual deployment. It is time to move cognitive network techniques from the simulation environment to real world implementations. / Ph. D. / Wireless Networks; Cross-layer Optimization; HTTP Adaptive Video Streaming; Markov Decision Processes; Reinforcement Learning
