81

Stochastic Dynamic Optimization and Games in Operations Management

Wei, Wei 12 March 2013 (has links)
No description available.
82

Using Markov Decision Processes and Reinforcement Learning to Guide Penetration Testers in the Search for Web Vulnerabilities

Pettersson, Anders, Fjordefalk, Ossian January 2019 (has links)
Bug bounties are an increasingly popular way of performing penetration tests of web applications. User statistics from bug bounty platforms show that many hackers struggle to find bugs. This report explores a way of using Markov decision processes and reinforcement learning to help hackers find vulnerabilities in web applications, by building a tool that suggests attack surfaces to examine and vulnerability reports to read in order to acquire the relevant knowledge. The attack surfaces, vulnerabilities and reports are all derived from a taxonomy of web vulnerabilities created in a collaborating project. A Markov decision process (MDP) was defined; it comprises the environment, different states of knowledge, and actions that can take a user from one state of knowledge to another. To suggest the best possible next action, the MDP uses a policy that describes the value of entering each state. Each state is assigned a value, called its Q-value, which indicates how close that state is to one where a vulnerability has been found: a state has a high Q-value if its knowledge gives the user a high probability of finding a vulnerability, and vice versa. The policy was created using the reinforcement learning algorithm Q-learning. The tool was implemented as a web application using Java Spring Boot and ReactJS. The resulting tool is best suited for new hackers in the learning process. The current version is trained on the indexed reports of the vulnerability taxonomy, but future versions should be trained on user behaviour collected from the tool.
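The Q-learning loop described in the abstract can be sketched as follows; the toy knowledge states, actions and rewards below are hypothetical stand-ins for the taxonomy-derived ones, not the thesis's actual model:

```python
import random

def q_learning(states, actions, transition, reward, episodes=2000,
               alpha=0.1, gamma=0.9, epsilon=0.2):
    """Tabular Q-learning: Q[s][a] estimates the value of action a in state s."""
    Q = {s: {a: 0.0 for a in actions(s)} for s in states}
    for _ in range(episodes):
        s = random.choice(states)
        while actions(s):  # terminal states have no actions
            # epsilon-greedy: explore occasionally, otherwise act greedily
            if random.random() < epsilon:
                a = random.choice(actions(s))
            else:
                a = max(Q[s], key=Q[s].get)
            s2, r = transition(s, a), reward(s, a)
            future = max(Q[s2].values()) if actions(s2) else 0.0
            Q[s][a] += alpha * (r + gamma * future - Q[s][a])
            s = s2
    return Q

# Hypothetical knowledge graph: reading a report enables probing a surface,
# which leads to a found vulnerability (terminal, reward 1).
states = ["start", "recon", "vuln_found"]
acts = {"start": ["read_report"], "recon": ["probe_surface"], "vuln_found": []}
nxt = {("start", "read_report"): "recon", ("recon", "probe_surface"): "vuln_found"}
Q = q_learning(states, lambda s: acts[s],
               lambda s, a: nxt[(s, a)],
               lambda s, a: 1.0 if nxt[(s, a)] == "vuln_found" else 0.0)
```

States closer to a found vulnerability end up with higher Q-values, which is the ranking such a tool would use to suggest the next attack surface or report.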
83

Resource allocation and load-shedding policies based on Markov decision processes for renewable energy generation and storage

Jimenez, Edwards 01 January 2015 (has links)
In modern power systems, renewable energy has become an increasingly popular form of energy generation as a result of the rules and regulations being implemented towards achieving clean energy worldwide. However, clean energy can have drawbacks in several forms. Wind energy, for example, can introduce intermittency. In this thesis, we discuss a method to deal with this intermittency. In particular, by shedding a specific amount of load we can avoid a total breakdown of the entire power plant. The load-shedding method discussed in this thesis utilizes a Markov decision process with backward policy iteration. It is based on a probabilistic method that chooses the load-shedding path minimizing the expected total cost, so that no power failure occurs. We compare our results with two control policies, a load-balancing policy and a less-load-shedding policy. It is shown that the proposed MDP policy outperforms the other control policies and achieves the minimum total expected cost.
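The backward computation described above can be sketched as finite-horizon backward induction; the reserve-margin states, costs and wind transition probabilities below are invented for illustration and are not the thesis's actual system:

```python
def backward_induction(states, actions, P, cost, T):
    """V[t][s]: minimal expected cost-to-go from s at time t; policy stores
    the minimizing action (backward pass from the horizon T)."""
    V = {T: {s: 0.0 for s in states}}
    policy = {}
    for t in range(T - 1, -1, -1):
        V[t] = {}
        for s in states:
            best_a, best_v = None, float("inf")
            for a in actions(s):
                v = cost(s, a) + sum(p * V[t + 1][s2] for s2, p in P(s, a))
                if v < best_v:
                    best_a, best_v = a, v
            V[t][s], policy[(t, s)] = best_v, best_a
    return V, policy

# Illustrative (invented) system: states are reserve-margin levels; "hold"
# risks a blackout penalty at zero reserve, "shed" pays a fixed shedding cost
# and restores one unit of reserve; wind moves the reserve up or down.
states = [0, 1, 2]
actions = lambda s: ["hold", "shed"]
cost = lambda s, a: 1.0 if a == "shed" else (10.0 if s == 0 else 0.0)
def P(s, a):
    if a == "shed":
        return [(min(s + 1, 2), 1.0)]
    return [(max(s - 1, 0), 0.5), (min(s + 1, 2), 0.5)]
V, policy = backward_induction(states, actions, P, cost, T=3)
```

With these invented numbers the policy sheds load only when the reserve margin is exhausted, which is the qualitative behavior a minimum-expected-cost load-shedding policy should exhibit.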
84

Hidden Markov models: Identification, control and inverse filtering

Mattila, Robert January 2018 (has links)
The hidden Markov model (HMM) is one of the workhorse tools in, for example, statistical signal processing and machine learning. It has found applications in a vast number of fields, ranging all the way from bioscience to speech recognition to modeling of user interactions in social networks. In an HMM, a latent state transitions according to Markovian dynamics. The state is only observed indirectly via a noisy sensor – that is, it is hidden. This type of model is at the center of this thesis, which in turn touches upon three main themes. Firstly, we consider how the parameters of an HMM can be estimated from data. In particular, we explore how recently proposed methods of moments can be combined with more standard maximum likelihood (ML) estimation procedures. The motivation for this is that, although the ML estimate possesses many attractive statistical properties, many ML schemes have to rely on local-search procedures in practice, which are only guaranteed to converge to local stationary points in the likelihood surface – potentially inhibiting them from reaching the ML estimate. By combining the two types of algorithms, the goal is to obtain the benefits of both approaches: the consistency and low computational complexity of the former, and the high statistical efficiency of the latter. The filtering problem – estimating the hidden state of the system from observations – is of fundamental importance in many applications. As a second theme, we consider inverse filtering problems for HMMs. In these problems, the setup is reversed; what information about an HMM-filtering system is exposed by its state estimates? We show that it is possible to reconstruct the specifications of the sensor, as well as the observations that were made, from the filtering system's posterior distributions of the latent state. This can be seen as a way of reverse engineering such a system, or as using an alternative data source to build a model.
Thirdly, we consider Markov decision processes (MDPs) – systems with Markovian dynamics whose parameters can be influenced by the choice of a control input. In particular, we show how it is possible to incorporate prior information regarding the monotonic structure of the optimal decision policy so as to accelerate its computation. Subsequently, we consider a real-world application by investigating how these models can be used to model the treatment of abdominal aortic aneurysms (AAAs). Our findings are that the structural properties of the optimal treatment policy differ from those used in clinical practice – in particular, younger patients could benefit from earlier surgery. This indicates an opportunity for improved care of patients with AAAs.
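For the second theme, the forward (filtering) recursion that produces the posterior distributions in question can be sketched as follows; the 2-state HMM parameters are assumed, illustrative numbers:

```python
import numpy as np

def hmm_filter(A, B, pi, observations):
    """HMM filter: posterior P(x_t | y_1..y_t) for each t.
    A[i, j] = P(x_{t+1}=j | x_t=i), B[i, y] = P(y_t=y | x_t=i)."""
    posteriors, belief = [], pi.copy()
    for y in observations:
        belief = belief @ A             # predict through the Markov dynamics
        belief = belief * B[:, y]       # correct with the noisy observation
        belief = belief / belief.sum()  # renormalize
        posteriors.append(belief.copy())
    return posteriors

# Assumed 2-state chain observed through a noisy sensor (illustrative numbers)
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
pi = np.array([0.5, 0.5])
post = hmm_filter(A, B, pi, [0, 0, 1])
```

The inverse filtering problems in the thesis start from exactly these posterior sequences and ask what they reveal about B and the observations.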
85

Optimal Call Admission Control Policies in Wireless Cellular Networks Using Semi-Markov Decision Processes

Ni, Wenlong January 2008 (has links)
No description available.
86

Information Freshness and Delay Optimization in Unreliable Wireless Systems

Yao, Guidan 02 September 2022 (has links)
No description available.
87

Deep Reinforcement Learning for Open Multiagent System

Zhu, Tianxing 20 September 2022 (has links)
No description available.
88

Integrated and Coordinated Relief Logistics Planning Under Uncertainty for Relief Logistics Operations

Kamyabniya, Afshin 22 September 2022 (has links)
In this thesis, we explore three critical emergency logistics problems faced by healthcare and humanitarian relief service providers in short-term post-disaster management. In the first manuscript, we investigate various integration mechanisms (fully integrated horizontal-vertical, horizontal, and vertical resource sharing) for a multi-type, multi-patient logistics network of whole-blood-derived platelets following a natural disaster. The goal is to reduce the shortage and wastage of platelets of multiple blood groups in the response phase of relief logistics operations. To solve the logistics model at large scale, we develop a hybrid exact solution approach involving augmented epsilon-constraint and Lagrangian relaxation algorithms, and demonstrate the model's applicability in a case study of an earthquake. Due to uncertainty in the number of injuries needing multi-type blood-derived platelets, we apply a robust optimization version of the proposed model which captures the expected performance of the system. The results show that the platelets logistics network under coordinated and integrated mechanisms controls the levels of shortage and wastage better than a non-integrated network. In the second manuscript, we propose a two-stage casualty evacuation model that involves routing patients with different injury levels during wildfires. The first stage deals with field hospital selection; the second stage determines the number of patients that can be transferred to the selected hospitals or shelters via different routes of the evacuation network. The goal of this model is to reduce the evacuation response time, which ultimately increases the number of people evacuated from evacuation assembly points under limited time windows. To solve the model for large-scale problems, we develop a two-step meta-heuristic algorithm.
To consider multiple sources of uncertainty, a flexible robust approach that considers the worst-case and expected performance of the system simultaneously is applied to handle any realization of the uncertain parameters. The results show that the fully coordinated evacuation model, in which vehicles can freely pick up and off-board patients at different locations and may start their next operations without being forced to return to the departure point (the evacuation assembly points), outperforms the non-coordinated and non-integrated evacuation models in terms of the number of evacuated patients. In the third manuscript, we propose an integrated transportation and hospital capacity model to optimize the assignment of relevant medical resources to patients with multiple injury levels during a mass casualty incident (MCI). We develop a finite-horizon Markov decision process (MDP) to efficiently allocate resources and hospital capacities to injured people in a dynamic fashion over a limited time horizon. We solve this model using the linear programming approach to approximate dynamic programming (ADP), and by developing a two-phase heuristic based on a column generation algorithm. The results show that better policies can be derived for allocating limited resources (i.e., vehicles) and hospital capacities to injured people compared with the benchmark. Each manuscript makes a worthwhile contribution to the humanitarian relief operations literature and can help relief and healthcare providers optimize resource and service logistics by applying the proposed integration and coordination mechanisms.
89

Computing Quantiles in Markov Reward Models

Ummels, Michael, Baier, Christel 10 July 2014 (has links)
Probabilistic model checking mainly concentrates on techniques for reasoning about the probabilities of certain path properties or expected values of certain random variables. For the quantitative system analysis, however, there is also another type of interesting performance measure, namely quantiles. A typical quantile query takes as input a lower probability bound p ∈ ]0,1] and a reachability property. The task is then to compute the minimal reward bound r such that with probability at least p the target set will be reached before the accumulated reward exceeds r. Quantiles are well-known from mathematical statistics, but to the best of our knowledge they have not been addressed by the model checking community so far. In this paper, we study the complexity of quantile queries for until properties in discrete-time finite-state Markov decision processes with nonnegative rewards on states. We show that qualitative quantile queries can be evaluated in polynomial time and present an exponential algorithm for the evaluation of quantitative quantile queries. For the special case of Markov chains, we show that quantitative quantile queries can be evaluated in pseudo-polynomial time.
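For the Markov-chain special case in the abstract's final sentence, a pseudo-polynomial evaluation can be sketched as below; the chain, rewards and the convention of charging a state's reward on leaving it are illustrative assumptions (zero-reward cycles are assumed absent):

```python
from functools import lru_cache

def quantile(P, reward, target, s0, p, r_max=1000):
    """Smallest r with: Pr[reach target with accumulated reward <= r] >= p."""
    @lru_cache(maxsize=None)
    def reach_prob(s, budget):
        if reward[s] > budget:          # cannot even pay this state's reward
            return 0.0
        if s in target:
            return 1.0
        return sum(q * reach_prob(s2, budget - reward[s])
                   for s2, q in P[s].items())
    for r in range(r_max + 1):          # scan reward bounds upward
        if reach_prob(s0, r) >= p:
            return r
    return None

# Illustrative chain: state 0 loops on itself with probability 0.5,
# state 2 is the target; state rewards are nonnegative integers.
P = {0: {0: 0.5, 1: 0.5}, 1: {2: 1.0}}
reward = [1, 1, 0]
r = quantile(P, reward, target={2}, s0=0, p=0.8)
```

The outer scan over r is what makes this pseudo-polynomial: the running time depends on the magnitude of the reward bound, not just the size of the chain.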
90

Optimization of Partially Observable Systems in Wireless Networks: Game Theory, Self-adaptivity and Learning

Habachi, Oussama 28 September 2012 (has links)
Since delay-sensitive and bandwidth-intensive multimedia applications emerged on the Internet, the demand for network resources has seen a steady increase during the last decade. Specifically, wireless networks have become pervasive and highly populated. These motivations are behind the problems considered in this dissertation. The topic of my PhD is the application of game theory, queueing theory and learning techniques in wireless networks under QoS constraints, especially in partially observable environments. We consider different layers of the protocol stack. In fact, we study Opportunistic Spectrum Access (OSA) at the Medium Access Control (MAC) layer through Cognitive Radio (CR) approaches. Thereafter, we focus on congestion control at the transport layer, and we develop congestion control mechanisms for the TCP protocol. The roadmap of the research is as follows. Firstly, we focus on the MAC layer and seek optimal OSA strategies in CR networks.
We consider that Secondary Users (SUs) take advantage of opportunities in licensed channels while ensuring a minimum level of QoS. In fact, SUs have the possibility to sense and access licensed channels, or to transmit their packets using a dedicated access (like 3G). A SU therefore has two conflicting goals: seeking opportunities in licensed channels, at the cost of the energy spent sensing them, or transmitting over the dedicated channel without sensing, but with a higher transmission delay. We model the slotted and non-slotted systems using a queueing framework. Thereafter, we analyze the non-cooperative behavior of SUs and prove the existence of a Nash equilibrium (NE) strategy. Moreover, we measure the performance gap between the centralized and decentralized systems using the Price of Anarchy (PoA). Although OSA at the MAC layer was deeply investigated in the last decade, the performance of SUs, such as energy consumption or Quality of Service (QoS) guarantees, was somewhat ignored. Therefore, we study OSA taking into account energy consumption and delay. We consider, first, one SU that opportunistically accesses licensed channels or transmits its packets through a dedicated channel. Due to partial spectrum sensing, the state of the spectrum is only partially observable. Therefore, we use the Partially Observable Markov Decision Process (POMDP) framework to design an optimal OSA policy for SUs. Specifically, we derive structural properties of the value function, and we prove that the optimal OSA policy has a threshold structure. Thereafter, we extend the model to the context of multiple SUs. We study the non-cooperative behavior of SUs and prove the existence of a NE. Moreover, we highlight a paradox in this situation: more opportunities in the licensed spectrum may lead to worse performance for SUs. Thereafter, we focus on spectrum management issues.
In fact, we introduce a spectrum manager into the model, and we analyze the hierarchical game between the network manager and the SUs. Finally, we focus on the transport layer and study congestion control for wireless networks under QoS and Quality of Experience (QoE) constraints. Firstly, we propose a congestion control algorithm that takes into account application parameters and multimedia quality. In fact, we consider that network users maximize their expected multimedia quality by choosing the congestion control strategy. Since users do not observe the congestion status at bottleneck links, we use a POMDP framework to determine the optimal congestion control strategy. Thereafter, we consider a subjective measure of multimedia quality and propose a QoE-based congestion control algorithm, which relies on QoE feedback from receivers to adapt the congestion window size. Note that the proposed algorithms are designed using learning methods in order to cope with the complexity of solving POMDPs.
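The POMDP ingredients described above (a belief about the licensed channel state and a threshold rule on that belief) can be sketched as follows; the two-state channel model and the perfect-sensing assumption are illustrative simplifications, and the threshold tau would come from actually solving the POMDP:

```python
def belief_update(b, p11, p01, sensed=False, busy=False):
    """One-slot belief update for a two-state (free/busy) licensed channel.
    b = P(channel free); p11 = P(free -> free), p01 = P(busy -> free).
    Perfect sensing is assumed for simplicity; real sensing is noisy."""
    if sensed:                          # condition on the sensing outcome
        b = 0.0 if busy else 1.0
    return b * p11 + (1 - b) * p01      # predict the next slot's state

def threshold_policy(b, tau):
    """Threshold-structured OSA policy (the structure proved optimal in the
    dissertation): sense the licensed channel only when the belief that it
    is free is high enough; otherwise fall back to the dedicated channel."""
    return "sense" if b >= tau else "dedicated"
```

The threshold structure is what makes the policy cheap to deploy: the SU tracks a single belief number per channel and compares it against tau each slot.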
