191 |
Stochastic Dynamic Optimization and Games in Operations Management
Wei, Wei, 12 March 2013
No description available.
|
192 |
Using Markov Decision Processes and Reinforcement Learning to Guide Penetration Testers in the Search for Web Vulnerabilities / Användandet av Markov Beslutsprocesser och Förstärkt Inlärning för att Guida Penetrationstestare i Sökandet efter Sårbarheter i Webbapplikationer
Pettersson, Anders and Fjordefalk, Ossian, January 2019
Bug bounties are an increasingly popular way of performing penetration tests of web applications. User statistics from bug bounty platforms show that many hackers struggle to find bugs. This report explores a way of using Markov decision processes and reinforcement learning to help hackers find vulnerabilities in web applications by building a tool that suggests attack surfaces to examine and vulnerability reports to read in order to acquire the relevant knowledge. The attack surfaces, vulnerabilities and reports are all derived from a taxonomy of web vulnerabilities created in a collaborating project. A Markov decision process (MDP) was defined; it includes the environment, different states of knowledge, and actions that can take a user from one state of knowledge to another. To suggest the best possible next action, the MDP uses a policy that describes the value of entering each state. Each state is given a value, called a Q-value, that indicates how close that state is to a state where a vulnerability has been found. This means that a state has a high Q-value if its knowledge gives a user a high probability of finding a vulnerability, and vice versa. The policy was created using a reinforcement learning algorithm called Q-learning. The tool was implemented as a web application using Java Spring Boot and ReactJS. The resulting tool is best suited for new hackers who are still learning. The current version is trained on the indexed reports of the vulnerability taxonomy, but future versions should be trained on user behaviour collected from the tool.
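A minimal sketch of the tabular Q-learning mechanism the abstract describes, not the authors' implementation: the knowledge states, actions (read a report, probe an attack surface) and the reward for reaching a "vulnerability found" state below are all hypothetical.

```python
import random

# Hypothetical knowledge states; "vuln_found" is the terminal goal state.
states = ["start", "knows_xss_basics", "knows_dom_sinks", "vuln_found"]
# Hypothetical actions: each maps a knowledge state to the state it leads to.
actions = {
    "start": {"read_xss_report": "knows_xss_basics"},
    "knows_xss_basics": {"read_dom_report": "knows_dom_sinks",
                         "probe_forms": "knows_xss_basics"},
    "knows_dom_sinks": {"probe_dom_sinks": "vuln_found"},
}

def reward(next_state):
    return 1.0 if next_state == "vuln_found" else 0.0

alpha, gamma, eps = 0.1, 0.9, 0.2                 # learning rate, discount, exploration
Q = {s: {a: 0.0 for a in actions.get(s, {})} for s in states}

for _ in range(2000):                             # training episodes
    s = "start"
    while s != "vuln_found":
        if random.random() < eps:                 # epsilon-greedy exploration
            a = random.choice(list(actions[s]))
        else:
            a = max(Q[s], key=Q[s].get)
        s_next = actions[s][a]
        target = reward(s_next) + gamma * max(Q[s_next].values(), default=0.0)
        Q[s][a] += alpha * (target - Q[s][a])     # Q-learning update
        s = s_next

# The learned Q-values rank which action (report to read / surface to probe)
# the tool would suggest next in each knowledge state.
for s in states[:-1]:
    print(s, Q[s])
```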
|
193 |
Resource allocation and load-shedding policies based on Markov decision processes for renewable energy generation and storage
Jimenez, Edwards, 01 January 2015
In modern power systems, renewable energy has become an increasingly popular form of generation, driven by the rules and regulations being implemented to achieve clean energy worldwide. However, clean energy can have drawbacks. Wind energy, for example, can introduce intermittency. In this thesis, we discuss a method to deal with this intermittency. In particular, by shedding a specific amount of load we can avoid a total breakdown of the power system. The load-shedding method discussed in this thesis uses a Markov decision process solved with backward policy iteration. It is a probabilistic method that chooses the load-shedding path minimizing the expected total cost while ensuring no power failure. We compare our results with two baseline control policies, a load-balancing policy and a less-load-shedding policy. It is shown that the proposed MDP policy outperforms the other control policies and achieves the minimum total expected cost.
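A hedged illustration of the backward policy iteration mentioned above, a sketch only: the load levels, costs, and wind-driven transition probabilities below are hypothetical, not the thesis model. Finite-horizon backward induction computes the minimum expected total cost and the corresponding load-shedding policy.

```python
import numpy as np

T = 24                        # decision epochs (hours)
load_levels = [0, 1, 2, 3]    # state: units of load currently shed (hypothetical)
actions = [-1, 0, 1]          # -1 restore one unit, 0 do nothing, +1 shed one more

def step(s, a):
    """Intended next state after applying action a in state s."""
    return min(max(s + a, 0), len(load_levels) - 1)

def cost(s, a):
    s_next = step(s, a)
    unserved = 10.0 * s_next                         # cost of shedding (unserved) load
    breakdown = 50.0 * 0.2 if s_next == 0 else 0.0   # expected penalty when nothing is shed
    return unserved + breakdown

def transition(s, a):
    """(probability, next state) pairs: wind intermittency may force extra shedding."""
    s_next = step(s, a)
    return [(0.8, s_next), (0.2, min(s_next + 1, len(load_levels) - 1))]

V = np.zeros((T + 1, len(load_levels)))              # terminal values V[T, s] = 0
policy = np.zeros((T, len(load_levels)), dtype=int)

for t in reversed(range(T)):                         # backward induction over epochs
    for s in load_levels:
        q = [cost(s, a) + sum(p * V[t + 1, sp] for p, sp in transition(s, a))
             for a in actions]
        best = int(np.argmin(q))
        V[t, s], policy[t, s] = q[best], actions[best]

print("expected total cost from state 0 at t=0:", V[0, 0])
print("first-hour action for each state:", policy[0])
```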
|
194 |
Hidden Markov models: Identification, control and inverse filtering
Mattila, Robert, January 2018
The hidden Markov model (HMM) is one of the workhorse tools in, for example, statistical signal processing and machine learning. It has found applications in a vast number of fields, ranging all the way from bioscience to speech recognition to modeling of user interactions in social networks. In an HMM, a latent state transitions according to Markovian dynamics. The state is only observed indirectly via a noisy sensor – that is, it is hidden. This type of model is at the center of this thesis, which in turn touches upon three main themes. Firstly, we consider how the parameters of an HMM can be estimated from data. In particular, we explore how recently proposed methods of moments can be combined with more standard maximum likelihood (ML) estimation procedures. The motivation for this is that, although the ML estimate possesses many attractive statistical properties, many ML schemes have to rely on local-search procedures in practice, which are only guaranteed to converge to local stationary points of the likelihood surface – potentially preventing them from reaching the ML estimate. By combining the two types of algorithms, the goal is to obtain the benefits of both approaches: the consistency and low computational complexity of the former, and the high statistical efficiency of the latter. The filtering problem – estimating the hidden state of the system from observations – is of fundamental importance in many applications. As a second theme, we consider inverse filtering problems for HMMs. In these problems, the setup is reversed: what information about an HMM-filtering system is exposed by its state estimates? We show that it is possible to reconstruct the specifications of the sensor, as well as the observations that were made, from the filtering system's posterior distributions of the latent state. This can be seen as a way of reverse engineering such a system, or as using an alternative data source to build a model. Thirdly, we consider Markov decision processes (MDPs) – systems with Markovian dynamics whose parameters can be influenced by the choice of a control input. In particular, we show how it is possible to incorporate prior information regarding the monotonic structure of the optimal decision policy so as to accelerate its computation. Subsequently, we consider a real-world application by investigating how these models can be used to model the treatment of abdominal aortic aneurysms (AAAs). Our findings are that the structural properties of the optimal treatment policy are different from those used in clinical practice – in particular, younger patients could benefit from earlier surgery. This indicates an opportunity for improved care of patients with AAAs.
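A small, hedged illustration of the filtering problem described above; the two-state chain and sensor model are hypothetical, not from the thesis. The HMM filter propagates the posterior of the hidden state through the transition matrix and updates it with the likelihood of each noisy observation.

```python
import numpy as np

P = np.array([[0.9, 0.1],       # transition matrix: P[i, j] = Pr(x_{k+1}=j | x_k=i)
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],       # sensor likelihoods: B[i, y] = Pr(y | x=i)
              [0.1, 0.9]])

def hmm_filter(observations, prior):
    """Return the sequence of posterior distributions Pr(x_k | y_1..y_k)."""
    pi = np.asarray(prior, dtype=float)
    posteriors = []
    for y in observations:
        pred = P.T @ pi                 # time update (prediction through the chain)
        unnorm = B[:, y] * pred         # measurement update with the sensor likelihood
        pi = unnorm / unnorm.sum()      # normalize to a probability vector
        posteriors.append(pi)
    return posteriors

for pi in hmm_filter([0, 0, 1, 1], prior=[0.5, 0.5]):
    print(pi)
```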
|
195 |
Strategic Decision-making Process in the Qatari Public Sector. Relationship between the Decision-Making Process, Implementation, and Outcome
Al-Hashimi, Khalid M.I.A., January 2022
Although several multi-dimensional models of strategic decision-making processes (SDMPs) have been examined in the literature, these studies have paid insufficient attention to the public-sector context and the Gulf Cooperation Council (GCC) region. SDMPs in the public sector and in the State of Qatar can differ from SDMPs in the private sector because of institutional and socio-cultural differences, respectively. Therefore, more research is urgently needed to better understand the SDMP within this context.
To contribute to filling this void, this study develops and tests a multi-dimensional SDMP model comprising SDMP dimensions, implementation, and outcome. The model examines (i) the impact of four SDMP dimensions—procedural rationality, intuition, constructive politics, and participation—on the implementation success of the strategic decision (SD); (ii) the impact of successful implementation of the SD on SD quality; (iii) the mediating role of the implementation success of the SD; and (iv) the moderating effect of stakeholder uncertainty.
The model was analysed using Partial Least Squares Structural Equation Modelling (PLS-SEM) and tested using data from multiple informants on 170 strategic decisions in 38 Qatari public organisations. The study finds that procedural rationality, constructive politics, participation, and the implementation success of the SD play a significant and positive role in the SDMP and its overall outcome. Finally, the study provides substantial and original contributions to knowledge of the SDMP in the public sector, implications for decision-makers, and directions for future research.
|
196 |
Optimal Call Admission Control Policies in Wireless Cellular Networks Using Semi Markov Decision Processes
Ni, Wenlong, January 2008
No description available.
|
197 |
Information Freshness and Delay Optimization in Unreliable Wireless Systems
Yao, Guidan, 02 September 2022
No description available.
|
198 |
Deep Reinforcement Learning for Open Multiagent System
Zhu, Tianxing, 20 September 2022
No description available.
|
199 |
Integrated and Coordinated Relief Logistics Planning Under Uncertainty for Relief Logistics Operations
Kamyabniya, Afshin, 22 September 2022
In this thesis, we explore three critical emergency logistics problems faced by healthcare and humanitarian relief service providers for short-term post-disaster management.
In the first manuscript, we investigate various integration mechanisms (fully integrated horizontal-vertical, horizontal, and vertical resource-sharing mechanisms) following a natural disaster for a multi-type, whole-blood-derived platelet, multi-patient logistics network. The goal is to reduce the shortage and wastage of platelets across blood groups in the response phase of relief logistics operations. To solve the logistics model for large-scale problems, we develop a hybrid exact solution approach combining augmented epsilon-constraint and Lagrangian relaxation algorithms and demonstrate the model's applicability in a case study of an earthquake. Because the number of injuries needing multi-type blood-derived platelets is uncertain, we apply a robust optimization version of the proposed model that captures the expected performance of the system. The results show that, under coordinated and integrated mechanisms, the platelet logistics network controls the level of shortage and wastage better than a non-integrated network.
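A hedged sketch of the augmented epsilon-constraint idea named above, applied to a toy bi-objective LP standing in for the shortage/wastage trade-off; the data, bounds, and costs are hypothetical, and the thesis' hybrid scheme additionally uses Lagrangian relaxation, which is not shown.

```python
import numpy as np
from scipy.optimize import linprog

# Toy bi-objective LP: minimize f1 = c1.x (shortage-type cost) and f2 = c2.x
# (wastage-type cost) subject to capacity and demand-coverage constraints.
c1 = np.array([4.0, 3.0])
c2 = np.array([1.0, 5.0])
A = np.array([[1.0, 1.0],      # capacity: x1 + x2 <= 10
              [2.0, 1.0],      # capacity: 2*x1 + x2 <= 14
              [-1.0, -1.0]])   # coverage: x1 + x2 >= 8
b = np.array([10.0, 14.0, -8.0])

# Range of f2 over the feasible set defines the epsilon grid.
f2_min = linprog(c2, A_ub=A, b_ub=b).fun
f2_max = -linprog(-c2, A_ub=A, b_ub=b).fun
delta = 1e-3                   # small augmentation weight on the slack variable

pareto = []
for eps in np.linspace(f2_max, f2_min, 6):
    # Variables [x1, x2, s]; minimize c1.x - delta*s  s.t.  c2.x + s = eps, s >= 0.
    res = linprog(
        c=np.append(c1, -delta),
        A_ub=np.hstack([A, np.zeros((A.shape[0], 1))]),
        b_ub=b,
        A_eq=np.append(c2, 1.0).reshape(1, -1),
        b_eq=[eps],
        bounds=[(0, None)] * 3,
    )
    if res.success:
        x = res.x[:2]
        pareto.append((round(c1 @ x, 2), round(c2 @ x, 2)))

print("approximate Pareto front (shortage cost, wastage cost):", pareto)
```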
In the second manuscript, we propose a two-stage casualty evacuation model that involves routing of patients with different injury levels during wildfires. The first stage deals with field hospital selection, and the second stage determines the number of patients that can be transferred to the selected hospitals or shelters via different routes of the evacuation network. The goal of this model is to reduce the evacuation response time, which ultimately increases the number of people evacuated from evacuation assembly points under limited time windows. To solve the model for large-scale problems, we develop a two-step meta-heuristic algorithm. To consider multiple sources of uncertainty, a flexible robust approach that considers the worst-case and expected performance of the system simultaneously is applied to handle any realization of the uncertain parameters. The results show that the fully coordinated evacuation model, in which vehicles can freely pick up and off-board patients at different locations and may start their next operations without being forced to return to the departure point (the evacuation assembly points), outperforms the non-coordinated and non-integrated evacuation models in terms of the number of evacuated patients.
In the third manuscript, we propose an integrated transportation and hospital capacity model to optimize the assignment of relevant medical resources to patients with multiple injury levels at the time of a mass casualty incident (MCI). We develop a finite-horizon MDP to efficiently allocate resources and hospital capacities to injured people in a dynamic fashion over a limited time horizon. We solve this model using the linear programming approach to approximate dynamic programming (ADP) and by developing a two-phase heuristic based on a column generation algorithm. The results show that better policies can be derived for allocating limited resources (i.e., vehicles) and hospital capacities to injured people compared with the benchmark.
Each paper makes a worthwhile contribution to the humanitarian relief operations literature and can help relief and healthcare providers optimize resource and service logistics by applying the proposed integration and coordination mechanisms.
|
200 |
MODEL-FREE ALGORITHMS FOR CONSTRAINED REINFORCEMENT LEARNING IN DISCOUNTED AND AVERAGE REWARD SETTINGS
Qinbo Bai (19804362), 07 October 2024
Reinforcement learning (RL), which aims to train an agent to maximize its accumulated reward over time, has attracted much attention in recent years. Mathematically, RL is modeled as a Markov decision process, in which the agent interacts with the environment step by step. In practice, RL has been applied to autonomous driving, robotics, recommendation systems, and financial management. Although RL has been studied extensively in the literature, most proposed algorithms are model-based, which requires estimating the transition kernel. To this end, we study sample-efficient model-free algorithms under different settings.

Firstly, we propose a conservative stochastic primal-dual algorithm in the infinite-horizon discounted reward setting. The proposed algorithm converts the original problem from policy space to occupancy-measure space, which makes the non-convex problem linear. Then, we advocate the use of a randomized primal-dual approach to achieve O(ε^{-2}) sample complexity, which matches the lower bound.

However, when it comes to the infinite-horizon average reward setting, the problem becomes more challenging, since the environment interaction never ends and cannot be reset, which means the reward samples are no longer independent. To address this, we design an epoch-based policy-gradient algorithm. In each epoch, the whole trajectory is divided into multiple sub-trajectories with an interval between consecutive ones. Such intervals are long enough that the reward samples are asymptotically independent. By controlling the lengths of the sub-trajectories and intervals, we obtain a good gradient estimator and prove that the proposed algorithm achieves an O(T^{3/4}) regret bound.
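A hedged sketch, not the thesis' model-free algorithm, of the occupancy-measure reformulation mentioned above: for a small, hypothetical constrained MDP with known transitions, the discounted problem becomes a linear program over mu(s, a), from which a randomized policy can be recovered.

```python
import numpy as np
from scipy.optimize import linprog

nS, nA, gamma = 2, 2, 0.9
rho = np.array([0.5, 0.5])                       # initial state distribution
P = np.array([[[0.9, 0.1], [0.2, 0.8]],          # P[s, a, s'] transition kernel
              [[0.3, 0.7], [0.6, 0.4]]])
r = np.array([[1.0, 0.0], [0.0, 2.0]])           # reward r[s, a]
c = np.array([[0.0, 1.0], [1.0, 0.0]])           # constraint cost c[s, a]
budget = 0.4                                     # hypothetical normalized cost budget

# Variables: mu[s, a] flattened. Maximize sum(mu * r), i.e. minimize -sum(mu * r).
obj = -r.flatten()

# Flow constraints: sum_a mu(s',a) = (1-gamma) rho(s') + gamma sum_{s,a} P(s'|s,a) mu(s,a)
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = (s == sp) - gamma * P[s, a, sp]
b_eq = (1 - gamma) * rho

# Linear constraint-cost budget: sum(mu * c) <= budget
A_ub = c.flatten()[None, :]
b_ub = [budget]

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (nS * nA))
mu = res.x.reshape(nS, nA)
policy = mu / mu.sum(axis=1, keepdims=True)      # recover the randomized optimal policy
print("occupancy measure:\n", mu)
print("policy:\n", policy)
```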
|