Global ETD Search

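1	Regret analysis of constrained irreducible MDPs with reset action / リセット行動が存在する制約付き既約MDPに対するリグレット解析 Watanabe, Takashi 23 March 2020 Kyoto University / 0048 / New system, course doctorate / Doctor of Human and Environmental Studies / Degree No. Kō 22535 / Degree No. Jinhaku 938 / 新制||人||223 (Main Library) / 2019||人博||938 (Yoshida-South Library) / Kyoto University, Graduate School of Human and Environmental Studies, Department of Human Coexistence / (Chief examiner) Associate Professor Takashi Sakuragawa, Professor Hideki Tsuiki, Professor Hirohisa Hioki / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Human and Environmental Studies / Kyoto University / DGAM reinforcement learning long-term average reward constrained Markov decision processes regret online-learning 361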
2	A Reinforcement Learning Approach To Obtain Treatment Strategies In Sequential Medical Decision Problems Poolla, Radhika 14 August 2003 Medical decision problems are extremely complex owing to their dynamic nature, large number of variable factors, and the associated uncertainty. Decision support technology entered the medical field long after other areas such as the airline industry and the manufacturing industry. Yet, it is rapidly becoming an indispensable tool in medical decision-making problems, including the class of sequential decision problems. In these problems, physicians decide on a treatment plan that optimizes a benefit measure such as the treatment cost and the quality of life of the patient. The last decade saw the emergence of many decision support applications in medicine. However, the existing models are limited to decision problems with very few states and actions. The medical research community feels an urgent need to extend these applications to more complex dynamic problems with large state and action spaces. This thesis proposes a methodology which models the class of sequential medical decision problems as a Markov decision process, and solves the model using a simulation-based reinforcement learning (RL) algorithm. Such a methodology is capable of obtaining near-optimal treatment strategies for problems with large state and action spaces. This methodology overcomes, to a large extent, the computational complexity of the value-iteration and policy-iteration algorithms of dynamic programming. An average reward reinforcement-learning algorithm is developed. The algorithm is applied to a sample problem of treating hereditary spherocytosis. The application demonstrates the ability of the proposed methodology to obtain effective treatment strategies for sequential medical decision problems. 
dynamic decision model markov decision process hereditary spherocytosis intervention quality adjusted life years average reward American Studies Arts and Humanities
3	MODEL-FREE ALGORITHMS FOR CONSTRAINED REINFORCEMENT LEARNING IN DISCOUNTED AND AVERAGE REWARD SETTINGS Qinbo Bai (19804362) 07 October 2024 Reinforcement learning (RL), which aims to train an agent to maximize its accumulated reward through time, has attracted much attention in recent years. Mathematically, RL is modeled as a Markov Decision Process, where the agent interacts with the environment step by step. In practice, RL has been applied to autonomous driving, robotics, recommendation systems, and financial management. Although RL has been extensively studied in the literature, most proposed algorithms are model-based, which requires estimating the transition kernel. To this end, we study sample-efficient model-free algorithms under different settings. Firstly, we propose a conservative stochastic primal-dual algorithm in the infinite horizon discounted reward setting. The proposed algorithm converts the original problem from policy space to the occupancy measure space, which makes the non-convex problem linear. Then, we advocate the use of a randomized primal-dual approach to achieve O(\epsilon^{-2}) sample complexity, which matches the lower bound. However, when it comes to the infinite horizon average reward setting, the problem becomes more challenging since the environment interaction never ends and can’t be reset, which makes reward samples not independent anymore. To solve this, we design an epoch-based policy-gradient algorithm. In each epoch, the whole trajectory is divided into multiple sub-trajectories with an interval between each two of them. Such intervals are long enough so that the reward samples are asymptotically independent. 
By controlling the length of trajectory and intervals, we obtain a good gradient estimator and prove the proposed algorithm achieves O(T^{3/4}) regret bound. Reinforcement learning Optimisation Stochastic analysis and modelling Reinforcement learning Markov Decision Process Discounted Reward Average Reward Policy Gradient
4	A reinforcement learning approach to obtain treatment strategies in sequential medical decision problems [electronic resource] / by Radhika Poolla. Poolla, Radhika. January 2003 Title from PDF of title page. / Document formatted into pages; contains 104 pages. / Thesis (M.S.I.E.)--University of South Florida, 2003. / Includes bibliographical references. / Text (Electronic thesis) in PDF format. / ABSTRACT: Medical decision problems are extremely complex owing to their dynamic nature, large number of variable factors, and the associated uncertainty. Decision support technology entered the medical field long after other areas such as the airline industry and the manufacturing industry. Yet, it is rapidly becoming an indispensable tool in medical decision-making problems, including the class of sequential decision problems. In these problems, physicians decide on a treatment plan that optimizes a benefit measure such as the treatment cost and the quality of life of the patient. The last decade saw the emergence of many decision support applications in medicine. However, the existing models are limited to decision problems with very few states and actions. The medical research community feels an urgent need to extend these applications to more complex dynamic problems with large state and action spaces. / ABSTRACT: This thesis proposes a methodology which models the class of sequential medical decision problems as a Markov decision process, and solves the model using a simulation-based reinforcement learning (RL) algorithm. Such a methodology is capable of obtaining near-optimal treatment strategies for problems with large state and action spaces. This methodology overcomes, to a large extent, the computational complexity of the value-iteration and policy-iteration algorithms of dynamic programming. An average reward reinforcement-learning algorithm is developed. 
The algorithm is applied to a sample problem of treating hereditary spherocytosis. The application demonstrates the ability of the proposed methodology to obtain effective treatment strategies for sequential medical decision problems. / System requirements: World Wide Web browser and PDF reader. / Mode of access: World Wide Web. quality adjusted life years. intervention. dynamic decision model. markov decision process. hereditary spherocytosis. average reward.
