Spelling suggestions: "subject:"markov chains with reward"" "subject:"darkov chains with reward""
1 |
PROBABLY APPROXIMATELY CORRECT BOUNDS FOR ESTIMATING MARKOV TRANSITION KERNELSImon Banerjee (17555685) 06 December 2023 (has links)
<p dir="ltr">This thesis presents probably approximately correct (PAC) bounds on estimates of the transition kernels of Controlled Markov chains (CMC’s). CMC’s are a natural choice for modelling various industrial and medical processes, and are also relevant to reinforcement learning (RL). Learning the transition dynamics of CMC’s in a sample efficient manner is an important question that is open. This thesis aims to close this gap in knowledge in literature.</p><p dir="ltr">In Chapter 2, we lay the groundwork for later chapters by introducing the relevant concepts and defining the requisite terms. The two subsequent chapters focus on non-parametric estimation. </p><p dir="ltr">In Chapter 3, we restrict ourselves to a finitely supported CMC with d states and k controls and produce a general theory for minimax sample complexity of estimating the transition matrices.</p><p dir="ltr">In Chapter 4 we demonstrate the applicability of this theory by using it to recover the sample complexities of various controlled Markov chains, as well as RL problems.</p><p dir="ltr">In Chapter 5 we move to a continuous state and action spaces with compact supports. We produce a robust, non-parametric test to find the best histogram based estimator of the transition density; effectively reducing the problem into one of model selection based on empricial processes.</p><p dir="ltr">Finally, in Chapter 6 we move to a parametric and Bayesian regime, and restrict ourselves to Markov chains. Under this setting we provide a PAC-Bayes bound for estimating model parameters under tempered posteriors.</p>
|
Page generated in 0.0762 seconds