1

Machine Learning Solution Methods for Multistage Stochastic Programming

Defourny, Boris 20 December 2010 (has links)
This thesis investigates the following question: Can supervised learning techniques be successfully used for finding better solutions to multistage stochastic programs? A similar question had already been posed in the context of reinforcement learning, and had led to algorithmic and conceptual advances in the field of approximate value function methods over the years. This thesis identifies several ways to exploit the combination "multistage stochastic programming/supervised learning" for sequential decision making under uncertainty. Multistage stochastic programming is essentially the extension of stochastic programming to several recourse stages. After an introduction to multistage stochastic programming and a summary of existing approximation approaches based on scenario trees, this thesis mainly focuses on the use of supervised learning for building decision policies from scenario-tree approximations. Two ways of exploiting learned policies in the context of the practical issues posed by the multistage stochastic programming framework are explored: the fast evaluation of performance guarantees for a given approximation, and the selection of good scenario trees. The computational efficiency of the approach allows novel investigations relative to the construction of scenario trees, from which novel insights, solution approaches and algorithms are derived. For instance, we generate and select scenario trees with random branching structures for problems over large planning horizons. Our experiments on the empirical performances of learned policies, compared to gold-standard policies, suggest that the combination of stochastic programming and machine learning techniques could also constitute a method per se for sequential decision making under uncertainty, inasmuch as learned policies are simple to use, and come with performance guarantees that can actually be quite good.
Finally, limitations of approaches that build an explicit model to represent an optimal solution mapping are studied in a simple parametric programming setting, and various insights regarding this issue are obtained.
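The central idea above, training a supervised model on scenario-tree solutions so that new scenarios can be evaluated without re-solving the program, can be sketched in a few lines. Everything in this snippet (the scenario feature, the toy decision rule, the linear policy class) is an illustrative assumption, not the thesis's actual formulation:

```python
import numpy as np

# Hypothetical toy setup: suppose a scenario-tree solver has produced, for
# each sampled scenario x (e.g. a demand realization), the optimal decision y.
# A learned policy approximates the mapping x -> y so that fresh scenarios
# can be evaluated cheaply, without re-solving the stochastic program.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=(200, 1))   # scenario features (illustrative)
y = 2.0 * x[:, 0] + 1.0                     # "optimal" decisions (toy rule)

# Fit a linear policy y ~ a*x + b by least squares.
A = np.hstack([x, np.ones((200, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def policy(scenario):
    """Fast surrogate for the scenario-tree solution."""
    return coef[0] * scenario + coef[1]

print(round(policy(3.0), 3))  # 7.0 for the toy rule above
```

In practice the training targets would come from the first-stage and recourse decisions of a scenario-tree solver rather than a closed-form rule.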
2

Semi-Cooperative Learning in Smart Grid Agents

Reddy, Prashant P. 01 December 2013 (has links)
Striving to reduce the environmental impact of our growing energy demand creates tough new challenges in how we generate and use electricity. We need to develop Smart Grid systems in which distributed sustainable energy resources are fully integrated and energy consumption is efficient. Customers, i.e., consumers and distributed producers, require agent technology that automates much of their decision-making to become active participants in the Smart Grid. This thesis develops models and learning algorithms for such autonomous agents in an environment where customers operate in modern retail power markets and thus have a choice of intermediary brokers with whom they can contract to buy or sell power. In this setting, customers face a learning and multiscale decision-making problem – they must manage contracts with one or more brokers and simultaneously, on a finer timescale, manage their consumption or production levels under existing contracts. On a contextual scale, they can optimize their isolated self-interest or consider their shared goals with other agents. We advance the idea that a Learning Utility Management Agent (LUMA), or a network of such agents, deployed on behalf of a Smart Grid customer can autonomously address that customer’s multiscale decision-making responsibilities. We study several relationships between a given LUMA and other agents in the environment. These relationships are semi-cooperative and the degree of expected cooperation can change dynamically with the evolving state of the world. We exploit the multiagent structure of the problem to control the degree of partial observability. Since a large portion of relevant hidden information is visible to the other agents in the environment, we develop methods for Negotiated Learning, whereby a LUMA can offer incentives to the other agents to obtain information that sufficiently reduces its own uncertainty while trading off the cost of offering those incentives.
The thesis first introduces pricing algorithms for autonomous broker agents, time series forecasting models for long range simulation, and capacity optimization algorithms for multi-dwelling customers. We then introduce Negotiable Entity Selection Processes (NESP) as a formal representation where partial observability is negotiable amongst certain classes of agents. We then develop our ATTRACTION-BOUNDED LEARNING algorithm, which leverages the variability of hidden information for efficient multiagent learning. We apply the algorithm to address the variable-rate tariff selection and capacity aggregate management problems faced by Smart Grid customers. We evaluate the work on real data using Power TAC, an agent-based Smart Grid simulation platform, and substantiate the value of autonomous Learning Utility Management Agents in the Smart Grid.
3

Online Combinatorial Optimization under Bandit Feedback

Talebi Mazraeh Shahi, Mohammad Sadegh January 2016 (has links)
Multi-Armed Bandits (MAB) constitute the most fundamental model for sequential decision making problems with an exploration vs. exploitation trade-off. In such problems, the decision maker selects an arm in each round and observes a realization of the corresponding unknown reward distribution. Each decision is based on past decisions and observed rewards. The objective is to maximize the expected cumulative reward over some time horizon by balancing exploitation (arms with higher observed rewards should be selected often) and exploration (all arms should be explored to learn their average rewards). Equivalently, the performance of a decision rule or algorithm can be measured through its expected regret, defined as the gap between the expected reward achieved by the algorithm and that achieved by an oracle algorithm always selecting the best arm. This thesis investigates stochastic and adversarial combinatorial MAB problems, where each arm is a collection of several basic actions taken from a set of $d$ elements, in a way that the set of arms has a certain combinatorial structure. Examples of such sets include the set of fixed-size subsets, matchings, spanning trees, paths, etc. These problems are specific forms of online linear optimization, where the decision space is a subset of the $d$-dimensional hypercube. Due to the combinatorial nature, the number of arms generically grows exponentially with $d$. Hence, treating arms as independent and applying classical sequential arm selection policies would yield a prohibitive regret. It may then be crucial to exploit the combinatorial structure of the problem to design efficient arm selection algorithms. As the first contribution of this thesis, in Chapter 3 we investigate combinatorial MABs in the stochastic setting and with Bernoulli rewards. We derive asymptotic (i.e., when the time horizon grows large) lower bounds on the regret of any algorithm under bandit and semi-bandit feedback.
The proposed lower bounds are problem-specific and tight in the sense that there exists an algorithm that achieves these regret bounds. Our derivation leverages some theoretical results in adaptive control of Markov chains. Under semi-bandit feedback, we further discuss the scaling of the proposed lower bound with the dimension of the underlying combinatorial structure. For the case of semi-bandit feedback, we propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret. ESCB has better performance guarantees than existing algorithms, and significantly outperforms these algorithms in practice. In the fourth chapter, we consider stochastic combinatorial MAB problems where the underlying combinatorial structure is a matroid. Specializing the results of Chapter 3 to matroids, we provide explicit regret lower bounds for this class of problems. For the case of semi-bandit feedback, we propose KL-OSM, a computationally efficient greedy-based algorithm that exploits the matroid structure. Through a finite-time analysis, we prove that the regret upper bound of KL-OSM matches the proposed lower bound, thus making it the first asymptotically optimal algorithm for this class of problems. Numerical experiments validate that KL-OSM outperforms state-of-the-art algorithms in practice, as well. In the fifth chapter, we investigate the online shortest-path routing problem, which is an instance of combinatorial MABs with geometric rewards. We consider and compare three different types of online routing policies, depending (i) on where routing decisions are taken (at the source or at each node), and (ii) on the received feedback (semi-bandit or bandit). For each case, we derive the asymptotic regret lower bound.
These bounds help us to understand the performance improvements we can expect when (i) taking routing decisions at each hop rather than at the source only, and (ii) observing per-link delays rather than end-to-end path delays. In particular, we show that (i) is of no use while (ii) can have a spectacular impact. For source routing under semi-bandit feedback, we then propose two algorithms with a trade-off between computational complexity and performance. The regret upper bounds of these algorithms improve over those of the existing algorithms, and they significantly outperform state-of-the-art algorithms in numerical experiments. Finally, we discuss combinatorial MABs in the adversarial setting and under bandit feedback. We concentrate on the case where arms have the same number of basic actions but are otherwise arbitrary. We propose CombEXP, an algorithm that has the same regret scaling as state-of-the-art algorithms. Furthermore, we show that CombEXP admits lower computational complexity for some combinatorial problems.
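For readers unfamiliar with the baseline being criticized above, a minimal UCB1 sketch for the classical bandit setting shows what "treating arms as independent" means; applying this per-arm index to exponentially many combinatorial arms is exactly what structured algorithms such as ESCB avoid. The arm means and horizon below are arbitrary illustrations:

```python
import math
import random

# Minimal UCB1 for the classical (non-combinatorial) MAB baseline.  Treating
# each combinatorial arm independently like this becomes prohibitive when the
# number of arms grows exponentially with d.
def ucb1(means, horizon, seed=0):
    rng = random.Random(seed)
    n = [0] * len(means)        # pull counts per arm
    s = [0.0] * len(means)      # reward sums per arm
    for t in range(1, horizon + 1):
        if t <= len(means):
            a = t - 1           # play each arm once first
        else:
            # index = empirical mean + exploration bonus
            a = max(range(len(means)),
                    key=lambda i: s[i] / n[i] + math.sqrt(2 * math.log(t) / n[i]))
        r = 1.0 if rng.random() < means[a] else 0.0   # Bernoulli reward
        n[a] += 1
        s[a] += r
    return n

counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
print(counts)
```

With, say, the size-k subsets of d elements as arms, `means` would need one entry per subset, which is why the per-arm approach breaks down and structure-aware indices are needed.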
4

Design of Joint Verification-Correction Strategies for Engineered Systems

Xu, Peng 28 June 2022 (has links)
System verification is a critical process in the development of engineered systems. Engineers gain confidence in the correct functionality of the system by executing system verification. Traditionally, system verification is implemented by conducting a verification strategy (VS) consisting of verification activities (VA). A VS can be generated using industry standards, expert experience, or quantitative-based methods. However, two limitations exist in these previous studies. First, as an essential part of system verification, correction activities (CA) are used to correct system errors or defects identified by VAs. However, CAs are usually simplified and treated as a component associated with VAs instead of independent decisions. Even though this simplification may accelerate the VS design, it results in inferior VSs because the optimization of correction decisions is ignored. Second, current methods have not handled the issue of complex engineered systems. As the number of activities increases, the magnitude of the possible VSs becomes so large that finding the optimal VS is impossible or impractical. Therefore, these limitations leave room for improving the VS design, especially for complex engineered systems. This dissertation presents a joint verification-correction model (JVCM) to address these gaps. The basic idea of this model is to provide an engineering paradigm for complex engineered systems that simultaneously consider decisions about VAs and CAs. The accompanying research problem is to develop a modeling and analysis framework to solve for joint verification-correction strategies (JVCS). This dissertation aims to address them in three steps. First, verification processes (VP) are modeled mathematically to capture the impacts of VAs and CAs. Second, a JVCM with small strategy spaces is established with all conditions of a VP. A modified backward induction method is proposed to solve for an optimal JVCS in small strategy spaces. 
Third, a UCB-based tree search approach is designed to find near-optimal JVCSs in large strategy spaces. A case study is conducted and analyzed in each step to show the feasibility of the proposed models and methods. / Doctor of Philosophy
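A minimal sketch of plain backward induction over a finite horizon may help fix ideas; the modified algorithm in the dissertation and its verification/correction semantics are more involved, and the states, actions, and costs below are purely illustrative:

```python
# Generic finite-horizon backward induction: compute value functions and a
# stage-indexed policy by sweeping from the last stage to the first.
def backward_induction(horizon, states, actions, cost, transition):
    V = {s: 0.0 for s in states}          # terminal values
    policy = []
    for t in reversed(range(horizon)):
        Q = {s: {a: cost(s, a) + V[transition(s, a)] for a in actions}
             for s in states}
        policy.insert(0, {s: min(Q[s], key=Q[s].get) for s in states})
        V = {s: min(Q[s].values()) for s in states}
    return V, policy

# Toy instance (invented): state = number of unresolved defects; "verify"
# costs 1 and leaves the state unchanged, "correct" costs 2 and fixes one
# defect if any remain.
states = range(4)
V, pol = backward_induction(
    horizon=3,
    states=states,
    actions=["verify", "correct"],
    cost=lambda s, a: 1.0 if a == "verify" else 2.0,
    transition=lambda s, a: max(s - 1, 0) if a == "correct" else s,
)
print(V[0])  # 3.0: with no defects, the cheaper action costs 1 per stage
```

The dissertation's point is that such exhaustive sweeps are only feasible for small strategy spaces, which motivates the UCB-based tree search for large ones.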
5

A Novel Control Engineering Approach to Designing and Optimizing Adaptive Sequential Behavioral Interventions

January 2014 (has links)
Control engineering offers a systematic and efficient approach to optimizing the effectiveness of individually tailored treatment and prevention policies, also known as adaptive or "just-in-time" behavioral interventions. These types of interventions represent promising strategies for addressing many significant public health concerns. This dissertation explores the development of decision algorithms for adaptive sequential behavioral interventions using dynamical systems modeling, control engineering principles and formal optimization methods. A novel gestational weight gain (GWG) intervention involving multiple intervention components and featuring a pre-defined, clinically relevant set of sequence rules serves as an excellent example of a sequential behavioral intervention; it is examined in detail in this research. A comprehensive dynamical systems model for the GWG behavioral interventions is developed, which demonstrates how to integrate a mechanistic energy balance model with dynamical formulations of behavioral models, such as the Theory of Planned Behavior and self-regulation. Self-regulation is further improved with different advanced controller formulations. These model-based controller approaches enable the user to have significant flexibility in describing a participant's self-regulatory behavior through the tuning of controller adjustable parameters. The dynamic simulation model demonstrates proof of concept for how self-regulation and adaptive interventions influence GWG, how intra-individual and inter-individual variability play a critical role in determining intervention outcomes, and the evaluation of decision rules. Furthermore, a novel intervention decision paradigm using a Hybrid Model Predictive Control framework is developed to generate sequential decision policies in the closed loop.
Clinical considerations are systematically taken into account through a user-specified dosage sequence table corresponding to the sequence rules, constraints enforcing the adjustment of one input at a time, and a switching time strategy accounting for the difference in frequency between intervention decision points and sampling intervals. Simulation studies illustrate the potential usefulness of the intervention framework. The final part of the dissertation presents a model scheduling strategy relying on gain-scheduling to address nonlinearities in the model, and introduces a cascade filter design for dual-rate control systems to address scenarios with variable sampling rates. These extensions are important for addressing real-life scenarios in the GWG intervention. / Doctoral Dissertation, Chemical Engineering, 2014
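The self-regulation idea above can be caricatured as a discrete-time proportional controller that nudges intervention intensity toward a weekly weight-gain target. All gains, targets, and the one-state dynamics below are invented for illustration and are far simpler than the dissertation's energy balance and Theory of Planned Behavior models:

```python
# Toy closed-loop self-regulation: each week, the intervention "dosage" is
# adjusted in proportion to the deviation of the weight-gain rate from a
# clinical target.  All numbers are illustrative assumptions.
def simulate(weeks=30, target=0.4, kp=0.5):
    gain_rate = 0.8        # uncontrolled weekly weight gain (kg/week)
    intensity = 0.0        # intervention dosage
    trajectory = []
    for _ in range(weeks):
        rate = gain_rate - 0.5 * intensity   # intervention lowers the gain rate
        error = rate - target
        intensity += kp * error              # proportional self-regulation update
        trajectory.append(rate)
    return trajectory

traj = simulate()
print(abs(traj[-1] - 0.4) < 0.05)  # True: the rate settles near the target
```

The error in this loop contracts by a constant factor each week, so the trajectory converges geometrically to the target; the dissertation's Hybrid MPC formulation replaces this fixed rule with an optimization over clinically constrained dosage sequences.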
6

Dynamic computational models of risk and effort discounting in sequential decision making

Cuevas Rivera, Dario 30 June 2021 (has links)
Dissertation based on my publications in the field of risky behavior in dynamic, sequential decision-making tasks. Contents: 1. Introduction; 2. Context-dependent risk aversion: a model-based approach; 3. Modeling dynamic allocation of effort in a sequential task using discounting models; 4. General discussion.
7

Markovian sequential decision-making in non-stationary environments: application to argumentative debates

Hadoux, Emmanuel 26 November 2015 (has links)
In sequential decision-making problems under uncertainty, an agent makes decisions, one after another, considering the current state of the environment where she evolves. In most work, the environment the agent evolves in is assumed to be stationary, i.e., its dynamics do not change over time. However, the stationarity hypothesis can be invalid if, for instance, exogenous events can occur.
In this document, we are interested in sequential decision-making in non-stationary environments. We propose a new model named HS3MDP, allowing us to represent non-stationary problems whose dynamics evolve among a finite set of contexts. In order to efficiently solve those problems, we adapt the POMCP algorithm to HS3MDPs. We also present RLCD with SCD, a new method to learn the dynamics of the environments, without knowing a priori the number of contexts. We then explore the field of argumentation problems, where few works consider sequential decision-making. We address two types of problems: stochastic debates (APS) and mediation problems with non-stationary agents (DMP). In this work, we present a model formalizing APS and allowing us to transform them into an MOMDP in order to optimize the sequence of arguments of one agent in the debate. We then extend this model to DMPs to allow a mediator to strategically organize speaking turns in a debate.
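The hidden-mode intuition behind HS3MDP can be illustrated with a toy simulation in which the reward function switches between two contexts according to an unobserved Markov chain, and a recency-weighted learner tracks the drift. The dynamics, switching probability, and learning rule here are illustrative only and are not the thesis's algorithms (POMCP, RLCD with SCD):

```python
import random

# Toy hidden-mode environment: two contexts swap which of two actions pays
# off, switching with probability 0.1 per step; the agent never observes the
# context directly and tracks it through a recency-weighted value estimate.
def run(steps=1000, seed=1):
    rng = random.Random(seed)
    contexts = [{0: 1.0, 1: 0.0}, {0: 0.0, 1: 1.0}]  # reward per action, per context
    mode, total = 0, 0.0
    estimate = [0.5, 0.5]        # running value estimate per action
    for _ in range(steps):
        a = 0 if estimate[0] >= estimate[1] else 1   # greedy choice
        r = contexts[mode][a]
        estimate[a] += 0.2 * (r - estimate[a])       # constant step size tracks drift
        total += r
        if rng.random() < 0.1:                       # unobserved context switch
            mode = 1 - mode
    return total / steps

avg = run()
print(round(avg, 2))
```

A stationary learner (decaying step size) would freeze on one action; the constant step size lets the agent re-detect each switch after a few wasted steps, which is the behavior context-detection methods like RLCD aim to formalize.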
8

Statistical Methods for Offline Deep Reinforcement Learning

Danyang Wang (18414336) 20 April 2024 (has links)
Reinforcement learning (RL) has been a rapidly evolving field of research over the past years, enhancing developments in areas such as artificial intelligence, healthcare, and education, to name a few. Regardless of the success of RL, its inherent online learning nature presents obstacles for its real-world applications, since in many settings, online data collection with the latest learned policy can be expensive and/or dangerous (such as robotics, healthcare, and autonomous driving). This challenge has catalyzed research into offline RL, which involves reinforcement learning from previously collected static datasets, without the need for further online data collection. However, most existing offline RL methods depend on two key assumptions: unconfoundedness and positivity (also known as the full-coverage assumption), which frequently do not hold in the context of static datasets.

In the first part of this dissertation, we simultaneously address these two challenges by proposing a novel policy learning algorithm: PESsimistic CAusal Learning (PESCAL). We utilize a mediator variable based on the front-door criterion to remove the confounding bias. Additionally, we adopt the pessimistic principle to tackle the distributional shift problem induced by the under-coverage issue. This issue refers to the mismatch between the action distributions induced by candidate policies and the policy that generates the observational data (known as the behavior policy). Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function, to partially mitigate the issue of distributional shift. This insight significantly simplifies our algorithm, by circumventing the challenging task of sequential uncertainty quantification for the estimated Q-function. Moreover, we provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.

In the second part of this dissertation, in contrast to the first part, which approaches the distributional shift issue implicitly by penalizing the value function as a whole, we explicitly constrain the learned policy to not deviate significantly from the behavior policy, while still enabling flexible adjustment of the degree of constraint. Building upon the offline reinforcement learning algorithm TD3+BC (Fujimoto and Gu, 2021), we propose a model-free actor-critic algorithm with an adjustable behavior cloning (BC) term. We employ an ensemble of networks to quantify the uncertainty of the estimated value function, thus addressing the issue of overestimation. Moreover, we introduce a method that is both convenient and intuitively simple for controlling the degree of BC, through a Bernoulli random variable based on the user-specified confidence level for different offline datasets. Our proposed algorithm, named Ensemble-based Actor Critic with Adaptive Behavior Cloning (EABC), is straightforward to implement, exhibits low variance, and achieves strong performance across all D4RL benchmarks.
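The shape of the adjustable behavior-cloning objective described above can be sketched with numpy. This follows the TD3+BC style of actor loss, with a Bernoulli-style gate `use_bc` standing in for the confidence-level mechanism, and none of the ensemble machinery of EABC:

```python
import numpy as np

# Sketch of a TD3+BC-style actor objective with a switchable BC penalty.
# The gate `use_bc` is a stand-in for the Bernoulli variable described in
# the abstract; everything here is illustrative, not the EABC algorithm.
def actor_loss(q_values, policy_actions, data_actions, alpha=2.5, use_bc=1):
    lam = alpha / np.mean(np.abs(q_values))        # scale-invariant Q weight
    rl_term = -lam * np.mean(q_values)             # maximize Q (minimize -Q)
    bc_term = np.mean((policy_actions - data_actions) ** 2)  # stay near data
    return rl_term + use_bc * bc_term

q = np.array([10.0, 12.0, 8.0])        # critic values of policy actions
pi_a = np.array([0.1, 0.2, 0.3])       # actions proposed by the policy
data_a = np.array([0.0, 0.0, 0.0])     # actions in the offline dataset
with_bc = actor_loss(q, pi_a, data_a, use_bc=1)
without_bc = actor_loss(q, pi_a, data_a, use_bc=0)
print(with_bc > without_bc)  # True: the BC penalty adds a nonnegative term
```

Gating the BC term per update (rather than fixing a single weight) is what makes the constraint strength adjustable to how much an offline dataset can be trusted.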
9

Multi-objective sequential decision making

Wang, Weijia 11 July 2014 (has links)
This thesis is concerned with multi-objective sequential decision making (MOSDM). The motivation is twofold. On the one hand, many decision problems in the domains of, e.g., robotics, scheduling or games, involve the optimization of sequences of decisions. On the other hand, many real-world applications are most naturally formulated in terms of multi-objective optimization (MOO). The proposed approach extends the well-known Monte-Carlo tree search (MCTS) framework to the MOO setting, with the goal of discovering several optimal sequences of decisions through growing a single search tree. The main challenge is to propose a new reward, able to guide the exploration of the tree although the MOO setting does not enforce a total order among solutions. The main contribution of the thesis is to propose and experimentally study two such rewards, inspired from the MOO literature and assessing a solution with respect to the archive of previous solutions (Pareto archive): the hypervolume indicator and the Pareto dominance reward. The study shows the complementarity of these two criteria. The hypervolume indicator suffers from its known computational complexity; however, the proposed extension thereof provides fine-grained information about the quality of solutions with respect to the current archive. Quite the contrary, the Pareto dominance reward is linear but it provides increasingly rare information. Proofs of principle of the approach are given on artificial problems and challenges, and confirm the merits of the approach.
In particular, MOMCTS is able to discover policies lying in non-convex regions of the Pareto front, contrasting with the state of the art: existing multi-objective reinforcement learning algorithms are based on linear scalarization and thus fail to sample such non-convex regions. Finally, MOMCTS competes honorably with the state of the art on the 2013 MOPTSP competition.
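The two rewards discussed above are easy to state concretely in two dimensions. Below is a minimal sketch of Pareto dominance and of the hypervolume of a non-dominated front (maximization on both objectives, reference point at the origin); it is illustrative, not the thesis's implementation:

```python
def dominates(a, b):
    """Pareto dominance: a is at least as good everywhere, better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by a set of mutually non-dominated 2D points."""
    pts = sorted(points)                  # ascending in the first objective
    hv, prev_y = 0.0, ref[1]
    for x, y in reversed(pts):            # descending x, ascending y on a front
        hv += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return hv

front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(dominates((2.0, 2.0), (1.0, 1.0)))  # True
print(hypervolume_2d(front))              # 6.0
```

The 2D sweep above is linear after sorting, but exact hypervolume computation grows exponentially with the number of objectives, which is the complexity drawback the thesis notes; the dominance test, by contrast, stays cheap but only says whether a new point improves on the archive.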
10

Markov decision processes with imprecise probabilities and relational representations: foundations and algorithms.

Shirota Filho, Ricardo 03 May 2012 (has links)
This work is devoted to the theoretical and algorithmic development of Markov decision processes with imprecise probabilities and relational representations. In the literature, this configuration is important within artificial intelligence planning, where the use of relational representations allows compact descriptions, and imprecise probabilities result in a more general form of uncertainty. There are three main contributions. First, we present a brief discussion of the foundations of decision making with imprecise probabilities, pointing towards key questions that remain unanswered.
These results have direct influence upon the model discussed within this text, that is, Markov decision processes with imprecise probabilities. Second, we propose three algorithms for Markov decision processes with imprecise probabilities based on mathematical programming. And third, we develop ideas proposed by Trevizan, Cozman, and de Barros (2008) on the use of variants of Real-Time Dynamic Programming to solve probabilistic planning problems described by an extension of the Probabilistic Planning Domain Definition Language (PPDDL).
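A toy version of the maximin recursion underlying MDPs with imprecise probabilities can make the idea concrete: nature chooses the worst transition probability inside each interval, and the agent maximizes against that choice. The two-state model below is invented for illustration; since the expected value is linear in the transition probability, the worst case sits at an interval endpoint:

```python
# Maximin value iteration for a toy two-state MDP with interval-valued
# transition probabilities (an illustrative MDPIP, not from the thesis).
def robust_value_iteration(gamma=0.9, iters=200):
    # (state, action) -> (reward, interval [lo, hi] for P(next state = 1)).
    model = {
        (0, "safe"):  (0.0, (0.4, 0.6)),
        (0, "risky"): (0.0, (0.1, 0.9)),
        (1, "stay"):  (1.0, (0.7, 0.8)),
    }
    V = [0.0, 0.0]
    for _ in range(iters):
        newV = [0.0, 0.0]
        for s in (0, 1):
            best = float("-inf")
            for (s0, a), (r, (lo, hi)) in model.items():
                if s0 != s:
                    continue
                # Nature minimizes; linearity in p puts the optimum at an endpoint.
                worst = min(p * V[1] + (1 - p) * V[0] for p in (lo, hi))
                best = max(best, r + gamma * worst)
            newV[s] = best
        V = newV
    return V

V = robust_value_iteration()
print(V[1] > V[0])  # True: the reward-bearing state is worth more, even worst-case
```

The mathematical-programming algorithms mentioned in the abstract generalize this inner minimization to richer credal sets, where the worst case is itself an optimization problem rather than an endpoint check.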
