Global ETD Search

11	Multi-objective sequential decision making / La prise de décisions séquentielles multi-objectif Wang, Weijia 11 July 2014 (has links) La présente thèse porte sur l'étude de la prise de décisions séquentielles multi-objectif (MOSDM). La motivation de ce travail est double. D'un côté, la prise de décision, par exemple dans les domaines de la robotique et de la planification, concerne l'optimisation séquentielle. De l'autre côté, de nombreuses applications dans le monde réel sont plus naturellement formulées en termes d'optimisation multi-objectif (MOO). La méthode proposée dans la thèse adapte le cadre bien connu de la recherche arborescente Monte-Carlo (MCTS) à l'optimisation multi-objectif, dans lequel de multiples séquences de décision optimales sont développées dans un seul arbre de recherche. Le principal défi est de proposer une nouvelle récompense, capable de guider l'exploration de l'arbre bien que le problème de MOO n'impose pas un ordre total entre les solutions. La contribution principale de cette thèse est de proposer et d'étudier expérimentalement deux telles récompenses : l'indicateur d'hypervolume et la récompense de dominance Pareto, qui sont inspirées de la littérature de MOO et basées sur une archive de solutions antérieures (archive Pareto). L'étude montre la complémentarité de ces deux récompenses. L'indicateur d'hypervolume souffre de sa complexité algorithmique. Cependant, cet indicateur fournit des informations à grain fin sur la qualité des solutions par rapport à l'archive actuelle. Au contraire, la complexité de la récompense de dominance Pareto est linéaire, mais cette récompense fournit des informations de plus en plus rares au fil de la recherche. Les preuves de principe de l'approche sont données sur des problèmes artificiels et des défis internationaux, et confirment la valeur de l'approche. 
En particulier, MOMCTS est capable de découvrir des politiques se trouvant dans les régions non convexes du front de Pareto, ce qui contraste avec l'état de l'art : les algorithmes d'apprentissage par renforcement multi-objectif existants sont basés sur la scalarisation linéaire et ne sont donc pas capables d'explorer ces régions non convexes. Enfin, MOMCTS a soutenu honorablement la comparaison avec l'état de l'art lors de la compétition internationale MOPTSP 2013. / This thesis is concerned with multi-objective sequential decision making (MOSDM). The motivation is twofold. On the one hand, many decision problems in domains such as robotics, scheduling or games involve the optimization of sequences of decisions. On the other hand, many real-world applications are most naturally formulated in terms of multi-objective optimization (MOO). The proposed approach extends the well-known Monte-Carlo tree search (MCTS) framework to the MOO setting, with the goal of discovering several optimal sequences of decisions by growing a single search tree. The main challenge is to propose a new reward able to guide the exploration of the tree, although the MOO setting does not enforce a total order among solutions. The main contribution of the thesis is to propose and experimentally study two such rewards, inspired by the MOO literature and assessing a solution with respect to the archive of previous solutions (Pareto archive): the hypervolume indicator and the Pareto dominance reward. The study shows the complementarity of these two criteria. The hypervolume indicator suffers from its known computational complexity; however, the proposed extension thereof provides fine-grained information about the quality of solutions with respect to the current archive. Conversely, the Pareto dominance reward has linear complexity, but it provides increasingly sparse information as the search proceeds. Proofs of principle of the approach are given on artificial problems and international challenges, and confirm the merits of the approach. 
In particular, MOMCTS is able to discover policies lying in non-convex regions of the Pareto front, in contrast with the state of the art: existing multi-objective reinforcement learning algorithms are based on linear scalarization and thus fail to sample such non-convex regions. Finally, MOMCTS competes honorably with the state of the art on the 2013 MOPTSP competition. Apprentissage par renforcement Recherche arborescente Monte-Carlo Optimisation multi-objectif Prise de décisions séquentielles Reinforcement learning Monte-Carlo tree search Multi-objective optimization Sequential decision making
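The two rewards compared in this thesis can be illustrated with a small two-objective sketch. It assumes that both objectives are maximized, computes the hypervolume exactly in two dimensions and, for the dominance reward, uses a simple binary variant (1 if no archived point dominates the new one). The thesis's own reward definitions may differ, and all names and numbers here are hypothetical. The last lines check the non-convexity claim: the point (0.5, 0.5) is non-dominated, yet no weight of a linear scalarization selects it.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both maximized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a Pareto-dominates b: at least as good on every objective, strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Non-dominated subset of the points (a Pareto archive)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area of the union of boxes spanned by the points above the reference point."""
    inside = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    front = sorted(pareto_front(inside), key=lambda p: p[0], reverse=True)
    area, prev_y = 0.0, ref[1]
    for x, y in front:  # along a 2-D front, x decreases while y increases
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area


def hypervolume_reward(archive: List[Point], p: Point, ref: Point) -> float:
    """Hypothetical reward: hypervolume gained by adding p to the archive."""
    return hypervolume_2d(archive + [p], ref) - hypervolume_2d(archive, ref)


def dominance_reward(archive: List[Point], p: Point) -> float:
    """Hypothetical binary reward: 1 if no archived point dominates p, else 0."""
    return 0.0 if any(dominates(q, p) for q in archive) else 1.0


def best_under_weight(points: List[Point], w: float) -> Point:
    """Point maximizing the linear scalarization w * f1 + (1 - w) * f2."""
    return max(points, key=lambda p: w * p[0] + (1 - w) * p[1])


if __name__ == "__main__":
    archive = [(1.0, 0.2), (0.2, 1.0)]
    c = (0.5, 0.5)  # non-dominated, but in a non-convex part of the front
    print(hypervolume_reward(archive, c, ref=(0.0, 0.0)))
    print(dominance_reward(archive, c))
    winners = {best_under_weight(archive + [c], k / 100) for k in range(101)}
    print(c in winners)  # no weight in the grid selects c
```

The hypervolume reward is positive for c and grows with how much new area c covers, whereas the binary dominance reward only says whether c is non-dominated. This mirrors the fine-grained versus increasingly sparse trade-off described above.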
12	Processos de decisão Markovianos com probabilidades imprecisas e representações relacionais: algoritmos e fundamentos. / Markov decision processes with imprecise probabilities and relational representations: foundations and algorithms. Shirota Filho, Ricardo 03 May 2012 (has links) Este trabalho é dedicado ao desenvolvimento teórico e algorítmico de processos de decisão markovianos com probabilidades imprecisas e representações relacionais. Na literatura, essa configuração tem sido importante dentro da área de planejamento em inteligência artificial, onde o uso de representações relacionais permite obter descrições compactas, e o emprego de probabilidades imprecisas resulta em formas mais gerais de incerteza. São três as principais contribuições deste trabalho. Primeiro, efetua-se uma discussão sobre os fundamentos de tomada de decisão sequencial com probabilidades imprecisas, em que se evidenciam alguns problemas ainda em aberto. Esses resultados afetam diretamente o (porém não restrito ao) modelo de interesse deste trabalho, os processos de decisão markovianos com probabilidades imprecisas. Segundo, propõem-se três algoritmos para processos de decisão markovianos com probabilidades imprecisas baseados em programação (otimização) matemática. E terceiro, desenvolvem-se ideias propostas por Trevizan, Cozman e de Barros (2008) no uso de variantes do algoritmo Real-Time Dynamic Programming para resolução de problemas de planejamento probabilístico descritos através de versões estendidas da linguagem de descrição de domínios de planejamento (PPDDL). / This work is devoted to the theoretical and algorithmic development of Markov Decision Processes with Imprecise Probabilities and relational representations. In the literature, this configuration is important within artificial intelligence planning, where relational representations allow compact descriptions and imprecise probabilities yield a more general form of uncertainty. 
There are three main contributions. First, we present a brief discussion of the foundations of decision making with imprecise probabilities, pointing towards key questions that remain unanswered. These results directly affect the model discussed in this text, namely Markov Decision Processes with Imprecise Probabilities. Second, we propose three algorithms for Markov Decision Processes with Imprecise Probabilities based on mathematical programming. Third, we develop ideas proposed by Trevizan, Cozman and de Barros (2008) on the use of variants of Real-Time Dynamic Programming to solve probabilistic planning problems described in an extension of the Probabilistic Planning Domain Definition Language (PPDDL). Algorithm Algoritmos Foundations Fundamentos Imprecise probabilities Markov decision process Probabilidades imprecisas Processo de decisão Markoviano Relational representations Representações relacionais Sequential decision making Tomada de decisão sequencial
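To make the notion of imprecise transition probabilities concrete, the following sketch runs value iteration under a worst-case (maximin) criterion on an MDP whose transition probabilities are only known to lie in intervals. Maximin is just one of several possible criteria for such models. The inner minimization uses the standard greedy solution for interval constraints, not the mathematical-programming algorithms proposed in this thesis. The data layout and the two-state example are assumptions, and the intervals are assumed feasible (lower bounds sum to at most 1, upper bounds to at least 1).

```python
from typing import List, Tuple

# (next_state, p_low, p_high): the transition probability is only known to lie in [p_low, p_high]
Interval = Tuple[int, float, float]


def worst_case_expectation(intervals: List[Interval], values: List[float]) -> float:
    """Min over feasible distributions of sum_j p_j * values[j] (assumes distinct next states)."""
    probs = {j: lo for j, lo, _ in intervals}  # start from the lower bounds
    slack = 1.0 - sum(probs.values())          # probability mass still to place
    for j, lo, hi in sorted(intervals, key=lambda t: values[t[0]]):
        extra = min(hi - lo, slack)            # fill lowest-valued successors first
        probs[j] += extra
        slack -= extra
    return sum(p * values[j] for j, p in probs.items())


def robust_value_iteration(trans: List[List[List[Interval]]], rewards: List[List[float]],
                           gamma: float = 0.9, tol: float = 1e-8) -> List[float]:
    """Maximin value iteration: best action against the worst distribution in the intervals."""
    v = [0.0] * len(trans)
    while True:
        new_v = [max(rewards[s][a] + gamma * worst_case_expectation(trans[s][a], v)
                     for a in range(len(trans[s])))
                 for s in range(len(trans))]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


if __name__ == "__main__":
    # Hypothetical 2-state example: state 1 is absorbing with zero reward
    trans = [
        [[(0, 1.0, 1.0)],                  # action 0 ("safe"): stay in state 0
         [(0, 0.5, 0.9), (1, 0.1, 0.5)]],  # action 1 ("risky"): may fall into state 1
        [[(1, 1.0, 1.0)]],
    ]
    rewards = [[0.5, 1.0], [0.0]]
    print(robust_value_iteration(trans, rewards))
```

Under this worst-case reading, the adversary pushes the risky action's probability of reaching state 1 to its upper bound of 0.5. The safe action therefore wins in state 0, whose robust value is 0.5 / (1 - 0.9) = 5.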
13	Policy Explanation and Model Refinement in Decision-Theoretic Planning Khan, Omar Zia January 2013 (has links) Decision-theoretic systems, such as Markov Decision Processes (MDPs), are used for sequential decision-making under uncertainty. MDPs provide a generic framework that can be applied in various domains to compute optimal policies. This thesis presents techniques that offer explanations of optimal policies for MDPs and then refine decision-theoretic models (Bayesian networks and MDPs) based on feedback from experts. Explaining policies for sequential decision-making problems is difficult due to stochastic effects, multiple, possibly competing, objectives and the long-range effects of actions. However, explanations are needed to help experts validate that the policy is correct and to help users develop trust in the choices recommended by the policy. A set of domain-independent templates to justify a policy recommendation is presented, along with a process to identify the minimum number of templates that need to be populated to completely justify the policy. The rejection of an explanation by a domain expert indicates a deficiency in the model that led to the generation of the rejected policy. This thesis presents techniques to refine the model parameters such that the optimal policy calculated using the refined parameters conforms to the expert feedback. The expert feedback is translated into constraints on the model parameters that are used during refinement. These constraints are non-convex for both Bayesian networks and MDPs. For Bayesian networks, the refinement approach is based on Gibbs sampling and stochastic hill climbing, and it learns a model that obeys expert constraints. For MDPs, the parameter space is partitioned such that alternating linear optimization can be applied to learn model parameters that lead to a policy in accordance with expert feedback. 
In practice, the state space of MDPs is often very large, which is problematic for real-world problems. Factored MDPs are often used to deal with this issue. In factored MDPs, state variables represent the state space and dynamic Bayesian networks model the transition functions. This helps to avoid the exponential growth of the state space associated with large and complex problems. The explanation and refinement approaches presented in this thesis are also extended to the factored case to demonstrate their use in real-world applications. Empirical evaluations are presented in three domains: course advising for undergraduate students, assisted hand-washing for people with dementia, and diagnostics for manufacturing. Decision-Theoretic Planning Markov Decision Processes Sequential Decision Making Reasoning under Uncertainty Bayesian Networks Policy Explanation Parameter Learning Model Refinement Computer Science
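The factored representation mentioned above can be sketched for binary state variables. For one action, a dynamic Bayesian network gives each next-step variable a conditional probability that depends only on a few parent variables, and the full transition probability is the product of these factors. The variables, parents and probabilities below are hypothetical. The two counting functions compare the number of free parameters of a flat transition table with that of the factored one.

```python
from itertools import product
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # one binary value per state variable
# One CPT per next-step variable: (indices of parent variables, P(x_i' = 1 | parent values))
CPT = Tuple[Tuple[int, ...], Callable[[Tuple[int, ...]], float]]


def transition_prob(cpts: List[CPT], s: State, s_next: State) -> float:
    """P(s' | s, a) for one action, as the product of per-variable conditionals."""
    prob = 1.0
    for i, (parents, p_one) in enumerate(cpts):
        p = p_one(tuple(s[j] for j in parents))
        prob *= p if s_next[i] == 1 else 1.0 - p
    return prob


def flat_parameter_count(n_vars: int) -> int:
    """Free parameters of an explicit 2^n x 2^n transition table (rows sum to 1)."""
    return 2 ** n_vars * (2 ** n_vars - 1)


def factored_parameter_count(cpts: List[CPT]) -> int:
    """One Bernoulli parameter per parent configuration of each binary variable."""
    return sum(2 ** len(parents) for parents, _ in cpts)


# Hypothetical 3-variable example for a single action
cpts: List[CPT] = [
    ((0,), lambda v: 0.9 if v[0] else 0.1),   # x0' depends on x0
    ((0, 1), lambda v: 0.5 * (v[0] + v[1])),  # x1' depends on x0 and x1
    ((2,), lambda v: 1.0 if v[0] else 0.2),   # x2' depends on x2
]

if __name__ == "__main__":
    s = (1, 0, 1)
    total = sum(transition_prob(cpts, s, s2) for s2 in product((0, 1), repeat=3))
    print(round(total, 10))  # the distribution over s' sums to 1
    print(flat_parameter_count(3), factored_parameter_count(cpts))
```

The gap between the two counts widens quickly with the number of variables as long as each variable keeps few parents. This is the exponential saving the abstract refers to.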
15	Contributions to Simulation-based High-dimensional Sequential Decision Making Hoock, Jean-Baptiste 10 April 2013 (has links) (PDF) My thesis is entitled "Contributions to Simulation-based High-dimensional Sequential Decision Making". The thesis is set in the context of games, planning and Markov Decision Processes. An agent interacts with its environment by making successive decisions. The agent starts from an initial state and acts until it reaches a final state in which it can no longer make decisions. At each timestep, the agent receives an observation of the state of the environment. From this observation and its knowledge, the agent makes a decision which modifies the state of the environment. Then, the agent receives a reward and a new observation. The goal is to maximize the sum of rewards obtained during a simulation from an initial state to a final state. The policy of the agent is the function which, given the history of observations, returns a decision. We work in a context where (i) the number of states is huge, (ii) the reward carries little information, (iii) the probability of quickly reaching a good final state is low and (iv) prior knowledge is either nonexistent or hardly exploitable. Both applications described in this thesis exhibit these constraints: the game of Go and a 3D simulator from the European project MASH (Massive Sets of Heuristics). To make satisfactory decisions in this context, several solutions are proposed: 1. simulating with an exploration/exploitation trade-off (MCTS); 2. reducing complexity by local solving (GoldenEye); 3. building a self-improving policy (RBGP); 4. learning prior knowledge (CluVo+GMCTS). Monte-Carlo Tree Search (MCTS) is the state of the art for the game of Go. From a model of the environment, MCTS incrementally and asymmetrically builds a tree of possible futures by performing Monte-Carlo simulations. The tree is rooted at the current observation of the agent. 
The agent switches between exploring the model and exploiting decisions that statistically yield a good cumulative reward. We discuss two ways of improving MCTS: parallelization and the addition of prior knowledge. Parallelization does not solve some weaknesses of MCTS; in particular, some local problems remain challenging. We propose an algorithm (GoldenEye) composed of two parts: detection of a local problem and then its resolution. The resolution algorithm reuses some concepts of MCTS and solves difficult problems from a classical database. Adding prior knowledge by hand is laborious and tedious. We propose a method called Racing-based Genetic Programming (RBGP) to add prior knowledge automatically. Its strong point is that RBGP rigorously validates each addition of prior knowledge, and RBGP can be used to build a policy (rather than only to optimize an algorithm). In some applications such as MASH, simulations are too time-consuming and there is neither prior knowledge nor a model of the environment; therefore Monte-Carlo Tree Search cannot be used. To make MCTS usable in this context, we propose a method for learning prior knowledge (CluVo). We then use pieces of prior knowledge both to speed up the agent's learning and to build a model. On this model, we use an adapted version of Monte-Carlo Tree Search (GMCTS). This method solves difficult problems of MASH and gives good results in an application to a word game. [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre Monte Carlo Tree Search Learning from simulations Games Planning Markov decision process MoGo MASH
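The four phases of MCTS summarized above can be sketched with the common UCT variant, which uses the UCB1 bonus in the selection step. The phases are selection with an exploration/exploitation trade-off, expansion, Monte-Carlo simulation and backpropagation. The toy problem is an assumption for illustration: the agent guesses a hypothetical 4-bit target string and receives a reward equal to the fraction of matching bits. The constant c = 1.4 is likewise assumed. This is not GoldenEye, RBGP or GMCTS.

```python
import math
import random
from typing import Dict, Optional, Tuple

TARGET = (1, 0, 1, 1)  # hypothetical toy problem: guess this bit string
ACTIONS = (0, 1)


def reward(seq: Tuple[int, ...]) -> float:
    """Fraction of positions matching TARGET (evaluated on complete sequences)."""
    return sum(a == b for a, b in zip(seq, TARGET)) / len(TARGET)


class Node:
    def __init__(self, state: Tuple[int, ...], parent: Optional["Node"] = None):
        self.state = state
        self.parent = parent
        self.children: Dict[int, "Node"] = {}
        self.visits = 0
        self.total = 0.0

    def is_terminal(self) -> bool:
        return len(self.state) == len(TARGET)

    def ucb_child(self, c: float) -> "Node":
        # UCB1: mean reward (exploitation) + bonus for rarely visited children (exploration)
        return max(self.children.values(),
                   key=lambda ch: ch.total / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))


def uct_search(iterations: int, c: float = 1.4, seed: int = 0) -> int:
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and not terminal
        while not node.is_terminal() and len(node.children) == len(ACTIONS):
            node = node.ucb_child(c)
        # 2. Expansion: add one untried action
        if not node.is_terminal():
            a = rng.choice([x for x in ACTIONS if x not in node.children])
            node.children[a] = Node(node.state + (a,), node)
            node = node.children[a]
        # 3. Simulation: random rollout to a final state
        seq = node.state
        while len(seq) < len(TARGET):
            seq += (rng.choice(ACTIONS),)
        r = reward(seq)
        # 4. Backpropagation: update statistics up to the root
        while node is not None:
            node.visits += 1
            node.total += r
            node = node.parent
    # Recommend the most visited root action
    return max(root.children, key=lambda a: root.children[a].visits)


if __name__ == "__main__":
    print(uct_search(1000))
```

Because visits concentrate on the subtree with the higher mean reward, the tree grows asymmetrically. Recommending the most visited root action is one common choice; others, such as the highest mean, are also used.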
16	Lexicographic refinements in possibilistic sequential decision-making models / Raffinements lexicographiques en prise de décision séquentielle possibiliste El Khalfi, Zeineb 31 October 2017 (has links) Ce travail contribue à la théorie de la décision possibiliste et plus précisément à la prise de décision séquentielle dans le cadre de la théorie des possibilités, à la fois au niveau théorique et pratique. Bien qu'attrayante pour sa capacité à résoudre les problèmes de décision qualitatifs, la théorie de la décision possibiliste souffre d'un inconvénient important : les critères d'utilité qualitatifs possibilistes comparent les actions avec les opérateurs min et max, ce qui entraîne un effet de noyade. Pour surmonter ce manque de pouvoir décisionnel, plusieurs raffinements ont été proposés dans la littérature. Les raffinements lexicographiques sont particulièrement intéressants puisqu'ils permettent de bénéficier de l'arrière-plan de l'utilité espérée, tout en restant "qualitatifs". Cependant, ces raffinements ne sont définis que pour les problèmes de décision non séquentiels. Dans cette thèse, nous présentons des résultats sur l'extension des raffinements lexicographiques aux problèmes de décision séquentiels, en particulier aux Arbres de Décision et aux Processus Décisionnels de Markov possibilistes. Cela aboutit à de nouveaux algorithmes de planification plus "décisifs" que leurs contreparties possibilistes. Dans un premier temps, nous présentons des relations de préférence lexicographiques optimistes et pessimistes entre les politiques avec et sans utilités intermédiaires, qui raffinent respectivement les utilités possibilistes optimistes et pessimistes. Nous prouvons que les critères proposés satisfont le principe d'efficacité de Pareto ainsi que la propriété de monotonie stricte. Cette dernière garantit qu'un algorithme de programmation dynamique peut être appliqué pour calculer des politiques optimales. 
Nous étudions tout d'abord l'optimisation lexicographique des politiques dans les Arbres de Décision possibilistes et les Processus Décisionnels de Markov à horizon fini. Nous fournissons des adaptations de l'algorithme de programmation dynamique qui calculent une politique optimale en temps polynomial. Ces algorithmes sont basés sur la comparaison lexicographique des matrices de trajectoires associées aux sous-politiques. Ce travail algorithmique est complété par une étude expérimentale qui montre la faisabilité et l'intérêt de l'approche proposée. Ensuite, nous prouvons que les critères lexicographiques bénéficient toujours d'une fondation en termes d'utilité espérée, et qu'ils peuvent être capturés par des utilités espérées infinitésimales. La dernière partie de notre travail est consacrée à l'optimisation des politiques dans les Processus Décisionnels de Markov (éventuellement infinis) stationnaires. Nous proposons un algorithme d'itération de la valeur pour le calcul des politiques optimales lexicographiques. De plus, nous étendons ces résultats au cas de l'horizon infini. La taille des matrices augmentant exponentiellement (ce qui est particulièrement problématique dans le cas de l'horizon infini), nous proposons un algorithme d'approximation qui se limite à la partie la plus intéressante de chaque matrice de trajectoires, à savoir les premières lignes et colonnes. Enfin, nous rapportons des résultats expérimentaux qui prouvent l'efficacité des algorithmes basés sur la troncature des matrices. / This work contributes to possibilistic decision theory and more specifically to sequential decision-making under possibilistic uncertainty, at both the theoretical and practical levels. Even though appealing for its ability to handle qualitative decision problems, possibilistic decision theory suffers from an important drawback: qualitative possibilistic utility criteria compare acts through min and max operators, which leads to a drowning effect. 
To overcome this lack of decision power, several refinements have been proposed in the literature. Lexicographic refinements are particularly appealing since they allow one to benefit from the expected utility background while remaining "qualitative". However, these refinements are defined only for non-sequential decision problems. In this thesis, we present results on the extension of the lexicographic preference relations to sequential decision problems, in particular to possibilistic decision trees and Markov Decision Processes. This leads to new planning algorithms that are more "decisive" than their original possibilistic counterparts. We first present optimistic and pessimistic lexicographic preference relations between policies, with and without intermediate utilities, that refine the optimistic and pessimistic qualitative utilities respectively. We prove that these new criteria satisfy the principle of Pareto efficiency as well as the property of strict monotonicity. The latter guarantees that dynamic programming can be used to compute lexicographically optimal policies. Considering the problem of policy optimization in possibilistic decision trees and finite-horizon Markov decision processes, we provide adaptations of the dynamic programming algorithm that compute a lexicographically optimal policy in polynomial time. These algorithms are based on the lexicographic comparison of the matrices of trajectories associated with the sub-policies. This algorithmic work is complemented by an experimental study that shows the feasibility and the interest of the proposed approach. Then we prove that the lexicographic criteria still benefit from an Expected Utility grounding, and can be represented by infinitesimal expected utilities. The last part of our work is devoted to policy optimization in (possibly infinite) stationary Markov Decision Processes. We propose a value iteration algorithm for the computation of lexicographically optimal policies. 
We further extend these results to the infinite-horizon case. Since the size of the matrices increases exponentially (which is especially problematic in the infinite-horizon case), we propose an approximation algorithm which keeps the most interesting part of each matrix of trajectories, namely the first rows and columns. Finally, we report experimental results that show the effectiveness of the algorithms based on the truncation of the matrices. Décision séquentielle Théorie des possibilités Critères lexicographiques Arbres de décision Processus décisionnels de Markov Sequential decision theory Possibility theory Lexicographic criteria Decision trees Markov decision processes
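The drowning effect and its lexicographic refinement can be illustrated on two hypothetical matrices of trajectories, with one row per trajectory. The sketch assumes that each row already contains the degrees to be combined by min (possibility degrees and utility). It computes the optimistic utility as the max over rows of the row minimum. It then refines that utility with an lmax(lmin)-style comparison: each row is sorted increasingly, the rows are sorted decreasingly, and the results are compared lexicographically. Matrices are assumed to have equal shape, and the exact relations used in the thesis may differ in detail.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # one row per trajectory


def optimistic_utility(matrix: Matrix) -> float:
    """Qualitative optimistic utility: max over trajectories of the row minimum."""
    return max(min(row) for row in matrix)


def lmax_lmin_key(matrix: Matrix) -> Tuple[Tuple[float, ...], ...]:
    """Sort key refining max-min: rows in leximin order, then rows best-first."""
    rows = [tuple(sorted(row)) for row in matrix]  # each row sorted increasingly
    return tuple(sorted(rows, reverse=True))       # best row first (leximax over rows)


def leximin_key(values: Sequence[float]) -> Tuple[float, ...]:
    """Refines min: compare the worst values first, then the next worst, and so on."""
    return tuple(sorted(values))


if __name__ == "__main__":
    a = [[1.0, 0.4], [0.3, 0.2]]  # hypothetical policy A
    b = [[1.0, 0.4], [0.3, 0.3]]  # hypothetical policy B
    print(optimistic_utility(a) == optimistic_utility(b))  # tie: drowning effect
    print(lmax_lmin_key(b) > lmax_lmin_key(a))             # refinement prefers B
```

Python compares tuples lexicographically, so the keys can be compared directly or passed to `max`/`sorted`. Both comparisons use only min, max and ordering, so the refinement stays qualitative.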
17	Processos de decisão Markovianos com probabilidades imprecisas e representações relacionais: algoritmos e fundamentos. / Markov decision processes with imprecise probabilities and relational representations: foundations and algorithms. Ricardo Shirota Filho 03 May 2012 (has links) Este trabalho é dedicado ao desenvolvimento teórico e algorítmico de processos de decisão markovianos com probabilidades imprecisas e representações relacionais. Na literatura, essa configuração tem sido importante dentro da área de planejamento em inteligência artificial, onde o uso de representações relacionais permite obter descrições compactas, e o emprego de probabilidades imprecisas resulta em formas mais gerais de incerteza. São três as principais contribuições deste trabalho. Primeiro, efetua-se uma discussão sobre os fundamentos de tomada de decisão sequencial com probabilidades imprecisas, em que se evidenciam alguns problemas ainda em aberto. Esses resultados afetam diretamente o (porém não restrito ao) modelo de interesse deste trabalho, os processos de decisão markovianos com probabilidades imprecisas. Segundo, propõem-se três algoritmos para processos de decisão markovianos com probabilidades imprecisas baseados em programação (otimização) matemática. E terceiro, desenvolvem-se ideias propostas por Trevizan, Cozman e de Barros (2008) no uso de variantes do algoritmo Real-Time Dynamic Programming para resolução de problemas de planejamento probabilístico descritos através de versões estendidas da linguagem de descrição de domínios de planejamento (PPDDL). / This work is devoted to the theoretical and algorithmic development of Markov Decision Processes with Imprecise Probabilities and relational representations. In the literature, this configuration is important within artificial intelligence planning, where relational representations allow compact descriptions and imprecise probabilities yield a more general form of uncertainty. 
There are three main contributions. First, we present a brief discussion of the foundations of decision making with imprecise probabilities, pointing towards key questions that remain unanswered. These results directly affect the model discussed in this text, namely Markov Decision Processes with Imprecise Probabilities. Second, we propose three algorithms for Markov Decision Processes with Imprecise Probabilities based on mathematical programming. Third, we develop ideas proposed by Trevizan, Cozman and de Barros (2008) on the use of variants of Real-Time Dynamic Programming to solve probabilistic planning problems described in an extension of the Probabilistic Planning Domain Definition Language (PPDDL). Algoritmos Fundamentos Probabilidades imprecisas Processo de decisão Markoviano Representações relacionais Tomada de decisão sequencial Algorithm Foundations Imprecise probabilities Markov decision process Relational representations Sequential decision making
18	SEQUENTIAL INFORMATION ACQUISITION AND DECISION MAKING IN DESIGN CONTESTS: THEORETICAL AND EXPERIMENTAL STUDIES Murtuza Shergadwala (9183527) 30 July 2020 (has links) The primary research question of this dissertation is: How do contestants make sequential design decisions under the influence of competition? To address this question, I study the influence of three factors that contest organizers can control on the contestants' sequential information acquisition and decision-making behaviors. These factors are (i) a contestant's domain knowledge, (ii) the framing of a design problem, and (iii) information about historical contests. The central hypothesis is that by conducting controlled behavioral experiments we can acquire data on contestant behaviors that can be used to calibrate computational models of contestants' sequential decision-making behaviors, thereby enabling predictions about design outcomes. The behavioral results suggest that (i) contestants better understand problem constraints and generate more feasible design solutions when a design problem is framed in a domain-specific context rather than a domain-independent context, (ii) contestants' efforts to acquire information about a design artifact to make design improvements are significantly affected by the information provided to them about their opponent, who is competing to achieve the same objectives, and (iii) contestants make information acquisition decisions, such as when to stop acquiring information, based on criteria such as the number of resources, the target objective value, and the observed amount of improvement in their design quality. Moreover, the threshold values of such criteria are influenced by the information the contestants have about their opponent. 
The results imply that (i) by understanding the influence of an individual's domain knowledge and of the framing of a problem, we can provide decision-support tools that help contestants in engineering design contexts better acquire problem-specific information; (ii) we can enable contest designers to decide what information to share to improve the quality of design contest outcomes; and (iii) from an educational standpoint, we can enable instructors to provide students with accurate assessments of their domain knowledge by understanding students' information acquisition and decision-making behaviors in their design projects. The primary contribution of this dissertation is a set of computational models of an individual's sequential decision-making process in competitive design scenarios that incorporate the behavioral results discussed above. Moreover, a framework to conduct factorial investigations of human decision making through a combination of theory and behavioral experimentation is illustrated. Decision Making Engineering Design Knowledge Engineering Systems Design Models of Engineering Design decision making design contests strategic decision making crowdsourcing sequential decision making engineering design Systems engineering and theory
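The stopping criteria reported in this dissertation can be written as a simple threshold rule: resources, target objective value and observed improvement. The function and its parameters below are hypothetical and are not the computational models of the dissertation. In this framing, information about the opponent would act by shifting thresholds such as `target` and `min_gain`.

```python
from typing import List


def should_stop(history: List[float], budget: int, target: float, min_gain: float) -> bool:
    """Hypothetical rule: stop acquiring information once any criterion is met.

    history holds the best design quality observed after each query so far.
    """
    if len(history) >= budget:                # resources exhausted
        return True
    if history and history[-1] >= target:     # target objective value reached
        return True
    if len(history) >= 2 and history[-1] - history[-2] < min_gain:
        return True                           # last improvement too small
    return False


if __name__ == "__main__":
    print(should_stop([0.1, 0.5], budget=5, target=0.9, min_gain=0.05))   # large gain: continue
    print(should_stop([0.5, 0.52], budget=5, target=0.9, min_gain=0.05))  # small gain: stop
```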
19	Situation-appropriate Investment of Cognitive Resources Ott, Florian 29 March 2022 (has links) The human brain is equipped with the ability to plan ahead, i.e., to mentally simulate the expected consequences of candidate actions in order to select the one with the most desirable expected long-term outcome. Insufficient planning can lead to maladaptive behaviour and may even be a contributory cause of important societal problems such as the depletion of natural resources or man-made climate change. Understanding the cognitive and neural mechanisms of forward planning and its regulation is therefore of great importance and could ultimately give us clues on how to better align our behaviour with long-term goals. Apart from its potential beneficial effects, planning is time-consuming and therefore associated with opportunity costs. It is assumed that the brain regulates the investment into planning based on a cost-benefit analysis, so that planning only takes place when the perceived benefits outweigh the costs. But how can the brain know in advance how beneficial or costly planning will be? One potential solution is that people learn from experience how valuable planning would be in a given situation. It is, however, largely unknown how the brain implements such learning, especially in environments with large state spaces. This dissertation tested the hypothesis that humans construct and use so-called control contexts to efficiently adjust the degree of planning to the demands of the current situation. Control contexts can be seen as abstract state representations that conveniently cluster together situations with a similar demand for planning. Inferring the context thus allows the control system to be adjusted prospectively to the learned demands of the global context. To test the control context hypothesis, two complex sequential decision-making tasks were developed. Each of the two tasks had to fulfil two important criteria. 
First, the tasks had to generate both situations in which planning had the potential to improve performance and situations in which a simple strategy was sufficient. Second, the tasks had to feature rich state spaces requiring participants to compress their state representation for efficient regulation of planning. Participants’ planning was modelled using a parametrized dynamic programming solution to a Markov Decision Process, with parameters estimated via hierarchical Bayesian inference. The first study used a 15-step task in which participants had to make a series of decisions to achieve one or multiple goals. In this task, the computational costs of accurate forward planning increased exponentially with the length of the planning horizon. We therefore hypothesized that participants identify ‘distance from goal’ as the relevant contextual feature to guide their regulation of forward planning. As expected, we found that participants predominantly relied on a simple heuristic when still far from the goal but progressively switched towards forward planning as the goal approached. In the second study, participants had to sustainably invest a limited but replenishable energy resource, which was needed to accept offers, in order to accumulate a maximum number of points in the long run. The demand for planning varied across the different situations of the task, but due to the large number of possible situations (n = 448) it would be difficult for participants to develop an expectation for each individual situation of how beneficial planning would be. We therefore hypothesized that, to regulate their forward planning, participants used a compressed task representation clustering together states with similar demands for planning. 
Consistent with this, reaction times (operationalising planning duration) increased with trial-by-trial value conflict (operationalising approximate planning demand), an effect that was more pronounced in a context with a generally high demand for planning. We further found that fMRI activity in the dorsal anterior cingulate cortex (dACC) increased with conflict, and that this increase was likewise more pronounced in the high-demand context. Taken together, the results suggest that the dACC integrates representations of planning demand at different levels of abstraction to regulate prospective information sampling in an efficient and situation-appropriate way. This dissertation provides novel insights into how humans adapt their planning to the demands of the current situation. The results are consistent with the view that the regulation of planning is based on an integrated signal of the expected costs and benefits of planning. Furthermore, they provide evidence that the regulation of planning in environments with real-world complexity critically relies on the brain’s powerful ability to construct and use abstract hierarchical representations. info:eu-repo/classification/ddc/153 ddc:153
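The abstract states that participants' planning was modelled as a parametrized dynamic programming solution to a Markov Decision Process. As a rough illustration of that general idea, here is a minimal Python sketch of generic finite-horizon backward induction combined with a softmax choice rule. The toy chain task, the horizon, and the inverse temperature `beta` are hypothetical illustrations, not the dissertation's actual task or fitted model. The sketch also shows how a truncated planning horizon can hide a distant goal.

```python
import math

def backward_induction(states, actions, transition, reward, horizon):
    """Finite-horizon dynamic programming (backward induction).

    transition(s, a) -> list of (probability, next_state)
    reward(s, a)     -> immediate reward
    Returns a list q where q[k][s][a] is the value of taking action a in
    state s with k + 1 decisions remaining.
    """
    v = {s: 0.0 for s in states}  # state values with 0 decisions remaining
    q = []
    for _ in range(horizon):
        q_k = {
            s: {
                a: reward(s, a) + sum(p * v[s2] for p, s2 in transition(s, a))
                for a in actions
            }
            for s in states
        }
        v = {s: max(q_k[s].values()) for s in states}  # greedy backup
        q.append(q_k)
    return q

def softmax_policy(q_values, beta):
    """Choice probabilities; beta (inverse temperature) is a hypothetical free parameter."""
    m = max(q_values.values())  # subtract the max for numerical stability
    weights = {a: math.exp(beta * (val - m)) for a, val in q_values.items()}
    z = sum(weights.values())
    return {a: wt / z for a, wt in weights.items()}

# Hypothetical toy task: positions 0..3; stepping from 2 into the goal 3 pays 1.
STATES = [0, 1, 2, 3]
ACTIONS = ["stay", "step"]

def transition(s, a):
    return [(1.0, min(s + 1, 3))] if a == "step" else [(1.0, s)]

def reward(s, a):
    return 1.0 if (a == "step" and s == 2) else 0.0

q = backward_induction(STATES, ACTIONS, transition, reward, horizon=3)
print(q[2][0])  # 3 decisions left from position 0: goal reachable, "step" worth 1.0
print(q[1][0])  # 2 decisions left: goal out of reach, both actions worth 0.0
print(softmax_policy(q[2][0], beta=2.0))
```

Naively enumerating every action sequence, as exhaustive forward simulation would, grows as `len(ACTIONS) ** horizon`. Backward induction avoids this by reusing state values, at a cost of roughly `horizon * |S| * |A|` backups, but only when the states can be tabulated.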
20	Oracle-based algorithms for optimizing sophisticated decision criteria in sequential, robust and fair decision problems Gilbert, Hugo 11 December 2017 (has links) This thesis falls within the area of algorithmic decision theory, which lies at the crossroads of decision theory, operational research and artificial intelligence.
In this thesis, we study several decision models to solve problems in different domains: sequential decision problems under risk, robust optimization problems, and fair multi-agent optimization problems. To solve these problems efficiently, we use master-slave algorithms that solve the problem through an incremental process. These procedures, referred to as oracle methods in the thesis, make it possible to solve large problems. Particular attention is given to the skew-symmetric bilinear utility model, the weighted expected utility model and their counterparts in multicriteria decision making. These models are of interest in several respects. They extend the standard models (e.g., the expected utility model) and can represent a broader class of preferences while retaining their good theoretical and algorithmic properties. The thesis addresses both theoretical (e.g., complexity results) and operational (e.g., design of practically efficient solution methods) aspects of the problems raised by the use of these criteria in the aforementioned domains. Algorithmic decision theory Sequential decision making under uncertainty Robust optimization Fair optimization Oracle methods Preference elicitation 003.56
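The two criteria singled out above have compact standard definitions. A skew-symmetric bilinear (SSB) comparison of lotteries p and q is φ(p, q) = Σ_i Σ_j p_i q_j φ(x_i, x_j) with φ(x, y) = −φ(y, x), and p is preferred to q when it is positive. Weighted expected utility is Σ p_i w(x_i) u(x_i) / Σ p_i w(x_i) with w > 0. Below is a minimal Python sketch of both, using `{outcome: probability}` dicts and toy outcomes. The encoding, the outcome names, and the numbers are hypothetical illustrations. The sketch shows three things: expected utility is the SSB special case φ(x, y) = u(x) − u(y); SSB can express cyclic, non-transitive preferences; and WEU reduces to expected utility when w is constant. It does not reproduce the thesis's oracle-based algorithms for optimizing these criteria.

```python
from fractions import Fraction

def ssb_value(p, q, phi):
    """SSB comparison phi(p, q) = sum_i sum_j p_i q_j phi(x_i, x_j).

    p, q: lotteries {outcome: probability}; phi must be skew-symmetric.
    p is preferred to q iff the returned value is > 0.
    """
    return sum(pi * qj * phi(x, y) for x, pi in p.items() for y, qj in q.items())

def weu_value(p, u, w):
    """Weighted expected utility sum p_i w(x_i) u(x_i) / sum p_i w(x_i), with w > 0."""
    num = sum(pi * w(x) * u(x) for x, pi in p.items())
    den = sum(pi * w(x) for x, pi in p.items())
    return num / den

half = Fraction(1, 2)

# 1) Expected utility as the SSB special case phi(x, y) = u(x) - u(y).
U = {"a": 0, "b": 1, "c": 3}  # hypothetical utilities
def eu_phi(x, y):
    return U[x] - U[y]
print(ssb_value({"a": half, "c": half}, {"b": 1}, eu_phi))  # 1/2, i.e. EU(p) - EU(q)

# 2) A cyclic phi: r beats s, s beats p, p beats r (no utility function can do this).
BEATS = {("r", "s"), ("s", "p"), ("p", "r")}
def cyclic_phi(x, y):
    if (x, y) in BEATS:
        return 1
    if (y, x) in BEATS:
        return -1
    return 0

uniform = {x: Fraction(1, 3) for x in "rsp"}
# The uniform lottery scores 0 against every sure outcome, so none of them beats it.
print([ssb_value(uniform, {x: 1}, cyclic_phi) for x in "rsp"])

# 3) WEU: overweighting the bad outcome lowers the value below EU = 5.
U2 = {"low": 0, "high": 10}
W2 = {"low": 2, "high": 1}  # hypothetical weights
lottery = {"low": half, "high": half}
print(weu_value(lottery, lambda x: U2[x], lambda x: W2[x]))  # 10/3
print(weu_value(lottery, lambda x: U2[x], lambda x: 1))      # 5 (constant w gives EU)
```

Exact `Fraction` arithmetic is used so that ties such as the zero SSB values in the cyclic example are not blurred by floating-point rounding.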
