  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Testing specifications in partial observability models : a Bayesian encompassing approach

Almeida, Carlos 04 October 2007 (has links)
A structural approach to modelling a statistical problem allows a contextual theory, based on prior knowledge, to be introduced. This approach makes the parameters fully meaningful; but, in the intermediate steps, some unobservable characteristics are introduced because of their contextual meaning. Once the model is completely specified, it is marginalised onto the observed variables in order to obtain a statistical model. The variables can be discrete or continuous, both at the level of the unobserved variables and at the level of the observed (manifest) variables. We are sometimes faced, especially in the behavioural sciences, with ordinal variables; this is the case of the so-called Likert scales. An ordinal variable can therefore be interpreted as a discrete version of a latent concept (the discretization model). The normality of the latent variables reduces the study of this model to the analysis of the covariance structure of the "ideally" measured variables, but only a sub-parameter of this matrix can be identified and consistently estimated (namely, the matrix of polychoric correlations). Consequently, two questions arise: Is the normality of the latent variables testable? If not, which aspect of this hypothesis could be testable? In the discretization model, we observe a loss of information relative to the information contained in the latent variables. To treat this situation we introduce the concept of partial observability through a (non-bijective) measurable function of the latent variable. We explore this definition and verify that other models fit this concept. The definition of partial observability allows us to distinguish two cases, depending on whether or not the function involved depends on a Euclidean parameter. Once partial observability is introduced, we state a set of conditions for building a specification test at the level of the latent variables.
The test is built using the encompassing principle in a Bayesian framework. More precisely, the problem treated in this thesis is: how to test, in a Bayesian framework, the multivariate normality of a latent vector when only a discretized version of that vector is observed. More generally, the problem can be rephrased as: how to test, in a Bayesian framework, a parametric specification on latent variables against a nonparametric alternative when only a partial observation of these latent variables is available.
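The discretization model described above can be sketched in a few lines; the 5-point cut-points and the sample size below are hypothetical, chosen only to illustrate how a (non-bijective) measurable function collapses a latent normal variable into an ordinal Likert-type variable.

```python
import random
import bisect

def discretize(z, thresholds):
    """Map a latent continuous value z to an ordinal category 0..K
    via fixed cut-points -- a non-bijective measurable function."""
    return bisect.bisect_right(thresholds, z)

random.seed(0)
thresholds = [-1.5, -0.5, 0.5, 1.5]           # hypothetical 5-point scale
latent = [random.gauss(0.0, 1.0) for _ in range(10_000)]
observed = [discretize(z, thresholds) for z in latent]

# Partial observability: many latent values collapse to one category,
# so the latent distribution is only testable through category counts.
counts = [observed.count(k) for k in range(5)]
```

Because the map is many-to-one, any test of latent normality can only use the information surviving in `counts`, which is exactly the loss of information the abstract refers to.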
2

Learning with Deictic Representation

Finney, Sarah, Gardiol, Natalia H., Kaelbling, Leslie Pack, Oates, Tim 10 April 2002 (has links)
Most reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Deictic representations are believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet few experiments on learning with deictic representations have been reported in the literature. In this paper we explore the effectiveness of two forms of deictic representation and a naive propositional representation in a simple blocks-world domain. We find, empirically, that the deictic representations actually worsen performance. We conclude with a discussion of possible causes of these results and strategies for more effective learning in domains with objects.
3

Decision-Theoretic Meta-reasoning in Partially Observable and Decentralized Settings

Carlin, Alan Scott 01 February 2012 (has links)
This thesis examines decentralized meta-reasoning. For a single agent or multiple agents, it may not be enough for agents to compute correct decisions if they do not do so in a timely or resource-efficient fashion. The utility of agent decisions typically increases with decision quality, but decreases with computation time. Reasoning about one's own computation process is referred to as meta-reasoning. Aspects of meta-reasoning considered in this thesis include reasoning about how to allocate computational resources, including when to stop one type of computation and begin another, and when to stop all computation and report an answer. Given a computational model, this translates into computing how to schedule the basic computations that solve a problem. This thesis constructs meta-reasoning strategies for the purposes of monitoring and control in multi-agent settings, specifically settings that can be modeled by the Decentralized Partially Observable Markov Decision Process (Dec-POMDP). It uses decision theory to optimize computation for efficiency in time and space in communicative and non-communicative decentralized settings. Whereas base-level reasoning describes the optimization of actual agent behaviors, the meta-reasoning strategies produced by this thesis dynamically optimize the computational resources which lead to the selection of base-level behaviors.
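A minimal sketch of the kind of monitoring rule meta-reasoning is concerned with, under strong simplifying assumptions: a myopic stopping rule applied to a diminishing-returns anytime quality profile. The profile and cost values are hypothetical, and the thesis's Dec-POMDP strategies are considerably more involved.

```python
def should_stop(quality_history, time_cost_per_step):
    """Myopic monitoring rule: stop all computation once the projected
    one-step gain in solution quality no longer exceeds the time cost
    of taking that step."""
    if len(quality_history) < 2:
        return False
    projected_gain = quality_history[-1] - quality_history[-2]
    return projected_gain <= time_cost_per_step

# Hypothetical diminishing-returns anytime profile: q(t) = 1 - 0.5**t.
profile = [1 - 0.5 ** t for t in range(1, 10)]
stop_at = next(t for t in range(2, len(profile))
               if should_stop(profile[:t + 1], 0.05))
```

The rule trades decision quality against computation time exactly as the abstract describes: once marginal quality gains fall below the marginal cost of time, reporting the current answer has higher expected utility than computing further.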
4

Geometry of Optimization in Markov Decision Processes and Neural Network-Based PDE Solvers

Müller, Johannes 07 June 2024 (has links)
This thesis is divided into two parts, dealing with optimization problems in Markov decision processes (MDPs) and with neural network-based numerical solvers for partial differential equations (PDEs). In Part I we analyze the optimization problem arising in (partially observable) Markov decision processes using tools from algebraic statistics and information geometry, which can be viewed as neighboring fields of applied algebra and differential geometry, respectively. Here, we focus on infinite horizon problems and memoryless stochastic policies. Markov decision processes provide a mathematical framework for sequential decision-making on which most current reinforcement learning algorithms are built. They formalize the task of optimally controlling the state of a system through appropriate actions. For fully observable problems, the action can be selected knowing the current state of the system. This case has been studied extensively, and optimizing the action selection is known to be equivalent to solving a linear program over the (generalized) stationary distributions of the Markov decision process, which are also referred to as state-action frequencies. In Chapter 3, we study partially observable problems where an action must be chosen based solely on an observation of the current state, which might not fully reveal the underlying state. We characterize the feasible state-action frequencies of partially observable Markov decision processes by polynomial inequalities. In particular, the optimization problem in partially observable MDPs is described as a polynomially constrained linear objective program that generalizes the (dual) linear programming formulation of fully observable problems. We use this to study the combinatorial and algebraic complexity of this optimization problem and to upper bound the number of critical points over the individual boundary components of the feasible set.
Furthermore, we show that our polynomial programming formulation can be used to effectively solve partially observable MDPs using interior point methods, numerical algebraic techniques, and convex relaxations. Gradient-based methods, including variants of natural gradient methods, have gained tremendous attention in the theoretical reinforcement learning community, where they are commonly referred to as (natural) policy gradient methods. In Chapter 4, we provide a unified treatment of a variety of natural policy gradient methods for fully observable problems by studying their state-action frequencies from the standpoint of information geometry. For a variety of NPGs and reward functions, we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Morimura and co-authors and Kakade by observing that these arise from the Hessian geometries of the entropy and conditional entropy, respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions like log-barriers. We provide experimental evidence indicating that our predicted rates are essentially tight. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if the NPG is defined with respect to the Hessian geometry of the regularizer. This yields local quadratic convergence rates of these methods for step size equal to the inverse penalization strength, which recovers existing results as special cases. Part II addresses neural network-based PDE solvers that have recently experienced tremendous growth in popularity and attention in the scientific machine learning community. 
We focus on two approaches that represent the approximation of a solution of a PDE as a minimization over the parameters of a neural network: the deep Ritz method and physics-informed neural networks. In Chapter 5, we study the theoretical properties of the boundary penalty for these methods and obtain a uniform convergence result for the deep Ritz method for a large class of potentially nonlinear problems. For linear PDEs, we estimate the error of the deep Ritz method in terms of the optimization error, the approximation capabilities of the neural network, and the strength of the penalty. This reveals a trade-off in the choice of the penalization strength: too little penalization allows large boundary values, while too strong penalization leads to a poor solution of the PDE inside the domain. For physics-informed networks, we show that when working with neural networks that have zero boundary values, the second derivatives of the solution are also approximated, whereas otherwise only lower-order derivatives are approximated. In Chapter 6, we propose energy natural gradient descent, a natural gradient method with respect to second-order information in the function space, as an optimization algorithm for physics-informed neural networks and the deep Ritz method. We show that this method, which can be interpreted as a generalized Gauss-Newton method, mimics Newton's method in function space except for an orthogonal projection onto the tangent space of the model. We show that for a variety of PDEs, energy natural gradients converge rapidly and approximations to the solution of the PDE are several orders of magnitude more accurate than those of gradient descent, Adam and Newton's method, even when these methods are given more computational time.

Contents:
Chapter 1. Introduction
  1.1 Notation and conventions
Part I. Geometry of Markov decision processes
Chapter 2. Background on Markov decision processes
  2.1 State-action frequencies
  2.2 The advantage function and Bellman optimality
  2.3 Rational structure of the reward and an explicit line theorem
  2.4 Solution methods for Markov decision processes
Chapter 3. State-action geometry of partially observable MDPs
  3.1 The state-action polytope of fully observable systems
  3.2 State-action geometry of partially observable systems
  3.3 Number and location of critical points
  3.4 Reward optimization in state-action space (ROSA)
Chapter 4. Geometry and convergence of natural policy gradient methods
  4.1 Natural gradients
  4.2 Natural policy gradient methods
  4.3 Convergence of natural policy gradient flows
  4.4 Locally quadratic convergence for regularized problems
  4.5 Discussion and outlook
Part II. Neural network-based PDE solvers
Chapter 5. Theoretical analysis of the boundary penalty method for neural network-based PDE solvers
  5.1 Presentation and discussion of the main results
  5.2 Preliminaries regarding Sobolev spaces and neural networks
  5.3 Proofs regarding uniform convergence for the deep Ritz method
  5.4 Proofs of error estimates for the deep Ritz method
  5.5 Proofs of implications of exact boundary values in residual minimization
Chapter 6. Energy natural gradients for neural network-based PDE solvers
  6.1 Energy natural gradients
  6.2 Experiments
  6.3 Conclusion and outlook
Bibliography
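The fully observable baseline that Chapter 3 generalizes can be illustrated on a toy example. Here plain value iteration stands in for the equivalent linear program over state-action frequencies mentioned in the abstract, and the two-state, two-action MDP is a hypothetical construction for illustration only.

```python
# Two-state, two-action MDP.
# P[s][a] = list of (next_state, probability); R[s][a] = expected reward.
P = {0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)],           1: [(1, 0.8), (0, 0.2)]}}
R = {0: {0: 0.0, 1: 1.0},
     1: {0: 0.0, 1: 2.0}}
gamma = 0.9                                   # discount factor

# Value iteration: repeatedly apply the Bellman optimality operator.
V = {0: 0.0, 1: 0.0}
for _ in range(500):
    V = {s: max(R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
                for a in P[s])
         for s in P}

# Greedy policy with respect to the converged values.
policy = {s: max(P[s], key=lambda a: R[s][a] +
                 gamma * sum(p * V[t] for t, p in P[s][a]))
          for s in P}
```

In the fully observable case this greedy solution coincides with the optimum of the linear program over state-action frequencies; in the partially observable case studied in the thesis, the feasible set is instead cut out by polynomial inequalities and no such simple recursion suffices.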
5

A belief-desire-intention architecture with a logic-based planner for agents in stochastic domains

Rens, Gavin B. 02 1900 (has links)
This dissertation investigates high-level decision making for agents that are both goal-driven and utility-driven. We develop a partially observable Markov decision process (POMDP) planner as an extension of an agent programming language called DTGolog, itself an extension of the Golog language. Golog is based on a logic for reasoning about action, the situation calculus. A POMDP planner on its own cannot cope well with dynamically changing environments and complicated goals. This is exactly a strength of the belief-desire-intention (BDI) model: BDI theory was developed to design agents that can select goals intelligently, dynamically abandon and adopt new goals, and yet commit to intentions for achieving goals. The contribution of this research is twofold: (1) developing a relational POMDP planner for cognitive robotics, and (2) specifying a preliminary BDI architecture that can deal with stochasticity in action and perception by employing the planner. / Computing / M. Sc. (Computer Science)
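The belief update at the heart of any POMDP planner can be sketched as a Bayes filter; the two-state sensing problem below is a hypothetical illustration, not the DTGolog formalism used in the dissertation.

```python
def belief_update(belief, action, obs, T, O):
    """Bayes filter over hidden states:
    b'(s') is proportional to O(obs | s', action) * sum_s T(s' | s, action) * b(s)."""
    unnorm = {s2: O[s2][action][obs] *
                  sum(T[s][action].get(s2, 0.0) * belief[s] for s in belief)
              for s2 in belief}
    z = sum(unnorm.values())          # normalizing constant Pr(obs)
    return {s: v / z for s, v in unnorm.items()}

# Hypothetical two-state problem: 'sense' leaves the state unchanged
# and reports the true state with probability 0.8.
T = {0: {"sense": {0: 1.0}}, 1: {"sense": {1: 1.0}}}
O = {0: {"sense": {0: 0.8, 1: 0.2}}, 1: {"sense": {0: 0.2, 1: 0.8}}}

b = belief_update({0: 0.5, 1: 0.5}, "sense", 0, T, O)
```

Starting from a uniform belief and observing state 0 shifts the belief to (0.8, 0.2), which is the "stochasticity in perception" a BDI architecture built on such a planner must handle.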
6

Automatic State Construction using Decision Trees for Reinforcement Learning Agents

Au, Manix January 2005 (has links)
Reinforcement Learning (RL) is a learning framework in which an agent learns a policy from continual interaction with the environment. A policy is a mapping from states to actions. The agent receives rewards as feedback on the actions performed. The objective of RL is to design autonomous agents that search for the policy maximizing the expectation of the cumulative reward. When the environment is partially observable, the agent cannot determine the states with certainty. These states are called hidden in the literature. An agent that relies exclusively on the current observations will not always find the optimal policy. For example, a mobile robot needs to remember the number of doors it has gone by in order to reach a specific door down a corridor of identical doors. To overcome the problem of partial observability, an agent uses both current and past (memory) observations to construct an internal state representation, which is treated as an abstraction of the environment. This research focuses on how features of past events are extracted, with variable granularity, for internal state construction. The project introduces a new method that applies information theory and decision-tree techniques to derive a tree structure which represents both the state and the policy. The relevance of a candidate feature is assessed by Information Gain Ratio ranking with respect to the cumulative expected reward. Experiments carried out on three different RL tasks have shown that our variant of the U-Tree (McCallum, 1995) produces a more robust state representation and faster learning. This better performance can be explained by the fact that the Information Gain Ratio exhibits a lower variance in return prediction than the Kolmogorov-Smirnov statistical test used in the original U-Tree algorithm.
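The Information Gain Ratio ranking mentioned above can be sketched as follows; this is the generic textbook quantity (gain normalized by split entropy), not the thesis's U-Tree variant, and the toy features and labels are hypothetical.

```python
from math import log2
from collections import Counter

def entropy(values):
    """Shannon entropy of an empirical distribution, in bits."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(feature_values, labels):
    """Information Gain Ratio: information gain of splitting `labels`
    on `feature_values`, divided by the entropy of the split itself
    (the normalization penalizes many-valued features)."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature_values, labels):
        groups.setdefault(f, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0
```

A feature that perfectly separates the labels scores 1, while a feature independent of the labels scores 0; ranking candidate history features by this score is the kind of criterion the abstract describes for growing the tree.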
8

The "energy paradox" in Greek industry: extent, adoption of energy-saving and pollution-abatement technologies, and effects on performance, efficiency and productivity

Κουνετάς, Κωνσταντίνος 13 April 2009 (has links)
Climate change is one of the foremost areas of concern for most countries, and in the coming years even greater attention is expected to be devoted to policies that reduce emissions of polluting gases. Energy conservation, as a policy measure, will continue to be an important development strategy for the Greek economy, since it is closely linked both to energy consumption and to the reduction of gaseous emissions. Moreover, coordinated efforts by the European Union and other organisations (IEA, OECD) place among their top priorities the reduction of energy consumption, the use of alternative and renewable energy sources, and the reduction of polluting emissions, particularly in the industrial sector. This thesis analyses issues related to the adoption of energy-saving technologies (EETs) by industrial firms. The central element of this approach is the "energy efficiency paradox". Three specific issues are examined in this direction. First, the investigation of the factors that give rise to the energy efficiency paradox, and in particular whether firms' decisions to adopt such technologies take into account the return on the invested capital. Second, given that the results of the first stage of the analysis highlight the importance of information, an extensive approach is developed concerning both the content and the role of information in the EET adoption process. Third, the effect of EET adoption on the productive efficiency and productivity of industrial firms is investigated.
To address these issues, two microeconomic models and a method for measuring productivity under heterogeneous technologies are developed. The first microeconomic model investigates the EET investment decision process in light of the correlation between the investment choice and profitability, within a partial observability framework. The second microeconomic model revisits the notion of information and investigates the factors that determine a firm's level of information about EETs. Finally, to measure the effect of EETs on productive efficiency and productivity, a method is developed that explicitly takes technological heterogeneity into account. The investigation of these three issues is based on empirical data on firms that incorporated energy-saving technologies into their production process during the period 1990-2004. These investments were subsidised mainly under the 2nd and 3rd Community Support Frameworks. The data were collected through personal interviews (questionnaires); supplementary data were drawn from the ICAP database. / The improvement of energy efficiency is generally viewed as an important option for reducing greenhouse gas emissions and the environmental damage caused by other pollutants (e.g. NOx, SOx). Moreover, it is clearly interwoven with the exploitation of new and innovative technologies in the production process and with the consequent paradox, the so-called "energy efficiency paradox". This paradox has recently attracted the interest of researchers and organisations (IEA, OECD) in an attempt to bring to light its source and the causalities between the adoption of energy-efficient technology (EET) and the behaviour of firms. Three research questions are examined in this PhD thesis.
Our first main research question was examined by formulating and testing the following hypothesis: the decision of firms to adopt or not to adopt EETs is correlated with their profitability. Our second research project develops in two stages. The first stage aims at examining the factors influencing the retrieval of information concerning EETs by manufacturing firms, while at the second stage we distinguish between readily available and emerging energy efficiency technologies and examine the factors affecting information acquisition for each of these two broad sets of technologies. Finally, in order to disentangle firms' heterogeneity, we developed a methodological framework to calculate total factor productivity and the differences in its components arising from EET adoption. Our first research question examines the energy efficiency paradox demonstrated in Greek manufacturing firms through a partial observability approach. Maximum likelihood estimates arising from an incidental truncation model reveal that the adoption of energy-saving technologies is indeed strongly correlated with the returns on the assets required to undertake the corresponding investments. The source of the energy efficiency paradox lies within a wide range of factors. Policy schemes that aim to increase the adoption rate of energy-saving technologies in manufacturing are significantly affected by differences in firm size. Finally, mixed policies seem to be more effective than policies that are only capital-subsidy or regulation oriented. Answering the second research question, we aim to redefine the notion of awareness regarding the adoption of EETs. In a second stage we explore the crucial factors that affect the information level of EET adopters, distinguishing between information on epidemic and on emerging technologies.
Our empirical findings reveal that the main factor exerting a positive influence on the level of information acquired by firms may be encompassed in a set of variables reflecting what may be called a "business culture" regarding EETs. Finally, we examined the impact of EET adoption on Greek manufacturing firms operating under heterogeneous technology sets, and we measured total factor productivity (TFP) and the components of it arising from scale and technological differences. In order to examine our research questions we compiled a unique database. It arose from the need of the Greek government to conserve energy in manufacturing and to reduce dangerous emissions in order to meet the criteria of the Kyoto Protocol. An extensive questionnaire was addressed to the 298 firms across the country that adopted EETs subsidised by (i) the Support Frameworks for Regional and Industrial Development, (ii) the Energy Operational Program (OPE), which was part of the second European Union Support Framework (1994-2000), and (iii) the Operational Program "Competitiveness", which is part of the third European Union Support Framework (2000-2006). Of these, 161 agreed to be interviewed on the basis of the questionnaire. Face-to-face interviews took place in the first six months of 2004. Additional data were derived from the ICAP financial database.
