Global ETD Search

91	Itération sur les politiques optimiste et apprentissage du jeu de Tetris / Optimistic Policy Iteration and Learning the Game of Tetris Thiéry, Christophe 25 November 2010 (has links) Cette thèse s'intéresse aux méthodes d'itération sur les politiques dans l'apprentissage par renforcement à grand espace d'états avec approximation linéaire de la fonction de valeur. Nous proposons d'abord une unification des principaux algorithmes du contrôle optimal stochastique. Nous montrons la convergence de cette version unifiée vers la fonction de valeur optimale dans le cas tabulaire, ainsi qu'une garantie de performances dans le cas où la fonction de valeur est estimée de façon approximative. Nous étendons ensuite l'état de l'art des algorithmes d'approximation linéaire du second ordre en proposant une généralisation de Least-Squares Policy Iteration (LSPI) (Lagoudakis et Parr, 2003). Notre nouvel algorithme, Least-Squares [lambda] Policy Iteration (LS[lambda]PI), ajoute à LSPI un concept venant de [lambda]-Policy Iteration (Bertsekas et Ioffe, 1996) : l'évaluation amortie (ou optimiste) de la fonction de valeur, qui permet de réduire la variance de l'estimation afin d'améliorer l'efficacité de l'échantillonnage. LS[lambda]PI propose ainsi un compromis biais-variance réglable qui peut permettre d'améliorer l'estimation de la fonction de valeur et la qualité de la politique obtenue. Dans un second temps, nous nous intéressons en détail au jeu de Tetris, une application sur laquelle se sont penchés plusieurs travaux de la littérature. Tetris est un problème difficile en raison de sa structure et de son grand espace d'états. Nous proposons pour la première fois une revue complète de la littérature qui regroupe des travaux d'apprentissage par renforcement, mais aussi des techniques de type évolutionnaire qui explorent directement l'espace des politiques et des algorithmes réglés à la main. Nous constatons que les approches d'apprentissage par renforcement sont à l'heure actuelle moins performantes sur ce problème que des techniques de recherche directe de la politique telles que la méthode d'entropie croisée (Szita et Lorincz, 2006). Nous expliquons enfin comment nous avons mis au point un joueur de Tetris qui dépasse les performances des meilleurs algorithmes connus jusqu'ici et avec lequel nous avons remporté l'épreuve de Tetris de la Reinforcement Learning Competition 2008 / This thesis studies policy iteration methods with linear approximation of the value function for large state space problems in the reinforcement learning context. We first introduce a unified algorithm that generalizes the main stochastic optimal control methods. We show the convergence of this unified algorithm to the optimal value function in the tabular case, and a performance bound in the approximate case when the value function is estimated. We then extend the literature of second-order linear approximation algorithms by proposing a generalization of Least-Squares Policy Iteration (LSPI) (Lagoudakis and Parr, 2003). Our new algorithm, Least-Squares [lambda] Policy Iteration (LS[lambda]PI), adds to LSPI an idea of [lambda]-Policy Iteration (Bertsekas and Ioffe, 1996): the damped (or optimistic) evaluation of the value function, which allows to reduce the variance of the estimation to improve the sampling efficiency. Thus, LS[lambda]PI offers a bias-variance trade-off that may improve the estimation of the value function and the performance of the policy obtained. In a second part, we study in depth the game of Tetris, a benchmark application that several works from the literature attempt to solve. Tetris is a difficult problem because of its structure and its large state space. We provide the first full review of the literature that includes reinforcement learning works, evolutionary methods that directly explore the policy space and handwritten controllers. We observe that reinforcement learning is less successful on this problem than direct policy search approaches such as the cross-entropy method (Szita et Lorincz, 2006). We finally show how we built a controller that outperforms the previously known best controllers, and shortly discuss how it allowed us to win the Tetris event of the 2008 Reinforcement Learning Competition Contrôle optimal stochastique Apprentissage par renforcement Programmation dynamique Processus Décisionnels de Markov Least-Squares Policy Iteration [lambda]-Policy Iteration Approximation de la fonction de valeur Tetris Méthode d'entropie croisée
92	Itération sur les Politiques Optimiste et Apprentissage du Jeu de Tetris Thiery, Christophe 25 November 2010 (has links) (PDF) Cette thèse s'intéresse aux méthodes d'itération sur les politiques dans l'apprentissage par renforcement à grand espace d'états avec approximation linéaire de la fonction de valeur. Nous proposons d'abord une unification des principaux algorithmes du contrôle optimal stochastique. Nous montrons la convergence de cette version unifiée vers la fonction de valeur optimale dans le cas tabulaire, ainsi qu'une garantie de performances dans le cas où la fonction de valeur est estimée de façon approximative. Nous étendons ensuite l'état de l'art des algorithmes d'approximation linéaire du second ordre en proposant une généralisation de Least-Squares Policy Iteration (LSPI) (Lagoudakis et Parr, 2003). Notre nouvel algorithme, Least-Squares λ Policy Iteration (LSλPI), ajoute à LSPI un concept venant de λ-Policy Iteration (Bertsekas et Ioffe, 1996) : l'évaluation amortie (ou optimiste) de la fonction de valeur, qui permet de réduire la variance de l'estimation afin d'améliorer l'efficacité de l'échantillonnage. LSλPI propose ainsi un compromis biais-variance réglable qui peut permettre d'améliorer l'estimation de la fonction de valeur et la qualité de la politique obtenue. Dans un second temps, nous nous intéressons en détail au jeu de Tetris, une application sur laquelle se sont penchés plusieurs travaux de la littérature. Tetris est un problème difficile en raison de sa structure et de son grand espace d'états. Nous proposons pour la première fois une revue complète de la littérature qui regroupe des travaux d'apprentissage par renforcement, mais aussi des techniques de type évolutionnaire qui explorent directement l'espace des politiques et des algorithmes réglés à la main. Nous constatons que les approches d'apprentissage par renforcement sont à l'heure actuelle moins performantes sur ce problème que des techniques de recherche directe de la politique telles que la méthode d'entropie croisée (Szita et Lőrincz, 2006). Nous expliquons enfin comment nous avons mis au point un joueur de Tetris qui dépasse les performances des meilleurs algorithmes connus jusqu'ici et avec lequel nous avons remporté l'épreuve de Tetris de la Reinforcement Learning Competition 2008. contrôle optimal stochastique apprentissage par renforcement programmation dynamique Processus Décisionnels de Markov Least-Squares Policy Iteration λ-Policy Iteration approximation de la fonction de valeur compromis biais-variance fonctions de base Tetris méthode d'entropie croisée
93	Computation of Acoustic Wave Propagation Under Water / Beräkning av akustisk vågutbredning under vatten Thörn, Frida January 2022 (has links) In this thesis we look at acoustic wave propagation under water. We look in particular at waves generated by a point source and what happens with the propagation when we model the bottom as flat or as curvilinear. We assume the source to be working at a certain frequency and therefore we model this problem by solving the Helmholtz equation. Since Helmholtz equation has some unwanted numerical properties we are interested in finding new numerical methods that could accelerate the solver. In this thesis we use the Waveholtz iteration, which solves Helmholtz equation by connecting it to the time-dependent wave equation. We use finite differences and the SBP-SAT method to approximate the spatial problem numerically and for modelling the sea bottom we use curvilinear coordinates. To compare the Waveholtz iteration we also solve Helmholtz equation with a naive solver. The naive solver consists of approximating the equation with finite differences and then solving the linear system of equation by some iterative solver, which for our tests will be GMRES. The results show that the Waveholtz iteration converges in less iterations than our naive solver. It also shows that the number of iterations stays unchanged when changing our discretization, which otherwise is a big problem for our naive solver. This allows us to increase the accuracy of our numerical solution without changing the computation time too much. We show that the number of iterations increases according to theory for an increasing frequency, and that for open problems we even see a smaller increase. For certain resonant frequencies in Helmholtz equation we do not expect the Waveholtz iteration to converge. In the neighbourhood of these frequencies the convergence becomes slow and we need many iterations for a solution of a certain accuracy. By reformulating the Waveholtz iteration as a Krylov solution we can see that resonances in Helmholtz equation have a smaller impact of the convergence. / I detta examensarbete undersöker vi akustisk vågutbredning i vatten. Vi kollar specifikt på vågor som genereras av en punktkälla och vad som sker när vi modellerar botten som plan eller som kurvlinjär. Då vi antar att punktkällan arbetar vid en bestämd frekvens, kommer vi modellera det fysikaliska problemet genom att lösa Helmholtz ekvation. Helmholtz ekvation har dock några numeriska egenskaper som är oönskade, och därför finns ett intresse av att hitta nya numeriska metoder som löser ekvationen. I detta examensarbete undersöker vi Waveholtz iteration, som löser Helmholtz ekvation genom att koppla den till den tidsberoende vågekvationen. Vi använder finita differenser och SBP-SAT metoden för att approximera det rumsliga problemet numeriskt. För att ge en detaljerad beskrivning av botten använder vi kurvlinjära koordinater. För att jämföra Waveholtz iterationen med något löser vi även Helmholtz med hjälp av en naiv lösare. Den naiva lösaren består av att approximera problemet med finita differenser och sedan lösa det linjära systemet rakt av med en iterativ lösare (vilket för våra fall kommer vara GMRES). Resultatet visar att Waveholtz iteration konvergerar på ett lägre antal iterationer än vår naiva lösare. Det visar även att antalet iterationer inte förändras när vi ändrar diskretisering, vilket annars är ett problem för vår naiva lösare. Detta innebär att vi kan få en högre noggrannhet utan att förlänga beräkningstiden alltför mycket. Vi visar även att antalet iterationer ökar som förväntat med en ökad frekvens, samt att för öppna problem så ökar antalet iteration mindre än enligt teorin. Vid vissa resonanta frekvenser i Helmholtz ekvation förväntar vi oss att Waveholtz iteration inte kommer konvergerar. I närheten av dessa frekvenser blir konvergensen långsam och vi behöver många iterationer för att lösa problemet. Genom att formulera Waveholtz iteration som en Krylov lösning kommer resonanser i Helmholtz ekvation ge en mindre negativ effekt på konvergensen än om den är formulerad som en fixpunkts iteration. Wave equation Helmholtz equation Waveholtz iteration finite difference method summation by parts energy method point source curvilinear coordinates vågekvationen helmholtz ekvation waveholtz iteration finita differens methoden energimetoden punktkälla kurvlinjära koordinater kroklinjiga koordinater SBP Computational Mathematics Beräkningsmatematik
94	Modifying Some Iterative Methods for Solving Quadratic Eigenvalue Problems Ali, Ali Hasan January 2017 (has links) No description available. Mathematics Applied Mathematics Quadratic Eigenvalue problem Matrix Polynomial Problem Nonlinear Eigenvalue Problem Newton Iteration Generalized Eigenvalue Problem Newton Maehly Method Newton Maehly Iteration Newton Correction QEP NLEP NLEVP MPP GEP
95	Traction Adaptive Motion Planning for Autonomous Racing / Tractionadaptiv rörelseplanering för autonom racing Raikar, Shekhar January 2022 (has links) Autonomous driving technology is continuously evolving at an accelerated pace. The road environment is always uncertain, which requires an evasive manoeuvre that an autonomous vehicle can take. This evasive behaviour to avoid accidents in a critical situation is analogous to autonomous racing that operates at the limits of stable vehicle handling. In autonomous racing, the vehicle must operate in highly nonlinear operating conditions such as high-speed manoeuvre on sharp turns, avoiding obstacles and slippery road conditions. These dynamically changing racing situations require advanced path planning systems with obstacle avoidance executed in real-time. Therefore, the motion planning problem for autonomous racing is analogous to safe and reliable autonomous vehicle operation in critical situations. This thesis project evaluates the application of traction adaptive motion planning to autonomous racing on different road surfaces for a small-scale test vehicle in real-time. The evaluation is based on a state-of-the-art algorithm that uses a combination of optimization, trajectory rollout, and constraint adaption framework called "Sampling Augmented Real-Time Iteration (SAARTI)". SAARTI allows motion planning and control with respect to time-varying vehicle actuation capabilities while taking locally adaptive traction into account for different parts of the track as a constraint. Initially, the SAARTI framework is adapted to work with the SmallVehicles-for-Autonomy (SVEA) system; then, the whole system is simulated in a ROS (Robot Operating System) based SVEA simulator with a Hardware-in-the-loop setup. Later, the same setup is used for the real time experiments that are carried out using the SVEA vehicles, and the different critical scenarios are tested on the SVEA vehicle. The emphasis was given to the experimental results; therefore, the results also consider computationally intensive localization inputs while the motion planner was implemented in real-time instead of a simulation setup. The experimental results showed the impact of planning motions according to an approximately correct friction estimate when the friction parameter was close to the actual value. The results indicated that the traction variation had indeed affected the lap time and trajectory taken by the test vehicle. The lap time is affected significantly when the coefficient of friction value is far away from the real friction coefficient. It is observed that the lap time increased significantly at higher values of friction coefficient, when involving more excessive over-estimation of the traction, leading to the oscillatory motion and lane exits. Furthermore, the non-adaptive case scenario result shows that the test vehicle performed better when given friction parameter inputs to the algorithm approximately equal to the real friction value. / Teknik för autonom körning har utvecklats i snabb takt de senaste åren. Trafikmiljön innehåller många källor till osäkerhet, vilket ibland kräver undanmanövrar av det autonoma fordonet. Undanmanövrar i kritiska situationer är analoga med autonom racing i det avseendet att fordonet opererar nära gränsen av dess fysiska förmåga. I autonom racing måste fordonet fungera i hög grad olinjära driftsförhållanden som höghastighetsmanöver i skarpa svängar, undvika hinder och halt väglag. Dessa dynamiska föränderliga racingsituationer kräver avancerad vägplaneringssystem med undvikande av hinder exekveras i realtid. Därför är rörelseplaneringsproblemet för autonom racing är analogt med det för säkra undanmanövrer i kritiska situationer. Detta examensarbete utvärderar tillämpningen av dragkraft adaptiv till autonom racing på olika väglag för ett småskaligt testfordon i realtid. Utvärderingen baseras på en algoritm som kallas "Sampling Augmented Real Time Iteration (SAARTI)" som tillåter rörelse planering och kontroll med avseende på tidsvarierande fordonsdynamik, på så vis tar algoritmen hänsyn till lokalt varierande väglag. Arbetet började med att integrera SAARTI-ramverket med testplattformen Small-Vehicles-for-Autonomy (SVEA). Därefter utfördes hardware-in-the-loop simuleringar i ROS (Robot Operating System), och därefter utfördes fysiska tester med SVEA plattformen. Under experimenten kördes SAARTI-algoritmen parallellt med en beräkningsintensiv SLAM-algoritm för lokalisering. De experimentella resultaten visade att adaptiv rörelseplanering kan avhjälpa problemet med lokalt varierande väglag, givet att den uppskattade friktionsparametern är approximativt korrekt. Varvtiden påverkas negativt när friktionsskattningen avviker från den verkliga friktionskoefficienten. Vidare observerades att varvtiden ökade vid höga värden på den skattade friktionsparametern, vilket gav upphov till mer aggressiva manövrer, vilket i sin tur gav upphov till oscillerande rörelser och avåkningar. Autonomous vehicles Motion planning and control Traction-adaptive Localization Robotics GPU Sampling Augmented Real Time Iteration. Autonoma fordon Rörelseplanering och kontroll Traction adaptiv Lokalisering Robotik Engineering and Technology Teknik och teknologier
96	Efficient algorithms for compressed sensing and matrix completion Wei, Ke January 2014 (has links) Compressed sensing and matrix completion are two new data acquisition techniques whose efficiency is achieved by exploring low dimensional structures in high dimensional data. Despite the combinatorial nature of compressed sensing and matrix completion, there has been significant development of computationally efficient algorithms which can produce accurate desired solutions to these problems. In this thesis, we are concerned with the development of low per iteration computational complexity algorithms for compressed sensing and matrix completion. First, we derive a locally optimal stepsize selection rule for the simplest iterative hard thresholding algorithm for matrix completion, and obtain a simple yet efficient algorithm. It is observed to have average case performance superior in some aspects to other matrix completion algorithms. To balance the fast convergence rates of more sophisticated recovery algorithms with the low per iteration computational cost of simple line-search algorithms, we introduce a family of conjugate gradient iterative hard thresholding algorithms for both compressed sensing and matrix completion. The theoretical results establish recovery guarantees for the restarted and projected variants of the algorithms, while the empirical performance comparisons establish significant computational advantages of the proposed methods over other hard thresholding algorithms. Finally, we introduce an alternating steepest descent method and a scaled variant especially designed for the matrix completion problem based on a simple factorization model of the low rank matrix. The computational efficacy of this method is achieved by reducing the high per iteration computational cost of the second order method and fully exploring the numerical linear algebra structure in the algorithm. Empirical evaluations establish the effectiveness of the proposed algorithms, compared with other state-of-the-art algorithms. 518
97	Formas ponderadas do Teorema de Euler e partições com raiz : estabelecendo um tratamento combinatório para certas identidades de Ramanujan Silva, Eduardo Alves da January 2018 (has links) O artigo Weighted forms of Euler's theorem de William Y.C. Chen e Kathy Q. Ji, em resposta ao questionamento de George E. Andrews, matemático estadunidense, sobre encontrar demonstrações combinatórias de duas identidades no Caderno Perdido de Ramanujan, nos mostra algumas formas ponderadas do Teorema de Euler sobre partições com partes ímpares e partes distintas via a introdução do conceito de partição com raiz. A propositura deste trabalho é envolta à apresentação de resultados sobre partições com raiz de modo a posteriormente realizar formulações combinatórias das identidades de Ramanujan por meio deste conceito, procurando estabelecer conexões com formas ponderadas do Teorema de Euler. Em particular, a bijeção de Sylvester e a iteração de Pak da função de Dyson são elementos primordiais para obtê-las. / The article Weighted forms of Euler's theorem by William Y.C. Chen and Kathy Q. Ji in response to the questioning of George E. Andrews, American mathematician, about nding combinatorial proofs for two identities in Ramanujan's Lost Notebook shows us some weighted forms of Euler's Theorem on partitions with odd parts and distinct parts through the introduction of the concept of rooted partition. The purpose of this work involves the presentation of results on rooted partitions in order to make combinatorial formulations of Ramanujan's identities, seeking to establish connections with weighted forms of Euler's Theorem. In particular, the Sylvester's bijection and the Pak's iteration of the Dyson's map are primordial elements to obtain them. Teorema de Euler Partições Iteração de funções Integer partitions Rooted partitions Pak's iteration of the Dyson's map Sylvester's bijection Ramanujan's identities Euler's theorem
98	Inner-outer iterative methods for eigenvalue problems : convergence and preconditioning Freitag, Melina January 2007 (has links) Many methods for computing eigenvalues of a large sparse matrix involve shift-invert transformations which require the solution of a shifted linear system at each step. This thesis deals with shift-invert iterative techniques for solving eigenvalue problems where the arising linear systems are solved inexactly using a second iterative technique. This approach leads to an inner-outer type algorithm. We provide convergence results for the outer iterative eigenvalue computation as well as techniques for efficient inner solves. In particular eigenvalue computations using inexact inverse iteration, the Jacobi-Davidson method without subspace expansion and the shift-invert Arnoldi method as a subspace method are investigated in detail. A general convergence result for inexact inverse iteration for the non-Hermitian generalised eigenvalue problem is given, using only minimal assumptions. This convergence result is obtained in two different ways; on the one hand, we use an equivalence result between inexact inverse iteration applied to the generalised eigenproblem and modified Newton's method; on the other hand, a splitting method is used which generalises the idea of orthogonal decomposition. Both approaches also include an analysis for the convergence theory of a version of inexact Jacobi-Davidson method, where equivalences between Newton's method, inverse iteration and the Jacobi-Davidson method are exploited. To improve the efficiency of the inner iterative solves we introduce a new tuning strategy which can be applied to any standard preconditioner. We give a detailed analysis on this new preconditioning idea and show how the number of iterations for the inner iterative method and hence the total number of iterations can be reduced significantly by the application of this tuning strategy. The analysis of the tuned preconditioner is carried out for both Hermitian and non-Hermitian eigenproblems. We show how the preconditioner can be implemented efficiently and illustrate its performance using various numerical examples. An equivalence result between the preconditioned simplified Jacobi-Davidson method and inexact inverse iteration with the tuned preconditioner is given. Finally, we discuss the shift-invert Arnoldi method both in the standard and restarted fashion. First, existing relaxation strategies for the outer iterative solves are extended to implicitly restarted Arnoldi's method. Second, we apply the idea of tuning the preconditioner to the inner iterative solve. As for inexact inverse iteration the tuned preconditioner for inexact Arnoldi's method is shown to provide significant savings in the number of inner solves. The theory in this thesis is supported by many numerical examples. 519
99	In-Between Brands : Exploring the Essence of Brand Portfolio Management Filipsson, Daniel January 2008 (has links) <p>During the past two decades research has shown that brands are among a company’s most valuable assets. However, in today’s competitive landscape, it is not enough to just create strong brands. The focus lies rather in managing a range of brand lever-age strategies within complex brand portfolios. Moreover, the majority of today’s established brand concepts do not represent the reality of contemporary brand man-agement. Instead, they tend to be based on dichotomies and simplifications. In addi-tion, there is a lack of criticism towards many of the established brand concepts resulting in the reduction of brand management to a number of static categories and stagnated definitions – thereby missing out on the analysis of important intersec-tional issues between the various categories. This book explores the somewhat for-gotten area of intersection, investigating the territory in-between brands.</p><p>The methods used consist of a literature review covering some of the most influ-ential brand models within the area of brand portfolio and brand leverage as well as an empirical case study including the following seven brands: Adidas, Bang & Oluf-sen, Electrolux, H&M, Microsoft, Peak Performance and W. L. Gore & Associates.</p><p>The findings show that conventional brand management models and terminology do not fully explain common marketplace strategies and practice. As a result, this research introduces a more realistic viewpoint and dynamic framework that is based on convergence and that allows migration and iteration rather than today’s static approach. The framework, named the brand leverage palette, introduces various nuances between different leverage strategies, both adding clarity and offering guid-ance by explaining different migration movements among today’s brand portfolios.</p> brand portfolio management brand leverage brand extension co-branding ingredient branding in-between brands migration and iteration. Business studies Företagsekonomi
100	In-Between Brands : Exploring the Essence of Brand Portfolio Management Filipsson, Daniel January 2008 (has links) During the past two decades research has shown that brands are among a company’s most valuable assets. However, in today’s competitive landscape, it is not enough to just create strong brands. The focus lies rather in managing a range of brand lever-age strategies within complex brand portfolios. Moreover, the majority of today’s established brand concepts do not represent the reality of contemporary brand man-agement. Instead, they tend to be based on dichotomies and simplifications. In addi-tion, there is a lack of criticism towards many of the established brand concepts resulting in the reduction of brand management to a number of static categories and stagnated definitions – thereby missing out on the analysis of important intersec-tional issues between the various categories. This book explores the somewhat for-gotten area of intersection, investigating the territory in-between brands. The methods used consist of a literature review covering some of the most influ-ential brand models within the area of brand portfolio and brand leverage as well as an empirical case study including the following seven brands: Adidas, Bang & Oluf-sen, Electrolux, H&M, Microsoft, Peak Performance and W. L. Gore & Associates. The findings show that conventional brand management models and terminology do not fully explain common marketplace strategies and practice. As a result, this research introduces a more realistic viewpoint and dynamic framework that is based on convergence and that allows migration and iteration rather than today’s static approach. The framework, named the brand leverage palette, introduces various nuances between different leverage strategies, both adding clarity and offering guid-ance by explaining different migration movements among today’s brand portfolios. brand portfolio management brand leverage brand extension co-branding ingredient branding in-between brands migration and iteration. Business studies Företagsekonomi

Search results