Global ETD Search

1	On impact of mixing times in continual reinforcement learning Raparthy, Sharath Chandra 02 1900 (has links) Le temps de mélange de la chaîne de Markov induite par une politique limite ses performances dans les scénarios réels d'apprentissage continu. Pourtant, l'effet des temps de mélange sur l'apprentissage dans l'apprentissage par renforcement (RL) continu reste peu exploré. Dans cet article, nous caractérisons des problèmes qui sont d'un intérêt à long terme pour le développement de l'apprentissage continu, que nous appelons processus de décision markoviens (MDP) « extensibles » (scalable), à travers le prisme des temps de mélange. En particulier, nous établissons théoriquement que les MDP extensibles ont des temps de mélange qui varient de façon polynomiale avec la taille du problème. Nous démontrons ensuite que les temps de mélange polynomiaux présentent des difficultés importantes pour les approches existantes, qui souffrent d'un biais myope et d'estimations à base de ré-échantillonnage avec remise ensembliste (bootstrapping) périmées. Pour valider notre théorie, nous étudions la complexité des temps de mélange en fonction du nombre de tâches et de la durée des tâches pour des politiques très performantes déployées sur plusieurs jeux Atari. Notre analyse démontre à la fois que des temps de mélange polynomiaux apparaissent en pratique et que leur existence peut conduire à un comportement d'apprentissage instable, comme l'oubli catastrophique dans des contextes d'apprentissage continu. / The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios. Yet, the effect of mixing times on learning in continual reinforcement learning (RL) remains underexplored. In this paper, we characterize problems that are of long-term interest to the development of continual RL, which we call scalable MDPs, through the lens of mixing times. In particular, we theoretically establish that scalable MDPs have mixing times that scale polynomially with the size of the problem. We go on to demonstrate that polynomial mixing times present significant difficulties for existing approaches, which suffer from myopic bias and stale bootstrapped estimates. To validate our theory, we study the empirical scaling behavior of mixing times with respect to the number of tasks and task duration for high performing policies deployed across multiple Atari games. Our analysis demonstrates both that polynomial mixing times do emerge in practice and how their existence may lead to unstable learning behavior like catastrophic forgetting in continual learning settings. Reinforcement Learning Continual Learning Mixing Times Apprentissage par Renforcement Apprentissage Continuel Temps de Mélange
2	[en] MIXING TIMES FOR RANDOM WALKS ON THE SYMMETRIC GROUP / [pt] TEMPOS DE MISTURA PARA PASSEIOS ALEATÓRIOS NO GRUPO SIMÉTRICO RODRIGO MARINHO DE SOUZA 28 February 2018 (has links) [pt] O objetivo desta dissertação é apresentar algumas técnicas e ferramentas para a obtenção de cotas superiores e inferiores para tempos de mistura de cadeias de Markov. Para que isso se torne mais interessante, apresentaremos estes conceitos através de cadeias de Markov que atuam sobre o grupo simétrico, que podem ser vistas como embaralhamentos de cartas. Ademais, usaremos um destes embaralhamentos como toy model para o processo de exclusão simples simétrico, o que nos ajudará a determinar os tempos de mistura do embaralhamento e do famoso sistema de partículas. / [en] The aim of this dissertation is to introduce some techniques and tools to obtain upper and lower bounds for Markov chains mixing times. To make it more interesting, we introduce these concepts through Markov chains that act on the symmetric group, which can be seen as card shuffles. Furthermore, we use one of these shuffles as a toy model for the symmetric simple exclusion process, which helps us to determine mixing times for the shuffle and for the famous particle system. [pt] CADEIAS DE MARKOV [en] MARKOV CHAINS [pt] EMBARALHAMENTOS [en] SHUFFLES [pt] TEMPOS DE MISTURA [en] MIXING TIMES [pt] CUTOFFS [en] CUTOFFS [pt] PROCESSO DE EXCLUSAO SIMPLES [en] SIMPLE EXCLUSION PROCESS
3	Concentration et compression sur alphabets infinis, temps de mélange de marches aléatoires sur des graphes aléatoires / Concentration and compression over infinite alphabets, mixing times of random walks on random graphs Ben-Hamou, Anna 15 September 2016 (has links) Ce document rassemble les travaux effectués durant mes années de thèse. Je commence par une présentation concise des résultats principaux, puis viennent trois parties relativement indépendantes.Dans la première partie, je considère des problèmes d'inférence statistique sur un échantillon i.i.d. issu d'une loi inconnue à support dénombrable. Le premier chapitre est consacré aux propriétés de concentration du profil de l'échantillon et de la masse manquante. Il s'agit d'un travail commun avec Stéphane Boucheron et Mesrob Ohannessian. Après avoir obtenu des bornes sur les variances, nous établissons des inégalités de concentration de type Bernstein, et exhibons un vaste domaine de lois pour lesquelles le facteur de variance dans ces inégalités est tendu. Le deuxième chapitre présente un travail en cours avec Stéphane Boucheron et Elisabeth Gassiat, concernant le problème de la compression universelle adaptative d'un tel échantillon. Nous établissons des bornes sur la redondance minimax des classes enveloppes, et construisons un code quasi-adaptatif sur la collection des classes définies par une enveloppe à variation régulière. Dans la deuxième partie, je m'intéresse à des marches aléatoires sur des graphes aléatoires à degrés precrits. Je présente d'abord un résultat obtenu avec Justin Salez, établissant le phénomène de cutoff pour la marche sans rebroussement. Sous certaines hypothèses sur les degrés, nous déterminons précisément le temps de mélange, la fenêtre du cutoff, et montrons que le profil de la distance à l'équilibre converge vers la fonction de queue gaussienne. Puis je m'intéresse à la comparaison des temps de mélange de la marche simple et de la marche sans rebroussement. Enfin, la troisième partie est consacrée aux propriétés de concentration de tirages pondérés sans remise et correspond à un travail commun avec Yuval Peres et Justin Salez. / This document presents the problems I have been interested in during my PhD thesis. I begin with a concise presentation of the main results, followed by three relatively independent parts. In the first part, I consider statistical inference problems on an i.i.d. sample from an unknown distribution over a countable alphabet. The first chapter is devoted to the concentration properties of the sample's profile and of the missing mass. This is a joint work with Stéphane Boucheron and Mesrob Ohannessian. After obtaining bounds on variances, we establish Bernstein-type concentration inequalities and exhibit a vast domain of sampling distributions for which the variance factor in these inequalities is tight. The second chapter presents a work in progress with Stéphane Boucheron and Elisabeth Gassiat, on the problem of universal adaptive compression over countable alphabets. We give bounds on the minimax redundancy of envelope classes, and construct a quasi-adaptive code on the collection of classes defined by a regularly varying envelope. In the second part, I consider random walks on random graphs with prescribed degrees. I first present a result obtained with Justin Salez, establishing the cutoff phenomenon for non-backtracking random walks. Under certain degree assumptions, we precisely determine the mixing time, the cutoff window, and show that the profile of the distance to equilibrium converges to the Gaussian tail function. Then I consider the problem of comparing the mixing times of the simple and non-backtracking random walks. The third part is devoted to the concentration properties of weighted sampling without replacement and corresponds to a joint work with Yuval Peres and Justin Salez. Inégalités de concentration Masse manquante Codage adaptatif Temps de mélanges Phénomène de cutoff Tirages sans remplacement Concentration inequalities Missing mass Adaptive coding Mixing times Cutoff phenomenon Sampling without replacement
4	Management of technology in the process industries: Matching market and machine Samuelsson, Peter January 2017 (has links) The process industries span multiple industrial sectors and constitute a substantial part of the entire manufacturing industry. Since companies belonging to this family of industries are often very asset intensive, their ability to respond to changes is often limited in the short term. The adaptation of the capabilities of existing processes, and conversely finding products and market segments to match the production system capabilities, are an important part of product- and market development activities in the process industry. The importance to companies in the process industry of having a well-articulated manufacturing strategy congruent with the business strategy is second to none. However, to facilitate manufacturing strategy developments, it is essential to start with an improved characterization and understanding of the material transformation system. To that end an extensive set of variables was developed and related measures and scales were defined. The resulting configuration model, focusing on company generic process capabilities in the process industries, is to be regarded as a conceptual taxonomy and as a proposition available for further testing. The usability of the model was subsequently assessed using “mini-cases” in the forestry industry, where the respondents confirmed that the company’s overall strategy could benefit from this kind of platform as a possible avenue to follow. The model was deployed as an instrument in the profiling of company material transformation systems to facilitate the further development of companies' functional and business strategies. The use of company-generic production capabilities was studied in three case companies representing the mineral, food and steel industries. The model was found by the respondents to be usable as a knowledge platform to develop production strategies. In the final analysis of the research results, a new concept emerged called “production capability configuration": A process-industrial company’s alignment of its generic production capabilities in the areas of raw materials, process technology and products to improve the consistency among the variable elements that define operations and improve the congruence between operations and its environment. From the perspective of value creation and capture, firms must be able to manufacture products in a competitive cost structure within the framework of a proper business model. By using the configuration model, the relationship between manufacturing and innovation activities has been studied in the previously mentioned three case studies. In many cases the gap in capability appears as a limitation in the production system, requiring development efforts and sometimes investments to overcome. This is illustrated with two examples from the steel industry, where development efforts of the production system capabilities are initiated to better match the market demands. One example is the increase the volume- and product flexibility of an existing stainless steel melt shop, resulting in a proposed oblong Argon Oxygen Decarburisation (AOD) converter configuration that was subsequently verified using water modelling. The second example is from a carbon steel mill, where the target was to increase the raw material- and volume flexibility of another melt shop, by modifying the capabilities of the Electric Arc Furnace (EAF). Enabling EAF technologies are further described and evaluated using operational data and engineering type of estimates. / <p>QC 20170116</p> process industries manufacturing innovation configuration strategizing capability configuration modelling process industries production system physical modelling AOD converter geometry injection mixing times electric arc furnace EAF post-combustion high-impedance network disturbances operational results

1

Page generated in 0.0378 seconds