1 |
On impact of mixing times in continual reinforcement learningRaparthy, Sharath Chandra 02 1900 (has links)
Le temps de mélange de la chaîne de Markov induite par une politique limite ses performances dans les scénarios réels d'apprentissage continu. Pourtant, l'effet des temps de mélange sur l'apprentissage dans l'apprentissage par renforcement (RL) continu reste peu exploré. Dans cet article, nous caractérisons des problèmes qui sont d'un intérêt à long terme pour le développement de l'apprentissage continu, que nous appelons processus de décision markoviens (MDP) « extensibles » (scalable), à travers le prisme des temps de mélange. En particulier, nous établissons théoriquement que les MDP extensibles ont des temps de mélange qui varient de façon polynomiale avec la taille du problème. Nous démontrons ensuite que les temps de mélange polynomiaux présentent des difficultés importantes pour les approches existantes, qui souffrent d'un biais myope et d'estimations à base de ré-échantillonnage avec remise ensembliste (bootstrapping) périmées. Pour valider notre théorie, nous étudions la complexité des temps de mélange en fonction du nombre de tâches et de la durée des tâches pour des politiques très performantes déployées sur plusieurs jeux Atari. Notre analyse démontre à la fois que des temps de mélange polynomiaux apparaissent en pratique et que leur existence peut conduire à un comportement d'apprentissage instable, comme l'oubli catastrophique dans des contextes d'apprentissage continu. / The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios. Yet, the effect of mixing times on learning in continual reinforcement learning (RL) remains underexplored. In this paper, we characterize problems that are of long-term interest to the development of continual RL, which we call scalable MDPs, through the lens of mixing times. In particular, we theoretically establish that scalable MDPs have mixing times that scale polynomially with the size of the problem. We go on to demonstrate that polynomial mixing times present significant difficulties for existing approaches, which suffer from myopic bias and stale bootstrapped estimates. To validate our theory, we study the empirical scaling behavior of mixing times with respect to the number of tasks and task duration for high performing policies deployed across multiple Atari games. Our analysis demonstrates both that polynomial mixing times do emerge in practice and how their existence may lead to unstable learning behavior like catastrophic forgetting in continual learning settings.
|
2 |
Markov chains at the interface of combinatorics, computing, and statistical physicsStreib, Amanda Pascoe 22 March 2012 (has links)
The fields of statistical physics, discrete probability, combinatorics, and theoretical computer science have converged around efforts to understand random structures and algorithms. Recent activity in the interface of these fields has enabled tremendous breakthroughs in each domain and has supplied a new set of techniques for researchers approaching related problems. This thesis makes progress on several problems in this interface whose solutions all build on insights from multiple disciplinary perspectives.
First, we consider a dynamic growth process arising in the context of DNA-based self-assembly. The assembly process can be modeled as a simple Markov chain. We prove that the chain is rapidly mixing for large enough bias in regions of Z^d. The proof uses a geometric distance function and a variant of path coupling in order to handle distances that can be exponentially large. We also provide the first results in the case of fluctuating bias, where the bias can vary depending on the location of the tile, which arises in the nanotechnology application. Moreover, we use intuition from statistical physics to construct a choice of the biases for which the Markov chain M_mon requires exponential time to converge.
Second, we consider a related problem regarding the convergence rate of biased permutations that arises in the context of self-organizing lists. The Markov chain M_nn in this case is a nearest-neighbor chain that allows adjacent transpositions, and the rate of these exchanges is governed by various input parameters. It was conjectured that the chain is
always rapidly mixing when the inversion probabilities are positively biased, i.e., we put nearest neighbor pair x<y
in order with bias 1/2 <= p_{xy} <= 1 and out of order with bias
1-p_{xy}. The Markov chain M_mon was known to have connections to a simplified version of this biased card-shuffling. We provide new connections between M_nn and M_mon by using simple combinatorial bijections, and we prove that M_nn is always rapidly mixing for two general classes of positively biased {p_{xy}}. More significantly, we also prove that the general conjecture is false by exhibiting values for the p_{xy}, with
1/2 <= p_{xy} <= 1 for all x< y, but for which the transposition chain will require exponential time to converge.
Finally, we consider a model of colloids, which are binary mixtures of molecules with one type of molecule suspended in another. It is believed that at low density typical configurations will be well-mixed throughout, while at high density they will separate into clusters. This clustering has proved elusive to verify, since all local sampling algorithms are known to be inefficient at high density, and in fact a new nonlocal algorithm was recently shown to require exponential time in some cases.
We characterize the high and low density phases for a general family of discrete {it interfering binary mixtures} by showing that they exhibit a "clustering property' at high density and not at low density. The clustering property states that
there will be a region that has very high area, very small perimeter, and high density of one type of molecule. Special cases of interfering binary mixtures include the Ising model at fixed magnetization and independent sets.
|
3 |
[en] MIXING TIMES FOR RANDOM WALKS ON THE SYMMETRIC GROUP / [pt] TEMPOS DE MISTURA PARA PASSEIOS ALEATÓRIOS NO GRUPO SIMÉTRICORODRIGO MARINHO DE SOUZA 28 February 2018 (has links)
[pt] O objetivo desta dissertação é apresentar algumas técnicas e ferramentas para a obtenção de cotas superiores e inferiores para tempos de mistura de cadeias de Markov. Para que isso se torne mais interessante, apresentaremos estes conceitos através de cadeias de Markov que atuam sobre o grupo
simétrico, que podem ser vistas como embaralhamentos de cartas. Ademais, usaremos um destes embaralhamentos como toy model para o processo de exclusão simples simétrico, o que nos ajudará a determinar os tempos de mistura do embaralhamento e do famoso sistema de partículas. / [en] The aim of this dissertation is to introduce some techniques and tools to obtain upper and lower bounds for Markov chains mixing times. To make it more interesting, we introduce these concepts through Markov chains that act on the symmetric group, which can be seen as card shuffles. Furthermore, we use one of these shuffles as a toy model for the symmetric simple exclusion process, which helps us to determine mixing times for the shuffle and for the famous particle system.
|
4 |
Concentration et compression sur alphabets infinis, temps de mélange de marches aléatoires sur des graphes aléatoires / Concentration and compression over infinite alphabets, mixing times of random walks on random graphsBen-Hamou, Anna 15 September 2016 (has links)
Ce document rassemble les travaux effectués durant mes années de thèse. Je commence par une présentation concise des résultats principaux, puis viennent trois parties relativement indépendantes.Dans la première partie, je considère des problèmes d'inférence statistique sur un échantillon i.i.d. issu d'une loi inconnue à support dénombrable. Le premier chapitre est consacré aux propriétés de concentration du profil de l'échantillon et de la masse manquante. Il s'agit d'un travail commun avec Stéphane Boucheron et Mesrob Ohannessian. Après avoir obtenu des bornes sur les variances, nous établissons des inégalités de concentration de type Bernstein, et exhibons un vaste domaine de lois pour lesquelles le facteur de variance dans ces inégalités est tendu. Le deuxième chapitre présente un travail en cours avec Stéphane Boucheron et Elisabeth Gassiat, concernant le problème de la compression universelle adaptative d'un tel échantillon. Nous établissons des bornes sur la redondance minimax des classes enveloppes, et construisons un code quasi-adaptatif sur la collection des classes définies par une enveloppe à variation régulière. Dans la deuxième partie, je m'intéresse à des marches aléatoires sur des graphes aléatoires à degrés precrits. Je présente d'abord un résultat obtenu avec Justin Salez, établissant le phénomène de cutoff pour la marche sans rebroussement. Sous certaines hypothèses sur les degrés, nous déterminons précisément le temps de mélange, la fenêtre du cutoff, et montrons que le profil de la distance à l'équilibre converge vers la fonction de queue gaussienne. Puis je m'intéresse à la comparaison des temps de mélange de la marche simple et de la marche sans rebroussement. Enfin, la troisième partie est consacrée aux propriétés de concentration de tirages pondérés sans remise et correspond à un travail commun avec Yuval Peres et Justin Salez. / This document presents the problems I have been interested in during my PhD thesis. I begin with a concise presentation of the main results, followed by three relatively independent parts. In the first part, I consider statistical inference problems on an i.i.d. sample from an unknown distribution over a countable alphabet. The first chapter is devoted to the concentration properties of the sample's profile and of the missing mass. This is a joint work with Stéphane Boucheron and Mesrob Ohannessian. After obtaining bounds on variances, we establish Bernstein-type concentration inequalities and exhibit a vast domain of sampling distributions for which the variance factor in these inequalities is tight. The second chapter presents a work in progress with Stéphane Boucheron and Elisabeth Gassiat, on the problem of universal adaptive compression over countable alphabets. We give bounds on the minimax redundancy of envelope classes, and construct a quasi-adaptive code on the collection of classes defined by a regularly varying envelope. In the second part, I consider random walks on random graphs with prescribed degrees. I first present a result obtained with Justin Salez, establishing the cutoff phenomenon for non-backtracking random walks. Under certain degree assumptions, we precisely determine the mixing time, the cutoff window, and show that the profile of the distance to equilibrium converges to the Gaussian tail function. Then I consider the problem of comparing the mixing times of the simple and non-backtracking random walks. The third part is devoted to the concentration properties of weighted sampling without replacement and corresponds to a joint work with Yuval Peres and Justin Salez.
|
5 |
Management of technology in the process industries: Matching market and machineSamuelsson, Peter January 2017 (has links)
The process industries span multiple industrial sectors and constitute a substantial part of the entire manufacturing industry. Since companies belonging to this family of industries are often very asset intensive, their ability to respond to changes is often limited in the short term. The adaptation of the capabilities of existing processes, and conversely finding products and market segments to match the production system capabilities, are an important part of product- and market development activities in the process industry. The importance to companies in the process industry of having a well-articulated manufacturing strategy congruent with the business strategy is second to none. However, to facilitate manufacturing strategy developments, it is essential to start with an improved characterization and understanding of the material transformation system. To that end an extensive set of variables was developed and related measures and scales were defined. The resulting configuration model, focusing on company generic process capabilities in the process industries, is to be regarded as a conceptual taxonomy and as a proposition available for further testing. The usability of the model was subsequently assessed using “mini-cases” in the forestry industry, where the respondents confirmed that the company’s overall strategy could benefit from this kind of platform as a possible avenue to follow. The model was deployed as an instrument in the profiling of company material transformation systems to facilitate the further development of companies' functional and business strategies. The use of company-generic production capabilities was studied in three case companies representing the mineral, food and steel industries. The model was found by the respondents to be usable as a knowledge platform to develop production strategies. In the final analysis of the research results, a new concept emerged called “production capability configuration": A process-industrial company’s alignment of its generic production capabilities in the areas of raw materials, process technology and products to improve the consistency among the variable elements that define operations and improve the congruence between operations and its environment. From the perspective of value creation and capture, firms must be able to manufacture products in a competitive cost structure within the framework of a proper business model. By using the configuration model, the relationship between manufacturing and innovation activities has been studied in the previously mentioned three case studies. In many cases the gap in capability appears as a limitation in the production system, requiring development efforts and sometimes investments to overcome. This is illustrated with two examples from the steel industry, where development efforts of the production system capabilities are initiated to better match the market demands. One example is the increase the volume- and product flexibility of an existing stainless steel melt shop, resulting in a proposed oblong Argon Oxygen Decarburisation (AOD) converter configuration that was subsequently verified using water modelling. The second example is from a carbon steel mill, where the target was to increase the raw material- and volume flexibility of another melt shop, by modifying the capabilities of the Electric Arc Furnace (EAF). Enabling EAF technologies are further described and evaluated using operational data and engineering type of estimates. / <p>QC 20170116</p>
|
Page generated in 0.0874 seconds