Global ETD Search

1	Planification optimiste pour systèmes déterministes / Optimistic planning for deterministic dystems Hren, Jean-François 21 June 2012 (has links) Dans le domaine de l'apprentissage par renforcement, la planification dans le cas de systèmes déterministes consiste à effectuer une recherche avant grâce à un modèle génératif du système considéré et ce pour trouver l'action à appliquer dans son état courant. Dans notre cas, cette recherche avant conduira à la construction d'un arbre des possibilités, sa racine correspondant à l'état courant du système. Dans le cas où les ressources computationnelles sont limitées et inconnues, il convient d'utiliser un algorithme cherchant à minimiser son regret. Autrement dit, un algorithme retournant une action à effectuer qui soit la plus proche possible de l'optimale en terme de qualité et en fonction des ressources computationnelles. Nous présentons l'algorithme de planification optimiste dans le cas où l'espace d'action est discret. Nous prouvons une borne inférieure et supérieure sur son regret dans le pire des cas ainsi que dans une classe particulière de problèmes. Nous présentons ensuite deux autres algorithmes inspirés de l'approche optimiste dans le cas où l'espace d'action est continu. / In the field of reinforcement learning, planning in the case of deterministic systems consists of doing a forward search using a generative model of the system so as to find the action to apply in its current state. In our case, the forward search leads us to build a look-ahead tree, its root being the current state of the system. If the computational resources are limited and unknown, we have to use an algorithm which tries to minimize its regret. In other words, an algorithm returning an action to apply which is as close as possible to the optimal one in term of quality and with respect to the computational resources used. We present the optimistic planing algorithm in the case of a discrete action space. We prove a lower and upper bound in the worst case and in a particular class of problems. Also we present two algorithms using the optimistic approach but in the case of a continuous action space. Apprentissage par renforcement Bornes supérieure et inférieure Minimisation du regret 006.31
2	Analyse de stratégies bayésiennes et fréquentistes pour l'allocation séquentielle de ressources / Analysis of bayesian and frequentist strategies for sequential resource allocation Kaufmann, Emilie 01 October 2014 (has links) Dans cette thèse, nous étudions des stratégies d’allocation séquentielle de ressources. Le modèle statistique adopté dans ce cadre est celui du bandit stochastique à plusieurs bras. Dans ce modèle, lorsqu’un agent tire un bras du bandit, il reçoit pour récompense une réalisation d’une distribution de probabilité associée au bras. Nous nous intéressons à deux problèmes de bandit différents : la maximisation de la somme des récompenses et l’identification des meilleurs bras (où l’agent cherche à identifier le ou les bras conduisant à la meilleure récompense moyenne, sans subir de perte lorsqu’il tire un «mauvais» bras). Nous nous attachons à proposer pour ces deux objectifs des stratégies de tirage des bras, aussi appelées algorithmes de bandit, que l’on peut qualifier d’optimales. La maximisation des récompenses est équivalente à la minimisation d’une quantité appelée regret. Grâce à une borne inférieure asymptotique sur le regret d’une stratégie uniformément efficace établie par Lai et Robbins, on peut définir la notion d’algorithme asymptotiquement optimal comme un algorithme dont le regret atteint cette borne inférieure. Dans cette thèse, nous proposons pour deux algorithmes d’inspiration bayésienne, Bayes-UCB et Thompson Sampling, une analyse à temps fini dans le cadre des modèles de bandit à récompenses binaires, c’est-à-dire une majoration non asymptotique de leur regret. Cette majoration permetd’établir l’optimalité asymptotique des deux algorithmes. Dans le cadre de l’identification des meilleurs bras, on peut chercher à déterminer le nombre total d’échantillons des bras nécessaires pour identifier, avec forte probabilité, le ou les meilleurs bras, sans la contrainte de maximiser la somme des observations. Nous définissons deux termes de complexité pour l’identification des meilleurs bras dans deux cadres considérés dans la littérature, qui correspondent à un budget fixé ou à un niveau de confiance fixé. Nous proposons de nouvelles bornes inférieures sur ces complexités, et nous analysons de nouveaux algorithmes, dont certains atteignent les bornes inférieures dans des cas particuliers de modèles de bandit à deux bras, et peuvent donc être qualifiés d’optimaux. / In this thesis, we study strategies for sequential resource allocation, under the so-called stochastic multi-armed bandit model. In this model, when an agent draws an arm, he receives as a reward a realization from a probability distribution associated to the arm. In this document, we consider two different bandit problems. In the reward maximization objective, the agent aims at maximizing the sum of rewards obtained during his interaction with the bandit, whereas in the best arm identification objective, his goal is to find the set of m best arms (i.e. arms with highest mean reward), without suffering a loss when drawing ‘bad’ arms. For these two objectives, we propose strategies, also called bandit algorithms, that are optimal (or close to optimal), in a sense precised below. Maximizing the sum of rewards is equivalent to minimizing a quantity called regret. Thanks to an asymptotic lower bound on the regret of any uniformly efficient algorithm given by Lai and Robbins, one can define asymptotically optimal algorithms as algorithms whose regret reaches this lower bound. In this thesis, we propose, for two Bayesian algorithms, Bayes-UCB and Thompson Sampling, a finite-time analysis, that is a non-asymptotic upper bound on their regret, in the particular case of bandits with binary rewards. This upper bound allows to establish the asymptotic optimality of both algorithms. In the best arm identification framework, a possible goal is to determine the number of samples of the armsneeded to identify, with high probability, the set of m best arms. We define a notion of complexity for best arm identification in two different settings considered in the literature: the fixed-budget and fixed-confidence settings. We provide new lower bounds on these complexity terms and we analyse new algorithms, some of which reach the lower bound in particular cases of two-armed bandit models and are therefore optimal Statistiques Bayesien Modèle de bandit Apprentissage Minimisation du regret Identification des meilleurs bras Statistics Bayesian Bandit models Learning Regret minimization Best arm identification
3	Prédiction de structure tridimensionnelle de molécules d’ARN par minimisation de regret / Prediction of three-dimensional structure of RNA molecules by regret minimization Boudard, Mélanie 29 April 2016 (has links) Les fonctions d'une molécule d'ARN dans les processus cellulaires sont très étroitement liées à sa structure tridimensionnelle. Il est donc essentiel de pouvoir prédire cette structure pour étudier sa fonction. Le repliement de l'ARN peut être vu comme un processus en deux étapes : le repliement en structure secondaire, grâce à des interactions fortes, puis le repliement en structure tridimensionnelle par des interactions tertiaires. Prédire la structure secondaire a donné lieu à de nombreuses avancées depuis plus de trente ans. Toutefois, la prédiction de la structure tridimensionnelle est un problème bien plus difficile. Nous nous intéressons ici au problème de prédiction de la structure 3D d'ARN sous la forme d'un jeu. Nous représentons la structure secondaire de l'ARN comme un graphe : cela correspond à une modélisation à gros grain de cette structure. Cette modélisation permet de réaliser un jeu de repliement dans l'espace. Notre hypothèse consiste à voir la structure 3D comme un équilibre en théorie des jeux. Pour atteindre cet équilibre, nous utiliserons des algorithmes de minimisation de regret. Nous étudierons aussi différentes formalisations du jeu, basées sur des statistiques biologiques. L'objectif de ce travail est de développer une méthode de repliement d'ARN fonctionnant sur tous les types de molécule d'ARN et obtenant des structures similaires aux molécules réelles. Notre méthode, nommée GARN, a atteint les objectifs attendus et nous a permis d'approfondir l'impact de certains paramètres pour la prédiction de structure à gros grain des molécules. / The functions of RNA molecules in cellular processes are related very closely to its three dimensional structure. It is thus essential to predict the structure for understanding RNA functions. This folding can be seen as a two-step process: the formation of a secondary structure and the formation of three-dimensional structure. This first step is the results of strong interactions between nucleotides, and the second one is obtain by the tertiary interactions. Predicting the secondary structure is well-known and results in numerous advances since thirty years. However, predicting the three-dimensional structure is a more difficult problem due to the high number of possibility. To overcome this problem, we decided to see the folding of the RNA structure as a game. The secondary structure of the RNA is represented as a graph: its corresponds to a coarse-grained modeling of this structure. This modeling allows us to fold the RNA molecule in a discrete space. Our hypothesis is to understand the 3D structure like an equilibrium in game theory. To find this equilibrium, we will use regret minimization algorithms. We also study different formalizations of the game, based on biological statistics. The objective of this work is to develop a method of RNA folding which will work on all types of secondary structures and results more accurate than current approaches. Our method, called GARN, reached the expected objectives and allowed us to deepen the interesting factors for coarse-grained structure prediction on molecules. Arn Structure 3D Théorie des jeux Minimisation de regret Potentiel statistique Discrétisation de l'espace Rna 3D structure Game theory Regret minimization Statistical potential Discrete space

1

Page generated in 0.1255 seconds