Global ETD Search

1	Sur la notion d'optimalité dans les problèmes de bandit stochastique / On the notion of optimality in the stochastic multi-armed bandit problems
Ménard, Pierre
03 July 2018
The topics addressed in this thesis lie in statistical machine learning and sequential statistics. Our main framework is the stochastic multi-armed bandit problem. We first revisit lower bounds on the regret. We obtain non-asymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of the Kullback-Leibler divergence. These bounds show in particular that the regret grows almost linearly in an initial phase, and that the well-known logarithmic growth of the regret only holds in a final phase. We then propose algorithms for regret minimization in stochastic bandit models, either parametric, with arms belonging to an exponential family of distributions, or non-parametric, with distributions only assumed to be supported on the unit interval. These algorithms are simultaneously asymptotically optimal (in the sense of the Lai and Robbins lower bound) and minimax optimal. We also analyze the sample complexity of sequentially identifying the distribution whose expectation is closest to a given threshold, with and without the assumption that the means of the distributions form an increasing sequence. This work is motivated by phase I clinical trials, a practically important setting in which the arm means are naturally increasing. Finally, we extend Fano's inequality, which controls the average probability of (disjoint) events in terms of an average of Kullback-Leibler divergences, to arbitrary random variables taking values in the unit interval. Several novel applications follow, in which working with random variables rather than events is particularly convenient. The most important ones concern rates of Bayesian posterior concentration (minimax or distribution-dependent) and a lower bound on the regret in non-stochastic sequential learning.
Keywords: Stochastic multi-armed bandits; Information theory; Non-asymptotic lower bounds; Regret analysis; Asymptotic optimality; Minimax optimality; Upper confidence bound
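As an illustration of the Lai and Robbins lower bound mentioned in the abstract above, the following minimal sketch computes the constant that multiplies log T in that bound. It assumes Bernoulli arms, which is a choice made here for illustration and is not taken from the thesis; the function names `kl_bernoulli` and `lai_robbins_constant` are hypothetical.

```python
import math


def kl_bernoulli(p, q):
    """Kullback-Leibler divergence KL(Bernoulli(p) || Bernoulli(q)), in nats."""
    eps = 1e-12  # clamp away from 0 and 1 to avoid log(0) and division by zero
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def lai_robbins_constant(means):
    """Sum over suboptimal arms of (gap / KL(arm, best arm)).

    Assumption: Bernoulli arms with the given means. For such arms, the
    Lai and Robbins bound says the regret of any uniformly good strategy
    grows asymptotically at least like this constant times log(T).
    """
    best = max(means)
    return sum((best - m) / kl_bernoulli(m, best) for m in means if m < best)


if __name__ == "__main__":
    # Two hypothetical arms with means 0.5 and 0.75.
    print(lai_robbins_constant([0.5, 0.75]))
```

A smaller divergence between a suboptimal arm and the best arm makes the two harder to tell apart, so the constant, and with it the unavoidable regret, grows.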
2	Valorisation optimale asymptotique avec risque asymétrique et applications en finance / Asymptotic optimal pricing with asymmetric risk and applications in finance
Santa brigida pimentel, Isaque
16 October 2018
This thesis consists of two parts that can be read independently. In the first part, we study several option hedging and pricing problems related to a risk measure. Our main approach is the use of an asymmetric risk function and an asymptotic framework in which we obtain optimal solutions through nonlinear partial differential equations (PDEs). In the first chapter, we focus on pricing and hedging European options. We consider the problem of optimizing the residual risk generated by discrete-time hedging in the presence of an asymmetric risk criterion. Instead of analyzing the asymptotic behavior of the solution to the associated discrete problem, we study the integrated asymmetric measure of the residual risk in a Markovian framework. In this context, we show the existence of the asymptotic risk measure. We then describe an asymptotically optimal hedging strategy via the solution to a fully nonlinear PDE. The second chapter applies this hedging method to the valuation of the production of a power plant. Since the power plant generates maintenance costs whether it is on or off, we are interested in reducing the risk associated with its uncertain revenues by hedging with forward contracts. We study the impact on the hedging strategy of a maintenance cost that depends on the electricity price.
In the second part, we consider several control problems related to economics and finance. The third chapter is dedicated to the study of a class of McKean-Vlasov (MKV) problems with common noise, called polynomial conditional MKV. We reduce this polynomial class, by a Markov embedding, to finite-dimensional control problems. We compare three different probabilistic techniques for the numerical resolution of the reduced problem: quantization, regression by control randomization, and regress-later. We provide numerous numerical examples, such as portfolio selection under drift uncertainty. In the fourth chapter, we solve dynamic programming equations associated with financial valuations in the energy market. We assume that a calibrated model for the underlying assets is not available and that only a small sample of historical data is accessible. In this context, we also suppose that forward contracts are often governed by hidden factors modeled by Markov processes. We propose a non-intrusive method to solve these equations through empirical regression techniques, using only the log-price history of observable futures contracts.
Keywords: Asymmetric risk; Asymptotic optimality; Nonlinear partial differential equations; Empirical regression; Electricity market
332.015 118
