1

Thompson sampling-based online decision making in network routing

Huang, Zhiming, 02 September 2020
Online decision making is a class of machine learning problems in which decisions are made sequentially so as to accumulate as much reward as possible. Typical examples include multi-armed bandit (MAB) problems, where an agent needs to decide which arm to pull in each round, and network routing problems, where each router needs to decide the next hop for each packet. Thompson sampling (TS) is an efficient and effective algorithm for online decision making problems. Although TS was proposed long ago, it was not until recent years that theoretical guarantees for TS in the standard MAB were given. In this thesis, we first analyze the performance of TS both theoretically and empirically in a special MAB called combinatorial MAB with sleeping arms and long-term fairness constraints (CSMAB-F). We then apply TS to a novel reactive network routing problem, called "opportunistic routing without link metrics known a priori", and use the proof techniques developed for CSMAB-F to analyze its performance.
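As background for the abstracts in this listing, a minimal Beta-Bernoulli Thompson sampling loop for the standard MAB is sketched below. It is illustrative only: the arm count, horizon, and reward probabilities are made-up example values, not taken from the thesis.

```python
import random

def thompson_sampling_bernoulli(reward_probs, horizon, rng=random.Random(0)):
    """Minimal Beta-Bernoulli Thompson sampling for a stochastic K-armed bandit."""
    k = len(reward_probs)
    successes = [0] * k   # Beta posterior parameters: alpha = successes + 1
    failures = [0] * k    # beta = failures + 1
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its Beta posterior,
        # then play the arm whose sample is largest.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < reward_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Example run with three arms of unknown success probability.
print(thompson_sampling_bernoulli([0.3, 0.5, 0.7], horizon=10_000))
```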
2

An Empirical Evaluation of Context Aware Clustering of Bandits using Thompson Sampling

Campolongo, Nicolò, January 2017
Stochastic bandit algorithms are increasingly used in recommender systems, where the environment is highly dynamic and the items to recommend change frequently over time. While traditional approaches consider a single bandit instance that treats all users as identical, recent developments in the literature have shown that the quality of recommendations can be improved when individual bandit instances are maintained for different users and clustering techniques are used. In this work we develop an algorithm that clusters users based on the available context using a Bayesian framework called Thompson Sampling, as opposed to a similar algorithm called CAB, recently presented at ICML 2017 (International Conference on Machine Learning), which uses a deterministic strategy. We show extensively through experiments on synthetic and real-world data that the performance of our algorithm is in line with that of its deterministic counterpart CAB in terms of recommendation quality, while our approach is faster in terms of running time.
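A single contextual (linear) Thompson sampling instance of the kind such clustering schemes build on might look like the following sketch. It shows only the per-instance posterior sampling step; the clustering of users across instances, which is the thesis's actual contribution, is not reproduced here, and the class name and parameters are illustrative assumptions.

```python
import numpy as np

class LinearThompsonSampling:
    """One linear Thompson sampling bandit instance (e.g. per user or per cluster).

    Maintains a Gaussian posterior N(mu, v^2 * B^{-1}) over the unknown weight
    vector; at each round a weight vector is sampled and the item whose context
    gets the highest sampled score is recommended.
    """
    def __init__(self, dim, v=0.5, rng=None):
        self.B = np.eye(dim)          # precision matrix (prior: identity)
        self.f = np.zeros(dim)        # running sum of reward-weighted contexts
        self.v = v                    # posterior scale (exploration level)
        self.rng = rng or np.random.default_rng(0)

    def select(self, contexts):
        mu = np.linalg.solve(self.B, self.f)
        cov = self.v ** 2 * np.linalg.inv(self.B)
        theta = self.rng.multivariate_normal(mu, cov)   # posterior sample
        return int(np.argmax(contexts @ theta))         # index of best-scoring item

    def update(self, context, reward):
        self.B += np.outer(context, context)
        self.f += reward * context
```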
3

Exploration-exploitation with Thompson sampling in linear systems

Abeille, Marc, 13 December 2017
This dissertation is dedicated to the study of Thompson Sampling (TS) algorithms designed to address the exploration-exploitation dilemma that is inherent in sequential decision-making under uncertainty. As opposed to algorithms derived from the optimism-in-the-face-of-uncertainty (OFU) principle, where exploration is performed by selecting the most favorable model within the set of plausible ones, TS algorithms rely on randomization to drive exploration, and are thus much more computationally efficient. We focus on linearly parametrized problems that allow for continuous state-action spaces, namely Linear Bandit (LB) problems and Linear Quadratic (LQ) control problems. We derive two novel analyses of the regret of TS algorithms in these settings. While the regret bound obtained for LB is similar to previous results, the proof sheds new light on the functioning of TS and allows us to extend the analysis to LQ problems. As a result, we prove the first regret bound for TS in LQ, and show that the frequentist regret is of order O(√T), which matches the existing guarantee for the regret of OFU algorithms in LQ. Finally, we propose an application of exploration-exploitation techniques to the practical problem of portfolio construction, and discuss the need for active exploration in this setting.
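For the LQ setting, one round of a Thompson-sampling controller can be sketched as follows: sample a plausible dynamics model from the posterior, solve the Riccati equation for that sample, and apply the resulting gain. This is a conceptual sketch under simplifying assumptions (a Gaussian posterior over the stacked parameters [A | B] and a stabilizable sampled model), not the exact algorithm or analysis of the thesis.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def ts_lq_step(theta_mean, theta_cov, x, Q, R, rng):
    """One Thompson-sampling step for LQ control (conceptual sketch).

    theta = [A | B] parametrizes the unknown dynamics x_{t+1} = A x_t + B u_t + noise.
    A sample is drawn from a Gaussian posterior over the flattened parameters,
    the LQ-optimal gain for the sampled model is computed via the discrete
    Riccati equation, and the corresponding control is applied to the state x.
    The sampled (A, B) is assumed stabilizable so the Riccati solver succeeds.
    """
    n = x.shape[0]
    theta = rng.multivariate_normal(theta_mean.ravel(), theta_cov).reshape(theta_mean.shape)
    A_s, B_s = theta[:, :n], theta[:, n:]
    P = solve_discrete_are(A_s, B_s, Q, R)                       # Riccati solution for the sample
    K = np.linalg.solve(R + B_s.T @ P @ B_s, B_s.T @ P @ A_s)    # optimal gain for the sample
    return -K @ x                                                # control input u_t
```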
4

Decision making using Thompson Sampling

Mellor, Joseph Charles, January 2014
The ability to make decisions is crucial for many autonomous systems. In many scenarios the consequence of a decision is unknown and often stochastic: the same decision may lead to a different outcome every time it is taken. An agent that can learn to make decisions based purely on its past experience needs less tuning and is likely to be more robust. An agent must often balance learning the payoff of actions by exploring against exploiting the knowledge it currently has. The multi-armed bandit problem exhibits such an exploration-exploitation dilemma. Thompson Sampling is a strategy for this problem, first proposed in 1933. In the last several years there has been renewed interest in it, with the emergence of strong empirical and theoretical justification for its use. This thesis seeks to take advantage of the benefits of Thompson Sampling while applying it to other decision-making models, and proposes different algorithms for these scenarios. Firstly, we explore a switching multi-armed bandit problem: in real applications the most appropriate decision to take often changes over time. We show that an agent that assumes switching is often robust to many types of changing environments. Secondly, we consider the best arm identification problem. Unlike the multi-armed bandit problem, where an agent wants to increase reward over the entire period of decision making, best arm identification is concerned with maximizing the reward gained by a final decision. This thesis argues that both problems can be tackled effectively using Thompson Sampling based approaches and provides empirical evidence to support this claim.
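One simple way to make Thompson sampling responsive to switching payoffs is to discount old observations, as sketched below. This is an illustrative variant with assumed parameter values, not necessarily the switching model developed in the thesis, and it covers only the switching setting, not best arm identification.

```python
import random

def discounted_ts_bernoulli(get_reward, n_arms, horizon, gamma=0.99, rng=random.Random(0)):
    """Bernoulli Thompson sampling with exponential forgetting (illustrative).

    Discounting old observations by gamma keeps the posterior responsive when
    arm payoffs switch over time; gamma = 1 recovers standard Thompson sampling.
    """
    s = [0.0] * n_arms  # discounted success counts
    f = [0.0] * n_arms  # discounted failure counts
    choices = []
    for _ in range(horizon):
        samples = [rng.betavariate(s[i] + 1, f[i] + 1) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        r = get_reward(arm)               # caller-provided, possibly non-stationary, 0/1 reward
        s = [gamma * x for x in s]        # forget a little of every arm's history
        f = [gamma * x for x in f]
        s[arm] += r
        f[arm] += 1 - r
        choices.append(arm)
    return choices
```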
5

Algorithmic and Ethical Aspects of Recommender Systems in e-Commerce

Paraschakis, Dimitris, January 2018
Recommender systems have become an integral part of virtually every e-commerce application on the web. These systems enable users to quickly discover relevant products while increasing business value. Over the past decades, recommender systems have been modeled using numerous machine learning techniques. However, the extent to which commercial applications adopt these models remains unclear. We assess the receptiveness of the industrial sector to algorithmic contributions of the research community by surveying more than 30 e-commerce platforms and by experimenting with various recommenders on proprietary e-commerce datasets. Another overlooked but important factor that complicates the design and use of recommender systems is their ethical implications. We identify and summarize these issues in our ethical recommendation framework, which also enables users to control sensitive moral aspects of recommendations via the proposed "ethical toolbox". The feasibility of this tool is supported by the results of our user study. Because of the moral implications associated with user profiling, we investigate algorithms capable of generating user-agnostic recommendations. We propose an ensemble learning scheme based on the Thompson Sampling bandit policy, which models arms as base recommendation functions, and show how to adapt this algorithm to realistic situations where neither arm availability nor reward stationarity is guaranteed.
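Treating whole recommenders as bandit arms can be sketched as follows: a Beta posterior over each base recommender's success rate is sampled and the winner serves the request. The request/observe interface is a hypothetical placeholder, and the sketch omits the availability and non-stationarity handling that the thesis addresses.

```python
import random

def ts_ensemble_recommend(base_recommenders, request_stream, rng=random.Random(0)):
    """Thompson-sampling ensemble where each arm is a base recommendation function.

    For each incoming request, a Beta posterior over each recommender's success
    rate is sampled; the recommender with the highest sample serves the request,
    and the observed click (0/1) updates its posterior.
    """
    k = len(base_recommenders)
    succ = [0] * k
    fail = [0] * k
    for request in request_stream:
        samples = [rng.betavariate(succ[i] + 1, fail[i] + 1) for i in range(k)]
        i = max(range(k), key=lambda j: samples[j])
        items = base_recommenders[i](request)   # delegate to the chosen recommender
        clicked = request.observe(items)        # hypothetical 0/1 feedback interface
        succ[i] += clicked
        fail[i] += 1 - clicked
```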
6

A Study of Thompson Sampling Approach for the Sleeping Multi-Armed Bandit Problem

Chatterjee, Aritra, January 2017
The multi-armed bandit (MAB) problem provides a convenient abstraction for many online decision problems arising in modern applications, including Internet display advertising, crowdsourcing, online procurement, smart grids, etc. Several variants of the MAB problem have been proposed to extend the basic model to a variety of practical and general settings. The sleeping multi-armed bandit (SMAB) problem is one such variant, in which the set of available arms varies with time. This study is focused on analyzing the efficacy of the Thompson Sampling algorithm for solving the SMAB problem. Any algorithm for the classical MAB problem is expected to choose one of K available arms (actions) in each of T consecutive rounds. Each choice of an arm generates a stochastic reward from an unknown but fixed distribution. The goal of the algorithm is to maximize the expected sum of rewards over the T rounds (or, equivalently, to minimize the expected total regret) relative to the best fixed action in hindsight. In many real-world settings, however, not all arms may be available in any given round. For example, in Internet display advertising, some advertisers might choose to stay away from the auction due to budget constraints; in crowdsourcing, some workers may not be available at a given time due to timezone differences. Such situations give rise to the sleeping MAB abstraction. In the literature, several upper confidence bound (UCB)-based approaches have been proposed and investigated for the SMAB problem. Our contribution is to investigate the efficacy of a Thompson Sampling-based approach. Our key result is a logarithmic regret bound, which non-trivially generalizes a similar bound known for this approach in the classical MAB setting. Our bound also matches (up to constants) the best-known lower bound for the SMAB problem. Furthermore, we show via detailed simulations that the Thompson Sampling approach in fact outperforms the known algorithms for the SMAB problem.
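Restricting the posterior-sampling step to the arms that are awake in a given round gives a minimal sleeping-bandit variant of Thompson sampling, sketched below. The reward and availability interfaces are illustrative assumptions; the regret analysis contributed by the thesis is not reflected here.

```python
import random

def sleeping_ts_bernoulli(available_arms, get_reward, n_arms, rng=random.Random(0)):
    """Thompson sampling for the sleeping bandit: sample only over available arms.

    available_arms yields, per round, the subset of arm indices that can be
    played; Beta-posterior sampling is restricted to that subset.
    """
    s = [0] * n_arms  # success counts
    f = [0] * n_arms  # failure counts
    for avail in available_arms:                  # e.g. [[0, 2], [1, 2, 3], ...]
        samples = {i: rng.betavariate(s[i] + 1, f[i] + 1) for i in avail}
        arm = max(samples, key=samples.get)       # best posterior sample among awake arms
        r = get_reward(arm)                       # 0/1 stochastic reward for the chosen arm
        s[arm] += r
        f[arm] += 1 - r
```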
