Global ETD Search

21	Thompson sampling-based online decision making in network routing Huang, Zhiming 02 September 2020 (has links) Online decision making is a kind of machine learning problems where decisions are made in a sequential manner so as to accumulate as many rewards as possible. Typical examples include multi-armed bandit (MAB) problems where an agent needs to decide which arm to pull in each round, and network routing problems where each router needs to decide the next hop for each packet. Thompson sampling (TS) is an efficient and effective algorithm for online decision making problems. Although TS has been proposed for a long time, it was not until recent years that the theoretical guarantees for TS in the standard MAB were given. In this thesis, we first analyze the performance of TS both theoretically and practically in a special MAB called combinatorial MAB with sleeping arms and long-term fairness constraints (CSMAB-F). Then, we apply TS to a novel reactive network routing problem, called \emph{opportunistic routing without link metrics known a priori}, and use the proof techniques we developed for CSMAB-F to analyze the performance. / Graduate Online Decision Making Multi-armed Bandits Thompson Sampling Network Routing
22	An Empirical Evaluation of Context Aware Clustering of Bandits using Thompson Sampling Campolongo, Nicolò January 2017 (has links) Stochastic bandit algorithms are increasingly being used in the domain of recommender systems, when the environment is very dynamic and the items to recommend are frequently changing over time. While traditional approaches consider a single bandit instance which assumes all users to be equal, recent developments in the literature showed that the quality of recommendations can be improved when individual bandit instances for different users are considered and clustering techniques are used. In this work we develop an algorithm which clusters users based on the context at disposal using a Bayesian framework called Thompson Sampling, opposed to a similar algorithm called CAB recently presented at ICML 2017 (International Conference on Machine Learning), which uses a deterministic strategy. We show extensively through experiments on synthetic and real world data that the perfor- mance of our algorithm is in line with its deterministic version CAB when it comes to the quality of recommendations. On the other hand, our approach is relatively faster when considering running time. / Stokastiska bandit-algoritmer används allt mer för rekommenderande system där omgivningen är väldigt dynamisk och de rekommenderade föremålen ständigt byts ut. Medan traditionella tillvägagångssätt använder en enkel bandit-instans som antar alla användare att vara likadana, har litteraturen under senare tid visat att kvaliteten av rekommendationerna kan förhöjas med användarspecifika bandit-instanser och med hjälp av klustringsalgoritmer. I detta projekt har en algoritm utvecklats som klustrar användare baserat på kontexten för disposition genom att använda ett Bayesianskt ramverk som kallas Thompson-sampling, till skillnad från den liknande algoritmen CAD som nyligen presenterades vid ICML 2017 (International Conference on Machine Learning), som använder en deterministisk strategi. Projektet visar med stor omfattning, med hjälp av experiment som använt både syntetiska och reella data, att den framtagna algoritmens genererade rekommendationer håller likvärdig kvalitet som den deterministiska varianten av CAB. Däremot ger vårt presenterade tillvägagångssätt högre prestanda avseende tidsåtgång. bandits multi-armed Thompson Sampling Engineering and Technology Teknik och teknologier
23	Modelling Infertility with Markov Chains Dorff, Rebecca 20 June 2013 (has links) (PDF) Infertility affects approximately 15% of couples. Testing and interventions are costly, in time, money, and emotional energy. This paper will discuss using Markov decision and multi-armed bandit processes to identify a systematic approach of interventions that will lead to the desired baby while minimizing costs. Infertility Medical Diagnosis Markov Decision Processes Multi-armed Bandits Mathematics
24	Group-Envy Fairness in the Stochastic Bandit Setting Scinocca, Stephen 29 September 2022 (has links) We introduce a new, group fairness-inspired stochastic multi-armed bandit problem in the pure exploration setting. We look at the discrepancy between an arm’s mean reward from a group and the highest mean reward for any arm from that group, and call this the disappointment that group suffers from that arm. We define the optimal arm to be the one that minimizes the maximum disappointment over all groups. This optimal arm addresses one problem with maximin fairness, where the group used to choose the maximin best arm suffers little disappointment regardless of what arm is picked, but another group suffers significantly more disappointment by picking that arm as the best one. The challenge of this problem is that the highest mean reward for a group and the arm that gives that reward are unknown. This means we need to pull arms for multiple goals: to find the optimal arm, and to estimate the highest mean reward of certain groups. This leads to the new adaptive sampling algorithm for best arm identification in the fixed confidence setting called MD-LUCB, or Minimax Disappointment LUCB. We prove bounds on MD-LUCB’s sample complexity and then study its performance with empirical simulations. / Graduate Multi-armed bandits Machine learning theory Algorithmic Fairness
25	Bestar, vildar och banditer. Om känslor i Apuleius Metamorphoses / Beasts, savages, and bandits. On Emotions in Apuleius' Metamorphoses Garfvé, Daniel January 2023 (has links) Folksägen har haft en djup kulturell betydelse under mänsklighetens historia. Som en form av muntligt och litterärt historieberättande som användes för att presentera olika typer av kunskap och förmedla livsläxor till människor som lever i olika kulturer som ständigt utvecklas. Syftet med denna undersökning är att utifrån verket Metamorphoses av Apuleius göra en undersökning av hur känslor så som skräck och rädsla presenteras i mötet mellan romare och banditer samt djur under slutet av det första århundradet e.Kr. Metoden som kommer att användas är en form av närläsning utav texten Metamorphoses. Utöver denna metod så kommer undersökningen utgå från den så kallade medlande skolan inom forskningen av känslor i ett historiskt perspektiv. Medlande skolans perspektiv är att det finns vissa kulturella aspekter som går att medan att inte bortse från att det finns vissa biologiska komponenter till vad som utgör känslor som psykologer anser är den primära aspekten i känslor. Undersökningen kommer fram till att romare hade en öppen rädsla för banditer och djur som hade kulturella kopplingar till även natten. Skillnader i hur de agerade kunde skilja sig åt mellan grupperingar vilket tyder på att det inte fanns en universell ideal om hur en romare skulle bemöta denna rädsla. / Folklore has had a deep cultural significance throughout human history. As a form of oral and literary storytelling that was used to present different types of knowledge and convey life lessons to people living in different cultures that are constantly evolving. The purpose of this research is to, based on the work Metamorphoses by Apuleius, make an investigation of how emotions such as terror and fear are presented in the meeting between Romans and bandits and animals during the end of the first century AD. The method that will be used is a form of close reading from the text Metamorphoses. In addition to this method, the investigation will be based on the so-called mediating school within the research of emotions in a historical perspective. The perspective of the mediating school is that there are certain cultural aspects that go while not ignoring that there are certain biological components to what constitutes emotions which psychologists consider to be the primary aspect in emotions. The research concludes that Romans had an open fear of bandits and animals that had cultural connections to the night as well. Differences in how they acted could differ between groups, suggesting that there was no universal ideal of how a Roman should respond to this fear. Apuleius Metamorphoses emotions bandits animals Rome Apuleius Metamorphoses emotions bandits animals Rome History Historia History and Archaeology Historia och arkeologi
26	Structured Stochastic Bandits Magureanu, Stefan January 2016 (has links) In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlated arms. Particularly, we investigate the case when the expected rewards are a Lipschitz function of the arm, and the learning to rank problem, as viewed from a MAB perspective. For the former, we derive a problem specific lower bound and propose both an asymptotically optimal algorithm (OSLB) and a (pareto)optimal, algorithm (POSLB). For the latter, we construct the regret lower bound and determine its closed form for some particular settings, as well as propose two asymptotically optimal algorithms PIE and PIE-C. For all algorithms mentioned above, we present performance analysis in the form of theoretical regret guarantees as well as numerical evaluation on artificial datasets as well as real-world datasets, in the case of PIE and PIE-C. / <p>QC 20160223</p> Multi-armed bandits Learning to rank reinforcement learning Lipschitz Bandits Annan elektroteknik och elektronik
27	Environment Adaptive Regret Analysis in Bandit Problems / バンディット問題における環境適応的リグレット解析 Tsuchiya, Taira 25 September 2023 (has links) 京都大学 / 新制・課程博士 / 博士(情報学) / 甲第24939号 / 情博第850号 / 新制\|\|情\|\|142(附属図書館) / 京都大学大学院情報学研究科システム科学専攻 / (主査)准教授本多淳也, 教授田中利幸, 教授鹿島久嗣 / 学位規則第4条第1項該当 / Doctor of Informatics / Kyoto University / DFAM bandit problems online learning regret analysis best-of-both-worlds algorithms structured bandits partial monitoring combinatorial semi-bandits 007
28	Contributions to Multi-Armed Bandits : Risk-Awareness and Sub-Sampling for Linear Contextual Bandits / Contributions aux bandits manchots : gestion du risque et sous-échantillonnage pour les bandits contextuels linéaires Galichet, Nicolas 28 September 2015 (has links) Cette thèse s'inscrit dans le domaine de la prise de décision séquentielle en environnement inconnu, et plus particulièrement dans le cadre des bandits manchots (multi-armed bandits, MAB), défini par Robbins et Lai dans les années 50. Depuis les années 2000, ce cadre a fait l'objet de nombreuses recherches théoriques et algorithmiques centrées sur le compromis entre l'exploration et l'exploitation : L'exploitation consiste à répéter le plus souvent possible les choix qui se sont avérés les meilleurs jusqu'à présent. L'exploration consiste à essayer des choix qui ont rarement été essayés, pour vérifier qu'on a bien identifié les meilleurs choix. Les applications des approches MAB vont du choix des traitements médicaux à la recommandation dans le contexte du commerce électronique, en passant par la recherche de politiques optimales de l'énergie. Les contributions présentées dans ce manuscrit s'intéressent au compromis exploration vs exploitation sous deux angles spécifiques. Le premier concerne la prise en compte du risque. Toute exploration dans un contexte inconnu peut en effet aboutir à des conséquences indésirables ; par exemple l'exploration des comportements d'un robot peut aboutir à des dommages pour le robot ou pour son environnement. Dans ce contexte, l'objectif est d'obtenir un compromis entre exploration, exploitation, et prise de risque (EER). Plusieurs algorithmes originaux sont proposés dans le cadre du compromis EER. Sous des hypothèses fortes, l'algorithme MIN offre des garanties de regret logarithmique, à l'état de l'art ; il offre également une grande robustesse, contrastant avec la forte sensibilité aux valeurs des hyper-paramètres de e.g. (Auer et al. 2002). L'algorithme MARAB s'intéresse à un critère inspiré de la littérature économique(Conditional Value at Risk), et montre d'excellentes performances empiriques comparées à (Sani et al. 2012), mais sans garanties théoriques. Enfin, l'algorithme MARABOUT modifie l'estimation du critère CVaR pour obtenir des garanties théoriques, tout en obtenant un bon comportement empirique. Le second axe de recherche concerne le bandit contextuel, où l'on dispose d'informations additionnelles relatives au contexte de la décision ; par exemple, les variables d'état du patient dans un contexte médical ou de l'utilisateur dans un contexte de recommandation. L'étude se focalise sur le choix entre bras qu'on a tirés précédemment un nombre de fois différent. Le choix repose en général sur la notion d'optimisme, comparant les bornes supérieures des intervalles de confiance associés aux bras considérés. Une autre approche appelée BESA, reposant sur le sous-échantillonnage des valeurs tirées pour les bras les plus visités, et permettant ainsi de se ramener au cas où tous les bras ont été tirés un même nombre de fois, a été proposée par (Baransi et al. 2014). / This thesis focuses on sequential decision making in unknown environment, and more particularly on the Multi-Armed Bandit (MAB) setting, defined by Lai and Robbins in the 50s. During the last decade, many theoretical and algorithmic studies have been aimed at cthe exploration vs exploitation tradeoff at the core of MABs, where Exploitation is biased toward the best options visited so far while Exploration is biased toward options rarely visited, to enforce the discovery of the the true best choices. MAB applications range from medicine (the elicitation of the best prescriptions) to e-commerce (recommendations, advertisements) and optimal policies (e.g., in the energy domain). The contributions presented in this dissertation tackle the exploration vs exploitation dilemma under two angles. The first contribution is centered on risk avoidance. Exploration in unknown environments often has adverse effects: for instance exploratory trajectories of a robot can entail physical damages for the robot or its environment. We thus define the exploration vs exploitation vs safety (EES) tradeoff, and propose three new algorithms addressing the EES dilemma. Firstly and under strong assumptions, the MIN algorithm provides a robust behavior with guarantees of logarithmic regret, matching the state of the art with a high robustness w.r.t. hyper-parameter setting (as opposed to, e.g. UCB (Auer 2002)). Secondly, the MARAB algorithm aims at optimizing the cumulative 'Conditional Value at Risk' (CVar) rewards, originated from the economics domain, with excellent empirical performances compared to (Sani et al. 2012), though without any theoretical guarantees. Finally, the MARABOUT algorithm modifies the CVar estimation and yields both theoretical guarantees and a good empirical behavior. The second contribution concerns the contextual bandit setting, where additional informations are provided to support the decision making, such as the user details in the ontent recommendation domain, or the patient history in the medical domain. The study focuses on how to make a choice between two arms with different numbers of samples. Traditionally, a confidence region is derived for each arm based on the associated samples, and the 'Optimism in front of the unknown' principle implements the choice of the arm with maximal upper confidence bound. An alternative, pioneered by (Baransi et al. 2014), and called BESA, proceeds instead by subsampling without replacement the larger sample set. In this framework, we designed a contextual bandit algorithm based on sub-sampling without replacement, relaxing the (unrealistic) assumption that all arm reward distributions rely on the same parameter. The CL-BESA algorithm yields both theoretical guarantees of logarithmic regret and good empirical behavior. Prise de décision séquentielle Apprentissage automatique Bandits manchots Sous-échantillonnage Aversion au risque CVaR Exploration vs Exploitation vs Risque Bandits linéaires Bandits contextuels Analyse de regret Sequential decision making Machine learning Multi-armed bandits Sub-Sampling Risk-aversion CvaR Exploration vs Exploitation vs Safety Linear bandits Contextual bandits Regret analysis
29	Sur la notion d'optimalité dans les problèmes de bandit stochastique / On the notion of optimality in the stochastic multi-armed bandit problems Ménard, Pierre 03 July 2018 (has links) Cette thèse s'inscrit dans les domaines de l'apprentissage statistique et de la statistique séquentielle. Le cadre principal est celui des problèmes de bandit stochastique à plusieurs bras. Dans une première partie, on commence par revisiter les bornes inférieures sur le regret. On obtient ainsi des bornes non-asymptotiques dépendantes de la distribution que l'on prouve de manière très simple en se limitant à quelques propriétés bien connues de la divergence de Kullback-Leibler. Puis, on propose des algorithmes pour la minimisation du regret dans les problèmes de bandit stochastique paramétrique dont les bras appartiennent à une certaine famille exponentielle ou non-paramétrique en supposant seulement que les bras sont à support dans l'intervalle unité, pour lesquels on prouve l'optimalité asymptotique (au sens de la borne inférieure de Lai et Robbins) et l'optimalité minimax. On analyse aussi la complexité pour l'échantillonnage séquentielle visant à identifier la distribution ayant la moyenne la plus proche d'un seuil fixé, avec ou sans l'hypothèse que les moyennes des bras forment une suite croissante. Ce travail est motivé par l'étude des essais cliniques de phase I, où l'hypothèse de croissance est naturelle. Finalement, on étend l'inégalité de Fano qui contrôle la probabilité d'évènements disjoints avec une moyenne de divergences de Kullback-leibler à des variables aléatoires arbitraires bornées sur l'intervalle unité. Plusieurs nouvelles applications en découlent, les plus importantes étant une borne inférieure sur la vitesse de concentration de l'a posteriori Bayésien et une borne inférieure sur le regret pour un problème de bandit non-stochastique. / The topics addressed in this thesis lie in statistical machine learning and sequential statistic. Our main framework is the stochastic multi-armed bandit problems. In this work we revisit lower bounds on the regret. We obtain non-asymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback-Leibler divergence. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. Then, we propose algorithms for regret minimization in stochastic bandit models with exponential families of distributions or with distribution only assumed to be supported by the unit interval, that are simultaneously asymptotically optimal (in the sense of Lai and Robbins lower bound) and minimax optimal. We also analyze the sample complexity of sequentially identifying the distribution whose expectation is the closest to some given threshold, with and without the assumption that the mean values of the distributions are increasing. This work is motivated by phase I clinical trials, a practically important setting where the arm means are increasing by nature. Finally we extend Fano's inequality, which controls the average probability of (disjoint) events in terms of the average of some Kullback-Leibler divergences, to work with arbitrary unit-valued random variables. Several novel applications are provided, in which the consideration of random variables is particularly handy. The most important applications deal with the problem of Bayesian posterior concentration (minimax or distribution-dependent) rates and with a lower bound on the regret in non-stochastic sequential learning. Bandits stochastiques multi-bras Théorie de l'information Bornes inférieures non-asymptotiques Analyse du regret Optimalité asymptotique Optimalité minimax Borne supérieure de confiance Stochastic multi-armed bandits Information theory
30	Méthodes adaptatives pour les applications d'accès à l'information centrées sur l'utilisateur / Adaptive Methods for User-Centric Information Access Applications Lagrée, Paul 12 October 2017 (has links) Lorsque les internautes naviguent sur le Web, ils laissent de nombreuses traces que nous nous proposons d’exploiter pour améliorer les applications d'accès à l'information. Nous étudions des techniques centrées sur les utilisateurs qui tirent parti des nombreux types de rétroaction pour perfectionner les services offerts aux utilisateurs. Nous nous concentrons sur des applications telles que la recommandation et le marketing d’influence dans lesquelles les utilisateurs génèrent des signaux (clics, "j'aime", etc.) que nous intégrons dans nos algorithmes afin de fournir des services fortement contextualisés. La première partie de cette thèse est consacrée à une approche interactive de la recherche d'information sur les médias sociaux. Le problème consiste à récupérer un ensemble de k résultats dans un réseau social sous la contrainte que la requête peut être incomplète (par exemple, si le dernier terme est un préfixe). Chaque fois que l'utilisateur met à jour sa requête, le système met à jour l'ensemble des résultats de recherche en conséquence. Nous adoptons une interprétation de la pertinence de l'information qui tient compte du réseau, selon laquelle l'information produite par les utilisateurs proches de l'utilisateur faisant la requête est jugée plus pertinente. Ensuite, nous étudions une version générique de la maximisation de l'influence, dans laquelle nous voulons maximiser l'influence des campagnes d'information ou de marketing en sélectionnant de manière adaptative les utilisateurs initiant la propagation de l'information parmi un petit sous-ensemble de la population. Notre approche ne fait aucune hypothèse sur le modèle de diffusion sous-jacent ni même sur la structure du réseau de diffusion. Notre méthode a d'importantes applications dans le marketing d’influence qui vise à s’appuyer sur les influenceurs de réseaux sociaux pour promouvoir des produits ou des idées. Enfin, nous abordons le problème bien connu du démarrage à froid auquel sont confrontés les systèmes de recommandation par une approche adaptative. Si aucune information n’est disponible concernant l'appréciation d’un article, le système de recommandation doit recueillir des signaux (clics, etc.) afin d'estimer la valeur de l'article. Cependant, afin de minimiser les mauvaises recommandations faites aux utilisateurs, le système ne doit pas recueillir ces signaux de façon négligente. Nous introduisons un algorithme dynamique qui vise à alterner intelligemment les recommandations visant à accumuler de l'information et celles s'appuyant sur les données déjà recueillies. / When users interact on modern Web systems, they let numerous footprints which we propose to exploit in order to develop better applications for information access. We study a family of techniques centered on users, which take advantage of the many types of feedback to adapt and improve services provided to users. We focus on applications like recommendation and influencer marketing in which users generate discrete feedback (e.g. clicks, "likes", reposts, etc.) that we incorporate in our algorithms in order to deliver strongly contextualized services. The first part of this dissertation is dedicated to an approach for as-you-type search on social media. The problem consists in retrieving a set of k search results in a social-aware environment under the constraint that the query may be incomplete (e.g., if the last term is a prefix). Every time the user updates his / her query, the system updates the set of search results accordingly. We adopt a "network-aware" interpretation of information relevance, by which information produced by users who are closer to the user issuing a request is considered more relevant. Then, we study a generic version of influence maximization, in which we want to maximize the influence of marketing or information campaigns by adaptively selecting "spread seeds" from a small subset of the population. Influencer marketing is a straightforward application of this, in which the focus of a campaign is placed on precise key individuals who are typically able to reach millions of consumers. This represents an unprecedented tool for online marketing that we propose to improve using an adaptive approach. Notably, our approach makes no assumptions on the underlying diffusion model and no diffusion network is needed. Finally, we propose to address the well-known cold start problem faced by recommender systems with an adaptive approach. If no information is available regarding the user appreciation of an item, the recommender system needs to gather feedback (e.g., clicks) so as to estimate the value of the item. However, in order to minimize "bad" recommendations, a well-designed system should not collect feedback carelessly. We introduce a dynamic algorithm that aims to intelligently achieve the balance between "bad" and "good" recommendations. Méthodes adaptatives Réseaux sociaux Bandits multi-bras Recommandation Marketing d'influence Adaptive methods Online social networks Multi-armed bandits Recommendation Influencer marketing

Search results