Spelling suggestions: "subject:"multiarmed"" "subject:"multiformed""
1 |
Sélection Séquentielle en Environnement Aléatoire Appliquée à l'Apprentissage SuperviséCaelen, Olivier 25 September 2009 (has links)
Cette thèse se penche sur les problèmes de décisions devant être prises de manière séquentielle au sein d'un environnement aléatoire. Lors de chaque étape d'un tel problème décisionnel, une alternative doit être sélectionnée parmi un ensemble d'alternatives. Chaque alternative possède un gain moyen qui lui est propre et lorsque l'une d'elles est sélectionnée, celle-ci engendre un gain aléatoire. La sélection opérée peut suivre deux types d'objectifs.
Dans un premier cas, les tests viseront à maximiser la somme des gains collectés. Un juste compromis doit alors être trouvé entre l'exploitation et l'exploration. Ce problème est couramment dénommé dans la littérature scientifique "multi-armed bandit problem".
Dans un second cas, un nombre de sélections maximal est imposé et l'objectif consistera à répartir ces sélections de façon à augmenter les chances de trouver l'alternative présentant le gain moyen le plus élevé. Ce deuxième problème est couramment repris dans la littérature scientifique sous l'appellation "selecting the best".
La sélection de type gloutonne joue un rôle important dans la résolution de ces problèmes de décision et opère en choisissant l'alternative qui s'est jusqu'ici montrée optimale. Or, la nature généralement aléatoire de l'environnement rend incertains les résultats d'une telle sélection.
Dans cette thèse, nous introduisons une nouvelle quantité, appelée le "gain espéré d'une action gloutonne". Sur base de quelques propriétés de cette quantité, de nouveaux algorithmes permettant de résoudre les deux problèmes décisionnels précités seront proposés.
Une attention particulière sera ici prêtée à l'application des techniques présentées au domaine de la sélection de modèles en l'apprentissage artificiel supervisé.
La collaboration avec le service d'anesthésie de l'Hôpital Erasme nous a permis d'appliquer les algorithmes proposés à des données réelles, provenant du milieu médical. Nous avons également développé un système d'aide à la décision dont un prototype a déjà été testé en conditions réelles sur un échantillon restreint de patients.
|
2 |
Multi-Armed Bandit Problems under Delayed FeedbackJoulani, Pooria Unknown Date
No description available.
|
3 |
Minimizing age of information for semi-periodic arrivals of multiple packetsChen, Mianlong 04 December 2019 (has links)
Age of information (AoI) captures the freshness of information and has been used broadly for scheduling data transmission in the Internet of Things (IoT). We consider a general scenario where a meaningful piece of information consists of multiple packets and the information would not be considered complete until all related packets have been correctly received. This general scenario, seemingly a trivial extension of exiting work where information update is in terms of single packet, is actually challenging in both scheduling algorithm design and theoretical analysis, because we need to track the history of received packets before a complete piece of information can be updated. We first analyse the necessary condition for optimal scheduling based on which we present an optimal scheduling method. The optimal solution, however, has high time complexity. To address the problem, we investigate the problem in the framework of restless multi-armed bandit (RMAB) and propose an index-based scheduling policy by applying Whittle index. We also propose a new transmission strategy based on erasure codes to improve the performance of scheduling policies in lossy networks. Performance evaluation results demonstrate that our solution outperforms other baseline policies such as greedy policy and naive Whittle index policy in both lossless and lossy networks. / Graduate
|
4 |
A Study of Thompson Sampling Approach for the Sleeping Multi-Armed Bandit ProblemChatterjee, Aritra January 2017 (has links) (PDF)
The multi-armed bandit (MAB) problem provides a convenient abstraction for many online decision problems arising in modern applications including Internet display advertising, crowdsourcing, online procurement, smart grids, etc. Several variants of the MAB problem have been proposed to extend the basic model to a variety of practical and general settings. The sleeping multi-armed bandit (SMAB) problem is one such variant where the set of available arms varies with time. This study is focused on analyzing the efficacy of the Thompson Sampling algorithm for solving the SMAB problem.
Any algorithm for the classical MAB problem is expected to choose one of K available arms (actions) in each of T consecutive rounds. Each choice of an arm generates a stochastic reward from an unknown but fixed distribution. The goal of the algorithm is to maximize the expected sum of rewards over the T rounds (or equivalently minimize the expected total regret), relative to the best fixed action in hindsight. In many real-world settings, however, not all arms may be available in any given round. For example, in Internet display advertising, some advertisers might choose to stay away from the auction due to budget constraints; in crowdsourcing, some workers may not be available at a given time due to timezone difference, etc. Such situations give rise to the sleeping MAB abstraction.
In the literature, several upper confidence bound (UCB)-based approaches have been proposed and investigated for the SMAB problem. Our contribution is to investigate the efficacy of a Thomp-son Sampling-based approach. Our key finding is to establish a logarithmic regret bound, which non-trivially generalizes a similar bound known for this approach in the classical MAB setting. Our bound also matches (up to constants) the best-known lower bound for the SMAB problem. Furthermore, we show via detailed simulations, that the Thompson Sampling approach in fact outperforms the known algorithms for the SMAB problem.
|
5 |
Thompson sampling-based online decision making in network routingHuang, Zhiming 02 September 2020 (has links)
Online decision making is a kind of machine learning problems where decisions are made in a sequential manner so as to accumulate as many rewards as possible.
Typical examples include multi-armed bandit (MAB) problems where an agent needs to decide which arm to pull in each round, and network routing problems where each router needs to decide the next hop for each packet.
Thompson sampling (TS) is an efficient and effective algorithm for online decision making problems. Although TS has been proposed for a long time, it was not until recent years that the theoretical guarantees for TS in the standard MAB were given.
In this thesis, we first analyze the performance of TS both theoretically and practically in a special MAB called combinatorial MAB with sleeping arms and long-term fairness constraints (CSMAB-F). Then, we apply TS to a novel reactive network routing problem, called \emph{opportunistic routing without link metrics known a priori}, and use the proof techniques we developed for CSMAB-F to analyze the performance. / Graduate
|
6 |
An Empirical Evaluation of Context Aware Clustering of Bandits using Thompson SamplingCampolongo, Nicolò January 2017 (has links)
Stochastic bandit algorithms are increasingly being used in the domain of recommender systems, when the environment is very dynamic and the items to recommend are frequently changing over time. While traditional approaches consider a single bandit instance which assumes all users to be equal, recent developments in the literature showed that the quality of recommendations can be improved when individual bandit instances for different users are considered and clustering techniques are used. In this work we develop an algorithm which clusters users based on the context at disposal using a Bayesian framework called Thompson Sampling, opposed to a similar algorithm called CAB recently presented at ICML 2017 (International Conference on Machine Learning), which uses a deterministic strategy. We show extensively through experiments on synthetic and real world data that the perfor- mance of our algorithm is in line with its deterministic version CAB when it comes to the quality of recommendations. On the other hand, our approach is relatively faster when considering running time. / Stokastiska bandit-algoritmer används allt mer för rekommenderande system där omgivningen är väldigt dynamisk och de rekommenderade föremålen ständigt byts ut. Medan traditionella tillvägagångssätt använder en enkel bandit-instans som antar alla användare att vara likadana, har litteraturen under senare tid visat att kvaliteten av rekommendationerna kan förhöjas med användarspecifika bandit-instanser och med hjälp av klustringsalgoritmer. I detta projekt har en algoritm utvecklats som klustrar användare baserat på kontexten för disposition genom att använda ett Bayesianskt ramverk som kallas Thompson-sampling, till skillnad från den liknande algoritmen CAD som nyligen presenterades vid ICML 2017 (International Conference on Machine Learning), som använder en deterministisk strategi. Projektet visar med stor omfattning, med hjälp av experiment som använt både syntetiska och reella data, att den framtagna algoritmens genererade rekommendationer håller likvärdig kvalitet som den deterministiska varianten av CAB. Däremot ger vårt presenterade tillvägagångssätt högre prestanda avseende tidsåtgång.
|
7 |
Modelling Infertility with Markov ChainsDorff, Rebecca 20 June 2013 (has links) (PDF)
Infertility affects approximately 15% of couples. Testing and interventions are costly, in time, money, and emotional energy. This paper will discuss using Markov decision and multi-armed bandit processes to identify a systematic approach of interventions that will lead to the desired baby while minimizing costs.
|
8 |
Group-Envy Fairness in the Stochastic Bandit SettingScinocca, Stephen 29 September 2022 (has links)
We introduce a new, group fairness-inspired stochastic multi-armed bandit problem
in the pure exploration setting. We look at the discrepancy between an arm’s mean
reward from a group and the highest mean reward for any arm from that group, and
call this the disappointment that group suffers from that arm. We define the optimal
arm to be the one that minimizes the maximum disappointment over all groups. This
optimal arm addresses one problem with maximin fairness, where the group used to
choose the maximin best arm suffers little disappointment regardless of what arm is
picked, but another group suffers significantly more disappointment by picking that
arm as the best one. The challenge of this problem is that the highest mean reward
for a group and the arm that gives that reward are unknown. This means we need
to pull arms for multiple goals: to find the optimal arm, and to estimate the highest
mean reward of certain groups. This leads to the new adaptive sampling algorithm for
best arm identification in the fixed confidence setting called MD-LUCB, or Minimax
Disappointment LUCB. We prove bounds on MD-LUCB’s sample complexity and
then study its performance with empirical simulations. / Graduate
|
9 |
Multi-channel opportunistic access : a restless multi-armed bandit perspective / Accès opportuniste dans les systèmes de communication multi-canaux : une perspective du problème de bandit-manchotWang, Kehao 22 June 2012 (has links)
Dans cette thèse, nous abordons le problème fondamental de l'accès au spectre opportuniste dans un système de communication multi-canal. Plus précisément, nous considérons un système de communication dans lequel un utilisateur a accès à de multiples canaux, tout en étant limité à la détection et la transmission sur un sous-ensemble de canaux. Nous explorons comment l'utilisateur intelligent exploite ses observations passées et les propriétés stochastiques de ces canaux afin de maximiser son débit. Formellement, nous fournissons une analyse générique sur le problème d'accès au spectre opportuniste en nous basant sur le problème de `restless multi-bandit’ (RMAB), l'une des généralisations les plus connues du problème classique de multi-armed bandit (MAB), un problème fondamental dans la théorie de décision stochastique. Malgré les importants efforts de la communauté de recherche dans ce domaine, le problème RMAB dans sa forme générique reste encore ouvert. Jusqu'à aujourd'hui, très peu de résultats sont connus sur la structure de la politique optimale. L'obtention de la politique optimale pour un problème RMAB général est intraçable dû la complexité de calcul exponentiel. Par conséquent, une alternative naturelle est de se focaliser sur la politique myopique qui maximise la récompense à immédiate, tout en ignorant celles du futur. Donc, nous développons trois axiomes caractérisant une famille de fonctions que nous appelons fonctions régulières, qui sont génériques et pratiquement importantes. Nous établissons ensuite l'optimalité de la politique myopique lorsque la fonction de récompense peut être exprimée comme une fonction régulière et le facteur de discount est borné par un seuil déterminé par la fonction de récompense. Nous illustrons également l'application des résultats pour analyser une classe de problèmes RMAB dans l'accès opportuniste. Ensuite, nous étudions un problème plus difficile, où l'utilisateur doit configurer le nombre de canaux à accéder afin de maximiser son utilité (par exemple, le débit). Après avoir montré la complexité exponentielle du problème, nous développons une stratégie heuristique v-step look-ahead. Dans la stratégie développée, le paramètre v permet de parvenir à un compromis souhaité entre l'efficacité sociale et de la complexité de calcul. Nous démontrons les avantages de la stratégie proposée via des simulations numériques sur plusieurs scénarios typiques. / In the thesis, we address the fundamental problem of opportunistic spectrum access in a multi-channel communication system. Specifically, we consider a communication system in which a user has access to multiple channels, but is limited to sensing and transmitting only on one at a given time. We explore how the smart user should exploit past observations and the knowledge of the stochastic properties of these channels to maximize its transmission rate by switching channels opportunistically. Formally, we provide a generic analysis on the opportunistic spectrum access problem by casting the problem into the restless multi-armed bandit (RMAB) problem, one of the most well-known generalizations of the classic multi-armed bandit (MAB) problem, which is of fundamental importance in stochastic decision theory. Despite the significant research efforts in the field, the RMAB problem in its generic form still remains open. Until today, very little result is reported on the structure of the optimal policy. Obtaining the optimal policy for a general RMAB problem is often intractable due to the exponential computation complexity. Hence, a natural alternative is to seek a simple myopic policy maximizing the short-term reward. Therefore, we develop three axioms characterizing a family of functions which we refer to as regular functions, which are generic and practically important. We then establish the optimality of the myopic policy when the reward function can be expressed as a regular function and the discount factor is bounded by a closed-form threshold determined by the reward function. We also illustrate how the derived results, generic in nature, are applied to analyze a class of RMAB problems arising from multi-channel opportunistic access. Next, we further investigate the more challenging problem where the user has to decide the number of channels to sense in each slot in order to maximize its utility (e.g., throughput). After showing the exponential complexity of the problem, we develop a heuristic v-step look-ahead strategy. In the developed strategy, the parameter v allows to achieve a desired tradeoff between social efficiency and computation complexity. We demonstrate the benefits of the proposed strategy via numerical experiments on several typical settings.
|
10 |
Essays in Econometrics and Dynamic Kidney ExchangeBaisi Hadad, Vitor January 2018 (has links)
Thesis advisor: Stefan Hoderlein / This dissertation is divided into two parts. Part I - Dynamic Kidney Exchange In recent years, kidney paired donation (KPD) has an emerged as an attractive alternative for end-stage renal disease patients with incompatible living donors. However, we argue that the matching algorithm currently used by organ clearinghouses is inefficient, in the sense that a larger number of patients may be reached if kidney transplant centers take into consideration how their pool of patients and donors will evolve over time. In our work Two Novel Algorithms for Dynamic Kidney Exchange, we explore this claim and propose new computational algorithms to increase the cardinality of matchings in a discrete-time dynamic kidney exchange model with Poisson entries and Geometric deaths. Our algorithms are classified into direct prediction methods and multi-armed bandit methods. In the direct prediction method, we use machine learning estimator to produce a probability that each patient-donor pair should be matched today, as op- posed to being left for a future matching. The estimators are trained on offline optimal solutions. In contrast, in multi-armed bandit methods, we use simulations to evaluate the desirability of different matchings. Since the amount of different matchings is enormous, multi-armed bandits (MAB) are employed to decrease order to decrease the computational burden. Our methods are evaluated using simulations in a variety of simulation configurations. We find that the performance of at least one of our methods, based on multi-armed bandit algorithms, is able to uniformly dominate the myopic method that is used by kidney transplants in practice. We restrict our experiments to pairwise kidney exchange, but the methods described here are easily extensible, computational constraints permitting. Part II - Econometrics In our econometric paper Heterogenous Production Functions, Panel Data, and Productivity, we present methods for identification of moments and nonparametric marginal distributions of endogenous random coefficient models in fixed-T linear panel data models. Our identification strategy is constructive, immediately leading to relatively simple estimators that can be shown to be consistent and asymptotically normal. Because our strategy makes use of special properties of “small” (measure-zero) subpopulations, our estimators are irregularly identified: they can be shown to be consistent and asymptotically Normal, but converge at rates slower than root-n. We provide an illustration of our methods by estimating first and second moments of random Cobb-Douglas coefficients in production functions, using Indian plant-level microdata. / Thesis (PhD) — Boston College, 2018. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Economics.
|
Page generated in 0.1136 seconds