Global ETD Search

1	Sélection Séquentielle en Environnement Aléatoire Appliquée à l'Apprentissage Supervisé [Sequential Selection in a Random Environment Applied to Supervised Learning] Caelen, Olivier 25 September 2009 (has links) This thesis addresses decision problems in which decisions must be made sequentially within a random environment. At each step of such a problem, one alternative must be selected from a set of alternatives. Each alternative has its own mean gain, and when one of them is selected it yields a random gain. The selection can pursue two types of objectives. In the first case, the trials aim to maximize the sum of the gains collected. A fair trade-off must then be found between exploitation and exploration. This problem is commonly referred to in the scientific literature as the "multi-armed bandit problem". In the second case, a maximum number of selections is imposed, and the objective is to allocate these selections so as to increase the chances of finding the alternative with the highest mean gain. This second problem is commonly referred to in the scientific literature as "selecting the best". Greedy selection plays an important role in solving these decision problems and operates by choosing the alternative that has so far proved optimal. However, the generally random nature of the environment makes the results of such a selection uncertain. In this thesis, we introduce a new quantity, called the "expected gain of a greedy action". Based on a few properties of this quantity, new algorithms for solving the two aforementioned decision problems are proposed. Particular attention is paid to applying the presented techniques to model selection in supervised machine learning.
Collaboration with the anesthesia department of the Hôpital Erasme allowed us to apply the proposed algorithms to real data from the medical field. We also developed a decision support system, a prototype of which has already been tested under real conditions on a small sample of patients. multi-armed bandit problem Selecting the Best greedy
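The greedy selection rule mentioned in this abstract (always choose the alternative that has looked best so far) can be sketched as follows. This is only an illustrative sketch, not the algorithm proposed in the thesis: the function names `greedy_choice` and `run_greedy`, the Gaussian unit-variance rewards, and the initial pass over every alternative are all assumptions made for the example.

```python
import random


def greedy_choice(sample_means):
    """Return the index of the alternative with the highest empirical mean."""
    return max(range(len(sample_means)), key=lambda i: sample_means[i])


def run_greedy(true_means, rounds, seed=0):
    """Try every alternative once, then always pick the empirically best one.

    Rewards are drawn from a Gaussian with unit variance around each true
    mean (an assumption made purely for illustration).
    Returns how many times each alternative was selected.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(rounds):
        if t < k:
            arm = t  # initial pass: sample each alternative once
        else:
            means = [sums[i] / counts[i] for i in range(k)]
            arm = greedy_choice(means)
        reward = rng.gauss(true_means[arm], 1.0)  # random gain
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = run_greedy([0.0, 0.5, 1.0], rounds=50)
print(counts)
```

Because the gains are random, the alternative that looks best after a few samples need not be the one with the highest true mean, which is the uncertainty the abstract refers to.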
2	Minimizing age of information for semi-periodic arrivals of multiple packets Chen, Mianlong 04 December 2019 (has links) Age of information (AoI) captures the freshness of information and has been used broadly for scheduling data transmission in the Internet of Things (IoT). We consider a general scenario where a meaningful piece of information consists of multiple packets and the information is not considered complete until all related packets have been correctly received. This general scenario, seemingly a trivial extension of existing work where an information update consists of a single packet, is actually challenging in both scheduling algorithm design and theoretical analysis, because we need to track the history of received packets before a complete piece of information can be updated. We first analyse the necessary condition for optimal scheduling, based on which we present an optimal scheduling method. The optimal solution, however, has high time complexity. To address this, we investigate the problem in the framework of the restless multi-armed bandit (RMAB) and propose an index-based scheduling policy by applying the Whittle index. We also propose a new transmission strategy based on erasure codes to improve the performance of scheduling policies in lossy networks. Performance evaluation results demonstrate that our solution outperforms baseline policies such as the greedy policy and the naive Whittle index policy in both lossless and lossy networks. / Graduate age of information restless multi-armed bandit problem
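To make the multi-packet setting concrete, the sketch below tracks the age of information at a receiver when an update only counts once all of its packets have arrived. It is a simplified illustration under stated assumptions, not the model from the thesis: time is slotted, there is a single source and a single update, and the name `age_trajectory` and its parameters are hypothetical.

```python
def age_trajectory(horizon, generation_slot, packets_needed,
                   delivery_slots, initial_age=0):
    """Age of information at the end of each slot for one multi-packet update.

    The update is generated at `generation_slot` and consists of
    `packets_needed` packets. One packet is delivered in each slot listed in
    `delivery_slots`. The age resets only when the last packet arrives;
    until then it keeps growing by one per slot.
    """
    delivered = 0
    age = initial_age
    ages = []
    for t in range(horizon):
        completed = False
        if (t >= generation_slot and t in delivery_slots
                and delivered < packets_needed):
            delivered += 1
            completed = delivered == packets_needed
        if completed:
            # Time elapsed since the update was generated, at end of slot t.
            age = t + 1 - generation_slot
        else:
            age += 1
        ages.append(age)
    return ages


# Two-packet update: the age drops only after the second delivery.
print(age_trajectory(5, 1, 2, {1, 3}, initial_age=5))  # [6, 7, 8, 3, 4]
# Single-packet update: the age drops on the first delivery.
print(age_trajectory(5, 1, 1, {1, 3}, initial_age=5))  # [6, 1, 2, 3, 4]
```

The comparison shows why the receiver has to remember which packets it already holds: partial deliveries change the state without changing the age.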
3	REVENUE AND RETURN MANAGEMENT IN E-COMMERCE Shahmardan, Amin January 2025 (has links) In this dissertation, we explore the intersection of return management and dynamic pricing strategies in online retailing. The dissertation consists of five chapters. In Chapter 1, we present the research motivations and provide an overview of the problems studied. Chapter 2 investigates the "Returnless Refund" policy, a novel return strategy in which retailers offer a full refund without requiring customers to return the product. We show that the optimal returnless refund policy takes the form of a threshold policy, offering a returnless refund when the salvage value of the returned product is below a positive threshold. This method allows retailers to decide efficiently between granting a refund and reselling the product. It is also shown that, for items with a high expected salvage value, this threshold-based policy is advantageous for both retailers and customers. We also show that in the early stages of policy implementation, when customers are unaware of returnless refunds, a naive policy is optimal, but the threshold rises as customer awareness increases. Furthermore, our findings show that dishonest customers, who may falsely request a return to exploit this policy, can increase the retailer's profits when the price exceeds a certain level. In Chapter 3, we study a conservative dynamic pricing problem with demand learning in the presence of covariates, where the demand function follows a generalized linear model. We address managers’ concerns about transitioning from a legacy pricing system to a learning-based approach, focusing on the risk of revenue loss. We propose two dynamic pricing models. The first, a stage-wise safe model, ensures that the instantaneous expected revenue from algorithmic pricing matches or exceeds a fraction of the baseline policy’s revenue in each period. 
Using a modified UCB algorithm, we show that the regret of this model is composed of two parts: the regret from the learning process and the regret from applying perturbed baseline prices. The second, a cumulative-revenue-safe model, extends this by ensuring that the algorithm’s cumulative revenue meets a target relative to the baseline. Our analysis shows that the algorithm uses the baseline prices a finite number of times, even when the expected revenue of the baseline prices must be learned, offering a balance between exploration and revenue safety constraints. Chapter 4 addresses a dynamic pricing problem where customers can return products within a specified grace window, and purchasing and returning probabilities are unknown. We propose two approaches. In the first, the retailer learns the probabilities separately, leading to higher regret due to censored data from return decisions. The second focuses on joint learning, where the final demand (calculated as the product of the purchasing and keeping probabilities) is learned directly, resulting in lower overall regret. For both approaches, we extend the analysis to scenarios where return delays are dominated by a Pareto distribution. Finally, Chapter 5 summarizes the contributions and suggests directions for future research. / Dissertation / Doctor of Philosophy (PhD) Online Retail Return Management Demand Learning Multi-Armed Bandit Problem
4	Sélection séquentielle en environnement aléatoire appliquée à l'apprentissage supervisé [Sequential selection in a random environment applied to supervised learning] Caelen, Olivier 25 September 2009 (has links) This thesis addresses decision problems in which decisions must be made sequentially within a random environment. At each step of such a problem, one alternative must be selected from a set of alternatives. Each alternative has its own mean gain, and when one of them is selected it yields a random gain. The selection can pursue two types of objectives. In the first case, the trials aim to maximize the sum of the gains collected. A fair trade-off must then be found between exploitation and exploration. This problem is commonly referred to in the scientific literature as the "multi-armed bandit problem". In the second case, a maximum number of selections is imposed, and the objective is to allocate these selections so as to increase the chances of finding the alternative with the highest mean gain. This second problem is commonly referred to in the scientific literature as "selecting the best". Greedy selection plays an important role in solving these decision problems and operates by choosing the alternative that has so far proved optimal. However, the generally random nature of the environment makes the results of such a selection uncertain. In this thesis, we introduce a new quantity, called the "expected gain of a greedy action". Based on a few properties of this quantity, new algorithms for solving the two aforementioned decision problems are proposed. Particular attention is paid to applying the presented techniques to model selection in supervised machine learning.
Collaboration with the anesthesia department of the Hôpital Erasme allowed us to apply the proposed algorithms to real data from the medical field. We also developed a decision support system, a prototype of which has already been tested under real conditions on a small sample of patients. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished Informatique générale Sciences exactes et naturelles Decision making -- Data processing Supervised learning (Machine learning) Prise de décision -- Informatique multi-armed bandit problem Selecting the Best greedy
5	A Study of Thompson Sampling Approach for the Sleeping Multi-Armed Bandit Problem Chatterjee, Aritra January 2017 (has links) (PDF) The multi-armed bandit (MAB) problem provides a convenient abstraction for many online decision problems arising in modern applications including Internet display advertising, crowdsourcing, online procurement, smart grids, etc. Several variants of the MAB problem have been proposed to extend the basic model to a variety of practical and general settings. The sleeping multi-armed bandit (SMAB) problem is one such variant where the set of available arms varies with time. This study focuses on analyzing the efficacy of the Thompson Sampling algorithm for solving the SMAB problem. Any algorithm for the classical MAB problem is expected to choose one of K available arms (actions) in each of T consecutive rounds. Each choice of an arm generates a stochastic reward from an unknown but fixed distribution. The goal of the algorithm is to maximize the expected sum of rewards over the T rounds (or equivalently minimize the expected total regret), relative to the best fixed action in hindsight. In many real-world settings, however, not all arms may be available in any given round. For example, in Internet display advertising, some advertisers might choose to stay away from the auction due to budget constraints; in crowdsourcing, some workers may not be available at a given time due to time zone differences, etc. Such situations give rise to the sleeping MAB abstraction. In the literature, several upper confidence bound (UCB)-based approaches have been proposed and investigated for the SMAB problem. Our contribution is to investigate the efficacy of a Thompson Sampling-based approach. Our key result is a logarithmic regret bound, which non-trivially generalizes a similar bound known for this approach in the classical MAB setting. Our bound also matches (up to constants) the best-known lower bound for the SMAB problem. 
Furthermore, we show via detailed simulations that the Thompson Sampling approach in fact outperforms the known algorithms for the SMAB problem. Thompson Sampling Multi-Armed Bandit Problem Upper Confidence Bound (UCB) Awake Upper Estimated Reward Multi-Armed Bandit Algorithms Sleeping Multi-Armed Bandit Model TS-SMAB Computer Science
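As a rough illustration of the idea in this abstract, the sketch below applies Thompson Sampling when only a subset of arms is awake in each round. It is not the TS-SMAB algorithm from the thesis, whose details are not given here. It assumes Bernoulli rewards with uniform Beta(1, 1) priors, and the names `ts_select` and `simulate` are hypothetical.

```python
import random


def ts_select(available, successes, failures, rng):
    """Draw one posterior sample per awake arm and return the best arm.

    `available` must be a non-empty list of arm indices. Each arm's posterior
    is Beta(successes + 1, failures + 1), i.e. a uniform prior (assumption).
    """
    samples = {
        arm: rng.betavariate(successes[arm] + 1, failures[arm] + 1)
        for arm in available
    }
    return max(samples, key=samples.get)


def simulate(true_means, availability, seed=0):
    """Run Thompson Sampling over a given sequence of awake-arm sets."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    choices = []
    for available in availability:
        arm = ts_select(available, successes, failures, rng)
        reward = 1 if rng.random() < true_means[arm] else 0  # Bernoulli draw
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        choices.append(arm)
    return choices, successes, failures


# Arms 0..2 with hypothetical success probabilities; awake sets vary by round.
availability = [[0, 1], [1, 2], [2], [0, 2]] * 25
choices, successes, failures = simulate([0.2, 0.5, 0.8], availability)
print(successes, failures)
```

The only change from classical Thompson Sampling in this sketch is that the maximization runs over the arms awake in the current round rather than over all K arms.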
