Return to search

Online and offline learning in operations

Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, September, 2020 / Cataloged from PDF version of thesis. / Includes bibliographical references (pages 213-219). / With the rapid advancement of information technology and accelerated development of data science, the importance of integrating data into decision-making has never been stronger. In this thesis, we propose data-driven algorithms to incorporate learning from data in three operations problems, concerning both online learning and offline learning settings. First, we study a single product pricing problem with demand censoring in an offline data-driven setting. In this problem, a retailer is given a finite level of inventory, and faces a random demand that is price sensitive in a linear fashion with unknown parameters and distribution. Any unsatisfied demand is lost and unobservable. The retailer's objective is to use offline censored demand data to find an optimal price, maximizing her expected revenue with finite inventories. / We characterize an exact condition for the identifiability of near-optimal algorithms, and propose a data-driven algorithm that guarantees near-optimality in the identifiable case and approaches best-achievable optimality gap in the unidentifiable case. Next, we study the classic multi-period joint pricing and inventory control problem in an offline data-driven setting. We assume the demand functions and noise distributions are unknown, and propose a data-driven approximation algorithm, which uses offline demand data to solve the joint pricing and inventory control problem. We establish a polynomial sample complexity bound, the number of data samples needed to guarantee a near-optimal profit. A simulation study suggests that the data-driven algorithm solves the dynamic program effectively. Finally, we study an online learning problem for product selection in urban warehouses managed by fast-delivery retailers. We distill the problem into a semi-bandit model with linear generalization. / There are n products, each with a feature vector of dimension T. In each of the T periods, a retailer selects K products to offer, where T is much greater than T or b. We propose an online learning algorithm that iteratively shrinks the upper confidence bounds within each period. Compared to the standard UCB algorithm, we prove the new algorithm reduces the most dominant regret term by a factor of d, and experiments on datasets from Alibaba Group suggest it lowers the total regret by at least 10%.. / by Li Wang. / Ph. D. / Ph.D. Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center

Identiferoai:union.ndltd.org:MIT/oai:dspace.mit.edu:1721.1/129080
Date January 2020
CreatorsWang, Li,Ph D.Massachusetts Institute of Technology.
ContributorsDavid Simchi-Levi., Massachusetts Institute of Technology. Operations Research Center., Massachusetts Institute of Technology. Operations Research Center
PublisherMassachusetts Institute of Technology
Source SetsM.I.T. Theses and Dissertation
LanguageEnglish
Detected LanguageEnglish
TypeThesis
Format219 pages, application/pdf
RightsMIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided., http://dspace.mit.edu/handle/1721.1/7582

Page generated in 0.0024 seconds