We study dynamic decision-making problems in networks and markets under uncertainty about future payoffs. This problem is difficult in general since 1) Although the current decision (potentially) affects future decisions, the decision-maker does not have exact information on the future payoffs when he/she commits to the current decision; 2) The decision made at one part of the network usually interacts with the decisions made at the other parts of the network, which makes the computation scales very fast with the network size and brings computational challenges in practice. In this thesis, we propose computationally efficient methods to solve dynamic optimization problems on markets and networks, specify a general set of conditions under which the proposed methods give theoretical guarantees on global near-optimality, and further provide numerical studies to verify the performance empirically. The proposed methods/algorithms have a general theme as “local algorithms”, meaning that the decision at each node/agent on the network uses only partial information on the network.
In the first part of this thesis, we consider a network model with stochastic uncertainty about future payoffs. The network has a bounded degree, and each node takes a discrete decision at each period, leading to a per-period payoff which is a sum of three parts: node rewards for individual node decisions, temporal interactions between individual node decisions from the current and previous periods, and spatial interactions between decisions from pairs of neighboring nodes. The objective is to maximize the expected total payoffs over a finite horizon. We study a natural decentralized algorithm (whose computational requirement is linear in the network size and planning horizon) and prove that our decentralized algorithm achieves global near-optimality when temporal and spatial interactions are not dominant compared to the randomness in node rewards. Decentralized algorithms are parameterized by the locality parameter L: An L-local algorithm makes its decision at each node v based on current and (simulated) future payoffs only up to L periods ahead, and only in an L-radius neighborhood around v. Given any permitted error ε > 0, we show that our proposed L-local algorithm with L = O(log(1/ε)) has an average per-node-per- period optimality gap bounded above by ε, in networks where temporal and spatial interactions are not dominant. This constitutes the first theoretical result establishing the global near-optimality of a local algorithm for network dynamic optimization.
In the second part of this thesis, we consider the previous three types of payoff functions under adversarial uncertainty about the future. In general, there are no performance guarantees for arbitrary payoff functions. We consider an additional convexity structure in the individual node payoffs and interaction functions, which helps us leverage the tools in the broad Online Convex Optimization literature. In this work, we study the setting where there is a trade-off between developing future predictions for a longer lookahead horizon, denoted as k versus increasing spatial radius for decentralized computation, denoted as r. When deciding individual node decisions at each time, each node has access to predictions of local cost functions for the next k time steps in an r-hop neighborhood. Our work proposes a novel online algorithm, Localized Predictive Control (LPC), which generalizes predictive control to multi-agent systems. We show that LPC achieves a competitive ratio approaching to 1 exponentially fast in ρT and ρS in an adversarial setting, where ρT and ρS are constants in (0, 1) that increase with the relative strength of temporal and spatial interaction costs, respectively. This is the first competitive ratio bound on decentralized predictive control for networked online convex optimization. Further, we show that the dependence on k and r in our results is near-optimal by lower bounding the competitive ratio of any decentralized online algorithm.
In the third part of this work, we consider a general dynamic matching model for online competitive gaming platforms. Players arrive stochastically with a skill attribute, the Elo rating. The distribution of Elo is known and i.i.d across players. However, the individual’s rating is only observed upon arrival. Matching two players with different skills incurs a match cost. The goal is tominimize a weighted combination of waiting costs and matching costs in the system. We investigate a popular heuristic used in industry to trade-off between these two costs, the Bubble algorithm. The algorithm places arriving players on the Elo line with a growing bubble around them. When two bubbles touch, the two players get matched. We show that, with the optimal bubble expansion rate, the Bubble algorithm achieves a constant factor ratio against the offline optimal cost when the match cost (resp. waiting cost) is a power of Elo difference (resp. waiting time). We use players’ activity logs data from a gaming start-up to validate our approach and further provide guidance on how to tune the Bubble expansion rate in practice.
Identifer | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/6eja-wd96 |
Date | January 2023 |
Creators | Gan, Yuanling |
Source Sets | Columbia University |
Language | English |
Detected Language | English |
Type | Theses |
Page generated in 0.0026 seconds