1

Online Learning for Resource Allocation in Wireless Networks: Fairness, Communication Efficiency, and Data Privacy

Li, Fengjiao 13 December 2022 (has links)
As the Next-Generation (NextG, 5G and beyond) wireless network supports a wider range of services, optimization of resource allocation plays a crucial role in ensuring efficient use of the (limited) available network resources. Note that resource allocation may require knowledge of network parameters (e.g., channel state information and available power levels) for packet scheduling. However, wireless networks operate in an uncertain environment where, in many practical scenarios, these parameters are unknown before decisions are made. In the absence of network parameters, a network controller, which performs resource allocation, may have to make decisions (aimed at optimizing network performance and satisfying users' QoS requirements) while learning. To that end, this dissertation studies two novel online learning problems that are motivated by autonomous resource management in NextG. The key contributions of the dissertation are twofold. First, we study reward maximization under uncertainty with fairness constraints, which is motivated by wireless scheduling with Quality of Service constraints (e.g., a minimum delivery ratio requirement) under uncertainty. We formulate a framework of combinatorial bandits with fairness constraints and develop a fair learning algorithm that successfully addresses the tradeoff between reward maximization and fairness constraints. This framework can also be applied to several other real-world applications, such as online advertising and crowdsourcing. Second, we consider global reward maximization under uncertainty with distributed biased feedback, which is motivated by the problem of cellular network configuration for optimizing network-level performance (e.g., average user-perceived Quality of Experience). We study both linear-parameterized and non-parametric global reward functions, which are modeled as distributed linear bandits and kernelized bandits, respectively. For each model, we propose a learning algorithmic framework that can be integrated with different differential privacy models. We show that the proposed algorithms can achieve near-optimal regret in a communication-efficient manner while protecting users' data privacy "for free". Our findings reveal that the developed algorithms outperform state-of-the-art solutions in terms of the tradeoff among regret, communication efficiency, and computational complexity. In addition, our proposed models and online learning algorithms can also be applied to several other real-world applications, e.g., dynamic pricing and public policy making, which may be of independent interest to a broader research community.
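To ground the first contribution, below is a minimal sketch of one standard recipe for combinatorial bandits with fairness constraints: UCB exploration combined with Lyapunov-style virtual queues that track each arm's fairness debt. The arm means, minimum selection fractions, and tradeoff weight are all made up for illustration; this is a sketch of the general technique, not the dissertation's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

N, m, T = 5, 2, 20000            # N arms, choose m per round, horizon T (hypothetical)
mu = rng.uniform(0.2, 0.9, N)    # unknown Bernoulli reward means
r = np.full(N, 0.15)             # required minimum selection fraction per arm
eta = 10.0                       # weight trading fairness against reward

Q = np.zeros(N)                  # virtual queues: accumulated fairness debt
counts = np.zeros(N)             # number of times each arm was pulled
means = np.zeros(N)              # empirical mean reward per arm

for t in range(1, T + 1):
    # UCB index: empirical mean plus exploration bonus (untried arms first)
    bonus = np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts > 0, means + bonus, np.inf)
    # Drift-plus-penalty rule: queue backlog boosts starved arms
    chosen = np.argsort(Q + eta * ucb)[-m:]
    # Observe Bernoulli rewards for the chosen arms and update statistics
    x = (rng.random(N) < mu).astype(float)
    for i in chosen:
        counts[i] += 1
        means[i] += (x[i] - means[i]) / counts[i]
    # Queue update: debt grows by r_i each round, shrinks when arm i is served
    served = np.zeros(N)
    served[chosen] = 1
    Q = np.maximum(Q + r - served, 0)

print("selection fractions:", counts / T)   # each should end up >= ~0.15
print("required fractions :", r)
```

The virtual queues play the role of the fairness constraints: whenever an arm falls behind its required fraction, its queue grows until the index rule schedules it again.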
2

Navigating Uncertainty: Distributed and Bandit Solutions for Equilibrium Learning in Multiplayer Games

Yuanhanqing Huang (18361527) 15 April 2024 (has links)
<p dir="ltr">In multiplayer games, a collection of self-interested players aims to optimize their individual cost functions in a non-cooperative manner. The cost function of each player depends not only on its own actions but also on the actions of others. In addition, players' actions may also collectively satisfy some global constraints. The study of this problem has grown immensely in the past decades with applications arising in a wide range of societal systems, including strategic behaviors in power markets, traffic assignment of strategic risk-averse users, engagement of multiple humanitarian organizations in disaster relief, etc. Furthermore, with machine learning models playing an increasingly important role in practical applications, the robustness of these models becomes another prominent concern. Investigation into the solutions of multiplayer games and Nash equilibrium problems (NEPs) can advance the algorithm design for fitting these models in the presence of adversarial noises. </p><p dir="ltr">Most of the existing methods for solving multiplayer games assume the presence of a central coordinator, which, unfortunately, is not practical in many scenarios. Moreover, in addition to couplings in the objectives and the global constraints, all too often, the objective functions contain uncertainty in the form of stochastic noises and unknown model parameters. The problem is further complicated by the following considerations: the individual objectives of players may be unavailable or too complex to model; players may exhibit reluctance to disclose their actions; players may experience random delays when receiving feedback regarding their actions. To contend with these issues and uncertainties, in the first half of the thesis, we develop several algorithms based on the theory of operator splitting and stochastic approximation, where the game participants only share their local information and decisions with their trusted neighbors on the network. In the second half of the thesis, we explore the bandit online learning framework as a solution to the challenges, where decisions made by players are updated based solely on the realized objective function values. Our future work will delve into data-driven approaches for learning in multiplayer games and we will explore functional representations of players' decisions, in a departure from the vector form. </p>
3

On the Value of Prediction and Feedback for Online Decision Making With Switching Costs

Ming Shi (12621637) 01 June 2022 (has links)
Online decision making with switching costs has received considerable attention in many practical problems that face uncertainty in the inputs and key problem parameters. Because switching costs penalize changes of decision, making good online decisions under such uncertainty is known to be extremely challenging. This thesis aims to provide new online algorithms with strong performance guarantees to address this challenge.

In parts 1 and 2 of this thesis, motivated by Network Functions Virtualization and the smart grid, we study competitive online convex optimization with switching costs. Specifically, in part 1, we focus on the setting with an uncertainty set (one type of prediction) and hard infeasibility constraints. We develop new online algorithms that attain optimized competitive ratios while ensuring feasibility at all times. Moreover, we design a robustification procedure that helps these algorithms simultaneously obtain good average-case performance. In part 2, we focus on the setting with look-ahead (another type of prediction). We provide the first algorithm that attains a competitive ratio that not only decreases to 1 as the look-ahead window size increases, but also remains upper-bounded for any ratio between the switching-cost coefficient and the service-cost coefficient.

In part 3 of this thesis, motivated by edge computing with artificial intelligence, we study bandit learning with switching costs where, in addition to bandit feedback, full feedback can be requested at a cost. We show that, when only 1 arm can be chosen at a time, adding costly full feedback does not help fundamentally reduce the Θ(T^{2/3}) regret over a time horizon T. In contrast, when 2 (or more) arms can be chosen at a time, we provide a new online learning algorithm that achieves a significantly smaller regret of O(√T), without even using full feedback. To the best of our knowledge, this type of sharp transition from choosing 1 arm to choosing 2 (or more) arms has never been reported in the literature.
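To illustrate the look-ahead setting in part 2, here is a toy receding-horizon sketch of online convex optimization with switching costs: at each slot the decision maker runs a small dynamic program over a discretized action grid, minimizing service cost plus switching cost across the next w slots. The quadratic cost model, grid, and parameters are assumptions for illustration; the sketch does not reproduce the thesis's competitive-ratio guarantees.

```python
import numpy as np

rng = np.random.default_rng(2)

T, beta = 300, 2.0                          # horizon, switching-cost weight (hypothetical)
theta = np.cumsum(rng.normal(0, 0.3, T))    # drifting per-slot cost minimizers
grid = np.linspace(theta.min() - 1, theta.max() + 1, 60)  # discretized actions

def service(x, t):
    """Per-slot service cost (toy quadratic model)."""
    return (x - theta[t]) ** 2

def choose(x_prev, t, w):
    """Pick the current action by backward induction over a w-slot window."""
    H = min(w, T - t)
    V = np.zeros(len(grid))                 # cost-to-go beyond the window
    move = beta * np.abs(grid[None, :] - grid[:, None])  # move[j, k]: switch cost j -> k
    for h in range(H - 1, 0, -1):
        V = np.min(move + (service(grid, t + h) + V)[None, :], axis=1)
    now = service(grid, t) + beta * np.abs(grid - x_prev) + V
    return grid[np.argmin(now)]

for w in (1, 5):                            # w = 1 is the greedy (no look-ahead) baseline
    x, total = 0.0, 0.0
    for t in range(T):
        x_new = choose(x, t, w)
        total += service(x_new, t) + beta * abs(x_new - x)
        x = x_new
    print(f"look-ahead w={w}: total cost {total:.1f}")
```

Running both window sizes shows the pattern the competitive-ratio results formalize: a larger look-ahead window lets the algorithm pre-position itself and pay less total cost.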
