1. Machine Learning Based Energy Efficient Bandwidth Optimization / Maskininlärning Baserad energieffektiv bandbredd optimering. Mao, Jingxuan, January 2024.
Long Term Evolution (LTE) is a fourth-generation broadband wireless access technology. As LTE networks continue to expand, their energy consumption has become a concern due to rising costs and environmental impact. To address this issue, this thesis presents a novel approach to energy-efficient LTE network operation that optimizes cell bandwidth deployment. Ericsson provides only historical data extracted from a real-world LTE network, so the model must learn patterns by exploiting that data. The task is formulated as an offline contextual bandit problem, and offline learning and evaluation algorithms are implemented. In addition, the bandwidth recommendation engine incorporates two agents, an energy-driven agent and a teacher agent, enabling energy savings while ensuring uninterrupted network service. Users can customize the desired percentage of energy savings by adjusting a hyper-parameter that determines the relative weight of the two agents in the final bandwidth decision. Through comprehensive offline evaluation, the model demonstrates its ability to conserve energy without compromising network performance.
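The abstract does not spell out the combination rule for the two agents; a minimal sketch of how a hyper-parameter-weighted blend of an energy-driven agent and a teacher agent could produce the final bandwidth decision (all names, scores, and the linear blend are illustrative assumptions, not the thesis's actual engine):

```python
import numpy as np

def recommend_bandwidth(energy_scores: np.ndarray,
                        teacher_scores: np.ndarray,
                        alpha: float) -> int:
    """Blend per-bandwidth scores from the two agents.

    alpha = 1.0 trusts only the energy-driven agent (maximal savings);
    alpha = 0.0 trusts only the teacher agent (mimics the live network).
    The linear blend and all names here are illustrative assumptions.
    """
    combined = alpha * energy_scores + (1.0 - alpha) * teacher_scores
    return int(np.argmax(combined))

# Hypothetical scores for candidate bandwidths [5, 10, 15, 20] MHz.
energy = np.array([0.9, 0.6, 0.3, 0.1])    # prefers narrow bandwidth
teacher = np.array([0.1, 0.4, 0.8, 0.7])   # prefers the historical choice
print(recommend_bandwidth(energy, teacher, alpha=0.3))  # index of chosen bandwidth
```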
2. Tutoring Students with Adaptive Strategies. Wan, Hao, 18 January 2017.
Adaptive learning is a crucial part of intelligent tutoring systems. It provides students with appropriate tutoring interventions, based on students' characteristics, status, and other related features, in order to optimize their learning outcomes. The system must determine students' knowledge level or learning progress, and then use proper techniques to choose the optimal interventions. In this dissertation, I focus on three aspects of this adaptive learning process: student modeling, k-armed bandits, and contextual bandits.

Student modeling. The main objective of student modeling is to develop cognitive models of students, including modeling content skills and knowledge about learning. In this work, we investigate the effect of prerequisite skills in predicting students' knowledge of subsequent skills, and we make use of prerequisite performance in different student models. As a result, these models outperform traditional ones.

K-armed bandits. We apply k-armed bandit algorithms to personalize interventions for students and optimize their learning outcomes. Because educational experiments often lack diverse interventions and the differences in intervention effectiveness are small, we also propose a simple selection strategy and compare it with several k-armed bandit algorithms.

Contextual bandits. In the contextual bandit problem, additional side information, also called context, can be used to determine which action to select. First, we construct a feature evaluation mechanism that determines which features to combine with the bandits. Second, we propose a new decision tree algorithm capable of detecting aptitude-treatment effects for students. Third, combining the bandits with the decision tree, we apply contextual bandits to personalize on two types of data: simulated data and real experimental data.
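The abstract does not name the specific k-armed bandit algorithms compared; as a point of reference, a minimal sketch of the classic UCB1 strategy applied to choosing among tutoring interventions (the intervention set, reward model, and success rates below are hypothetical):

```python
import math
import random

def ucb1_select(counts, means, t):
    """Pick the intervention with the highest UCB1 index."""
    for k, n in enumerate(counts):
        if n == 0:
            return k  # try every intervention once first
    return max(range(len(counts)),
               key=lambda k: means[k] + math.sqrt(2 * math.log(t) / counts[k]))

# Simulated tutoring loop: 3 interventions with unknown success rates.
true_rates = [0.3, 0.5, 0.6]          # hypothetical effectiveness
counts, means = [0, 0, 0], [0.0, 0.0, 0.0]
for t in range(1, 1001):
    k = ucb1_select(counts, means, t)
    reward = 1.0 if random.random() < true_rates[k] else 0.0
    counts[k] += 1
    means[k] += (reward - means[k]) / counts[k]  # incremental mean update
print(counts)  # most pulls should concentrate on the best intervention
```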
3. New Paradigms and Optimality Guarantees in Statistical Learning and Estimation. Wang, Yu-Xiang, 1 December 2017.
Machine learning (ML) has become one of the most powerful classes of tools for artificial intelligence, personalized web services, and data science problems across fields. Within the field of machine learning itself, there have been quite a number of paradigm shifts caused by the explosion of data size, computing power, modeling tools, and the new ways people collect, share, and make use of data sets. Data privacy, for instance, was much less of a problem before the availability of personal information online that could be used to identify users in anonymized data sets. Images, videos, and observations generated over social networks often have highly localized structures that cannot be captured by standard nonparametric models. Moreover, the "common task framework" adopted by many sub-disciplines of AI has made it possible for many people to collaboratively and repeatedly work on the same data set, leading to implicit overfitting on public benchmarks. In addition, data collected in many internet services, e.g., web search and targeted ads, are not i.i.d., but rather feedback specific to the deployed algorithm. This thesis presents technical contributions under a number of new mathematical frameworks designed to partially address these new paradigms.

• Firstly, we consider the problem of statistical learning with privacy constraints. Under Vapnik's general learning setting and the formalism of differential privacy (DP), we establish simple conditions that characterize private learnability, revealing a mixture of positive and negative insights. We then identify generic methods that reuse existing randomness to effectively solve private learning in practice, and discuss weaker notions of privacy that allow for a more favorable privacy-utility tradeoff.

• Secondly, we develop several generalizations of trend filtering, a locally adaptive nonparametric regression technique that is minimax optimal in 1D, to the multivariate setting and to graphs. We also study specific instances of these problems more closely, e.g., total variation denoising on d-dimensional grids, and the results reveal interesting statistical-computational trade-offs.

• Thirdly, we investigate two problems in sequential interactive learning: (a) off-policy evaluation in contextual bandits, which aims to use data collected from one algorithm to evaluate the performance of a different algorithm; and (b) adaptive data analysis, which uses randomization to prevent adversarial data analysts from a form of "p-hacking" through multiple steps of sequential data access.

In the above problems, we provide not only performance guarantees for the algorithms but also certain notions of optimality. Wherever applicable, careful empirical studies on synthetic and real data are also included.
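For context on problem (a), the basic unbiased estimator that off-policy evaluation work builds on is inverse propensity scoring (IPS); a minimal sketch of that standard estimator only, not the thesis's refined estimators or their optimality analysis:

```python
import numpy as np

def ips_value(logged, target_policy):
    """Inverse propensity scoring (IPS): unbiased off-policy estimate of a
    target policy's value from data logged under a known behavior policy.

    logged: iterable of (context, action, reward, behavior_prob) tuples.
    target_policy(context, action) -> action probability under the target.
    """
    terms = [target_policy(x, a) / p * r for (x, a, r, p) in logged]
    return float(np.mean(terms))

# Toy log: a uniform behavior policy over two actions (all probs are 0.5).
logged = [("u1", 0, 1.0, 0.5), ("u2", 1, 0.0, 0.5), ("u3", 0, 1.0, 0.5)]
always_zero = lambda x, a: 1.0 if a == 0 else 0.0  # target always plays action 0
print(ips_value(logged, always_zero))  # 1.33...; high variance on tiny samples
```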
4. Beyond Disagreement-based Learning for Contextual Bandits. Pinaki Ranjan Mohanty (16522407), 26 July 2023.
While instance-dependent contextual bandits have been studied before, their analysis has been limited exclusively to pure disagreement-based learning. That approach lacks a nuanced understanding of disagreement and treats it in a binary, absolute manner. In our work, we broaden the analysis of instance-dependent contextual bandits by studying them under the framework of disagreement-based learning in sub-regions. This framework allows a more comprehensive examination of disagreement by considering its varying degrees across different sub-regions.

To lay the foundation for our analysis, we introduce key ideas and measures widely studied in the contextual bandit and disagreement-based active learning literature. We then propose a novel, instance-dependent contextual bandit algorithm for the realizable case in a transductive setting. Leveraging the ability to observe contexts in advance, our algorithm employs a sophisticated Linear Programming subroutine to identify and exploit sub-regions effectively. Next, we provide a series of results tying together the previously introduced complexity measures and offer some insightful discussion of them. Finally, we improve the existing regret bounds for contextual bandits by integrating the sub-region disagreement coefficient, demonstrating a significant performance improvement over the pure disagreement-based approach.

In the concluding section of this thesis, we briefly recap the work and suggest potential future directions for further improving contextual bandit algorithms within the framework of disagreement-based learning in sub-regions. These directions offer opportunities for further research and development, aiming to refine and enhance the effectiveness of contextual bandit algorithms in practical applications.
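To make the central notion concrete: in disagreement-based learning, a context lies in the disagreement region when plausible policies in the version space prescribe different actions; sub-region analysis refines this by measuring disagreement separately across parts of the context space. A toy illustration (the threshold policies and context grid are invented for the example, not taken from the thesis):

```python
import numpy as np

def disagreement_mask(policies, contexts):
    """Flag each context on which at least two policies in the (finite,
    hypothetical) version space recommend different actions."""
    actions = np.stack([pi(contexts) for pi in policies])  # (n_policies, n_contexts)
    return (actions != actions[0]).any(axis=0)

# Two invented threshold policies over a 1-D context space.
policies = [lambda x: (x > 0.4).astype(int),
            lambda x: (x > 0.6).astype(int)]
contexts = np.linspace(0.0, 1.0, 11)
mask = disagreement_mask(policies, contexts)
print(contexts[mask])  # contexts between the thresholds: the disagreement region
# A sub-region analysis would measure how often and how strongly the policies
# disagree within each part of the context space, instead of treating the
# whole disagreement region as uniformly uncertain.
```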
5. Contributions to Multi-Armed Bandits: Risk-Awareness and Sub-Sampling for Linear Contextual Bandits / Contributions aux bandits manchots : gestion du risque et sous-échantillonnage pour les bandits contextuels linéaires. Galichet, Nicolas, 28 September 2015.
This thesis focuses on sequential decision making in unknown environments, and more particularly on the multi-armed bandit (MAB) setting, defined by Lai and Robbins in the 1950s. During the last decade, many theoretical and algorithmic studies have addressed the exploration vs. exploitation tradeoff at the core of MABs, where exploitation is biased toward the best options visited so far while exploration is biased toward options rarely visited, to enforce the discovery of the true best choices. MAB applications range from medicine (the elicitation of the best prescriptions) to e-commerce (recommendations, advertisements) and optimal policies (e.g., in the energy domain).
The contributions presented in this dissertation tackle the exploration vs. exploitation dilemma from two angles. The first contribution is centered on risk avoidance. Exploration in unknown environments often has adverse effects: for instance, exploratory trajectories of a robot can entail physical damage to the robot or its environment. We thus define the exploration vs. exploitation vs. safety (EES) tradeoff, and propose three new algorithms addressing the EES dilemma. Firstly, under strong assumptions, the MIN algorithm provides a robust behavior with guarantees of logarithmic regret, matching the state of the art with high robustness w.r.t. hyper-parameter setting (as opposed to, e.g., UCB (Auer 2002)). Secondly, the MARAB algorithm aims at optimizing the cumulative Conditional Value at Risk (CVaR) of rewards, a criterion originating from the economics domain, with excellent empirical performance compared to (Sani et al. 2012), though without theoretical guarantees. Finally, the MARABOUT algorithm modifies the CVaR estimation and yields both theoretical guarantees and good empirical behavior. The second contribution concerns the contextual bandit setting, where additional information is provided to support the decision making, such as user details in the content recommendation domain, or the patient history in the medical domain. The study focuses on how to make a choice between two arms with different numbers of samples. Traditionally, a confidence region is derived for each arm based on the associated samples, and the 'optimism in the face of the unknown' principle implements the choice of the arm with maximal upper confidence bound. An alternative, pioneered by (Baransi et al. 2014) and called BESA, proceeds instead by subsampling without replacement from the larger sample set. In this framework, we designed a contextual bandit algorithm based on sub-sampling without replacement, relaxing the (unrealistic) assumption that all arm reward distributions rely on the same parameter. The CL-BESA algorithm yields both theoretical guarantees of logarithmic regret and good empirical behavior.
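The sub-sampling duel at the heart of BESA (Baransi et al. 2014), on which CL-BESA builds, can be sketched in a few lines; this shows only the basic two-arm comparison, not the thesis's CL-BESA algorithm or its contextual extension:

```python
import random

def besa_choose(history_a, history_b):
    """One BESA duel (after Baransi et al. 2014): sub-sample the
    better-observed arm without replacement so both arms are compared on
    the same number of samples, then pick the higher empirical mean.
    Returns 0 for arm A, 1 for arm B."""
    if len(history_a) > len(history_b):
        return 1 - besa_choose(history_b, history_a)
    sub_b = random.sample(history_b, len(history_a))  # without replacement
    mean_a = sum(history_a) / len(history_a)
    mean_b = sum(sub_b) / len(sub_b)
    return 0 if mean_a >= mean_b else 1  # ties favor the less-pulled arm

# Arm A pulled twice, arm B four times; B is sub-sampled down to two draws.
print(besa_choose([0.4, 0.6], [0.5, 0.9, 0.1, 0.7]))
```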
6. Statistical Design of Sequential Decision Making Algorithms. Chi-hua Wang (12469251), 27 April 2022.
Sequential decision-making is a fundamental class of problems that motivates algorithm design in online machine learning and reinforcement learning. Arguably, the resulting online algorithms have supported modern online service industries in their data-driven, real-time automated decision making. The applications span different industries, including dynamic pricing (marketing), recommendation (advertising), and dosage finding (clinical trials). In this dissertation, we contribute fundamental statistical design advances for sequential decision-making algorithms, advancing the theory and application of online learning and sequential decision making under uncertainty, including online sparse learning, finite-armed bandits, and high-dimensional online decision making. Our work lies at the intersection of decision-making algorithm design, online statistical machine learning, and operations research, contributing new algorithms, theory, and insights to diverse fields including optimization, statistics, and machine learning.
In part I, we contribute a theoretical framework of continuous risk monitoring for regularized online statistical learning. Such a framework is desirable for modern online service industries that need to monitor the performance of deployed online machine learning models. In the first project (Chapter 1), we develop continuous risk monitoring for the online Lasso procedure and provide an always-valid algorithm for high-dimensional dynamic pricing problems. In the second project (Chapter 2), we develop continuous risk monitoring for online matrix regression and provide new algorithms for rank-constrained online matrix completion problems. These theoretical advances stem from an elegant interplay between non-asymptotic martingale concentration theory and regularized online statistical machine learning.
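For orientation, the object being monitored in Chapter 1 is an online Lasso estimator; a generic proximal-SGD sketch of such an estimator (the update rule, step size, and penalty below are standard textbook choices, not the dissertation's specific procedure, and the always-valid risk monitor itself is not reproduced here):

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of the L1 penalty (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def online_lasso_step(beta, x, y, lr=0.01, lam=0.1):
    """One proximal-SGD update for squared loss plus an L1 penalty."""
    grad = (beta @ x - y) * x           # gradient of 0.5 * (x'beta - y)^2
    return soft_threshold(beta - lr * grad, lr * lam)

# Stream drawn from a sparse linear model (all quantities invented).
rng = np.random.default_rng(0)
true_beta = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
beta = np.zeros(5)
for _ in range(5000):
    x = rng.normal(size=5)
    y = true_beta @ x + 0.1 * rng.normal()
    beta = online_lasso_step(beta, x, y)
print(np.round(beta, 2))  # approximately recovers the sparse signal
```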
In part II, we contribute a bootstrap-based methodology for finite-armed bandit problems, termed residual bootstrap exploration. This method opens the possibility of designing model-agnostic bandit algorithms without problem-adaptive optimism engineering or instance-specific prior tuning. In the first project (Chapter 3), we develop residual bootstrap exploration for multi-armed bandit algorithms and show that it generalizes easily to bandit problems with complex or ambiguous reward structures. In the second project (Chapter 4), we develop a theoretical framework for residual bootstrap exploration in linear bandits with a fixed action set. These methodological advances are due to our development of non-asymptotic theory for the bootstrap procedure.
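A simplified sketch of the core idea, perturbing each arm's empirical mean with resampled residuals so the bootstrap noise itself drives exploration; the actual algorithms in Chapters 3 and 4 add refinements (such as variance inflation for rarely pulled arms) that this toy version omits:

```python
import numpy as np

def reboot_index(rewards, rng):
    """Residual-bootstrap index for one arm: the empirical mean perturbed
    by the mean of residuals resampled with replacement, so the bootstrap
    randomness supplies the exploration."""
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean()
    residuals = rewards - mean
    boot = rng.choice(residuals, size=len(residuals), replace=True)
    return mean + boot.mean()

rng = np.random.default_rng(1)
histories = [[0.2, 0.4, 0.1], [0.8, 0.5], [0.3]]  # toy per-arm reward histories
arm = int(np.argmax([reboot_index(h, rng) for h in histories]))
print(arm)  # next arm to pull; indices are re-randomized on every round
```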
In part III, we contribute application-driven insights on the exploration-exploitation dilemma in high-dimensional online decision-making problems. These insights help practitioners implement effective high-dimensional statistical methods for online decision-making problems. In the first project (Chapter 5), we develop a bandit sampling scheme for online batch high-dimensional decision making, a practical scenario in interactive marketing and sequential clinical trials. In the second project (Chapter 6), we develop a bandit sampling scheme for federated online high-dimensional decision making that maintains data decentralization and supports collaborative decisions. These new insights are due to our new bandit sampling designs, which address application-driven exploration-exploitation trade-offs effectively.