11 |
RESPONSE ADAPTIVE CLINICAL TRIALS WITH CENSORED LIFETIMES. October 2013 (has links)
We have constructed a response adaptive clinical trial that treats patients sequentially in order to maximize the total survival time of all patients. Typically, response adaptive designs are based on urn models or on sequential estimation procedures, but in this dissertation we use a bandit process. The objective of a bandit process is to optimize a measure of sequential selections from several treatments. Each treatment consists of a sequence of conditionally independent and identically distributed random variables, and some of these treatments have unknown distribution functions. For the purpose of this clinical trial, we focus on a bandit process with delayed responses. These responses are lifetime variables that may be censored when observed. Following a Bayesian approach and a dynamic programming technique, we formulate a controlled stochastic dynamic model. In addition, we use an example to illustrate a possible application of the main results, and use "R" to implement a model simulation.
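As a rough illustration of the kind of allocation rule such a model can produce (not the dissertation's actual formulation), the sketch below applies Thompson-style sampling to two treatments with exponentially distributed lifetimes, Gamma priors on the hazard rates, and administrative right censoring. The treatment rates, priors, and censoring horizon are all made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two treatments with exponential lifetimes (unknown rates),
# a Gamma(a, b) prior on each rate, and administrative censoring at time c.
true_rates = [0.10, 0.05]          # treatment 2 has longer mean survival (assumed)
censor_time = 30.0                 # assumed censoring horizon
a = np.array([1.0, 1.0])           # Gamma shape (prior)
b = np.array([1.0, 1.0])           # Gamma rate (prior)
total_survival = 0.0

for patient in range(200):
    # Thompson-style step: sample a rate for each arm, prefer the smaller hazard.
    sampled_rates = rng.gamma(a, 1.0 / b)
    arm = int(np.argmin(sampled_rates))

    lifetime = rng.exponential(1.0 / true_rates[arm])
    observed = min(lifetime, censor_time)
    event = lifetime <= censor_time   # False means the lifetime is censored

    # Conjugate Gamma update for exponential data with right censoring:
    # the shape increases only on observed events, the rate by the exposure time.
    a[arm] += 1.0 if event else 0.0
    b[arm] += observed
    total_survival += observed

print("posterior mean rates:", a / b)
print("total (possibly censored) survival time:", round(total_survival, 1))
```

In this toy run the posterior concentrates on the treatment with the lower hazard, so most later patients are assigned to it, which is the qualitative behaviour a survival-maximizing design aims for.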
|
12 |
EFFECTS OF RESPONSE FREQUENCY CONSTRAINTS ON LEARNING IN A NON-STATIONARY MULTI-ARMED BANDIT TASK. Racey, Deborah Elaine, 01 December 2009 (has links)
An n-armed bandit task was used to investigate the trade-off between exploratory choice (selecting lesser-known options) and exploitative choice (selecting the options with the greatest probability of reinforcement) in a human trial-and-error learning problem. In Experiment 1, a different probability of reinforcement was assigned to each of 8 response options using random ratios (RRs), and participants chose by clicking buttons in a circular display on a computer screen using a computer mouse. Relative-frequency thresholds (ranging from .10 to 1.0) were randomly assigned to each participant and acted as task constraints limiting the proportion of total responses that could be attributed to any response option. Participants showed a preference for the richer keys, and those with greater constraints explored more and earned less reinforcement. Those with the highest constraints showed no preference, distributing their responses among the options with equal probability. In Experiment 2, the payoff probabilities changed partway through: for some participants the leanest options became the richest, and for others the richest became the leanest. When the RRs changed, participants in the decrease condition with moderate and low constraints showed immediate increases in exploration and shifted their preference to the new richest keys, while participants in the increase condition showed no increase in exploration and more gradual changes in preference. In Experiment 3, the constraint was held constant at .85, and the two richest options were decreased midway through the task by varying amounts (0 to .60). Decreases were detected early by participants in all but the smallest-decrease conditions, and exploration increased.
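A small simulation can make the role of the relative-frequency constraint concrete. The sketch below is only an illustration of the task structure described above, with made-up payoff probabilities, a single assumed threshold of .25, and a simple greedy chooser; it is not the experimental software used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)

probs = np.array([0.05, 0.10, 0.15, 0.20, 0.30, 0.45, 0.60, 0.80])  # assumed RR-like payoffs
threshold = 0.25         # assumed constraint: no option may exceed 25% of total responses
n_trials = 400

wins = np.zeros(8)
pulls = np.zeros(8)

for t in range(1, n_trials + 1):
    # An option is eligible only if choosing it keeps its share of responses at or below the threshold.
    eligible = (pulls + 1) / t <= threshold
    if not eligible.any():
        eligible[:] = True  # degenerate early trials: fall back to all options

    # Greedy on observed success rate among eligible options (optimistic prior of 1/2).
    est = (wins + 0.5) / (pulls + 1.0)
    est[~eligible] = -np.inf
    arm = int(np.argmax(est))

    reward = rng.random() < probs[arm]
    wins[arm] += reward
    pulls[arm] += 1

print("response proportions:", np.round(pulls / n_trials, 2))
print("total reinforcement earned:", int(wins.sum()))
```

With a tight threshold, the chooser is forced to spread responses across several options even after it has identified the richest keys, which mirrors the increased exploration and reduced earnings reported for highly constrained participants.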
|
13 |
A Study of Thompson Sampling Approach for the Sleeping Multi-Armed Bandit Problem. Chatterjee, Aritra, January 2017 (has links) (PDF)
The multi-armed bandit (MAB) problem provides a convenient abstraction for many online decision problems arising in modern applications including Internet display advertising, crowdsourcing, online procurement, smart grids, etc. Several variants of the MAB problem have been proposed to extend the basic model to a variety of practical and general settings. The sleeping multi-armed bandit (SMAB) problem is one such variant where the set of available arms varies with time. This study is focused on analyzing the efficacy of the Thompson Sampling algorithm for solving the SMAB problem.
Any algorithm for the classical MAB problem is expected to choose one of K available arms (actions) in each of T consecutive rounds. Each choice of an arm generates a stochastic reward from an unknown but fixed distribution. The goal of the algorithm is to maximize the expected sum of rewards over the T rounds (or, equivalently, to minimize the expected total regret), relative to the best fixed action in hindsight. In many real-world settings, however, not all arms may be available in any given round. For example, in Internet display advertising, some advertisers might choose to stay away from the auction due to budget constraints; in crowdsourcing, some workers may not be available at a given time due to time-zone differences. Such situations give rise to the sleeping MAB abstraction.
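For concreteness, the regret notions involved can be written out as follows (standard textbook forms, not quoted from the thesis): in the classical setting the comparison is against the single best arm, while in the sleeping setting the benchmark is the best arm available in each round.

```latex
% Classical MAB regret over T rounds, K arms with means \mu_1,\dots,\mu_K, chosen arm A_t:
R_T \;=\; T\,\max_{k \in [K]} \mu_k \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{A_t}\Big]
% Sleeping MAB regret, where S_t \subseteq [K] is the set of arms awake in round t:
R_T^{\mathrm{sleep}} \;=\; \mathbb{E}\Big[\sum_{t=1}^{T}\Big(\max_{k \in S_t} \mu_k \;-\; \mu_{A_t}\Big)\Big]
```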
In the literature, several upper confidence bound (UCB)-based approaches have been proposed and investigated for the SMAB problem. Our contribution is to investigate the efficacy of a Thompson Sampling-based approach. Our key finding is a logarithmic regret bound, which non-trivially generalizes a similar bound known for this approach in the classical MAB setting. Our bound also matches (up to constants) the best-known lower bound for the SMAB problem. Furthermore, we show via detailed simulations that the Thompson Sampling approach in fact outperforms the known algorithms for the SMAB problem.
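A minimal sketch of the kind of Thompson Sampling variant studied here, assuming Bernoulli rewards with Beta priors and an availability set supplied (here, randomly generated) each round, is given below; the exact algorithm and constants in the thesis may differ.

```python
import numpy as np

rng = np.random.default_rng(2)

K, T = 5, 2000
true_means = np.array([0.15, 0.30, 0.45, 0.60, 0.75])   # assumed arm means
alpha = np.ones(K)   # Beta prior: successes + 1
beta = np.ones(K)    # Beta prior: failures + 1
regret = 0.0

for t in range(T):
    # The set of awake arms varies with time; here it is drawn at random for illustration.
    awake = np.flatnonzero(rng.random(K) < 0.7)
    if awake.size == 0:
        continue

    # Thompson Sampling restricted to the awake arms.
    samples = rng.beta(alpha[awake], beta[awake])
    arm = awake[int(np.argmax(samples))]

    reward = rng.random() < true_means[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward

    # Sleeping regret compares against the best *available* arm in this round.
    regret += true_means[awake].max() - true_means[arm]

print("cumulative sleeping regret:", round(regret, 1))
```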
|
14 |
A figura do bandido no Novo Cinema Brasileiro / The outlaw image in New Brazilian Cinema. Tau, Ana Claudia, 30 August 2005 (has links)
Advisor: Marcius Cesar Soares Freire / Master's dissertation, Universidade Estadual de Campinas, Instituto de Artes
Abstract: The present study aims at understanding how the image of the villain is composed in recent Brazilian movies, comparing it with the figure of the social criminal in Cinema Novo (New Cinema) and Marginal Cinema. This work starts from the presupposition that there is a new reading of the social outlaw figure of the sixties in current Brazilian movies. As it develops, it seeks to grasp the contemporary scenario of our movie industry that, prompted by the Audiovisual Law of 1993, has been facing new aesthetic, marketing and technological challenges derived from the globalization process. Four movies were chosen for analysis, all of them with an urban theme and with two Brazilian metropolises as locations: São Paulo and Rio de Janeiro. The movies whose stories take place in Rio de Janeiro are A Grande Cidade (The Great City, Cacá Diegues, 1966) and O Primeiro Dia (The First Day, Walter Salles and Daniela Thomas, 1999). O Bandido da Luz Vermelha (The Red Light Bandit, Rogério Sganzerla, 1968) and O Invasor (The Invader, Beto Brant, 2001) have São Paulo as the stage for the events. / Master's degree in Multimeios (Multimedia)
|
15 |
Apprentissage automatique pour la prise de décisions / Machine learning for decision-making under uncertainty. Sani, Amir, 12 May 2015 (has links)
Strategic decision-making over valuable resources should consider risk-averse objectives. Many practical areas of application consider risk as central to decision-making. However, machine learning does not. As a result, research should provide insights and algorithms that endow machine learning with the ability to consider decision-theoretic risk. In particular, this means estimating decision-theoretic risk on short dependent sequences generated from the most general possible class of processes for statistical inference, and incorporating decision-theoretic risk objectives into sequential decision-making. This thesis studies these two problems in order to provide principled algorithmic methods for considering decision-theoretic risk in machine learning. An algorithm with state-of-the-art performance is introduced for accurate estimation of risk statistics on the most general class of stationary ergodic processes, and risk-averse objectives are introduced into sequential decision-making (online learning) in both the stochastic multi-armed bandit setting and the adversarial full-information setting.
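To give a flavour of what a risk-averse objective changes in the bandit setting, the sketch below scores each arm by a lower-tail average (an empirical conditional value-at-risk) instead of the plain mean. This is a generic illustration under made-up reward distributions and a fixed tail level, not the estimator or algorithm developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(3)

def empirical_cvar(samples, level=0.2):
    """Average of the worst `level` fraction of observed rewards (lower tail)."""
    cutoff = np.quantile(samples, level)
    tail = samples[samples <= cutoff]
    return tail.mean() if tail.size else samples.mean()

# Two assumed arms: same mean reward, very different risk.
def draw(arm):
    if arm == 0:
        return rng.normal(1.0, 0.1)          # low variance
    return rng.normal(1.0, 2.0)              # high variance, occasionally very bad

history = [list(rng.normal(1.0, s, 5)) for s in (0.1, 2.0)]  # small warm-up per arm

for t in range(500):
    scores = [empirical_cvar(np.asarray(h)) for h in history]
    arm = int(np.argmax(scores))             # risk-averse choice: best lower tail
    history[arm].append(draw(arm))

print("pulls per arm:", [len(h) for h in history])  # the low-variance arm dominates
```

A mean-maximizing learner would treat the two arms as interchangeable; the tail-based criterion separates them, which is the basic point of bringing decision-theoretic risk into the objective.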
|
16 |
Online Learning for Resource Allocation in Wireless Networks: Fairness, Communication Efficiency, and Data Privacy. Li, Fengjiao, 13 December 2022 (has links)
As the Next-Generation (NextG, 5G and beyond) wireless network supports a wider range of services, optimization of resource allocation plays a crucial role in ensuring efficient use of the (limited) available network resources. Note that resource allocation may require knowledge of network parameters (e.g., channel state information and available power level) for packet scheduling. However, wireless networks operate in an uncertain environment where, in many practical scenarios, these parameters are unknown before decisions are made. In the absence of network parameters, a network controller, who performs resource allocation, may have to make decisions (aimed at optimizing network performance and satisfying users' QoS requirements) while learning. To that end, this dissertation studies two novel online learning problems that are motivated by autonomous resource management in NextG.
Key contributions of the dissertation are two-fold. First, we study reward maximization under uncertainty with fairness constraints, which is motivated by wireless scheduling with Quality of Service constraints (e.g., a minimum delivery ratio requirement) under uncertainty. We formulate a framework of combinatorial bandits with fairness constraints and develop a fair learning algorithm that successfully addresses the tradeoff between reward maximization and fairness constraints. This framework can also be applied to several other real-world applications, such as online advertising and crowdsourcing. Second, we consider global reward maximization under uncertainty with distributed biased feedback, which is motivated by the problem of cellular network configuration for optimizing network-level performance (e.g., average user-perceived Quality of Experience). We study both linear-parameterized and non-parametric global reward functions, which are modeled as distributed linear bandits and kernelized bandits, respectively. For each model, we propose a learning algorithmic framework that can be integrated with different differential privacy models. We show that the proposed algorithms can achieve a near-optimal regret in a communication-efficient manner while protecting users' data privacy "for free". Our findings reveal that our developed algorithms outperform the state-of-the-art solutions in terms of the tradeoff among regret, communication efficiency, and computation complexity. In addition, our proposed models and online learning algorithms can also be applied to several other real-world applications, e.g., dynamic pricing and public policy making, which may be of independent interest to a broader research community. / Doctor of Philosophy / As the Next-Generation (NextG) wireless network supports a wider range of services, optimization of resource allocation plays a crucial role in ensuring efficient use of the (limited) available network resources. Note that resource allocation may require knowledge of network parameters (e.g., channel state information and available power level) for packet scheduling. However, wireless networks operate in an uncertain environment where, in many practical scenarios, these parameters are unknown before decisions are made. In the absence of network parameters, a network controller, who performs resource allocation, may have to make decisions (aimed at optimizing network performance and satisfying users' QoS requirements) while learning. To that end, this dissertation studies two novel online learning problems that are motivated by resource allocation in the presence of uncertainty in NextG.
Key contributions of the dissertation are two-fold. First, we study reward maximization under uncertainty with fairness constraints, which is motivated by wireless scheduling with Quality of Service constraints (e.g., a minimum delivery ratio requirement) under uncertainty. We formulate a framework of combinatorial bandits with fairness constraints and develop a fair learning algorithm that successfully addresses the tradeoff between reward maximization and fairness constraints. This framework can also be applied to several other real-world applications, such as online advertising and crowdsourcing. Second, we consider global reward maximization under uncertainty with distributed biased feedback, which is motivated by the problem of cellular network configuration for optimizing network-level performance (e.g., average user-perceived Quality of Experience). We consider both linear-parameterized and non-parametric (unknown) global reward functions, which are modeled as distributed linear bandits and kernelized bandits, respectively. For each model, we propose a learning algorithmic framework that integrates different privacy models according to different privacy requirements or scenarios. We show that the proposed algorithms can learn the unknown functions in a communication-efficient manner while protecting users' data privacy "for free". Our findings reveal that our developed algorithms outperform the state-of-the-art solutions in terms of the tradeoff among regret, communication efficiency, and computation complexity. In addition, our proposed models and online learning algorithms can also be applied to several other real-world applications, e.g., dynamic pricing and public policy making, which may be of independent interest to a broader research community.
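One common way to combine bandit learning with minimum-service (fairness) constraints is to pair a UCB index with a virtual queue that tracks each user's service deficit; the sketch below is a generic illustration of that idea under assumed delivery probabilities, target rates, and weight V, not the specific algorithm proposed in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(4)

K, T = 4, 5000
success = np.array([0.9, 0.7, 0.5, 0.3])   # assumed per-user delivery probabilities
min_rate = np.array([0.1, 0.1, 0.2, 0.2])  # assumed minimum fraction of slots per user
V = 10.0                                   # weight on reward vs. fairness (assumed)

pulls = np.ones(K)
wins = np.zeros(K)
queues = np.zeros(K)                       # virtual queues: accumulated service deficit

for t in range(1, T + 1):
    ucb = wins / pulls + np.sqrt(2 * np.log(t) / pulls)
    # Schedule the user maximizing (scaled UCB reward + queue backlog), a Lyapunov-style rule.
    arm = int(np.argmax(V * ucb + queues))

    reward = rng.random() < success[arm]
    wins[arm] += reward
    pulls[arm] += 1

    # Each user's deficit grows by its target rate and shrinks when served.
    served = np.zeros(K)
    served[arm] = 1.0
    queues = np.maximum(queues + min_rate - served, 0.0)

print("service fractions:", np.round((pulls - 1) / T, 3))
print("target minimums:  ", min_rate)
```

The queue term keeps under-served users from being starved even when their estimated rewards are low, which is the qualitative tradeoff between reward maximization and fairness that the abstract describes.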
|
17 |
REVENUE AND RETURN MANAGEMENT IN E-COMMERCE. Shahmardan, Amin, January 2025 (has links)
In this dissertation, we explore the intersection of return management and dynamic pricing strategies in online retailing. The dissertation consists of five chapters. In Chapter 1, we present the research motivations and provide an overview of the problems studied. Chapter 2 investigates the "Returnless Refund" policy, a novel return strategy in which retailers offer a full refund without requiring customers to return the product. We show that the optimal returnless refund policy takes the form of a threshold policy, offering a returnless refund when the salvage value of the returned product is below a positive threshold. This method allows retailers to decide efficiently between granting a returnless refund and taking the product back for resale. It is also shown that, for items with a high expected salvage value, this threshold-based policy is advantageous for both retailers and customers. We also show that in the early stages of policy implementation, when customers are unaware of returnless refunds, a naive policy is optimal, but the threshold rises as customer awareness increases. Furthermore, our findings show that dishonest customers, who may falsely request a return to exploit this policy, can enhance the retailer's profits when the price exceeds a certain level. In Chapter 3, we study a conservative dynamic pricing problem with demand learning in the presence of covariates, where the demand function follows a generalized linear model. We address managers' concerns about transitioning from a legacy pricing system to a learning-based approach, focusing on the risk of revenue loss. We propose two dynamic pricing models. The first, a stage-wise safe model, ensures that the instantaneous expected revenue from algorithmic pricing matches or exceeds a fraction of the baseline policy's revenue in each period. Using a modified UCB algorithm, we show that the regret of this model is composed of two parts: the regret from the learning process and the regret from applying perturbed baseline prices. The second, a cumulative-revenue-safe model, extends this by ensuring that the algorithm's cumulative revenue meets a target compared to the baseline. Our analysis shows that the algorithm uses the baseline prices a finite number of times, even when the expected revenue of the baseline prices must be learned, offering a balance between exploration and revenue-safety constraints.
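The threshold structure of the optimal returnless-refund policy lends itself to a very small decision rule. The sketch below is only a schematic reading of that structure, with invented numbers; in the dissertation the threshold is derived from the model primitives rather than fixed by hand.

```python
from dataclasses import dataclass

@dataclass
class ReturnRequest:
    price: float            # price the customer paid
    salvage_value: float    # expected resale value of the returned item, net of processing
    return_shipping: float  # cost of taking the item back

def handle_return(req: ReturnRequest, threshold: float) -> str:
    """Threshold-policy sketch: refund without return when the salvage value that
    could be recovered by taking the item back falls below a positive threshold."""
    if req.salvage_value < threshold:
        return "refund, customer keeps the item (returnless)"
    return "refund against physical return"

# Hypothetical examples: a cheap bulky item vs. a high-salvage electronics item.
print(handle_return(ReturnRequest(price=15.0, salvage_value=2.0, return_shipping=6.0), threshold=5.0))
print(handle_return(ReturnRequest(price=400.0, salvage_value=250.0, return_shipping=8.0), threshold=5.0))
```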
Chapter 4 addresses a dynamic pricing problem where customers can return products within a specified grace window, and purchasing and returning probabilities are unknown. We propose two approaches: in the first approach, the retailer learns the probabilities separately, leading to a higher regret due to censored data from return decisions. The second approach focuses on joint learning, where the final demand, calculated as the product of purchasing and keeping probabilities, is learned directly, resulting in lower overall regret. For both approaches, we extend the analysis to scenarios where return delays are dominated by a Pareto distribution. Finally, Chapter 5 summarizes the contributions and suggests directions for future research. / Dissertation / Doctor of Philosophy (PhD)
|
18 |
Algorithmes stochastiques pour l'apprentissage, l'optimisation et l'approximation du régime stationnaire / Stochastic algorithms for learning, optimization and approximation of the stationary regime. Saadane, Sofiane, 02 December 2016 (has links)
In this thesis, we study several topics around stochastic algorithms, and for this reason the manuscript opens with general background and historical results that set the framework of our work. We then study a bandit algorithm stemming from the work of Narendra and Shapiro, whose objective is to determine, among several sources, which one benefits the user the most while avoiding spending too much time testing the less rewarding ones. Our goal is first to understand the structural weaknesses of this algorithm and then to propose an optimal procedure with respect to a quantity measuring the performance of a bandit algorithm, the regret. In our results, we propose an algorithm called over-penalized NS, which attains a regret bound that is optimal in the minimax sense, through a fine study of the stochastic algorithm underlying the procedure. A second contribution gives rates of convergence for the process appearing in the study of the convergence in law of the over-penalized NS algorithm. The particularity of this algorithm is that it does not converge in law toward a diffusion, as most stochastic algorithms do, but toward a non-diffusive jump process, which makes the study of convergence to equilibrium more technical; we use a coupling technique to study this convergence and obtain rates. The second part of this thesis falls within the scope of optimizing a function by means of a stochastic algorithm. We study a stochastic version of the deterministic heavy-ball method with friction. The particularity of this algorithm is that its dynamics relies on an averaging over the whole past of the trajectory. The procedure involves a so-called memory function which, depending on the form it takes, yields interesting behaviors; in our framework, two types of memory turn out to be relevant: exponential and polynomial. We first establish convergence results in the general case where the function to be minimized is non-convex. In the case of strongly convex functions, we obtain rates of convergence that are optimal in a sense we define. Finally, a convergence-in-law result is given for the suitably renormalized process, in the case of exponential memory. The third part deals with the McKean-Vlasov equations, which were introduced by Anatoly Vlasov and first studied by Henry McKean with a view to modeling the distribution function of plasma. Our objective is to propose a stochastic algorithm able to approximate the invariant measure of the process. Methods for approximating an invariant measure are known for diffusions and for some other processes, but the particularity of the McKean-Vlasov process is that it is not a linear diffusion: like the heavy-ball process, it has memory. We therefore develop an alternative approach and introduce the notion of asymptotic pseudo-trajectories in order to obtain an efficient procedure.
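To make the role of the memory term more tangible, the following sketch implements a generic stochastic heavy-ball-style iteration with an exponentially weighted average of past gradients. A simple quadratic objective, the noise level, the memory parameter, and the step-size schedule are all made-up assumptions; the thesis's exact averaging scheme and step-size conditions may differ.

```python
import numpy as np

rng = np.random.default_rng(5)

def noisy_grad(x):
    """Gradient of f(x) = 0.5 * ||x||^2 observed with additive Gaussian noise."""
    return x + 0.1 * rng.standard_normal(x.shape)

x = np.full(2, 5.0)      # starting point (assumed)
memory = np.zeros(2)     # averaged past gradients: the "memory" driving the dynamics
lam = 0.9                # exponential memory: recent gradients dominate (assumed)

for n in range(1, 5001):
    gamma = 1.0 / (n + 10)               # decreasing step size (assumed schedule)
    memory = lam * memory + (1 - lam) * noisy_grad(x)
    x = x - gamma * memory               # heavy-ball flavour: move along the averaged direction

print("final iterate:", np.round(x, 3))  # should approach the minimizer at the origin
```

Replacing the exponential weighting by a slowly decaying (polynomial) one changes how strongly old gradients keep influencing the trajectory, which is exactly the distinction between the two memory regimes discussed above.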
|
19 |
Analyse de stratégies bayésiennes et fréquentistes pour l'allocation séquentielle de ressources / Analysis of Bayesian and frequentist strategies for sequential resource allocation. Kaufmann, Emilie, 01 October 2014 (has links)
In this thesis, we study strategies for sequential resource allocation under the so-called stochastic multi-armed bandit model. In this model, when an agent draws an arm, he receives as a reward a realization from a probability distribution associated to the arm. We consider two different bandit problems. In the reward maximization objective, the agent aims at maximizing the sum of rewards obtained during his interaction with the bandit, whereas in the best arm identification objective, his goal is to find the set of m best arms (i.e., arms with the highest mean reward), without suffering a loss when drawing 'bad' arms. For these two objectives, we propose strategies, also called bandit algorithms, that are optimal (or close to optimal) in a sense made precise below. Maximizing the sum of rewards is equivalent to minimizing a quantity called regret. Thanks to an asymptotic lower bound on the regret of any uniformly efficient algorithm given by Lai and Robbins, one can define asymptotically optimal algorithms as algorithms whose regret reaches this lower bound. In this thesis, we propose, for two Bayesian algorithms, Bayes-UCB and Thompson Sampling, a finite-time analysis, that is, a non-asymptotic upper bound on their regret, in the particular case of bandits with binary rewards. This upper bound allows us to establish the asymptotic optimality of both algorithms. In the best arm identification framework, a possible goal is to determine the number of samples of the arms needed to identify, with high probability, the set of m best arms, without the constraint of maximizing the sum of observations. We define two complexity terms for best arm identification in the two settings considered in the literature, which correspond to a fixed budget or a fixed confidence level. We provide new lower bounds on these complexities, and we analyse new algorithms, some of which reach the lower bounds in particular cases of two-armed bandit models and can therefore be called optimal.
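For reference, the two Bayesian strategies analysed here admit very compact implementations for Bernoulli rewards with uniform Beta(1,1) priors. The sketch below follows the usual textbook presentation; the arm means are invented and the quantile schedule used for Bayes-UCB is simplified relative to the one analysed in the thesis.

```python
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(6)
means = np.array([0.3, 0.5, 0.7])     # assumed Bernoulli arm means
K, T = len(means), 3000

def run(policy):
    a, b = np.ones(K), np.ones(K)     # Beta(1, 1) priors on each arm's mean
    regret = 0.0
    for t in range(1, T + 1):
        if policy == "thompson":
            idx = rng.beta(a, b)                              # one posterior sample per arm
        else:  # simplified Bayes-UCB: upper posterior quantile of level 1 - 1/t
            idx = beta_dist.ppf(1.0 - 1.0 / t, a, b)
        arm = int(np.argmax(idx))
        r = rng.random() < means[arm]
        a[arm] += r
        b[arm] += 1 - r
        regret += means.max() - means[arm]
    return regret

for policy in ("thompson", "bayes-ucb"):
    print(policy, "cumulative regret:", round(run(policy), 1))
```

Both rules exploit the same Beta posterior; Thompson Sampling randomizes over it while Bayes-UCB uses a deterministic upper quantile, and both exhibit the logarithmic regret growth that the finite-time analysis formalizes.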
|
20 |
Edge Compute Offloading Strategies using Heuristic and Reinforcement Learning Techniques. Dikonimaki, Chrysoula, January 2023 (has links)
The emergence of 5G alongside the distributed computing paradigm called edge computing has prompted a tremendous change in the industry through the opportunity for reducing network latency and energy consumption and providing scalability. Edge computing extends the capabilities of users' resource-constrained devices by placing data centers at the edge of the network. Computation offloading enables edge computing by allowing the migration of users' tasks to edge servers. Deciding whether it is beneficial for a mobile device to offload a task, and to which server to offload it, while environmental variables such as availability, load, and network quality change dynamically, is a challenging problem that requires careful consideration to achieve better performance. This project focuses on proposing lightweight and efficient algorithms that make offloading decisions from the mobile device's perspective, to the benefit of the user. Heuristic techniques are first examined as a way to find quick but sub-optimal solutions. These techniques are then combined with a multi-armed bandit algorithm, called Discounted Upper Confidence Bound (DUCB), to make optimal decisions quickly. The findings indicate that the heuristic approaches alone cannot handle the dynamicity of the problem, whereas DUCB can adapt to changing circumstances without having to keep adding extra parameters. Overall, the DUCB algorithm performs better in terms of local energy consumption and can improve service time in most cases.
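As a point of reference for the DUCB rule mentioned above, a generic discounted-UCB index can be sketched as follows; the code uses standard formulas from the non-stationary bandit literature with assumed parameters and a toy reward model, and is not the project's actual offloading implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

K, T, gamma = 3, 3000, 0.98     # gamma < 1 geometrically discounts old observations (assumed)
disc_pulls = np.zeros(K)        # discounted pull counts
disc_rewards = np.zeros(K)      # discounted reward sums

def latency_reward(arm, t):
    """Made-up non-stationary environment: the best edge server changes halfway through."""
    base = [0.7, 0.5, 0.3] if t < T // 2 else [0.3, 0.5, 0.7]
    return float(rng.random() < base[arm])

for t in range(T):
    disc_pulls *= gamma
    disc_rewards *= gamma
    n_total = disc_pulls.sum()

    if t < K:
        arm = t                             # initialize by trying each server once
    else:
        mean = disc_rewards / np.maximum(disc_pulls, 1e-12)
        bonus = np.sqrt(2 * np.log(n_total) / np.maximum(disc_pulls, 1e-12))
        arm = int(np.argmax(mean + bonus))  # discounted UCB index

    r = latency_reward(arm, t)
    disc_pulls[arm] += 1.0
    disc_rewards[arm] += r

print("discounted reward estimates per server:", np.round(disc_rewards / disc_pulls, 2))
```

Because old observations are discounted away, the index tracks the mid-run switch in which server gives the best reward, which is the adaptivity to changing circumstances highlighted in the abstract.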
|