11

RESPONSE ADAPTIVE CLINICAL TRIALS WITH CENSORED LIFETIMES

October 2013 (has links)
We construct a response adaptive clinical trial that treats patients sequentially in order to maximize the total survival time of all patients. Typically, response adaptive designs are based on urn models or on sequential estimation procedures; in this dissertation we use a bandit process instead. The objective of a bandit process is to optimize a measure of sequential selections from several treatments, where each treatment consists of a sequence of conditionally independent and identically distributed random variables, some with unknown distribution functions. For the purpose of this clinical trial, we focus on a bandit process with delayed responses: the responses are lifetime variables that may be censored upon observation. Following a Bayesian approach and a dynamic programming technique, we formulate a controlled stochastic dynamic model. In addition, we use an example to illustrate a possible application of the main results, and implement a model simulation in R.
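The conjugate structure behind such a design is easy to sketch. The following is a minimal illustration, not the dissertation's actual model (which handles delayed responses via dynamic programming): exponential lifetimes with a Gamma prior on each treatment's hazard rate, censored observations contributing exposure time only, and Thompson-style sampling to assign the next patient. All parameter values are assumptions.

```python
# Hypothetical sketch: Thompson-style allocation between treatments whose
# responses are exponential lifetimes subject to right-censoring. With an
# Exponential(rate) likelihood and a Gamma(a, b) prior on the rate, the
# posterior update is conjugate: an observed death adds (1, t) to (a, b),
# while an observation censored at time c adds (0, c).
import random

class Treatment:
    def __init__(self, a=1.0, b=1.0):
        self.a, self.b = a, b          # Gamma posterior over the hazard rate

    def sample_mean_survival(self):
        rate = random.gammavariate(self.a, 1.0 / self.b)
        return 1.0 / rate              # mean lifetime of Exponential(rate)

    def update(self, time, censored):
        self.a += 0 if censored else 1  # censoring contributes exposure only
        self.b += time

def assign_patient(treatments):
    # Thompson sampling: give the next patient the treatment whose sampled
    # posterior mean survival is largest.
    draws = [t.sample_mean_survival() for t in treatments]
    return max(range(len(treatments)), key=lambda i: draws[i])

# Toy simulation: true mean survivals 2.0 and 4.0, fixed censoring time 5.0.
random.seed(0)
true_means, c_time = [2.0, 4.0], 5.0
arms = [Treatment(), Treatment()]
for _ in range(200):
    i = assign_patient(arms)
    t = random.expovariate(1.0 / true_means[i])
    arms[i].update(min(t, c_time), censored=(t > c_time))
print([(round(a.a, 1), round(a.b, 1)) for a in arms])
```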
12

EFFECTS OF RESPONSE FREQUENCY CONSTRAINTS ON LEARNING IN A NON-STATIONARY MULTI-ARMED BANDIT TASK

Racey, Deborah Elaine 01 December 2009 (has links)
An n-armed bandit task was used to investigate the trade-off between exploratory (choosing lesser-known options) and exploitative (choosing options with the greatest probability of reinforcement) human choice in a trial-and-error learning problem. In Experiment 1 a different probability of reinforcement was assigned to each of 8 response options using random ratios (RRs), and participants chose by clicking buttons in a circular display on a computer screen using a computer mouse. Relative frequency thresholds (ranging from .10 to 1.0) were randomly assigned to participants and acted as task constraints limiting the proportion of total responses that could be allocated to any one response option. Preference for the richer keys was shown, and participants with greater constraints explored more and earned less reinforcement. Those with the highest constraints showed no preference, distributing their responses among the options with equal probability. In Experiment 2 the payoff probabilities changed partway through the task: for some participants the leanest options became the richest, and for others the richest became the leanest. When the RRs changed, participants in the decrease condition with moderate and low constraints showed immediate increases in exploration and shifted their preference to the new richest keys, while participants in the increase condition showed no increase in exploration and more gradual changes in preference. For Experiment 3 the constraint was held constant at .85, and the two richest options were decreased midway through the task by varying amounts (0 to .60). Decreases were detected early by participants in all but the smallest-decrease conditions, and exploration increased.
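To make the task structure concrete, here is a hedged sketch of the mechanics of Experiment 1: eight options on probabilistic payoff schedules and a relative-frequency cap on each option's share of responses. The epsilon-greedy chooser merely stands in for a participant, and the payoff probabilities, threshold, and epsilon are invented for illustration.

```python
# Hypothetical sketch of the Experiment 1 task: eight options with fixed
# reinforcement probabilities, plus a relative-frequency constraint capping
# the share of total responses any one option may receive.
import random

random.seed(1)
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.80]  # assumed payoffs
threshold, trials, eps = 0.25, 1000, 0.1
counts = [0] * 8
rewards = [0] * 8

for t in range(1, trials + 1):
    # Options at the frequency cap are unavailable this trial.
    allowed = [i for i in range(8) if counts[i] < threshold * t]
    est = [rewards[i] / counts[i] if counts[i] else 1.0 for i in range(8)]
    if random.random() < eps:
        choice = random.choice(allowed)       # explore among allowed options
    else:
        choice = max(allowed, key=lambda i: est[i])  # exploit best estimate
    counts[choice] += 1
    rewards[choice] += random.random() < probs[choice]

print("response proportions:", [round(c / trials, 2) for c in counts])
```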
13

A Study of Thompson Sampling Approach for the Sleeping Multi-Armed Bandit Problem

Chatterjee, Aritra January 2017 (has links) (PDF)
The multi-armed bandit (MAB) problem provides a convenient abstraction for many online decision problems arising in modern applications, including Internet display advertising, crowdsourcing, online procurement, and smart grids. Several variants of the MAB problem have been proposed to extend the basic model to a variety of practical and general settings. The sleeping multi-armed bandit (SMAB) problem is one such variant, in which the set of available arms varies with time. This study is focused on analyzing the efficacy of the Thompson Sampling algorithm for solving the SMAB problem. Any algorithm for the classical MAB problem is expected to choose one of K available arms (actions) in each of T consecutive rounds. Each choice of an arm generates a stochastic reward from an unknown but fixed distribution. The goal of the algorithm is to maximize the expected sum of rewards over the T rounds (or, equivalently, to minimize the expected total regret) relative to the best fixed action in hindsight. In many real-world settings, however, not all arms may be available in any given round. For example, in Internet display advertising, some advertisers might stay away from the auction due to budget constraints; in crowdsourcing, some workers may not be available at a given time due to timezone differences. Such situations give rise to the sleeping MAB abstraction. In the literature, several upper confidence bound (UCB)-based approaches have been proposed and investigated for the SMAB problem. Our contribution is to investigate the efficacy of a Thompson Sampling-based approach. Our key result is a logarithmic regret bound, which non-trivially generalizes a similar bound known for this approach in the classical MAB setting. Our bound also matches (up to constants) the best-known lower bound for the SMAB problem. Furthermore, we show via detailed simulations that the Thompson Sampling approach in fact outperforms the known algorithms for the SMAB problem.
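For readers unfamiliar with the approach, here is a minimal sketch of Thompson Sampling adapted to the sleeping setting (Bernoulli rewards, Beta posteriors, argmax restricted to the arms awake in each round). This is the generic template, not the thesis's exact algorithm or analysis; the arm means and availability pattern are assumptions.

```python
# Sketch: Thompson Sampling for sleeping bandits with Bernoulli rewards.
# Each round, sample from the Beta posterior of every *available* arm and
# play the argmax among them; update that arm's posterior with the reward.
import random

def sleeping_thompson(mus, availability, T, seed=0):
    rng = random.Random(seed)
    K = len(mus)
    alpha, beta = [1] * K, [1] * K        # Beta(1, 1) priors
    total = 0
    for t in range(T):
        awake = availability(t)            # arms available at round t
        draws = {i: rng.betavariate(alpha[i], beta[i]) for i in awake}
        i = max(draws, key=draws.get)
        r = rng.random() < mus[i]
        alpha[i] += r
        beta[i] += 1 - r
        total += r
    return total

# Toy run: arm 2 is best but only awake on even rounds (assumed pattern).
mus = [0.3, 0.5, 0.8]
avail = lambda t: [0, 1, 2] if t % 2 == 0 else [0, 1]
print(sleeping_thompson(mus, avail, T=5000))
```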
14

A figura do bandido no Novo Cinema Brasileiro / The outlaw image in New Brazilian Cinema

Tau, Ana Claudia 30 August 2005 (has links)
Advisor: Marcius Cesar Soares Freire / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Artes / Abstract: The present study aims at understanding how the image of the outlaw is composed in recent Brazilian movies, comparing it with the figure of the social criminal in Cinema Novo and Cinema Marginal. This work starts from the presupposition that there is a new reading of the social-outlaw figure of the sixties in current Brazilian movies. As it develops, it seeks to grasp the contemporary scenario of our movie industry which, prompted by the Audiovisual Law of 1993, has been facing new aesthetic, marketing, and technological challenges deriving from the globalization process. Four movies were chosen for analysis, all of them with an urban theme and two Brazilian metropolises as locations: São Paulo and Rio de Janeiro. The movies whose stories take place in Rio de Janeiro are A Grande Cidade (The Great City) (Cacá Diegues, 1966) and O Primeiro Dia (The First Day) (Walter Salles and Daniela Thomas, 1999); O Bandido da Luz Vermelha (The Red Light Bandit) (Rogério Sganzerla, 1968) and O Invasor (The Trespasser) (Beto Brant, 2001) have São Paulo as the stage for the events. / Master's degree in Multimedia (Multimeios)
15

Apprentissage automatique pour la prise de décisions / Machine learning for decision-making under uncertainty

Sani, Amir 12 May 2015 (has links)
Strategic decision-making over valuable resources should take risk-averse objectives into account. Many practical application areas treat risk as central to decision-making; machine learning, however, does not. Research should therefore provide tools and algorithms that endow machine learning with the ability to account for decision-theoretic risk: in particular, to estimate risk statistics on short dependent sequences generated by the most general possible class of stochastic processes, and to incorporate risk-averse objectives into sequential decision-making. This thesis studies these two problems and provides principled algorithmic methods for considering decision-theoretic risk in machine learning. An algorithm with state-of-the-art performance is introduced for accurate estimation of risk statistics on the most general class of stationary ergodic processes, and risk-averse objectives are introduced into sequential decision-making (online learning) in both the stochastic multi-armed bandit setting and the adversarial full-information setting.
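One simple way to see how risk aversion changes a bandit decision rule is to score arms by an empirical Conditional Value-at-Risk (CVaR), the mean of the worst alpha-fraction of observed rewards, instead of the empirical mean. The sketch below is a generic illustration under assumed reward distributions, not the estimator or the algorithms developed in the thesis.

```python
# Hedged illustration: choose arms by empirical CVaR rather than by mean.
# A risk-neutral learner would prefer arm 1 (higher mean); a risk-averse
# CVaR criterion prefers arm 0 because arm 1 has a heavy left tail.
import random

def empirical_cvar(samples, alpha=0.2):
    # Mean of the worst alpha-fraction of observed rewards.
    worst = sorted(samples)[: max(1, int(alpha * len(samples)))]
    return sum(worst) / len(worst)

random.seed(2)
# Arm 0: safe, modest mean. Arm 1: higher mean but rare large losses (assumed).
draw = [lambda: random.gauss(0.5, 0.05),
        lambda: random.gauss(0.6, 0.05) if random.random() < 0.9 else -1.0]
history = [[arm() for _ in range(30)] for arm in draw]   # warm-up pulls

for t in range(500):
    scores = [empirical_cvar(h) for h in history]
    i = max(range(2), key=lambda k: scores[k])           # greedy on CVaR
    history[i].append(draw[i]())

print("pull counts:", [len(h) - 30 for h in history],
      "CVaRs:", [round(empirical_cvar(h), 3) for h in history])
```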
16

Online Learning for Resource Allocation in Wireless Networks: Fairness, Communication Efficiency, and Data Privacy

Li, Fengjiao 13 December 2022 (has links)
As the Next-Generation (NextG, 5G and beyond) wireless network supports a wider range of services, optimization of resource allocation plays a crucial role in ensuring efficient use of the (limited) available network resources. Resource allocation may require knowledge of network parameters (e.g., channel state information and available power levels) for packet scheduling. However, wireless networks operate in an uncertain environment where, in many practical scenarios, these parameters are unknown before decisions are made. In the absence of network parameters, a network controller, which performs resource allocation, may have to make decisions (aimed at optimizing network performance and satisfying users' QoS requirements) while learning. To that end, this dissertation studies two novel online learning problems motivated by autonomous resource management in NextG. Key contributions of the dissertation are two-fold. First, we study reward maximization under uncertainty with fairness constraints, motivated by wireless scheduling with Quality of Service constraints (e.g., a minimum delivery ratio requirement) under uncertainty. We formulate a framework of combinatorial bandits with fairness constraints and develop a fair learning algorithm that successfully addresses the tradeoff between reward maximization and fairness constraints. This framework can also be applied to several other real-world applications, such as online advertising and crowdsourcing. Second, we consider global reward maximization under uncertainty with distributed biased feedback, motivated by the problem of cellular network configuration for optimizing network-level performance (e.g., average user-perceived Quality of Experience). We study both linear-parameterized and non-parametric global reward functions, modeled as distributed linear bandits and kernelized bandits, respectively. For each model, we propose a learning algorithmic framework that can be integrated with different differential privacy models. We show that the proposed algorithms achieve near-optimal regret in a communication-efficient manner while protecting users' data privacy "for free". Our findings reveal that the developed algorithms outperform state-of-the-art solutions in terms of the tradeoff among regret, communication efficiency, and computation complexity. In addition, our proposed models and online learning algorithms can also be applied to several other real-world applications, e.g., dynamic pricing and public policy making, which may be of independent interest to a broader research community. / Doctor of Philosophy
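The fairness idea in the first contribution can be illustrated with one simple (assumed) form of the constraint: each arm must receive at least a given fraction of pulls. The sketch below is a minimal stand-in, not the dissertation's combinatorial algorithm; arm means and quotas are invented.

```python
# Sketch of a fairness-constrained bandit: if some arm is behind its pull
# quota f[i] * t it is served first; otherwise play standard UCB1.
import math
import random

def fair_ucb(mus, f, T, seed=3):
    rng = random.Random(seed)
    K = len(mus)
    n, s = [0] * K, [0.0] * K              # pull counts and reward sums
    for t in range(1, T + 1):
        lagging = [i for i in range(K) if n[i] < f[i] * t]   # below quota
        if lagging:
            i = min(lagging, key=lambda k: n[k] - f[k] * t)  # most behind
        else:
            i = max(range(K),
                    key=lambda k: s[k] / n[k]
                    + math.sqrt(2 * math.log(t) / n[k]))     # UCB1 index
        r = 1.0 if rng.random() < mus[i] else 0.0
        n[i] += 1
        s[i] += r
    return n

# Every arm is guaranteed >= 10% of pulls; the slack goes to the best arm.
print(fair_ucb(mus=[0.2, 0.4, 0.7], f=[0.1, 0.1, 0.1], T=10000))
```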
17

Algorithmes stochastiques pour l'apprentissage, l'optimisation et l'approximation du régime stationnaire / Stochastic algorithms for learning, optimization and approximation of the steady regime

Saadane, Sofiane 02 December 2016 (has links)
In this thesis, we study several topics around stochastic algorithms, and we therefore begin the manuscript with general background on these algorithms, giving historical results that lay the foundations for our work. We then study a bandit algorithm due to Narendra and Shapiro, whose objective is to determine, among several sources, which one benefits the user the most while avoiding spending too much time testing the less rewarding ones. Our goal is first to understand the structural weaknesses of this algorithm, and then to propose an optimal procedure for the quantity that measures a bandit algorithm's performance, the regret. We propose an algorithm called over-penalized NS, which attains a minimax-optimal regret bound through a fine study of the stochastic algorithm underlying the procedure. A second contribution gives rates of convergence for the process that appears in the study of the convergence in law of the over-penalized NS algorithm. A particularity of this algorithm is that it does not converge in law toward a diffusion, as most stochastic algorithms do, but toward a non-diffusive jump process, which makes the study of convergence to equilibrium more technical; we employ a coupling technique to study this convergence. The second part of the thesis falls within the scope of optimizing a function by means of a stochastic algorithm. We study a stochastic version of the deterministic heavy-ball method with friction. The particularity of this algorithm is that its dynamics rely on an averaging over the whole past of the trajectory, through a so-called memory function whose form dictates the behavior of the procedure. In our study, two types of memory are relevant: exponential and polynomial. We first establish convergence results in the general case where the function to be minimized is non-convex. For strongly convex functions, we obtain convergence rates that are optimal in a sense we make precise. The study concludes with a convergence-in-law result for the suitably renormalized process. The third part concerns McKean-Vlasov processes, introduced by Anatoly Vlasov and first studied by Henry McKean to model the distribution function of plasma. Our objective is to propose a stochastic algorithm capable of approximating the invariant measure of the process. Methods for approximating an invariant measure are known for diffusions and some other processes, but the particularity of the McKean-Vlasov process is that it is not a linear diffusion: the process has memory, like the heavy-ball process. We therefore develop an alternative method, introducing the notion of asymptotic pseudo-trajectories in order to obtain an efficient procedure.
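The memory dynamics of the second part can be illustrated in a few lines: a stochastic heavy-ball recursion whose update direction is an exponentially weighted average of past noisy gradients. The test function, step sizes, and memory parameter below are assumptions for illustration, not the thesis's setting.

```python
# Sketch of a stochastic heavy-ball recursion with exponential memory:
# y accumulates an exponentially weighted average of past noisy gradients,
# and x moves against that averaged direction with a decreasing step size.
import random

def noisy_grad(x):
    return 2.0 * x + random.gauss(0.0, 1.0)   # gradient of x**2 plus noise

random.seed(4)
x, y = 5.0, 0.0        # y carries the exponential gradient memory
lam = 0.9              # exponential memory parameter (assumed)
for n in range(1, 20001):
    gamma = 0.5 / n                            # Robbins-Monro step size
    y = lam * y + (1 - lam) * noisy_grad(x)    # update the memory term
    x = x - gamma * y                          # move against averaged gradient
print(round(x, 3))      # should end near 0, the minimizer of x -> x**2
```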
18

Analyse de stratégies bayésiennes et fréquentistes pour l'allocation séquentielle de ressources / Analysis of Bayesian and frequentist strategies for sequential resource allocation

Kaufmann, Emilie 01 October 2014 (has links)
In this thesis, we study strategies for sequential resource allocation under the stochastic multi-armed bandit model. In this model, when an agent draws an arm, he receives as a reward a realization of a probability distribution associated with the arm. We consider two different bandit problems. In the reward maximization objective, the agent aims at maximizing the sum of rewards obtained during his interaction with the bandit, whereas in the best arm identification objective, his goal is to find the set of m best arms (i.e., arms with the highest mean rewards), without suffering a loss when drawing 'bad' arms. For these two objectives, we propose strategies, also called bandit algorithms, that are optimal (or close to optimal) in a sense made precise below. Maximizing the sum of rewards is equivalent to minimizing a quantity called regret. Thanks to an asymptotic lower bound on the regret of any uniformly efficient algorithm, given by Lai and Robbins, one can define asymptotically optimal algorithms as algorithms whose regret reaches this lower bound. In this thesis, we provide, for two Bayesian algorithms, Bayes-UCB and Thompson Sampling, a finite-time analysis, that is, a non-asymptotic upper bound on their regret, in the particular case of bandits with binary rewards. This upper bound allows us to establish the asymptotic optimality of both algorithms. In the best arm identification framework, a possible goal is to determine the number of samples of the arms needed to identify, with high probability, the set of m best arms. We define a notion of complexity for best arm identification in two settings considered in the literature: the fixed-budget and fixed-confidence settings. We provide new lower bounds on these complexity terms, and we analyze new algorithms, some of which reach the lower bounds in particular cases of two-armed bandit models and are therefore optimal.
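A minimal sketch of the Bayes-UCB index for binary rewards may help: in round t, play the arm with the largest Beta-posterior quantile of order 1 - 1/(t (log T)^c). The quantile order follows the usual presentation of the algorithm; the horizon, arm means, and the choice c = 5 here are illustrative assumptions, and the sketch uses SciPy for Beta quantiles.

```python
# Sketch of Bayes-UCB for Bernoulli bandits: the index of arm k is the
# quantile of order 1 - 1/(t * log(T)**c) of its Beta posterior.
import math
import random
from scipy.stats import beta

def bayes_ucb(mus, T, c=5, seed=5):
    rng = random.Random(seed)
    K = len(mus)
    a, b = [1] * K, [1] * K                   # Beta(1, 1) posteriors
    for t in range(1, T + 1):
        q = 1.0 - 1.0 / (t * math.log(T) ** c)
        i = max(range(K), key=lambda k: beta.ppf(q, a[k], b[k]))
        r = rng.random() < mus[i]
        a[i] += r
        b[i] += 1 - r
    return [ai / (ai + bi) for ai, bi in zip(a, b)]   # posterior means

print(bayes_ucb([0.1, 0.45, 0.5], T=2000))
```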
19

Edge Compute Offloading Strategies using Heuristic and Reinforcement Learning Techniques.

Dikonimaki, Chrysoula January 2023 (has links)
The emergence of 5G alongside the distributed computing paradigm called edge computing has prompted a tremendous change in the industry, through the opportunity to reduce network latency and energy consumption and to provide scalability. Edge computing extends the capabilities of users' resource-constrained devices by placing data centers at the edge of the network. Computation offloading enables edge computing by allowing the migration of users' tasks to edge servers. Deciding whether it is beneficial for a mobile device to offload a task, and to which server, while environmental variables such as availability, load, and network quality change dynamically, is a challenging problem that requires careful consideration to achieve better performance. This project focuses on proposing lightweight and efficient algorithms that take offloading decisions from the mobile-device perspective, to the benefit of the user. Heuristic techniques were first examined as a way to find quick but sub-optimal solutions. These techniques were then combined with a multi-armed bandit algorithm, called Discounted Upper Confidence Bound (DUCB), to take optimal decisions quickly. The findings indicate that the heuristic approaches alone cannot handle the dynamicity of the problem, whereas DUCB provides the ability to adapt to changing circumstances without having to keep adding extra parameters. Overall, the DUCB algorithm performs better in terms of local energy consumption and can improve service time most of the time.
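For reference, here is a minimal sketch of Discounted UCB of the kind described: counts and reward sums decay by a factor gamma each round so that stale observations fade, which is what lets the policy track non-stationary server performance. The parameter values and the toy reward model are assumptions, not the thesis's experimental setup.

```python
# Sketch of Discounted UCB (DUCB): discounted statistics plus a UCB-style
# exploration bonus computed from the discounted total count n_t.
import math
import random

def ducb(reward_fn, K, T, gamma=0.98, B=1.0, xi=0.6, seed=6):
    rng = random.Random(seed)
    N = [0.0] * K                      # discounted pull counts
    S = [0.0] * K                      # discounted reward sums
    for t in range(T):
        N = [gamma * n for n in N]
        S = [gamma * s for s in S]
        if min(N) <= 0.0:
            i = N.index(min(N))        # initialize arms never pulled
        else:
            n_t = sum(N)
            i = max(range(K), key=lambda k:
                    S[k] / N[k] + 2 * B * math.sqrt(xi * math.log(n_t) / N[k]))
        r = reward_fn(i, t, rng)
        N[i] += 1.0
        S[i] += r
    return [s / n for s, n in zip(S, N)]

# Toy non-stationary setting: server 0 degrades halfway through (assumed).
def reward(i, t, rng):
    mus = [0.8, 0.5] if t < 2500 else [0.2, 0.5]
    return 1.0 if rng.random() < mus[i] else 0.0

print([round(v, 2) for v in ducb(reward, K=2, T=5000)])
```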
20

Cognitive radio for coexistence of heterogeneous wireless networks / Radio cognitive pour la coexistence de réseaux radio hétérogènes

Boldrini, Stefano 10 April 2014 (has links)
In a scenario where multiple wireless networks of different technologies are available, this work addresses the design of a cognitive engine, the core of a cognitive radio device, able to recognize the surrounding radio environment and to select a network, with the final goal of maximizing the end user's Quality of Experience (QoE). Particular focus is placed on the simplicity of all the elements involved, from hardware to algorithms, in order to preserve the device's practical realizability. Two aspects were investigated. For the recognition of the surrounding radio environment, a network identification and automatic classification method based on MAC-layer features was proposed and tested. For network selection, Key Performance Indicators (KPIs), i.e., application-layer parameters, were considered in order to reach the desired QoE. A general model for network selection was proposed and tested for different traffic types, both in simulations and in a practical demonstrator (implemented as an application for Android OS). Moreover, because of the resulting problem of deciding when to measure a network in order to estimate its performance and when to actually use it for data transmission and reception, the multi-armed bandit (MAB) problem was applied to this context and a new MAB model was proposed, in order to better fit the real-world scenarios considered. The impact of the new model, which introduces the distinction between two different actions, to measure and to use, was tested through simulations using algorithms already available in the literature and two specifically designed algorithms.
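The measure/use distinction at the heart of the proposed MAB model can be illustrated with a toy simulation: in each round the device either measures a network (obtaining a quality sample but doing no useful transmission) or uses the network it currently believes best (earning payoff but learning nothing). The measuring schedule and the reward model below are invented for illustration and are not the thesis's algorithms.

```python
# Toy illustration of a "measure vs. use" bandit round structure.
import random

random.seed(7)
mus = [0.4, 0.7, 0.55]        # unknown mean quality of three networks
est, cnt = [0.0] * 3, [0] * 3
payoff = 0.0
for t in range(1, 2001):
    if random.random() < 1.0 / t ** 0.5:          # decreasing measure rate
        i = random.randrange(3)                   # action: measure network i
        sample = random.gauss(mus[i], 0.1)
        cnt[i] += 1
        est[i] += (sample - est[i]) / cnt[i]      # running-mean update
    else:                                         # action: use believed best
        best = max(range(3), key=lambda k: est[k])
        payoff += random.gauss(mus[best], 0.1)

print("estimates:", [round(e, 2) for e in est], "payoff:", round(payoff, 1))
```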
