Global ETD Search

1	Multi-channel opportunistic access: a restless multi-armed bandit perspective. Wang, Kehao. 22 June 2012. In this thesis, we address the fundamental problem of opportunistic spectrum access in a multi-channel communication system. Specifically, we consider a communication system in which a user has access to multiple channels, but is limited to sensing and transmitting on only one at a given time. We explore how a smart user should exploit past observations and knowledge of the stochastic properties of these channels to maximize its transmission rate by switching channels opportunistically. Formally, we provide a generic analysis of the opportunistic spectrum access problem by casting it as a restless multi-armed bandit (RMAB) problem, one of the best-known generalizations of the classic multi-armed bandit (MAB) problem, which is of fundamental importance in stochastic decision theory. Despite significant research efforts in the field, the RMAB problem in its generic form remains open. To date, very few results have been reported on the structure of the optimal policy. Obtaining the optimal policy for a general RMAB problem is often intractable due to the exponential computational complexity. Hence, a natural alternative is to seek a simple myopic policy that maximizes the short-term reward. We therefore develop three axioms characterizing a family of functions, which we refer to as regular functions, that are generic and practically important. We then establish the optimality of the myopic policy when the reward function can be expressed as a regular function and the discount factor is bounded by a closed-form threshold determined by the reward function. We also illustrate how the derived results, generic in nature, can be applied to analyze a class of RMAB problems arising from multi-channel opportunistic access. 
Next, we investigate the more challenging problem in which the user has to decide how many channels to sense in each slot in order to maximize its utility (e.g., throughput). After showing the exponential complexity of the problem, we develop a heuristic v-step look-ahead strategy. In this strategy, the parameter v allows a desired tradeoff to be achieved between social efficiency and computational complexity. We demonstrate the benefits of the proposed strategy via numerical experiments on several typical settings. Computer Science/Other Multi-Channel opportunistic access Restless Multi-Armed Bandit Myopic Policy Stochastic Optimization
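The myopic policy described in this abstract can be illustrated with a minimal sketch. The abstract does not fix a channel model, so the code below assumes the common two-state (good/bad) Markov channel with transition probabilities p11 (good to good) and p01 (bad to good), a unit reward for transmitting on a good channel, and one sensed channel per slot. The function names and parameter values are hypothetical and are not taken from the thesis.

```python
import random


def update_belief(omega, p11, p01):
    """Propagate the probability that an unobserved channel is good by one slot."""
    return omega * p11 + (1.0 - omega) * p01


def myopic_step(beliefs, states, p11, p01):
    """Sense the channel with the highest belief and return (choice, reward, next beliefs)."""
    # Myopic choice: maximize the expected immediate reward only.
    chosen = max(range(len(beliefs)), key=lambda i: beliefs[i])
    reward = 1 if states[chosen] else 0
    new_beliefs = []
    for i, omega in enumerate(beliefs):
        if i == chosen:
            # The sensed channel's state is now known exactly.
            new_beliefs.append(p11 if states[i] else p01)
        else:
            new_beliefs.append(update_belief(omega, p11, p01))
    return chosen, reward, new_beliefs


def simulate(num_channels=4, horizon=1000, p11=0.8, p01=0.2, seed=0):
    """Return the average per-slot reward of the myopic policy (assumed model)."""
    rng = random.Random(seed)
    stationary = p01 / (1.0 - p11 + p01)  # stationary probability of "good"
    states = [rng.random() < stationary for _ in range(num_channels)]
    beliefs = [stationary] * num_channels
    total = 0
    for _ in range(horizon):
        _, reward, beliefs = myopic_step(beliefs, states, p11, p01)
        total += reward
        # Every channel evolves whether or not it was sensed (the "restless" part).
        states = [rng.random() < (p11 if s else p01) for s in states]
    return total / horizon
```

Calling simulate() returns the fraction of slots in which the sensed channel was good under these assumptions. The sketch only shows the mechanics of the policy and does not check the optimality conditions established in the thesis.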
2	Multi-channel opportunistic access: a restless multi-armed bandit perspective / Accès opportuniste dans les systèmes de communication multi-canaux : une perspective du problème de bandit-manchot. Wang, Kehao. 22 June 2012. In this thesis, we address the fundamental problem of opportunistic spectrum access in a multi-channel communication system. Specifically, we consider a communication system in which a user has access to multiple channels, but is limited to sensing and transmitting on only one at a given time. We explore how a smart user should exploit past observations and knowledge of the stochastic properties of these channels to maximize its transmission rate by switching channels opportunistically. Formally, we provide a generic analysis of the opportunistic spectrum access problem by casting it as a restless multi-armed bandit (RMAB) problem, one of the best-known generalizations of the classic multi-armed bandit (MAB) problem, which is of fundamental importance in stochastic decision theory. Despite significant research efforts in the field, the RMAB problem in its generic form remains open. To date, very few results have been reported on the structure of the optimal policy. Obtaining the optimal policy for a general RMAB problem is often intractable due to the exponential computational complexity. Hence, a natural alternative is to seek a simple myopic policy that maximizes the short-term reward. 
We therefore develop three axioms characterizing a family of functions, which we refer to as regular functions, that are generic and practically important. We then establish the optimality of the myopic policy when the reward function can be expressed as a regular function and the discount factor is bounded by a closed-form threshold determined by the reward function. We also illustrate how the derived results, generic in nature, can be applied to analyze a class of RMAB problems arising from multi-channel opportunistic access. Next, we investigate the more challenging problem in which the user has to decide how many channels to sense in each slot in order to maximize its utility (e.g., throughput). After showing the exponential complexity of the problem, we develop a heuristic v-step look-ahead strategy. In this strategy, the parameter v allows a desired tradeoff to be achieved between social efficiency and computational complexity. We demonstrate the benefits of the proposed strategy via numerical experiments on several typical settings. Multi-Channel opportunistic access Restless Multi-Armed Bandit Myopic Policy Stochastic Optimization
3	Delay Differentiation By Balancing Weighted Queue Lengths. Chakraborty, Avijit. 05 1900. Scheduling policies adopted for statistical multiplexing should provide delay differentiation between different traffic classes, where each class represents the aggregate traffic of individual applications having the same target queueing-delay requirements. We propose scheduling to optimally balance weighted mean instantaneous queue lengths, and later weighted mean cumulative queue lengths, as an approach to delay differentiation, where the class weights are set inversely proportional to the respective products of target delays and packet arrival rates. In particular, we assume a discrete-time, two-class, single-server queueing model with unit service time per packet and provide a mathematical framework throughout our work. For iid Bernoulli packet arrivals, with a step-wise cost-dominance analytical approach based on instantaneous queue lengths alone, and for a class of one-stage cost functions that are not necessarily convex, we find the structure of the total-cost optimal policies for a part of the state space. We then consider two particular one-stage cost functions to find two scheduling policies that are total-cost optimal for the whole state space. The policy for the absolute weighted difference cost function minimizes the stationary mean, and the policy for the weighted sum-of-squares cost function minimizes the stationary second-order moment, of the absolute value of the weighted difference of queue lengths. For the weighted sum-of-squares cost function, the ‘iid Bernoulli arrivals’ assumption can be relaxed to either ‘iid arrivals with general batch sizes’ or ‘Markovian zero-one arrivals’ for all of the state space except for the linear switching curve. We then show that the average cost, starting from any initial state, exists and is finite for every stationary work-conserving policy for our choices of the one-stage cost function. 
This is shown for an arbitrary number of class queues and for any iid batch arrival processes with finite appropriate moments. We then use cumulative queue length information in the one-step cost function of the optimization formulation and obtain an optimal myopic policy with 3 stages to go for iid arrivals with general batch sizes. We show analytically that this policy achieves the given target delay ratio in the long run under a finite-buffer assumption, given that the feasibility conditions are satisfied. We resort to numerical value iteration to show the existence of the average cost for this policy. Simulations with varied class weights for Bernoulli arrivals and batch arrivals with Poisson batch sizes show that this policy achieves mean queueing delays closer to the respective target delays than the policy obtained earlier. We also note that the coefficients of variation of the queueing delays of both classes using cumulative queue lengths are of the same order as those using instantaneous queue lengths. Moreover, the short-term behaviour of the optimal myopic policy using cumulative queue lengths is superior to the existing standard policy reported by Coffman and Mitrani by a factor in the range of 3 to 8. Though our policy performs marginally worse than the value-iterated, sampled, and then stationarily employed policy, the latter lacks any closed-form structure. We then modify the definition of the third state variable and look to directly balance weighted mean delays. We come up with another optimal myopic policy with 3 stages to go, following which the error in the ratio of mean delays decreases as the window size, as opposed to the policy mentioned in the last paragraph, wherein the error decreases as the square root of the window size. We perform numerical value iteration to show the existence of the average cost and study the performance by simulation. 
The performance of our policy is comparable with that of the value-iterated, sampled, and then stationarily employed policy reported by Mallesh. We then study general inter-arrival time processes and obtain the optimal myopic policy for the Pareto inter-arrival process in particular. Simulations indicate that our policy fares similarly to the PAD policy reported by Dovrolis et al., which is primarily heuristic in nature. We then model the possible packet errors in the multiplexed channel by either a Bernoulli process or a Markov-modulated Bernoulli process with two possible channel states. We also consider two possible round-trip-time values for control information, namely zero and one slot. The policies that are next-stage optimal (for zero round-trip time) and two-stage optimal (for one-slot round-trip time) are obtained. Simulations with varied class weights for Bernoulli arrivals and batch arrivals with Poisson batch sizes show that these policies indeed achieve mean queueing delays very close to the respective target delays. We also obtain the structure of optimal policies with N = 2 + ⌈rtt⌉ stages to go for generic values of rtt, which need not be a multiple of the time slot. Queue Lengths Statistical Multiplexing Queueing Delay Differentiation Weighted Queue Lengths Queueing Delays Queue Length Balancing Optimal Myopic Policy Queue Length Scheduling Queuing Model Packet Errors Statistical Multiplexer Delay Differentiation Queueing Delay Balancing Multiclass Queueing Networks Communication Engineering
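The weighting rule in this abstract can be sketched in a few lines. By Little's law, a class's mean queue length equals its arrival rate times its mean delay, so balancing weighted queue lengths with weights 1/(target delay × arrival rate) pushes the mean delays toward the target ratio. The code below is a simple illustrative heuristic that serves the class with the larger weighted queue length. It is not the optimal policy derived in the thesis, and all function names and parameter values are hypothetical.

```python
import random


def class_weights(target_delays, arrival_rates):
    """Weights inversely proportional to (target delay x arrival rate), as in the abstract."""
    return [1.0 / (d * lam) for d, lam in zip(target_delays, arrival_rates)]


def pick_class(queues, weights):
    """Serve the non-empty class with the largest weighted queue length (None if all empty)."""
    candidates = [i for i, q in enumerate(queues) if q > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda i: weights[i] * queues[i])


def simulate(arrival_rates=(0.3, 0.4), target_delays=(1.0, 2.0), slots=100000, seed=1):
    """Discrete-time, two-class, single-server queue with unit service time per packet.

    Returns the time-averaged queue length of each class under the heuristic above.
    """
    rng = random.Random(seed)
    weights = class_weights(target_delays, arrival_rates)
    queues = [0, 0]
    sums = [0, 0]
    for _ in range(slots):
        # iid Bernoulli arrivals: at most one packet per class per slot.
        for i, lam in enumerate(arrival_rates):
            if rng.random() < lam:
                queues[i] += 1
        # Work-conserving service: one packet leaves whenever any queue is non-empty.
        served = pick_class(queues, weights)
        if served is not None:
            queues[served] -= 1
        for i in range(2):
            sums[i] += queues[i]
    return [s / slots for s in sums]
```

Dividing each returned mean queue length by the matching arrival rate gives an estimate of the mean delay per class, which can then be compared against the target delay ratio.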
