61 |
A conceptualização de bandido em expressões bandido de x: uma perspectiva cognitivista / Conceptualization of bandit in expressions bandit of x: a cognitive perspective
Juliana dos Santos Ferreira, 29 May 2012
This dissertation examines how the concept BANDIDO (bandit) is conceptualized in 32 expressions with the structure "bandido de x" ("bandit of x"). We describe the idealized cognitive models underlying the construction of meaning in these expressions and argue that together they form a complex cognitive model, in the sense of Lakoff (1987), that is productive in the language. The theoretical framework also draws on Conceptual Blending Theory (FAUCONNIER and TURNER, 2002) and Conceptual Metaphor Theory (LAKOFF and JOHNSON, 1980). The analysis is based on 137 comments taken from the internet and on definitions written by 15 elementary school students, 18 secondary school students, and 20 university students. The students who took part defined 24 "bandido de x" expressions. The research followed a qualitative procedure of data analysis, in which we observed the different interpretations given to the expressions and grounded them in the cognitive processes involved in their meaning.
Based on the analysis of the internet users' comments and the students' definitions, we propose four processes of conceptualization for "bandido de x" expressions: (a) conceptualization based on propositional cognitive models, in which x is a locative interpreted as the bandit's place of origin or of activity: "bandido de morro" (hill bandit), "bandido de rua" (street bandit), "bandido de cadeia" (jail bandit); (b) conceptualization based on image-schematic models, in which a kind of scale is attributed to the meaning of the construction, resulting in different statuses for the category BANDIDO DE X, as in "bandido de primeira/segunda/quinta categoria/linha" (first-/second-/fifth-rate bandit); (c) conceptualization based on metonymic models, in which x is an item of clothing, footwear, or an accessory, so that the BANDIDO is interpreted as belonging to a category that typically wears that item: "bandido de colarinho branco" (white-collar bandit), "bandidos de farda" (bandits in uniform), "bandido de chinelo" (bandit in flip-flops); (d) conceptualization based on metaphorical models, in which x is an abstract concept understood as an object possessed by the bandit, characterizing him by his way of acting or his expertise: "bandido de conceito" (literally, bandit of concept), "bandido de atitude" (bandit of attitude), "bandido de fé" (bandit of faith). We therefore believe it is possible to describe patterns that govern the conceptualization of BANDIDO DE X, whose meanings, achieved through modifiers, reveal the productivity and complexity of the cognitive model BANDIDO.
|
63 |
Sélection séquentielle en environnement aléatoire appliquée à l'apprentissage supervisé / Sequential selection in a random environment applied to supervised learning
Caelen, Olivier, 25 September 2009
This thesis addresses problems in which decisions must be made sequentially in a random environment. At each step of such a problem, one alternative must be selected from a set of alternatives. Each alternative has its own mean gain, and when an alternative is selected it produces a random gain. The selection can pursue two kinds of objective.
In the first case, the trials aim to maximize the sum of the gains collected. A fair trade-off must then be found between exploitation and exploration. This problem is commonly known in the literature as the "multi-armed bandit problem".
In the second case, a maximum number of selections is imposed, and the objective is to allocate these selections so as to increase the chances of finding the alternative with the highest mean gain. This second problem is commonly referred to in the literature as "selecting the best".
Greedy selection plays an important role in solving these decision problems: it chooses the alternative that has so far appeared to be the best. However, because the environment is generally random, the outcome of such a selection is uncertain.
In this thesis we introduce a new quantity, called the "expected gain of a greedy action". Based on some properties of this quantity, we propose new algorithms for solving the two decision problems above.
Particular attention is paid to applying these techniques to model selection in supervised machine learning.
Collaboration with the anesthesia department of Hôpital Erasme allowed us to apply the proposed algorithms to real data from the medical domain. We also developed a decision-support system, a prototype of which has already been tested under real conditions on a small sample of patients. / Doctorate in Sciences
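The greedy selection described in this abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the Gaussian gains with unit variance, the function names `greedy_choice` and `run_greedy`, and the example mean gains are all assumptions made for illustration. Each alternative is tried once, after which the alternative with the best empirical mean is always selected.

```python
import random


def greedy_choice(totals, counts):
    """Return the index of the alternative with the highest empirical mean gain."""
    means = [total / count for total, count in zip(totals, counts)]
    return max(range(len(means)), key=means.__getitem__)


def run_greedy(true_means, steps, seed=0):
    """Try each alternative once, then always select the empirically best one.

    Gains are drawn from a Gaussian with unit variance (an assumption).
    Returns how many times each alternative was selected.
    """
    rng = random.Random(seed)
    k = len(true_means)
    totals = [0.0] * k
    counts = [0] * k
    for t in range(steps):
        # The first k steps initialise every alternative; later steps are greedy.
        arm = t if t < k else greedy_choice(totals, counts)
        gain = rng.gauss(true_means[arm], 1.0)
        totals[arm] += gain
        counts[arm] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical mean gains; the last alternative is truly the best.
    print(run_greedy([0.2, 0.5, 0.8], steps=200))
```

Running the sketch with different seeds shows the uncertainty the abstract mentions: because the gains are random, the greedy rule can lock onto an alternative that is not the one with the highest mean gain.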
|
64 |
Essays on experimental group dynamics and competition
William J Brown (10996413), 23 July 2021
This thesis consists of three chapters. In the first chapter, I investigate the effects of complexity in various voting systems on individual behavior in small group electoral competitions. Using a laboratory experiment, I observe individual behavior within one of three voting systems: plurality, instant runoff voting (IRV), and score then automatic runoff (STAR). I then estimate subjects' behavior in three different models of bounded rationality. The estimated models are a model of Level-K thinking (Nagel, 1995), the Cognitive Hierarchy (CH) model (Camerer, et al. 2004), and a Quantal Response Equilibrium (QRE) (McKelvey and Palfrey 1995). I consistently find that more complex voting systems induce lower levels of strategic thinking. This implies that policy makers desiring more sincere voting behavior could potentially achieve this through voting systems with more complex strategy sets. Of the tested behavioral models, Level-K consistently fits the observed data best, implying that subjects make decisions that combine steps of thinking with random, utility-maximizing errors.
In the second chapter, I investigate the relationship between the mechanisms used to select leaders and both measures of group performance and leaders' ethical behavior. Using a laboratory experiment, we measure group performance in a group minimum effort task with a leader selected using one of three mechanisms: random, a competition task, and voting. After the group task, leaders must complete a task that asks them to behave honestly or dishonestly in questions related to the group's performance. We find that leaders have a large impact on group performance when compared to groups without leaders. Evidence for which selection mechanism performs best in terms of group performance seems mixed. On measures of honesty, the strongest evidence seems to indicate that honesty is most positively impacted by a voting selection mechanism, while differences in ethical behavior between the random and competition selection treatments are negligible.
In the third chapter, I investigate the factors and conditions that drive "free riding" behavior in dynamic innovation contests. Starting from a dynamic innovation contest model from Halac, et al. (2017), I construct a two-period dynamic innovation contest game. From there, I provide a theoretical background and a derivation of mixed strategies that can be interpreted as the degree to which an agent engages in free riding, namely by allowing their opponent to exert effort in order to uncover information about an uncertain state of the world. I show that certain conditions must be fulfilled in order to induce free riding in equilibrium, and I also show analytically the impact of changing contest prize structures on the degree of free riding. I end this paper with an experimental design to test these theoretical conclusions in a laboratory setting, taking into account the behavioral observations recorded in studies of similar contest models, and I provide a plan to analyze the data collected by this laboratory experiment.
All data collected for this study consists of individual human subject data collected from laboratory experiments. Project procedures have been conducted in accordance with Purdue's internal review board approval and known consent from all participants was obtained.
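For readers unfamiliar with the three voting systems named in the first chapter, the sketch below tallies each of them using their commonly stated definitions. It is an illustration only and does not reproduce the thesis's experimental design: the function names, the ballot formats, the alphabetical tie-breaking rule, and the example electorate are assumptions, and every ballot is assumed to rank or score all candidates.

```python
from collections import Counter


def plurality_winner(ballots):
    """Ballots are rankings (most preferred first); only first choices count."""
    return Counter(ballot[0] for ballot in ballots).most_common(1)[0][0]


def irv_winner(ballots):
    """Instant runoff: eliminate the weakest candidate until one has a majority.

    Assumes every ballot ranks all candidates. Ties for elimination are
    broken alphabetically (an arbitrary choice made for this sketch).
    """
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        # Each ballot counts for its highest-ranked candidate still in the race.
        tally = Counter(next(c for c in ballot if c in remaining) for ballot in ballots)
        top, votes = tally.most_common(1)[0]
        if votes * 2 > len(ballots) or len(remaining) == 1:
            return top
        loser = min(remaining, key=lambda c: (tally.get(c, 0), c))
        remaining.discard(loser)


def star_winner(score_ballots):
    """Score then automatic runoff: the two highest total scores go to a runoff.

    Ballots are dicts mapping candidate to score; at least two candidates are
    assumed. The runoff winner is the finalist preferred on more ballots.
    """
    totals = Counter()
    for ballot in score_ballots:
        totals.update(ballot)
    (first, _), (second, _) = totals.most_common(2)
    prefer_first = sum(1 for b in score_ballots if b.get(first, 0) > b.get(second, 0))
    prefer_second = sum(1 for b in score_ballots if b.get(second, 0) > b.get(first, 0))
    return first if prefer_first >= prefer_second else second


if __name__ == "__main__":
    # Hypothetical electorate of nine voters.
    ranked = [["A", "B", "C"]] * 4 + [["B", "C", "A"]] * 3 + [["C", "B", "A"]] * 2
    scored = ([{"A": 5, "B": 2, "C": 0}] * 4
              + [{"A": 0, "B": 5, "C": 3}] * 3
              + [{"A": 0, "B": 3, "C": 5}] * 2)
    print(plurality_winner(ranked), irv_winner(ranked), star_winner(scored))
```

In this example electorate, plurality and instant runoff pick different winners from the same ranked ballots, which gives a sense of why the strategic reasoning required of voters differs across systems.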
|
65 |
Visuell återkoppling i casinospel : Visuell återkoppling påverkar spelandet i casinospel / Visual feedback in casino games : Visual feedback affects gaming in casino games
Bjarre, Hanna; Richardsson, Matilda, January 2019
New online casinos appear constantly, and gambling addiction is a recurring subject in the news. Despite all the new online casinos, there are currently no studies that investigate how the visual aspect of these online games affects the players. This study examines how visual feedback affects individuals' gambling and risk-taking when they gamble at online casinos. To study these effects, an experiment was conducted with 45 students who were divided into two groups. Both groups were asked to play a prototype of a slot machine (a one-armed bandit) created for this study, called "Casino Slotmachine". One group saw a regular gray pop-up box displaying net profit, while the other group was shown animations of falling confetti when winning and a more aesthetically pleasing pop-up box in a light yellow tone displaying net profit. During the game, the participants' bet per round, time per round, and number of clicks per round were recorded. Each participant played 20 rounds of the game. After playing, the participants responded to a survey about their gaming experience. The results show that the participants who received visual feedback took greater risks during the session than the group that did not receive visual feedback. Visual feedback can therefore be an underlying factor in why some players at online casinos take greater risks today. The study also shows that those who received visual feedback were less stressed by the game than those who did not. Men and women were affected by the visual feedback in the same way.
|
66 |
“Yo Soy Joaquín Murrieta”: Los múltiples rostros de Joaquín a través del espacio y el tiempo / “I Am Joaquín Murrieta”: The Many Faces of Joaquín across Space and Time
Minonne, Francesca, January 2008
No description available.
|
67 |
Navigating Uncertainty: Distributed and Bandit Solutions for Equilibrium Learning in Multiplayer Games
Yuanhanqing Huang (18361527), 15 April 2024
In multiplayer games, a collection of self-interested players aims to optimize their individual cost functions in a non-cooperative manner. The cost function of each player depends not only on its own actions but also on the actions of others. In addition, players' actions may also collectively satisfy some global constraints. The study of this problem has grown immensely in the past decades with applications arising in a wide range of societal systems, including strategic behaviors in power markets, traffic assignment of strategic risk-averse users, engagement of multiple humanitarian organizations in disaster relief, etc. Furthermore, with machine learning models playing an increasingly important role in practical applications, the robustness of these models becomes another prominent concern. Investigation into the solutions of multiplayer games and Nash equilibrium problems (NEPs) can advance the algorithm design for fitting these models in the presence of adversarial noise.
Most of the existing methods for solving multiplayer games assume the presence of a central coordinator, which, unfortunately, is not practical in many scenarios. Moreover, in addition to couplings in the objectives and the global constraints, all too often, the objective functions contain uncertainty in the form of stochastic noise and unknown model parameters. The problem is further complicated by the following considerations: the individual objectives of players may be unavailable or too complex to model; players may exhibit reluctance to disclose their actions; players may experience random delays when receiving feedback regarding their actions. To contend with these issues and uncertainties, in the first half of the thesis, we develop several algorithms based on the theory of operator splitting and stochastic approximation, where the game participants only share their local information and decisions with their trusted neighbors on the network. In the second half of the thesis, we explore the bandit online learning framework as a solution to these challenges, where decisions made by players are updated based solely on the realized objective function values. Our future work will delve into data-driven approaches for learning in multiplayer games and we will explore functional representations of players' decisions, in a departure from the vector form.
|
68 |
Machine Learning and Statistical Decision Making for Green Radio / Apprentissage statistique et prise de décision pour la radio verte
Modi, Navikkumar, 17 May 2017
Future cellular network technologies aim to deliver self-organizing, ultra-high-capacity networks while reducing their energy consumption. This thesis studies intelligent spectrum and topology management through cognitive radio techniques to improve capacity density and Quality of Service (QoS) and to reduce cooperation overhead and energy consumption. It investigates how reinforcement learning can be used to improve the performance of a cognitive radio system. We first deal with the problem of opportunistic spectrum access in infrastructureless cognitive networks. We assume that there is no information exchange between users and that they have no knowledge of channel statistics or of other users' actions. This problem is modeled as a multi-user restless Markov multi-armed bandit (MAB), in which multiple users collect a priori unknown rewards by selecting a channel. The main contribution of the dissertation is a learning policy for distributed users that takes into account not only the availability of a band but also a quality metric linked to the interference power from neighboring cells measured on the sensed band. We prove that this policy, named distributed restless QoS-UCB (RQoS-UCB, where UCB stands for Upper Confidence Bound), achieves regret of at most logarithmic order. Moreover, numerical studies show that the performance of the cognitive radio system can be significantly enhanced by the proposed learning policies, since the cognitive devices are able to identify the appropriate resources more efficiently.
This dissertation also introduces reinforcement learning and transfer learning frameworks to improve the energy efficiency (EE) of heterogeneous cellular networks. Specifically, we formulate and solve an energy efficiency maximization problem for the dynamic ON-OFF switching of base stations (BSs), which is identified as a combinatorial learning problem and modeled as a restless Markov multi-armed bandit. Furthermore, dynamic topology management using the previously defined RQoS-UCB algorithm is introduced to intelligently control the working modes of BSs, based on traffic load and capacity in multiple cells. To cope with initial reward loss and to speed up the learning process, a transfer RQoS-UCB policy is proposed and shown to converge; it benefits from knowledge acquired in earlier periods that correspond to the current one (for example, the same hour on the previous day, or the same day of the week). The proposed dynamic BS switching operation is shown to reduce the number of active BSs while maintaining an adequate QoS. Extensive numerical simulations demonstrate that transfer learning significantly reduces QoS fluctuation during traffic variation, contributes to a performance jump-start, and yields a significant EE improvement under various practical traffic load profiles.
Finally, a proof of concept is developed to verify the performance of the proposed learning policies in a real radio environment and on a database of real measurements, collected by a partner, in the HF band for trans-horizon links. The results show that the proposed multi-armed bandit learning policies, which optimize a dual criterion (availability and quality) for opportunistic spectrum access, are not only superior in terms of spectrum utilization but also energy efficient.
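To give a sense of how an upper-confidence-bound policy selects channels, the sketch below implements the classical UCB1 index for a single user. It is a deliberate simplification and not the RQoS-UCB policy of the thesis: channels are assumed to be free independently at each round with fixed probabilities (rather than following restless Markov chains), there is no quality metric and no second user, and the names `ucb1_select`, `simulate`, and `p_free` are hypothetical.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Return the channel with the highest UCB1 index at round t (1-based)."""
    for channel, n in enumerate(counts):
        if n == 0:
            return channel  # sense every channel once before using the index
    return max(
        range(len(counts)),
        key=lambda ch: means[ch] + math.sqrt(2.0 * math.log(t) / counts[ch]),
    )


def simulate(p_free, rounds, seed=0):
    """Single user sensing one channel per round; reward 1 if the channel is free.

    p_free[ch] is the (assumed i.i.d.) probability that channel ch is free.
    Returns how many times each channel was selected.
    """
    rng = random.Random(seed)
    k = len(p_free)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, rounds + 1):
        ch = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < p_free[ch] else 0.0
        counts[ch] += 1
        means[ch] += (reward - means[ch]) / counts[ch]  # running average
    return counts


if __name__ == "__main__":
    # Hypothetical availability probabilities for three channels.
    print(simulate([0.2, 0.5, 0.9], rounds=2000))
```

The index adds an exploration bonus that shrinks as a channel is sensed more often, so rarely sensed channels are revisited occasionally while most selections go to the channel that appears most available.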
|
69 |
Machine Learning and Statistical Decision Making for Green Radio / Apprentissage statistique et prise de décision pour la radio verteModi, Navikkumar 17 May 2017 (has links)
Cette thèse étudie les techniques de gestion intelligente du spectre et de topologie des réseaux via une approche radio intelligente dans le but d’améliorer leur capacité, leur qualité de service (QoS – Quality of Service) et leur consommation énergétique. Les techniques d’apprentissage par renforcement y sont utilisées dans le but d’améliorer les performances d’un système radio intelligent. Dans ce manuscrit, nous traitons du problème d’accès opportuniste au spectre dans le cas de réseaux intelligents sans infrastructure. Nous nous plaçons dans le cas où aucune information n’est échangée entre les utilisateurs secondaires (pour éviter les surcoûts en transmissions). Ce problème particulier est modélisé par une approche dite de bandits manchots « restless » markoviens multi-utilisateurs (multi-user restless Markov MAB -multi¬armed bandit). La contribution principale de cette thèse propose une stratégie d’apprentissage multi-joueurs qui prend en compte non seulement le critère de disponibilité des canaux (comme déjà étudié dans la littérature et une thèse précédente au laboratoire), mais aussi une métrique de qualité, comme par exemple le niveau d’interférence mesuré (sensing) dans un canal (perturbations issues des canaux adjacents ou de signaux distants). Nous prouvons que notre stratégie, RQoS-UCB distribuée (distributed restless QoS-UCB – Upper Confidence Bound), est quasi optimale car on obtient des performances au moins d’ordre logarithmique sur son regret. En outre, nous montrons par des simulations que les performances du système intelligent proposé sont améliorées significativement par l’utilisation de la solution d’apprentissage proposée permettant à l’utilisateur secondaire d’identifier plus efficacement les ressources fréquentielles les plus disponibles et de meilleure qualité. 
Cette thèse propose également un nouveau modèle d’apprentissage par renforcement combiné à un transfert de connaissance afin d’améliorer l’efficacité énergétique (EE) des réseaux cellulaires hétérogènes. Nous formulons et résolvons un problème de maximisation de l’EE pour le cas de stations de base (BS – Base Stations) dynamiquement éteintes et allumées (ON-OFF). Ce problème d’optimisation combinatoire peut aussi être modélisé par des bandits manchots « restless » markoviens. Par ailleurs, une gestion dynamique de la topologie des réseaux hétérogènes, utilisant l’algorithme RQoS-UCB, est proposée pour contrôler intelligemment le mode de fonctionnement ON-OFF des BS, dans un contexte de trafic et d’étude de capacité multi-cellulaires. Enfin une méthode incluant le transfert de connaissance « transfer RQoS-UCB » est proposée et validée par des simulations, pour pallier les pertes de récompense initiales et accélérer le processus d’apprentissage, grâce à la connaissance acquise à d’autres périodes temporelles correspondantes à la période courante (même heure de la journée la veille, ou même jour de la semaine par exemple). La solution proposée de gestion dynamique du mode ON-OFF des BS permet de diminuer le nombre de BS actives tout en garantissant une QoS adéquate en atténuant les fluctuations de la QoS lors des variations du trafic et en améliorant les conditions au démarrage de l’apprentissage. Ainsi, l’efficacité énergétique est grandement améliorée. Enfin des démonstrateurs en conditions radio réelles ont été développés pour valider les solutions d’apprentissage étudiées. Les algorithmes ont également été confrontés à des bases de données de mesures effectuées par un partenaire dans la gamme de fréquence HF, pour des liaisons transhorizon. Les résultats confirment la pertinence des solutions d’apprentissage proposées, aussi bien en termes d’optimisation de l’utilisation du spectre fréquentiel, qu’en termes d’efficacité énergétique. 
/ Future cellular network technologies aim to deliver self-organizing, ultra-high-capacity networks while reducing energy consumption. This thesis studies intelligent spectrum and topology management through cognitive radio techniques to improve capacity density and Quality of Service (QoS) and to reduce cooperation overhead and energy consumption. It investigates how reinforcement learning can be used to improve the performance of a cognitive radio system. In this dissertation, we address the problem of opportunistic spectrum access in infrastructureless cognitive networks. We assume that users exchange no information and have no knowledge of channel statistics or of other users' actions. This problem is modeled as a multi-user restless Markov multi-armed bandit, in which multiple users collect a priori unknown rewards by selecting channels. The main contribution of the dissertation is a learning policy for distributed users that takes into account not only the availability of a band but also a quality metric linked to the interference power from neighboring cells experienced on the sensed band. We also prove that the policy, named distributed restless QoS-UCB (RQoS-UCB), achieves regret of at most logarithmic order. Moreover, numerical studies show that the performance of the cognitive radio system can be significantly enhanced by the proposed learning policies, since the cognitive devices are able to identify appropriate resources more efficiently. This dissertation also introduces reinforcement learning and transfer learning frameworks to improve the energy efficiency (EE) of heterogeneous cellular networks. 
Specifically, we formulate and solve an energy efficiency maximization problem for dynamic base station (BS) switching, which is identified as a combinatorial learning problem, within the restless Markov multi-armed bandit framework. Furthermore, dynamic topology management using the previously defined RQoS-UCB algorithm is introduced to intelligently control the working modes of BSs based on traffic load and capacity in multiple cells. Moreover, to cope with initial reward loss and to speed up learning, a transfer RQoS-UCB policy, which benefits from transferred knowledge observed in historical periods, is proposed and proved to converge. The proposed dynamic BS switching operation is then demonstrated to reduce the number of activated BSs while maintaining adequate QoS. Extensive numerical simulations demonstrate that transfer learning significantly reduces QoS fluctuation during traffic variation, contributes to a performance jump-start, and yields significant EE improvement under various practical traffic load profiles. Finally, a proof of concept is developed to verify the performance of the proposed learning policies in a real radio environment and on a real measurement database for the HF band. Results show that the proposed multi-armed bandit learning policies, which optimize a dual criterion (e.g. availability and quality) for opportunistic spectrum access, are superior in terms of spectrum utilization and are also energy efficient.
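To make the idea of learning over both availability and quality concrete, here is a minimal sketch of a standard UCB1-style index policy for channel selection. It is not the RQoS-UCB policy of the thesis: it assumes a single user, independent Bernoulli channel availability rather than restless Markov channels, and a fixed quality score in [0, 1] per channel. The function name `ucb_channel_selection` and all parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def ucb_channel_selection(avail_probs, qualities, horizon, seed=0):
    """Run a UCB1-style policy over channels and return per-channel play counts.

    Simplifying assumptions (not taken from the thesis): one user, i.i.d.
    Bernoulli availability, and a fixed quality score in [0, 1] per channel.
    The reward is the quality score when the channel is free, otherwise 0.
    """
    rng = random.Random(seed)
    n_channels = len(avail_probs)
    counts = [0] * n_channels   # how often each channel was sensed
    means = [0.0] * n_channels  # running mean reward per channel

    def index(k, t):
        # Empirical mean plus the usual UCB1 exploration bonus.
        return means[k] + math.sqrt(2.0 * math.log(t) / counts[k])

    for t in range(1, horizon + 1):
        if t <= n_channels:
            ch = t - 1  # initialization: sense every channel once
        else:
            ch = max(range(n_channels), key=lambda k: index(k, t))
        available = rng.random() < avail_probs[ch]
        reward = qualities[ch] if available else 0.0
        counts[ch] += 1
        means[ch] += (reward - means[ch]) / counts[ch]  # incremental mean
    return counts


if __name__ == "__main__":
    # Hypothetical channels: the third is both often free and of good quality.
    print(ucb_channel_selection([0.2, 0.5, 0.9], [0.9, 0.5, 0.8], horizon=2000))
```

Folding quality into the reward is only one way to combine the two criteria; the thesis treats them through its own index, which this sketch does not reproduce.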
|
70 |
Statistical Design of Sequential Decision Making Algorithms
Chi-hua Wang (12469251) 27 April 2022
<p>Sequential decision-making is a fundamental class of problems that motivates the design of algorithms in online machine learning and reinforcement learning. Arguably, the resulting online algorithms have supported modern online service industries through data-driven, real-time automated decision making. Applications span different industries, including dynamic pricing (marketing), recommendation (advertising), and dosage finding (clinical trials). In this dissertation, we contribute fundamental statistical design advances for sequential decision-making algorithms, advancing the theory and application of online learning and sequential decision making under uncertainty, including online sparse learning, finite-armed bandits, and high-dimensional online decision making. Our work lies at the intersection of decision-making algorithm design, online statistical machine learning, and operations research, contributing new algorithms, theory, and insights to diverse fields including optimization, statistics, and machine learning.</p>
<p><br></p>
<p>In Part I, we contribute a theoretical framework of continuous risk monitoring for regularized online statistical learning. Such a framework is desirable for modern online service industries that need to monitor the performance of deployed models on online machine learning tasks. In the first project (Chapter 1), we develop continuous risk monitoring for the online Lasso procedure and provide an always-valid algorithm for high-dimensional dynamic pricing problems. In the second project (Chapter 2), we develop continuous risk monitoring for online matrix regression and provide new algorithms for rank-constrained online matrix completion problems. These theoretical advances result from an interplay between non-asymptotic martingale concentration theory and regularized online statistical machine learning.</p>
<p><br></p>
<p>In Part II, we contribute a bootstrap-based methodology for finite-armed bandit problems, termed Residual Bootstrap exploration. This method opens the possibility of designing model-agnostic bandit algorithms without problem-adaptive optimism engineering or instance-specific prior tuning. In the first project (Chapter 3), we develop residual bootstrap exploration for multi-armed bandit algorithms and show that it generalizes easily to bandit problems with complex or ambiguous reward structure. In the second project (Chapter 4), we develop a theoretical framework for residual bootstrap exploration in linear bandits with a fixed action set. These methodological advances result from our development of non-asymptotic theory for the bootstrap procedure.</p>
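<p>As a rough illustration of bootstrap-driven exploration, the sketch below perturbs each arm's sample mean with randomly weighted residuals and plays the arm with the largest perturbed value. It is written in the spirit of the idea described above and is not the dissertation's algorithm: the Gaussian weights, the two symmetric pseudo-residuals scaled by <code>pseudo_scale</code>, and the names <code>bootstrap_index</code> and <code>run_bootstrap_bandit</code> are all assumptions made for this example.</p>

```python
import math
import random


def bootstrap_index(rewards, rng, pseudo_scale=1.0):
    """Perturbed mean of one arm from randomly reweighted residuals.

    Assumption for illustration: two symmetric pseudo-residuals of size
    pseudo_scale * sqrt(s) are appended so that an arm whose observed
    residuals are all zero still receives some exploration noise.
    """
    s = len(rewards)
    mean = sum(rewards) / s
    residuals = [r - mean for r in rewards]
    residuals += [pseudo_scale * math.sqrt(s), -pseudo_scale * math.sqrt(s)]
    # Mean-zero, unit-variance bootstrap weights (Gaussian is an assumed choice).
    weights = [rng.gauss(0.0, 1.0) for _ in residuals]
    noise = sum(w * e for w, e in zip(weights, residuals)) / len(residuals)
    return mean + noise


def run_bootstrap_bandit(arm_means, horizon, seed=0):
    """Simulate Bernoulli arms and return how often each arm was played."""
    rng = random.Random(seed)
    history = [[] for _ in arm_means]
    for t in range(horizon):
        if t < len(arm_means):
            arm = t  # initialization: play every arm once
        else:
            arm = max(
                range(len(arm_means)),
                key=lambda k: bootstrap_index(history[k], rng),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        history[arm].append(reward)
    return [len(h) for h in history]


if __name__ == "__main__":
    # Hypothetical three-armed problem; the last arm has the highest mean.
    print(run_bootstrap_bandit([0.2, 0.5, 0.9], horizon=1000))
```

<p>No confidence bound or prior appears in the index: exploration comes only from resampling noise, which shrinks as an arm accumulates observations. The sketch recomputes residuals from the full history at every step, which keeps it short but makes the cost quadratic in the horizon.</p>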
<p><br></p>
<p>In Part III, we contribute application-driven insights on the exploration-exploitation dilemma for high-dimensional online decision-making problems. Such insights help practitioners implement effective high-dimensional statistical methods to solve online decision-making problems. In the first project (Chapter 5), we develop a bandit sampling scheme for online batch high-dimensional decision making, a practical scenario in interactive marketing and sequential clinical trials. In the second project (Chapter 6), we develop a bandit sampling scheme for federated online high-dimensional decision making that maintains data decentralization while supporting collaborative decisions. These insights result from our new bandit sampling design, which addresses application-driven exploration-exploitation trade-offs effectively.</p>
|