Spelling suggestions: "subject:"bandits"" "subject:"andits""
51 |
Adaptive Measurement Strategies for Network Optimization and Control / Adaptiva Mätstrategier för Optimering och Reglering av NätverkLindståhl, Simon January 2023 (has links)
The fifth generation networks is rapidly becoming the new network standardand its new technological capabilities are expected to enable a far widervariety of services compared to the fourth generation networks. To ensurethat these services can co-exist and meet their standardized requirements,the network’s resources must be provisioned, managed and reconfigured ina far more complex manner than before. As such, it is no longer sufficientto select a simple, static scheme for gathering the necessary information totake decisions. Instead, it is necessary to adaptively, with regards to networksystem dynamics, trade-off the cost in terms of power, CPU and bandwidthconsumption of the taken measurements to the value their information brings.Orchestration is a wide field, and the way to quantify the value of a givenmeasurement heavily depends on the problem studied. As such, this thesisaddresses adaptive measurement schemes for a number of well-defined networkoptimization problems. The thesis is presented as a compilation, whereafter an introduction detailing the background, purpose, problem formulation,methodology and contributions of our work, we present each problemseparately through the papers submitted to several conferences. First, we study the problem of optimal spectrum access for low priorityservices. We assume that the network manager has limited opportunitiesto measure the spectrum before assigning one (if any) resource block to thesecondary service for transmission, and this measurement has a known costattached to it. We study this framework through the lens of multi-armedbandits with multiple arm pulls per decision, a framework we call predictivebandits. We analyze such bandits and show a problem specific lower bound ontheir regret, as well as design an algorithm which meets this regret asymptotically,studying both the case where measurements are perfect and the casewhere the measurement has noise of known quantity. Studying a syntheticsimulated problem, we find that it performs considerably better compared toa simple benchmark strategy. Secondly, we study a variation of admission control where the controllermust select one of multiple slices to enter a new service into. The agentdoes not know the resources available in the slices initially, and must insteadmeasure these, subject to noise. Mimicking three commonly used admissioncontrol strategies, we study this as a best arm identification problem, whereone or multiple arms is ”correct” (the arm chose by the strategy if it had fullinformation). Through this framework, we analyze each strategy and devisesample complexity lower bounds, as well as algorithms that meet these lowerbounds. In simulations with synthetic data, we show that our measurementalgorithm can vastly reduce the number of required measurements comparedto uniform sampling strategies. Finally, we study a network monitoring system where the controller mustdetect sudden changes in system behavior such as batch traffic arrivals orhandovers, in order to take future action. We study this through the lensof change point detection but argue that the classical framework is insufficientfor capturing both physical time aspects such as delay as well as measurementcosts independently, and present an alternative framework whichiidecouples these, requiring more sophisticated monitoring agents. We show,both through theory and through simulation with both synthetic data anddata from a 5G testbed, that such adaptive schedules qualitatively and quantitativelyimprove upon classical change point detection schemes in terms ofmeasurment frequency, without losing classical optimality guarantees such asthe one on required measurements post change. / Femte generationens nätverk håller snabbt på att bli den nya standarden och dess teknologiska förmågor förväntas bereda väg för en avsevärt större variation av tjänster jämfört med fjärde generationens nätverk. För att se till att dessa tjänster kan samexistera och möta sina standardiserade krav måste nätverkens resurser provisioneras, hanteras och omkonfigureras på ett mycket mer komplext vis än tidigare. Det är därmed inte längre tillräckligt att välja en simpel, statisk plan för att samla den nödvändiga information som krävs för att ta beslut. Istället behöver man adaptivt, med hänsyn till nätversystemens dynamik, avväga mätningarnas kostnad i termer av effekt-, CPU- och bandbreddskonsumtion mot det värde som de medför. Den här sortens nätverksorkestrering är ett brett fält, och hur mätningarnas värde ska kvantifieras beror i hög grad på vilket optimeringsproblem som studeras. Således bemöter den här avhandlningen adaptiva mätplaner för ett antal väldefinerade optimeringsproblem. Avhandlingen tar formen av en sammanlänkning, där följandes en introduktion som beskriver bakgrund, syfte, problemformulering, metodologi och forskningsbidrag så presenterar vi varje problem separat genom de artiklar vi inlämnat till olika konferenser. Först studerar vi optimal spektrumaccess för lågprioritetstjänster. Vi antar att nätverksregulatorn har begränsat med möjligheter att mäta spektrumanvändning innan den tillger som mest ett resursblock till tjänsten med lägre prioritet att skicka data på, och de här mätningarna har en känd kostnad. Vi studerar det här ramverket från perspektivet av flerarmade banditer med flera armdragningar per beslut, ett ramverk vi benämner förutsägande banditer (predictive bandits). Vi analyserar sådana banditer och visar en problemspecifik undre gräns på dess inlärningsförlust, samt designar en algorithm som presterar lika bra som denna gräns i den asymptotiska regimen. Vi studerar fallet där mätningarna är perfekta såväl som fallet där mätningarna har brus med känd storlek. Genom att studera ett syntetiskt simulerat problem av detta slag finner vi att vår algoritm presterar avsevärt bättre jämfört med en simplare riktmärkesstrategi. Därefter studerar vi en variation av tillträdeskontroll, där en regulator måste välja en av ett antal betjänter att släppa in en ny tjänst till (om någon alls). Agenten vet ursprungligen inte vilka resurser som finns betjänterna tillgängliga, utan måste mäta detta med brusiga mätningar. Vi härmar tre vanligt använda tillträdesstrategier och studerar detta som ett bästa-arms identifieringsproblem, där en eller flera armar är "korrekta" (det vill säga, de armar som hade valts av tillträdesstrategin om den hade haft perfekt kännedom). Med det här ramverket analyserar vi varje strategi och visar undre gränser på antalet mätningar som krävs, och skapar algoritmer som möter dessa gränser. I simuleringar med syntetisk data visar vi att våra mätalgoritmer kan drastiskt reducera antalet mätningar som krävs jämfört med jämlika mätstrategier. Slutligen studerar vi ett övervakningssystem där agenten måste upptäcka plötsliga förändringar i systemets beteende såsom förändringar i trafiken eller överräckningar mellan master, för att kunna agera därefter. Vi studerar detta med ramverket förändringsdetektion, men argumenterar att det klassiska ramverket är otillräckligt för att bemöta aspekter berörande fysisk tid (som fördröjning) samtidigt som den bemöter mätningarnas kostnad. Vi presenterar därmed ett alternativt ramverk som frikopplar de två, vilket i sin tur kräver mer sostifikerade övervakningssystem. Vi visar, genom både teori och simulering med både syntetisk och experimentell data, att sådana adaptiva mätscheman kan förbättra mätfrekvensen jämfört med klassiska periodiska mätscheman, både kvalitativt och kvantitativt, utan att förlora klassiska optimalitetsgarantier såsom det på antalet mätningar som behövs när förändringen har skett. / <p>QC 20230915</p>
|
52 |
Algorithmic and Ethical Aspects of Recommender Systems in e-CommerceParaschakis, Dimitris January 2018 (has links)
Recommender systems have become an integral part of virtually every e-commerce application on the web. These systems enable users to quickly discover relevant products, at the same time increasing business value. Over the past decades, recommender systems have been modeled using numerous machine learning techniques. However, the adoptability of these models by commercial applications remains unclear. We assess the receptiveness of the industrial sector to algorithmic contributions of the research community by surveying more than 30 e-commerce platforms, and experimenting with various recommenders on proprietary e-commerce datasets. Another overlooked but important factor that complicates the design and use of recommender systems is their ethical implications. We identify and summarize these issues in our ethical recommendation framework, which also enables users to control sensitive moral aspects of recommendations via the proposed “ethical toolbox”. The feasibility of this tool is supported by the results of our user study. Because of moral implications associated with user profiling, we investigate algorithms capable of generating user-agnostic recommendations. We propose an ensemble learning scheme based on Thompson Sampling bandit policy, which models arms as base recommendation functions. We show how to adapt this algorithm to realistic situations when neither arm availability nor reward stationarity is guaranteed.
|
53 |
The pirates of Somalia : maritime bandits or warlords of the high seasCronjé, Dian 03 1900 (has links)
Thesis presented in partial fulfilment of the requirements for the degree of Master of
Philosophy (Political Management) at Stellenbosch University / Thesis (MPhil (Political Science))--University of Stellenbosch, 2010. / ENGLISH ABSTRACT: Inflicting a financial loss of over $US16 billion to international shipping, the occurrence of
maritime piracy in areas such as the Strait of Malacca and the west coast of Africa, has
significantly affected the long-term stability of global maritime trade. Since the collapse of
the Somali state in the early 1990’s, international watch groups have expressed their concern
as to the rise of piracy off the Somali coast and the waterways of the Gulf of Aden. However,
2008 marked an unprecedented increase in pirate attacks in Somali waters. These attacks did
not only increase in number but also became more sophisticated. As more than 85% of world
trade relies on maritime transport, the world was forced to take notice of the magnitude of
Somali piracy. Considering the relative novel nature of Somali piracy, this field presents a
vast potential for further and in-depth academic inquiry.
This descriptive and explanatory study set out to explore the evasive nature of the what and
why (and who) of Somali piracy and relied on inductive reasoning in order (a) to explore and
define the contributing causes to the Somali conflict; (b) to indicate how the conflict and the
resulting consequences in particularly the Puntland region contributed to the rise of maritime
piracy; (c) to determine whether the pirate groups are fishermen protecting their resources by
acting like vigilantes and self-defence units, or if they were bandits, warlords, Islamists or a
combination of aforementioned; and to (d) establish the role which resource scarcity and state
collapse played in rendering Somalia vulnerable to maritime piracy. In pursuing the above
mentioned goals, this study relied on an analysis of authoritative and contemporary sources.
Media reporting was used for updating the fast moving information.
This study attributed the Somali conflict to historic and ethnic clan rivalries and the legacy of
colonial rule that led to the arbitrary partitioning of Somalia by colonial superpowers. Military
rule, oppression, wars with neighbours (Ethiopia), superpower intervention, famine and the
rise of warlords made for state failure in Somalia. In Puntland, such factors were further
aggravated by severe environmental hardship and natural disasters. Food became one of the
scarcest resources in Somalia. People migrated to cities and to the coast where foreign fishing
vessels also exploited the absence of coast guards in plundering fish. Some Somali fishermen reacted and in retrieving fish, apprehended ships, resulting in armed robbery at sea. But many
went further, hijacking merchant vessels, and demanding huge ransoms.
Initially prompted by grievance towards the exploitation of the Somali coastal resources, the
vast financial rewards of piracy rapidly transformed this impetus to personal gain and greed.
In doing so, these groups assumed characteristic similar to criminal bandits and warlords. Or
were they Islamists fundraising for al-Qaeda? But unlike warlords, pirates normally never kill.
The links with either Islamists or terrorism have also not been established either. The alleged link with criminal networks is much more plausible. / AFRIKAANSE OPSOMMING: Maritieme seerowery in areas soos die Straat van Malacca en aan die weskus van Afrika, het
tot op datum, na raming, finansiële verliese van meer as $US16 biljoen aan internasionale
skeepshandel berokken en het ‘n beduidende negatiewe effek op die langtermyn stabiliteit van
globale maritieme handel. Sedert die verval van die Somaliese staat in 1991, het
internasionale waarnemingsgroepe hul besorgdheid uitgespreek oor die toename van
seerowery aan die Somaliese kus en die aangrensende Golf van Aden. Vanaf 2008 was daar
egter ‘n ongekende toename in seerower aanvalle in Somaliese kuswaters. Nie alleen was daar
‘n toename in die aantal insidente nie, maar die aanvalle is gekenmerk deur meer
gesofistikeerde metodes. Aangesien meer as 85% van wêreldhandel afhanklik is van
seevervoer, was die wêreld genoodsaak om kennis te neem van die omvang van die
verskynsel. Gegewe die feit dat Somaliese seerowery ‘n relatiewe onlangse verwikkeling is,
bied hierdie veld groot potensiaal vir verdere en diepgaande studie.
Die beskrywende en verduidelikende studie het ten doel om die ontwykende vraagstuk oor
die wat, hoekom en wie van Somaliese seerowery te verken en by wyse van induktiewe
beredenering die volgende vas te stel: (a) om die bydraende oorsake tot die Somaliese konflik
te ondersoek en te definieer, (b) om aan te dui hoe die konflik en die gevolge daarvan,
spesifiek in die Puntland streek, bygedra het tot die ontstaan van plaaslike seerowery (c) om
vas te stel of die seerower-groepe vissers is wat hul bronne beskerm deur vigilante of
selfverdedigings-eenhede te stig en of hulle oorlogsbaronne, radikale Islamiste of ‘n
kombinasie van voorafgenoemde is, en (d) om die rol te beskryf wat hulpbron-skaarste en
staatkundige verval gespeel het om die risiko van seerowery in Somalie te verhoog.
In navolging van voorafgenoemde doelwitte het die ondersoek staatgemaak op ‘n deeglike
ontleding van gesaghebbende en kontemporêre bronne. Hierdie teoretiese grondslag is verder
aangevul deur media-verslaggewing oor die onderwerp.
Die studie het bevind dat die Somaliese konflik toegeskryf kan word aan historiese en
klanverskille en die nalatenskap van koloniale heerskappy wat mettertyd gelei het tot die
arbitrere verdeling van Somalië deur koloniale moondhede, militêre onderdrukking, geskille
met buurstate (Ethiopië), inmenging van supermoonthede, hongersnood en die opkoms van oorlogsbaronne. Hierdie faktore het bygedra tot die staatkundige verval van Somalië. In
Puntland in besonder, is hierdie bydraende faktore vererger deur omgewingsontbering en
natuurlike rampe. Gevolglik het voedsel een van die skaarste hulpbronne geword in Somalië.
Hierdie omstandighede het die bevolking na die kus gedryf, waar buitelandse visserbote
onwettig die mariene-bronne geplunder het. In reaksie hierop het die bevolking self die wapen
opgeneem om sulke skepe te konfronteer wat gelei het tot gewapende roof ter see. Sekere
vissermanne het egter verder oortree en bote gekaap en aangehou in ruil vir omkoopgeld. Dit
was egter lank nie meer gekaapte vissersbote nie, maar handelsskepe met ander duursame
vragte.
Terwyl hul optrede aanvanklik gemotiveer is deur ontevredenheid met die onwettige
ontginning van mariene bronne, het die aansienlike finansiele voordele van seerowery hierdie
dryfveer mettertyd gewysig tot een van persoonlike gewin en hebsug. In hierdie proses het die
groeperinge eienskappe ontwikkel soortgelyk aan kriminele rowers en oorlogsbaronne van die
oopsee en radikale Islamiste. Anders as oorlogsbaronne het hierdie groepe egter nie die lewe
van hul slagoffers geneem nie. Die verband tussen hierdie seerowergroepe en radikale
Islamiste of terroriste groepe kan ook nie verseker vasgestel word nie. Daar is dus ‘n meer geloofwaardige verband tussen sulke groepe en georganiseerde kriminele netwerke.
|
54 |
Creating Systems and Applying Large-Scale Methods to Improve Student Remediation in Online Tutoring Systems in Real-time and at ScaleSelent, Douglas A 08 June 2017 (has links)
"A common problem shared amongst online tutoring systems is the time-consuming nature of content creation. It has been estimated that an hour of online instruction can take up to 100-300 hours to create. Several systems have created tools to expedite content creation, such as the Cognitive Tutors Authoring Tool (CTAT) and the ASSISTments builder. Although these tools make content creation more efficient, they all still depend on the efforts of a content creator and/or past historical. These tools do not take full advantage of the power of the crowd. These issues and challenges faced by online tutoring systems provide an ideal environment to implement a solution using crowdsourcing. I created the PeerASSIST system to provide a solution to the challenges faced with tutoring content creation. PeerASSIST crowdsources the work students have done on problems inside the ASSISTments online tutoring system and redistributes that work as a form of tutoring to their peers, who are in need of assistance. Multi-objective multi-armed bandit algorithms are used to distribute student work, which balance exploring which work is good and exploiting the best currently known work. These policies are customized to run in a real-world environment with multiple asynchronous reward functions and an infinite number of actions. Inspired by major companies such as Google, Facebook, and Bing, PeerASSIST is also designed as a platform for simultaneous online experimentation in real-time and at scale. Currently over 600 teachers (grades K-12) are requiring students to show their work. Over 300,000 instances of student work have been collected from over 18,000 students across 28,000 problems. From the student work collected, 2,000 instances have been redistributed to over 550 students who needed help over the past few months. I conducted a randomized controlled experiment to evaluate the effectiveness of PeerASSIST on student performance. Other contributions include representing learning maps as Bayesian networks to model student performance, creating a machine-learning algorithm to derive student incorrect processes from their incorrect answer and the inputs of the problem, and applying Bayesian hypothesis testing to A/B experiments. We showed that learning maps can be simplified without practical loss of accuracy and that time series data is necessary to simplify learning maps if the static data is highly correlated. I also created several interventions to evaluate the effectiveness of the buggy messages generated from the machine-learned incorrect processes. The null results of these experiments demonstrate the difficulty of creating a successful tutoring and suggest that other methods of tutoring content creation (i.e. PeerASSIST) should be explored."
|
55 |
Essays on indexability of stochastic sheduling and dynamic allocation problemsRuíz Hernández, Diego 13 April 2007 (has links)
In this Thesis, we first deploy Gittins index theory to establish the indexability of inter-alia general families of restless bandits that arise in problems of stochastic scheduling with switching penalties and machine maintenance. We also give formulae for the resulting indices. Numerical investigations testify the strong performance of the index heuristics.The second class of problems concerns two families of Markov decision problems. The spinning plates problem concerns the optimal management of a portfolio of assets whose yields grow with investment but otherwise decline. In the model of asset exploitation called the squad system, the yield from an asset declines when it is utilised but will recover when the asset is at rest. Simply stated conditions are given which guarantee general indexability of the problem together with necessary and sufficient conditions for strict indexability. The index heuristics, which emerge from the analysis, are assessed numerically and found to perform strongly.
|
56 |
Online Learning for Optimal Control of Communication and Computing SystemsCayci, Semih January 2020 (has links)
No description available.
|
57 |
New Spatio-temporal Hawkes Process Models For Social GoodWen-Hao Chiang (12476658) 28 April 2022 (has links)
<p>As more and more datasets with self-exciting properties become available, the demand for robust models that capture contagion across events is also getting stronger. Hawkes processes stand out given their ability to capture a wide range of contagion and self-excitation patterns, including the transmission of infectious disease, earthquake aftershock distributions, near-repeat crime patterns, and overdose clusters. The Hawkes process is flexible in modeling these various applications through parametric and non-parametric kernels that model event dependencies in space, time and on networks.</p>
<p>In this thesis, we develop new frameworks that integrate Hawkes Process models with multi-armed bandit algorithms, high dimensional marks, and high-dimensional auxiliary data to solve problems in search and rescue, forecasting infectious disease, and early detection of overdose spikes.</p>
<p>In Chapter 3, we develop a method applications to the crisis of increasing overdose mortality over the last decade. We first encode the molecular substructures found in a drug overdose toxicology report. We then cluster these overdose encodings into different overdose categories and model these categories with spatio-temporal multivariate Hawkes processes. Our results demonstrate that the proposed methodology can improve estimation of the magnitude of an overdose spike based on the substances found in an initial overdose. </p>
<p>In Chapter 4, we build a framework for multi-armed bandit problems arising in event detection where the underlying process is self-exciting. We derive the expected number of events for Hawkes processes given a parametric model for the intensity and then analyze the regret bound of a Hawkes process UCB-normal algorithm. By introducing the Hawkes Processes modeling into the upper confidence bound construction, our models can detect more events of interest under the multi-armed bandit problem setting. We apply the Hawkes bandit model to spatio-temporal data on crime events and earthquake aftershocks. We show that the model can quickly learn to detect hotspot regions, when events are unobserved, while striking a balance between exploitation and exploration. </p>
<p>In Chapter 5, we present a new spatio-temporal framework for integrating Hawkes processes with multi-armed bandit algorithms. Compared to the methods proposed in Chapter 4, the upper confidence bound is constructed through Bayesian estimation of a spatial Hawkes process to balance the trade-off between exploiting and exploring geographic regions. The model is validated through simulated datasets and real-world datasets such as flooding events and improvised explosive devices (IEDs) attack records. The experimental results show that our model outperforms baseline spatial MAB algorithms through rewards and ranking metrics.</p>
<p>In Chapter 6, we demonstrate that the Hawkes process is a powerful tool to model the infectious disease transmission. We develop models using Hawkes processes with spatial-temporal covariates to forecast COVID-19 transmission at the county level. In the proposed framework, we show how to estimate the dynamic reproduction number of the virus within an EM algorithm through a regression on Google mobility indices. We also include demographic covariates as spatial information to enhance the accuracy. Such an approach is tested on both short-term and long-term forecasting tasks. The results show that the Hawkes process outperforms several benchmark models published in a public forecast repository. The model also provides insights on important covariates and mobility that impact COVID-19 transmission in the U.S.</p>
<p>Finally, in chapter 7, we discuss implications of the research and future research directions.</p>
|
58 |
Beyond Disagreement-based Learning for Contextual BanditsPinaki Ranjan Mohanty (16522407) 26 July 2023 (has links)
<p>While instance-dependent contextual bandits have been previously studied, their analysis<br>
has been exclusively limited to pure disagreement-based learning. This approach lacks a<br>
nuanced understanding of disagreement and treats it in a binary and absolute manner.<br>
In our work, we aim to broaden the analysis of instance-dependent contextual bandits by<br>
studying them under the framework of disagreement-based learning in sub-regions. This<br>
framework allows for a more comprehensive examination of disagreement by considering its<br>
varying degrees across different sub-regions.<br>
To lay the foundation for our analysis, we introduce key ideas and measures widely<br>
studied in the contextual bandit and disagreement-based active learning literature. We<br>
then propose a novel, instance-dependent contextual bandit algorithm for the realizable<br>
case in a transductive setting. Leveraging the ability to observe contexts in advance, our<br>
algorithm employs a sophisticated Linear Programming subroutine to identify and exploit<br>
sub-regions effectively. Next, we provide a series of results tying previously introduced<br>
complexity measures and offer some insightful discussion on them. Finally, we enhance the<br>
existing regret bounds for contextual bandits by integrating the sub-region disagreement<br>
coefficient, thereby showcasing significant improvement in performance against the pure<br>
disagreement-based approach.<br>
In the concluding section of this thesis, we do a brief recap of the work done and suggest<br>
potential future directions for further improving contextual bandit algorithms within the<br>
framework of disagreement-based learning in sub-regions. These directions offer opportuni-<br>
ties for further research and development, aiming to refine and enhance the effectiveness of<br>
contextual bandit algorithms in practical applications.<br>
<br>
</p>
|
59 |
Contribution to learning and decision making under uncertainty for Cognitive Radio. / Contribution à l’apprentissage et à la prise de décision, dans des contextes d’incertitude, pour la radio intelligenteJouini, Wassim 15 June 2012 (has links)
L’allocation des ressources spectrales à des services de communications sans fil, sans cesse plus nombreux et plus gourmands, a récemment mené la communauté radio à vouloir remettre en question la stratégie de répartition des bandes de fréquences imposée depuis plus d’un siècle. En effet une étude rendue publique en 2002 par la commission fédérale des communications aux Etats-Unis (Federal Communications Commission - FCC) mit en évidence une pénurie des ressources spectrales dans une large bande de fréquences comprise entre quelques mégahertz à plusieurs gigahertz. Cependant, cette même étude expliqua cette pénurie par une allocation statique des ressources aux différents services demandeurs plutôt que par une saturation des bandes de fréquences. Cette explication fut par la suite corroborée par de nombreuses mesures d’occupation spectrale, réalisées dans plusieurs pays, qui montrèrent une forte sous-utilisation des bandes de fréquences en fonction du temps et de l’espace, représentant par conséquent autant d’opportunité spectrale inexploitée. Ces constations donnèrent naissance à un domaine en plein effervescence connu sous le nom d’Accès Opportuniste au Spectre (Opportunistic Spectrum Access). Nos travaux suggèrent l’étude de mécanismes d’apprentissage pour la radio intelligente (Cognitive Radio) dans le cadre de l’Accès Opportuniste au Spectre (AOS) afin de permettre à des équipements radio d’exploiter ces opportunités de manière autonome. Pour cela, nous montrons que les problématiques d’AOS peuvent être fidèlement représentées par des modèles d’apprentissage par renforcement. Ainsi, l’équipement radio est modélisé par un agent intelligent capable d’interagir avec son environnement afin d’en collecter des informations. Ces dernières servent à reconnaître, au fur et à mesure des expériences, les meilleurs choix (bandes de fréquences, configurations, etc.) qui s’offrent au système de communication. Nous nous intéressons au modèle particulier des bandits manchots (Multi-Armed Bandit appliqué à l’AOS). Nous discutons, lors d’une phase préliminaire, différentes solutions empruntées au domaine de l’apprentissage machine (Machine Learning). Ensuite, nous élargissons ces résultats à des cadres adaptés à la radio intelligente. Notamment, nous évaluons les performances de ces algorithmes dans le cas de réseaux d’équipements qui collaborent en prenant en compte, dans le modèle suggéré, les erreurs d’observations. On montre de plus que ces algorithmes n’ont pas besoin de connaître la fréquence des erreurs d’observation afin de converger. La vitesse de convergence dépend néanmoins de ces fréquences. Dans un second temps nous concevons un nouvel algorithme d’apprentissage destiné à répondre à des problèmes d’exploitation des ressources spectrales dans des conditions dites de fading. Tous ces travaux présupposent néanmoins la capacité de l’équipement intelligent à détecter efficacement l’activité d’autres utilisateurs sur la bande (utilisateurs prioritaires dits utilisateurs primaires). La principale difficulté réside dans le fait que l’équipement intelligent ne suppose aucune connaissance a priori sur son environnement (niveau du bruit notamment) ou sur les utilisateurs primaires. Afin de lever le doute sur l’efficacité de l’approche suggérée, nous analysons l’impact de ces incertitudes sur le détecteur d’énergie. Ce dernier prend donc le rôle d’observateur et envoie ses observations aux algorithmes d’apprentissage. Nous montrons ainsi qu’il est possible de quantifier les performances de ce détecteur dans des conditions d’incertitude sur le niveau du bruit ce qui le rend utilisable dans le contexte de la radio intelligente. Par conséquent, les algorithmes d’apprentissage utilisés pourront exploiter les résultats du détecteur malgré l’incertitude inhérente liée à l’environnement considéré et aux hypothèses (sévères) d’incertitude liées au problème analysé. / During the last century, most of the meaningful frequency bands were licensed to emerging wireless applications. Because of the static model of frequency allocation, the growing number of spectrum demanding services led to a spectrum scarcity. However, recently, series of measurements on the spectrum utilization showed that the different frequency bands were underutilized (sometimes even unoccupied) and thus that the scarcity of the spectrum resource is virtual and only due to the static allocation of the different bands to specific wireless services. Moreover, the underutilization of the spectrum resource varies on different scales in time and space offering many opportunities to an unlicensed user or network to access the spectrum. Cognitive Radio (CR) and Opportunistic Spectrum Access (OSA) were introduced as possible solutions to alleviate the spectrum scarcity issue.In this dissertation, we aim at enabling CR equipments to exploit autonomously communication opportunities found in their vicinity. For that purpose, we suggest decision making mechanisms designed and/or adapted to answer CR related problems in general, and more specifically, OSA related scenarios. Thus, we argue that OSA scenarios can be modeled as Multi-Armed Bandit (MAB) problems. As a matter of fact, within OSA contexts, CR equipments are assumed to have no prior knowledge on their environment. Acquiring the necessary information relies on a sequential interaction between the CR equipment and its environment. Finally, the CR equipment is modeled as a cognitive agent whose purpose is to learn while providing an improving service to its user. Thus, firstly we analyze the performance of UCB1 algorithm when dealing with OSA problems with imperfect sensing. More specifically, we show that UCB1 can efficiently cope with sensing errors. We prove its convergence to the optimal channel and quantify its loss of performance compared to the case with perfect sensing. Secondly, we combine UCB1 algorithm with collaborative and coordination mechanism to model a secondary network (i.e. several SUs). We show that within this complex scenario, a coordinated learning mechanism can lead to efficient secondary networks. These scenarios assume that a SU can efficiently detect incumbent users’ activity while having no prior knowledge on their characteristics. Usually, energy detection is suggested as a possible approach to handle such task. Unfortunately, energy detection in known to perform poorly when dealing with uncertainty. Consequently, we ventured in this Ph.D. to revisit the problem of energy detection limits under uncertainty. We present new results on its performances as well as its limits when the noise level is uncertain and the uncertainty is modeled by a log-normal distribution (as suggested by Alexander Sonnenschein and Philip M. Fishman in 1992). Within OSA contexts, we address a final problem where a sensor aims at quantifying the quality of a channel in fading environments. In such contexts, UCB1 algorithms seem to fail. Consequently, we designed a new algorithm called Multiplicative UCB (UCB) and prove its convergence. Moreover, we prove that MUCB algorithms are order optimal (i.e., the order of their learning rate is optimal). This last work provides a contribution that goes beyond CR and OSA. As a matter of fact, MUCB algorithms are introduced and solved within a general MAB framework.
|
60 |
On recommendation systems in a sequential context / Des Systèmes de Recommandation dans un Contexte SéquentielGuillou, Frédéric 02 December 2016 (has links)
Cette thèse porte sur l'étude des Systèmes de Recommandation dans un cadre séquentiel, où les retours des utilisateurs sur des articles arrivent dans le système l'un après l'autre. Après chaque retour utilisateur, le système doit le prendre en compte afin d'améliorer les recommandations futures. De nombreuses techniques de recommandation ou méthodologies d'évaluation ont été proposées par le passé pour les problèmes de recommandation. Malgré cela, l'évaluation séquentielle, qui est pourtant plus réaliste et se rapproche davantage du cadre d'évaluation d'un vrai système de recommandation, a été laissée de côté. Le contexte séquentiel nécessite de prendre en considération différents aspects non visibles dans un contexte fixe. Le premier de ces aspects est le dilemme dit d'exploration vs. exploitation: le modèle effectuant les recommandations doit trouver le bon compromis entre recueillir de l'information sur les goûts des utilisateurs à travers des étapes d'exploration, et exploiter la connaissance qu'il a à l'heure actuelle pour maximiser le feedback reçu. L'importance de ce premier point est mise en avant à travers une première évaluation, et nous proposons une approche à la fois simple et efficace, basée sur la Factorisation de Matrice et un algorithme de Bandit Manchot, pour produire des recommandations appropriées. Le second aspect pouvant apparaître dans le cadre séquentiel surgit dans le cas où une liste ordonnée d'articles est recommandée au lieu d'un seul article. Dans cette situation, le feedback donné par l'utilisateur est multiple: la partie explicite concerne la note donnée par l'utilisateur concernant l'article choisi, tandis que la partie implicite concerne les articles cliqués (ou non cliqués) parmi les articles de la liste. En intégrant les deux parties du feedback dans un modèle d'apprentissage, nous proposons une approche basée sur la Factorisation de Matrice, qui peut recommander de meilleures listes ordonnées d'articles, et nous évaluons cette approche dans un contexte séquentiel particulier pour montrer son efficacité. / This thesis is dedicated to the study of Recommendation Systems under a sequential setting, where the feedback given by users on items arrive one after another in the system. After each feedback, the system has to integrate it and try to improve future recommendations. Many techniques or evaluation methods have already been proposed to study the recommendation problem. Despite that, such sequential setting, which is more realistic and represent a closer framework to a real Recommendation System evaluation, has surprisingly been left aside. Under a sequential context, recommendation techniques need to take into consideration several aspects which are not visible for a fixed setting. The first one is the exploration-exploitation dilemma: the model making recommendations needs to find a good balance between gathering information about users' tastes or items through exploratory recommendation steps, and exploiting its current knowledge of the users and items to try to maximize the feedback received. We highlight the importance of this point through the first evaluation study and propose a simple yet efficient approach to make effective recommendation, based on Matrix Factorization and Multi-Armed Bandit algorithms. The second aspect emphasized by the sequential context appears when a list of items is recommended to the user instead of a single item. In such a case, the feedback given by the user includes two parts: the explicit feedback as the rating, but also the implicit feedback given by clicking (or not clicking) on other items of the list. By integrating both feedback into a Matrix Factorization model, we propose an approach which can suggest better ranked list of items, and we evaluate it in a particular setting.
|
Page generated in 0.0511 seconds