21. Méthodes optimistes d'apprentissage actif pour la classification / Optimistic Methods in Active Learning for Classification. Collet, Timothé, 11 July 2016.
A classification problem makes use of a training set consisting of data labeled by an oracle. The larger the training set, the better the performance. However, requesting the oracle may be costly. The goal of active learning is thus to minimize the number of requests to the oracle while achieving the best performance. To do so, the data presented to the oracle must be carefully selected among a large number of unlabeled instances acquired at no cost. However, the true informativeness of labeling a particular instance cannot be computed exactly; it can only be estimated, along with a measure of uncertainty. Increasing the precision of this estimate requires labeling more data. There is therefore a dilemma between spending the labeling budget to improve the classifier under the current estimate of the selection criterion and spending it to refine the criterion itself. This dilemma is well studied in the context of finite-budget optimization under the name of the exploration versus exploitation dilemma, and the most famous solutions make use of the principle of Optimism in the Face of Uncertainty. In this thesis, we show that it is possible to adapt this principle to the active learning problem for classification. Several algorithms have been developed for classifiers of increasing complexity, each of them using the principle of Optimism in the Face of Uncertainty, and their performance has been evaluated empirically.
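The thesis's own algorithms are not reproduced in this listing; as a rough illustration of the Optimism in the Face of Uncertainty principle the abstract refers to, the following Python sketch scores candidate queries with an upper confidence bound on a hypothetical "informativeness" value and always queries the optimistic maximizer (the point set, noise model and budget are assumptions, not taken from the thesis).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pool of unlabeled points: each has an unknown "informativeness"
# that can only be observed noisily after spending one query of the budget.
n_points = 20
true_informativeness = rng.uniform(0.0, 1.0, size=n_points)

counts = np.zeros(n_points)   # how many times each point has been queried
means = np.zeros(n_points)    # running estimate of each point's informativeness
budget = 100

for t in range(1, budget + 1):
    # Optimism in the Face of Uncertainty: add a confidence bonus to each estimate.
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts == 0, np.inf, means + bonus)  # force one query per point first
    i = int(np.argmax(ucb))

    # Simulated oracle feedback: a noisy observation of the true informativeness.
    observation = true_informativeness[i] + rng.normal(0.0, 0.1)
    counts[i] += 1
    means[i] += (observation - means[i]) / counts[i]

print("most queried point:", int(np.argmax(counts)))
print("truly most informative point:", int(np.argmax(true_informativeness)))
```

The confidence bonus shrinks as a point is sampled more, which is what resolves the exploration versus exploitation trade-off described above.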
22. Multi-channel opportunistic access: a restless multi-armed bandit perspective. Wang, Kehao, 22 June 2012.
In this thesis, we address the fundamental problem of opportunistic spectrum access in a multi-channel communication system. Specifically, we consider a communication system in which a user has access to multiple channels but is limited to sensing and transmitting on only one at a given time. We explore how a smart user should exploit past observations and the knowledge of the stochastic properties of these channels to maximize its transmission rate by switching channels opportunistically. Formally, we provide a generic analysis of the opportunistic spectrum access problem by casting it as a restless multi-armed bandit (RMAB) problem, one of the most well-known generalizations of the classic multi-armed bandit (MAB) problem, which is of fundamental importance in stochastic decision theory. Despite significant research efforts in the field, the RMAB problem in its generic form remains open: until today, very few results have been reported on the structure of the optimal policy. Obtaining the optimal policy for a general RMAB problem is often intractable due to the exponential computational complexity. Hence, a natural alternative is to seek a simple myopic policy maximizing the short-term reward. To this end, we develop three axioms characterizing a family of functions, referred to as regular functions, which are generic and practically important. We then establish the optimality of the myopic policy when the reward function can be expressed as a regular function and the discount factor is bounded by a closed-form threshold determined by the reward function. We also illustrate how the derived results, generic in nature, are applied to analyze a class of RMAB problems arising from multi-channel opportunistic access. Next, we investigate the more challenging problem where the user has to decide the number of channels to sense in each slot in order to maximize its utility (e.g., throughput). After showing the exponential complexity of the problem, we develop a heuristic v-step look-ahead strategy, in which the parameter v allows a desired tradeoff between social efficiency and computational complexity to be achieved. We demonstrate the benefits of the proposed strategy via numerical experiments on several typical settings.
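The thesis's general results on regular reward functions are not reproduced here; the sketch below only illustrates the myopic policy on the standard two-state (Gilbert-Elliott) model of multi-channel opportunistic access, with belief updates over channel states (the transition probabilities, number of channels and horizon are assumptions).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-state (Gilbert-Elliott) channels: state 1 = "good", 0 = "bad".
# p11 = P(good -> good), p01 = P(bad -> good); identical channels for simplicity.
n_channels, p11, p01 = 4, 0.8, 0.3
states = rng.integers(0, 2, size=n_channels)
belief = np.full(n_channels, p01 / (1 - p11 + p01))  # stationary probability of "good"

def propagate(omega):
    """One-step Markov prediction of the belief for channels that were not sensed."""
    return omega * p11 + (1 - omega) * p01

total_reward = 0
horizon = 1000
for t in range(horizon):
    # Myopic policy: sense the channel currently believed most likely to be good.
    k = int(np.argmax(belief))
    observed_good = states[k] == 1
    total_reward += int(observed_good)

    # Belief update: exact observation for the sensed channel, prediction for the rest.
    belief = propagate(belief)
    belief[k] = p11 if observed_good else p01

    # Channels evolve as independent Markov chains.
    go_good = rng.random(n_channels) < np.where(states == 1, p11, p01)
    states = go_good.astype(int)

print("average throughput of the myopic policy:", total_reward / horizon)
```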
23. Design of Quality Assuring Mechanisms with Learning for Strategic Crowds. Satyanath Bhat, K, January 2017.
In this thesis, we address several generic problems concerning the procurement of tasks from a crowd consisting of strategic workers whose qualities are uncertain. These problems are important because the quality of service in a marketplace is known to degrade when there is unchecked information asymmetry pertaining to quality. Moreover, crowdsourcing is increasingly being used for a wide variety of tasks, since it offers high levels of flexibility to workers as well as employers. We seek to address the issue of quality uncertainty in crowdsourcing through mechanism design and machine learning. Since the interactions on a web-based crowdsourcing platform are logged, the captured data can be used to learn unknown parameters such as the qualities of individual crowd workers. Further, many of these platforms invite bids from crowd workers for available tasks, but strategic workers may not bid truthfully, which warrants the use of mechanism design to induce truthful bidding. A complex interplay between machine learning and mechanism design ensues, leading to interesting technical challenges. We resolve some of these generic challenges in the context of the following problems.
Design of a quality eliciting mechanism with interdependent values
We consider an expert-sourcing problem, where a planner seeks opinions from a pool of experts. Executing the task at an assured quality level in a cost-effective manner turns out to be a mechanism design problem when the individual qualities are private information of the experts. Moreover, the task execution problem involves interdependent values, for which truthfulness and efficiency cannot be achieved in an unrestricted setting due to an impossibility result. We propose a novel mechanism that exploits the special structure of the problem and guarantees allocative efficiency, ex-post incentive compatibility and strict budget balance for the mechanism, and ex-post individual rationality for the experts.
Design of an optimal multi-dimensional crowdsourcing auction
We study the problem faced by an auctioneer who gains stochastic rewards by procuring multiple units of a service from a pool of heterogeneous strategic workers. The reward obtained depends on the inherent quality of the worker; each worker's quality is fixed but unknown. The costs and capacities are private information of the workers. The auctioneer must elicit costs and capacities (making the mechanism design multi-dimensional) and, further, must learn the qualities of the workers to enable utility maximization. To solve this problem, we design a multi-dimensional multi-armed bandit auction that maximizes the expected utility of the auctioneer subject to incentive compatibility and individual rationality, while simultaneously learning the unknown qualities of the agents.
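The payment rule and the incentive analysis are the heart of such an auction and are not sketched here; the fragment below only illustrates, under assumptions, the learning component: a UCB index on each worker's unknown quality, offset by the reported cost of procuring one unit (the costs, qualities and Bernoulli quality signal are all hypothetical).

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical pool of workers: reported costs are taken as given (truthfulness
# would have to come from the payment rule, which is not modeled here), and each
# worker has an unknown quality in [0, 1], observed as a Bernoulli outcome per unit.
costs = np.array([0.4, 0.6, 0.5, 0.3, 0.7])
true_quality = np.array([0.5, 0.9, 0.7, 0.4, 0.95])
n_workers = len(costs)

counts = np.zeros(n_workers)
quality_est = np.zeros(n_workers)

for t in range(1, 2001):
    # UCB index on quality, penalized by the (reported) cost of procuring one unit.
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))
    index = np.where(counts == 0, np.inf, quality_est + bonus - costs)
    w = int(np.argmax(index))

    # Procure one unit from the chosen worker and observe a Bernoulli quality signal.
    outcome = float(rng.random() < true_quality[w])
    counts[w] += 1
    quality_est[w] += (outcome - quality_est[w]) / counts[w]

utility = true_quality - costs
print("worker procured most often:", int(np.argmax(counts)))
print("worker with highest true utility:", int(np.argmax(utility)))
```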
Design of a multi-parameter learning mechanism for crowdsourcing
We investigate the problem of allocating divisible jobs, arriving online, to workers on a crowdsourcing platform. Each job is split into a certain number of tasks that are then allocated to workers. These tasks have to meet several constraints that depend on worker performance, and the performance of each worker is in turn characterized by several intrinsic stochastic parameters. In particular, we study a setting where each arriving job has to be completed within a deadline and each task has to be completed while honouring a lower bound on quality. The job completion time and quality of each worker are stochastic, with fixed but unknown means. We propose a learning mechanism that elicits costs truthfully while simultaneously learning the stochastic parameters. The proposed mechanism is dominant-strategy incentive compatible and ex-post individually rational, with asymptotically optimal regret.
24. Learning-based Attack and Defense on Recommender Systems. Palanisamy Sundar, Agnideven, 08 1900. Indiana University-Purdue University Indianapolis (IUPUI).
The internet hosts massive volumes of valuable data that are constantly being created, making it difficult for users to find the information relevant to them. In recent times, online users have come to rely on the recommendations made by websites to narrow down their options, and online reviews have become an increasingly important factor in a customer's final choice. Unfortunately, attackers have found ways to manipulate both reviews and recommendations to mislead users. A recommendation system is a special type of information filtering system adopted by online vendors to provide suggestions to their customers based on their requirements. Collaborative filtering is one of the most widely used recommendation approaches; unfortunately, it is prone to shilling/profile injection attacks. Such attacks alter the recommendation process to promote or demote a particular product. On the other hand, many spammers write deceptive reviews to change the credibility of a product or service. This work addresses these issues by treating the review manipulation and shilling attack scenarios independently. For shilling attacks, we build an efficient reinforcement learning-based attack method that reduces the uncertainty associated with the item selection process and finds the optimal items to enhance attack reach, while treating the recommender system as a black box. Such practical online attacks open new avenues for research in building more robust recommender systems. For review manipulation, we introduce a deep structure embedding approach that preserves highly nonlinear structural information and the dynamic aspects of user reviews to identify and cluster spam users. Notably, in experiments with real datasets, our method captures about 92% of all spam reviewers using an unsupervised learning approach.
25. Active Learning for Ranking from Noisy Observations. Ren, Wenbo, January 2021.
No description available.
26. Structured Stochastic Bandits. Magureanu, Stefan, January 2016.
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlated arms. In particular, we investigate the case where the expected rewards are a Lipschitz function of the arm, as well as the learning-to-rank problem viewed from a MAB perspective. For the former, we derive a problem-specific lower bound and propose both an asymptotically optimal algorithm (OSLB) and a Pareto-optimal algorithm (POSLB). For the latter, we construct the regret lower bound and determine its closed form in some particular settings, and propose two asymptotically optimal algorithms, PIE and PIE-C. For all of the algorithms above, we present performance analyses in the form of theoretical regret guarantees as well as numerical evaluations on artificial datasets and, in the case of PIE and PIE-C, on real-world datasets.
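OSLB and PIE themselves are not reproduced here; as a hedged illustration of how Lipschitz structure can tighten a bandit index, the sketch below combines a standard UCB with the constraint that mu(x) <= mu(y) + L|x - y| (the arm grid, Lipschitz constant L, and noise level are assumptions).

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical arms on a line; the mean reward is Lipschitz in the arm position.
positions = np.linspace(0.0, 1.0, 11)
L = 1.0                                      # assumed Lipschitz constant
true_means = 0.9 - np.abs(positions - 0.3)   # 1-Lipschitz, maximum at x = 0.3
n_arms = len(positions)

counts = np.zeros(n_arms)
means = np.zeros(n_arms)
dist = np.abs(positions[:, None] - positions[None, :])

for t in range(1, 5001):
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts == 0, np.inf, means + bonus)
    # Lipschitz tightening: mu_i <= mu_j + L * d(i, j), so take the min over j.
    tightened = np.min(ucb[None, :] + L * dist, axis=1)
    a = int(np.argmax(tightened))

    reward = true_means[a] + rng.normal(0.0, 0.1)
    counts[a] += 1
    means[a] += (reward - means[a]) / counts[a]

print("most played arm position:", positions[int(np.argmax(counts))])
print("best arm position:", positions[int(np.argmax(true_means))])
```

Because the tightened index borrows information from neighbouring arms, under-sampled arms near a poor region are discarded without being forced into play, which is the main benefit of exploiting the Lipschitz structure.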
27. Regret Minimization in the Gain Estimation Problem. Tourkaman, Mahan, January 2019.
A novel approach to the gain estimation problem, based on a multi-armed bandit formulation, is studied. The gain estimation problem concerns estimating the largest L2-gain that a signal of bounded norm experiences when passing through a linear time-invariant system. Under certain conditions, this new approach is guaranteed to surpass traditional system identification methods in terms of accuracy. The bandit algorithms Upper Confidence Bound, Thompson Sampling and Weighted Thompson Sampling are implemented with the aim of designing the optimal input for maximizing the gain of an unknown system. The regret of each algorithm is studied in simulations on a test system. Upper Confidence Bound with the exploration parameter set to zero performed best among all tested values of this parameter, and Weighted Thompson Sampling performed better than Thompson Sampling.
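The thesis's precise bandit formulation of gain estimation is not reproduced; the sketch below only shows how Thompson Sampling, one of the algorithms named above, selects among a few candidate inputs by sampling from a Gaussian posterior over each input's unknown mean gain (the candidate set, prior and noise level are assumptions).

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical setting: a few candidate input signals, each giving a noisy
# observation of an unknown mean "gain"; measurements should concentrate on
# the input that excites the system the most.
true_gain = np.array([0.8, 1.4, 1.1, 0.9])
noise_std = 0.3
n_inputs = len(true_gain)

# Independent Gaussian posterior per input (known noise variance, wide prior).
post_mean = np.zeros(n_inputs)
post_var = np.full(n_inputs, 10.0)
counts = np.zeros(n_inputs, dtype=int)

for _ in range(500):
    # Thompson Sampling: draw one sample per posterior and play the argmax.
    samples = rng.normal(post_mean, np.sqrt(post_var))
    a = int(np.argmax(samples))

    y = true_gain[a] + rng.normal(0.0, noise_std)
    counts[a] += 1

    # Conjugate Gaussian update with known observation variance.
    precision = 1.0 / post_var[a] + 1.0 / noise_std**2
    post_mean[a] = (post_mean[a] / post_var[a] + y / noise_std**2) / precision
    post_var[a] = 1.0 / precision

print("measurements per candidate input:", counts)
print("input with the largest true gain:", int(np.argmax(true_gain)))
```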
28. Graph Bandits: Multi-Armed Bandits with Locality Constraints / Grafbanditer: Flerarmade banditer med lokala restriktioner. Johansson, Kasper, January 2022.
Multi-armed bandits (MABs) have been studied extensively in the literature and have applications in a wealth of domains, including recommendation systems, dynamic pricing, and investment management. On the one hand, the current MAB literature largely focuses on the setting where every arm is available to play at each time step, and ignores how the agent moves between arms. On the other hand, work that does take the movement between arms into account models the problem as a Markov decision process and applies generic reinforcement learning (RL) algorithms, such as Q-learning. This thesis examines an extension of the MAB problem to a setting where the set of available arms in each round depends on which arm was played in the previous round. In this formulation the arms are nodes in a graph, and arms that can be played successively are connected by edges. We call this the Graph Bandit (GB) problem. We show that under certain conditions the optimal action is governed by a stationary policy. Furthermore, we develop an algorithm, called Q-graph, that leverages the graphical structure of the problem to find this policy when the reward distributions are perfectly known. When the reward distributions are unknown, we show how to combine the Q-graph algorithm with standard sampling algorithms such as Thompson sampling and upper confidence bounds to create an online learning algorithm that provably achieves logarithmic regret. Finally, this regret bound is supported by numerical simulations, which also illustrate how the proposed Q-graph algorithm outperforms generic algorithms from the MAB and RL communities.
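The Q-graph algorithm itself is not reproduced; the sketch below only illustrates the underlying idea of computing a stationary policy by discounted value iteration on a graph whose edges constrain which arm can be played next, assuming the mean rewards are already known (the graph, rewards and discount factor are all assumptions).

```python
import numpy as np

# Hypothetical graph: node i lists its neighbors, i.e. the arms reachable
# from arm i in the next round. Mean rewards are assumed known here.
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 3]}
mean_reward = np.array([0.1, 0.5, 0.2, 0.9])
gamma = 0.9
n_arms = len(mean_reward)

# Discounted value iteration: V(i) = max over j in N(i) of [ r(j) + gamma * V(j) ].
V = np.zeros(n_arms)
for _ in range(500):
    V_new = np.array([
        max(mean_reward[j] + gamma * V[j] for j in neighbors[i])
        for i in range(n_arms)
    ])
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new

# Greedy stationary policy: from arm i, move to the best reachable arm.
policy = {
    i: max(neighbors[i], key=lambda j: mean_reward[j] + gamma * V[j])
    for i in range(n_arms)
}
print("value function:", np.round(V, 3))
print("stationary policy (current arm -> next arm):", policy)
```

In the online setting the unknown mean rewards would be replaced by optimistic or sampled estimates, which is where the Thompson sampling and upper-confidence ideas mentioned in the abstract come in.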
29. Adaptive Measurement Strategies for Network Optimization and Control / Adaptiva Mätstrategier för Optimering och Reglering av Nätverk. Lindståhl, Simon, January 2023.
Fifth generation networks are rapidly becoming the new network standard, and their new technological capabilities are expected to enable a far wider variety of services compared to fourth generation networks. To ensure that these services can co-exist and meet their standardized requirements, the network's resources must be provisioned, managed and reconfigured in a far more complex manner than before. As such, it is no longer sufficient to select a simple, static scheme for gathering the information necessary to take decisions. Instead, it is necessary to adaptively, with regard to network system dynamics, trade off the cost of the measurements taken, in terms of power, CPU and bandwidth consumption, against the value their information brings. Orchestration is a wide field, and the way to quantify the value of a given measurement depends heavily on the problem studied. This thesis therefore addresses adaptive measurement schemes for a number of well-defined network optimization problems. The thesis is presented as a compilation: after an introduction detailing the background, purpose, problem formulation, methodology and contributions of our work, we present each problem separately through the papers submitted to several conferences.

First, we study the problem of optimal spectrum access for low-priority services. We assume that the network manager has limited opportunities to measure the spectrum before assigning one (if any) resource block to the secondary service for transmission, and that this measurement has a known cost attached to it. We study this framework through the lens of multi-armed bandits with multiple arm pulls per decision, a framework we call predictive bandits. We analyze such bandits, show a problem-specific lower bound on their regret, and design an algorithm that meets this bound asymptotically, studying both the case where measurements are perfect and the case where they are corrupted by noise of known magnitude. On a synthetic simulated problem, we find that the algorithm performs considerably better than a simple benchmark strategy.

Secondly, we study a variation of admission control in which the controller must select one of multiple slices to place a new service into. The agent does not initially know the resources available in the slices and must instead measure them, subject to noise. Mimicking three commonly used admission control strategies, we study this as a best arm identification problem, where one or multiple arms are "correct" (the arm the strategy would have chosen with full information). Within this framework, we analyze each strategy and derive sample complexity lower bounds, as well as algorithms that meet these lower bounds. In simulations with synthetic data, we show that our measurement algorithm can vastly reduce the number of required measurements compared to uniform sampling strategies.

Finally, we study a network monitoring system in which the controller must detect sudden changes in system behavior, such as batch traffic arrivals or handovers, in order to take further action. We study this through the lens of change point detection, but argue that the classical framework is insufficient for capturing physical time aspects such as delay and measurement costs independently, and we present an alternative framework that decouples the two, requiring more sophisticated monitoring agents. We show, both through theory and through simulations with synthetic data and data from a 5G testbed, that such adaptive schedules qualitatively and quantitatively improve upon classical change point detection schemes in terms of measurement frequency, without losing classical optimality guarantees such as the one on the required number of measurements after the change.
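The thesis's framework, which decouples physical time from measurement cost, is not reproduced here; as a rough illustration of the change point detection building block together with an adaptive measurement schedule, the sketch below runs a Gaussian CUSUM test and measures sparsely while the test statistic is small (the mean shift, threshold and sampling rule are assumptions, not the thesis's scheme).

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical monitored signal: Gaussian with mean mu0 before the change and
# mu1 after it; the change happens at an unknown time.
mu0, mu1, sigma, change_time = 0.0, 1.0, 1.0, 400
threshold = 8.0            # CUSUM alarm threshold (assumed)

def observe(t):
    mean = mu1 if t >= change_time else mu0
    return rng.normal(mean, sigma)

S = 0.0                    # CUSUM statistic
t = 0
n_measurements = 0
while t < 2000:
    x = observe(t)
    n_measurements += 1
    # Log-likelihood-ratio increment for a known mean shift in Gaussian noise.
    S = max(0.0, S + (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2.0))
    if S > threshold:
        print(f"change declared at time {t} after {n_measurements} measurements")
        break
    # Adaptive schedule (illustrative assumption): measure sparsely while the
    # statistic is small, and back-to-back once evidence starts accumulating.
    t += 10 if S < threshold / 2.0 else 1
else:
    print("no change declared")
```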
30. Edge Compute Offloading Strategies using Heuristic and Reinforcement Learning Techniques. Dikonimaki, Chrysoula, January 2023.
The emergence of 5G, alongside the distributed computing paradigm called edge computing, has prompted a tremendous change in the industry by offering reduced network latency and energy consumption and improved scalability. Edge computing extends the capabilities of users' resource-constrained devices by placing data centers at the edge of the network, and computation offloading enables edge computing by allowing users' tasks to be migrated to edge servers. Deciding whether it is beneficial for a mobile device to offload a task, and to which server, while environmental variables such as availability, load and network quality change dynamically, is a challenging problem that requires careful consideration to achieve good performance. This project focuses on proposing lightweight and efficient algorithms for making offloading decisions from the mobile device's perspective, to the benefit of the user. Heuristic techniques are first examined as a way to find quick but sub-optimal solutions, and are then combined with a multi-armed bandit algorithm, Discounted Upper Confidence Bound (DUCB), to make optimal decisions quickly. The findings indicate that the heuristic approaches alone cannot handle the dynamicity of the problem, whereas DUCB provides the ability to adapt to changing circumstances without having to keep adding extra parameters. Overall, the DUCB algorithm performs better in terms of local energy consumption and can improve service time most of the time.
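As a hedged illustration of the Discounted Upper Confidence Bound index mentioned above (in the style of Garivier and Moulines), the sketch below applies DUCB to a toy nonstationary choice among offloading targets whose mean rewards drift; the discount factor, bonus constant and reward model are assumptions, not the thesis's configuration.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical nonstationary setting: each "arm" is an offloading target (e.g. an
# edge server or local execution) whose mean reward drifts over time.
n_arms, horizon, gamma, xi = 3, 3000, 0.99, 0.6

def mean_reward(t):
    # Assumed drifting means; arm 2 becomes the best target in the second half.
    base = np.array([0.6, 0.5, 0.3])
    return base + np.array([0.0, 0.0, 0.6]) * (t > horizon // 2)

disc_counts = np.zeros(n_arms)   # discounted play counts
disc_sums = np.zeros(n_arms)     # discounted reward sums
picks = np.zeros(n_arms, dtype=int)

for t in range(horizon):
    if t < n_arms:
        a = t                                    # play each arm once
    else:
        n_total = disc_counts.sum()
        means = disc_sums / disc_counts
        bonus = 2.0 * np.sqrt(xi * np.log(n_total) / disc_counts)
        a = int(np.argmax(means + bonus))        # Discounted UCB index

    r = float(rng.random() < mean_reward(t)[a])  # Bernoulli reward

    # Discounting forgets old observations, so the index tracks the drifting means.
    disc_counts *= gamma
    disc_sums *= gamma
    disc_counts[a] += 1.0
    disc_sums[a] += r
    picks[a] += 1

print("picks per arm over the whole horizon:", picks)
```

The discount factor gamma controls the memory of the index: values close to 1 behave like standard UCB, while smaller values forget faster and react more quickly to changes such as a server becoming overloaded.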