• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 36
  • 14
  • 6
  • 5
  • 5
  • 3
  • 2
  • 2
  • 2
  • 1
  • Tagged with
  • 85
  • 34
  • 33
  • 31
  • 16
  • 14
  • 12
  • 12
  • 9
  • 9
  • 9
  • 9
  • 9
  • 9
  • 8
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

[en] ESSAYS IN ECONOMETRICS: ONLINE LEARNING IN HIGH-DIMENSIONAL CONTEXTS AND TREATMENT EFFECTS WITH COMPLEX AND UNKNOWN ASSIGNMENT RULES / [pt] ESTUDOS EM ECONOMETRIA: APRENDIZADO ONLINE EM AMBIENTES DE ALTA DIMENSÃO E EFEITOS DE TRATAMENTO COM REGRAS DE ALOCAÇÃO COMPLEXAS E DESCONHECIDAS

CLAUDIO CARDOSO FLORES 04 October 2021 (has links)
[pt] Essa tese é composta por dois capítulos. O primeiro deles refere-se ao problema de aprendizado sequencial, útil em diversos campos de pesquisa e aplicações práticas. Exemplos incluem problemas de apreçamento dinâmico, desenhos de leilões e de incentivos, além de programas e tratamentos sequenciais. Neste capítulo, propomos a extensão de uma das mais populares regras de aprendizado, epsilon-greedy, para contextos de alta-dimensão, levando em consideração uma diretriz conservadora. Em particular, nossa proposta consiste em alocar parte do tempo que a regra original utiliza na adoção de ações completamente novas em uma busca focada em um conjunto restrito de ações promissoras. A regra resultante pode ser útil para aplicações práticas nas quais existem restrições suaves à adoção de ações não-usuais, mas que eventualmente, valorize surpresas positivas, ainda que a uma taxa decrescente. Como parte dos resultados, encontramos limites plausíveis, com alta probabilidade, para o remorso cumulativo para a regra epsilon-greedy conservadora em alta-dimensão. Também, mostramos a existência de um limite inferior para a cardinalidade do conjunto de ações viáveis que implica em um limite superior menor para o remorso da regra conservadora, comparativamente a sua versão não-conservadora. Adicionalmente, usuários finais possuem suficiente flexibilidade em estabelecer o nível de segurança que desejam, uma vez que tal nível não impacta as propriedades teóricas da regra de aprendizado proposta. Ilustramos nossa proposta tanto por meio de simulação, quanto por meio de um exercício utilizando base de dados de um problema real de sistemas de classificação. Por sua vez, no segundo capítulo, investigamos efeitos de tratamento determinísticos quando a regra de aloção é complexa e desconhecida, talvez por razões éticas, ou para evitar manipulação ou competição desnecessária. Mais especificamente, com foco na metodologia de regressão discontínua sharp, superamos a falta de conhecimento de pontos de corte na alocação de unidades, pela implementação de uma floresta de árvores de classificação, que também utiliza aprendizado sequencial na sua construção, para garantir que, assintoticamente, as regras de alocação desconhecidas sejam identificadas corretamente. A estrutura de árvore também é útil nos casos em que a regra de alocação desconhecida é mais complexa que as tradicionais univariadas. Motivado por exemplos da vida prática, nós mostramos nesse capítulo que, com alta probabilidade e baseado em premissas razoáveis, é possível estimar consistentemente os efeitos de tratamento sob esse cenário. Propomos ainda um algoritmo útil para usuários finais que se mostrou robusto para diferentes especificações e que revela com relativa confiança a regra de alocação anteriormente desconhecida. Ainda, exemplificamos os benefícios da metodologia proposta pela sua aplicação em parte do P900, um programa governamental Chileno de suporte para escolas, que se mostrou adequado ao cenário aqui estudado. / [en] Sequential learning problems are common in several fields of research and practical applications. Examples include dynamic pricing and assortment, design of auctions and incentives and permeate a large number of sequential treatment experiments. In this essay, we extend one of the most popular learning solutions, the epsilon-greedy heuristics, to high-dimensional contexts considering a conservative directive. We do this by allocating part of the time the original rule uses to adopt completely new actions to a more focused search in a restrictive set of promising actions. The resulting rule might be useful for practical applications that still values surprises, although at a decreasing rate, while also has restrictions on the adoption of unusual actions. With high probability, we find reasonable bounds for the cumulative regret of a conservative high-dimensional decaying epsilon-greedy rule. Also, we provide a lower bound for the cardinality of the set of viable actions that implies in an improved regret bound for the conservative version when compared to its non-conservative counterpart. Additionally, we show that end-users have sufficient flexibility when establishing how much safety they want, since it can be tuned without impacting theoretical properties. We illustrate our proposal both in a simulation exercise and using a real dataset. The second essay studies deterministic treatment effects when the assignment rule is both more complex than traditional ones and unknown to the public perhaps, among many possible causes, due to ethical reasons, to avoid data manipulation or unnecessary competition. More specifically, sticking to the well-known sharp RDD methodology, we circumvent the lack of knowledge of true cutoffs by employing a forest of classification trees which also uses sequential learning, as in the last essay, to guarantee that, asymptotically, the true unknown assignment rule is correctly identified. The tree structure also turns out to be suitable if the program s rule is more sophisticated than traditional univariate ones. Motivated by real world examples, we show in this essay that, with high probability and based on reasonable assumptions, it is possible to consistently estimate treatment effects under this setup. For practical implementation we propose an algorithm that not only sheds light on the previously unknown assignment rule but also is capable to robustly estimate treatment effects regarding different specifications imputed by end-users. Moreover, we exemplify the benefits of our methodology by employing it on part of the Chilean P900 school assistance program, which proves to be suitable for our framework.
42

Algorithmic Study on Prediction with Expert Advice : Study of 3 novel paradigms with Grouped Experts

Cayuela Rafols, Marc January 2018 (has links)
The main work for this thesis has been a thorough study of the novel Prediction with Partially Monitored Grouped Expert Advice and Side Information paradigm. This is newly proposed in this thesis, and it extends the widely studied Prediction with Expert Advice paradigm. The extension is based on two assumptions and one restriction that modify the original problem. The first assumption, Grouped, presumes that the experts are structured into groups. The second assumption, Side Information, introduces additional information that can be used to timely relate predictions with groups. Finally, the restriction, Partially Monitored, imposes that the groups’ predictions are only known for one group at a time. The study of this paradigm includes the design of a complete prediction algorithm, the proof of a theoretical bound of the worse-case cumulative regret for such algorithm, and an experimental evaluation of the algorithm (proving the existence of cases where this paradigm outperforms Prediction with Expert Advice). Furthermore, since the development of the algorithm is constructive, it allows to easily build two additional prediction algorithms for the Prediction with Grouped Expert Advice and Prediction with Grouped Expert Advice and Side Information paradigms. Therefore, this thesis presents three novel prediction algorithms, with corresponding regret bounds, and a comparative experimental evaluation including the original Prediction with Expert Advice paradigm. / Huvudarbetet för den här avhandlingen har varit en grundlig studie av den nya Prediction with Partially Monitored Grouped Expert Advice and Side Information paradigmet. Detta är nyligen föreslagit i denna avhandling, och det utökar det brett studerade Prediction with Expert Advice paradigmet. Förlängningen baseras på två antaganden och en begränsning som ändrar det ursprungliga problemet. Det första antagandet, Grouped, förutsätter att experterna är inbyggda i grupper. Det andra antagandet, Side Information, introducerar ytterligare information som kan användas för att i tid relatera förutsägelser med grupper. Slutligen innebär begränsningen, Partially Monitored, att gruppens förutsägelser endast är kända för en grupp i taget. Studien av detta paradigm innefattar utformningen av en komplett förutsägelsesalgoritm, beviset på en teoretisk bindning till det sämre fallet kumulativa ånger för en sådan algoritm och en experimentell utvärdering av algoritmen (bevisar förekomsten av fall där detta paradigm överträffar Prediction with Expert Advice). Eftersom algoritmens utveckling är konstruktiv tillåter den dessutom att enkelt bygga två ytterligare prediksionsalgoritmer för Prediction with Grouped Expert Advice och Prediction with Grouped Expert Advice and Side Information paradigmer. Därför presenterar denna avhandling tre nya prediktionsalgoritmer med motsvarande ångergränser och en jämförande experimentell utvärdering inklusive det ursprungliga Prediction with Expert Advice paradigmet.
43

Dramatizace Olbrachtova Nikoly Šuhaje loupežníka / Dramatization of Ivan Olbracht's novel the Bandit Nikolai Shuhai

Mládková, Nela January 2011 (has links)
ENGLISH SUMMARY In my diploma thesis called The Dramatization of Ivan Olbracht's novel The Bandit Nikolai Shuhai I have concentrated primarily on the adaptation most appreciated by readers as well as, in my opinion, the adaptation of the best quality. In chapter one I focused on Olbracht's novel, precisely on the theme genesis in The Bandit Nikolai Shuhai, which was based on a real-life event. The public responses on the book, which was awarded the state prize for a literary work of art in 1933, are reflected as well. We also deal with blending of the myth with reality, the language and stylistic processing of the novel, and with the main protagonists. Personally, I consider Olbracht's The Bandit Nikolai Shuhai one of the greatest pieces within the Czech literature of 20th century. There are two reasons for which the theme of the novel became a good-quality and favourite basis for further processing: it tells an engaging story of a real-life figure and it is situated in a place which impresses us, Europeans, with enchanting exoticism of its landscape and culture. Chapter two deals with the work of Petr Ulrych which was inspired by Olbracht's novel. Ulrych's first product based on this literary work was a monothematic LP The Bandit Nikolai Shuhai (released in 1974). A year later those thirteen songs were...
44

Méthodes optimistes d’apprentissage actif pour la classification / Optimistic Methods in Active Learning for Classification

Collet, Timothé 11 July 2016 (has links)
La classification se base sur un jeu de données étiquetées par un expert. Plus le jeu de données est grand, meilleure est la performance de classification. Pourtant, la requête à un expert peut parfois être coûteuse. Le but de l'apprentissage actif est alors de minimiser le nombre de requêtes à l'expert. La collection des données non-étiquetées reste aisée cependant et illimitée, il est donc nécessaire de faire un choix sur les données à annoter, l'idée est alors de profiter de ce choix pour maximiser les performances en ne lui fournissant que les données les plus informatives à étiqueter. Pourtant, le niveau d'informativité de chaque donnée ne peut pas être calculé exactement et ne peut être estimé qu'à une incertitude près. Améliorer la précision de l'estimation nécessite d'annoter de nouvelles données. Il y a donc un dilemme entre utiliser le budget d'annotations disponible pour améliorer la performance du classifieur selon l'estimation actuelle du critère ou pour améliorer la précision sur le critère. Ce dilemme est bien connu dans le cadre de l'optimisation en budget fini sous le nom de dilemme entre exploration et exploitation. Les solutions usuelles pour résoudre ce dilemme dans ce contexte font usage du principe d'Optimisme Face à l'Incertitude. Dans cette thèse, nous montrons donc qu'il est possible d'adapter ce principe au problème d'apprentissage actif pour la classification. Pour cela, plusieurs algorithmes ont été être développés pour des classifieurs de complexité croissante, chacun utilisant le principe de l'Optimisme Face à l'Incertitude, et leurs résultats ont été évalués empiriquement / A Classification problem makes use of a training set consisting of data labeled by an oracle. The larger the training set, the best the performance. However, requesting the oracle may be costly. The goal of Active Learning is thus to minimize the number of requests to the oracle while achieving the best performance. To do so, the data that are presented to the oracle must be carefully selected among a large number of unlabeled instances acquired at no cost. However, the true profitability of labeling a particular instance may not be known perfectly. It can therefore be estimated along with a measure of uncertainty. To Increase the precision on the estimate, we need to label more data. Thus, there is a dilemma between labeling data in order to increase the performance of the classifier or to better know how to select data. This dilemma is well studied in the context of finite budget optimization under the name of exploration versus exploitation dilemma. The most famous solutions make use of the principle of Optimism in the Face of Uncertainty. In this thesis, we show that it is possible to adapt this principle to the active learning problem for classification. Several algorithms have been developed for classifiers of increasing complexity, each one of them using the principle of Optimism in the Face of Uncertainty, and their performances have been empirically evaluated
45

Essays on contracting for experimentation

Tang, Aodi January 2018 (has links)
This thesis is composed of four chapters and addresses the contracting issue under strategic experimentation. The first chapter presents an overview of the thesis and introduces the strategic bandit model, which is commonly adopted in the other three chapters. The chapter also previews the main results and implications of the thesis. The second chapter discusses the contracting issue between a principal and a team of agents where the actions of agents are unobservable to the principal. The main contribution of this chapter is to fill the gap of strategic experimentation literature by introducing the free-rider problem in teamwork. The chapter first deals with the optimal hiring choice of the principal under perfect information. Since the belief of the state being good decreases if no one succeeds over time, the paper shows that the principal tends to hire fewer agents in response to the downward-adjusted posterior belief. When the principal can neither monitor the agents' actions nor distinguish the agents who succeed, this chapter shows the optimal incentivising contract consists of an upfront payment from the agents to the principal, a bonus to every agent conditioning on success and a stopping time. Under this contract, the principal can implement first-best experimentation and incentivise all agents to work until the optimal stopping time. The third and fourth chapters discuss the financial contracting issue in innovation where an innovator requires external funding from an investor. The third chapter adopts a \bad news" exponential bandit to study the financial contracting under adverse selection between the innovator and the investor. The innovator, owns the innovation project, is privately informed of either a high or low prior belief of the good state but seeks a large amount of external investment from the less-informed investor. Experimentation is conducted by the innovator using internal funding before the external investment. The posterior belief about the good state increases in the amount of internal funding if no bad news arrives during experimentation, but the project will be abandoned as long as bad news arrives. The chapter shows that the amount of internal funding can be used by the investor to separate the agents with different priors. Under the unique least-costly separating equilibrium, the high-prior innovator spends even more than the low-prior first-best internal funding in order to deter the low-prior one from mimicking, and the low-prior one remains at his first-best. This chapter enriches the financial experimentation literature by proposing internal funding as a novel signalling tool and establishing a Pareto dominating separating equilibrium. The fourth chapter studies a multi-stage innovation financing problem between an agent and an investor with asymmetric information on the progress of the project. The innovation is comprised of two stages where the agent needs to complete the first development stage in order to proceed to the second experiment stage. The model assumes that the completion of the first stage can be early or late following a binary distribution, and the arrival of success in the experimentation stage follows a "good news" exponential bandit. Each period, a fixed amount of investment is needed from the investor. However, the investor can not observe nor verify the project progress. The chapter shows that the optimal incentive-compatible contract consists of differential maximum funding periods in the event of early and late completion of the first stage respectively and subsequent bonuses to the investor conditioning on a success in the second stage. We prove that the first-best experimentation time is attainable as long as the bonus of the late completion exceeds that of the early completion, and the difference between the two bonuses should be confined within a certain range. In the extension, we consider the case when the first stage completion time is informative such that an early completion indicates a higher prior in the good state than the late completion. Under imperfect information, the agent has a stronger incentive to mimic the early completion if the first stage is completed late as a longer experimentation time will be granted according the first-best contract. The chapter proves that the first-best is still achievable under a similar bonus contract but the difference between the two bonuses becomes smaller. This chapter contributes to the experimentation financing literature including the information imperfectness on project progress and multi-stage spillover effects.
46

The Influence of Variance in Two-Armed Bandit Problems

黃秋霖, Huang, Qiu-Lin Unknown Date (has links)
本論文主要是發掘變異數在Two-armed Bandit問題中的影響。在文中我們假設兩種治療法的成功率分別是θ1和θ2,且以π1~Beta(cα,cβ)和π2~Beta(α,β)為其驗前機率分配。此外,我們假設所有病人數(N)已知。 我們證明了當N=2、3,變異因子(c)>1時,最佳的策略是k1*=0,也就是說,我們不應在成功率的變異數較小的治療法上做試驗。這個結果和One-armed Bandit問題(c=∞)的結論是一樣的。但是,當N=10、12的例子中,我們發現k1*=0就並非是最佳的策略。 當α=β時,我們證明了效用函數是c的遞減函數。也就是說,其中一個治療法的變異越小,效用亦越小。當α=β=c=1時,最佳的策略是k^*=k_2^*≈√(1+N)-1。此外,我們也證明了效用函數是c的連續函數。 / The focus of the report is to find the influence of variance in Two-armed Bandit problems. In this report, we consider the case when the success probabilities of the two treatmentsθ1,θ2 haveπ1~Beta(cα,cβ) andπ2~Beta(α,β) as their priors, and the total number of patients, N is known. We showed that for N=2 and 3 the optimal strategy is k1*=0 if variance factor, c>1. That is, we should not make trials on the treatment which variance is smaller. But when N=10 and 12, we showed that k1*=0 is not optimal. When α=β we showed that the utility function is a decreasing function of the c. That is, the smaller variance of a treatment is the smaller utility will be. We have found that k^*=k_2^*≈√(1+N)-1 whenα=β=c=1. Besides, we also have the continuity of utility function in c.
47

Scaling Up Reinforcement Learning without Sacrificing Optimality by Constraining Exploration

Mann, Timothy 1984- 14 March 2013 (has links)
The purpose of this dissertation is to understand how algorithms can efficiently learn to solve new tasks based on previous experience, instead of being explicitly programmed with a solution for each task that we want it to solve. Here a task is a series of decisions, such as a robot vacuum deciding which room to clean next or an intelligent car deciding to stop at a traffic light. In such a case, state-of-the-art learning algorithms are difficult to employ in practice because they often make thou- sands of mistakes before reliably solving a task. However, humans learn solutions to novel tasks, often making fewer mistakes, which suggests that efficient learning algorithms may exist. One advantage that humans have over state- of-the-art learning algorithms is that, while learning a new task, humans can apply knowledge gained from previously solved tasks. The central hypothesis investigated by this dissertation is that learning algorithms can solve new tasks more efficiently when they take into consideration knowledge learned from solving previous tasks. Al- though this hypothesis may appear to be obviously true, what knowledge to use and how to apply that knowledge to new tasks is a challenging, open research problem. I investigate this hypothesis in three ways. First, I developed a new learning algorithm that is able to use prior knowledge to constrain the exploration space. Second, I extended a powerful theoretical framework in machine learning, called Probably Approximately Correct, so that I can formally compare the efficiency of algorithms that solve only a single task to algorithms that consider knowledge from previously solved tasks. With this framework, I found sufficient conditions for using knowledge from previous tasks to improve efficiency of learning to solve new tasks and also identified conditions where transferring knowledge may impede learning. I present situations where transfer learning can be used to intelligently constrain the exploration space so that optimality loss can be minimized. Finally, I tested the efficiency of my algorithms in various experimental domains. These theoretical and empirical results provide support for my central hypothesis. The theory and experiments of this dissertation provide a deeper understanding of what makes a learning algorithm efficient so that it can be widely used in practice. Finally, these results also contribute the general goal of creating autonomous machines that can be reliably employed to solve complex tasks.
48

Introduction of statistics in optimization

Teytaud, Fabien 08 December 2011 (has links) (PDF)
In this thesis we study two optimization fields. In a first part, we study the use of evolutionary algorithms for solving derivative-free optimization problems in continuous space. In a second part we are interested in multistage optimization. In that case, we have to make decisions in a discrete environment with finite horizon and a large number of states. In this part we use in particular Monte-Carlo Tree Search algorithms. In the first part, we work on evolutionary algorithms in a parallel context, when a large number of processors are available. We start by presenting some state of the art evolutionary algorithms, and then, show that these algorithms are not well designed for parallel optimization. Because these algorithms are population based, they should be we well suitable for parallelization, but the experiments show that the results are far from the theoretical bounds. In order to solve this discrepancy, we propose some rules (such as a new selection ratio or a faster decrease of the step-size) to improve the evolutionary algorithms. Experiments are done on some evolutionary algorithms and show that these algorithms reach the theoretical speedup with the help of these new rules.Concerning the work on multistage optimization, we start by presenting some of the state of the art algorithms (Min-Max, Alpha-Beta, Monte-Carlo Tree Search, Nested Monte-Carlo). After that, we show the generality of the Monte-Carlo Tree Search algorithm by successfully applying it to the game of Havannah. The application has been a real success, because today, every Havannah program uses Monte-Carlo Tree Search algorithms instead of the classical Alpha-Beta. Next, we study more precisely the Monte-Carlo part of the Monte-Carlo Tree Search algorithm. 3 generic rules are proposed in order to improve this Monte-Carlo policy. Experiments are done in order to show the efficiency of these rules.
49

Multi-channel opportunistic access : a restless multi-armed bandit perspective

Wang, Kehao 22 June 2012 (has links) (PDF)
In the thesis, we address the fundamental problem of opportunistic spectrum access in a multi-channel communication system. Specifically, we consider a communication system in which a user has access to multiple channels, but is limited to sensing and transmitting only on one at a given time. We explore how the smart user should exploit past observations and the knowledge of the stochastic properties of these channels to maximize its transmission rate by switching channels opportunistically. Formally, we provide a generic analysis on the opportunistic spectrum access problem by casting the problem into the restless multi-armed bandit (RMAB) problem, one of the most well-known generalizations of the classic multi-armed bandit (MAB) problem, which is of fundamental importance in stochastic decision theory. Despite the significant research efforts in the field, the RMAB problem in its generic form still remains open. Until today, very little result is reported on the structure of the optimal policy. Obtaining the optimal policy for a general RMAB problem is often intractable due to the exponential computation complexity. Hence, a natural alternative is to seek a simple myopic policy maximizing the short-term reward. Therefore, we develop three axioms characterizing a family of functions which we refer to as regular functions, which are generic and practically important. We then establish the optimality of the myopic policy when the reward function can be expressed as a regular function and the discount factor is bounded by a closed-form threshold determined by the reward function. We also illustrate how the derived results, generic in nature, are applied to analyze a class of RMAB problems arising from multi-channel opportunistic access. Next, we further investigate the more challenging problem where the user has to decide the number of channels to sense in each slot in order to maximize its utility (e.g., throughput). After showing the exponential complexity of the problem, we develop a heuristic v-step look-ahead strategy. In the developed strategy, the parameter v allows to achieve a desired tradeoff between social efficiency and computation complexity. We demonstrate the benefits of the proposed strategy via numerical experiments on several typical settings.
50

Design of Quality Assuring Mechanisms with Learning for Strategic Crowds

Satyanath Bhat, K January 2017 (has links) (PDF)
In this thesis, we address several generic problems concerned with procurement of tasks from a crowd that consists of strategic workers with uncertainty in their qualities. These problems assume importance as the quality of services in a service marketplace is known to degrade when there is (unchecked) information asymmetry pertaining to quality. Moreover, crowdsourcing is increasingly being used for a wide variety of tasks these days since it offers high levels of flexibility to workers as well as employers. We seek to address the issue of quality uncertainty in crowdsourcing through mechanism design and machine learning. As the interactions in web-based crowdsourcing platform are logged, the data captured could be used to learn unknown parameters such as qualities of individual crowd workers. Further, many of these platforms invite bids by crowd workers for available tasks but the strategic workers may not bid truthfully. This warrants the use of mechanism design to induce truthful bidding. There ensues a complex interplay between machine learning and mechanism design, leading to interesting technical challenges. We resolve some generic challenges in the context of the following problems. Design of a quality eliciting mechanism with interdependent values We consider an expert sourcing problem, where a planner seeks opinions from a pool of experts. Execution of the task at an assured quality level in a cost effective manner turns out to be a mechanism design problem when the individual qualities are private information of the experts. Also, the task execution problem involves interdependent values, where truthfulness and efficiency cannot be achieved in an unrestricted setting due to an impossibility result. We propose a novel mechanism that exploits the special structure of the problem and guarantees allocative efficiency, ex-post incentive compatibility and strict budget balance for the mechanism, and ex-post individual rationality for the experts. Design of an optimal dimensional crowdsourcing auction We study the problem faced by an auctioneer who gains stochastic rewards by procuring multiple units of a service from a pool of heterogeneous strategic workers. The reward obtained depends on the inherent quality of the worker; the worker’s quality is fixed but unknown. The costs and capacities are private information of the workers. The auctioneer is required to elicit costs and capacities (making the mechanism design dimensional) and further, has to learn the qualities of the workers as well, to enable utility maximization. To solve this problem, we design a dimensional multi-armed bandit auction that maximizes the expected utility of the auctioneer subject to incentive compatibility and individual rationality while simultaneously learning the unknown qualities of the agents. Design of a multi-parameter learning mechanism for crowdsourcing We investigate the problem of allocating divisible jobs, arriving online, to workers in a crowd-sourcing platform. Each job is split into a certain number of tasks that are then allocated to workers. These tasks have to meet several constraints that depend on the worker performance. The performance of each worker in turn is characterized by several intrinsic stochastic parameters. In particular, we study a problem where each arriving job has to be completed within a deadline and each task has to be completed, honouring a lower bound on quality. The job completion time and quality of each worker are stochastic with fixed but unknown means. We propose a learning mechanism to elicit the costs truthfully while simultaneously learning the stochastic parameters. Our proposed mechanism is dominant strategy incentive compatible and ex-post individually rational with asymptotically optimal regret performance.

Page generated in 0.0462 seconds