• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 104
  • 14
  • 12
  • 5
  • 3
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 204
  • 204
  • 109
  • 60
  • 41
  • 40
  • 40
  • 35
  • 34
  • 33
  • 33
  • 21
  • 21
  • 21
  • 19
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Approximate Dynamic Programming and Reinforcement Learning - Algorithms, Analysis and an Application

Lakshminarayanan, Chandrashekar January 2015 (has links) (PDF)
Problems involving optimal sequential making in uncertain dynamic systems arise in domains such as engineering, science and economics. Such problems can often be cast in the framework of Markov Decision Process (MDP). Solving an MDP requires computing the optimal value function and the optimal policy. The idea of dynamic programming (DP) and the Bellman equation (BE) are at the heart of solution methods. The three important exact DP methods are value iteration, policy iteration and linear programming. The exact DP methods compute the optimal value function and the optimal policy. However, the exact DP methods are inadequate in practice because the state space is often large and in practice, one might have to resort to approximate methods that compute sub-optimal policies. Further, in certain cases, the system observations are known only in the form of noisy samples and we need to design algorithms that learn from these samples. In this thesis we study interesting theoretical questions pertaining to approximate and learning algorithms, and also present an interesting application of MDPs in the domain of crowd sourcing. Approximate Dynamic Programming (ADP) methods handle the issue of large state space by computing an approximate value function and/or a sub-optimal policy. In this thesis, we are concerned with conditions that result in provably good policies. Motivated by the limitations of the PBE in the conventional linear algebra, we study the PBE in the (min, +) linear algebra. It is a well known fact that deterministic optimal control problems with cost/reward criterion are (min, +)/(max, +) linear and ADP methods have been developed for such systems in literature. However, it is straightforward to show that infinite horizon discounted reward/cost MDPs are neither (min, +) nor (max, +) linear. We develop novel ADP schemes namely the Approximate Q Iteration (AQI) and Variational Approximate Q Iteration (VAQI), where the approximate solution is a (min, +) linear combination of a set of basis functions whose span constitutes a subsemimodule. We show that the new ADP methods are convergent and we present a bound on the performance of the sub-optimal policy. The Approximate Linear Program (ALP) makes use of linear function approximation (LFA) and offers theoretical performance guarantees. Nevertheless, the ALP is difficult to solve due to the presence of a large number of constraints and in practice, a reduced linear program (RLP) is solved instead. The RLP has a tractable number of constraints sampled from the original constraints of the ALP. Though the RLP is known to perform well in experiments, theoretical guarantees are available only for a specific RLP obtained under idealized assumptions. In this thesis, we generalize the RLP to define a generalized reduced linear program (GRLP) which has a tractable number of constraints that are obtained as positive linear combinations of the original constraints of the ALP. The main contribution here is the novel theoretical framework developed to obtain error bounds for any given GRLP. Reinforcement Learning (RL) algorithms can be viewed as sample trajectory based solution methods for solving MDPs. Typically, RL algorithms that make use of stochastic approximation (SA) are iterative schemes taking small steps towards the desired value at each iteration. Actor-Critic algorithms form an important sub-class of RL algorithms, wherein, the critic is responsible for policy evaluation and the actor is responsible for policy improvement. The actor and critic iterations have deferent step-size schedules, in particular, the step-sizes used by the actor updates have to be generally much smaller than those used by the critic updates. Such SA schemes that use deferent step-size schedules for deferent sets of iterates are known as multitimescale stochastic approximation schemes. One of the most important conditions required to ensure the convergence of the iterates of a multi-timescale SA scheme is that the iterates need to be stable, i.e., they should be uniformly bounded almost surely. However, the conditions that imply the stability of the iterates in a multi-timescale SA scheme have not been well established. In this thesis, we provide veritable conditions that imply stability of two timescale stochastic approximation schemes. As an example, we also demonstrate that the stability of a widely used actor-critic RL algorithm follows from our analysis. Crowd sourcing (crowd) is a new mode of organizing work in multiple groups of smaller chunks of tasks and outsourcing them to a distributed and large group of people in the form of an open call. Recently, crowd sourcing has become a major pool for human intelligence tasks (HITs) such as image labeling, form digitization, natural language processing, machine translation evaluation and user surveys. Large organizations/requesters are increasingly interested in crowd sourcing the HITs generated out of their internal requirements. Task starvation leads to huge variation in the completion times of the tasks posted on to the crowd. This is an issue for frequent requesters desiring predictability in the completion times of tasks specified in terms of percentage of tasks completed within a stipulated amount of time. An important task attribute that affects the completion time of a task is its price. However, a pricing policy that does not take the dynamics of the crowd into account might fail to achieve the desired predictability in completion times. Here, we make use of the MDP framework to compute a pricing policy that achieves predictable completion times in simulations as well as real world experiments.
202

Learning Continuous Human-Robot Interactions from Human-Human Demonstrations

Vogt, David 02 March 2018 (has links)
In der vorliegenden Dissertation wurde ein datengetriebenes Verfahren zum maschinellen Lernen von Mensch-Roboter Interaktionen auf Basis von Mensch-Mensch Demonstrationen entwickelt. Während einer Trainingsphase werden Bewegungen zweier Interakteure mittels Motion Capture erfasst und in einem Zwei-Personen Interaktionsmodell gelernt. Zur Laufzeit wird das Modell sowohl zur Erkennung von Bewegungen des menschlichen Interaktionspartners als auch zur Generierung angepasster Roboterbewegungen eingesetzt. Die Leistungsfähigkeit des Ansatzes wird in drei komplexen Anwendungen evaluiert, die jeweils kontinuierliche Bewegungskoordination zwischen Mensch und Roboter erfordern. Das Ergebnis der Dissertation ist ein Lernverfahren, das intuitive, zielgerichtete und sichere Kollaboration mit Robotern ermöglicht.
203

A nonparametric Bayesian perspective for machine learning in partially-observed settings

Akova, Ferit 31 July 2014 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Robustness and generalizability of supervised learning algorithms depend on the quality of the labeled data set in representing the real-life problem. In many real-world domains, however, we may not have full knowledge of the underlying data-generating mechanism, which may even have an evolving nature introducing new classes continually. This constitutes a partially-observed setting, where it would be impractical to obtain a labeled data set exhaustively defined by a fixed set of classes. Traditional supervised learning algorithms, assuming an exhaustive training library, would misclassify a future sample of an unobserved class with probability one, leading to an ill-defined classification problem. Our goal is to address situations where such assumption is violated by a non-exhaustive training library, which is a very realistic yet an overlooked issue in supervised learning. In this dissertation we pursue a new direction for supervised learning by defining self-adjusting models to relax the fixed model assumption imposed on classes and their distributions. We let the model adapt itself to the prospective data by dynamically adding new classes/components as data demand, which in turn gradually make the model more representative of the entire population. In this framework, we first employ suitably chosen nonparametric priors to model class distributions for observed as well as unobserved classes and then, utilize new inference methods to classify samples from observed classes and discover/model novel classes for those from unobserved classes. This thesis presents the initiating steps of an ongoing effort to address one of the most overlooked bottlenecks in supervised learning and indicates the potential for taking new perspectives in some of the most heavily studied areas of machine learning: novelty detection, online class discovery and semi-supervised learning.
204

Renforcement de la sécurité à travers les réseaux programmables

Abou El Houda, Zakaria 09 1900 (has links)
La conception originale d’Internet n’a pas pris en compte les aspects de sécurité du réseau; l’objectif prioritaire était de faciliter le processus de communication. Par conséquent, de nombreux protocoles de l’infrastructure Internet exposent un ensemble de vulnérabilités. Ces dernières peuvent être exploitées par les attaquants afin de mener un ensemble d’attaques. Les attaques par déni de service distribué (Distributed Denial of Service ou DDoS) représentent une grande menace et l’une des attaques les plus dévastatrices causant des dommages collatéraux aux opérateurs de réseau ainsi qu’aux fournisseurs de services Internet. Les réseaux programmables, dits Software-Defined Networking (SDN), ont émergé comme un nouveau paradigme promettant de résoudre les limitations de l’architecture réseau actuelle en découplant le plan de contrôle du plan de données. D’une part, cette séparation permet un meilleur contrôle du réseau et apporte de nouvelles capacités pour mitiger les attaques par déni de service distribué. D’autre part, cette séparation introduit de nouveaux défis en matière de sécurité du plan de contrôle. L’enjeu de cette thèse est double. D’une part, étudier et explorer l’apport de SDN à la sécurité afin de concevoir des solutions efficaces qui vont mitiger plusieurs vecteurs d’attaques. D’autre part, protéger SDN contre ces attaques. À travers ce travail de recherche, nous contribuons à la mitigation des attaques par déni de service distribué sur deux niveaux (intra-domaine et inter-domaine), et nous contribuons au renforcement de l’aspect sécurité dans les réseaux programmables. / The original design of Internet did not take into consideration security aspects of the network; the priority was to facilitate the process of communication. Therefore, many of the protocols that are part of the Internet infrastructure expose a set of vulnerabilities that can be exploited by attackers to carry out a set of attacks. Distributed Denial-of-Service (DDoS) represents a big threat and one of the most devastating and destructive attacks plaguing network operators and Internet service providers (ISPs) in a stealthy way. Software defined networks (SDN), an emerging technology, promise to solve the limitations of the conventional network architecture by decoupling the control plane from the data plane. On one hand, the separation of the control plane from the data plane allows for more control over the network and brings new capabilities to deal with DDoS attacks. On the other hand, this separation introduces new challenges regarding the security of the control plane. This thesis aims to deal with various types of attacks including DDoS attacks while protecting the resources of the control plane. In this thesis, we contribute to the mitigation of both intra-domain and inter-domain DDoS attacks, and to the reinforcement of security aspects in SDN.

Page generated in 0.0649 seconds