421

Použití zpětnovazebního učení pro hraní textových her / Using reinforcement learning to learn how to play text-based games

Zelinka, Mikuláš January 2017 (has links)
The ability to learn optimal control policies in systems where action space is defined by sentences in natural language would allow many interesting real-world applications such as automatic optimisation of dialogue systems. Text-based games with multiple endings and rewards are a promising platform for this task, since their feedback allows us to employ reinforcement learning techniques to jointly learn text representations and control policies. We present a general text game playing agent, testing its generalisation and transfer learning performance and showing its ability to play multiple games at once. We also present pyfiction, an open-source library for universal access to different text games that could, together with our agent that implements its interface, serve as a baseline for future research.
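As a rough illustration of the learning problem described in this abstract, the sketch below applies tabular Q-learning to a tiny hand-written text game with sentence actions and multiple endings. The toy game, its interface, and all hyperparameters are illustrative assumptions only; this is not the pyfiction API, and the actual agent learns text representations jointly with the policy rather than using a lookup table.

```python
import random
from collections import defaultdict

# Minimal sketch: tabular Q-learning on a hand-coded text game whose actions
# are sentences and whose endings carry different rewards (an assumption for
# illustration, not the thesis agent or the pyfiction interface).

# state description -> {action sentence: (next state, reward)}
GAME = {
    "You wake up in a dark room.": {
        "open the door": ("You are in a hallway.", 0.0),
        "go back to sleep": ("END: You oversleep and miss everything.", -1.0),
    },
    "You are in a hallway.": {
        "take the stairs down": ("END: You escape the building.", 1.0),
        "enter the elevator": ("END: The elevator is broken.", -0.5),
    },
}
START = "You wake up in a dark room."

Q = defaultdict(float)                  # Q[(state text, action text)]
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # illustrative hyperparameters

def choose(state):
    """Epsilon-greedy choice among the sentence actions available in `state`."""
    actions = list(GAME[state])
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

for episode in range(500):
    state = START
    while not state.startswith("END"):
        action = choose(state)
        nxt, reward = GAME[state][action]
        target = reward
        if not nxt.startswith("END"):
            target += GAMMA * max(Q[(nxt, a)] for a in GAME[nxt])
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = nxt

# The greedy policy should pick "open the door" and then "take the stairs down".
for s in GAME:
    print(s, "->", max(GAME[s], key=lambda a: Q[(s, a)]))
```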
422

Minimizing Regret in Combinatorial Bandits and Reinforcement Learning

Talebi Mazraeh Shahi, Mohammad Sadegh January 2017 (has links)
This thesis investigates sequential decision-making tasks that fall in the framework of reinforcement learning (RL). These tasks involve a decision maker who repeatedly interacts with an environment modeled by an unknown finite Markov decision process (MDP) and wishes to maximize a notion of reward accumulated during her experience. Her performance can be measured through the notion of regret, which compares her accumulated expected reward against that achieved by an oracle algorithm always following an optimal behavior. In order to maximize her accumulated reward, or equivalently to minimize the regret, she needs to face a trade-off between exploration and exploitation. The first part of this thesis investigates combinatorial multi-armed bandit (MAB) problems, which are RL problems whose state space is a singleton. It also addresses some applications that can be cast as combinatorial MAB problems. The number of arms in such problems generically grows exponentially with the number of basic actions, but the rewards of the various arms are correlated. Hence, the challenge in such problems is to exploit the underlying combinatorial structure. For these problems, we derive asymptotic (i.e., when the time horizon grows large) lower bounds on the regret of any admissible algorithm and investigate how these bounds scale with the dimension of the underlying combinatorial structure. We then propose several algorithms and provide finite-time analyses of their regret. The proposed algorithms efficiently exploit the structure of the problem, provide better performance guarantees than existing algorithms, and significantly outperform these algorithms in practice. The second part of the thesis concerns RL in an unknown and discrete MDP under the average-reward criterion. We develop some variations of the transportation lemma that could serve as novel tools for the regret analysis of RL algorithms. Revisiting existing regret lower bounds allows us to derive alternative bounds, which suggest that the local variance of the bias function of the MDP, i.e., the variance with respect to next-state transition laws, could serve as a notion of problem complexity for regret minimization in RL. Leveraging these tools also allows us to report a novel regret analysis of the KL-UCRL algorithm for ergodic MDPs. The leading term in our regret bound depends on the local variance of the bias function, thus coinciding with observations obtained from our lower bounds. Numerical evaluations in some benchmark MDPs indicate that the leading term of the derived bound can provide an order-of-magnitude improvement over previously known results for this algorithm.
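For concreteness, the regret referred to in this abstract has the standard form below, written in common notation rather than copied from the thesis: the bandit regret compares the accumulated reward against always playing the arm with the best mean, and the average-reward MDP regret compares against the optimal gain.

```latex
% Standard regret notions assumed here: a stochastic bandit with arm means
% \mu_a (best mean \mu^\star), and an average-reward MDP with optimal gain
% g^\star and per-step rewards r_t.
\[
  R_T^{\mathrm{bandit}} = T\,\mu^{\star} - \mathbb{E}\Big[\sum_{t=1}^{T} X_{a_t}\Big],
  \qquad
  R_T^{\mathrm{MDP}} = T\,g^{\star} - \mathbb{E}\Big[\sum_{t=1}^{T} r_t\Big].
\]
```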
423

Tuning evolutionary search for closed-loop optimization

Allmendinger, Richard January 2012 (has links)
Closed-loop optimization deals with problems in which candidate solutions are evaluated by conducting experiments, e.g. physical or biochemical experiments. Although this form of optimization is becoming more popular across the sciences, it may be subject to largely unexplored resourcing issues, as any experiment may require resources in order to be conducted. In this thesis we are concerned with understanding how evolutionary search is affected by three particular resourcing issues -- ephemeral resource constraints (ERCs), changes of variables, and lethal environments -- and with the development of search strategies to combat these issues. The thesis makes three broad contributions. First, we motivate and formally define the resourcing issues considered. Here, concrete examples in a range of applications are given. Secondly, we theoretically and empirically investigate the effect of the resourcing issues considered on evolutionary search. This investigation reveals that resourcing issues affect optimization in general, and that clear patterns emerge relating specific properties of the different resourcing issues to performance effects. Thirdly, we develop and analyze various search strategies, built on top of an evolutionary algorithm (EA), for coping with resourcing issues. To cope specifically with ERCs, we develop several static constraint-handling strategies, and investigate the application of reinforcement learning techniques to learn when to switch between these static strategies during an optimization process. We also develop several online resource-purchasing strategies to cope with ERCs that leave the arrangement of resources in the hands of the optimizer. For problems subject to changes of variables relating to the resources, we find that knowing which variables are changed provides an optimizer with valuable information, which we exploit using a novel dynamic strategy. Finally, for lethal environments, where visiting parts of the search space can cause the permanent loss of resources, we observe that a standard EA's population may be reduced in size rapidly, complicating the search for innovative solutions. To cope with such scenarios, we consider some non-standard EA setups that are able to innovate genetically whilst simultaneously mitigating risks to the evolving population.
424

On exploiting location flexibility in data-intensive distributed systems

Yu, Boyang 12 October 2016 (has links)
With the fast growth of data-intensive distributed systems today, more novel and principled approaches are needed to improve system efficiency, ensure the service quality required to satisfy user requirements, and lower the system's running cost. This dissertation studies design issues in data-intensive distributed systems, which are differentiated from other systems by their heavy data-movement workload and are characterized by the fact that the destination of each data flow is limited to a subset of available locations, such as the servers holding the requested data. Moreover, even among the feasible subset, different locations may result in different performance. The studies in this dissertation improve data-intensive systems by exploiting the flexibility of data storage locations. They address how to determine data placement from measured request patterns, using the proposed hypergraph models for data placement to improve a series of performance metrics such as data access latency, system throughput, and various costs. To implement the proposal with lower overhead, a sketch-based data placement scheme is presented, which constructs a sparsified hypergraph under a distributed, streaming-based system model and achieves a good approximation of the performance improvement. As the network can become the bottleneck of distributed data-intensive systems due to frequent data movement among storage nodes, an online data placement scheme based on reinforcement learning is proposed, which determines the storage locations of each data item at the moment the item is written or updated, with joint awareness of network conditions and request patterns. Meanwhile, noticing that distributed memory caches are effective in lowering the load on backend storage systems, the auto-scaling of memory cache clusters is studied, aiming to balance the energy cost of the service against the performance it ensures. As the outcome of this dissertation, the designed schemes and methods help improve the running efficiency of data-intensive distributed systems. They can therefore either improve the user-perceived service quality under the same level of system resource investment, or lower the monetary expense and energy consumption of maintaining the system under the same performance standard. From these two perspectives, both end users and system providers benefit from the results of these studies.
425

Aprendizado por reforço relacional para o controle de robôs sociáveis / Relational reinforcement learning to control sociable robots

Renato Ramos da Silva 10 March 2009 (has links)
Artificial intelligence seeks not only to understand intelligence but also to build intelligent entities. Intelligence can be divided into several factors, one of which is known as learning. The field of machine learning aims at developing techniques for the automatic learning of machines, including computers, robots, or any other device. Among these techniques is reinforcement learning, the main focus of this work. More specifically, relational reinforcement learning (RRL) was investigated, which represents in relational form the knowledge obtained through direct interaction with the environment. RRL is of particular interest in robotics, since in general no model of the environment is available and economy in the resources used is required. The RRL technique was investigated in the context of learning for a robotic head. A modification of the RRL algorithm, named ETG, was proposed and incorporated into a control architecture for a robotic head. The architecture was evaluated on a non-trivial real-world problem: learning shared attention. The results show that, using ETG, the architecture is able to exhibit appropriate behaviours during a controlled social interaction. A comparative analysis with other methods shows that the proposed algorithm achieved superior performance in most of the experiments performed.
426

Zero-Knowledge Agent Trained for the Game of Risk

Bethdavid, Simon January 2020 (has links)
Recent developments in deep reinforcement learning applied to abstract strategy games such as Go, chess and Hex have sparked an interest within military planning. This Master's thesis explores whether it is possible to apply an algorithm similar to Expert Iteration and AlphaZero to wargames. The studied wargame is Risk, a turn-based multiplayer game played on a simplified political map of the world. The algorithm consists of an expert, in the form of a Monte Carlo tree search algorithm, and an apprentice, implemented as a neural network. The neural network is trained by imitation learning to mimic expert decisions generated from self-play reinforcement learning. The apprentice is then used as a heuristic in subsequent tree searches. The results demonstrate that a Monte Carlo tree search algorithm could, to some degree, be employed on a strategy game such as Risk, dominating a random-playing agent. The neural network, fed with a state representation in the form of a vector, had difficulty learning expert decisions and could not beat a random-playing agent, which led to a halt in the expert/apprentice learning process. However, possible solutions are provided as future work.
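To make the expert/apprentice pattern concrete, the following sketch runs a toy version of the loop on the game of Nim: a flat Monte Carlo search plays the role of the tree-search expert, and a table of imitation counts stands in for the neural-network apprentice. The game, the flat search, and the tabular apprentice are all illustrative simplifications, not the thesis implementation, which applies full MCTS and a neural network to Risk.

```python
import random
from collections import defaultdict

# Toy expert/apprentice loop on Nim (one pile, remove 1-3 stones, taking the
# last stone wins). The "expert" evaluates moves by random rollouts; the
# "apprentice" is trained by imitation on the expert's choices.

ACTIONS = (1, 2, 3)
START = 10

def legal(pile):
    return [a for a in ACTIONS if a <= pile]

def rollout(pile):
    """Return +1 if the player to move at `pile` wins under random play."""
    to_move = 0
    while True:
        pile -= random.choice(legal(pile))
        if pile == 0:
            return +1 if to_move == 0 else -1
        to_move = 1 - to_move

def expert_move(pile, sims=100):
    """Flat Monte Carlo expert: pick the move with the best estimated win rate."""
    def value(a):
        nxt = pile - a
        if nxt == 0:
            return 1.0                      # taking the last stone wins outright
        # We win from `nxt` exactly when the opponent (to move there) loses.
        return sum(rollout(nxt) == -1 for _ in range(sims)) / sims
    return max(legal(pile), key=value)

apprentice = defaultdict(lambda: defaultdict(int))   # pile -> move -> count

def generate_imitation_data(episodes=50):
    """Self-play with the expert; record its choices as imitation targets."""
    for _ in range(episodes):
        pile = START
        while pile > 0:
            move = expert_move(pile)
            apprentice[pile][move] += 1
            pile -= move

def apprentice_move(pile):
    counts = apprentice[pile]
    return max(counts, key=counts.get) if counts else random.choice(legal(pile))

generate_imitation_data()
print({p: apprentice_move(p) for p in sorted(apprentice)})
```

In full Expert Iteration the apprentice would then be fed back into the search as a move prior, which is the role the thesis assigns to its neural network in subsequent tree searches.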
427

Device to Device Communications for Smart Grid

Shimotakahara, Kevin 17 June 2020 (has links)
This thesis identifies and addresses two barriers to the adoption of Long Term Evolution (LTE) Device-to-Device (D2D) communication-enabled smart grid applications in regions outside of core network coverage. The first barrier is the lack of accessible simulation software for engineers to develop and test the feasibility of their LTE D2D-enabled smart grid application designs. The second barrier is the lack of a distributed resource allocation algorithm for LTE D2D communications tailored to the needs of smart grid applications. A solution to the first barrier is proposed in the form of a simulator constructed in Matlab/Simulink that simulates power systems together with the underlying communication system, i.e., the LTE D2D communication protocol stack. The simulator is built using Matlab's LTE System Toolbox, SimEvents, and Simscape Power Systems, in addition to in-house interface software developed to facilitate D2D communications in smart grid applications. To test the simulator, a simple fault location, isolation, and restoration (FLISR) application was implemented, showing that the LTE message timing is consistent with the relay signaling in the power system. A solution to the second barrier is proposed in the form of a multi-agent Q-learning based resource allocation algorithm that allows LTE-enabled D2D communication agents to generate orthogonal transmission schedules outside of network coverage. This algorithm reduces packet drop rates (PDR) in distributed D2D communication networks to meet the quality-of-service requirements of microgrid communications. The PDR and latency performance of the proposed algorithm was compared to the existing random self-allocation mechanism introduced under the Third Generation Partnership Project's LTE Release 12. The proposed algorithm outperformed the LTE algorithm in all tested scenarios, demonstrating 20-40% absolute reductions in PDR and 10-20 ms reductions in latency for all microgrid applications.
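The sketch below illustrates the kind of distributed scheduling this abstract describes: several independent, bandit-style Q-learners each pick a resource block per scheduling period and are penalized for collisions, so their greedy choices tend to drift toward an orthogonal schedule. The agent count, rewards, and learning parameters are illustrative assumptions rather than the algorithm specified in the thesis.

```python
import random
from collections import defaultdict

# Minimal sketch of independent multi-agent Q-learning for distributed
# resource-block selection (illustrative assumptions, not the thesis design).

N_AGENTS = 4
N_BLOCKS = 4          # resource blocks available per scheduling period
ALPHA, EPSILON = 0.1, 0.1

# Each agent keeps its own Q-value per block (a stateless, bandit-style learner).
Q = [defaultdict(float) for _ in range(N_AGENTS)]

def pick(agent):
    """Epsilon-greedy block selection for one agent."""
    if random.random() < EPSILON:
        return random.randrange(N_BLOCKS)
    return max(range(N_BLOCKS), key=lambda b: Q[agent][b])

for period in range(2000):
    choices = [pick(i) for i in range(N_AGENTS)]
    for i, b in enumerate(choices):
        collided = choices.count(b) > 1
        reward = -1.0 if collided else 1.0     # a collision stands in for dropped packets
        Q[i][b] += ALPHA * (reward - Q[i][b])

# With 4 agents and 4 blocks the greedy choices should typically end up orthogonal.
print([max(range(N_BLOCKS), key=lambda b: Q[i][b]) for i in range(N_AGENTS)])
```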
428

Is the Click the Trick? The Efficacy of Clickers and Other Reinforcement Methods in Training Naïve Dogs to Perform New Tasks

January 2020 (has links)
A handheld metal noisemaker known as a “clicker” is widely used to train new behaviors in dogs; however, evidence for the superior efficacy of clickers as opposed to providing solely primary reinforcement or other secondary reinforcers in the acquisition of novel behavior in dogs is almost entirely anecdotal. Three experiments were conducted to determine under what circumstances a clicker may result in acquisition of a novel behavior more rapidly or to a higher level compared to other readily available reinforcement methods. In Experiment 1, three groups of 30 dogs each were trained to emit a novel sit and stay behavior of increasing duration with either the delivery of food alone, a verbal stimulus paired with food, or a clicker with food. The group that received only a primary reinforcer reached a significantly higher criterion of training success than the group trained with a verbal secondary reinforcer. Performance of the group experiencing a clicker secondary reinforcer was intermediate between the other two groups, but not significantly different from either. In Experiment 2, three different groups of 25 dogs each were shaped to emit a nose targeting behavior and then perform that behavior at increasing distances from the experimenter using the same three methods of positive reinforcement as in Experiment 1. No statistically significant differences between the groups were found. In Experiment 3, three groups of 30 dogs each were shaped to emit a nose-targeting behavior upon an array of wooden blocks with task difficulty increasing throughout testing using the same three methods of positive reinforcement as previously. No statistically significant differences between the groups were found. Overall, the findings suggest that both clickers and other forms of positive reinforcement can be used successfully in training a dog to perform a novel behavior, but that no positive reinforcement method has significantly greater efficacy than any other.
429

Better cooperation through communication in multi-agent reinforcement learning

Kiseliou, Ivan January 2020 (has links)
Cooperative needs play a critical role in the organisation of natural communication systems. A number of recent studies in multi-agent reinforcement learning have established that artificial intelligence agents are similarly able to develop functional communication when required to complete a cooperative task. This thesis studies the emergence of communication in reinforcement learning agents, using a custom card game environment as a test bed. Two contrasting approaches, encompassing continuous and discrete modes of communication, were appraised experimentally. Based on the average game completion rate, the agents provisioned with a continuous communication channel consistently exceed the no-communication baseline. A qualitative analysis of the agents' behavioural strategies reveals a clearly defined communication protocol as well as playing tactics unseen in the baseline agents. The agents equipped with the discrete channel, on the other hand, fail to learn to utilise it effectively, ultimately showing no improvement over the baseline.
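A toy version of the discrete-channel setting can be sketched as a referential game: a sender observes a hidden target, emits one symbol, and a receiver must recover the target from that symbol alone, with both agents sharing the task reward. The game below, the tabular gradient-bandit learners, and all parameters are illustrative assumptions, not the card-game environment or the agents used in the thesis; like the thesis's discrete-channel agents, a run may or may not settle on a useful protocol.

```python
import math
import random
from collections import defaultdict

# Emergent-communication sketch: two cooperating tabular learners trained with
# a gradient-bandit (policy-gradient-style) update on a shared reward.

N_SYMBOLS, N_TARGETS, LR = 2, 2, 0.1

sender_pref = defaultdict(float)    # preference[(target, symbol)]
receiver_pref = defaultdict(float)  # preference[(symbol, guess)]

def softmax_sample(prefs, context, options):
    weights = [math.exp(prefs[(context, o)]) for o in options]
    probs = [w / sum(weights) for w in weights]
    return random.choices(options, weights=probs)[0], probs

def update(prefs, context, options, chosen, probs, advantage):
    # Gradient-bandit update: push up the chosen option, push down the rest.
    for o, p in zip(options, probs):
        grad = (1.0 - p) if o == chosen else -p
        prefs[(context, o)] += LR * advantage * grad

baseline = 0.0
for step in range(5000):
    target = random.randrange(N_TARGETS)
    symbol, s_probs = softmax_sample(sender_pref, target, range(N_SYMBOLS))
    guess, r_probs = softmax_sample(receiver_pref, symbol, range(N_TARGETS))
    reward = 1.0 if guess == target else 0.0
    advantage = reward - baseline
    baseline += 0.01 * (reward - baseline)          # running-average baseline
    update(sender_pref, target, range(N_SYMBOLS), symbol, s_probs, advantage)
    update(receiver_pref, symbol, range(N_TARGETS), guess, r_probs, advantage)

# A successful run ends with a one-to-one mapping target -> symbol -> guess.
for t in range(N_TARGETS):
    s = max(range(N_SYMBOLS), key=lambda x: sender_pref[(t, x)])
    g = max(range(N_TARGETS), key=lambda x: receiver_pref[(s, x)])
    print(f"target {t} -> symbol {s} -> guess {g}")
```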
430

On Hierarchical Goal Based Reinforcement Learning

Denis, Nicholas 27 August 2019 (has links)
Discrete-time sequential decision processes require that an agent select an action at each time step. As humans, we plan over long time horizons and use temporal abstraction by selecting temporally extended actions such as “make lunch” or “get a master's degree”, each of which consists of more granular actions. This thesis concerns itself with such hierarchical temporal abstractions in the form of macro actions and options, as they apply to goal-based Markov Decision Processes. A novel algorithm for discovering hierarchical macro actions in goal-based MDPs is introduced, as well as a novel algorithm utilizing landmark options for transfer learning in multi-task goal-based reinforcement learning settings. Theoretical properties regarding the life-long regret of an agent executing the latter algorithm are also discussed.
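The option abstraction referenced in this abstract is commonly specified by three components: an initiation set, an intra-option policy, and a termination condition. The sketch below encodes that structure and runs two hand-written options on a toy corridor MDP; the corridor environment and the two "landmark-like" options are illustrative assumptions, not the constructions proposed in the thesis.

```python
from dataclasses import dataclass
from typing import Callable, Set

# Minimal encoding of an option (initiation set, intra-option policy,
# termination condition) executed on a toy corridor MDP.

@dataclass
class Option:
    name: str
    initiation: Set[int]                  # states where the option may start
    policy: Callable[[int], int]          # state -> primitive action (-1 or +1)
    terminates: Callable[[int], bool]     # state -> should the option stop?

N_STATES = 10                             # corridor states 0..9; the goal is state 9

go_to_middle = Option("go-to-middle", set(range(0, 5)),
                      policy=lambda s: +1, terminates=lambda s: s == 5)
go_to_goal = Option("go-to-goal", set(range(5, 9)),
                    policy=lambda s: +1, terminates=lambda s: s == 9)

def run_option(state, option):
    """Execute an option until it terminates; return the resulting state."""
    assert state in option.initiation
    while not option.terminates(state):
        state = max(0, min(N_STATES - 1, state + option.policy(state)))
    return state

state = 0
for option in (go_to_middle, go_to_goal):
    state = run_option(state, option)
    print(f"{option.name} terminated in state {state}")
```

A higher-level learner would then choose among such options instead of primitive steps, which is the sense in which macro actions and options provide temporal abstraction.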
