  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Condensing observation of locale and agents : a state representation /

Valenti, Brett A. January 1900 (has links)
Thesis (M.S.)--Oregon State University, 2010. / Printout. Includes bibliographical references (leaves 67-72). Also available on the World Wide Web.
82

The assessment of preference for qualitatively different reinforcers in persons with developmental and learning disabilities : a comparison of value using behavioral economic and standard preference assessment procedures /

Bredthauer, Jennifer Lyn. Johnston, James M., January 2009 (has links)
Thesis (Ph. D.)--Auburn University. / Abstract. Includes bibliographical references (p. 148-163).
83

Limitations and extensions of the WoLF-PHC algorithm /

Cook, Philip R., January 2007 (has links) (PDF)
Thesis (M.S.)--Brigham Young University. Dept. of Computer Science, 2007. / Includes bibliographical references (p. 93-101).
84

Task localization, similarity, and transfer : towards a reinforcement learning task library system /

Carroll, James Lamond, January 2005 (has links) (PDF)
Thesis (M.S.)--Brigham Young University. Dept. of Computer Science, 2005. / Includes bibliographical references (p. 113-117).
85

Zpětnovazební učení v multiagentním makroekonomickém modelu / Reinforcement learning in an agent-based macroeconomic model

Vlk, Bořivoj January 2018 (has links)
Utilizing game theory, learning automata and reinforcement learning concepts, the thesis presents a computational model (simulation) based on general equilibrium theory and the classical monetary model. The model is based on interacting Constructively Rational agents. Constructive Rationality has been introduced in the current literature as a machine-learning-based concept that allows relaxing assumptions on modeled economic agents' information and expectations. The model experiences periodic endogenous crises (a fall in both production and consumption accompanied by a rise in the unemployment rate). Crises are caused by firms and households adapting to a change in price and wage levels. Price and wage level adjustments are necessary for the goods and labor markets to clear in the presence of technological growth. Finally, the model has a solid theoretical background and large potential for further development. General properties of games of learning entities are also examined, with special focus on sudden changes (shocks) in the game and in the behavior of the game's players, during recovery from which rigidities can emerge. JEL Classification: D80, D83, C63, E32, C73. Keywords: Learning, Information and Knowledge, Agent-based, Reinforcement learning, Business cycle, Stochastic and Dynamic Games, Simulation, Modeling. Author's e-mail...
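As a rough illustration of the learning-automaton style of updating that abstracts like this one refer to (a generic sketch, not the author's actual model; the two-action setup, success probabilities, and learning rate are illustrative assumptions):

```python
import random

def update_reward_inaction(probs, chosen, reward, lr=0.1):
    """Linear reward-inaction update: reinforce the chosen action only on success.

    probs  -- current action-selection probabilities (must sum to 1)
    chosen -- index of the action just taken
    reward -- 1 if the environment signalled success, 0 otherwise
    lr     -- learning rate (illustrative value)
    """
    if reward:  # on failure ("inaction") the probabilities stay unchanged
        probs = [p + lr * (1.0 - p) if i == chosen else p * (1.0 - lr)
                 for i, p in enumerate(probs)]
    return probs

# toy usage: two actions, the first succeeds more often
probs = [0.5, 0.5]
for _ in range(200):
    a = random.choices(range(2), weights=probs)[0]
    r = 1 if (a == 0 and random.random() < 0.8) or (a == 1 and random.random() < 0.2) else 0
    probs = update_reward_inaction(probs, a, r)
print(probs)  # probability mass drifts toward action 0
```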
86

Motor expectancy: the modulation of the reward positivity in a reinforcement learning motor task

Trska, Robert 30 August 2018 (has links)
An adage posits that we learn from our mistakes; however, this is not entirely true. According to reinforcement learning theory, we learn when the expectations of our actions differ from their outcomes. Here, we examined whether expectancy-driven learning plays a role in motor learning. Given the extensive overlap of reward- and motor-related anatomy and circuitry in the brain, it is appropriate to examine both motor control and expectancy processes within a single task. In the current study, participants performed a line-drawing task on a tablet under conditions of changing expectancies. Participants were provided feedback in a reinforcement-learning manner, as positive (✓) or negative (x), based on their performance. Modulation of expected outcomes was reflected by changes in amplitude of the human event-related potential (ERP) known as the reward positivity. The reward positivity is thought to reflect phasic dopamine release from the mesolimbic dopaminergic system to the basal ganglia and cingulate cortex. Due to the overlapping circuitry of reward and motor pathways, another human ERP, the Bereitschaftspotential (BP), was examined. The BP is implicated in motor planning and execution; however, the late aspect of the BP shares similarity with the contingent negative variation (CNV). Current evidence demonstrates a relationship between expectancy and reward positivity amplitude in a motor learning context, as well as modulation of the BP under difficult task conditions. Behavioural data support prior literature and may suggest that sensory-motor prediction errors work in concert with reward prediction errors. Further evidence supports a frontal-medial evaluation system for motor errors. Additionally, results support prior evidence that motor plans are formed upon target observation and held in memory until motor execution, rather than being formed immediately before movement onset. / Graduate
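The reinforcement-learning idea this abstract leans on (we learn when outcomes differ from expectations) is commonly formalized as a reward prediction error; a minimal sketch under assumed outcome coding and learning rate (not the author's analysis code) is:

```python
def update_expectation(expected, reward, alpha=0.2):
    """Delta-rule update: move the expectation toward the observed outcome.

    The prediction error (reward - expected) is the quantity that ERP
    components such as the reward positivity are thought to track.
    """
    prediction_error = reward - expected
    return expected + alpha * prediction_error, prediction_error

# toy usage: feedback is positive (1) or negative (0), expectation starts neutral
expected = 0.5
for outcome in [1, 1, 0, 1, 1, 1, 0, 1]:
    expected, pe = update_expectation(expected, outcome)
    print(f"outcome={outcome}  prediction_error={pe:+.2f}  new_expectation={expected:.2f}")
```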
87

Extending dynamic scripting

Ludwig, Jeremy R. 12 1900 (has links)
xvi, 167 p. A print copy of this thesis is available through the UO Libraries. Search the library catalog for the location and call number. / The dynamic scripting reinforcement learning algorithm can be extended to improve the speed, effectiveness, and accessibility of learning in modern computer games without sacrificing computational efficiency. This dissertation describes three specific enhancements to the dynamic scripting algorithm that improve learning behavior and flexibility while imposing a minimal computational cost: (1) a flexible, stand-alone version of dynamic scripting that allows for hierarchical dynamic scripting, (2) a method of using automatic state abstraction to increase the context sensitivity of the algorithm, and (3) an integration of this algorithm with an existing hierarchical behavior modeling architecture. The extended dynamic scripting algorithm is then examined in three different contexts. The first results reflect a preliminary investigation based on two abstract real-time strategy games. The second set of results comes from a number of abstract tactical decision games designed to demonstrate the strengths and weaknesses of extended dynamic scripting. The third set of results is generated by a series of experiments in the context of the commercial computer role-playing game Neverwinter Nights, demonstrating the capabilities of the algorithm in an actual game. To conclude, a number of future research directions for investigating the effectiveness of extended dynamic scripting are described. / Adviser: Arthur Farley
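For readers unfamiliar with the base algorithm being extended, a rough sketch of a dynamic-scripting weight update (a generic illustration, not the dissertation's code; the rule set, script size, and reward scaling are assumptions):

```python
import random

def select_script(weights, script_size):
    """Pick script_size distinct rules, with probability proportional to weight."""
    rules = list(weights)
    script = []
    for _ in range(script_size):
        total = sum(weights[r] for r in rules)
        pick = random.uniform(0, total)
        cum = 0.0
        for r in rules:
            cum += weights[r]
            if pick <= cum:
                script.append(r)
                rules.remove(r)
                break
    return script

def update_weights(weights, script, fitness, adjustment=30, w_min=5, w_max=400):
    """Reward used rules by fitness in [-1, 1]; redistribute among unused rules."""
    delta = adjustment * fitness
    unused = [r for r in weights if r not in script]
    for r in script:
        weights[r] = min(w_max, max(w_min, weights[r] + delta))
    if unused:  # compensate so the total weight stays roughly constant
        comp = -delta * len(script) / len(unused)
        for r in unused:
            weights[r] = min(w_max, max(w_min, weights[r] + comp))
    return weights

# toy usage: one encounter that went reasonably well
weights = {f"rule{i}": 100 for i in range(8)}
script = select_script(weights, script_size=3)
weights = update_weights(weights, script, fitness=0.5)
```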
88

A reinforcement-learning approach to understanding loss-chasing behavior in rats

Marshall, Andrew Thomas January 1900 (has links)
Doctor of Philosophy / Psychological Sciences / Kimberly Kirkpatrick / Risky decisions are inherently characterized by the potential to receive gains and losses, and gains and losses have distinct effects on global risky choice behavior and on the likelihood of making a risky choice given the outcome of the previous choice. One translationally relevant phenomenon of risky choice is loss-chasing, in which individuals make risky choices following losses. However, the mechanisms of loss-chasing are poorly understood. The goal of two experiments was to illuminate the mechanisms governing individual differences in loss-chasing and risky choice behaviors. In both experiments, rats chose between a certain outcome that always delivered reward and a risky outcome that probabilistically delivered reward. In Experiment 1, loss processing and loss-chasing behavior were assessed in the context of losses-disguised-as-wins (LDWs), or loss outcomes presented along with gain-related stimuli. The rats presented with LDWs were riskier and less sensitive to differential losses. In Experiment 2, these behaviors were assessed relative to the number of risky losses that could be experienced. Here, the addition of reward omission or a small non-zero loss to the possible risky outcomes elicited substantial individual differences in risky choice, with some rats increasing, decreasing, or maintaining their previous risky choice preferences. Several reinforcement learning (RL) models were fit to individual rats' data to elucidate the psychological mechanisms that best accounted for individual differences in risky choice and loss-chasing behaviors. The RL analyses indicated that the critical predictors of risky choice and loss-chasing behavior were the different rates at which individuals updated value estimates with newly experienced gains and losses. Thus, learning deficits may predict individual differences in maladaptive risky decision making. Accordingly, targeted interventions to alleviate learning deficits may ultimately increase the likelihood of making more optimal and informed choices.
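A common way such RL analyses are set up (a generic sketch, not the models actually fitted in the dissertation; parameter values and outcome coding are assumptions) is a delta-rule value update with separate learning rates for gains and losses:

```python
def update_value(value, outcome, alpha_gain=0.3, alpha_loss=0.1):
    """Update the value estimate of a chosen option with asymmetric learning rates.

    A larger alpha_gain than alpha_loss means wins move the estimate more than
    losses do, one candidate mechanism for individual differences in loss-chasing.
    """
    error = outcome - value
    alpha = alpha_gain if error >= 0 else alpha_loss
    return value + alpha * error

# toy usage: a risky option that pays 4 pellets on a win and 0 on a loss
value = 2.0
for outcome in [4, 0, 0, 4, 0, 4, 4, 0]:
    value = update_value(value, outcome)
    print(f"outcome={outcome}  value={value:.2f}")
```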
89

Multi-Agent Area Coverage Control Using Reinforcement Learning Techniques

Adepegba, Adekunle Akinpelu January 2016 (has links)
An area coverage control law in cooperation with reinforcement learning techniques is proposed for deploying multiple autonomous agents in a two-dimensional planar area. A scalar field characterizes the risk density in the area to be covered, yielding a nonuniform distribution of agents while providing optimal coverage. This problem has traditionally been addressed in the literature using locational optimization and gradient descent techniques, as well as proportional and proportional-derivative controllers. In most cases, the actuator energy required to drive the agents to their optimal configurations in the workspace is not considered. Here, maximum coverage is achieved with minimum actuator energy required by each agent. Similar to existing coverage control techniques, the proposed algorithm takes into consideration a time-varying risk density. These density functions represent the probability of an event occurring (e.g., the presence of an intruding target) at a certain location or point in the workspace, indicating where the agents should be located. To this end, a coverage control algorithm using reinforcement learning is proposed that moves the team of mobile agents so as to provide optimal coverage given the density functions as they evolve over time. Area coverage is modeled using a Centroidal Voronoi Tessellation (CVT) generated by the agents' positions. Based on [1,2] and [3], the application of centroidal Voronoi tessellation is extended to a dynamically changing harbour-like environment. The proposed multi-agent area coverage control law, in conjunction with reinforcement learning techniques, is implemented in a distributed manner whereby each agent only needs to access information from adjacent agents, while simultaneously providing dynamic surveillance of single and multiple targets and feedback control of the environment. This distributed approach describes how automatic flocking behaviour of a team of mobile agents can be achieved by leveraging the geometrical properties of centroidal Voronoi tessellation in area coverage control, while enabling multiple-target tracking without the need for consensus between individual agents. Agent deployment using a time-varying density model is introduced, where the density is a function of the positions of some unknown targets in the environment. A nonlinear derivative of the coverage error function is formulated based on the single-integrator agent dynamics. Each agent, aware of its local coverage control condition, learns a value function online while leveraging the same from its neighbours. Moreover, a novel computational adaptive optimal control methodology based on work by [4] is proposed that employs the approximate dynamic programming technique online to iteratively solve the algebraic Riccati equation with completely unknown system dynamics, as a solution to the linear quadratic regulator problem. Furthermore, an online-tuning adaptive optimal control algorithm is implemented using an actor-critic neural network recursive least-squares solution framework. The work in this thesis illustrates that reinforcement learning-based techniques can be successfully applied to non-uniform coverage control. Research combining non-uniform coverage control with reinforcement learning techniques is still at an embryonic stage and several limitations exist.
Theoretical results are benchmarked and validated against related works in area coverage control through a set of computer simulations in which multiple agents are able to deploy themselves, thus paving the way for efficient distributed Voronoi coverage control.
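As a rough sketch of the centroidal-Voronoi idea underlying such coverage controllers (a generic density-weighted Lloyd iteration on an assumed grid and Gaussian risk density, not the thesis's RL-based controller):

```python
import numpy as np

def lloyd_coverage_step(agents, grid, density):
    """One density-weighted Lloyd step: assign grid points to the nearest agent
    (a discrete Voronoi partition) and move each agent to the weighted centroid
    of its cell, which is where a coverage controller would drive it."""
    d = np.linalg.norm(grid[:, None, :] - agents[None, :, :], axis=2)
    owner = d.argmin(axis=1)  # Voronoi assignment of each grid point
    new_agents = agents.copy()
    for i in range(len(agents)):
        cell = owner == i
        mass = density[cell].sum()
        if mass > 0:
            new_agents[i] = (grid[cell] * density[cell][:, None]).sum(axis=0) / mass
    return new_agents

# toy usage: 4 agents covering the unit square with a "risk" bump at (0.7, 0.7)
xs, ys = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
grid = np.column_stack([xs.ravel(), ys.ravel()])
density = np.exp(-20 * ((grid[:, 0] - 0.7) ** 2 + (grid[:, 1] - 0.7) ** 2))
agents = np.random.rand(4, 2)
for _ in range(30):
    agents = lloyd_coverage_step(agents, grid, density)
print(agents)  # agents cluster toward the high-risk region
```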
90

An Automated VNF Manager based on Parameterized-Action MDP and Reinforcement Learning

Li, Xinrui 15 April 2021 (has links)
Managing and orchestrating the behaviour of virtualized Network Functions (VNFs) remains a major challenge due to their heterogeneity and the ever-increasing resource demands of the served flows. In this thesis, we propose a novel VNF manager (VNFM) that employs a parameterized-action reinforcement learning mechanism to simultaneously decide on the optimal VNF management action (e.g., migration, scaling, termination or rebooting) and the action's corresponding configuration parameters (e.g., the migration location or the amount of resources needed for scaling). More precisely, we first propose a novel parameterized-action Markov decision process (PAMDP) model to accurately describe each VNF, the instances of its components and their communication, as well as the set of permissible management actions by the VNFM and the rewards for realizing these actions. The use of parameterized actions allows us to rigorously represent the functionalities of the VNFM in order to perform various lifecycle management (LCM) operations on the VNFs. Next, we propose a two-stage reinforcement learning (RL) scheme that alternates between learning an action-value function for the discrete LCM actions and updating the action-parameter selection policy. In contrast to existing machine learning schemes, the proposed work uniquely provides a holistic management platform that unifies individual efforts targeting individual LCM functions such as VNF placement and scaling. Performance evaluation results demonstrate the efficiency of the proposed VNFM in maintaining the required performance level of the VNF while optimizing its resource configurations.
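A minimal sketch of the parameterized-action idea (a discrete action chosen by a value function, then a continuous parameter chosen by a per-action policy; the action names, reward, and update rules below are illustrative assumptions, not the thesis's implementation, and state is omitted for brevity):

```python
import random

# illustrative discrete LCM actions; each carries one continuous parameter in [0, 1]
ACTIONS = ["migrate", "scale", "reboot"]

q = {a: 0.0 for a in ACTIONS}      # action-value estimates
theta = {a: 0.5 for a in ACTIONS}  # mean of each action's parameter policy

def select(epsilon=0.1, sigma=0.1):
    """Stage 1: epsilon-greedy over discrete actions; stage 2: Gaussian parameter."""
    a = random.choice(ACTIONS) if random.random() < epsilon else max(q, key=q.get)
    param = min(1.0, max(0.0, random.gauss(theta[a], sigma)))
    return a, param

def update(a, param, reward, alpha_q=0.1, alpha_p=0.05):
    """Alternate the two updates: a value update for the discrete action and a
    simple reward-weighted nudge of that action's parameter mean."""
    q[a] += alpha_q * (reward - q[a])
    theta[a] += alpha_p * reward * (param - theta[a])

# toy usage with a made-up reward that prefers scaling with a parameter near 0.8
for _ in range(500):
    a, p = select()
    r = 1.0 - abs(p - 0.8) if a == "scale" else 0.2
    update(a, p, r)
print(q, theta)
```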
