  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
331

A Reinforcement-Learning Approach to Power Management

Steinbach, Carl 01 May 2002
We describe an adaptive, mid-level approach to the wireless device power management problem. Our approach is based on reinforcement learning, a machine learning framework for autonomous agents. We describe how our framework can be applied to the power management problem in both infrastructure and ad hoc wireless networks. From this thesis we conclude that mid-level power management policies can outperform low-level policies and are more convenient to implement than high-level policies. We also conclude that power management policies need to adapt to the user and network, and that a mid-level power management framework based on reinforcement learning fulfills these requirements.
332

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

Jaakkola, Tommi, Jordan, Michael I., Singh, Satinder P. 01 August 1993
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
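The Q-learning update referred to above is a stochastic-approximation step toward the Bellman optimality equation. The following is a minimal tabular sketch, not code from the paper: the two-state, two-action MDP, the uniform random behaviour policy, and the step-size schedule α = 1/n(s,a)^ω are all illustrative assumptions. For 0.5 < ω ≤ 1 these step sizes satisfy the standard conditions Σα = ∞ and Σα² < ∞ that convergence theorems of this kind require.

```python
import numpy as np


def q_learning(n_steps=20_000, gamma=0.5, omega=0.8, seed=0):
    """Tabular Q-learning on a hypothetical toy MDP.

    States and actions are {0, 1}. Taking action a moves the agent to state a
    (deterministic transition). The reward is 1 only for action 1 in state 1.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((2, 2))
    visits = np.zeros((2, 2), dtype=int)
    s = 0
    for _ in range(n_steps):
        a = int(rng.integers(2))              # uniform random exploration (off-policy)
        s_next = a                            # deterministic toy transition
        r = 1.0 if (s == 1 and a == 1) else 0.0
        visits[s, a] += 1
        # Per-pair step size; sum(alpha) diverges, sum(alpha^2) converges for 0.5 < omega <= 1.
        alpha = 1.0 / visits[s, a] ** omega
        target = r + gamma * q[s_next].max()  # sampled Bellman optimality backup
        q[s, a] += alpha * (target - q[s, a])
        s = s_next
    return q
```

For this toy MDP with γ = 0.5, the optimal values are Q*(0,·) = (0.5, 1) and Q*(1,·) = (0.5, 2), so the returned table should approach those numbers as the number of steps grows.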
333

Substitution of stimulus functions as a means to distinguish among different types of functional classes

Delgado, Diana January 2005
Thesis (M.A.)--University of Nevada, Reno, 2005. / "May, 2005." Includes bibliographical references (leaves 47-49). Online version available on the World Wide Web. Library also has microfilm. Ann Arbor, Mich. : ProQuest Information and Learning Company, [2005]. 1 microfilm reel ; 35 mm.
334

Human responding on a fixed interval schedule: effects of manipulations on the response, the consequence, and the establishing operation

Johnston, Michael R. January 2005
Thesis (Ph. D.)--University of Nevada, Reno, 2005. / "May, 2005." Includes bibliographical references (leaves 111-117). Online version available on the World Wide Web. Library also has microfilm. Ann Arbor, Mich. : ProQuest Information and Learning Company, [2005]. 1 microfilm reel ; 35 mm.
335

Closed-Loop Learning of Visual Control Policies

Jodogne, Sébastien 05 December 2006
In this dissertation, I introduce a general, flexible framework for learning direct mappings from images to actions in an agent that interacts with its surrounding environment. This work is motivated by the paradigm of purposive vision. The original contributions consist of the design of reinforcement learning algorithms that are applicable to visual spaces. Inspired by the paradigm of local-appearance vision, these algorithms exploit specialized visual features that can be detected in the visual signal. Two different ways to use the visual features are described. First, I introduce adaptive-resolution methods for discretizing the visual space into a manageable number of perceptual classes. To this end, a percept classifier that tests the presence or absence of a few highly informative visual features is incrementally refined. New discriminant visual features are selected in a sequence of attempts to remove perceptual aliasing. Any standard reinforcement learning algorithm can then be used to extract an optimal visual control policy. The resulting algorithm is called "Reinforcement Learning of Visual Classes." Second, I propose to exploit the raw content of the visual features, without ever considering an equivalence relation on the visual feature space. Technically, feature regression models that associate visual features with a real-valued utility are introduced within the Approximate Policy Iteration architecture. This is done by means of a general, abstract version of Approximate Policy Iteration. This results in the "Visual Approximate Policy Iteration" algorithm. Another major contribution of this dissertation is the design of adaptive-resolution techniques that can be applied to complex, high-dimensional and/or continuous action spaces, simultaneously with visual spaces. The "Reinforcement Learning of Joint Classes" algorithm produces a non-uniform discretization of the joint space of percepts and actions.
This is a new, general approach to adaptive-resolution methods in reinforcement learning that can deal with arbitrary, hybrid state-action spaces. Throughout this dissertation, emphasis is also put on the design of general algorithms that can be used in non-visual (e.g., continuous) perceptual spaces. The applicability of the proposed algorithms is demonstrated by solving several visual navigation tasks.
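The incrementally refined percept classifier described above can be pictured as a binary decision tree: each internal node tests whether one visual feature was detected, and each leaf is a perceptual class. The sketch below shows only that data structure under stated assumptions. The class names, the feature labels, and the `refine` method are hypothetical, and the criterion the dissertation uses to choose which feature removes perceptual aliasing is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node: an internal feature test or a leaf perceptual class."""
    feature: Optional[str] = None       # visual feature tested here (None means leaf)
    present: Optional["Node"] = None    # child used when the feature is detected
    absent: Optional["Node"] = None     # child used when the feature is not detected
    class_id: Optional[int] = None      # perceptual class label (leaves only)


class PerceptClassifier:
    """Maps the set of features detected in a percept to a perceptual class."""

    def __init__(self) -> None:
        self.root = Node(class_id=0)    # initially every percept falls into one class
        self._next_id = 1

    def _leaf(self, percept: frozenset) -> Node:
        node = self.root
        while node.feature is not None:
            node = node.present if node.feature in percept else node.absent
        return node

    def classify(self, percept: frozenset) -> int:
        return self._leaf(percept).class_id

    def refine(self, percept: frozenset, feature: str) -> None:
        """Split the leaf containing `percept` by adding a test for `feature`."""
        leaf = self._leaf(percept)
        leaf.absent = Node(class_id=leaf.class_id)   # percepts without the feature keep the old label
        leaf.present = Node(class_id=self._next_id)  # percepts with the feature get a fresh class
        self._next_id += 1
        leaf.feature, leaf.class_id = feature, None
```

Because each leaf is a discrete class id, a standard tabular algorithm such as Q-learning can treat the output of `classify` as its state while the tree keeps being refined.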
336

Constructing Neuro-Fuzzy Control Systems Based on Reinforcement Learning Scheme

Pei, Shan-cheng 10 September 2007
Traditionally, the fuzzy rules for a fuzzy controller are provided by experts. They cannot be trained from a set of input-output training examples because the correct response of the plant being controlled is delayed and cannot be obtained immediately. In this paper, we propose a novel approach to constructing fuzzy rules for a fuzzy controller based on reinforcement learning. Our task is to learn from delayed reward to choose sequences of actions that result in the best control. A neural network with delays is used to model the evaluation function Q. Fuzzy rules are constructed and added as learning proceeds. Both the weights of the Q-learning network and the parameters of the fuzzy rules are tuned by gradient descent. Experimental results show that the resulting fuzzy rules perform effectively for control.
337

Sparse Value Function Approximation for Reinforcement Learning

Painter-Wakefield, Christopher Robert January 2013
A key component of many reinforcement learning (RL) algorithms is the approximation of the value function. The design and selection of features for approximation in RL is crucial and an ongoing area of research. One approach to the problem of feature selection is to apply sparsity-inducing techniques in learning the value function approximation; such sparse methods tend to select relevant features and ignore irrelevant ones, thus automating the feature selection process. This dissertation describes three contributions in the area of sparse value function approximation for reinforcement learning.
One method for obtaining sparse linear approximations is to include in the objective function a penalty on the sum of the absolute values of the approximation weights. This L1 regularization approach was first applied to temporal difference learning in the LARS-inspired batch learning algorithm LARS-TD. In our first contribution, we define an iterative update equation whose fixed point is the L1-regularized linear fixed point of LARS-TD. The iterative update gives rise naturally to an online stochastic approximation algorithm. We prove convergence of the online algorithm and show that the L1-regularized linear fixed point is an equilibrium fixed point of the algorithm. We demonstrate the ability of the algorithm to converge to the fixed point, yielding a sparse solution with modestly better performance than unregularized linear temporal difference learning.
Our second contribution extends LARS-TD to integrate policy optimization with sparse value learning. We extend the L1-regularized linear fixed point to include a maximum over policies, defining a new, "greedy" fixed point. The greedy fixed point adds a new invariant to the set which LARS-TD maintains as it traverses its homotopy path, giving rise to a new algorithm integrating sparse value learning and optimization. The new algorithm is shown to perform similarly to policy iteration using LARS-TD.
Finally, we consider another approach to sparse learning: a simple algorithm that greedily adds new features. Such algorithms have many of the good properties of the L1 regularization methods, while also being extremely efficient and, in some cases, allowing theoretical guarantees on recovery of the true form of a sparse target function from sampled data. We consider variants of orthogonal matching pursuit (OMP) applied to RL. The resulting algorithms are analyzed and compared experimentally with existing L1-regularized approaches. We demonstrate that perhaps the most natural scenario in which one might hope to achieve sparse recovery fails; however, one variant provides promising theoretical guarantees under certain assumptions on the feature dictionary, while another variant empirically outperforms prior methods in both approximation accuracy and efficiency on several benchmark problems. / Dissertation
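The first contribution, an iterative update whose fixed point is the L1-regularized TD fixed point, can be pictured as a batch proximal-gradient (iterative soft-thresholding) step. This is a minimal sketch under assumptions, not the dissertation's exact update. The objective scaling (squared error divided by 2n), the step size `alpha`, and the function names are illustrative choices. With the TD target r + γΦ'w held fixed, a weight vector w that the step leaves unchanged minimizes ||r + γΦ'w − Φu||²/(2n) + β||u||₁ at u = w, which is the fixed-point condition described above.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise soft-thresholding: the proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def l1_td_fixed_point(phi, phi_next, rewards, gamma, beta, alpha=0.5, n_iters=5000):
    """Hypothetical batch iteration toward an L1-regularized TD fixed point.

    phi, phi_next: (n_samples, n_features) feature matrices for s and s'.
    Each step takes a gradient step on the squared TD error with the target
    r + gamma * phi_next @ w held fixed, then soft-thresholds the weights.
    """
    n, k = phi.shape
    w = np.zeros(k)
    for _ in range(n_iters):
        td_errors = rewards + gamma * phi_next @ w - phi @ w
        w = soft_threshold(w + alpha * phi.T @ td_errors / n, alpha * beta)
    return w
```

With `beta=0` the soft-threshold is the identity and the iteration reduces to batch linear TD, so on a small on-policy problem it recovers the ordinary TD solution; a large enough `beta` drives every weight to exactly zero. An online variant would replace the batch average with a single sampled transition; the dissertation proves convergence for its own online algorithm, whereas this batch sketch is only guaranteed to converge when the step is a contraction, as it is on small well-conditioned examples.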
338

Parallels between Gambling and Amphetamine Reinforcement in Pathological Gamblers and Healthy Controls and the Role of Sensitization

Chugani, Bindiya 21 March 2012
Pathological gambling is a serious disorder with lifetime prevalence between 1.1% and 3.5%. Evidence suggests commonalities in the neurochemical basis of pathological gambling and psychostimulant addiction. However, parallel effects of gambling and a stimulant drug have not been assessed in the same subjects. This study employed a cross-priming strategy in which 12 male pathological gamblers and 11 male controls were exposed to a 15-minute slot machine game and d-amphetamine (0.4 mg/kg). Subjective, cognitive, electrophysiological, and physiological responses were assessed. Gamblers reported greater desire to gamble after both reinforcers, when baseline motivation was controlled. Conversely, gamblers exhibited diminished cardiovascular response to amphetamine. Gamblers also exhibited decreased pre-pulse inhibition (impaired sensorimotor gating), and deficits on this index predicted greater post-amphetamine desire to gamble and decreased heart rate response to the dose. Results are consistent with possible dopaminergic sensitization in pathological gamblers, but also suggest that central noradrenergic receptor deficits contribute substantially to these effects.
340

The effects of sensory awareness training on self-actualization in a personal growth group

Barrick, Glen Anthony 03 June 2011
The purpose of this study was to investigate the effects of sensory awareness training on self-actualization in a personal growth group. The null hypothesis pertained to the differences in self-actualization between treatment and control groups as measured by the Inner Directedness Scale of the Personal Orientation Inventory.
The subjects were undergraduate students from a Midwest university who volunteered to participate in a Personal Growth Group. Based on their time availability (a.m. or p.m.), the sample of 116 subjects was randomly assigned to four treatment, four control, or two reserve groups, so as to maintain proportional samples of females and males. Because of attrition prior to the group experience, some reserve subjects were randomly assigned to treatment and control groups, so that the final sample was composed of 88 subjects (57 females and 31 males). Forty-four of these subjects experienced one of four treatment groups (10 or 12 subjects per group), and the other 44 subjects experienced one of four control groups (10 or 12 subjects per group).
Both experimental and control groups were one-and-one-half-hour personal growth groups designed to develop human potential, increase awareness of self and others, and increase skills in interpersonal relationships. The difference between the groups was that the treatment groups received instructions which stressed, emphasized, and sought to stimulate aspects of sensory awareness, while the control groups' instructions minimized sensory awareness experiences.
Immediately following the group session, all subjects were administered the Personal Orientation Inventory. The instruments were scored, and the differences between the average raw scores on the Inner Directedness Scale of the POI were subjected to a univariate analysis of variance, with differences considered significant at the .05 level. Before testing the null hypothesis, all other main effects were controlled, and the computed F value for the interaction between group and sex (F = .329, p < .568) was not significant. The computed F value for the group effect (F = 1.273, p < .263) was not significant. Therefore, the null hypothesis ("There will be no significant difference between the experimental and control group subjects' average raw scores on the Inner Directedness Scale of the Personal Orientation Inventory, controlling for any effects due to facilitator, time, and sex") was not rejected.
Analysis of the data indicated that there was no significant difference in self-actualization between the experimental and control groups as measured by the I Scale of the POI. Therefore, it is concluded that, using this one-and-one-half-hour scripted personal growth group approach, the sensory awareness training did not produce a significant positive change in self-actualization as measured by the aforementioned instrument scale. Use of the pre-structured script prevented flexibility in sensory awareness training activities: the group members had to "flow" with the script, rather than the script "flowing" to meet the needs of the group.
Data were also collected through a questionnaire on subjects' reactions to the group experience. These secondary data were descriptive in nature and were not treated statistically. An analysis of these data indicated that both experimental and control group subjects valued the growth group experience and expanded their human potentials, especially in the areas of self and other awareness. Finally, these data indicated there may have been a lack of process difference between the experimental and control groups. Specifically, some aspects of sensory awareness training might have been reduced further in the control groups.
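The significance decisions above rest on F ratios from an analysis of variance: the between-group mean square divided by the within-group mean square, compared against a critical value at the .05 level. The sketch below computes a simplified one-way F statistic for illustration only. It ignores the facilitator, time, and sex factors the study controlled for, the function name is hypothetical, and the scores in the example are made up rather than taken from the study.

```python
import numpy as np


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_scores = np.concatenate(groups)
    grand_mean = all_scores.mean()
    k, n = len(groups), all_scores.size
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For the hypothetical scores [1, 2, 3] and [4, 5, 6], the function returns F = 13.5 with (1, 4) degrees of freedom; whether such a value is significant depends on the F distribution's critical value for those degrees of freedom.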
