331

Substitution of stimulus functions as a means to distinguish among different types of functional classes /

Delgado, Diana January 2005 (has links)
Thesis (M.A.)--University of Nevada, Reno, 2005. / "May, 2005." Includes bibliographical references (leaves 47-49). Online version available on the World Wide Web. Library also has microfilm. Ann Arbor, Mich. : ProQuest Information and Learning Company, [2005]. 1 microfilm reel ; 35 mm.
332

Human responding on a fixed interval schedule : effects of manipulations on the response, the consequence, and the establishing operation /

Johnston, Michael R. January 2005 (has links)
Thesis (Ph. D.)--University of Nevada, Reno, 2005. / "May, 2005." Includes bibliographical references (leaves 111-117). Online version available on the World Wide Web. Library also has microfilm. Ann Arbor, Mich. : ProQuest Information and Learning Company, [2005]. 1 microfilm reel ; 35 mm.
333

Closed-Loop Learning of Visual Control Policies

Jodogne, Sébastien 05 December 2006 (has links)
In this dissertation, I introduce a general, flexible framework for learning direct mappings from images to actions in an agent that interacts with its surrounding environment. This work is motivated by the paradigm of purposive vision. The original contributions consist in the design of reinforcement learning algorithms that are applicable to visual spaces. Inspired by the paradigm of local-appearance vision, these algorithms exploit specialized visual features that can be detected in the visual signal. Two different ways to use the visual features are described.

Firstly, I introduce adaptive-resolution methods for discretizing the visual space into a manageable number of perceptual classes. To this end, a percept classifier that tests the presence or absence of a few highly informative visual features is incrementally refined. New discriminant visual features are selected in a sequence of attempts to remove perceptual aliasing. Any standard reinforcement learning algorithm can then be used to extract an optimal visual control policy. The resulting algorithm is called "Reinforcement Learning of Visual Classes."

Secondly, I propose to exploit the raw content of the visual features, without ever considering an equivalence relation on the visual feature space. Technically, feature regression models that associate visual features with a real-valued utility are introduced within the Approximate Policy Iteration architecture. This is done by means of a general, abstract version of Approximate Policy Iteration. This results in the "Visual Approximate Policy Iteration" algorithm.

Another major contribution of this dissertation is the design of adaptive-resolution techniques that can be applied to complex, high-dimensional and/or continuous action spaces simultaneously with visual spaces. The "Reinforcement Learning of Joint Classes" algorithm produces a non-uniform discretization of the joint space of percepts and actions. This is a new, general approach to adaptive-resolution methods in reinforcement learning that can deal with arbitrary, hybrid state-action spaces.

Throughout this dissertation, emphasis is also placed on the design of general algorithms that can be used in non-visual (e.g. continuous) perceptual spaces. The applicability of the proposed algorithms is demonstrated by solving several visual navigation tasks.
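To make the flavor of the "Reinforcement Learning of Visual Classes" idea concrete, the following is a minimal, hypothetical sketch (not Jodogne's implementation): raw percepts are mapped to a small set of perceptual classes by a few binary feature tests, the classifier can be refined by adding tests, and ordinary tabular Q-learning runs on top of the resulting classes. The toy environment, the feature names, and the pre-seeded test are all invented for illustration.

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right"]

class ToyVisualEnv:
    """Toy task: each raw percept is a set of detectable visual features."""
    def reset(self):
        self.state = random.choice([0, 1])
        return self._observe()

    def _observe(self):
        return frozenset({"wall", "door"} if self.state == 1 else {"wall"})

    def step(self, action):
        # +1 for "right" when the door is visible, +1 for "left" otherwise.
        reward = 1.0 if (action == "right") == (self.state == 1) else 0.0
        self.state = random.choice([0, 1])
        return self._observe(), reward, True   # one-step episodes

class PerceptClassifier:
    """Maps an image to a perceptual class via a few binary feature tests."""
    def __init__(self, tests=()):
        self.tests = list(tests)

    def classify(self, image):
        return tuple(f in image for f in self.tests)

    def refine(self, feature):
        # Adding a test splits the existing classes; in RLVC this is the step
        # that progressively removes perceptual aliasing.
        self.tests.append(feature)

def train(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    env, clf, Q = ToyVisualEnv(), PerceptClassifier(["door"]), defaultdict(float)
    for _ in range(episodes):
        image, done = env.reset(), False
        while not done:
            s = clf.classify(image)
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            image, reward, done = env.step(a)
            s_next = clf.classify(image)
            target = reward + gamma * max(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

if __name__ == "__main__":
    print(train())
```

In the dissertation the refinement step is driven by detected perceptual aliasing and the features are local-appearance descriptors extracted from images; here `refine` is exposed only to show where that step would plug in.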
334

Constructing Neuro-Fuzzy Control Systems Based on Reinforcement Learning Scheme

Pei, Shan-cheng 10 September 2007 (has links)
Traditionally, the fuzzy rules for a fuzzy controller are provided by experts. They cannot be trained from a set of input-output training examples because the correct response of the plant being controlled is delayed and cannot be obtained immediately. In this paper, we propose a novel approach to constructing fuzzy rules for a fuzzy controller based on reinforcement learning. Our task is to learn from the delayed reward to choose sequences of actions that result in the best control. A neural network with delays is used to model the evaluation function Q. Fuzzy rules are constructed and added as the learning proceeds. Both the weights of the Q-learning network and the parameters of the fuzzy rules are tuned by gradient descent. Experimental results show that the fuzzy rules obtained perform the control task effectively.
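As a rough illustration of the reinforcement-learning side of such a controller (not the architecture proposed in the thesis), the sketch below maintains a fuzzy Q-function over one state variable and a small discrete action set: Gaussian rule antecedents fire to a degree, per-rule consequents supply Q contributions, and the consequents are tuned by gradient descent on the temporal-difference error, that is, from delayed reward rather than from supervised input-output examples. All rule centers, widths, and constants are arbitrary choices.

```python
import numpy as np

ACTIONS = [0, 1]                        # e.g. "push left" / "push right"
centers = np.array([-1.0, 0.0, 1.0])    # Gaussian rule centers over the state variable
sigma = np.full(3, 0.5)                 # rule widths
q_conseq = np.zeros((3, len(ACTIONS)))  # per-rule, per-action Q contributions

def memberships(x):
    """Normalized firing strengths of the fuzzy rules for state x."""
    w = np.exp(-((x - centers) ** 2) / (2 * sigma ** 2))
    return w / w.sum()

def q_values(x):
    """Fuzzy inference: firing-strength-weighted sum of rule consequents."""
    return memberships(x) @ q_conseq

def td_step(x, a, reward, x_next, alpha=0.05, gamma=0.9):
    """One Q-learning gradient step through the fuzzy inference."""
    w = memberships(x)
    td_error = reward + gamma * q_values(x_next).max() - q_values(x)[a]
    # dQ(x, a)/dq_conseq[:, a] is exactly the firing-strength vector w;
    # antecedent parameters could be updated analogously via the chain rule.
    q_conseq[:, a] += alpha * td_error * w
    return td_error

# Example: one fictitious transition (state 0.3, action 1, reward 1.0, next state -0.2).
print(td_step(x=0.3, a=1, reward=1.0, x_next=-0.2))
```

Growing the rule base online, for example by adding a rule when no existing antecedent fires strongly for the current state, is one common heuristic for "constructing rules as the learning proceeds"; the thesis's own construction criterion is not reproduced here.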
335

Sparse Value Function Approximation for Reinforcement Learning

Painter-Wakefield, Christopher Robert January 2013 (has links)
A key component of many reinforcement learning (RL) algorithms is the approximation of the value function. The design and selection of features for approximation in RL is crucial, and an ongoing area of research. One approach to the problem of feature selection is to apply sparsity-inducing techniques in learning the value function approximation; such sparse methods tend to select relevant features and ignore irrelevant features, thus automating the feature selection process. This dissertation describes three contributions in the area of sparse value function approximation for reinforcement learning.

One method for obtaining sparse linear approximations is the inclusion in the objective function of a penalty on the sum of the absolute values of the approximation weights. This L1 regularization approach was first applied to temporal difference learning in the LARS-inspired, batch learning algorithm LARS-TD. In our first contribution, we define an iterative update equation which has as its fixed point the L1 regularized linear fixed point of LARS-TD. The iterative update gives rise naturally to an online stochastic approximation algorithm. We prove convergence of the online algorithm and show that the L1 regularized linear fixed point is an equilibrium fixed point of the algorithm. We demonstrate the ability of the algorithm to converge to the fixed point, yielding a sparse solution with modestly better performance than unregularized linear temporal difference learning.

Our second contribution extends LARS-TD to integrate policy optimization with sparse value learning. We extend the L1 regularized linear fixed point to include a maximum over policies, defining a new, "greedy" fixed point. The greedy fixed point adds a new invariant to the set which LARS-TD maintains as it traverses its homotopy path, giving rise to a new algorithm integrating sparse value learning and optimization. The new algorithm is demonstrated to be similar in performance to policy iteration using LARS-TD.

Finally, we consider another approach to sparse learning, that of using a simple algorithm that greedily adds new features. Such algorithms have many of the good properties of the L1 regularization methods, while also being extremely efficient and, in some cases, allowing theoretical guarantees on recovery of the true form of a sparse target function from sampled data. We consider variants of orthogonal matching pursuit (OMP) applied to RL. The resulting algorithms are analyzed and compared experimentally with existing L1 regularized approaches. We demonstrate that perhaps the most natural scenario in which one might hope to achieve sparse recovery fails; however, one variant provides promising theoretical guarantees under certain assumptions on the feature dictionary, while another variant empirically outperforms prior methods both in approximation accuracy and efficiency on several benchmark problems. / Dissertation
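The general flavor of an online, L1-regularized value-function update can be sketched as follows; this is a generic soft-thresholding illustration under invented features and transitions, not the LARS-TD-based algorithms developed in the dissertation. Each linear TD(0) step is followed by a componentwise shrinkage that drives irrelevant weights toward zero.

```python
import numpy as np

def soft_threshold(w, tau):
    """Componentwise shrinkage operator induced by an L1 penalty."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def sparse_td_step(w, phi_s, phi_s_next, reward, alpha=0.05, gamma=0.95, lam=0.01):
    """One linear TD(0) update on weights w, followed by L1 shrinkage."""
    td_error = reward + gamma * (phi_s_next @ w) - (phi_s @ w)
    w = w + alpha * td_error * phi_s
    return soft_threshold(w, alpha * lam)

# Toy run with 10 invented features; only feature 0 carries reward information.
rng = np.random.default_rng(0)
w = np.zeros(10)
for _ in range(5000):
    phi_s = rng.normal(size=10)        # features of the current state
    phi_s_next = rng.normal(size=10)   # features of the successor state
    reward = phi_s[0]                  # reward depends on feature 0 only
    w = sparse_td_step(w, phi_s, phi_s_next, reward)
print(np.round(w, 3))                  # irrelevant weights stay at or near zero
```

In the toy run only the first feature predicts reward, so most of the other weights are shrunk to (or near) zero after training, which is the qualitative behavior that sparse value-function methods aim for.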
336

Parallels between Gambling and Amphetamine Reinforcement in Pathological Gamblers and Healthy Controls and the Role of Sensitization

Chugani, Bindiya 21 March 2012 (has links)
Pathological gambling is a serious disorder with a lifetime prevalence between 1.1% and 3.5%. Evidence suggests commonalities in the neurochemical basis of pathological gambling and psychostimulant addiction. However, parallel effects of gambling and a stimulant drug have not been assessed in the same subjects. This study employed a cross-priming strategy in which 12 male pathological gamblers and 11 male controls were exposed to a 15-minute slot machine game and d-amphetamine (0.4 mg/kg). Subjective, cognitive, electrophysiological, and physiological responses were assessed. Gamblers reported greater desire to gamble after both reinforcers, when baseline motivation was controlled. Conversely, gamblers exhibited diminished cardiovascular response to amphetamine. Gamblers also exhibited decreased pre-pulse inhibition (impaired sensorimotor gating), and deficits on this index predicted greater post-amphetamine desire to gamble and decreased heart rate response to the dose. Results are consistent with possible dopaminergic sensitization in pathological gamblers, but also suggest that central noradrenergic receptor deficits contribute importantly to these effects.
338

The effects of sensory awareness training on self-actualization in a personal growth group

Barrick, Glen Anthony 03 June 2011 (has links)
The purpose of this study was to investigate the effects of sensory awareness training on self-actualization in a personal growth group. The null hypothesis pertained to the differences in self-actualization between treatment and control groups as measured by the Inner Directedness Scale of the Personal Orientation Inventory.

The subjects were undergraduate students from a Midwest university who volunteered to participate in a personal growth group. Based on their time availability (a.m. or p.m.), the sample of 116 subjects was randomly assigned to four treatment, four control, or two reserve groups, so as to maintain proportional samples of females and males. Because of attrition prior to the group experience, some reserve subjects were randomly assigned to treatment and control groups, so that the final sample was composed of 88 subjects (57 females and 31 males). Forty-four of these subjects experienced one of four treatment groups (10 or 12 subjects per group) and the other 44 subjects experienced one of four control groups (10 or 12 subjects per group).

Both experimental and control groups were one-and-one-half-hour personal growth groups designed to develop human potential, increase awareness of self and others, and increase skills in interpersonal relationships. The difference between the groups was that the treatment groups received instructions which stressed, emphasized, and sought to stimulate aspects of sensory awareness, while the control group instructions minimized sensory awareness experiences.

Immediately following the group session, all subjects were administered the Personal Orientation Inventory. The instruments were scored, and the differences between the average raw scores on the Inner Directedness Scale of the POI were subjected to a univariate analysis of variance, with differences considered significant at the .05 level. Preliminary to testing the null hypothesis, all other main effects had been controlled, and the computed F value for the effect due to the interaction between groups and sex (F = .329, p < .568) was not significant.

The computed F value for the group effect (F = 1.273, p < .263) was not significant. Therefore, the null hypothesis (that there would be no significant difference between the experimental and control group subjects' average raw scores on the Inner Directedness Scale of the Personal Orientation Inventory, controlling for any effects due to facilitator, time, and sex) was not rejected.

Analysis of the data indicated that there was no significant difference in self-actualization between the experimental and control groups as measured by the I Scale of the POI. Therefore, it is concluded that, using this one-and-one-half-hour scripted personal growth group approach, the sensory awareness training did not produce a significant positive change in self-actualization as measured by the aforementioned instrument scale. Use of the pre-structured script disallowed flexibility of sensory awareness training activities. The group members had to "flow" with the script, rather than the script "flowing" to meet the needs of the group.

Data were also collected through a questionnaire on which subjects described their reactions to the group experience. These secondary data were descriptive in nature and were not treated statistically. An analysis of these data indicated that both experimental and control group subjects valued the growth group experience and expanded their human potential, especially in the areas of self-awareness and awareness of others. Finally, these data indicated there may have been a lack of process difference between the experimental and control groups. Specifically, some aspects of sensory awareness training might have been reduced further in the control groups.
339

Contingency contracting as an adjunct to group counseling with substance abusers in the natural setting

Mahan, Dorothea B. 03 June 2011 (has links)
The main purpose of this study was to examine, in the natural environment, the relative effects of positive reinforcement and response cost as an adjunct to traditional group counseling in the treatment of substance abusers. While these procedures have been repeatedly reported to be effective in controlled settings, little evidence exists that the results generalize to the natural setting. Further, there is a dearth of research which compares contingency contracting to other modalities in the natural setting. Therefore, a second purpose of this research was to compare the effects of contingency contracting as an adjunct to traditional group counseling versus traditional group counseling alone.

Subjects for this study were 45 male enlisted soldiers who were diagnosed as alcohol or drug abusers and were enrolled in an Army Community Drug and Alcohol Assistance Center (CDAAC). Of the subjects, 25 were alcohol abusers and 20 were drug abusers. The mean age was 23 years and the median rank was E4. They were randomly assigned to one of the three treatment conditions.

The counselors were six paraprofessional military members of the CDAAC staff. They were given five one-hour training sessions by the experimenter on the use of contingency contracting and reinforcement procedures. They were then randomly assigned to the treatment conditions. All subjects received traditional group counseling. Additionally, subjects in Treatment Condition 1 received tokens for carrying out the contingencies of a two-part weekly contract. Subjects in Treatment Condition 2 received the total possible number of tokens at the onset of treatment and forfeited tokens each week if the contingencies of the contract were not met. Tokens were exchanged at the end of treatment for rewards previously negotiated with each subject. Subjects in Treatment Condition 3 did no contracting and received no tokens.

The dependent variables in this study were the subjects' levels of depression and hostility. These were measured by the Self-Rating Depression Scale and the Buss-Durkee Inventory, respectively. A counterbalanced pretest-posttest design was used. The instruments were administered in a classroom in the CDAAC to all subjects prior to the first group session and again after the sixth session. The posttest instruments were administered in the reverse order from the pretest.

The statistical analyses were accomplished using a one-way multivariate analysis of variance (MANOVA). The analysis of the data revealed no statistical differences between contingency contracting with positive reinforcement and contingency contracting with response cost. Further, there were no differences between contingency contracting as an adjunct to traditional group counseling and group counseling alone.

The failure to find significant differences between the groups suggests that contingency contracting may not be a viable therapeutic tool in out-patient settings where the counselor does not have control over all potential reinforcers or where the clients may not have a substantial investment in the reinforcement. If the technique is only successful with highly motivated, voluntary clients, it may be no more effective than the contingencies implicit in other counseling relationships. If the effects of in-patient token economies do not generalize to the natural setting, and if these procedures require unrealistic controls when administered in out-patient settings, the previously reported positive results may have little practical value.
Further research should be conducted which compares the effects of contingency contracting to other treatment modalities.
340

A Framework for Aggregation of Multiple Reinforcement Learning Algorithms

Jiang, Ju January 2007 (has links)
Aggregation of multiple Reinforcement Learning (RL) algorithms is a new and effective technique to improve the quality of Sequential Decision Making (SDM). The quality of SDM depends on long-term rewards rather than instant rewards. RL methods are often adopted to deal with SDM problems. Although many RL algorithms have been developed, none is consistently better than the others. In addition, the parameters of RL algorithms significantly influence learning performance. There is no universal rule to guide the choice of algorithms and the setting of parameters. To handle this difficulty, a new multiple-RL system, the Aggregated Multiple Reinforcement Learning System (AMRLS), is developed. In AMRLS, each RL algorithm (learner) learns individually in a learning module and provides its output to an intelligent aggregation module. The aggregation module dynamically aggregates these outputs and provides a final decision. Then, all learners take the action and update their policies individually. The two processes are performed alternately. AMRLS can deal with dynamic learning problems without the need to search for the optimal learning algorithm or the optimal values of learning parameters. It is claimed that several complementary learning algorithms can be integrated in AMRLS to improve learning performance in terms of success rate, robustness, confidence, redundancy, and complementarity.

There are two strategies for learning an optimal policy with RL methods. One is based on Value Function Learning (VFL), which learns an optimal policy expressed as a value function; the Temporal Difference RL (TDRL) methods are examples of this strategy. The other is based on Direct Policy Search (DPS), which directly searches for the optimal policy in the potential policy space; Genetic Algorithm (GA)-based RL (GARL) methods are instances of this strategy. A hybrid learning architecture of GARL and TDRL, HGATDRL, is proposed to combine them and improve the learning ability.

AMRLS and HGATDRL are tested on several SDM problems, including the maze world problem, the pursuit domain problem, the cart-pole balancing system, the mountain car problem, and a flight control system. Experimental results show that the proposed framework and method can enhance the learning ability and improve the learning performance of a multiple RL system.
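One simple way an aggregation module of this kind could combine the outputs of several independently learning modules is weighted voting over their greedy actions, sketched below. This is only an illustrative reading of the framework; the thesis's actual aggregation strategies, learner types, and weighting rules are not reproduced here, and the tabular Q-learners, fixed weights, and state labels are placeholders.

```python
from collections import defaultdict

ACTIONS = ["left", "right"]

class TabularLearner:
    """Placeholder learning module: an ordinary tabular Q-learner."""
    def __init__(self, alpha, gamma=0.95):
        self.Q = defaultdict(float)
        self.alpha, self.gamma = alpha, gamma

    def preference(self, state):
        # This learner's score for each candidate action in `state`.
        return {a: self.Q[(state, a)] for a in ACTIONS}

    def update(self, s, a, r, s_next):
        target = r + self.gamma * max(self.Q[(s_next, b)] for b in ACTIONS)
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])

def aggregate(learners, weights, state):
    """Weighted vote: each learner's greedy action receives that learner's weight."""
    votes = defaultdict(float)
    for learner, w in zip(learners, weights):
        prefs = learner.preference(state)
        votes[max(prefs, key=prefs.get)] += w
    return max(votes, key=votes.get)

# Every learner observes the same transition and updates individually,
# while the aggregation module supplies the single action actually taken.
learners = [TabularLearner(alpha) for alpha in (0.05, 0.1, 0.2)]
weights = [1.0, 1.0, 1.0]
action = aggregate(learners, weights, state="s0")
for learner in learners:
    learner.update("s0", action, r=1.0, s_next="s1")
print(action, [dict(learner.Q) for learner in learners])
```

Because each learner keeps its own policy while only the aggregated decision is executed, the ensemble retains its diversity; a dynamic scheme would additionally adjust the weights from observed rewards rather than keeping them fixed as in this sketch.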
