341 |
Contingency contracting as an adjunct to group counseling with substance abusers in the natural setting. Mahan, Dorothea B. 03 June 2011
The main purpose of this study was to examine, in the natural environment, the relative effects of positive reinforcement and response cost as adjuncts to traditional group counseling in the treatment of substance abusers. While these procedures have repeatedly been reported to be effective in controlled settings, little evidence exists that the results generalize to the natural setting. Further, there is a dearth of research comparing contingency contracting to other modalities in the natural setting. Therefore, a second purpose of this research was to compare the effects of contingency contracting as an adjunct to traditional group counseling versus traditional group counseling alone.

Subjects for this study were 45 male enlisted soldiers who were diagnosed as alcohol or drug abusers and were enrolled in an Army Community Drug and Alcohol Assistance Center (CDAAC). Of the subjects, 25 were alcohol abusers and 20 were drug abusers. The mean age was 23 years and the median rank was E4. They were randomly assigned to one of three treatment conditions.

The counselors were six paraprofessional military members of the CDAAC staff. They were given five one-hour training sessions by the experimenter on the use of contingency contracting and reinforcement procedures. They were then randomly assigned to the treatment conditions. All subjects received traditional group counseling. Additionally, subjects in Treatment Condition 1 received tokens for carrying out the contingencies of a two-part weekly contract. Subjects in Treatment Condition 2 received the total possible number of tokens at the onset of treatment and forfeited tokens each week if the contingencies of the contract were not met. Tokens were exchanged at the end of treatment for rewards previously negotiated with each subject. Subjects in Treatment Condition 3 did no contracting and received no tokens.

The dependent variables in this study were the subjects' levels of depression and hostility, measured by the Self-Rating Depression Scale and the Buss-Durkee Inventory, respectively. A counterbalanced pretest-posttest design was used. The instruments were administered in a classroom in the CDAAC to all subjects prior to the first group session and again after the sixth session. The posttest instruments were administered in the reverse order from the pretest.

The statistical analyses were accomplished using a one-way multivariate analysis of variance (MANOVA). The analysis of the data revealed no statistical differences between contingency contracting with positive reinforcement and contingency contracting with response cost. Further, there were no differences between contingency contracting as an adjunct to traditional group counseling and group counseling alone.

The failure to find significant differences between the groups suggests that contingency contracting may not be a viable therapeutic tool in out-patient settings where the counselor does not have control over all potential reinforcers or where the clients may not have a substantial investment in the reinforcement. If the technique is only successful with highly motivated, voluntary clients, it may be no more effective than the contingencies implicit in other counseling relationships. If the effects of in-patient token economies do not generalize to the natural setting and if these procedures require unrealistic controls when administered in out-patient settings, the previously reported positive results may have little practical value.
Further research should be conducted which compares the effects of contingency contracting to other treatment modalities.
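As an illustration of the kind of analysis described above, here is a minimal sketch of a one-way MANOVA in Python with statsmodels on hypothetical data sharing the same structure (three treatment conditions, two dependent measures). The column names, group labels, and simulated scores are invented for illustration and are not the study's data.

```python
# Minimal sketch of a one-way MANOVA with two dependent measures and three
# treatment conditions, mirroring the design described above. All data and
# column names here are hypothetical, not the study's actual data.
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
n_per_group = 15
groups = np.repeat(["contract_reinforcement", "contract_response_cost", "counseling_only"],
                   n_per_group)

# Simulated posttest scores for the two dependent variables.
depression = rng.normal(50, 10, size=groups.size)
hostility = rng.normal(30, 8, size=groups.size)

df = pd.DataFrame({"condition": groups,
                   "depression": depression,
                   "hostility": hostility})

# One-way MANOVA: do the mean vectors (depression, hostility) differ by condition?
manova = MANOVA.from_formula("depression + hostility ~ C(condition)", data=df)
print(manova.mv_test())  # Wilks' lambda, Pillai's trace, etc.
```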
|
342 |
A Framework for Aggregation of Multiple Reinforcement Learning Algorithms. Jiang, Ju. January 2007
Aggregation of multiple Reinforcement Learning (RL) algorithms is a new and effective technique for improving the quality of Sequential Decision Making (SDM). The quality of an SDM process depends on long-term rewards rather than instant rewards. RL methods are often adopted to deal with SDM problems.
Although many RL algorithms have been developed, none is consistently better than the others. In addition, the parameters of RL algorithms significantly influence learning performance. There is no universal rule to guide the choice of algorithm or the setting of parameters. To handle this difficulty, a new multiple-RL system, the Aggregated Multiple Reinforcement Learning System (AMRLS), is developed. In AMRLS, each RL algorithm (learner) learns individually in a learning module and provides its output to an intelligent aggregation module. The aggregation module dynamically aggregates these outputs and provides a final decision. All learners then take the chosen action and update their policies individually. The two processes are performed alternately. AMRLS can deal with dynamic learning problems without the need to search for the optimal learning algorithm or the optimal values of learning parameters. It is claimed that several complementary learning algorithms can be integrated in AMRLS to improve learning performance in terms of success rate, robustness, confidence, redundancy, and complementarity.
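A minimal sketch of this aggregation idea follows, assuming a small discrete environment and a performance-weighted vote as the aggregation rule; the toy corridor task, the choice of Q-learners, and the weighting scheme are illustrative assumptions, not the AMRLS implementation.

```python
# Illustrative sketch of aggregating several RL learners: each learner keeps its
# own Q-table, an aggregation module combines their action preferences with a
# performance-weighted vote, and every learner updates on the executed action.
# The toy corridor environment, learners, and weighting rule are assumptions for
# illustration only, not the AMRLS implementation.
import numpy as np

N_STATES, N_ACTIONS = 10, 2     # corridor task: action 0 = left, 1 = right
GOAL = N_STATES - 1

def env_step(state, action):
    nxt = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = nxt == GOAL
    return nxt, (1.0 if done else -0.01), done

class QLearner:
    def __init__(self, alpha, gamma=0.95):
        self.q = np.zeros((N_STATES, N_ACTIONS))
        self.alpha, self.gamma = alpha, gamma
        self.recent_reward = 0.0                  # crude running performance estimate
    def preferences(self, s):
        p = self.q[s] - self.q[s].min()           # non-negative action preferences
        return p / p.sum() if p.sum() > 0 else np.full(N_ACTIONS, 1.0 / N_ACTIONS)
    def update(self, s, a, r, s2):                # off-policy Q-learning update
        self.q[s, a] += self.alpha * (r + self.gamma * self.q[s2].max() - self.q[s, a])
        self.recent_reward = 0.9 * self.recent_reward + 0.1 * r

learners = [QLearner(alpha) for alpha in (0.05, 0.1, 0.5)]

def aggregate_action(s, epsilon=0.1):
    """Aggregation module: weighted vote over the learners' action preferences."""
    if np.random.rand() < epsilon:
        return np.random.randint(N_ACTIONS)
    weights = np.exp([l.recent_reward for l in learners])
    votes = sum(w * l.preferences(s) for w, l in zip(weights, learners))
    return int(np.argmax(votes))

for episode in range(200):
    s, done = 0, False
    while not done:
        a = aggregate_action(s)
        s2, r, done = env_step(s, a)
        for l in learners:                        # all learners learn from the shared step
            l.update(s, a, r, s2)
        s = s2
```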
There are two strategies for learning an optimal policy with RL methods. One is based on Value Function Learning (VFL), which learns an optimal policy expressed as a value function; Temporal Difference RL (TDRL) methods are examples of this strategy. The other is based on Direct Policy Search (DPS), which searches for the optimal policy directly in the space of potential policies; Genetic Algorithm (GA)-based RL (GARL) methods are instances of this strategy. A hybrid learning architecture of GARL and TDRL, HGATDRL, is proposed to combine the two and improve learning ability.
AMRLS and HGATDRL are tested on several SDM problems, including the maze world problem, pursuit domain problem, cart-pole balancing system, mountain car problem, and flight control system. Experimental results show that the proposed framework and method can enhance the learning ability and improve learning performance of a multiple RL system.
|
343 |
Reinforcement Learning in Keepaway Framework for RoboCup Simulation League. Li, Wei. January 2011
This thesis aims to apply reinforcement learning to robot soccer and to show its power for the RoboCup. In the first part, the background of reinforcement learning is briefly introduced, previous work is reviewed, and the difficulty of implementing reinforcement learning in this domain is identified. The second section presents the basic concepts of reinforcement learning, including the three fundamental elements (state, action, and reward) and three classical approaches (dynamic programming, Monte Carlo methods, and temporal-difference learning). The keepaway framework is then explained in more detail and combined with reinforcement learning. A Sarsa algorithm with two function approximators, an artificial neural network and tile coding, is proposed and implemented successfully in simulation. The results show that it significantly improves the performance of the soccer robots.
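As a generic illustration of the ingredients named above (Sarsa with tile-coding function approximation), here is a minimal sketch on a toy one-dimensional task; the environment, tiling scheme, and hyperparameters are assumptions for illustration and are not the keepaway implementation from the thesis.

```python
# Minimal sketch of episodic Sarsa with tile-coding function approximation on a
# toy continuous task (reach the right end of a 1-D corridor). The environment,
# tiling scheme, and hyperparameters are illustrative assumptions only, not the
# keepaway implementation from the thesis.
import numpy as np

N_TILINGS, TILES_PER_TILING, N_ACTIONS = 8, 10, 2   # actions: 0 = left, 1 = right
N_FEATURES = N_TILINGS * TILES_PER_TILING

def active_tiles(x):
    """Indices of the one active tile per offset tiling, for state x in [0, 1)."""
    idx = []
    for t in range(N_TILINGS):
        offset = t / (N_TILINGS * TILES_PER_TILING)
        tile = int((x + offset) * TILES_PER_TILING) % TILES_PER_TILING
        idx.append(t * TILES_PER_TILING + tile)
    return idx

def q_value(w, x, a):
    return sum(w[a, i] for i in active_tiles(x))

def epsilon_greedy(w, x, epsilon=0.1):
    if np.random.rand() < epsilon:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax([q_value(w, x, a) for a in range(N_ACTIONS)]))

def env_step(x, a):
    x2 = min(max(x + (0.05 if a == 1 else -0.05), 0.0), 0.999)
    return x2, (0.0 if x2 >= 0.95 else -1.0), x2 >= 0.95

alpha, gamma = 0.1 / N_TILINGS, 1.0
w = np.zeros((N_ACTIONS, N_FEATURES))        # linear weights, one row per action

for episode in range(100):
    x, done = 0.0, False
    a = epsilon_greedy(w, x)
    while not done:
        x2, r, done = env_step(x, a)
        if done:
            target = r
        else:
            a2 = epsilon_greedy(w, x2)       # on-policy: next action chosen first (Sarsa)
            target = r + gamma * q_value(w, x2, a2)
        delta = target - q_value(w, x, a)
        for i in active_tiles(x):            # gradient step touches only the active tiles
            w[a, i] += alpha * delta
        if not done:
            x, a = x2, a2
```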
|
345 |
The Characteristics and Neural Substrates of Feedback-based Decision Process in Recognition Memory. Han, Sanghoon. 10 April 2008
The judgment of prior stimulus occurrence, generally referred to as item recognition, is perhaps the most heavily studied of all memory skills. A skilled recognition observer not only recovers high-fidelity memory evidence but is also able to flexibly modify how much evidence is required for affirmative responding (the decision criterion) depending upon whether the context calls for a cautious or liberal task approach. The ability to adaptively adjust the decision criterion is a relatively understudied recognition skill, and the goal of this thesis is to examine reinforcement learning mechanisms contributing to recognition criterion adaptability. In Chapter 1, I review a measurement model whose theoretical framework has been successfully applied to recognition memory research (i.e., Signal Detection Theory). I also review major findings in the recognition literature examining the adaptive flexibility of criteria. Chapter 2 reports behavioral experiments that examine the sensitivity of decision criteria to trial-by-trial feedback by manipulating feedback validity in a potentially covert manner. Chapter 3 presents another series of behavioral experiments that used even subtler feedback manipulations based on predictions from the reinforcement learning and category learning literatures. The findings suggested that feedback-induced criterion shifts may rely upon procedural learning mechanisms that are largely implicit. The data also revealed that the magnitudes of induced criterion shifts were significantly correlated with personality measures linked to reward seeking outside the laboratory. In Chapter 4, functional magnetic resonance imaging (fMRI) was used to explore possible neurobiological links between brain regions traditionally linked to reinforcement processing and recognition decisions. Prominent activations in the striatum tracked the intrinsic goals of the subjects, with greater activation for correct responding to old items compared to correct responding to new items during standard recognition testing. Furthermore, the pattern was amplified and reversed by the addition of extrinsic rewards. Finally, activation in the ventral striatum tracked individual differences in personality reward-seeking measures. Together, the findings further support the idea that a reinforcement learning system contributes to recognition decision-making. In the final chapter, I review the main implications arising from the research and suggest future research that could bolster the current results and implications. / Dissertation
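For readers unfamiliar with the Signal Detection Theory quantities referenced above, here is a minimal sketch of how sensitivity (d') and the decision criterion (c) are computed from hit and false-alarm rates, plus a toy feedback-driven criterion adjustment; the update rule and all numbers are illustrative assumptions, not the model tested in the thesis.

```python
# Sketch of the standard Signal Detection Theory quantities used in recognition
# memory (sensitivity d' and criterion c), plus a toy trial-by-trial criterion
# update driven by feedback. The learning rule and parameters are assumptions
# for illustration, not the model tested in the thesis.
import numpy as np
from scipy.stats import norm

def sdt_measures(hit_rate, fa_rate):
    """d' = z(H) - z(F); criterion c = -(z(H) + z(F)) / 2."""
    zh, zf = norm.ppf(hit_rate), norm.ppf(fa_rate)
    return zh - zf, -(zh + zf) / 2.0

print(sdt_measures(0.80, 0.20))   # symmetric observer: d' ~ 1.68, c ~ 0

# Toy simulation: an observer shifts its criterion after (possibly invalid) feedback.
rng = np.random.default_rng(1)
d_prime, criterion, lr = 1.5, 0.0, 0.05
for trial in range(500):
    is_old = rng.random() < 0.5
    evidence = rng.normal(d_prime if is_old else 0.0, 1.0)
    said_old = evidence > criterion
    feedback_correct = (said_old == is_old)
    # Nudge the criterion: negative feedback after "old" responses raises it
    # (more conservative); negative feedback after "new" responses lowers it.
    if not feedback_correct:
        criterion += lr if said_old else -lr
print("final criterion:", round(criterion, 3))
```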
|
346 |
Learning from Observation Using Primitives. Bentivegna, Darrin Charles. 13 July 2004
Learning without any prior knowledge in environments that contain large or continuous state spaces is a daunting task. For robots that operate in the real world, learning must occur in a reasonable amount of time. Providing a robot with domain knowledge and also with the ability to learn from watching others can greatly increase its learning rate. This research explores learning algorithms that can learn quickly and make the most use of information obtained from observing others. Domain knowledge is encoded in the form of primitives, small parts of a task that are executed many times while a task is being performed. This thesis explores and presents many challenges involved in programming robots to learn and adapt to environments that humans operate in.
A "Learning from Observation Using Primitives" framework has been created that provides the means to observe primitives as they are performed by others. This information is used by the robot in a three level process as it performs in the environment. In the first level the robot chooses a primitive to use for the observed state. The second level decides on the manner in which the chosen primitive will be performed. This information is then used in the third level to control the robot as necessary to perform the desired action. The framework also provides a means for the robot to observe and evaluate its own actions as it performs in the environment which allows the robot to increase its performance of selecting and performing the primitives.
The framework and algorithms have been evaluated on two testbeds: Air Hockey and Marble Maze. The tasks are done both by actual robots and in simulation. Our robots have the ability to observe humans as they operate in these environments. The software version of Air Hockey allows a human to play against a cyber player and the hardware version allows the human to play against a 30 degree-of-freedom humanoid robot. The implementation of our learning system in these tasks helps to clearly present many issues involved in having robots learn and perform in dynamic environments.
|
347 |
Scaling reinforcement learning to the unconstrained multi-agent domain. Palmer, Victor. 02 June 2009
Reinforcement learning is a machine learning technique designed to mimic the way animals learn by receiving rewards and punishment. It is designed to train intelligent agents when very little is known about the agent's environment, and consequently the agent's designer is unable to hand-craft an appropriate policy. Using reinforcement learning, the agent's designer can merely give reward to the agent when it does something right, and the algorithm will craft an appropriate policy automatically. In many situations it is desirable to use this technique to train systems of agents (for example, to train robots to play RoboCup soccer in a coordinated fashion). Unfortunately, several significant computational issues occur when using this technique to train systems of agents. This dissertation introduces a suite of techniques that overcome many of these difficulties in various common situations.

First, we show how multi-agent reinforcement learning can be made more tractable by forming coalitions out of the agents and training each coalition separately. Coalitions are formed by using information-theoretic techniques, and we find that by using a coalition-based approach, the computational complexity of reinforcement learning can be made linear in the total system agent count. Next we look at ways to integrate domain knowledge into the reinforcement learning process, and how this can significantly improve the policy quality in multi-agent situations. Specifically, we find that integrating domain knowledge into a reinforcement learning process can overcome training-data deficiencies and allow the learner to converge to acceptable solutions when lack of training data would have prevented such convergence without domain knowledge. We then show how to train policies over continuous action spaces, which can reduce problem complexity for domains that require continuous action spaces (analog controllers) by eliminating the need to finely discretize the action space. Finally, we look at ways to perform reinforcement learning on modern GPUs and show how by doing this we can tackle significantly larger problems. We find that by offloading some of the RL computation to the GPU, we can achieve a speedup factor of almost 4.5 in the total training process.
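As a rough illustration of the information-theoretic coalition idea mentioned above, here is a minimal sketch that estimates pairwise mutual information between agents' discretized reward histories and greedily groups the most mutually informative agents into fixed-size coalitions; the simulated data, binning, and grouping rule are assumptions, not the dissertation's actual procedure.

```python
# Sketch of grouping agents into coalitions by the mutual information between
# their (discretized) reward histories, so that strongly coupled agents are
# trained together and each coalition can be trained separately. The simulated
# rewards, binning, and greedy grouping rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, n_agents, n_bins, coalition_size = 2000, 6, 8, 2

# Simulated reward histories: agent pairs (0,1), (2,3), (4,5) share hidden signals.
shared = rng.normal(size=(3, T))
rewards = np.vstack([shared[i // 2] + 0.5 * rng.normal(size=T) for i in range(n_agents)])

def mutual_information(x, y, bins=n_bins):
    """Plug-in MI estimate (in nats) from a 2-D histogram of two reward series."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1, keepdims=True), pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

mi = np.array([[mutual_information(rewards[i], rewards[j]) if i != j else 0.0
                for j in range(n_agents)] for i in range(n_agents)])

# Greedy coalition formation: repeatedly group the most mutually informative
# unassigned agents until every coalition reaches the target size.
unassigned, coalitions = set(range(n_agents)), []
while unassigned:
    i = max(unassigned, key=lambda a: mi[a].max())
    group = [i]
    unassigned.remove(i)
    while len(group) < coalition_size and unassigned:
        j = max(unassigned, key=lambda a: sum(mi[a][g] for g in group))
        group.append(j)
        unassigned.remove(j)
    coalitions.append(sorted(group))
print(coalitions)   # expected to recover the coupled pairs, e.g. [[0, 1], [2, 3], [4, 5]]
```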
|
348 |
Optimal Control of Perimeter Patrol Using Reinforcement Learning. Walton, Zachary. May 2011
Unmanned Aerial Vehicles (UAVs) are being used more frequently in surveillance scenarios for both civilian and military applications. One such application addresses a UAV patrolling a perimeter, where certain stations can receive alerts at random intervals. Once the UAV arrives at an alert site it can take two actions:
1. Loiter and gain information about the site.
2. Move on around the perimeter.
The information that is gained is transmitted to an operator to allow him to classify the alert. The information is a function of the amount of time the UAV is at the alert site, also called the dwell time, and the maximum delay. The goal of the optimization is to classify the alert so as to maximize the expected discounted information gained by the UAV's actions at a station about an alert. This optimization problem can be readily solved using Dynamic Programming. Even though this approach generates feasible solutions, there are reasons to experiment with different approaches. A complication for Dynamic Programming arises when the perimeter patrol problem is expanded: the number of states increases rapidly when one adds additional stations, nodes, or UAVs to the perimeter. This greatly increases the computation time, making the determination of the solution intractable. The following attempts to alleviate this problem by implementing a Reinforcement Learning technique, specifically Q-Learning, to obtain the optimal solution. Reinforcement Learning is a simulation-based version of Dynamic Programming and requires less information to compute sub-optimal solutions. The effectiveness of the policies generated using Reinforcement Learning for the perimeter patrol problem has been corroborated numerically in this thesis.
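A minimal sketch of tabular Q-Learning on a highly simplified perimeter-patrol model follows (one alert at a time, a small ring of stations, loiter-or-move actions, and a diminishing information reward for dwelling); the state encoding, dynamics, and reward shape are illustrative assumptions, not the formulation used in the thesis.

```python
# Toy tabular Q-Learning for a simplified perimeter patrol: a UAV moves around
# a ring of stations, one alert is active at a time, loitering at the alert
# site yields diminishing information reward, and the alert clears after a
# maximum dwell. States, dynamics, and rewards are illustrative assumptions.
import numpy as np

N_STATIONS, MAX_DWELL = 5, 3
MOVE, LOITER = 0, 1
rng = np.random.default_rng(0)

def encode(pos, alert, dwell):
    return (pos * N_STATIONS + alert) * (MAX_DWELL + 1) + dwell

def step(pos, alert, dwell, action):
    reward = 0.0
    if action == LOITER and pos == alert:
        reward = 0.5 ** dwell            # diminishing information gain per extra dwell step
        dwell += 1
        if dwell > MAX_DWELL:            # alert classified; a new alert appears elsewhere
            alert, dwell = rng.integers(N_STATIONS), 0
    elif action == MOVE:
        pos, dwell = (pos + 1) % N_STATIONS, 0
    return pos, alert, dwell, reward

n_states = N_STATIONS * N_STATIONS * (MAX_DWELL + 1)
Q = np.zeros((n_states, 2))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

pos, alert, dwell = 0, rng.integers(N_STATIONS), 0
for t in range(100_000):
    s = encode(pos, alert, dwell)
    a = rng.integers(2) if rng.random() < epsilon else int(np.argmax(Q[s]))
    pos, alert, dwell, r = step(pos, alert, dwell, a)
    s2 = encode(pos, alert, dwell)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])   # standard Q-learning update

# Learned policy: at the alert site the greedy action should be to loiter.
print("loiter at alert site:", int(np.argmax(Q[encode(2, 2, 0)])) == LOITER)
```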
|
349 |
Immediate and subsequent effects of fixed-time delivery of therapist attention on problem behavior maintained by attention. Walker, Stephen Frank. Smith, Richard G. January 2009
Thesis (M.S.)--University of North Texas, Aug., 2009. / Title from title page display. Includes bibliographical references.
|
350 |
Examining conjugate reinforcement. MacAleese, Kenneth R. January 2008
Thesis (Ph. D.)--University of Nevada, Reno, 2008. / "December, 2008." Includes bibliographical references (leaves 55-64). Library also has microfilm. Ann Arbor, Mich. : ProQuest Information and Learning Company, [2009]. 1 microfilm reel ; 35 mm. Online version available on the World Wide Web.
|