191 |
Nonparametric Inverse Reinforcement Learning and Approximate Optimal Control with Temporal Logic Tasks. Perundurai Rajasekaran, Siddharthan. 30 August 2017
"This thesis focuses on two key problems in reinforcement learning: How to design reward functions to obtain intended behaviors in autonomous systems using the learning-based control? Given complex mission specification, how to shape the reward function to achieve fast convergence and reduce sample complexity while learning the optimal policy? To answer these questions, the first part of this thesis investigates inverse reinforcement learning (IRL) method with a purpose of learning a reward function from expert demonstrations. However, existing algorithms often assume that the expert demonstrations are generated by the same reward function. Such an assumption may be invalid as one may need to aggregate data from multiple experts to obtain a sufficient set of demonstrations. In the first and the major part of the thesis, we develop a novel method, called Non-parametric Behavior Clustering IRL. This algorithm allows one to simultaneously cluster behaviors while learning their reward functions from demonstrations that are generated from more than one expert/behavior. Our approach is built upon the expectation-maximization formulation and non-parametric clustering in the IRL setting. We apply the algorithm to learn, from driving demonstrations, multiple driver behaviors (e.g., aggressive vs. evasive driving behaviors). In the second task, we study whether reinforcement learning can be used to generate complex behaviors specified in formal logic — Linear Temporal Logic (LTL). Such LTL tasks may specify temporally extended goals, safety, surveillance, and reactive behaviors in a dynamic environment. We introduce reward shaping under LTL constraints to improve the rate of convergence in learning the optimal and probably correct policies. Our approach exploits the relation between reward shaping and actor-critic methods for speeding up the convergence and, as a consequence, reducing training samples. We integrate compositional reasoning in formal methods with actor-critic reinforcement learning algorithms to initialize a heuristic value function for reward shaping. This initialization can direct the agent towards efficient planning subject to more complex behavior specifications in LTL. The investigation takes the initial step to integrate machine learning with formal methods and contributes to building highly autonomous and self-adaptive robots under complex missions."
|
192 |
Partial Reinforcement in Frontalis Electromyographic Training. Capriotti, Richard. 12 1900
This study investigated the role of reinforcement schedule and instructional set in frontalis EMG training. The experiment consisted of four groups participating in 30-minute sessions on three consecutive days. The group conditions were intermittent feedback (alternating 100-second trials), continuous feedback, motivated control, and no-treatment control. Except for the no-treatment controls, each subject was instructed that extra-credit points were available contingent on the number of seconds spent within criterion. An individual criterion based on each subject's initial baseline microvolt level was used.
|
193 |
Using the Class Pass Intervention (CPI) for Children with Disruptive Behavior. Andreu, Madison. 29 June 2016
The Class Pass Intervention (CPI) is designed for students who engage in escape-motivated problem behavior to avoid or escape difficult or aversive academic work and who are not responsive to the system-wide universal supports provided to all students. Research on the CPI is in its initial stages and requires replication in multiple settings for the intervention to be established as evidence-based. Therefore, the purpose of this study was to expand the literature on the CPI by targeting elementary school students and assessing its impact on decreasing disruptive behavior maintained by attention and on increasing academic engagement. The study involved 4 students with disruptive classroom behavior and low academic engagement and their 2 classroom teachers. A multiple-baseline design across participants was used to demonstrate the intervention outcomes. The intervention was implemented during a targeted routine or academic time period when the behavior was most likely to occur. Results indicated that teachers implemented the CPI with high levels of fidelity and that their implementation was effective in increasing academic engagement and decreasing disruptive behavior for all participants. The intervention effects were maintained after fading for all 4 students and during a 2-week follow-up for 2 students. Social validity assessments indicated that students and teachers found the intervention acceptable and effective. Limitations and implications for future research are discussed.
|
194 |
Choices in Reinforcer Delivery. Law, Sarah Ann. 08 1900
The current study consisted of two experiments, both of which were comparisons of choice conditions replicated across four participants. Four typically developing preschool children participated in this study. Experiment 1 evaluated participants' preference for choosing consequent stimuli prior to engaging in academic tasks (pre-session choice) versus choosing consequent stimuli each time the criterion for reinforcement had been met within the session (within-session choice). In Experiment 2, preference for choice-making was evaluated when outcomes for both the choice and no-choice conditions were identical. For two participants, results indicated a strong preference for choice-making.
|
195 |
The Effects of Jackpots on Responding and Choice in Two Domestic Dogs. Muir, Kristy Lynn. 05 1900
The current study investigated the impact of delivering a jackpot on response rate and response allocation in two domestic dogs. For the purpose of this research, a jackpot was defined as a one-time, within-session increase in the magnitude of reinforcement. Two experiments were conducted to investigate the effects of delivering a jackpot in both single-operant and concurrent schedule procedures. Experiment 1 investigated the impact of a one-time, within-session increase in the magnitude of reinforcement on response rate in a single-operant procedure. Results of Experiment 1 showed no clear change in response rate after the delivery of the jackpot. Experiment 2 investigated the impact of a one-time, within-session increase in the magnitude of reinforcement on response allocation in a concurrent schedule procedure. Results of Experiment 2 showed an increase in response allocation to the jackpotted contingency in both subjects. These results suggest that a jackpot, as defined here, has no effect in single-operant procedures while having an effect in concurrent schedule procedures. These effects are similar to those reported in the magnitude of reinforcement literature.
|
196 |
Using Concurrent Schedules of Reinforcement to Decrease Behavior. Palmer, Ashlyn. 12 1900
We manipulated the delay and magnitude of reinforcers in two concurrent schedules of reinforcement to decrease a prevalent behavior while increasing another behavior already in the participant's repertoire. The first experiment manipulated delay, implementing a five-second delay between the behavior targeted for decrease and the delivery of reinforcement, while no delay was implemented after the behavior targeted for increase. The second experiment manipulated magnitude, providing one piece of food for the behavior targeted for decrease and two pieces of food for the behavior targeted for increase. Both experiments used an ABAB reversal design. Results suggest that behavior can be decreased without the use of extinction when contingencies favor the desirable behavior.
|
197 |
Cost and Power Loss Aware Coalitions under Uncertainty in Transactive Energy Systems. Sadeghi, Mohammad. 02 June 2022
The need to cope with the rapid transformation of the conventional electrical grid into the future smart grid, with multiple connected microgrids, has led to the investigation of optimal smart grid architectures. The main components of future smart grids, such as generators, substations, controllers, smart meters, and collector nodes, are evolving; however, truly effective integration of these elements into the microgrid context, so as to guarantee intelligent and dynamic functionality across the whole smart grid, remains an open issue. Energy trading is a significant part of this integration.
In microgrids, energy trading refers to the use of surplus energy in one microgrid to satisfy the demand of another microgrid or of a group of microgrids that form a microgrid community. Different techniques are employed to manage the energy trading process, such as optimization-based and conventional game-theoretic methods, which raise several challenges, including complexity, scalability, and the ability to adapt to dynamic environments. A common challenge among all of these methods is adapting to changing circumstances. Optimization methods, for example, show promising performance in static scenarios, where the optimal solution is computed for a specific snapshot of the system; to use such a technique in a dynamic environment, however, optimal solutions must be found for every time slot, which imposes significant complexity. Challenges such as these are best addressed with game-theoretic techniques empowered by machine learning methods across the grid infrastructure and microgrid communities.
In this thesis, novel coalition formation algorithms based on Bayesian coalitional game theory and Bayesian reinforcement learning are proposed, which allow microgrids to exchange energy with their coalition members while minimizing the associated cost and power loss. In addition, a deep reinforcement learning scheme is developed to address the long convergence times caused by the sizeable state-action space of the methods mentioned above. The proposed algorithms are designed to cope with the uncertainty in the system. Their advantages are highlighted by comparing them with conventional coalitional game theory-based techniques, a Q-learning-based technique, random coalition formation, and the case with no coalitions. The results show the superiority of the proposed methods in terms of power loss and cost minimization in dynamic environments.
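As a concrete illustration of learning-based coalition formation, the sketch below has each microgrid pick a coalition with plain tabular Q-learning and receive a reward that penalizes cost and power loss; the reward model, agent interface, and hyperparameters are assumptions made for illustration, not the thesis's Bayesian coalitional game or Bayesian/deep reinforcement learning algorithms.

```python
# Sketch: epsilon-greedy coalition selection with tabular Q-learning.
# Each microgrid keeps Q-values over candidate coalitions and updates them
# from the reward observed after trading within the chosen coalition.
import numpy as np

class CoalitionQLearner:
    def __init__(self, n_microgrids, n_coalitions, alpha=0.1, gamma=0.9,
                 eps=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.Q = np.zeros((n_microgrids, n_coalitions))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def choose(self, mg):
        """Epsilon-greedy coalition choice for microgrid `mg`."""
        if self.rng.random() < self.eps:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(self.Q[mg].argmax())

    def update(self, mg, coalition, reward):
        """One-step Q-update; the 'state' is just the microgrid's identity,
        so the bootstrap term is the value of its best next choice."""
        target = reward + self.gamma * self.Q[mg].max()
        self.Q[mg, coalition] += self.alpha * (target - self.Q[mg, coalition])

def trade_reward(unmet_demand, transferred, grid_price=0.2, loss_rate=0.05):
    """Toy reward: penalize energy still bought from the main grid plus the
    transmission losses of intra-coalition transfers (values are assumed)."""
    return -(unmet_demand * grid_price + transferred * loss_rate)

# Illustrative loop; in practice unmet_demand and transferred depend on the
# chosen coalition's aggregated generation and load profiles.
learner = CoalitionQLearner(n_microgrids=4, n_coalitions=3)
for step in range(1000):
    mg = step % 4
    c = learner.choose(mg)
    r = trade_reward(unmet_demand=1.0, transferred=3.0)
    learner.update(mg, c, r)
```

The thesis's contribution is to replace this plain Q-table with Bayesian estimates and, later, a deep network, so that uncertainty and the large state-action space can be handled.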
|
198 |
The Relative Susceptibilities of Interresponse Times and Post-Reinforcement Pauses to Differential Reinforcement. Trapp, Nancy L. 01 May 1987
Post-reinforcement pauses (PRPs) and interresponse times (IRTs) were examined to determine whether these two temporal units changed in a similar fashion as a function of differential reinforcement. Two experiments were conducted. In Experiment 1, four pigeons were exposed to a series of procedures in which PRP and IRT durations were gradually increased and then decreased. A fixed-ratio two (FR 2) differentiation schedule was used: reinforcement was delivered if the PRP or IRT durations were greater than (PRP > and IRT > procedures) or less than (PRP < and IRT < procedures) specified temporal criteria, and the criteria were gradually changed across procedures. Results showed that PRPs and IRTs changed in accordance with the differential reinforcement specified by the various contingencies. When PRPs and IRTs were free to vary, the PRPs tended to change in a direction consistent with the IRT shaping contingency, whereas the IRTs tended to shorten regardless of the PRP shaping contingency. In Experiment 2, two subjects were exposed to both an FR 2 and an FR 1 schedule to determine whether schedule size influenced the effects obtained with the differentiation procedures. PRPs were systematically changed using a differentiation procedure with a response requirement of either FR 1 or FR 2. Results showed similar changes in PRP durations between the FR 1 and FR 2 differentiation procedures. An analysis of errors made in each shaping condition in both experiments was conducted to determine whether PRPs or IRTs were more susceptible to the differential reinforcement contingencies. Fewer errors were made in the PRP shaping conditions, indicating that PRPs were more easily changed. Implications for a comprehensive theory of reinforcement are discussed.
|
199 |
Exploration of the Primary Reinforcers and Behaviors that are Enhanced by Delta-9-tetrahydrocannabinol (THC) in Male and Female Rats. Walston, Kynah B.; Ahmed, Cristal; Palmatier, Matthew. 25 April 2023
Humans consume cannabis for the pharmacological effects mediated by its primary psychoactive cannabinoid, delta-9-tetrahydrocannabinol (THC). However, there is little evidence that THC acts as a primary reinforcer in non-human models, because the drug alone does not support robust self-administration. We hypothesized that THC may have more potent reinforcement-enhancing effects, meaning that THC may enhance the reinforcing effects of other non-drug rewards in a user's environment. In the present experiments, we explored the effects of THC on operant responding for saccharin (SACC) or a visual stimulus (VS). In all experiments, rats were shaped to respond for their assigned reinforcer. Drug challenge tests were conducted every 72 hours: rats were injected with the assigned dose of THC, and responding for each reinforcer was measured. Our initial findings indicated a possible sex difference: THC injections increased lever-pressing for SACC in male rats but not in female rats. However, in follow-up experiments we used a different response (nose-key press instead of lever press) that facilitated operant responding in rats of different sizes (adult males are significantly more massive than adult females). In that experiment, THC enhanced nose-key pressing for SACC in both male and female rats across a range of doses. Moreover, this latter experiment confirmed that the effect of THC was motivational in nature: THC injections increased effort to obtain SACC under a progressively increasing schedule of reinforcement (progressive ratio). Finally, using a third operant response (head entry into a receptacle), we demonstrated that THC increased reinforcement by the VS across a range of doses. The present studies indicate that THC acts as a reinforcement enhancer, increasing motivation in male and female rats to obtain both SACC and the VS across a range of doses. By demonstrating that THC enhances the reinforcing effects of both gustatory and non-gustatory reinforcers, our evidence supports the hypothesis that THC's effect on the brain facilitates incentive motivation regardless of the sensory modality of the reinforcer.
|
200 |
Dynamic mechanical properties of cementitious composites with carbon nanotubes. Wang, J., Dong, S., Ashour, Ashraf, Wang, X., Han, B. 29 October 2019
This paper studied the effect of different types of multi-walled carbon nanotubes (MWCNTs) on the dynamic mechanical properties of cementitious composites. Impact compression tests were conducted on various specimens to obtain the dynamic stress-strain curves, dynamic compressive strength, and deformation of the cementitious composites, from which the dynamic impact toughness and impact dissipation energy were then estimated. Furthermore, the microscopic morphology of the cementitious composites was examined by scanning electron microscopy to reveal the reinforcing mechanisms of MWCNTs in cementitious composites. Experimental results show that all types of MWCNTs can increase the dynamic compressive strength and ultimate strain of the composite, but the dynamic peak strain of the composite deviates with MWCNT incorporation. The composite with thick-short MWCNTs shows a 100.8% increase in impact toughness, and the composite with thin-long MWCNTs shows an increase in dissipation energy of up to 93.8%. MWCNTs with a special structure or coating treatment have a stronger reinforcing effect on the strength of the composite than untreated MWCNTs. The modifying mechanisms of MWCNTs in cementitious composites are mainly attributed to their nucleation and bridging effects, which prevent micro-crack generation and delay macro-crack propagation by increasing energy consumption.
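The abstract does not spell out how the dynamic impact toughness is estimated; a common definition, assumed here purely for illustration, is the area under the dynamic stress-strain curve. The sample curve and integration below are placeholders, not the paper's data or procedure.

```python
# Sketch: impact toughness estimated as the area under the dynamic
# stress-strain curve, using the trapezoidal rule (an assumed, standard
# definition, not necessarily the paper's exact method).
import numpy as np

def impact_toughness(strain, stress_mpa):
    """Trapezoidal-rule integral of stress over strain; with stress in MPa
    and dimensionless strain, the result is in MJ/m^3."""
    strain = np.asarray(strain, dtype=float)
    stress = np.asarray(stress_mpa, dtype=float)
    return float(np.sum(0.5 * (stress[1:] + stress[:-1]) * np.diff(strain)))

# Illustrative curve: stress rises to a peak and then softens (not real data).
strain = np.linspace(0.0, 0.012, 50)
stress = 60.0 * np.sin(np.pi * strain / 0.012)
print(f"toughness ~ {impact_toughness(strain, stress):.3f} MJ/m^3")
```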
|