561 |
Comparison of deep reinforcement learning algorithms in a self-play setting
Kumar, Sunil (30 August 2021)
In this exciting era of artificial intelligence and machine learning, the success of AlphaGo, AlphaZero, and MuZero has generated great interest in deep reinforcement learning, especially in self-play settings. The methods used by AlphaZero are finding applications in many different areas, such as clinical medicine, intelligent military command decision support systems, and recommendation systems. While specific methods of reinforcement learning with self-play have found their place in application domains, there is much to be explored in existing reinforcement learning methods not originally intended for self-play settings.
This thesis focuses on evaluating the performance of existing reinforcement learning techniques in self-play settings. In this research, we trained and evaluated two deep reinforcement learning algorithms in self-play settings on game environments, namely Connect Four and Chess.
We demonstrate how a simple on-policy, policy-based method such as REINFORCE shows signs of learning, whereas an off-policy, value-based method such as Deep Q-Networks does not perform well in self-play settings in the selected environments. The results show that, after training, the REINFORCE agent wins 85% of games against a random baseline agent and 60% of games against a greedy baseline agent in Connect Four. The strength of the agents trained with both techniques was measured and plotted against different baseline agents. We also investigate the impact of selected significant hyper-parameters on the performance of the agents. Finally, we provide recommended values for these hyper-parameters for training deep reinforcement learning agents in similar environments.
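To make the on-policy approach concrete, the sketch below shows one way REINFORCE can be wired into a self-play loop. It is a minimal illustration, not the author's implementation: the two-player environment interface (env.reset(), env.step() returning state, reward, done) and the network sizes are assumptions for the example.

```python
# Minimal REINFORCE self-play sketch (assumed environment interface, not the thesis code).
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, n_states, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_states, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        # Return a categorical distribution over moves.
        return torch.distributions.Categorical(logits=self.net(x))

def play_self_play_episode(env, policy):
    """Both players sample moves from the same policy; log-probs and
    rewards are recorded from the perspective of the player to move."""
    log_probs, rewards = [], []
    state, done = env.reset(), False
    while not done:
        dist = policy(torch.as_tensor(state, dtype=torch.float32))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done = env.step(action.item())  # hypothetical env API
        rewards.append(reward)
    return log_probs, rewards

def reinforce_update(optimizer, log_probs, rewards, gamma=0.99):
    # Monte-Carlo returns, then the standard REINFORCE gradient estimate.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In use, a single policy would be created once (e.g. `optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)`) and updated after each self-play game, so both sides of the board improve together.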
|
562 |
Single asset trading: a recurrent reinforcement learning approach
Nikolic, Marko (January 2020)
Asset trading using machine learning has become popular within the financial industry in recent years. This can, for instance, be seen in the large share of daily trading volume that is executed by automatic algorithms. This thesis presents a recurrent reinforcement learning model for trading a single asset. The benefits, drawbacks, and derivations of the model are presented. Different parameters of the model are calibrated and tuned using both a traditional split between training and testing data sets and nested cross-validation. The results of the single asset trading model are compared to a benchmark strategy, which consists of buying the underlying asset and holding it for a long period of time regardless of its volatility. The proposed model outperforms the buy-and-hold strategy on three out of four stocks selected for the experiment. Additionally, the model's returns are sensitive to changes in the number of epochs, m, the learning rate, and the training/test ratio.
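As a rough illustration of the recurrent reinforcement learning idea this thesis builds on (a position that depends on recent returns and on the previous position, trained to maximize a risk-adjusted return), here is a minimal NumPy sketch. The window size m, learning rate, transaction cost, and the finite-difference training step are illustrative assumptions, not the author's model; the thesis derives the gradient analytically.

```python
# Minimal recurrent reinforcement learning (RRL) trading sketch (illustrative only).
import numpy as np

def rrl_positions(returns, w, m):
    """Position F_t = tanh(w . [1, last m returns, F_{t-1}]), one position per step."""
    F = np.zeros(len(returns))
    for t in range(m, len(returns)):
        x = np.concatenate(([1.0], returns[t - m:t], [F[t - 1]]))
        F[t] = np.tanh(w @ x)
    return F

def sharpe_ratio(returns, F, cost=0.001):
    # Hold the previous position over each period and pay a cost on turnover.
    strat = F[:-1] * returns[1:] - cost * np.abs(np.diff(F))
    return strat.mean() / (strat.std() + 1e-9)

def train_numeric(returns, m=8, lr=0.1, epochs=100):
    """Gradient ascent on the Sharpe ratio using finite differences,
    which keeps the sketch short at the cost of speed."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=m + 2)
    for _ in range(epochs):
        base = sharpe_ratio(returns, rrl_positions(returns, w, m))
        grad = np.zeros_like(w)
        for i in range(len(w)):
            w_eps = w.copy()
            w_eps[i] += 1e-4
            grad[i] = (sharpe_ratio(returns, rrl_positions(returns, w_eps, m)) - base) / 1e-4
        w += lr * grad
    return w
```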
|
563 |
Comparison of Token Reinforcement and Monetary Reinforcement to Increase Steps in Adults with Intellectual Disabilities in a Group Home Setting
Hanashiro-Parson, Hana (22 March 2019)
As the obesity rate in America continues to rise, levels of physical activity have persistently declined at a rapid pace across all age groups. This trend is demonstrated most significantly in individuals diagnosed with intellectual disabilities (ID). Due to the high obesity rate in individuals with ID, it is crucial to find an effective intervention to increase physical activity. The purpose of this study was to compare the effectiveness of token reinforcement and monetary reinforcement for increasing physical activity among adults with ID, to assess preference for token or monetary reinforcement, and to evaluate the effects of choice of reinforcement procedure on physical activity. An ABAB design with an alternating treatments design was used to compare the two conditions (token reinforcement and monetary reinforcement). In the second intervention phase, the participants chose between the two reinforcement conditions. Results showed that both reinforcement conditions increased physical activity.
|
564 |
Efficiency Comparison Between Curriculum Reinforcement Learning & Reinforcement Learning Using ML-Agents
Tabell Johnsson, Marco; Jafar, Ala (January 2020)
No description available.
|
565 |
Training reinforcement learning model with custom OpenAI gym for IIoT scenario
Norman, Pontus (January 2022)
This study consists of an experiment to see, as a proof of concept, how well it would work to implement an industrial gym environment to train a reinforcement learning model. To determine this, the reinforcement learning model is trained repeatedly and tested. If the model completes the training scenario, which is a representation of the environment, then that training iteration counts as a success. The time it takes to train for a certain number of game episodes is measured. The number of episodes it takes for the reinforcement learning model to achieve an acceptable outcome of 80% of the maximum score is measured, and the time it takes to train those episodes is measured. These measurements are evaluated, and conclusions are drawn on how well the reinforcement learning models worked. The tools used are a Q-learning algorithm implemented from scratch and deep Q-learning with TensorFlow. The conclusion showed that the manually implemented Q-learning algorithm gave varying results depending on the environment design and how long the agent was trained, with success rates ranging from 100% to 0%. The times it took to train the agent to an acceptable level were 0.116, 0.571, and 3.502 seconds depending on which environment was tested (see the results chapter for more information on the environments). The TensorFlow implementation gave either a 100% or 0% success rate, and since the polarizing results appear to have been caused by some issue with the implementation, measurements were only taken for one environment. And since the model never reached a stable outcome of more than 80%, no training time was measured for this implementation.
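For readers unfamiliar with the setup, the sketch below shows one way a custom OpenAI Gym environment can be paired with tabular Q-learning. It is a minimal toy, not the thesis environment: the IIoT scenario here is a made-up state machine, and the classic gym API (pre-0.26, where step() returns four values) is assumed.

```python
# Minimal custom Gym environment + tabular Q-learning sketch (toy scenario, assumed classic gym API).
import numpy as np
import gym
from gym import spaces

class TinyIIoTEnv(gym.Env):
    """Toy environment: drive a machine state from 0 to n_states - 1 in few steps."""
    def __init__(self, n_states=8, max_steps=50):
        super().__init__()
        self.n_states, self.max_steps = n_states, max_steps
        self.observation_space = spaces.Discrete(n_states)
        self.action_space = spaces.Discrete(2)  # 0 = step back, 1 = step forward

    def reset(self):
        self.state, self.steps = 0, 0
        return self.state

    def step(self, action):
        self.state = int(np.clip(self.state + (1 if action == 1 else -1), 0, self.n_states - 1))
        self.steps += 1
        done = self.state == self.n_states - 1 or self.steps >= self.max_steps
        reward = 1.0 if self.state == self.n_states - 1 else -0.01
        return self.state, reward, done, {}

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Standard tabular Q-learning with epsilon-greedy exploration.
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = env.action_space.sample() if np.random.rand() < epsilon else int(np.argmax(Q[s]))
            s2, r, done, _ = env.step(a)
            Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
            s = s2
    return Q
```

A "success" in the experiment's sense would then be checking whether the greedy policy derived from Q solves the scenario within the step limit after a fixed training budget.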
|
566 |
Assessment and Treatment of Multiple Topographies of Self-injury Maintained by Separate Reinforcement Contingencies
Pace, Amy (08 1900)
Functional analysis procedures were used to assess and treat multiple topographies of self-injurious behavior exhibited by an individual. An experimental functional analysis indicated that one topography, hand biting, appeared to be maintained by social positive reinforcement in the form of delivery of tangible items. The analysis also provided evidence that a second form of self-injury, skin picking, was automatically reinforced. To treat positively reinforced hand biting, access to a preferred tangible was arranged contingent on the omission of biting for a prespecified time interval. Hand biting was nearly eliminated, and low rates were maintained as the schedule of reinforcement was thinned to 10 min. Competing stimulus assessments identified that magazines effectively suppressed all occurrences of skin picking; therefore, noncontingent access to magazines was implemented. Using a combination of multielement and multiple baseline designs, we were able to demonstrate that the two topographies of self-injury were maintained by independent reinforcement contingencies and that interventions corresponding to each topography and function effectively treated both behaviors.
|
567 |
Behavioral Induction in Guinea Pigs as a Function of Reinforcement Magnitude in Multiple Schedules of Negative Reinforcement
Burns, Dennis L. (01 May 1975)
The purpose of this study was to examine the effects of changes in magnitude of negative reinforcement on multiple schedules with guinea pigs. In both schedule components, the first response (lever press) after an average of 10 seconds was reinforced. In the constant component of this schedule the reinforcement magnitude (time-off from electric foot shock) was always 15 seconds; whereas, in the manipulated component the magnitude changed in the following sequence: 15, 7.5, 15, 30, and 15 seconds. All subjects showed a gradual decrease in response rate across baseline conditions. When behavioral effects were evaluated relative to this changing baseline, five of six subjects demonstrated that as the reinforcement magnitude decreased in one component, the response rates in both components decreased (negative induction). Likewise, when reinforcement magnitude increased in one component, all subjects showed behavioral induction. Specifically, three subjects showed increases in response rate in both components (positive induction), while two subjects showed decreases in response rate in both components (negative induction). This research extends the generality of the behavioral induction phenomenon on multiple schedules to include negative reinforcement with the guinea pig as a function of changes in reinforcement magnitude.
|
568 |
The Use of Symbolic Modeling On Generalized Imitation In Children
Anderson, Emmett G. (01 May 1979)
Ten experimentally naive children between the ages of six and eight served in three generalized imitation experiments using symbolic models. Subjects were presented videotaped behaviors to imitate via closed circuit television, and their responses were mechanically defined, recorded, and reinforced in an effort to control social influences from the presence of the experimenter. In Experiment 1, imitation of three behaviors was reinforced and imitation of a fourth behavior was never reinforced for four subjects. Two other subjects received noncontingent reinforcement. The following independent variables were tested: (1) the presence and absence of an experimenter, (2) instructions to "Do that," and (3) contingent and noncontingent reinforcement.
Results of Experiment 1 demonstrated the apparatus could be used to produce and maintain generalized imitation, even in the absence of the experimenter, so long as differential reinforcement was available. "Do that" instructions were not necessary, and the presence of the experimenter served to maintain imitation when contingent reinforcement was not available.
In Experiment 2, four subjects produced generalized imitation in the absence of both an experimenter and any instructions, with two reinforced and two nonreinforced imitations.
Using the same four subjects in Experiment 3, congruent, incongruent, and "Do what you want" instructions given before sessions demonstrated that instructions could override the effect of reinforcers or produce differential responding in most subjects. When given a choice to imitate or not imitate, subjects continued generalized imitation.
The data tend to support the theory that imitation is itself a response class, and the effect of instructions is to divide that response class into a class of imitated responses and a class of instruction-following responses. The influence of instructions, even in the absence of an adult experimenter, was obvious.
|
569 |
The Effect of Token Reinforcement on Moderate-to-Vigorous Physical Activity Exhibited by Young Children
Patel, Rutvi R. (01 January 2017)
We used a multiple-baseline-across-participants design with a combined reversal and multielement design to assess the effects of contingent token reinforcement, compared to noncontingent token reinforcement, on moderate-to-vigorous physical activity (MVPA) exhibited by four preschool-aged children. Three children engaged in higher levels of MVPA when tokens were delivered contingent on MVPA compared to baseline (no token) and noncontingent-token conditions. Although MVPA was differentiated across contingent-token sessions and corresponding baseline (no token) control probes for three of the four participants, some variability was apparent. The present study demonstrated that the delivery of tokens contingent on MVPA can increase and maintain MVPA exhibited by preschool-aged children, resulting in more MVPA than in baseline conditions and conditions in which tokens are awarded without respect to MVPA.
|
570 |
Does the Establishment of Conditioned Reinforcement for Narrative Reading Affect STEM Reading or Vice Versa?
White, Mary-Genevieve (January 2023)
Research has demonstrated positive effects on reading achievement measures when content is conditioned as a reinforcer for prolonged reading. While previous research has focused on conditioning narrative texts and their relation to increased comprehension, there is no current research on the effects of conditioning informational texts.
Experiment 1 examined whether the effects of conditioning narrative texts as a reinforcer extend to technical writing for science, technology, engineering, and math (STEM) content for third graders with and without Individualized Education Plans. We replicated the conditioning procedures used with elementary-aged participants in previous studies for narrative texts. Using a four-step, peer-collaborative procedure, peer interactions were paired with reading activities to condition narrative texts as reinforcers for prolonged reading. Results indicated that the reinforcement value of conditioned narrative texts did not transfer to STEM texts.
Experiment 2 examined whether the effects of conditioning STEM texts as a reinforcer extend to narrative texts. Academic achievement was also measured after conditioned reinforcement for STEM texts was established using the four-step peer-collaborative procedure. Results indicated that the reinforcement value for STEM texts did not transfer to narrative texts.
Keywords: conditioned reinforcement, narrative, pairing, peers
|