461 |
An Automated VNF Manager based on Parameterized-Action MDP and Reinforcement Learning
Li, Xinrui 15 April 2021
Managing and orchestrating the behaviour of virtualized Network Functions (VNFs) remains a major challenge due to their heterogeneity and the ever-increasing resource demands of the served flows. In this thesis, we propose a novel VNF manager (VNFM) that employs a parameterized-action reinforcement learning mechanism to simultaneously decide on the optimal VNF management action (e.g., migration, scaling, termination or rebooting) and the action's corresponding configuration parameters (e.g., migration location or amount of resources needed for scaling). More precisely, we first propose a novel parameterized-action Markov decision process (PAMDP) model to accurately describe each VNF, the instances of its components and their communication, as well as the set of permissible management actions by the VNFM and the rewards of realizing these actions. The use of parameterized actions allows us to rigorously represent the functionalities of the VNFM in order to perform various lifecycle management (LCM) operations on the VNFs. Next, we propose a two-stage reinforcement learning (RL) scheme that alternates between learning an action-value function for the discrete LCM actions and updating the action-parameter selection policy. In contrast to existing machine learning schemes, the proposed work uniquely provides a holistic management platform that unifies individual efforts targeting individual LCM functions such as VNF placement and scaling. Performance evaluation results demonstrate the efficiency of the proposed VNFM in maintaining the required performance level of the VNF while optimizing its resource configurations.
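The parameterized-action idea in this abstract (a discrete management action paired with a continuous configuration parameter) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the action names follow the abstract, but the parameter ranges, the `select_action` function, and the dictionary-based Q-values and parameter policy are all hypothetical, and only the action-selection step is shown, not the two-stage learning scheme.

```python
import random

# Hypothetical parameter ranges for each LCM action (illustrative only).
PARAM_RANGES = {
    "migrate": (0.0, 1.0),   # assumed: normalized target-host coordinate
    "scale": (0.5, 4.0),     # assumed: CPU cores to allocate
    "terminate": None,       # no continuous parameter
    "reboot": None,          # no continuous parameter
}

def select_action(q_values, param_policy, epsilon=0.1, rng=random):
    """Pick a discrete action epsilon-greedily, then attach its parameter."""
    if rng.random() < epsilon:
        action = rng.choice(sorted(q_values))      # explore
    else:
        action = max(q_values, key=q_values.get)   # exploit
    bounds = PARAM_RANGES[action]
    if bounds is None:
        return action, None
    low, high = bounds
    # Clip the policy's proposed parameter into the permitted range.
    param = min(high, max(low, param_policy[action]))
    return action, param

q = {"migrate": 0.2, "scale": 0.9, "terminate": -1.0, "reboot": 0.1}
policy = {"migrate": 0.3, "scale": 5.0}
print(select_action(q, policy, epsilon=0.0))  # ('scale', 4.0)
```

In a learning loop, the Q-values and the parameter policy would be updated in alternation, as the abstract describes; here both are fixed inputs.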
|
462 |
The Effects of Concrete, Symbolic, and Verbal Reinforcement of the Discrimination Learning of Moderately and Severely Retarded Boys
Johnson, James E. 01 1900
The present study is an attempt to determine which of several different types of reinforcers is most effective in discrimination learning using institutionalized mentally retarded boys of different intellectual levels as subjects. If one type of reinforcement works more effectively in conditioning one level of institutionalized mentally retarded subject, then that type of reinforcement could be used to greater advantage in controlling behavior than some other, less effective kind of reinforcer.
|
463 |
Learning from Immediate and Delayed Rewards
Cotet, Miruna Gabriela January 2021
No description available.
|
464 |
Mutual Reinforcement Learning
Reid, Cameron 05 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Mutual learning is an emerging field in intelligent systems which takes inspiration from naturally intelligent agents and attempts to explore how agents can communicate and cooperate to share information and learn more quickly. While agents in many biological systems have little trouble learning from one another, it is not immediately obvious how artificial agents would achieve similar learning. In this thesis, I explore how agents learn to interact with complex systems. I further explore how these complex learning agents may be able to transfer knowledge to one another to improve their learning performance when they are learning together and have the power of communication. While significant research has been done to explore the problem of knowledge transfer, the existing literature is concerned either with supervised learning tasks or relatively simple discrete reinforcement learning. The work presented here is, to my knowledge, the first which admits continuous state spaces and deep reinforcement learning techniques. The first contribution of this thesis, presented in Chapter 2, is a modified version of deep Q-learning which demonstrates improved learning performance due to the addition of a mutual learning term which penalizes disagreement between mutually learning agents. The second contribution, in Chapter 3, is a presentation of work which describes effective communication of agents which use fundamentally different knowledge representations and systems of learning (model-free deep Q-learning and model-based adaptive dynamic programming), and I discuss how the agents can mathematically negotiate their trust in one another to achieve superior learning performance. I conclude with a discussion of the promise shown by this area of research and a discussion of problems which I believe are exciting directions for future research.
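A mutual learning term of the kind mentioned in this abstract can be sketched as a penalty added to the usual temporal-difference loss. The abstract does not give the exact form of the term, so the squared-difference penalty, the weight `lam`, and the function name `mutual_q_loss` below are assumptions made for illustration only.

```python
def mutual_q_loss(q_self, q_peer, action, reward, next_q_self,
                  gamma=0.99, lam=0.1):
    """TD loss for one transition plus a penalty on disagreement with a peer.

    q_self, q_peer: Q-value lists of the two agents for the current state.
    next_q_self:    this agent's Q-values for the next state.
    lam:            hypothetical weight of the mutual learning term.
    """
    # Standard one-step Q-learning target and squared TD error.
    td_target = reward + gamma * max(next_q_self)
    td_error = td_target - q_self[action]
    # Assumed mutual term: mean squared disagreement between the agents.
    disagreement = sum((a - b) ** 2 for a, b in zip(q_self, q_peer)) / len(q_self)
    return td_error ** 2 + lam * disagreement

loss = mutual_q_loss([1.0, 2.0], [1.0, 4.0], action=1, reward=1.0,
                     next_q_self=[0.0, 1.0], gamma=0.5, lam=0.1)
print(round(loss, 6))  # 0.45
```

When the two agents agree exactly, the penalty vanishes and the loss reduces to the plain squared TD error.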
|
465 |
Differentiating the Primary Reinforcing and Reinforcement-Enhancing Effects of Varenicline
Schassburger, Rachel L., Levin, Melissa E., Weaver, Matthew T., Palmatier, Matthew I., Caggiula, Anthony R., Donny, Eric C., Sved, Alan F. 01 January 2015
Rationale: Varenicline (VAR), a smoking cessation aid that is a partial agonist at nicotinic receptors, mimics the reinforcement-enhancing effects of nicotine. Varenicline, when accompanied by non-drug cues, is self-administered by rats, though it is unclear whether this results from varenicline acting as a primary reinforcer or a reinforcement enhancer of the cues. Objectives: This study sought to disentangle these two potential actions. Methods: Rats were allowed to self-administer intravenous nicotine, saline, or varenicline during 1-h sessions in operant chambers equipped with two levers. Five groups had concurrent access to drug infusions and a moderately reinforcing visual stimulus (VS) for responding on separate levers. Meeting the reinforcement schedule on one lever was reinforced with VAR (0.01, 0.06, 0.1 mg/kg/infusion), nicotine (0.06 mg/kg/infusion), or saline, while meeting the same schedule on the other lever delivered the VS. Additional groups were reinforced for pressing a single "active" lever and received VAR paired with the VS, the VS with response-independent infusions of VAR, or VAR alone (0.1 mg/kg/infusion). Results: Rats readily responded for VAR paired with VS on a single lever. However, when VAR was the only reinforcer contingent on a response, rats did not respond more than for saline. Conclusions: These findings show that VAR does not serve as a primary reinforcer in rats at doses that increase responding for non-drug reinforcers. These data are consistent with research showing that the primary reinforcing effects of VAR are weak, at best, and that the primary reinforcing and reinforcement-enhancing actions of nicotinic drugs are pharmacologically distinct.
|
466 |
Percolation and reinforcement on complex networks
Yuan, Xin 27 January 2018
Complex networks appear in almost every aspect of our daily life and are widely studied in the fields of physics, mathematics, finance, biology and computer science. This work utilizes percolation theory in statistical physics to explore the percolation properties of complex networks and develops a reinforcement scheme for improving network resilience. This dissertation covers two major parts of my Ph.D. research on complex networks: i) probing, in the context of both traditional percolation and k-core percolation, the resilience of complex networks with tunable degree distributions or directed dependency links under random, localized or targeted attacks; ii) developing and proposing a reinforcement scheme to eradicate the catastrophic collapses that occur very often in interdependent networks. We first use generating function and probabilistic methods to obtain analytical solutions to percolation properties of interest, such as the giant component size and the critical occupation probability. We study uncorrelated random networks with Poisson, bi-Poisson, power-law, and Kronecker-delta degree distributions and construct those networks using the configuration model. The computer simulation results show remarkable agreement with theoretical predictions. We discover an increase in network robustness as the degree distribution broadens and a decrease in network robustness as directed dependency links come into play under random attacks. We also find that targeted attacks inflict the greatest damage on the structure of both single and interdependent networks in k-core percolation. To strengthen the resilience of interdependent networks, we develop and propose a reinforcement strategy and obtain the critical amount of reinforced nodes analytically for interdependent Erdős-Rényi networks and numerically for scale-free and for random regular networks. Our mechanism leads to improvement of network stability of the West U.S. power grid. This dissertation provides us with a deeper understanding of the effects of structural features on network stability and fresher insights into designing resilient interdependent infrastructure networks.
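The two percolation quantities named in this abstract, the giant component size and the critical occupation probability, have a standard closed form for a single Erdős-Rényi network under random node removal. The Python sketch below covers only that textbook case, not the dissertation's interdependent or k-core results: with mean degree c and occupation probability p, the giant component fraction S satisfies S = p(1 - exp(-cS)) and the threshold is p_c = 1/c. The function names are hypothetical.

```python
import math

def giant_component_fraction(mean_degree, p, iterations=10000, tol=1e-12):
    """Solve S = p * (1 - exp(-mean_degree * S)) by fixed-point iteration."""
    s = p  # start from the upper bound S <= p
    for _ in range(iterations):
        s_next = p * (1.0 - math.exp(-mean_degree * s))
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s

def critical_occupation(mean_degree):
    """Percolation threshold p_c = 1 / <k> for an Erdős-Rényi network."""
    return 1.0 / mean_degree

c = 4.0
print(critical_occupation(c))            # 0.25
print(giant_component_fraction(c, 0.2))  # effectively zero, below threshold
print(giant_component_fraction(c, 1.0))  # roughly 0.98, above threshold
```

Starting the iteration at S = p keeps it on the non-trivial branch above the threshold, while below the threshold it decays to the only solution, S = 0.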
|
467 |
An Investigation of the Effects of Differential Reinforcement of Alternative Behavior on Students with Mild/Moderate Disabilities in a School Classroom
Spangenberg, Katrina 01 December 2008
This study investigated the effects of differential reinforcement of alternative behavior (DRA), a behavior reduction procedure, on problem behavior exhibited by three elementary school students in a general education classroom. DRA involves reinforcement of an alternative behavior while withholding reinforcement for the inappropriate behavior. The three participants were classified as experiencing mild/moderate disabilities but received most services (and participated in this research) in a general education classroom. Problem behaviors included off-task behavior, talk-outs, and inappropriate touching. Alternative behaviors included on-task behavior and hand-raising to get teacher attention. Results indicated that DRA decreased off-task and talk-out behavior for two participants, although effects were variable. Results for a third participant indicated minimal effects on reduction of both off-task and inappropriate touching behaviors. For two participants, differential reinforcement of lower rates of behavior (DRL) was implemented following DRA in an attempt to establish stimulus control over problem behavior. However, results of the DRL intervention were mixed. Results are discussed in terms of differences between investigating the effects of DRA in classroom versus clinic settings and establishing and maintaining contingencies for reinforcement.
|
468 |
Non-Contingent Reinforcement in a Counseling Like Situation
Shelton, Robert B., Jr. 01 May 1973
The purpose of this study was to determine if a variable, non-contingent reinforcement, could account for a significant amount of the effect of psychotherapy. A sample of ninety subjects was drawn from basic psychology classes and randomly assigned to six groups in a variation of the Solomon four-group design. The treatment groups were connected to sham GSR equipment and told that when a light flashed they had made an anxiety-reducing statement and were becoming more mentally healthy. The subjects were given three-by-five cards upon which were typed positive-negative adjective pairs and told to use the cards as cues to talk about themselves. The subjects were placed on a variable interval schedule with a mean of 10 seconds. No significant difference was found for the treatment.
|
469 |
The Role of Vicarious Reinforcement for Modeled or Alternate Behavior
Lech, Brian C. 01 May 1986
Research on vicarious reinforcement has answered many questions, but whether vicarious reinforcement increases the likelihood that an observer will imitate a model, as social learning theory would predict, or sets the occasion for the observer to perform an alternate response, as a discriminative stimulus interpretation of vicarious reinforcement suggests, seems to depend on (1) the setting, (2) the procedure, and (3) the reinforcers used. In an effort to better understand the function of vicarious reinforcement, while controlling for subjects' histories and using tangible reinforcers, 47 preschool children participated in two experiments that (1) provided an experimental history of responding on several levers, (2) provided differential reinforcement on the levers during training, and (3) assessed the effects of observing a model respond on a lever and receive tokens.
In Experiment I, 18 subjects who were trained to respond on three levers responded during an extinction period and then observed either an adult model respond on a fourth, novel lever or observed a control procedure. Only subjects who observed the model receive tokens responded on the same lever as the model during an additional extinction period. The extinction period was procedurally defined and relatively short in duration. The results of Experiment I supported social learning theory; however, imitation effects were short lived. Another experiment was conducted to evaluate more fully the extinction of the modeled behavior found in Experiment I.
In Experiment II, 29 subjects who were trained to respond on three levers responded during an extinction period and then observed an adult model in one of four modeling conditions. The subjects in this experiment were exposed to the model lever during training and had an extensive history of never being reinforced on the modeled lever. Only some of the subjects who observed the model receive tokens responded on the model lever, and only for a short period of time. The results of this experiment illustrated the importance of the reinforcement history of the observer and supported previous studies which found an extinction effect for vicarious reinforcement.
Taken together, these experiments illustrate the limits of social learning theory, because imitation effects were short-lived, and suggest certain procedures that will enhance the use of vicarious reinforcement in applied settings.
|
470 |
Accounting for Behavioral Contrast: Recent Interpretations
Snyder, Ronald L. 01 May 1983
Behavioral contrast has been interpreted as a function of either (1) the reduction of frequency of reinforcement in one component of a multiple schedule or (2) the suppression of responses in one component regardless of reinforcement frequency.
These explanations are discussed in terms of their adequacy in accounting for several recent experimental results. Two alternative explanations are considered.
First, contrast is interpreted as a function of the relative summation of excitatory and inhibitory effects of stimuli.
Second, contrast is discussed as a possible function of a switch from a response-reinforcer contingency to a stimulus-reinforcer contingency as seen in auto-pecking. Both avenues are considered promising in terms of accounting for behavioral contrast.
|