1 |
Burst timing-dependent plasticity of NMDA receptor-mediated transmission in midbrain dopamine neurons: a putative cellular substrate for reward learning
Harnett, Mark Thomas 04 February 2010
The neurotransmitter dopamine (DA) represents a neural substrate for positive
motivation as its spatiotemporal distribution across the brain is responsible for goal-directed
behavior and learning reward associations. The critical determinant of DA
release throughout the brain is the firing pattern of DA-producing neurons. Synchronized
bursts of spikes can be triggered by sensory stimuli in these neurons, evoking phasic
release of DA in target brain areas to drive reward-based reinforcement learning and
behavior. These bursts are generated by NMDA-type glutamate receptors (NMDARs).
This dissertation reports a novel form of long-term potentiation (LTP) of NMDAR-mediated
excitatory transmission at DA neurons as a putative cellular substrate for
changes in DA neuron firing during reward learning.
Patch-clamp electrophysiological recording from DA neurons in acute brain slices
from young adult rats demonstrated that synaptic NMDARs exhibit LTP in an associative manner, requiring coordinated pre- and postsynaptic burst firing. Ca2+ signals produced
by postsynaptic burst firing needed to be amplified by preceding metabotropic
neurotransmitter inputs to effectively drive plasticity. Activation of NMDARs
themselves was also necessary. These two coincidence detectors governed the timing-dependence
of NMDAR plasticity in a manner analogous to the timing rule for cue-reward
learning paradigms in behaving animals. Further mechanistic study revealed that
PKA, but not PKC, activity gated LTP induction by regulating the magnitude of Ca2+
signal amplification via the inositol 1,4,5-trisphosphate (IP3) receptor and release of Ca2+
from intracellular stores. Plasticity of NMDARs was input specific and appeared to be
expressed postsynaptically, but was not associated with a change in NMDAR subunit
stoichiometry. LTP of NMDARs was DA-independent, and was specific for NMDARs:
the same induction protocol produced long-term depression of AMPA receptors.
NMDARs that had undergone LTP could be depotentiated in a spike-conditional manner,
consistent with active unlearning. Finally, repeated, in vivo amphetamine experience
dramatically increased facilitation of spike-evoked Ca2+ signals, which in turn drove
enhanced plasticity.
NMDAR plasticity thus represents a potential neural substrate for conditioned DA
neuron burst responses to environmental stimuli acquired during reward-based learning
as well as a novel therapeutic target for intervention-based therapy of addictive disorders.
|
2 |
Adaptive Semi-structured Information Extraction
Arpteg, Anders January 2003
The number of domains and tasks where information extraction tools can be used needs to be increased. One way to reach this goal is to construct user-driven information extraction systems that novice users can adapt to new domains and tasks. To accomplish this, the systems need to become more intelligent and able to learn to extract information without requiring expert skills or time-consuming work from the user.
The type of information extraction system in focus for this thesis is semi-structural information extraction. The term semi-structural refers to documents that contain not only natural language text but also additional structural information. The typical application is information extraction from World Wide Web hypertext documents. By making effective use of both the link structure and the structural information within each such document, user-driven extraction systems with high performance can be built.
The extraction process contains several steps in which different types of techniques are used, for example techniques that take advantage of structural, purely syntactic, linguistic, and semantic information. The first step in focus for this thesis is the navigation step, which takes advantage of the structural information. It is only one part of a complete extraction system, but it is an important part. Using reinforcement learning algorithms for the navigation step can make the adaptation of the system to new tasks and domains more user-driven. The advantage of reinforcement learning techniques is that the extraction agent can efficiently learn from its own experience without the need for intensive user interaction.
An agent-oriented system was designed to evaluate the approach suggested in this thesis. Initial experiments showed that the training of the navigation step and the overall approach of the system were promising. However, additional components need to be included before it becomes a fully-fledged user-driven system. / Report code: LiU-Tek-Lic-2002:73.
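To illustrate the idea of an agent learning a navigation step from its own experience, the following is a minimal sketch using tabular Q-learning on a toy link graph. It is not the system described in the thesis: the page names, the reward values, the `LINKS` graph, and the helper names `train` and `greedy_path` are all hypothetical, and Q-learning is only one of several reinforcement learning algorithms that could be substituted here.

```python
import random

# Hypothetical toy site: each page maps to the pages it links to.
# "target" stands for the page that holds the information to extract.
LINKS = {
    "home": ["about", "catalog"],
    "about": ["home"],
    "catalog": ["home", "item", "about"],
    "item": ["target", "catalog"],
    "target": [],
}


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn link-following values with tabular Q-learning (assumed settings)."""
    rng = random.Random(seed)
    # One Q-value per (page, link) pair, initialised to zero.
    q = {(page, nxt): 0.0 for page, outs in LINKS.items() for nxt in outs}
    for _ in range(episodes):
        page = "home"
        for _ in range(20):  # cap the episode length
            outs = LINKS[page]
            if not outs:  # reached a page with no outgoing links
                break
            # Epsilon-greedy choice between exploring and exploiting.
            if rng.random() < epsilon:
                nxt = rng.choice(outs)
            else:
                nxt = max(outs, key=lambda n: q[(page, n)])
            # Small cost per click, positive reward for reaching the target.
            reward = 1.0 if nxt == "target" else -0.05
            future = max((q[(nxt, n)] for n in LINKS[nxt]), default=0.0)
            q[(page, nxt)] += alpha * (reward + gamma * future - q[(page, nxt)])
            page = nxt
    return q


def greedy_path(q, start="home", max_steps=10):
    """Follow the highest-valued link from each page until a dead end."""
    path = [start]
    while LINKS[path[-1]] and len(path) <= max_steps:
        page = path[-1]
        path.append(max(LINKS[page], key=lambda n: q[(page, n)]))
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

The only supervision the agent receives is the reward signal, which is the sense in which such an approach reduces the need for intensive user interaction.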
|
4 |
Future-proofing Video Game Agents with Reinforced Learning and Unity ML-Agents / Framtidssäkring av datorspelsagenter med förstärkningsinlärning och Unity ML-Agents
Andersson, Pontus January 2021
In recent years, a number of simulation platforms have used video games as training grounds for designing and experimenting with different Machine Learning algorithms. One issue for many is that video games usually do not provide any source code. The Unity ML-Agents toolkit provides both example environments and state-of-the-art Machine Learning algorithms in an attempt to solve this. This has sparked curiosity at a local game company, which wished to investigate incorporating machine-learned agents into its game using the toolkit. As such, the goal was to produce high-performing, integrable agents capable of completing locomotion tasks. A pilot study was conducted, which provided insight into training functionality and the aspects that are important for producing a robust behavior model. Using Proximal Policy Optimization and different training configurations, several neural network models were produced and evaluated on existing and new data. Several of the produced models displayed promising results but did not achieve the defined success rate of 80%. With some additional testing, it is believed that the desired result could be reached. Alternatively, other features of the toolkit, such as Soft Actor-Critic and Curriculum Learning, could be investigated. / Recently, a handful of simulation platforms have used video games as training environments for designing and experimenting with different machine learning algorithms. One problem for many is that these games usually do not provide any source code. The Unity ML-Agents toolkit aims to address this need by offering ready-made training environments together with the latest machine learning algorithms. This has attracted the interest of a local game company that wants to investigate the possibility of integrating machine-learned agents into one of its games. Consequently, the goal was formulated as creating high-performing, integrable agents capable of performing locomotion tasks.
A pilot study was carried out and provided useful information about training functionality and related aspects of producing robust behavior models. Using proximal policy optimization and different training configurations, neural network models were created and evaluated on existing and new data. Several models showed promising results, but none reached the specified performance target of 80%. It is believed that the desired result could have been reached with further testing. Going forward, it is also possible to investigate other learning techniques included in the ML-Agents toolkit.
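For readers unfamiliar with Proximal Policy Optimization, named in the abstract above, the sketch below computes its standard clipped surrogate objective for a small batch in plain Python. It is an illustration of the published PPO objective only, not the Unity ML-Agents implementation or the configuration used in the thesis; the function name `ppo_clip_objective` and the clipping value `clip_eps=0.2` are assumptions chosen for the example.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps is an assumed value.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    old = [math.log(0.5), math.log(0.2)]
    new = [math.log(0.75), math.log(0.1)]
    print(ppo_clip_objective(new, old, advantages=[1.0, -1.0]))
```

Clipping the probability ratio removes the incentive to move the policy far from the one that collected the data in a single update, which is the main design idea behind the objective.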
|