301 |
Deep reinforcement learning approach to portfolio management / Deep reinforcement learning metod för portföljförvaltning. Jama, Fuaad. January 2023 (has links)
This thesis evaluates the use of a Deep Reinforcement Learning (DRL) approach to portfolio management on the Swedish stock market. The idea is to construct a portfolio that is adjusted daily using the DRL algorithm Proximal Policy Optimization (PPO) with a multilayer perceptron neural network. The input to the neural network was historical data in the form of open, high, and low prices. The portfolio is evaluated by its performance against the OMX Stockholm 30 index (OMXS30). Furthermore, three different approaches to optimization are studied, using three different reward functions: the Sharpe ratio, cumulative reward (daily return), and Value-at-Risk reward (a daily return with a Value-at-Risk penalty). The historical data used is from the period 2010-01-01 to 2015-12-31, and the DRL approach is then tested on two different time periods that represent different market conditions, 2016-01-01 to 2018-12-31 and 2019-01-01 to 2021-12-31. The results show that in the first test period all three methods (corresponding to the three different reward functions) outperform the OMXS30 benchmark in return and Sharpe ratio, while in the second test period none of the methods outperform the OMXS30 index. / The goal of this work was to evaluate the use of a Deep Reinforcement Learning (DRL) method for portfolio management on the Swedish stock market. The idea is to construct a portfolio that is adjusted daily using the DRL algorithm Proximal Policy Optimization (PPO) with a neural network with several perceptrons. The input to the neural network was historical data in the form of opening, lowest, and highest prices. The portfolio was evaluated by its performance against the OMX Stockholm 30 index (OMXS30). In addition, three different approaches to optimization were studied, using three different reward functions. These functions were the Sharpe ratio, cumulative reward (daily return), and Value-at-Risk reward (a daily return minus a Value-at-Risk penalty). The historical data used was from the period 2010-01-01 to 2015-12-31, and the DRL method was then tested on two different time periods representing different market conditions, 2016-01-01 to 2018-12-31 and 2019-01-01 to 2021-12-31. The results show that in the first test period all three methods (corresponding to the three different reward functions) outperformed the OMXS30 index in return and Sharpe ratio, while in the second test period none of the methods outperformed the OMXS30 index.
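As a rough illustration of the three reward functions compared in the thesis, the sketch below gives one plausible reading of each: a trailing-window Sharpe ratio, the plain daily return, and a daily return penalized by a historical-simulation Value-at-Risk estimate. The window length, quantile level, and penalty weight are illustrative assumptions; the abstract does not specify the exact formulations used.

```python
import numpy as np

def sharpe_reward(returns, eps=1e-8):
    """Sharpe-ratio reward over a trailing window of daily portfolio returns."""
    r = np.asarray(returns)
    return r.mean() / (r.std() + eps)

def daily_return_reward(value_t, value_prev):
    """Cumulative-reward variant: the simple daily return of the portfolio."""
    return value_t / value_prev - 1.0

def var_penalized_reward(returns, alpha=0.05, penalty=1.0):
    """Daily return minus a Value-at-Risk penalty.

    VaR is estimated by historical simulation: the alpha-quantile loss of the
    trailing return window. Both `alpha` and `penalty` are illustrative choices.
    """
    r = np.asarray(returns)
    var = -np.quantile(r, alpha)          # positive number = potential loss
    return r[-1] - penalty * max(var, 0.0)

# Example: a window of 20 daily portfolio returns.
window = np.random.default_rng(0).normal(0.0005, 0.01, size=20)
print(sharpe_reward(window), daily_return_reward(1.01, 1.00), var_penalized_reward(window))
```

In a PPO setup, whichever of these functions is chosen would be returned by the environment at each daily rebalancing step.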
|
302 |
Temporal Abstractions in Multi-agent Learning. Jiayu Chen (18396687), 13 June 2024 (has links)
Learning, planning, and representing knowledge at multiple levels of temporal abstraction provide an agent with the ability to predict the consequences of different courses of action, which is essential for improving the performance of sequential decision making. However, discovering effective temporal abstractions, which the agent can use as skills, and adopting the constructed temporal abstractions for efficient policy learning can be challenging. Despite significant advancements in single-agent settings, temporal abstraction in multi-agent systems remains underexplored. This thesis addresses this research gap by introducing novel algorithms for discovering and employing temporal abstractions in both cooperative and competitive multi-agent environments. We first develop an unsupervised spectral-analysis-based discovery algorithm, aiming to find temporal abstractions that can enhance the joint exploration of agents in complex, unknown environments for goal-achieving tasks. Subsequently, we propose a variational method that is applicable to a broader range of collaborative multi-agent tasks. This method unifies dynamic grouping and automatic multi-agent temporal abstraction discovery, and can be seamlessly integrated into commonly used multi-agent reinforcement learning algorithms. Further, for competitive multi-agent zero-sum games, we develop an algorithm based on Counterfactual Regret Minimization, which enables agents to form and utilize strategic abstractions akin to routine moves in chess during strategy learning, supported by solid theoretical and empirical analyses. Collectively, these contributions not only advance the understanding of multi-agent temporal abstractions but also present practical algorithms for intricate multi-agent challenges, including control, planning, and decision-making in complex scenarios.
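The "temporal abstractions" that an agent can use as skills are commonly formalized, in single-agent reinforcement learning, as options: an initiation set, an intra-option policy, and a termination condition. The sketch below shows only that standard formalism as background; it is not the thesis's multi-agent discovery algorithms, and the environment interface (`step`) and names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Set

State = int
Action = int

@dataclass
class Option:
    """A temporal abstraction in the standard options formalism: it can start
    in states of `initiation`, acts with `policy`, and stops with probability
    `termination(state)` after each primitive step."""
    initiation: Set[State]
    policy: Callable[[State], Action]
    termination: Callable[[State], float]

def run_option(option: Option, state: State, step, rng, max_steps=100):
    """Execute an option until it terminates; returns the final state, the
    accumulated reward, and the number of primitive steps taken."""
    total_reward, steps = 0.0, 0
    while steps < max_steps:
        action = option.policy(state)
        state, reward, done = step(state, action)   # environment transition
        total_reward += reward
        steps += 1
        if done or rng.random() < option.termination(state):
            break
    return state, total_reward, steps
```

In the multi-agent settings the thesis targets, each agent (or group of agents) would hold its own set of such skills; discovering them automatically is the problem the algorithms described above address.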
|
303 |
Individual differences in personality associated with anterior cingulate cortex function: implication for understanding depression. Umemoto, Akina. 18 March 2016
We humans depend heavily on cognitive control to make decisions and execute goal-directed behaviors, without which our behavior would be overpowered by automatic, stimulus-driven responses. In my dissertation, I focus on a brain region most implicated in this crucial process: the anterior cingulate cortex (ACC). The importance of this region is highlighted by lesion studies demonstrating diminished self-initiated behavior, or apathy, following ACC damage, the most severe form of which results in the near-complete absence of speech production and willed actions in the presence of intact motor ability. Despite decades of research, however, its precise function is still highly debated, due particularly to the ACC's observed involvement in multiple aspects of cognition. In my dissertation I examine ACC function according to recent developments in reinforcement learning theory that posit a key role for ACC in motivating extended behavior. According to this theory, ACC is responsible for learning task values and motivating effortful control over extended behaviors based on those learned task values. The aim of my dissertation is two-fold: 1) to improve understanding of ACC function, and 2) to elucidate the contribution of ACC to depression, as revealed by individual differences in several personality traits related to motivation and reward sensitivity in a population of healthy college students. It was hypothesized that these different personality traits express, to greater or lesser degrees across individuals, ACC function, and that their abnormal expression (in particular, atypically low motivation and reward sensitivity) constitutes hallmark characteristics of depression.
First, this dissertation reveals that the reward positivity (RewP), a key electrophysiological signature of reward processing that is believed to index the impact of reinforcement learning signals carried by the midbrain dopamine system onto the ACC, is sensitive to individual differences in reward valuation, being larger for those high in reward sensitivity and smaller for those high in depression scores. Second, consistent with a previous suggestion that people with depression or with high depression scores have difficulty using reward information to motivate behavior, I find these individuals to exhibit relatively poor prolonged task performance despite an apparently greater investment of cognitive control, and a reduced willingness to expend effort to obtain probable rewards, a behavior that was stable with time on task. In contrast, individuals characterized by high persistence, which is indicative of good ACC function, exhibited high self-reported task engagement and increasing effortful behaviors with time on task, particularly for trials in which reward receipt was unlikely, suggesting increased motivational control. In sum, this dissertation emphasizes the importance of understanding the basic function of ACC as assessed by individual differences in personality, which is then used to understand the impact of its dysfunction in relation to mental illnesses. / Graduate
|
304 |
Information driven self-organization of agents and agent collectives. Harder, Malte. January 2014 (has links)
From a visual standpoint it is often easy to point out whether a system is considered to be self-organizing or not, though a quantitative approach would be more helpful. Information theory, as introduced by Shannon, provides the right tools not only to quantify self-organization, but also to investigate it in relation to the information processing performed by individual agents within a collective. This thesis sets out to introduce methods to quantify spatial self-organization in collective systems in the continuous domain as a means to investigate morphogenetic processes. In biology, morphogenesis denotes the development of shape and form, for example in embryos, organs or limbs. Here, I will introduce methods to quantitatively investigate shape formation in stochastic particle systems. In living organisms, self-organization, like the development of an embryo, is a guided process, predetermined by the genetic code, but executed in an autonomous decentralized fashion. Information is processed by the individual agents (e.g. cells) engaged in this process. Hence, information theory can be deployed to study such processes and connect self-organization and information processing. The existing concepts of observer-based self-organization and relevant information will be used to devise a framework for the investigation of guided spatial self-organization. Furthermore, local information transfer plays an important role in processes of self-organization. In this context, the concept of synergy has been getting a lot of attention lately. Synergy is a formalization of the idea that for some systems the whole is more than the sum of its parts, and it is assumed to play an important role in self-organization, learning and decision-making processes. In this thesis, a novel measure of synergy will be introduced that addresses some of the theoretical problems posed by earlier approaches.
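As background for the Shannon-information machinery such a thesis builds on, here is a minimal plug-in estimate of entropy and mutual information from discrete samples. It is a generic illustration only, not the observer-based self-organization measure or the new synergy measure introduced in the thesis.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Toy example: Y mostly copies X, so I(X;Y) is positive but below H(X).
xs = [0, 1, 0, 1, 1, 0, 1, 0]
ys = [0, 1, 1, 1, 0, 0, 1, 0]
print(mutual_information(xs, ys))
```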
|
305 |
Feedback Related Negativity: Reward Prediction Error or Salience Prediction Error? Heydari, Sepideh. 07 April 2015 (has links)
The reward positivity is a component of the human event-related brain potential (ERP) elicited by feedback stimuli in trial-and-error learning and guessing tasks. A prominent theory holds that the reward positivity reflects a reward prediction error that is differentially sensitive to the valence of the outcomes, namely, larger for unexpected positive events relative to unexpected negative events (Holroyd & Coles, 2002). Although the theory has found substantial empirical support, most of these studies have utilized either monetary or performance feedback to test the hypothesis. However, in apparent contradiction to the theory, a recent study found that unexpected physical punishments (a shock to the finger) also elicit the reward positivity (Talmi, Atkinson, & El-Deredy, 2013). Accordingly, these investigators argued that this ERP component reflects a salience prediction error rather than a reward prediction error. To investigate this finding further, I adapted the task paradigm of Talmi and colleagues to a more standard guessing task often used to investigate the reward positivity. Participants navigated a virtual T-maze and received feedback on each trial under two conditions. In a reward condition the feedback indicated whether or not they would receive a monetary reward for their performance on that trial. In a punishment condition the feedback indicated whether or not they would receive a small shock at the end of the trial. I found that the feedback stimuli elicited a typical reward positivity in the reward condition and an apparently delayed reward positivity in the punishment condition. Importantly, this signal was more positive to stimuli that predicted the omission of a possible punishment relative to stimuli that predicted a forthcoming punishment, which is inconsistent with the salience hypothesis. / Graduate / 0633 / 0317 / heydari@uvic.ca
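The contrast between the two hypotheses can be made concrete with a toy sketch: a reward prediction error is signed (valence-sensitive), whereas a salience prediction error is unsigned (surprise-sensitive). This is a schematic illustration of the constructs discussed in the abstract, not the analysis used in the thesis.

```python
def reward_prediction_error(outcome, expected_value):
    """Signed (valence-sensitive) prediction error: positive for better-than-
    expected outcomes, negative for worse-than-expected ones."""
    return outcome - expected_value

def salience_prediction_error(outcome, expected_value):
    """Unsigned prediction error: responds to how surprising an outcome is,
    regardless of whether it is good or bad."""
    return abs(outcome - expected_value)

# Expecting nothing (0), then receiving a reward (+1) or a shock (-1):
print(reward_prediction_error(+1, 0.0), reward_prediction_error(-1, 0.0))      #  1, -1
print(salience_prediction_error(+1, 0.0), salience_prediction_error(-1, 0.0))  #  1,  1
```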
|
306 |
Reinforcement learning for intelligent assembly automation. Lee, Siu-keung (李少強). January 2002 (has links)
published_or_final_version / Industrial and Manufacturing Systems Engineering / Doctoral / Doctor of Philosophy
|
307 |
Induction of environment and goal models by an adaptive agent in deterministic environment / Adaptyvaus agento aplinkos ir tikslo modelių indukcija deterministinėje aplinkoje. Kapočiūtė-Dzikienė, Jurgita. 01 February 2011 (has links)
If laws of percept change, described by a deterministic Markov decision process, hold in an observable or partially observable environment, then an agent interacting with the environment and having no initial knowledge can discover those laws using the methods of logical and constructive induction (i.e., it can learn environment and goal models); it can learn to predict the consequences of its own actions precisely and apply this learned knowledge in order to achieve its goals in new, unseen situations.
The adaptive agent proposed in this dissertation differs from similar work presented in the literature in three novel capabilities: it can solve the problem of transferring knowledge from one environment to another when the same laws are valid for both environments; the problem of generalizing goal percepts; and the problem of perceptual aliasing in a partially observable environment.
During the investigations it was found that the adaptive agent, using the constructed environment model, solves knowledge-transfer tasks in new environments better than other alternative agents (based on Q-learning and ADP methods); using the constructed goal model, it solves goal-percept generalization tasks by correctly approximating the reward function and predicting its values in the new environments; and it solves the problem of perceptual aliasing by transforming the deterministic nth order Markov... [to full text] / If laws of state change described by a deterministic Markov decision process hold in an observable or partially observable environment, then an agent, interacting with the environment and having no initial knowledge, can discover these laws by the methods of logical and constructive induction (learn environment and goal models), can learn to predict the consequences of its actions precisely, and can apply this knowledge to reach its goals faster in new, unseen situations. The adaptive agent proposed in the dissertation differs from similar works presented in the literature in three new capabilities: it can solve the problem of transferring knowledge learned in one environment to new environments when the same laws apply to those environments; the problem of generalizing goal percepts; and the problem of perceptual aliasing in a partially observable environment. The investigations showed that the adaptive agent, using the constructed environment model, solves knowledge-transfer tasks in new environments better than other alternative agents (based on Q-learning and ADP methods); using the constructed goal model, it solves goal-percept generalization tasks by correctly approximating the reward function and predicting reinforcement values in new environments; and it solves the perceptual aliasing problem by transforming the deterministic nth-order Markov decision process into a first-order one and constructing for it an environment model corresponding to a finite Moore automaton.
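For reference, the Q-learning baseline mentioned above can be sketched in its standard tabular form; the hyperparameters and environment interface here are illustrative assumptions, not those used in the dissertation.

```python
import random
from collections import defaultdict

def q_learning(env_reset, env_step, actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Minimal tabular Q-learning.

    `env_reset()` returns an initial state; `env_step(state, action)` returns
    (next_state, reward, done). States must be hashable.
    """
    Q = defaultdict(float)                     # Q[(state, action)] -> value
    for _ in range(episodes):
        state, done = env_reset(), False
        while not done:
            if random.random() < epsilon:      # epsilon-greedy exploration
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env_step(state, action)
            best_next = max(Q[(next_state, a)] for a in actions)
            # One-step temporal-difference update toward the bootstrapped target.
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```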
|
309 |
Complex question answering: minimizing the gaps and beyond. Hasan, Sheikh Sadid Al. January 2013 (has links)
Current Question Answering (QA) systems have advanced significantly in demonstrating finer abilities to answer simple factoid and list questions. Such questions are easier to process as they require only small snippets of text as answers. However, there is a category of questions that represents a more complex information need, which cannot be satisfied easily by simply extracting a single entity or a single sentence. For example, the question "How was Japan affected by the earthquake?" suggests that the inquirer is looking for information in the context of a wider perspective. We call these "complex questions" and focus on the task of answering them with the intention to minimize the existing gaps in the literature.
The major limitation of the available search and QA systems is that they lack a way of measuring whether a user is satisfied with the information provided. This was our motivation to propose a reinforcement learning formulation of the complex question answering problem. Next, we presented an integer linear programming formulation in which sentence compression models were applied to the query-focused multi-document summarization task, in order to investigate whether sentence compression improves the overall performance. Both compression and summarization were treated as global optimization problems. We also investigated the impact of syntactic and semantic information in a graph-based random walk method for answering complex questions. Decomposing a complex question into a series of simple questions and then reusing the techniques developed for answering simple questions is an effective means of answering complex questions, and we proposed a supervised approach for automatically learning good decompositions of complex questions in this work. A complex question often asks about a topic of the user's interest; therefore, the problem of complex question decomposition closely relates to the problem of topic-to-question generation. We addressed this challenge and proposed a topic-to-question generation approach to enhance the scope of our problem domain. / xi, 192 leaves : ill. ; 29 cm
|
310 |
Computational model-based functional magnetic resonance imaging of reinforcement learning in humans. Erdeniz, Burak. January 2013 (has links)
The aim of this thesis is to determine the changes in the BOLD signal of the human brain during various stages of reinforcement learning. In order to accomplish that goal, two probabilistic reinforcement-learning tasks were developed and assessed with healthy participants using functional magnetic resonance imaging (fMRI). For both experiments, the brain imaging data of the participants were analysed using a combination of univariate and model-based techniques. In Experiment 1 there were three types of stimulus-response pairs, which predicted either a reward, a neutral, or a monetary loss outcome with a certain probability. Experiment 1 tested the following research questions: Where in the brain does activity occur when expecting and receiving a monetary reward or a punishment? Does avoiding a loss outcome activate brain regions similar to gain outcomes and, vice versa, does avoiding a reward outcome activate brain regions similar to loss outcomes? Where in the brain are prediction errors and predictions for rewards and losses calculated? What are the neural correlates of reward and loss predictions during early and late phases of learning? The results of Experiment 1 showed that expectation of rewards and losses activates overlapping brain areas, mainly in the anterior cingulate cortex and basal ganglia, but outcomes of rewards and losses activate separate brain regions: loss outcomes mainly activate the insula and amygdala, whereas reward outcomes activate the bilateral medial frontal gyrus. The model-based analysis also revealed early versus late learning-related changes. It was found that predicted value in early trials is coded in the ventro-medial orbitofrontal cortex, but later in learning the activation for the predicted value was found in the putamen. The second experiment was designed to find out the differences in processing novel versus familiar reward-predictive stimuli. The results revealed that the dorso-lateral prefrontal cortex and several regions in the parietal cortex showed greater activation for novel stimuli than for familiar stimuli. As an extension of the fourth research question of Experiment 1, reward predicted values of the conditional stimuli and prediction errors of the unconditional stimuli were also assessed in Experiment 2. The results revealed that during learning there is significant prediction error activation, mainly in the ventral striatum with extension to various cortical regions, but for familiar stimuli no prediction error activity was observed. Moreover, predicted values for novel stimuli mainly activate the ventro-medial orbitofrontal cortex and precuneus, whereas the predicted value of familiar stimuli activates the putamen. The results of Experiment 2 for the predicted values, reviewed together with the early versus late predicted values in Experiment 1, suggest that during learning of CS-US pairs, activation in the brain shifts from ventro-medial orbitofrontal structures to sensori-motor parts of the striatum.
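Model-based fMRI analyses of this kind typically fit a trial-by-trial learning model and use its internal variables, the predicted value at cue onset and the prediction error at outcome, as parametric regressors. The sketch below uses a simple Rescorla-Wagner update with an assumed learning rate; the thesis's actual model and fitted parameters are not given in the abstract.

```python
def rescorla_wagner_regressors(cues, rewards, n_cues, learning_rate=0.2):
    """Generate trial-wise predicted values and prediction errors.

    `cues` lists the cue index (conditional stimulus) shown on each trial and
    `rewards` the outcome received. The returned series are the kind of
    parametric regressors convolved with the HRF in model-based fMRI.
    """
    values = [0.0] * n_cues
    predicted_values, prediction_errors = [], []
    for cue, reward in zip(cues, rewards):
        v = values[cue]
        delta = reward - v                  # prediction error at outcome
        values[cue] = v + learning_rate * delta
        predicted_values.append(v)          # value expected at cue onset
        prediction_errors.append(delta)
    return predicted_values, prediction_errors

# Toy example: one cue rewarded most of the time, another never rewarded.
cues    = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
rewards = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
pv, pe = rescorla_wagner_regressors(cues, rewards, n_cues=2)
print(pv, pe)
```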
|