Global ETD Search

1	Spatially Similar Practice Immediately Following Motor Sequence Learning Eliminates Offline Gains Handa, Atul 14 March 2013 Robust offline performance gains, beyond those that would be anticipated from additional physical practice, have been reported during procedural learning. However, practice of an unrelated procedural task within 4-6 hours after initial practice has been shown to eliminate offline improvement. The present experiment assessed the relative impact of supplemental practice of a spatially or a motorically similar procedural task immediately following practice of a target motor sequence task. Based on a contemporary model of procedural skill acquisition forwarded by Hikosaka and colleagues, we assumed that exposure to a spatially compatible motor sequence, rather than interfering, would support rapid improvement in the production of the spatial variant of the target task without compromising important memory processes, which are conducted offline to improve delayed performance of the target task. Findings revealed the often demonstrated offline gain when the target task was performed in the absence of interfering task practice, as well as the elimination of such gains when target task practice was followed by additional practice of either a novel or a motorically similar motor sequence task. While immediate performance of the spatially similar task was facilitated by preceding target task training, offline gains for the target task no longer emerged. These data are consistent with a central premise of Hikosaka et al.’s model that a spatial reference system plays an important role early during motor sequence learning, but they highlight the sensitivity of offline gains to task practice order. proactive facilitation interference motor sequence task offline learning
2	Offline Reinforcement Learning for Scheduling Live Video Events in Large Enterprises Franzén, Jonathan January 2022 In modern times, live video streaming events in companies have become an increasingly relevant method for communications. As a platform provider for these events, being able to deliver relevant recommendations for event scheduling times to users is an important feature. A system providing relevant recommendations to users can be described as a recommender system. Recommender systems usually face issues such as having to be trained purely offline, as training the system online can be costly or time-consuming, requiring manual user feedback. While many solutions and advancements have been made in recommender systems over the years, such as contributions in the Netflix Prize, it still continues to be an active research topic. This work aims at designing a recommender system which observes users' past sequential scheduling behavior to provide relevant recommendations for scheduling upcoming live video events. The developed recommender system uses reinforcement learning as a model, with components such as a generative model to help it learn from offline data. Reinforcement Learning Offline Learning Recommender Systems Computer and Information Sciences Data- och informationsvetenskap
3	Statistical Methods for Offline Deep Reinforcement Learning Danyang Wang (18414336) 20 April 2024 Reinforcement learning (RL) has been a rapidly evolving field of research over the past years, enhancing developments in areas such as artificial intelligence, healthcare, and education, to name a few. Despite the success of RL, its inherently online nature presents obstacles for real-world applications, since in many settings (such as robotics, healthcare, and autonomous driving) online data collection with the latest learned policy can be expensive and/or dangerous. This challenge has catalyzed research into offline RL, which involves reinforcement learning from previously collected static datasets, without the need for further online data collection. However, most existing offline RL methods depend on two key assumptions: unconfoundedness and positivity (also known as the full-coverage assumption), which frequently do not hold in the context of static datasets. In the first part of this dissertation, we simultaneously address these two challenges by proposing a novel policy learning algorithm: PESsimistic CAusal Learning (PESCAL). We use a mediator variable, based on the front-door criterion, to remove the confounding bias. Additionally, we adopt the pessimistic principle to tackle the distributional shift problem induced by the under-coverage issue. This issue refers to the mismatch between the action distributions induced by candidate policies and that of the policy that generated the observational data (known as the behavior policy). Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function, to partially mitigate the issue of distributional shift.
This insight significantly simplifies our algorithm by circumventing the challenging task of sequential uncertainty quantification for the estimated Q-function. Moreover, we provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations as well as real-world experiments using offline datasets from a leading ride-hailing platform. In the second part of this dissertation, in contrast to the first part, which approaches the distributional shift issue implicitly by penalizing the value function as a whole, we explicitly constrain the learned policy not to deviate significantly from the behavior policy, while still enabling flexible adjustment of the degree of constraint. Building upon the offline reinforcement learning algorithm TD3+BC (Fujimoto and Gu, 2021), we propose a model-free actor-critic algorithm with an adjustable behavior cloning (BC) term. We employ an ensemble of networks to quantify the uncertainty of the estimated value function, thus addressing the issue of overestimation. Moreover, we introduce a convenient and intuitively simple method for controlling the degree of BC, through a Bernoulli random variable based on the user-specified confidence level for different offline datasets. Our proposed algorithm, named Ensemble-based Actor Critic with Adaptive Behavior Cloning (EABC), is straightforward to implement, exhibits low variance, and achieves strong performance across all D4RL benchmarks. Reinforcement learning Computational statistics Statistical data science reinforcement learning Offline Learning Sequential Decision-Making artificial intelligence Algorithm Design
4	Biased Exploration in Offline Hierarchical Reinforcement Learning Miller, Eric D. 26 January 2021 No description available. Computer Science Artificial Intelligence machine learning reinforcement learning offline learning biased sampling sampling hierarchy task hierarchy hierarchical reinforcement learning rl hrl exploration optimism offline reinforcement learning
5	Methodology to estimate building energy consumption using artificial intelligence / Méthodologie pour estimer la consommation d’énergie dans les bâtiments en utilisant des techniques d’intelligence artificielle Paudel, Subodh 22 September 2016 Construction standards for increasingly energy-efficient buildings (low-energy buildings, BBC) require particular attention. These standards rely on improving the thermal performance of the building envelope combined with a capacitive effect of the walls, which increases the building's time constant. Forecasting the energy demand of BBC buildings is rather complex. This work addresses the question by applying artificial intelligence (AI). Two implementation approaches are proposed: “all data” and “relevant data”. The “all data” approach uses the entire database. The “relevant data” approach extracts from the database a dataset that best represents the weather forecast while including inertial phenomena. For this extraction, four selection modes were studied: degree day (HDD), a modified degree day (mHDD), and path-recognition techniques: Fréchet distance (FD) and dynamic time warping (DTW). Four AI techniques are implemented: artificial neural network (ANN), support vector machine (SVM), decision tree (DT), and random forest (RF). First, six buildings were numerically simulated (with consumption from 86 kWh/m².yr down to 25 kWh/m².yr): the “relevant data” approach based on the pair (DTW, SVM) gives the forecasts with the least error. The “relevant data” approach (DTW, SVM) applied to measurements from the Ecole des Mines de Nantes building remains effective. / High-energy-efficiency building standards (such as Low Energy Building, LEB) for improving building consumption have drawn significant attention.
Building standards basically focus on improving the thermal performance of the envelope and on high heat capacity, thus creating a higher thermal inertia. However, the LEB concept introduces a large time constant as well as a large heat capacity, resulting in a slower rate of heat transfer between the interior of the building and the outdoor environment. Therefore, it is challenging to estimate and predict thermal energy demand for such LEBs. This work focuses on artificial intelligence (AI) models to predict the energy consumption of LEBs. We consider two kinds of AI modeling approaches: “all data” and “relevant data”. The “all data” approach uses all available data, while the “relevant data” approach uses a small representative day dataset and addresses the complexity of building non-linear dynamics by introducing the climatic impact behavior of past days. This extraction is based either on simple physical understanding, Heating Degree Day (HDD) and modified HDD, or on pattern recognition methods, Frechet Distance and Dynamic Time Warping (DTW). Four AI techniques have been considered: Artificial Neural Network (ANN), Support Vector Machine (SVM), Boosted Ensemble Decision Tree (BEDT) and Random Forest (RF). In the first part, numerical simulations for six buildings (heat demand in the range 25–85 kWh/m².yr) have been performed. The “relevant data” approach with (DTW, SVM) shows the best results. Real data from the “Ecole des Mines de Nantes” building show that the approach remains relevant. Prévision Bâtiment basse consommation Intelligence artificielle Jeu de données représentatives Apprentissage en ligne et hors ligne Building Energy Consumption Prediction Low Energy Building Machine Learning Small representative data Online and Offline Learning
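The “relevant data” selection described in entry 5 can be illustrated with a minimal sketch of dynamic time warping used to pick the historical days most similar to a forecast day. This is only an illustration under stated assumptions: the function names, the absolute-difference cost, the six-sample temperature profiles, and the choice of k are all hypothetical and are not taken from the thesis.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic time warping distance with an absolute-difference cost."""
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def most_similar_days(forecast, history, k=2):
    """Return the keys of the k historical days closest to the forecast profile."""
    ranked = sorted(history, key=lambda day: dtw_distance(forecast, history[day]))
    return ranked[:k]


# Hypothetical outdoor temperature profiles (degrees C), six samples per day.
history = {
    "day_a": [2.0, 1.0, 3.0, 6.0, 5.0, 3.0],
    "day_b": [10.0, 9.0, 12.0, 15.0, 14.0, 11.0],
    "day_c": [1.0, 2.0, 1.0, 3.0, 6.0, 5.0],
}
forecast = [2.0, 2.0, 1.0, 3.0, 6.0, 5.0]
selected = most_similar_days(forecast, history, k=2)
```

In a pipeline of the kind the abstract describes, the selected days would then form the small training set passed to a model such as an SVM; that step is omitted here.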
6	An empirical study of stability and variance reduction in Deep Reinforcement Learning Lindström, Alexander January 2024 Reinforcement Learning (RL) is a branch of AI that deals with solving complex sequential decision-making problems such as training robots, trading while following patterns and trends, optimal control of industrial processes, and more. These applications span various fields, including data science, factories, finance, and others [1]. The most popular RL algorithm today is Deep Q Learning (DQL), developed by a team at DeepMind, which successfully combines RL with Neural Networks (NN). However, combining RL and NN introduces challenges such as numerical instability and unstable learning due to high variance. Among other causes, these issues are due to the “moving target problem”. To mitigate this problem, the target network was introduced as a solution. However, using a target network slows down learning, vastly increases memory requirements, and adds overhead in running the code. In this thesis, we conduct an empirical study to investigate the importance of target networks. We conduct this empirical study for three scenarios. In the first scenario, we train agents in online learning. The aim here is to demonstrate that the target network can be removed after some point in time without negatively affecting performance. To evaluate this scenario, we introduce the concept of the stabilization point. In the second scenario, we pre-train agents before continuing to train them in online learning. For this scenario, we demonstrate the redundancy of the target network by showing that it can be completely omitted. In the third scenario, we evaluate a newly developed activation function called Truncated Gaussian Error Linear Unit (TGeLU). For this scenario, we train an agent in online learning and show that by using TGeLU as an activation function, we can completely remove the target network.
Through the empirical study of these scenarios, we conjecture and verify that a target network has only transient benefits concerning stability. We show that it has no influence on the quality of the policy found. We also observed that variance was generally higher when using a target network in the later stages of training compared to cases where the target network had been removed. Additionally, during the investigation of the second scenario, we observed that the number of training iterations during pre-training affected the agent’s performance in the online learning phase. This thesis provides a deeper understanding of how the target network affects the training process of DQL; some of the findings, those surrounding variance reduction, are contrary to popular belief. Additionally, the results have provided insights into potential future work. These include further exploring the benefits of the lower variance observed when removing the target network and conducting more efficient convergence analyses for the pre-training part in the second scenario. Reinforcement Learning Markov Decision Processes Neural Network Deep Q Learning Deep Q Network Sigmoid Truncated Gaussian Error Linear Unit Target network Stable learning Online learning Offline learning Computer Engineering Datorteknik
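The role of the target network discussed in entry 6 can be sketched in a deliberately simplified tabular setting: a frozen copy of the value table supplies the bootstrapped target and is synchronized only every few updates, and it can be dropped at a chosen moment. This is a toy illustration, not the thesis's method: the class, the `drop_target` switch, the two-state task, and all hyperparameters are assumptions, and a table stands in for the neural network that DQL would use.

```python
import copy


def td_target(reward, next_q_values, done, gamma):
    """Bootstrapped target: r + gamma * max_a' Q(s', a'), or just r at episode end."""
    return reward if done else reward + gamma * max(next_q_values)


class TabularAgent:
    """Q-learning with an optional frozen target table (hypothetical toy class)."""

    def __init__(self, n_states, n_actions, lr=0.5, gamma=0.9, sync_every=4):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.target_q = copy.deepcopy(self.q)  # frozen copy used for bootstrapping
        self.lr, self.gamma, self.sync_every = lr, gamma, sync_every
        self.use_target = True
        self.steps = 0

    def update(self, s, a, r, s2, done):
        # Bootstrap from the frozen table while it is in use, else from the live one.
        source = self.target_q if self.use_target else self.q
        y = td_target(r, source[s2], done, self.gamma)
        self.q[s][a] += self.lr * (y - self.q[s][a])
        self.steps += 1
        if self.use_target and self.steps % self.sync_every == 0:
            self.target_q = copy.deepcopy(self.q)  # periodic synchronization

    def drop_target(self):
        """Stop using the frozen table, e.g. once learning is judged stable."""
        self.use_target = False
        self.target_q = None


agent = TabularAgent(n_states=2, n_actions=1)
agent.update(s=1, a=0, r=1.0, s2=1, done=True)   # Q[1][0] moves toward the reward
agent.update(s=0, a=0, r=0.0, s2=1, done=False)  # bootstraps from the stale frozen table
q_with_stale_target = agent.q[0][0]
agent.drop_target()
agent.update(s=0, a=0, r=0.0, s2=1, done=False)  # bootstraps from the live table
q_after_drop = agent.q[0][0]
```

The second update leaves `Q[0][0]` at zero because the frozen table has not yet been synchronized with the reward seen in the first update, which illustrates the slower learning the abstract attributes to target networks. How the thesis actually locates its stabilization point is not described in the abstract, so the moment of calling `drop_target` here is arbitrary.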
