301

Machine Learning-Based Instruction Scheduling for a DSP Architecture Compiler : Instruction Scheduling using Deep Reinforcement Learning and Graph Convolutional Networks / Maskininlärningsbaserad schemaläggning av instruktioner för en DSP-arkitekturkompilator : Schemaläggning av instruktioner med Deep Reinforcement Learning och grafkonvolutionella nätverk

Alava Peña, Lucas January 2023 (has links)
Instruction scheduling is a back-end compiler optimisation technique that can provide significant performance gains. It refers to ordering instructions so as to reduce latency on processors with instruction-level parallelism. At present, typical compilers use heuristics to perform instruction scheduling and to solve other related NP-complete problems. This thesis presents a machine learning-based approach intended to challenge heuristic methods on performance. A novel reinforcement learning (RL) based model of the instruction scheduling problem is developed, including modelling of processor features such as forwarding, resource utilisation, and the treatment of the action space. An efficient optimal scheduler is presented for use in a reward function based on optimal schedule length; however, it is not used in the final results, as a heuristic-based reward function was deemed sufficient and faster to compute. Furthermore, an RL agent that interacts with the model of the problem is presented, using three different types of graph neural networks for state processing: graph convolutional networks, graph attention networks, and graph attention based on the work of Lee et al. A simple two-layer neural network is also used to generate embeddings for the resource-utilisation stages. The proposed solution is validated against the modelled environment, and favourable but not significant improvements were found compared to the most common heuristic method. It was also found that embeddings relating to resource utilisation were very important for the explained variance of the RL models. Additionally, a trained model was tested in an actual compiler; however, no informative results were found, likely due to register allocation or other compiler stages that occur after instruction scheduling. Future work should include improving the scalability of the proposed solution.
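As a rough illustration of the framing described above (not the thesis's actual environment or reward), the sketch below models list scheduling over a dependency DAG: the state is the set of ready instructions, an action schedules one of them, and the reward penalises the cycles the choice adds to the schedule. The class name, latencies, and the simplified single-issue timing are assumptions made for this example; a trained graph-network policy would replace the placeholder choice of the first ready instruction.

```python
import networkx as nx

class SchedulingEnv:
    """Toy instruction-scheduling environment: the state is a dependency DAG,
    an action picks one ready instruction, and the reward is the (negative)
    number of cycles the choice adds to the schedule."""

    def __init__(self, dag, latency):
        self.dag = dag              # networkx.DiGraph of instruction dependencies
        self.latency = latency      # dict: instruction -> latency in cycles
        self.reset()

    def reset(self):
        self.finish = {}            # instruction -> cycle at which its result is ready
        self.cycle = 0
        self.scheduled = set()
        return self.ready()

    def ready(self):
        """Instructions whose predecessors have all been scheduled."""
        return [n for n in self.dag.nodes
                if n not in self.scheduled
                and all(p in self.scheduled for p in self.dag.predecessors(n))]

    def step(self, instr):
        # Earliest start respecting operand availability (no forwarding modelled here).
        start = max([self.finish[p] for p in self.dag.predecessors(instr)] + [self.cycle])
        self.finish[instr] = start + self.latency[instr]
        self.scheduled.add(instr)
        reward = -(self.finish[instr] - self.cycle)   # cycles added by this choice
        self.cycle = self.finish[instr]
        done = len(self.scheduled) == self.dag.number_of_nodes()
        return self.ready(), reward, done

# Tiny example: instruction c depends on a and b.
dag = nx.DiGraph([("a", "c"), ("b", "c")])
env = SchedulingEnv(dag, {"a": 1, "b": 3, "c": 1})
state, total = env.reset(), 0
while state:
    nxt = state[0]                  # a trained GNN policy would pick here
    state, r, done = env.step(nxt)
    total += r
print("schedule length:", -total)
```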
302

Integrating Data-driven Control Methods with Motion Planning: A Deep Reinforcement Learning-based Approach

Avinash Prabu (6920399) 08 January 2024 (has links)
<p dir="ltr">Path-tracking control is an integral part of motion planning in autonomous vehicles, in which the vehicle's lateral and longitudinal positions are controlled by a control system that will provide acceleration and steering angle commands to ensure accurate tracking of longitudinal and lateral movements in reference to a pre-defined trajectory. Extensive research has been conducted to address the growing need for efficient algorithms in this area. In this dissertation, a scenario and machine learning-based data-driven control approach is proposed for a path-tracking controller. Firstly, a Deep Reinforcement Learning model is developed to facilitate the control of longitudinal speed. A Deep Deterministic Policy Gradient algorithm is employed as the primary algorithm in training the reinforcement learning model. The main objective of this model is to maintain a safe distance from a lead vehicle (if present) or track a velocity set by the driver. Secondly, a lateral steering controller is developed using Neural Networks to control the steering angle of the vehicle with the main goal of following a reference trajectory. Then, a path-planning algorithm is developed using a hybrid A* planner. Finally, the longitudinal and lateral control models are coupled together to obtain a complete path-tracking controller that follows a path generated by the hybrid A* algorithm at a wide range of vehicle speeds. The state-of-the-art path-tracking controller is also built using Model Predictive Control and Stanley control to evaluate the performance of the proposed model. The results showed the effectiveness of both proposed models in the same scenario, in terms of velocity error, lateral yaw angle error, and lateral distance error. The results from the simulation show that the developed hybrid A* algorithm has good performance in comparison to the state-of-the-art path planning algorithms.</p>
303

Reinforcement Learning-based Handover in Millimeter-wave Networks

Yang, Jiarui January 2021 (has links)
Millimeter Wave (mmWave) is a key technology for meeting the challenge of data rates and the lack of bandwidth in sub-6GHz networks. Due to its high operating frequency, an mmWave network has unique channel characteristics and relatively high path loss. A dense deployment of Base Stations (BSs) is therefore necessary, which leads to more frequent handovers and may degrade the User Equipment (UE) experience. Furthermore, a massive number of devices causes interference issues and a high dropping probability. In this project, we propose a handover method based on Reinforcement Learning (RL). The method provides a seamless connection and takes load balancing into account. To verify the proposed method, Q-learning is selected to solve the RL problem and an mmWave simulation environment is set up, including the path-loss model, system model, and beamforming. The average data rate, number of handovers, and number of available resources are evaluated as the UEs move. The results are compared with a rate-max method and a random-backup method in different interference scenarios. Our proposed method shows notable performance in terms of data rate: for example, when the interference is doubled, the data rate decreases by 8.6% with our method, compared with 20% for the random-backup method. Moreover, our method yields the fewest handovers along the trajectory. Performance over multiple trajectories is also illustrated, and the method performs as expected.
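The sketch below is a minimal tabular Q-learning loop for the handover decision described above, with an epsilon-greedy choice over candidate base stations and a reward that would combine data rate, a handover penalty, and a load-balancing term. The state encoding, constants, and reward composition are assumptions for illustration, not the project's actual implementation.

```python
import random
from collections import defaultdict

# Toy tabular Q-learning for handover decisions (illustrative only).
# State: (serving BS, coarse position bin); action: BS to connect to next.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
N_BS = 4
Q = defaultdict(float)

def choose_bs(state):
    """Epsilon-greedy choice over candidate base stations."""
    if random.random() < EPSILON:
        return random.randrange(N_BS)
    return max(range(N_BS), key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Standard Q-learning update; the reward would combine achieved data
    rate, a handover penalty, and a load-balancing term."""
    best_next = max(Q[(next_state, a)] for a in range(N_BS))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One illustrative transition: UE in position bin 3, currently served by BS 1.
s = (1, 3)
a = choose_bs(s)
rate = 1.0
handover_penalty = 0.2 if a != s[0] else 0.0
update(s, a, rate - handover_penalty, (a, 4))
```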
304

Deep reinforcement learning approach to portfolio management / Deep reinforcement learning metod för portföljförvaltning

Jama, Fuaad January 2023 (has links)
This thesis evaluates a Deep Reinforcement Learning (DRL) approach to portfolio management on the Swedish stock market. The idea is to construct a portfolio that is adjusted daily using the DRL algorithm Proximal Policy Optimization (PPO) with a multilayer perceptron neural network. The input to the neural network was historical data in the form of open, high, and low price data. The portfolio is evaluated by its performance against the OMX Stockholm 30 index (OMXS30). Furthermore, three different approaches to optimization are studied, using three different reward functions: the Sharpe ratio, cumulative reward (daily return), and a value-at-risk reward (daily return with a value-at-risk penalty). The historical data used covers the period 2010-01-01 to 2015-12-31, and the DRL approach is then tested on two different time periods representing different market conditions: 2016-01-01 to 2018-12-31 and 2019-01-01 to 2021-12-31. The results show that in the first test period all three methods (corresponding to the three reward functions) outperform the OMXS30 benchmark in returns and Sharpe ratio, while in the second test period none of the methods outperform the OMXS30 index.
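Below is a minimal sketch of the three reward functions named above (Sharpe ratio, daily return, and value-at-risk-penalized return), assuming a daily series of portfolio values; the window length, penalty weight, and VaR level are assumptions, and the thesis's exact definitions may differ.

```python
import numpy as np

def daily_return(values, t):
    """Cumulative-reward variant: the portfolio's simple return on day t."""
    return values[t] / values[t - 1] - 1.0

def sharpe_reward(values, t, window=30):
    """Rolling Sharpe-ratio variant over a trailing window of daily returns."""
    rets = np.diff(values[max(0, t - window):t + 1]) / values[max(0, t - window):t]
    return rets.mean() / (rets.std() + 1e-8)

def var_reward(values, t, window=30, alpha=0.05, penalty=1.0):
    """Daily return minus a value-at-risk penalty (historical VaR at level alpha)."""
    rets = np.diff(values[max(0, t - window):t + 1]) / values[max(0, t - window):t]
    var = -np.quantile(rets, alpha)   # loss threshold exceeded with probability alpha
    return daily_return(values, t) - penalty * max(var, 0.0)

prices = np.array([100.0, 101.0, 99.0, 102.0, 103.0, 101.0, 104.0])
t = len(prices) - 1
print(daily_return(prices, t), sharpe_reward(prices, t), var_reward(prices, t))
```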
305

Temporal Abstractions in Multi-agent Learning

Jiayu Chen (18396687) 13 June 2024 (has links)
<p dir="ltr">Learning, planning, and representing knowledge at multiple levels of temporal abstractions provide an agent with the ability to predict consequences of different courses of actions, which is essential for improving the performance of sequential decision making. However, discovering effective temporal abstractions, which the agent can use as skills, and adopting the constructed temporal abstractions for efficient policy learning can be challenging. Despite significant advancements in single-agent settings, temporal abstractions in multi-agent systems remains underexplored. This thesis addresses this research gap by introducing novel algorithms for discovering and employing temporal abstractions in both cooperative and competitive multi-agent environments. We first develop an unsupervised spectral-analysis-based discovery algorithm, aiming at finding temporal abstractions that can enhance the joint exploration of agents in complex, unknown environments for goal-achieving tasks. Subsequently, we propose a variational method that is applicable for a broader range of collaborative multi-agent tasks. This method unifies dynamic grouping and automatic multi-agent temporal abstraction discovery, and can be seamlessly integrated into the commonly-used multi-agent reinforcement learning algorithms. Further, for competitive multi-agent zero-sum games, we develop an algorithm based on Counterfactual Regret Minimization, which enables agents to form and utilize strategic abstractions akin to routine moves in chess during strategy learning, supported by solid theoretical and empirical analyses. Collectively, these contributions not only advance the understanding of multi-agent temporal abstractions but also present practical algorithms for intricate multi-agent challenges, including control, planning, and decision-making in complex scenarios.</p>
306

Reinforcement Learning From Human Feedback For Ethically Robust AI Decision-Making

Plasencia, Marco M 01 January 2024 (has links) (PDF)
The emergence of reinforcement learning from human feedback (RLHF) has made great strides toward giving AI decision-making the ability to learn from external human advice. In general, this machine learning technique is concerned with producing agents that learn to optimize and achieve some goal, driven by interactions with the environment and feedback given as a quantifiable reward. In the scope of this project, we seek to merge the intricate realms of AI robustness, ethical decision-making, and RLHF. With no way to truly quantify human values, human feedback is an essential bridge in the learning process, allowing AI models to better reflect ethical principles rather than simply replicate human behavior. By exploring the transformative potential of RLHF in AI-human interactions, acknowledging the dynamic nature of human behavior beyond simplistic models, and emphasizing the necessity for ethically framed AI systems, this thesis constructs a deep reinforcement learning framework that is not only robust but also well aligned with human ethical standards. Through a methodology that incorporates simulated ethical dilemmas and evaluates AI decisions against established ethical frameworks, the focus is to contribute significantly to the understanding and application of RLHF in creating AI systems that embody robustness and ethical integrity.
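As a hedged sketch of the preference-learning step at the heart of RLHF, the code below fits a linear reward model to pairwise human preferences with a Bradley-Terry objective. The features, data, and learning rate are invented for illustration; the thesis's framework, ethical-dilemma simulations, and model architecture are not reproduced here.

```python
import numpy as np

# Minimal sketch: fit a reward model so that human-preferred trajectories
# score higher. The "reward model" here is linear over hand-made features.
rng = np.random.default_rng(0)
dim = 4
w = np.zeros(dim)

# Each pair holds features of the preferred and the rejected trajectory.
pairs = [(rng.normal(size=dim) + 0.5, rng.normal(size=dim)) for _ in range(200)]

for _ in range(100):
    for preferred, rejected in pairs:
        # Bradley-Terry likelihood: P(preferred beats rejected) = sigmoid(r_p - r_r)
        margin = w @ preferred - w @ rejected
        p = 1.0 / (1.0 + np.exp(-margin))
        grad = (1.0 - p) * (preferred - rejected)   # gradient of the log-likelihood
        w += 0.01 * grad

print("learned reward weights:", w)
# The learned reward would then shape policy training (e.g., with a policy-gradient method).
```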
307

Разработка интеллектуального цифрового двойника для видеоигр на примере использования Unity ML-Agent : магистерская диссертация / Development of an Intelligent Digital Twin for Video Games using Unity ML-Agent

Смирнов, А. В., Smirnov, A. V. January 2024 (has links)
The graduation qualification work on the topic "Development of an Intelligent Digital Twin for Video Games using Unity ML-Agent" is dedicated to creating a digital twin for video games using Unity ML-Agents technology. The main goal of the work is to reduce the time and improve the quality of the video game testing process by using machine learning models to create intelligent twins. The work consists of three main sections: 1. Analysis of the Application of Machine Learning and Artificial Intelligence in Video Games. This section examines the main methods and tools used to create AI in video games and analyzes existing solutions and their practical applications. 2. Practical Implementation of Creating a Digital Twin for a Game Simulator. This section analyzes the Unity ML-Agents tool, describes its key components and methods for training agents, outlines the process of developing a digital twin prototype, and proposes a methodology for evaluating its effectiveness. 3. Study of the Features of Implementing the Developed Digital Twin and Assessing its Effectiveness for Automating the Testing Process. This section describes an experiment using the digital twin for automated testing, evaluates its effectiveness, and considers the conditions and limitations of applying this tool. The scientific novelty of the work lies in proposing the use of machine learning methods to automate manual labor in the process of creating and testing video games. As a result of the research, a solution was developed and tested that significantly reduces the labor cost of testing and improves its quality.
308

Individual differences in personality associated with anterior cingulate cortex function: implication for understanding depression

Umemoto, Akina 18 March 2016 (has links)
We humans depend heavily on cognitive control to make decisions and execute goal-directed behaviors; without it, our behavior would be overpowered by automatic, stimulus-driven responses. In my dissertation, I focus on a brain region most implicated in this crucial process: the anterior cingulate cortex (ACC). The importance of this region is highlighted by lesion studies demonstrating diminished self-initiated behavior, or apathy, following ACC damage, the most severe form of which results in the near complete absence of speech production and willed actions despite intact motor ability. Despite decades of research, however, its precise function is still highly debated, due particularly to ACC's observed involvement in multiple aspects of cognition. In my dissertation I examine ACC function according to recent developments in reinforcement learning theory that posit a key role for ACC in motivating extended behavior. According to this theory, ACC is responsible for learning task values and for motivating effortful control over extended behaviors based on those learned values. The aim of my dissertation is two-fold: 1) to improve understanding of ACC function, and 2) to elucidate the contribution of ACC to depression, as revealed by individual differences in several personality traits related to motivation and reward sensitivity in a population of healthy college students. It was hypothesized that these personality traits express ACC function to greater or lesser degrees across individuals, and that their abnormal expression (in particular, atypically low motivation and reward sensitivity) constitutes a hallmark characteristic of depression. First, this dissertation reveals that the reward positivity (RewP), a key electrophysiological signature of reward processing believed to index the impact of reinforcement learning signals carried by the midbrain dopamine system on ACC, is sensitive to individual differences in reward valuation, being larger for those high in reward sensitivity and smaller for those with high depression scores. Second, consistent with a previous suggestion that people with depression or high depression scores have difficulty using reward information to motivate behavior, I find these individuals to exhibit relatively poor prolonged task performance despite an apparently greater investment of cognitive control, and a reduced willingness to expend effort to obtain probable rewards, a behavior that was stable with time on task. In contrast, individuals characterized by high persistence, which is indicative of good ACC function, exhibited high self-reported task engagement and increasingly effortful behavior with time on task, particularly on trials in which reward receipt was unlikely, suggesting increased motivational control. In sum, this dissertation emphasizes the importance of understanding the basic function of ACC as assessed by individual differences in personality, which is then used to understand the impact of its dysfunction in relation to mental illness. / Graduate
309

Information driven self-organization of agents and agent collectives

Harder, Malte January 2014 (has links)
From a visual standpoint it is often easy to judge whether a system is self-organizing or not, but a quantitative approach would be more helpful. Information theory, as introduced by Shannon, provides the right tools not only to quantify self-organization, but also to investigate it in relation to the information processing performed by individual agents within a collective. This thesis sets out to introduce methods to quantify spatial self-organization in collective systems in the continuous domain, as a means to investigate morphogenetic processes. In biology, morphogenesis denotes the development of shapes and forms, for example embryos, organs, or limbs. Here, I will introduce methods to quantitatively investigate shape formation in stochastic particle systems. In living organisms, self-organization, like the development of an embryo, is a guided process, predetermined by the genetic code but executed in an autonomous, decentralized fashion. Information is processed by the individual agents (e.g. cells) engaged in this process. Hence, information theory can be deployed to study such processes and to connect self-organization and information processing. The existing concepts of observer-based self-organization and relevant information will be used to devise a framework for the investigation of guided spatial self-organization. Furthermore, local information transfer plays an important role in processes of self-organization. In this context, the concept of synergy has been receiving a lot of attention lately. Synergy is a formalization of the idea that for some systems the whole is more than the sum of its parts, and it is assumed to play an important role in self-organization, learning, and decision-making processes. In this thesis, a novel measure of synergy will be introduced that addresses some of the theoretical problems posed by earlier approaches.
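A small illustration of the information-theoretic starting point described above: the whole-minus-parts quantity I(X1,X2;Y) - I(X1;Y) - I(X2;Y), computed for an XOR system in which synergy is maximal. This is only a standard baseline quantity, not the novel synergy measure introduced in the thesis.

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) in bits from a joint probability table pxy[x, y]."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

# XOR example: Y = X1 xor X2 with independent fair-coin inputs.
p = np.zeros((2, 2, 2))                         # joint table p(x1, x2, y)
for x1 in range(2):
    for x2 in range(2):
        p[x1, x2, x1 ^ x2] = 0.25

joint_sources = p.reshape(4, 2)                 # treat (X1, X2) as one variable
i_x1_y = mutual_information(p.sum(axis=1))      # I(X1; Y) = 0 bits
i_x2_y = mutual_information(p.sum(axis=0))      # I(X2; Y) = 0 bits
i_both_y = mutual_information(joint_sources)    # I(X1, X2; Y) = 1 bit
print("whole-minus-parts synergy:", i_both_y - i_x1_y - i_x2_y)
```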
310

Feedback Related Negativity: Reward Prediction Error or Salience Prediction Error?

Heydari, Sepideh 07 April 2015 (has links)
The reward positivity is a component of the human event-related brain potential (ERP) elicited by feedback stimuli in trial-and-error learning and guessing tasks. A prominent theory holds that the reward positivity reflects a reward prediction error that is differentially sensitive to the valence of outcomes, namely larger for unexpected positive events than for unexpected negative events (Holroyd & Coles, 2002). Although the theory has found substantial empirical support, most of these studies have utilized either monetary or performance feedback to test the hypothesis. However, in apparent contradiction to the theory, a recent study found that unexpected physical punishments (a shock to the finger) also elicit the reward positivity (Talmi, Atkinson, & El-Deredy, 2013). Accordingly, these investigators argued that this ERP component reflects a salience prediction error rather than a reward prediction error. To investigate this finding further, I adapted the task paradigm of Talmi and colleagues to a more standard guessing task often used to investigate the reward positivity. Participants navigated a virtual T-maze and received feedback on each trial under two conditions. In a reward condition the feedback indicated whether or not they would receive a monetary reward for their performance on that trial. In a punishment condition the feedback indicated whether or not they would receive a small shock at the end of the trial. I found that the feedback stimuli elicited a typical reward positivity in the reward condition and an apparently delayed reward positivity in the punishment condition. Importantly, this signal was more positive for stimuli that predicted the omission of a possible punishment than for stimuli that predicted a forthcoming punishment, which is inconsistent with the salience hypothesis. / Graduate / 0633 / 0317 / heydari@uvic.ca
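For readers unfamiliar with the distinction at issue, the standard temporal-difference formulation below (an assumption of this summary, not taken from the thesis) contrasts a signed reward prediction error with its unsigned, salience-like counterpart.

```latex
% Standard formulation, not from the thesis: signed versus unsigned prediction error.
\[
  \delta_t = r_t + \gamma\, V(s_{t+1}) - V(s_t)
  \qquad \text{(reward prediction error: sign depends on outcome valence)}
\]
\[
  \sigma_t = \lvert \delta_t \rvert
  \qquad \text{(salience prediction error: any surprising outcome, good or bad)}
\]
```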
