11

Machine Learning-Based Instruction Scheduling for a DSP Architecture Compiler : Instruction Scheduling using Deep Reinforcement Learning and Graph Convolutional Networks / Maskininlärningsbaserad schemaläggning av instruktioner för en DSP-arkitekturkompilator : Schemaläggning av instruktioner med Deep Reinforcement Learning och grafkonvolutionella nätverk

Alava Peña, Lucas January 2023
Instruction scheduling is a back-end compiler optimisation technique that can provide significant performance gains. It refers to ordering instructions so as to reduce latency on processors with instruction-level parallelism. At present, typical compilers use heuristics to perform instruction scheduling and to solve other related NP-complete problems. This thesis presents a machine learning-based approach that challenges heuristic methods on performance. A novel reinforcement learning (RL) model of the instruction scheduling problem is developed, including modelling of processor features such as forwarding, resource utilisation, and the treatment of the action space. An efficient optimal scheduler is presented for use in a reward function based on optimal schedule length; however, it is not used in the final results, as a heuristic-based reward function was deemed sufficient and faster to compute. Furthermore, an RL agent that interacts with the model of the problem is presented, using three different types of graph neural network for state processing: graph convolutional networks, graph attention networks, and graph attention based on the work of Lee et al. A simple two-layer neural network is also used to generate embeddings for the resource utilisation stages. The proposed solution is validated against the modelled environment, and favourable but not statistically significant improvements were found compared with the most common heuristic method. Furthermore, embeddings relating to resource utilisation proved very important for the explained variance of the RL models. Additionally, a trained model was tested in an actual compiler, but no informative results were found, likely due to register allocation or other compiler stages that occur after instruction scheduling. Future work should include improving the scalability of the proposed solution.
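A minimal sketch of how such a scheduling environment can be framed for RL, assuming a single-issue processor, fixed edge latencies, and a terminal negative-makespan reward; the thesis additionally models forwarding and resource utilisation, and the class and method names here are illustrative:

```python
import networkx as nx

class SchedulingEnv:
    """Toy list-scheduling environment over an instruction dependency DAG."""

    def __init__(self, dag: nx.DiGraph):
        self.dag = dag  # nodes: instructions; node attribute 'latency' (cycles)
        self.reset()

    def reset(self):
        self.done_at = {}  # instruction -> cycle at which its result is ready
        self.cycle = 0
        self.ready = {n for n in self.dag if self.dag.in_degree(n) == 0}
        return self.ready

    def step(self, instr):
        """Schedule one ready instruction; the action space is the ready set."""
        assert instr in self.ready
        # Issue no earlier than the current cycle or the operands' ready cycles.
        start = max([self.done_at.get(p, 0) for p in self.dag.predecessors(instr)]
                    + [self.cycle])
        self.done_at[instr] = start + self.dag.nodes[instr].get("latency", 1)
        self.cycle = start + 1  # single-issue assumption: one instruction per cycle
        self.ready.discard(instr)
        self.ready |= {s for s in self.dag.successors(instr)
                       if all(p in self.done_at for p in self.dag.predecessors(s))}
        done = not self.ready
        # Terminal reward: negative makespan, standing in for the thesis's
        # heuristic-schedule-length-based reward.
        reward = -max(self.done_at.values()) if done else 0.0
        return self.ready, reward, done
```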
12

Deep reinforcement learning approach to portfolio management / Deep reinforcement learning metod för portföljförvaltning

Jama, Fuaad January 2023
This thesis evaluates a Deep Reinforcement Learning (DRL) approach to portfolio management on the Swedish stock market. The idea is to construct a portfolio that is adjusted daily using the DRL algorithm Proximal Policy Optimization (PPO) with a multilayer perceptron neural network. The input to the neural network was historical data in the form of open, high, and low prices. The portfolio is evaluated by its performance against the OMX Stockholm 30 index (OMXS30). Furthermore, three different approaches to optimisation are studied, using three different reward functions: Sharpe ratio, cumulative reward (daily return), and Value-at-Risk reward (daily return with a Value-at-Risk penalty). The historical data used is from the period 2010-01-01 to 2015-12-31, and the DRL approach is then tested on two time periods representing different market conditions, 2016-01-01 to 2018-12-31 and 2019-01-01 to 2021-12-31. The results show that in the first test period all three methods (corresponding to the three reward functions) outperform the OMXS30 benchmark in returns and Sharpe ratio, while in the second test period none of the methods outperform the OMXS30 index.
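A minimal sketch of the three reward functions described above, computed over a trailing window of daily portfolio returns; the window length, risk level, and penalty weight are assumptions, not values from the thesis:

```python
import numpy as np

def cumulative_reward(returns: np.ndarray) -> float:
    """Daily-return reward: the most recent portfolio return."""
    return float(returns[-1])

def sharpe_reward(returns: np.ndarray, risk_free: float = 0.0) -> float:
    """Sharpe-ratio reward over a trailing window of daily returns."""
    excess = returns - risk_free
    return float(excess.mean() / (excess.std() + 1e-8))

def var_reward(returns: np.ndarray, alpha: float = 0.05, penalty: float = 1.0) -> float:
    """Daily return penalised by historical Value at Risk at level alpha."""
    var = -np.quantile(returns, alpha)  # loss exceeded with probability alpha
    return float(returns[-1] - penalty * max(var, 0.0))
```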
13

Using Deep Reinforcement Learning For Adaptive Traffic Control in Four-Way Intersections

Jörneskog, Gustav, Kandelan, Josef January 2019
The consequences of traffic congestion include increased travel time, fuel consumption, and crash rates. Studies suggest that most traffic delays are due to nonrecurring traffic congestion. Adaptive traffic control using real-time data is effective in dealing with nonrecurring traffic congestion. Many adaptive traffic control algorithms used today are deterministic and prone to human error and limitations. Reinforcement learning allows an optimal traffic control policy to be developed in an unsupervised manner. We have implemented a reinforcement learning algorithm that requires only the number of vehicles and the mean speed on each incoming road to streamline traffic in a four-way intersection. The reinforcement learning algorithm is evaluated against a deterministic algorithm and a fixed-time control schedule. Furthermore, we tested whether reinforcement learning can be trained to prioritize emergency vehicles while maintaining good traffic flow. The reinforcement learning algorithm obtains a lower average time in the system than the deterministic algorithm in eight out of nine experiments, and a lower average time than the fixed-time schedule in all experiments. At best, the reinforcement learning algorithm performs 13% better than the deterministic algorithm and 39% better than the fixed-time schedule. Moreover, it could prioritize emergency vehicles while maintaining good traffic flow.
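A minimal sketch of the observation and reward described above (vehicle counts and mean speeds per incoming road, negative average time in the system); the normalisation constants and the two-phase signal model are assumptions:

```python
import numpy as np

# Phase set is an assumption: 0 = north-south green, 1 = east-west green.
N_ROADS, N_PHASES = 4, 2

def make_state(counts, mean_speeds, max_count=50.0, max_speed=20.0):
    """Normalised 8-dimensional observation: vehicle count and mean speed
    for each of the four incoming roads."""
    counts = np.asarray(counts, dtype=np.float32) / max_count
    speeds = np.asarray(mean_speeds, dtype=np.float32) / max_speed
    return np.concatenate([counts, speeds])

def reward(times_in_system):
    """Negative average time in the system -- the metric the thesis reports."""
    return -float(np.mean(times_in_system))
```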
14

Deep Reinforcement Learning for Intelligent Road Maintenance in Small Island Developing States Vulnerable to Climate Change : Using Artificial Intelligence to Adapt Communities to Climate Change

Elvira, Boman January 2018
The consequences of climate change are already noticeable in small island developing states. Road networks are crucial for a functioning society and are particularly vulnerable to extreme weather, floods, landslides, and other effects of climate change. Road systems in small island developing states are therefore in special need of climate adaptation efforts. Climate adaptation of road systems also has to be cost-efficient, since these small island states have limited economic resources. Recent advances in deep reinforcement learning, a subfield of artificial intelligence, have shown that intelligent agents can achieve superhuman performance at a number of tasks, setting hopes high for possible future applications of the algorithms. To investigate whether deep reinforcement learning is suitable for climate adaptation of road maintenance systems, a simulator has been set up, together with three deep reinforcement learning agents and two non-intelligent agents for performance comparisons. The results of the project indicate that deep reinforcement learning is suitable for use in intelligent road maintenance systems for climate adaptation in small island developing states.
15

Extension on Adaptive MAC Protocol for Space Communications

Li, Max Hongming 06 December 2018
This work devises a novel approach for mitigating the effects of Catastrophic Forgetting in Deep Reinforcement Learning-based cognitive radio engine implementations employed in space communication applications. Previous implementations of cognitive radio space communication systems utilized a moving-window-based online learning method, which discards part of its understanding of the environment each time the window is moved; this act of discarding is called Catastrophic Forgetting. This work investigated ways to control the forgetting process more systematically, both through a recursive training technique that implements forgetting in a more controlled manner and through an ensemble learning technique in which each member of the ensemble represents the engine's understanding over a certain period of time. Both techniques were integrated into a cognitive radio engine proof-of-concept and delivered to the SDR platform on the International Space Station. The results were then compared to those from the original proof-of-concept. Through this comparison, the ensemble learning technique showed promise across different communication channel contexts.
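A minimal sketch of the ensemble idea, assuming each member exposes a predict method and that members' outputs are simply averaged; the actual engine's member models and combination rule may differ:

```python
import numpy as np

class WindowedEnsemble:
    """Keeps one frozen model per training window, so knowledge of past
    channel conditions is retained rather than overwritten."""

    def __init__(self, max_members=5):
        self.members = []
        self.max_members = max_members

    def add_member(self, model):
        """Freeze the model trained on the latest window and keep it."""
        self.members.append(model)
        if len(self.members) > self.max_members:
            self.members.pop(0)  # retire the oldest window's knowledge

    def predict(self, state):
        """Combine members by averaging their action-value estimates."""
        return np.mean([m.predict(state) for m in self.members], axis=0)
```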
16

Zero-Knowledge Agent Trained for the Game of Risk

Bethdavid, Simon January 2020
Recent developments in deep reinforcement learning applied to abstract strategy games such as Go, chess and Hex have sparked interest within military planning. This Master's thesis explores whether an algorithm similar to Expert Iteration and AlphaZero can be applied to wargames. The studied wargame is Risk, a turn-based multiplayer game played on a simplified political map of the world. The algorithm consists of an expert, in the form of a Monte Carlo tree search algorithm, and an apprentice, implemented as a neural network. The neural network is trained by imitation learning to mimic expert decisions generated from self-play reinforcement learning. The apprentice is then used as a heuristic in subsequent tree searches. The results demonstrate that a Monte Carlo tree search algorithm could, to some degree, be employed on a strategy game such as Risk, dominating a random-playing agent. The neural network, fed with a state representation in the form of a vector, had difficulty learning expert decisions and could not beat a random-playing agent, which halted the expert/apprentice learning process. However, possible solutions are provided as future work.
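A schematic sketch of the Expert Iteration loop described above; the environment and model interfaces (mcts_search, fit, step) are assumptions for illustration, not the thesis implementation:

```python
def expert_iteration(env, apprentice, mcts_search, n_iters=10, n_games=100):
    """Expert Iteration: an MCTS expert, guided by the apprentice's policy and
    value estimates, generates move targets; the apprentice is then trained
    by imitation learning to mimic them."""
    for _ in range(n_iters):
        states, targets = [], []
        for _ in range(n_games):
            state, done = env.reset(), False
            while not done:
                pi = mcts_search(env, state, apprentice)  # expert move probabilities
                states.append(state)
                targets.append(pi)
                state, done = env.step(pi.argmax())       # play the expert's move
        apprentice.fit(states, targets)                   # imitation learning step
    return apprentice
```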
17

A Study on Resolution and Retrieval of Implicit Entity References in Microblogs / マイクロブログにおける暗黙的な実体参照の解決および検索に関する研究

Lu, Jun-Li 23 March 2020
Kyoto University / 0048 / New system, doctoral programme / Doctor of Informatics / Kou No. 22580 / Joho-haku No. 717 / 新制||情||123 (University Library) / Department of Social Informatics, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Masatoshi Yoshikawa; Professor Sadao Kurohashi; Professor Keishi Tajima; Professor Katsumi Tanaka (Professor Emeritus, Kyoto University) / Qualifies under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
18

Deep Reinforcement Learning For Distributed Fog Network Probing

Guan, Xiaoding 01 September 2020
Sixth-generation (6G) wireless communication systems will rely significantly on fog/edge network architectures for service provisioning. To satisfy stringent quality-of-service requirements using dynamically available resources at the edge, new network access schemes are needed. In this work, we consider a cognitive dynamic edge/fog network where primary users (PUs) may temporarily share their resources and act as fog nodes for secondary users (SUs). We develop strategies for distributed dynamic fog probing so that SUs can discover available connections to the fog nodes. To handle the large state space of connectivity availability, which includes the availability of channels, computing resources, and fog nodes, as well as the partial observability of the states, we design a novel distributed Deep Q-learning Fog Probing (DQFP) algorithm. Our goal is to develop multi-user strategies for accessing fog nodes in a distributed manner without any centralized scheduling or message passing. Using cooperative and competitive utility functions, we analyze the impact of the multi-user dynamics on connectivity availability and establish design principles for our DQFP algorithm.
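A minimal sketch of a per-SU probing agent, using tabular Q-learning as a stand-in for the deep Q-network in DQFP; the two-value state and the hyper-parameters are assumptions:

```python
import random

class ProbingAgent:
    """Per-SU agent choosing which fog node to probe, trained by Q-learning.
    State here is simply whether the last probe succeeded (0/1)."""

    def __init__(self, n_fog_nodes, lr=0.1, gamma=0.9, eps=0.1):
        self.q = [[0.0] * n_fog_nodes for _ in range(2)]
        self.lr, self.gamma, self.eps = lr, gamma, eps

    def act(self, state):
        if random.random() < self.eps:  # epsilon-greedy exploration
            return random.randrange(len(self.q[state]))
        return max(range(len(self.q[state])), key=lambda a: self.q[state][a])

    def update(self, state, action, utility, next_state):
        """utility may be cooperative (shared) or competitive (individual)."""
        target = utility + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.lr * (target - self.q[state][action])
```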
19

Towards a Deep Reinforcement Learning based approach for real-time decision making and resource allocation for Prognostics and Health Management applications

Ludeke, Ricardo Pedro João January 2020
Industrial operational environments are stochastic and can have complex system dynamics that introduce multiple levels of uncertainty. This uncertainty leads to sub-optimal decision making and resource allocation. Digitalisation and automation of production equipment and the maintenance environment enable predictive maintenance, meaning that equipment can be stopped for maintenance at the optimal time. Resource constraints in maintenance capacity could, however, result in further undesired downtime if maintenance cannot be performed when scheduled. This dissertation investigates the applicability of a multi-agent Deep Reinforcement Learning approach to decision making, used to determine the optimal maintenance scheduling policy in a fleet of assets subject to maintenance resource constraints. By considering the underlying system dynamics of maintenance capacity, as well as the health state of individual assets, a near-optimal decision-making policy is found that increases equipment availability while also maximising the use of maintenance capacity. The implemented solution is compared to a run-to-failure corrective maintenance strategy, a constant-interval preventive maintenance strategy, and a condition-based predictive maintenance strategy. The proposed approach outperformed traditional maintenance strategies across several asset and operational maintenance performance metrics. It is concluded that Deep Reinforcement Learning-based decision making for asset health management and resource allocation is more effective than human-based decision making. / Dissertation (MEng (Mechanical Engineering))--University of Pretoria, 2020. / Mechanical and Aeronautical Engineering / MEng (Mechanical Engineering) / Unrestricted
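A minimal sketch of a fleet-maintenance environment step with a maintenance-capacity constraint, in the spirit of the problem described above; the degradation dynamics, cost weights, and function signature are assumptions, not the dissertation's model:

```python
import numpy as np

def maintenance_step(health, actions, capacity, degrade=0.05):
    """One step of a toy fleet environment. health: per-asset condition in
    [0, 1]; actions[i] = 1 requests maintenance of asset i; at most
    `capacity` requests are served (the resource constraint)."""
    requested = np.flatnonzero(actions)
    served = requested[:capacity]               # limited maintenance crews
    health = health - degrade                   # simplified deterministic wear
    health[served] = 1.0                        # maintained assets restored
    downtime = np.count_nonzero(health <= 0.0)  # failed assets are unavailable
    reward = -downtime - 0.1 * len(served)      # availability vs maintenance cost
    return np.clip(health, 0.0, 1.0), reward
```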
20

Comparison of deep reinforcement learning algorithms in a self-play setting

Kumar, Sunil 30 August 2021
In this exciting era of artificial intelligence and machine learning, the success of AlphaGo, AlphaZero, and MuZero has generated great interest in deep reinforcement learning, especially in self-play settings. The methods used by AlphaZero are finding more uses than ever in many application areas, such as clinical medicine, intelligent military command decision support systems, and recommendation systems. While specific self-play reinforcement learning methods have found their place in application domains, there is much to be explored in existing reinforcement learning methods not originally intended for self-play settings. This thesis focuses on evaluating the performance of existing reinforcement learning techniques in self-play settings. In this research, we trained and evaluated two deep reinforcement learning algorithms in self-play settings on game environments such as Connect Four and chess. We demonstrate how a simple on-policy, policy-based method such as REINFORCE shows signs of learning, whereas an off-policy, value-based method such as Deep Q-Networks does not perform well in self-play settings in the selected environments. The results show that, after training, the REINFORCE agent wins 85% of games against a random baseline agent and 60% of games against the greedy baseline agent in Connect Four. The strength of agents from both techniques was measured and plotted against different baseline agents. We also investigate the impact of selected significant hyper-parameters on the performance of the agents. Finally, we provide recommendations for these hyper-parameters' values for training deep reinforcement learning agents in similar environments. / Graduate
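A minimal sketch of the REINFORCE update applied after each finished self-play game, assuming log-probabilities of the chosen moves were collected during play; hyper-parameters and network details in the thesis differ:

```python
import torch

def reinforce_update(optimizer, log_probs, rewards, gamma=0.99):
    """One REINFORCE update from a finished game. log_probs: list of scalar
    tensors log pi(a_t | s_t) collected during play; rewards: per-move rewards
    (e.g. zero everywhere except +1/-1 at the end for a win/loss)."""
    returns, g = [], 0.0
    for r in reversed(rewards):       # discounted return-to-go
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalise
    loss = -(torch.stack(log_probs) * returns).sum()  # policy-gradient loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```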
