21

Towards a Deep Reinforcement Learning based approach for real-time decision making and resource allocation for Prognostics and Health Management applications

Ludeke, Ricardo Pedro João January 2020 (has links)
Industrial operational environments are stochastic and can have complex system dynamics which introduce multiple levels of uncertainty. This uncertainty leads to sub-optimal decision making and resource allocation. Digitalisation and automation of production equipment and the maintenance environment enable predictive maintenance, meaning that equipment can be stopped for maintenance at the optimal time. Resource constraints in maintenance capacity could, however, result in further undesired downtime if maintenance cannot be performed when scheduled. In this dissertation the applicability of a Multi-Agent Deep Reinforcement Learning based approach to decision making is investigated to determine the optimal maintenance scheduling policy in a fleet of assets subject to maintenance resource constraints. By considering the underlying system dynamics of maintenance capacity, as well as the health state of individual assets, a near-optimal decision-making policy is found that increases equipment availability while also maximising the use of maintenance capacity. The implemented solution is compared to a run-to-failure corrective maintenance strategy, a constant-interval preventive maintenance strategy and a condition-based predictive maintenance strategy. The proposed approach outperformed traditional maintenance strategies across several asset and operational maintenance performance metrics. It is concluded that Deep Reinforcement Learning based decision making for asset health management and resource allocation is more effective than human-based decision making. / Dissertation (MEng (Mechanical Engineering))--University of Pretoria, 2020. / Mechanical and Aeronautical Engineering / MEng (Mechanical Engineering) / Unrestricted
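To make the setup concrete, the toy environment below sketches the core ingredients such a dissertation describes: a fleet of stochastically degrading assets, a hard maintenance-capacity constraint, and a reward that trades availability against failures. All names, degradation rates, and reward weights are illustrative assumptions, not the dissertation's model.

```python
# Hedged sketch (not the author's code): a minimal fleet-maintenance
# environment in the spirit of the dissertation.
import random

class FleetMaintenanceEnv:
    """Fleet of assets that degrade each step; limited repair crews."""

    def __init__(self, n_assets=5, n_crews=2, seed=0):
        self.n_assets = n_assets
        self.n_crews = n_crews          # maintenance-capacity constraint
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.health = [1.0] * self.n_assets   # 1.0 = healthy, 0.0 = failed
        return list(self.health)

    def step(self, maintain):
        """maintain: asset indices chosen for maintenance this step."""
        maintain = maintain[: self.n_crews]   # capacity caps how many we serve
        reward = 0.0
        for i in range(self.n_assets):
            if i in maintain:
                self.health[i] = 1.0          # restored, but down this step
                reward -= 0.1                 # maintenance cost / lost output
            else:
                self.health[i] -= self.rng.uniform(0.0, 0.15)  # degradation
                if self.health[i] <= 0.0:
                    self.health[i] = 0.0
                    reward -= 1.0             # unplanned failure is far costlier
                else:
                    reward += 0.05            # asset available and producing
        return list(self.health), reward

# naive condition-based policy: maintain the two least-healthy assets
env = FleetMaintenanceEnv()
state = env.reset()
for _ in range(3):
    worst = sorted(range(len(state)), key=lambda i: state[i])[:2]
    state, r = env.step(worst)
    print([f"{h:.2f}" for h in state], f"reward={r:.2f}")
```

A DRL agent would replace the naive policy above, learning when to spend scarce crew capacity from the reward signal alone.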
22

Comparison of deep reinforcement learning algorithms in a self-play setting

Kumar, Sunil 30 August 2021 (has links)
In this exciting era of artificial intelligence and machine learning, the success of AlphaGo, AlphaZero, and MuZero has generated great interest in deep reinforcement learning, especially under self-play settings. The methods used by AlphaZero are proving more useful than before in many different application areas, such as clinical medicine, intelligent military command decision support systems, and recommendation systems. While specific methods of reinforcement learning with self-play have found their place in application domains, there is much to be explored in existing reinforcement learning methods not originally intended for self-play settings. This thesis focuses on evaluating the performance of existing reinforcement learning techniques in self-play settings. In this research, we trained and evaluated two deep reinforcement learning algorithms in self-play settings on game environments, namely Connect Four and Chess. We demonstrate how a simple on-policy, policy-based method such as REINFORCE shows signs of learning, whereas an off-policy, value-based method such as Deep Q-Networks does not perform well in self-play settings in the selected environments. The results show that the REINFORCE agent wins 85% of games after training against a random baseline agent and 60% of games against a greedy baseline agent in Connect Four. The agents' strength under both techniques was measured and plotted against different baseline agents. We also investigate the impact of selected significant hyper-parameters on the performance of the agents. Finally, we provide recommended values for these hyper-parameters for training deep reinforcement learning agents in similar environments. / Graduate
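As context for the REINFORCE result, the sketch below shows how a single policy network can be updated from a self-play Connect Four game by negating the return for the losing side's moves. The network size and the dummy episode are illustrative assumptions; this is not the thesis code.

```python
# Hedged sketch of a REINFORCE update in a self-play setting.
import torch
import torch.nn as nn

# 6x7 Connect Four board flattened to 42 inputs; 7 column actions (assumed sizes)
policy = nn.Sequential(nn.Linear(42, 128), nn.ReLU(), nn.Linear(128, 7))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_update(log_probs, ret):
    """log_probs: one entry per move, players alternating; ret: +1/-1/0
    from the first player's view. The loser's moves get the negated return."""
    signed = [ret if i % 2 == 0 else -ret for i in range(len(log_probs))]
    loss = -torch.stack([lp * g for lp, g in zip(log_probs, signed)]).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

# demo with a dummy 4-move episode (random board tensors, first player wins);
# a real loop would record log-probs while both players sample from `policy`
log_probs = []
for _ in range(4):
    board = torch.randn(42)
    dist = torch.distributions.Categorical(logits=policy(board))
    action = dist.sample()
    log_probs.append(dist.log_prob(action))
reinforce_update(log_probs, ret=1.0)
```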
23

Game AI of StarCraft II based on Deep Reinforcement Learning

Junjie Luo (8786552) 30 April 2020 (has links)
The research problem of this article is a Game AI agent for StarCraft II based on Deep Reinforcement Learning (DRL). StarCraft II is viewed as the most challenging Real-Time Strategy (RTS) game at present, and it is also the most popular game in which researchers are developing and improving AI agents. Building AI agents for StarCraft II can help machine learning researchers identify the weaknesses of DRL and improve this family of algorithms. In 2018, DeepMind and Blizzard developed the StarCraft II Learning Environment (PySC2) to enable researchers to advance the development of AI agents. DeepMind started a new project called AlphaStar, the successor to AlphaGo, based on DRL, while several laboratories also published articles about StarCraft II AI agents. Most of this work targets AI agents for Terran and Zerg, two of the three races in StarCraft II. These AI agents show high-level performance compared with most StarCraft II players; however, their performance is far from defeating e-sports players, because game AI for StarCraft II must cope with a large observation space and a large action space. Moreover, there is no publication on Protoss, the remaining and, due to its characteristics, most complicated race for AI agents to handle (larger action space, larger observation space). Thus, in this paper, the research question is whether a Protoss AI agent, developed with a DRL-based model for a full-length game on a particular map, can defeat the high-level built-in cheating AI. The population of this research design is the StarCraft II AI agents that researchers have built from their DRL models, while the sample is the Protoss AI agent in this paper. The raw data come from game matches between the Protoss AI agent and built-in AI agents; PySC2 can capture features and numerical variables in each match to obtain the training data. The expected outcome is a DRL-based model that can train a Protoss AI agent to defeat high-level game AI agents, measured by win rate. The model includes the action space of Protoss, the observation space and the realization of DRL algorithms. Meanwhile, the model is built on PySC2 v2.0, which provides additional action functions. Due to the complexity and the unique characteristics of Protoss in StarCraft II, the model cannot be applied directly to other games or platforms. However, how the model trains a Protoss AI agent can expose the limitations of DRL and push DRL algorithms a little further.
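For orientation, a PySC2 agent is typically built on the skeleton below — a minimal sketch assuming pysc2 v2.0 is installed, where the no-op policy stands in for the learned DRL model and the class name is hypothetical.

```python
# Hedged sketch: the standard PySC2 agent skeleton a DRL Protoss agent
# would be built on; the learned model would replace the no-op policy.
from pysc2.agents import base_agent
from pysc2.lib import actions

class ProtossAgent(base_agent.BaseAgent):
    def step(self, obs):
        super(ProtossAgent, self).step(obs)
        # obs.observation carries the feature layers PySC2 exposes each frame;
        # a DRL model would map them to one of the available action functions.
        return actions.FUNCTIONS.no_op()
```

Such an agent can then be run against the built-in AI with PySC2's launcher, e.g. `python -m pysc2.bin.agent --map Simple64 --agent mymodule.ProtossAgent` (module path hypothetical).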
24

Radio Resource Allocation and Beam Management under Location Uncertainty in 5G mmWave Networks

Yao, Yujie 16 June 2022 (has links)
Millimeter wave (mmWave) plays a critical role in Fifth-Generation (5G) New Radio due to the rich bandwidth it provides. However, one shortcoming of mmWave is the substantial path loss caused by poor diffraction at high frequencies; consequently, highly directional beams are applied to mitigate this problem. A typical approach to beam management is to cluster users based on their locations. However, localization uncertainty is unavoidable due to measurement accuracy, system performance fluctuation, and so on. Meanwhile, traffic demand may change dynamically in wireless environments, which increases the complexity of network management. Therefore, a scheme that can handle both localization uncertainty and dynamic radio resource allocation is required. Moreover, since localization uncertainty influences network performance, more state-of-the-art localization methods, such as vision-aided localization, are expected to reduce the localization error. In this thesis, we propose two algorithms for joint radio resource allocation and beam management in 5G mmWave networks, namely UK-means-based Clustering and Deep Reinforcement Learning-based resource allocation (UK-DRL) and UK-medoids-based Clustering and Deep Reinforcement Learning-based resource allocation (UKM-DRL). Specifically, we deploy the UK-means and UK-medoids clustering methods in UK-DRL and UKM-DRL, respectively, which are designed to handle clustering under location uncertainty. Meanwhile, we apply Deep Reinforcement Learning (DRL) for intra-beam radio resource allocation in UK-DRL and UKM-DRL. Moreover, to improve localization accuracy, we develop a vision-aided localization scheme, where pixel-characteristics-based features are extracted from satellite images as additional input features for location prediction. Simulations show that UK-DRL and UKM-DRL improve network performance in data rate and delay relative to baseline algorithms. When the traffic load is 4 Mbps, UK-DRL achieves a 172.4% improvement in sum rate and a 64.1% improvement in latency over K-means-based Clustering and Deep Reinforcement Learning-based resource allocation (K-DRL). UKM-DRL has 17.2% higher throughput and 7.7% lower latency than UK-DRL, and 231% higher throughput and 55.8% lower latency than K-DRL. In addition, the vision-aided localization scheme significantly reduces the localization error from 17.11 meters to 3.6 meters.
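To make the clustering idea concrete, here is a minimal Monte Carlo sketch of UK-means-style clustering: each user's position is a distribution rather than a point, and assignment minimizes the expected Euclidean distance to a beam center. The isotropic Gaussian uncertainty, sample counts, and parameters are assumptions for illustration; the thesis' exact UK-means/UK-medoids formulations may differ.

```python
# Hedged sketch: k-means-style clustering under location uncertainty, with
# expected distances estimated by Monte Carlo sampling of user positions.
import numpy as np

def uk_means(means, stds, k, iters=15, n_samples=64, seed=0):
    rng = np.random.default_rng(seed)
    centers = means[rng.choice(len(means), k, replace=False)].copy()
    # Monte Carlo samples representing each user's location uncertainty
    samples = means[:, None, :] + stds[:, None, None] * rng.standard_normal(
        (len(means), n_samples, 2))
    for _ in range(iters):
        # expected Euclidean distance of each user to each center (MC estimate)
        d = np.linalg.norm(samples[:, :, None, :] - centers[None, None, :, :],
                           axis=-1)
        exp_d = d.mean(axis=1)                 # average over position samples
        assign = exp_d.argmin(axis=1)          # cluster by expected distance
        for j in range(k):                     # centroid update uses the means
            if (assign == j).any():
                centers[j] = means[assign == j].mean(axis=0)
    return centers, assign

users = np.random.default_rng(1).uniform(0, 100, size=(40, 2))  # reported positions
stds = np.full(40, 3.0)                                         # ~3 m uncertainty
centers, assign = uk_means(users, stds, k=4)
print(centers)
```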
25

Navigation of Mobile Robots in Human Environments with Deep Reinforcement Learning / Navigering av mobila robotar i mänskliga miljöer med deep reinforcement learning

Coors, Benjamin January 2016 (has links)
For mobile robots which operate in human environments it is not sufficient to simply travel to their target destination as quickly as possible. Instead, mobile robots in human environments need to travel to their destination safely, keeping a comfortable distance from humans and not colliding with any obstacles along the way. As the number of possible human-robot interactions is very large, defining a rule-based navigation approach is difficult in such highly dynamic environments. Current approaches solve this task by predicting the trajectories of humans in the scene and then planning a collision-free path. However, this requires separate components for detecting and predicting human motion and does not scale well to densely populated environments. Therefore, this work investigates the use of deep reinforcement learning for the navigation of mobile robots in human environments. This approach builds on recent research on utilizing deep neural networks in reinforcement learning to play Atari 2600 video games at human level. A deep convolutional neural network is trained end-to-end from one-dimensional laser scan data to command velocities. Discrete and continuous action space implementations are evaluated in simulation and are shown to outperform a Social Force Model baseline approach on the navigation problem for mobile robots in human environments.
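A minimal sketch of the kind of network this implies, assuming a 360-beam scan and five discrete velocity commands (the thesis' exact architecture is not reproduced here):

```python
# Hedged sketch: a 1-D convolutional Q-network mapping a laser scan
# end-to-end to action values, one per discrete velocity command.
import torch
import torch.nn as nn

class LaserDQN(nn.Module):
    def __init__(self, n_beams=360, n_actions=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, stride=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
        )
        with torch.no_grad():  # infer the flattened size from a dummy scan
            flat = self.conv(torch.zeros(1, 1, n_beams)).numel()
        self.head = nn.Sequential(nn.Flatten(),
                                  nn.Linear(flat, 128), nn.ReLU(),
                                  nn.Linear(128, n_actions))

    def forward(self, scan):           # scan: (batch, n_beams) range readings
        return self.head(self.conv(scan.unsqueeze(1)))  # Q-value per command

q_net = LaserDQN()
scan = torch.rand(1, 360) * 10.0       # dummy ranges in metres
action = q_net(scan).argmax(dim=1)     # greedy velocity-command index
```

A continuous-action variant would instead output the command velocities directly, e.g. via an actor-critic head rather than per-action Q-values.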
26

Exploiting Multi-Modal Fusion for Urban Autonomous Driving Using Latent Deep Reinforcement Learning

Khalil, Yasser 29 April 2022 (has links)
Human driving decisions are the leading cause of road fatalities. Autonomous driving naturally eliminates such incompetent decisions and thus can improve traffic safety and efficiency. Deep reinforcement learning (DRL) has shown great potential in learning complex tasks. Recently, researchers have investigated various DRL-based approaches for autonomous driving. However, exploiting multi-modal fusion to generate pixel-wise perception and motion prediction and then leveraging these predictions to train a latent DRL has not been targeted yet. Unlike other DRL algorithms, the latent DRL algorithm separates representation learning from task learning, enhancing the sampling efficiency of reinforcement learning. In addition, supplying the latent DRL algorithm with accurate perception and motion prediction simplifies the surrounding urban scenes, improving training and thus learning a better driving policy. To that end, this Ph.D. research initially develops LiCaNext, a novel real-time multi-modal fusion network that produces accurate joint perception and motion prediction at a pixel level. Our proposed approach relies merely on a LIDAR sensor, where its multi-modal input is composed of bird's-eye view (BEV), range view (RV), and range residual images. Further, this Ph.D. thesis proposes leveraging these predictions, together with another simple BEV image, to train a sequential latent maximum entropy reinforcement learning (MaxEnt RL) algorithm. A sequential latent model is deployed to learn a more compact latent representation from high-dimensional inputs. Subsequently, the MaxEnt RL model trains on this latent space to learn a driving policy. The proposed LiCaNext is trained on the public nuScenes dataset. Results demonstrated that LiCaNext operates in real time and performs better than the state-of-the-art in perception and motion prediction, especially for small and distant objects. Furthermore, simulation experiments were conducted on CARLA to evaluate the performance of our proposed approach, which exploits LiCaNext predictions to train a sequential latent MaxEnt RL algorithm. The simulated experiments show that our proposed approach learns a better driving policy, outperforming other prevalent DRL-based algorithms. The learned driving policy achieves the objectives of safety, efficiency, and comfort. Experiments also reveal that the learned policy maintains its effectiveness under different environments and varying weather conditions.
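For reference, the objective a maximum entropy RL agent optimizes takes the standard form below: expected return plus an entropy bonus weighted by a temperature α. In a sequential latent model, the observed state is replaced by the learned latent state. This is the generic textbook objective, not a formula quoted from the thesis.

```latex
% Generic maximum-entropy RL objective (textbook form, not thesis-specific)
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \left[ r(s_t, a_t)
                + \alpha \, \mathcal{H}\!\left( \pi(\cdot \mid s_t) \right) \right]
```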
27

Neural Approaches for Syntactic and Semantic Analysis / 構文・意味解析に対するニューラルネットワークを利用した手法

Kurita, Shuhei 25 March 2019 (has links)
Kyoto University / 0048 / New degree system, doctoral program / Doctor of Informatics / Degree No. Kou 21911 / Johaku No. 694 / Call no. 新制||情||119 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Sadao Kurohashi, Professor Hisashi Kashima, Associate Professor Daisuke Kawahara / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
28

Training an Artificial Bat: Modeling Sonar-based Obstacle Avoidance using Deep-reinforcement Learning

Mohan, Adithya Venkatesh January 2020 (has links)
No description available.
29

Connected and Automated Traffic Control at Signalized Intersections under Mixed-autonomy Environments

Guo, Yi January 2020 (has links)
No description available.
30

Performance Enhancement of Aerial Base Stations via Reinforcement Learning-Based 3D Placement Techniques

Parvaresh, Nahid 21 December 2022 (has links)
Deploying unmanned aerial vehicles (UAVs) as aerial base stations (BSs) to assist terrestrial connectivity has drawn significant attention in recent years. UAV-BSs can quickly take over as service providers during natural disasters and many other emergency situations when ground BSs fail in an unanticipated manner. UAV-BSs can also provide a cost-effective Internet connection to users who are beyond the reach of infrastructure. UAV-BSs benefit from their inherent mobility, which enables them to change their 3D locations as the demand of ground users changes. In order to make effective use of the mobility of UAV-BSs in a dynamic network and maximize performance, the 3D location of UAV-BSs should be continuously optimized. However, the location optimization problem of UAV-BSs is NP-hard, with no optimal solution obtainable in polynomial time, so near-optimal solutions have to be exploited. Besides conventional heuristic solutions, machine learning (ML), specifically reinforcement learning (RL), has emerged as a promising approach to the positioning problem of UAV-BSs. The common practice for optimizing the 3D location of UAV-BSs using RL algorithms is to assume fixed locations for ground users (i.e., UEs) and to define a discrete, limited action space for the RL agent. In this thesis, we improve the location optimization of UAV-BSs in two ways: (1) taking the mobility of users into account in the design of the RL algorithm, and (2) extending the action space of RL to a continuous action space so that the UAV-BS agent can flexibly change its location by any distance (limited by the maximum speed of the UAV-BS). Three types of RL algorithms, i.e. Q-learning (QL), deep Q-learning (DQL) and actor-critic deep Q-learning (ACDQL), are employed in this thesis to improve the performance of a UAV-assisted cellular network step by step. QL is the first RL algorithm we use for the autonomous movement of the UAV-BS in the presence of mobile users. As a solution to the limitations of QL, we next propose a DQL-based strategy for the location optimization of the UAV-BS, which largely improves the performance of the network compared to the QL-based model. Third, we propose an ACDQL-based solution for autonomously moving the UAV-BS in a continuous action space, whose performance significantly outperforms both the QL and DQL strategies.
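As background for the QL-to-ACDQL progression, the sketch below shows the tabular Q-learning update for discrete UAV-BS moves; DQL replaces the table with a neural network, and ACDQL adds an actor for continuous moves. The grid state, action names, and reward value are illustrative assumptions, not the thesis model.

```python
# Hedged sketch: tabular Q-learning with epsilon-greedy exploration for
# discrete 3D moves of a UAV-BS.
import random
from collections import defaultdict

ACTIONS = ["north", "south", "east", "west", "up", "down", "stay"]
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})  # state -> action values
alpha, gamma, eps = 0.1, 0.95, 0.1

def choose(state):
    if random.random() < eps:                 # epsilon-greedy exploration
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)    # greedy action otherwise

def update(state, action, reward, next_state):
    best_next = max(Q[next_state].values())
    # standard one-step Q-learning target
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# one illustrative transition: moving toward users improves the sum rate,
# so the reward (e.g. achieved throughput) is positive for this step
s = (10, 10, 50)                               # UAV grid position (x, y, z)
a = choose(s)
update(s, a, reward=2.3, next_state=(10, 11, 50))
```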
