Global ETD Search

191	Model-based active learning in hierarchical policies Cora, Vlad M. 05 1900 (has links) Hierarchical task decompositions play an essential role in the design of complex simulation and decision systems, such as the ones that arise in video games. Game designers find it very natural to adopt a divide-and-conquer philosophy of specifying hierarchical policies, where decision modules can be constructed somewhat independently. The process of choosing the parameters of these modules manually is typically lengthy and tedious. The hierarchical reinforcement learning (HRL) field has produced elegant ways of decomposing policies and value functions using semi-Markov decision processes. However, there is still a lack of demonstrations in larger nonlinear systems with discrete and continuous variables. To narrow this gap between industrial practices and academic ideas, we address the problem of designing efficient algorithms to facilitate the deployment of HRL ideas in more realistic settings. In particular, we propose Bayesian active learning methods to learn the relevant aspects of either policies or value functions by focusing on the most relevant parts of the parameter and state spaces respectively. To demonstrate the scalability of our solution, we have applied it to The Open Racing Car Simulator (TORCS), a 3D game engine that implements complex vehicle dynamics. The environment is a large topological map roughly based on downtown Vancouver, British Columbia. Higher level abstract tasks are also learned in this process using a model-based extension of the MAXQ algorithm. Our solution demonstrates how HRL can be scaled to large applications with complex, discrete and continuous non-linear dynamics. / Science, Faculty of / Computer Science, Department of / Graduate Hierarchical Reinforcement Learning Decision Theory Bayesian Active Learning Robotics
192	A service-oriented approach to topology formation and resource discovery in wireless ad-hoc networks Gonzalez Valenzuela, Sergio 05 1900 (has links) The past few years have witnessed a significant evolution in mobile computing and communications, in which new trends and applications have transformed the traditional role of computer networks into that of distributed service providers. In this thesis we explore an alternative way to form wireless ad-hoc networks whose topologies can be customized as required by the users’ software applications. In particular, we investigate the applicability of mobile codes to networks created by devices equipped with Bluetooth technology. Computer simulation results suggest that our proposed approach can achieve this task effectively, while matching the level of efficiency seen in other salient proposals in this area. This thesis also addresses the issue of service discovery in mobile ad-hoc networks. We propose the use of a directory whose network location varies in an attempt to reduce traffic overhead driven by users’ hosts looking for service information. We refer to this scheme as the Service Directory Placement Algorithm, or SDPA. We formulate the directory relocation problem as a Markov Decision Process that is solved by using Q-learning. Performance evaluations through computer simulations reveal bandwidth overhead reductions that range between 40% and 48% when compared with a basic broadcast flooding approach for networks comprising hosts moving at pedestrian speeds. We then extend our proposed approach and introduce a multi-directory service discovery system called the Service Directory Placement Protocol, or SDPP. Our findings reveal bandwidth overhead reductions typically ranging from 15% to 75% in networks comprising slow-moving hosts with restricted memory availability.
In the fourth and final part of this work, we present the design foundations and architecture of a middleware system called WISEMAN – WIreless Sensors Employing Mobile Agents. We employ WISEMAN for dispatching and processing mobile programs in Wireless Sensor Networks (WSNs). Our proposed system enables the dynamic creation of semantic relationships between network nodes that cooperate to provide an aggregate service. We present discussions on the advantages of our proposed approach, and in particular, how WISEMAN facilitates the realization of service-oriented tasks in WSNs. / Applied Science, Faculty of / Electrical and Computer Engineering, Department of / Graduate Service discovery Topology formation Reinforcement learning Mobile computing
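Entry 192 states that the directory relocation problem is formulated as a Markov Decision Process and solved with Q-learning. The abstract does not give the state, action, or reward definitions, so the following is only a minimal sketch of a generic tabular Q-learning update. The node names, the action labels, and the reward (a negative count of overhead messages) are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical toy setup: the state is the node currently hosting the
# directory; an action either keeps it there or moves it to a neighbour.
actions = ["stay", "move_to_B"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0

# Assumed reward: negative number of overhead messages seen after the move.
new_value = q_update(q, "A", "move_to_B", -3.0, "B", actions)
```

With all values initialised to zero, the first update simply moves the estimate a fraction `alpha` of the way towards the observed reward.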
193	Recommender System using Reinforcement Learning January 2020 (has links) abstract: Currently, recommender systems are used extensively to find the right audience with the "right" content over various platforms. Recommendations generated by these systems aim to offer relevant items to users. Different approaches have been suggested to solve this problem, mainly by using the rating history of the user or by identifying the preferences of similar users. Most of the existing recommendation systems are formulated in an identical fashion, where a model is trained to capture the underlying preferences of users over different kinds of items. Once it is deployed, the model suggests personalized recommendations precisely, and it is assumed that the preferences of users are perfectly reflected by the historical data. However, such user data might be limited in practice, and the characteristics of users may constantly evolve during their intensive interaction with recommendation systems. Moreover, most of these recommender systems suffer from the cold-start problem, where insufficient data for new users or products results in reduced overall recommendation output. In the current study, we have built a recommender system to recommend movies to users. A biclustering algorithm is used to cluster the users and movies simultaneously at the beginning to generate explainable recommendations, and these biclusters are used to form a gridworld where Q-Learning is used to learn the policy to traverse through the grid. The reward function uses the Jaccard Index, which is a measure of common users between two biclusters. Demographic details of new users are used to generate recommendations that solve the cold-start problem too. Lastly, the implemented algorithm is examined with a real-world dataset against the widely used recommendation algorithm and the performance for the cold-start cases.
/ Dissertation/Thesis / Masters Thesis Computer Science 2020 Artificial intelligence Computer science Biclustering Qlearning Recommender System Reinforcement Learning
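Entry 193 describes a reward function based on the Jaccard Index, a measure of the users shared by two biclusters. As a minimal sketch of that measure only (the user identifiers below are made up, and the abstract does not say how the index is scaled or combined into the final reward), it can be computed as intersection size over union size:

```python
def jaccard_index(users_a, users_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of user ids."""
    a, b = set(users_a), set(users_b)
    union = a | b
    if not union:
        # Convention assumed here: two empty biclusters share nothing.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical user sets of two neighbouring biclusters in the gridworld.
bicluster_1 = {"u1", "u2", "u3"}
bicluster_2 = {"u2", "u3", "u4", "u5"}

# Two shared users out of five distinct users overall.
reward = jaccard_index(bicluster_1, bicluster_2)
```

A move between biclusters with many users in common then earns a reward close to 1, while a move between disjoint biclusters earns 0.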
194	Approaches for Efficient Autonomous Exploration using Deep Reinforcement Learning Thomas Molnar (8735079) 24 April 2020 (has links) For autonomous exploration of complex and unknown environments, existing Deep Reinforcement Learning (Deep RL) approaches struggle to generalize from computer simulations to real world instances. Deep RL methods typically exhibit low sample efficiency, requiring a large amount of data to develop an optimal policy function for governing an agent's behavior. RL agents expect well-shaped and frequent rewards to receive feedback for updating policies. Yet in real world instances, rewards and feedback tend to be infrequent and sparse. For sparse reward environments, an intrinsic reward generator can be utilized to facilitate progression towards an optimal policy function. The proposed Augmented Curiosity Modules (ACMs) extend the Intrinsic Curiosity Module (ICM) by Pathak et al. These modules utilize depth image and optical flow predictions with intrinsic rewards to improve sample efficiency. Additionally, the proposed Capsules Exploration Module (Caps-EM) pairs a Capsule Network, rather than a Convolutional Neural Network, architecture with an A2C algorithm. This provides a more compact architecture without need for intrinsic rewards, which the ICM and ACMs require. Tested using ViZDoom for experimentation in visually rich and sparse feature scenarios, both the Depth-Augmented Curiosity Module (D-ACM) and Caps-EM improve autonomous exploration performance and sample efficiency over the ICM. The Caps-EM is superior, using 44% and 83% fewer trainable network parameters than the ICM and D-ACM, respectively. On average across all “My Way Home” scenarios, the Caps-EM converges to a policy function with 1141% and 437% time improvements over the ICM and D-ACM, respectively. Computer Engineering deep reinforcement learning Capsule Network exploration performance Autonomous
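Entry 194 refers to the intrinsic reward used by the Intrinsic Curiosity Module of Pathak et al., where the reward is the scaled squared error between the feature vector a forward model predicts for the next state and the feature vector actually observed. The sketch below illustrates that quantity only; the feature vectors and the scaling factor `eta` are illustrative values, and the depth and optical flow predictions that the ACMs add are not modelled here.

```python
def intrinsic_reward(predicted_next_features, actual_next_features, eta=1.0):
    """Curiosity reward (eta / 2) * ||predicted - actual||^2 on feature vectors."""
    squared_error = sum(
        (p - a) ** 2
        for p, a in zip(predicted_next_features, actual_next_features)
    )
    return 0.5 * eta * squared_error


# Illustrative features: the forward model is wrong in the first component
# only, so the agent is rewarded for visiting this poorly predicted state.
surprising = intrinsic_reward([0.0, 1.0], [1.0, 1.0])

# A perfectly predicted transition yields no intrinsic reward.
familiar = intrinsic_reward([1.0, 1.0], [1.0, 1.0])
```

Because the reward shrinks as the forward model improves, states the agent has already learned to predict stop attracting it, which is what supplies a learning signal when extrinsic rewards are sparse.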
195	Using Concurrent Schedules of Reinforcement to Decrease Behavior Palmer, Ashlyn 12 1900 (has links) We manipulated delay and magnitude of reinforcers in two concurrent schedules of reinforcement to decrease a prevalent behavior while increasing another behavior already in the participant's repertoire. The first experiment manipulated delay, implementing a five second delay between the behavior and delivery of reinforcement for a behavior targeted for decrease while no delay was implemented after the behavior targeted for increase. The second experiment manipulated magnitude, providing one piece of food for the behavior targeted for decrease while two pieces of food were provided for the behavior targeted for increase. The experiments used an ABAB reversal design. Results suggest that behavior can be decreased without the use of extinction when contingencies favor the desirable behavior. differential reinforcement concurrent schedules Psychology, Behavioral Reinforcement learning. Behavior modification.
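196	Cooperative Vehicular Communications for High Throughput Applications / 大容量車載アプリケーションに向けた車車間協調通信 Taya, Akihiro 24 September 2019 (has links) Kyoto University / 0048 / New-system doctorate by coursework / Doctor of Informatics / Degree No. Kō 22099 / Informatics Doctorate No. 709 / 新制||情||122 (University Library) / Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. Masahiro Morikura, Prof. Hiroshi Harada, Prof. Ken Umeno / Qualifies under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM Vehicular communications Cooperative MIMO mmWave communications Reinforcement learning 007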
197	Transfer of reinforcement learning for a robotic skill Gómez Rosal, Dulce Adriana January 2018 (has links) In this work, we develop the transfer learning (TL) of reinforcement learning (RL) for the robotic skill of throwing a ball into a basket, from a computer simulated environment to a real-world implementation. Whereas learning of the same skill has been previously explored by using a Programming by Demonstration approach directly on the real-world robot, our work employs the model-based RL algorithm PILCO as an alternative, as it provides the robot with no previous knowledge or hints, i.e. the robot begins learning from a tabula rasa state. PILCO learns directly on the simulated environment and, as part of its procedure, models the dynamics of the inflatable plastic ball used to perform the task. The robotic skill is represented as a Markov Decision Process, the robotic arm is a Kuka LWR4+, RL is enabled by PILCO, and TL is achieved through policy adjustments. Two learned policies were transferred, and although the results show that no exhaustive policy adjustments are required, large gaps remain between the simulated and the real environment in terms of the ball and robot dynamics. The contributions of this thesis include: a novel TL of RL framework for teaching the basketball skill to the Kuka robotic arm; the development of a pythonised version of PILCO; robust and extendable ROS packages for policy learning and adjustment in a simulated or real robot; a tracking-vision package with a Kinect camera; and an Orocos package for a position controller in the robotic arm. Transfer learning Reinforcement learning Simulation Robotics Robotics Robotteknik och automation
198	Towards Superintelligence-Driven Autonomous Network Operation Centers Using Reinforcement Learning Altamimi, Basel 25 October 2021 (has links) Today's Network Operation Centers (NOC) consist of teams of network professionals responsible for monitoring and taking actions for their network's health. Most of these NOC actions are relatively complex and executed manually; only the simplest tasks can be automated with rules-based software. But today's networks are getting larger and more complex. Therefore, deciding what action to take in the face of non-trivial problems has essentially become an art that depends on collective human intelligence of NOC technicians, specialized support teams organized by technology domains, and vendors' technical support. This model is getting increasingly expensive and inefficient, and the automation of all or at least some NOC tasks is now considered a desirable step towards autonomous and self-healing networks. In this work, we investigate whether such decisions can be taken by Artificial Intelligence instead of collective human intelligence, specifically by Deep-Reinforcement Learning (DRL), which has been shown in computer games to outperform humans. We build an Action Recommendation Engine (ARE) based on RL, train it with expert rules or by letting it explore outcomes by itself, and show that it can learn new and more efficient strategies that outperform expert rules designed by humans by as much as 25%. Reinforcement Learning Autonomous networks Network operation center automation Computer networks
199	Mobile Robot Obstacle Avoidance based on Deep Reinforcement Learning Feng, Shumin January 2018 (has links) Obstacle avoidance is one of the core problems in the field of autonomous navigation. An obstacle avoidance approach is developed for the navigation task of a reconfigurable multi-robot system named STORM, which stands for Self-configurable and Transformable Omni-Directional Robotic Modules. Various mathematical models have been developed in previous work in this field to avoid collision for such robots. In this work, the proposed collision avoidance algorithm is trained via Deep Reinforcement Learning, which enables the robot to learn by itself from its experiences, and then fit a mathematical model by updating the parameters of a neural network. The trained neural network architecture is capable of choosing an action directly based on the input sensor data. A virtual STORM locomotion module was trained to explore a Gazebo simulation environment without collision, using the proposed collision avoidance strategies based on DRL. The mathematical model of the avoidance algorithm was derived from the simulation and then applied to the prototype of the locomotion module and validated via experiments. A universal software architecture was also designed for the STORM modules. The software architecture has extensible and reusable features that improve the design efficiency and enable parallel development. / Master of Science / In this thesis, an obstacle avoidance approach is described to enable autonomous navigation of a reconfigurable multi-robot system, STORM. The Self-configurable and Transformable Omni-Directional Robotic Modules (STORM) is a novel approach towards heterogeneous swarm robotics. The system has two types of robotic modules, namely the locomotion module and the manipulation module. Each module is able to navigate and perform tasks independently.
In addition, the systems are designed to autonomously dock together to perform tasks that the modules individually are unable to accomplish. The proposed obstacle avoidance approach is designed for the modules of STORM, but can be applied to mobile robots in general. In contrast to the existing collision avoidance approaches, the proposed algorithm was trained via deep reinforcement learning (DRL). This enables the robot to learn by itself from its experiences, and then fit a mathematical model by updating the parameters of a neural network. In order to avoid damage to the real robot during the learning phase, a virtual robot was trained inside a Gazebo simulation environment with obstacles. The mathematical model for the collision avoidance strategy obtained through DRL was then validated on a locomotion module prototype of STORM. This thesis also introduces the overall STORM architecture and provides a brief overview of the generalized software architecture designed for the STORM modules. The software architecture has expandable and reusable features that apply well to the swarm architecture while allowing for design efficiency and parallel development. Robotic Systems Neural Networks Obstacle Avoidance Deep Reinforcement Learning
200	Leveraging machine learning for managing prefetchers and designing secure standard cells Eris, Furkan 23 May 2022 (has links) Machine Learning (ML) has gained prominence in recent years and is currently being used in a wide range of applications. Researchers have achieved impressive results at or beyond human levels in image processing, voice recognition, and natural language processing applications. Over the past several years, there has been a lot of work in the area of designing efficient hardware for ML applications. Realizing the power of ML, researchers have lately begun exploring the use of ML for designing computing systems. In this thesis, we propose two ML-based design and management approaches. In the first approach, we propose to use ML algorithms to improve hardware prefetching in processors. In the second approach, we leverage Reinforcement Learning (RL)-based algorithms to automatically insert nanoantennas into standard cell libraries to secure them against Hardware Trojans (HTs). In the first approach, we propose using ML to manage prefetchers and in turn improve processor performance. Classically, prefetcher improvements have been focused on either adding new prefetchers to the existing hybrid prefetching system (a system made out of one or more prefetchers) or increasing the complexity of the existing prefetchers. Both approaches increase the number of prefetcher system configurations (PSCs). Here, a PSC is a given setting for each prefetcher, such as whether it is ON or OFF or, for more complex prefetchers, settings such as the aggressiveness level of the prefetcher. While the choice of PSC of the hybrid prefetching system can be statically optimized for the average case, there are still opportunities to improve the performance at runtime. To this end, we propose a prefetcher manager called Puppeteer to enable dynamic configuration of existing prefetchers. Puppeteer uses a suite of decision trees to adapt PSCs at runtime.
We extensively test Puppeteer using a cycle-accurate simulator across 232 traces. We show up to 46.0% instructions-per-cycle (IPC) improvement over no prefetching in 1C, 25.8% in 4C, and 11.9% in 8C. We design Puppeteer using pruning methods to reduce the hardware overhead and ensure feasibility by reducing the overhead to only a few KB for storage. In the second approach, we propose SecRLCAD, an RL-based Computer-Aided-Design (CAD) flow to secure standard cell libraries. The chip supply chain has become globalized. This globalization has raised security concerns since each step in the chip design, fabrication and testing is now prone to attacks. Prior work has shown that an HT in the form of a single capacitor with a couple of gates can be inserted during the fabrication step and then later be utilized to gain privileged access to a processor. To combat this inserted HT, nanoantennas can be inserted strategically in standard cells to create an optical signature of the chip. However, inserting these nanoantennas is difficult and time-consuming. To aid human designers in speeding up the design of secure standard cells, we design an RL-based flow to insert nanoantennas into each standard cell in a library. We evaluate our flow using Nangate FreePDK 45nm. We can secure and generate a clean library with an average area increase of 56%. / 2023-05-23T00:00:00Z Computer engineering Computer-aided design Reinforcement learning Runtime management
