  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Towards Superintelligence-Driven Autonomous Network Operation Centers Using Reinforcement Learning

Altamimi, Basel 25 October 2021 (has links)
Today's Network Operation Centers (NOCs) consist of teams of network professionals responsible for monitoring their network's health and taking action to maintain it. Most of these NOC actions are relatively complex and executed manually; only the simplest tasks can be automated with rules-based software. But today's networks are getting larger and more complex. Deciding what action to take in the face of non-trivial problems has therefore essentially become an art that depends on the collective human intelligence of NOC technicians, specialized support teams organized by technology domain, and vendors' technical support. This model is getting increasingly expensive and inefficient, and the automation of all, or at least some, NOC tasks is now considered a desirable step towards autonomous and self-healing networks. In this work, we investigate whether such decisions can be taken by Artificial Intelligence instead of collective human intelligence, specifically by Deep Reinforcement Learning (DRL), which has been shown to outperform humans in computer games. We build an Action Recommendation Engine (ARE) based on RL, train it with expert rules or by letting it explore outcomes on its own, and show that it can learn new, more efficient strategies that outperform expert rules designed by humans by as much as 25%.
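As an illustration of the learning loop such an engine could use, the sketch below trains a tabular Q-learning agent to map coarse network-health states to remediation actions; the state set, action set, and toy reward model are illustrative assumptions, not the ARE design from the thesis.

```python
import random
from collections import defaultdict

# Hypothetical NOC state/action spaces; the thesis's actual ARE design is not reproduced here.
STATES = ["healthy", "high_latency", "packet_loss", "link_down"]
ACTIONS = ["no_op", "restart_interface", "reroute_traffic", "escalate_to_vendor"]

def simulate_outcome(state, action):
    """Toy environment: returns (reward, next_state). Stands in for real network feedback."""
    if state == "healthy":
        return (1.0 if action == "no_op" else -0.5), "healthy"
    if action == "reroute_traffic":
        return 0.8, "healthy"
    if action == "escalate_to_vendor":
        return 0.2, "healthy"
    return -0.2, state

q = defaultdict(float)                 # Q[(state, action)]
alpha, gamma, eps = 0.1, 0.9, 0.1

state = random.choice(STATES)
for step in range(10_000):
    # epsilon-greedy action selection
    if random.random() < eps:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
    reward, next_state = simulate_outcome(state, action)
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    # occasionally jump to a random state so all states keep being visited
    state = next_state if random.random() > 0.3 else random.choice(STATES)

for s in STATES:
    print(s, "->", max(ACTIONS, key=lambda a: q[(s, a)]))
```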
202

Mobile Robot Obstacle Avoidance based on Deep Reinforcement Learning

Feng, Shumin January 2018 (has links)
Obstacle avoidance is one of the core problems in the field of autonomous navigation. An obstacle avoidance approach is developed for the navigation task of a reconfigurable multi-robot system named STORM, which stands for Self-configurable and Transformable Omni-Directional Robotic Modules. Various mathematical models have been developed in previous work in this field to avoid collisions for such robots. In this work, the proposed collision avoidance algorithm is trained via Deep Reinforcement Learning, which enables the robot to learn by itself from its experiences and then fit a mathematical model by updating the parameters of a neural network. The trained neural network is capable of choosing an action directly from the input sensor data. A virtual STORM locomotion module was trained to explore a Gazebo simulation environment without collision, using the proposed DRL-based collision avoidance strategies. The mathematical model of the avoidance algorithm was derived from the simulation, applied to the prototype of the locomotion module, and validated via experiments. A universal software architecture was also designed for the STORM modules. The software architecture has extensible and reusable features that improve design efficiency and enable parallel development. / Master of Science / In this thesis, an obstacle avoidance approach is described to enable autonomous navigation of a reconfigurable multi-robot system, STORM. The Self-configurable and Transformable Omni-Directional Robotic Modules (STORM) are a novel approach towards heterogeneous swarm robotics. The system has two types of robotic modules, namely the locomotion module and the manipulation module. Each module is able to navigate and perform tasks independently. In addition, the modules are designed to autonomously dock together to perform tasks that they are individually unable to accomplish. The proposed obstacle avoidance approach is designed for the modules of STORM, but can be applied to mobile robots in general. In contrast to existing collision avoidance approaches, the proposed algorithm was trained via deep reinforcement learning (DRL). This enables the robot to learn by itself from its experiences and then fit a mathematical model by updating the parameters of a neural network. In order to avoid damage to the real robot during the learning phase, a virtual robot was trained inside a Gazebo simulation environment with obstacles. The mathematical model for the collision avoidance strategy obtained through DRL was then validated on a locomotion module prototype of STORM. This thesis also introduces the overall STORM architecture and provides a brief overview of the generalized software architecture designed for the STORM modules. The software architecture has expandable and reusable features that apply well to the swarm architecture while allowing for design efficiency and parallel development.
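For a sense of how a learned avoidance policy maps sensor data directly to actions, the sketch below defines a small network over range readings; the layer sizes, beam count, and action set are assumptions for illustration rather than the architecture trained for STORM.

```python
import torch
import torch.nn as nn

# Hypothetical policy: maps laser-scan ranges to discrete motion commands.
# Layer sizes and the action set are illustrative, not the thesis's exact design.
ACTIONS = ["forward", "turn_left", "turn_right", "stop"]

class AvoidancePolicy(nn.Module):
    def __init__(self, n_beams=24, n_actions=len(ACTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_beams, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),   # Q-values (or logits), one per action
        )

    def forward(self, ranges):
        return self.net(ranges)

policy = AvoidancePolicy()
scan = torch.rand(1, 24) * 5.0          # fake 24-beam scan, ranges in metres
q_values = policy(scan)
action = ACTIONS[int(q_values.argmax(dim=1))]
print("chosen action:", action)
```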
203

Leveraging machine learning for managing prefetchers and designing secure standard cells

Eris, Furkan 23 May 2022 (has links)
Machine Learning (ML) has gained prominence in recent years and is currently being used in a wide range of applications. Researchers have achieved impressive results, at or beyond human levels, in image processing, voice recognition, and natural language processing applications. Over the past several years, there has also been a lot of work on designing efficient hardware for ML applications. Having realized the power of ML, researchers are lately exploring the use of ML for designing computing systems themselves. In this thesis, we propose two ML-based design and management approaches: in the first, we use ML algorithms to improve hardware prefetching in processors; in the second, we leverage Reinforcement Learning (RL)-based algorithms to automatically insert nanoantennas into standard cell libraries to secure them against Hardware Trojans (HTs). In the first approach, we propose using ML to manage prefetchers and in turn improve processor performance. Classically, prefetcher improvements have focused on either adding new prefetchers to an existing hybrid prefetching system (a system made up of one or more prefetchers) or increasing the complexity of the existing prefetchers. Both approaches increase the number of prefetcher system configurations (PSCs). Here, a PSC is a particular setting for each prefetcher, such as whether it is on or off, or, for more complex prefetchers, settings such as its aggressiveness level. While the choice of PSC for the hybrid prefetching system can be statically optimized for the average case, there are still opportunities to improve performance at runtime. To this end, we propose a prefetcher manager called Puppeteer that enables dynamic configuration of existing prefetchers. Puppeteer uses a suite of decision trees to adapt PSCs at runtime. We extensively test Puppeteer using a cycle-accurate simulator across 232 traces and show up to 46.0% instructions-per-cycle (IPC) improvement over no prefetching in 1C, 25.8% in 4C, and 11.9% in 8C configurations. We design Puppeteer using pruning methods that reduce the hardware overhead to only a few KB of storage, ensuring feasibility. In the second approach, we propose SecRLCAD, an RL-based Computer-Aided-Design (CAD) flow to secure standard cell libraries. The chip supply chain has become globalized, which has raised security concerns since each step of chip design, fabrication, and testing is now prone to attacks. Prior work has shown that an HT in the form of a single capacitor with a couple of gates can be inserted during the fabrication step and later be used to gain privileged access to a processor. To combat such an inserted HT, nanoantennas can be inserted strategically in standard cells to create an optical signature of the chip. However, inserting these nanoantennas is difficult and time-consuming. To aid human designers in speeding up the design of secure standard cells, we design an RL-based flow to insert nanoantennas into each standard cell in a library. We evaluate our flow using the Nangate FreePDK 45nm library and can secure it, generating a clean library with an average area increase of 56%. / 2023-05-23T00:00:00Z
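A rough sketch of the decision-tree-style PSC selection described above, with hypothetical runtime counters and PSC labels standing in for Puppeteer's actual features and configurations:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical prefetcher system configurations (PSCs): which prefetchers are on
# and how aggressive they are. The real Puppeteer PSC encoding is not reproduced here.
PSCS = ["all_off", "next_line_only", "stride_aggressive", "stride+stream"]

# Fake training data: rows are runtime counters (e.g. L2 miss rate, memory bandwidth
# utilization, IPC of the last interval); labels are the best-performing PSC.
rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = rng.integers(0, len(PSCS), size=500)

manager = DecisionTreeClassifier(max_depth=4)   # a shallow tree keeps hardware cost low
manager.fit(X, y)

# At runtime, sample counters each interval and switch to the predicted PSC.
counters = np.array([[0.35, 0.70, 1.2]])
print("selected PSC:", PSCS[int(manager.predict(counters)[0])])
```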
204

Combinatorial Optimization with Pointer Networks and Reinforcement Learning

Holmberg, Axel, Hansson, Wilhelm January 2021 (has links)
Given the complexity and range of combinatorial optimization problems, solving them can be computationally easy or hard. There are many ways to solve them, but all available methods share a problem: they take a long time to run and have to be rerun whenever new cases are introduced. Machine learning could prove a viable approach to combinatorial optimization because models can learn and generalize, eliminating the need to run a complex algorithm every time a new instance is presented. Uniter is a management consulting firm that provides services within product modularization. Product modularization makes it possible to create many different product variations based on customer needs, and finding the best combination for a specific customer's need requires solving a combinatorial optimization problem. Motivated by Uniter's needs, this thesis sought to develop and evaluate a machine learning model based on a Pointer Network architecture and trained using Reinforcement Learning. The task was to find the combination of parts yielding the lowest cost, given a use case. Each use case had different attributes that specified the need for the final product. For each use case, the model was tasked with selecting the most appropriate combination from a set of 4000 distinct combinations. Three experiments were conducted: whether the model could suggest an optimal solution after being trained on a single use case, whether it could suggest an optimal solution for a previously seen use case, and whether it could do so for an unseen use case. A single data set was used for all experiments. The suggested model was compared to three baselines: a weighted random selection, a naive model implementing a feed-forward network, and an exhaustive search. The results showed that the proposed model could not suggest an optimal solution in any of the experiments. In most tests conducted, the proposed model was also significantly slower at suggesting a solution than any baseline. The proposed model had high accuracy in all experiments, meaning it almost exclusively suggested feasible solutions within the allowed solution space. However, when the model converged, it suggested only one combination for every use case, with the feed-forward baseline showing the same behavior. This behavior suggests that the model misinterpreted the task and identified a solution that would work in most cases instead of suggesting the optimal solution for each use case. The discussion concludes that an exhaustive search is preferable for the studied data set, and that an alternative approach using supervised learning may be a better solution.
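The sketch below shows the REINFORCE-style training loop such a model relies on, using a plain softmax scorer over candidate combinations as a stand-in for the Pointer Network; the costs, candidate count, and learning rate are illustrative assumptions.

```python
import numpy as np

# Toy setup: pick one of N candidate combinations to minimize cost for a use case.
# A plain softmax scorer stands in for the thesis's Pointer Network; the costs,
# baseline, and learning rate below are illustrative assumptions.
rng = np.random.default_rng(1)
N = 50
costs = rng.uniform(10, 100, size=N)       # cost of each combination (unknown to the agent)
theta = np.zeros(N)                        # one logit per combination
lr = 0.05

for episode in range(5000):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    choice = rng.choice(N, p=probs)
    reward = -costs[choice]                # lower cost => higher reward
    baseline = -costs.mean()               # simple baseline reduces gradient variance
    # REINFORCE: grad of log pi(choice) w.r.t. theta is one_hot(choice) - probs
    grad = -probs
    grad[choice] += 1.0
    theta += lr * (reward - baseline) * grad

print("best known combination:", costs.argmin(), "model's choice:", theta.argmax())
```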
205

Remembering the past to predict the future: a scale-invariant timeline for memory and anticipation

Goh, Wei Zhong 14 March 2022 (has links)
To guide action, animals anticipate what events will occur, and when they will occur, based on experience. How animals anticipate future events is an unsettled question. Although reinforcement learning is often used to model anticipation, it is resource-intensive outside of the simplest scenarios. In this dissertation, I show evidence of memory that is persistent and carries timing information, and specify an algorithm for how animals might anticipate the identity and timing of future events. This dissertation consists of two studies. In the first study, I found that identity and timing of remembered odors are jointly represented in the same cells in the dentate gyrus and lateral entorhinal cortex. Further, odor memories persist well after new odors emerge. The study analyzed results from an experiment conducted by Woods et al. (2020) on mice passively exposed to separate odors for a period of 20 s per exposure. The results are consistent with a memory framework known as timing using inverse Laplace transform (TILT). In the second study, I constructed a computational algorithm based on the TILT memory framework to anticipate the identity and timing of future events. The algorithm generates predictions based on memories of past events, and stored associations between cues and outcomes. The algorithm is resource-efficient even when the future depends on the indefinite past. The algorithm is scale-invariant and works well with chains of events. Together, the studies support a novel computational mechanism which anticipates what events will occur, and when they will occur. The algorithm could be applied in machine learning in cases of long-range dependence on history. These studies predict that behavioral and neural responses of animals could depend on events well into the past. / 2024-03-13T00:00:00Z
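A minimal sketch of the Laplace-domain (leaky-integrator) memory stage underlying a TILT-style model is shown below; the decay rates, time step, and toy stimulus are assumptions, and the inverse-transform step that recovers an explicit timeline is omitted.

```python
import numpy as np

# A bank of leaky integrators with log-spaced decay rates s implements a
# Laplace-domain memory: dF/dt = -s * F + f(t). Decay rates, time step, and the
# toy stimulus are illustrative; the inverse-transform step that reconstructs a
# "what happened when" timeline is omitted here.
s = np.logspace(-2, 1, 30)        # decay rates spanning sub-second to ~minute scales
F = np.zeros_like(s)              # memory state, one unit per decay rate
dt = 0.1

def step(F, stimulus):
    return F + dt * (-s * F + stimulus)

# Present an odor-like pulse, then let the memory evolve without input.
for t in np.arange(0, 60, dt):
    stimulus = 1.0 if t < 20.0 else 0.0   # 20 s exposure, as in the passive-odor paradigm
    F = step(F, stimulus)

# Slowly decaying units still carry the event long after it ended.
print("memory trace across decay rates:", np.round(F, 3))
```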
206

Multi-Agent Reinforcement Learning: Analysis and Application

Paulo Cesar Heredia (12428121) 20 April 2022 (has links)
With the increasing availability of data and the rise of networked systems such as autonomous vehicles, drones, and smart grids, the application of data-driven machine learning methods to multi-agent systems has become an important topic. In particular, reinforcement learning has gained a lot of popularity due to its similarities with optimal control, with the potential of allowing us to develop optimal control systems using only observed data and without the need for a model of a system's state dynamics. In this thesis work, we explore the application of reinforcement learning to multi-agent systems, which is known as multi-agent reinforcement learning (MARL). We have developed algorithms that address some challenges in the cooperative setting of MARL. We have also worked on better understanding the convergence guarantees of some known multi-agent reinforcement learning algorithms, which combine reinforcement learning with distributed consensus methods. And, with the aim of making MARL better suited to real-world problems, we have developed algorithms that address some practical challenges with MARL and have applied MARL to a real-world problem.

In the first part of this thesis, we focus on developing algorithms to address some open problems in MARL. One of these challenges is learning with output feedback, which is known as partial observability in the reinforcement learning literature. One of the main assumptions of reinforcement learning in the single-agent case is that the agent can fully observe the state of the plant it is controlling (we note that the "plant" is often referred to as the "environment" in the reinforcement learning literature; we use these terms interchangeably). In the single-agent case this assumption can be reasonable, since it only requires one agent to fully observe its environment. In the multi-agent setting, however, this assumption would require all agents to fully observe the state; furthermore, since each agent can affect the plant (or environment) with its actions, the assumption would also require that agents know the actions of other agents. We have also developed algorithms to address practical issues that may arise when applying reinforcement learning (RL) or MARL to large-scale real-world systems. One such algorithm is a distributed reinforcement learning algorithm that allows us to learn in cases where the states and actions are both continuous and of large dimensionality, which is the case for many real-world applications. Without the ability to handle continuous states and actions, many algorithms require discretization, which for high-dimensional systems can become impractical. We have also developed a distributed reinforcement learning algorithm that addresses the data scalability of RL. By data scalability we mean how to learn from a very large dataset that cannot be efficiently processed by a single agent with limited resources.

In the second part of this thesis, we provide a finite-sample analysis of some distributed reinforcement learning algorithms. By finite-sample analysis, we mean that we provide an upper bound on the squared error of the algorithm for a given iteration, or equivalently, since each iteration uses one data sample, an upper bound on the squared error for a given number of data samples used. This type of analysis had been missing in the MARL literature, where most works have only provided asymptotic results for their proposed algorithms, which only tell us how the algorithmic error behaves as the number of samples goes to infinity.

The third part of this thesis focuses on applications with real-world systems. We have explored a real-world problem, namely transactive energy systems (TES), which can be represented as a multi-agent system. We have applied various reinforcement learning algorithms with the aim of learning an optimal control policy for this system. Through simulations, we have compared the performance of these algorithms and illustrated the effect of partial observability (output feedback) compared to full state feedback.

In the last part we present some other work; specifically, we present a distributed observer that aims to address learning with output feedback by estimating the state. The proposed algorithm is designed so that we do not require a complete model of the state dynamics; instead, we use a parameterized model whose parameters are estimated along with the state.
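As a concrete illustration of the consensus-plus-RL algorithms analyzed in this line of work, the sketch below runs distributed TD(0) in which each agent makes a local update from its private reward and then averages parameters with its neighbours; the three-agent ring, features, and rewards are toy assumptions.

```python
import numpy as np

# Toy sketch of consensus-based multi-agent TD(0): each agent sees a private
# reward, performs a local TD update on a shared linear value function, then
# mixes its parameters with its neighbours'. The 3-agent ring, features, and
# rewards are illustrative assumptions.
rng = np.random.default_rng(0)
n_agents, n_states, n_feat = 3, 5, 4
phi = rng.random((n_states, n_feat))                    # shared state features
P = rng.dirichlet(np.ones(n_states), size=n_states)     # common transition matrix
rewards = rng.random((n_agents, n_states))              # each agent's private reward
W = np.array([[0.5, 0.25, 0.25],                        # doubly stochastic mixing matrix (ring)
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])

w = np.zeros((n_agents, n_feat))
alpha, gamma = 0.05, 0.9
state = 0
for step in range(20_000):
    next_state = rng.choice(n_states, p=P[state])
    for i in range(n_agents):
        td_error = (rewards[i, state]
                    + gamma * phi[next_state] @ w[i]
                    - phi[state] @ w[i])
        w[i] += alpha * td_error * phi[state]
    w = W @ w                                           # consensus step
    state = next_state

print("parameter disagreement across agents:", np.linalg.norm(w - w.mean(axis=0)))
```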
207

Learning enhances encoding of time and temporal surprise in primary sensory cortex

Rabinovich, Rebecca January 2022 (has links)
Primary sensory cortex has long been believed to play a straightforward role in the initial processing of sensory information. Yet the superficial layers of cortex are overall sparsely active, even during strong sensory stimulation; moreover, cortical activity is influenced by other modalities, task context, reward, and behavioral state. The experiments described in this thesis demonstrate that reinforcement learning dramatically alters representations among longitudinally imaged neurons in the superficial layers of mouse primary somatosensory cortex. Cells were confirmed to be sparsely active in naïve animals; however, learning an object detection task recruited previously unresponsive neurons, enlarging the neuronal population sensitive to tactile stimuli. In contrast, cortical responses habituated, decreasing upon repeated exposure to unrewarded stimuli. In addition, after conditioning, the cell population, as well as individual neurons, better encoded the rewarded stimuli and behavioral choice. Furthermore, in well-trained mice, the neuronal population encoded the passage of time, and we found evidence that this temporal information was contained in sequences of cell activity, meaning that different cells in the population activated at different moments within the trial. This kind of time-keeping was not observed in naïve animals, nor did it arise after repeated stimulus exposure. Finally, unexpected deviations in trial timing elicited even stronger responses than touch did. In conclusion, the superficial layers of sensory cortex exhibit a high degree of learning-dependent plasticity and are strongly modulated by non-sensory but behaviorally relevant features, such as timing and surprise.
208

Reinforcement Learning of Repetitive Tasks for Autonomous Heavy-Duty Vehicles

Lindesvik Warma, Simon January 2020 (has links)
Many industrial applications of heavy-duty autonomous vehicles involve repetitive manoeuvres, such as vehicle parking and hub-to-hub transportation. This thesis explores the possibility of using information from previous executions of specific manoeuvres, via reinforcement learning, to improve performance in future iterations. The manoeuvres are a straight-line path and a constantly curved path. A proportional-integral control strategy is designed to control the vehicle, and the controller is updated between each iteration using a policy gradient method. A rejection sampling procedure is introduced to ensure the stability of the control system; this is necessary since the general reinforcement learning and policy gradient frameworks do not consider stability. The performance of the rejection sampling procedure is improved using ideas from simulated annealing. The performance improvement of the vehicle is evaluated through simulations. Linear and nonlinear vehicle models are evaluated on a straight-line path and a constantly curved path. The simulations show that the vehicle improves its ability to track the reference path for all evaluation models and scenarios. Finally, the simulations also show that the controlled system remains stable throughout the learning process. / Autonomous vehicles are an important piece of the puzzle for future transport solutions and industrial environments, in terms of both climate and safety. Many of the manoeuvres industrial vehicles perform are repetitive, for example parking. This work explores the possibility of learning from previous attempts at a manoeuvre to improve the vehicle's ability to perform it. A proportional-integral control structure is used to steer the vehicle; the structure is a state feedback in which the controller consists of two proportional-integral controllers. The control system is initialized to be stable and the vehicle is allowed to perform one iteration of the manoeuvre. The controller is then updated between iterations using reinforcement learning, that is, information from previous attempts at the manoeuvre is used to improve the vehicle's ability to follow the reference path: the reinforcement learning prescribes how the controller should be updated based on how the vehicle performed during the previous iteration. A sampling procedure is implemented to ensure the stability of the control system, since the reinforcement learning does not take this into account; the procedure is also designed to minimize its negative effects on the learning process. The algorithm is analysed by simulating the vehicle with both linear and nonlinear evaluation models in two scenarios: a straight path and a path with constant curvature. The simulations show that the vehicle improves its ability to follow the reference paths for all evaluation models, and that the control system remains stable throughout the learning process.
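A minimal sketch of the iteration structure described here, assuming a first-order tracking-error model: PI gains are perturbed by a Gaussian search policy, unstable samples are rejected before being "executed", and the mean gains are updated with a REINFORCE-style policy gradient. The plant, cost, and hyperparameters are illustrative, not the thesis's vehicle model.

```python
import numpy as np

# Illustrative first-order tracking-error plant: e[k+1] = a*e[k] - b*u[k],
# controlled by u = kp*e + ki*z, where z integrates the error.
rng = np.random.default_rng(0)
a, b = 0.95, 0.5

def closed_loop(kp, ki):
    # state = [error, error integral]
    return np.array([[a - b * kp, -b * ki],
                     [1.0,         1.0  ]])

def stable(kp, ki):
    return np.max(np.abs(np.linalg.eigvals(closed_loop(kp, ki)))) < 1.0

def cost(kp, ki, e0=1.0, horizon=100):
    A = closed_loop(kp, ki)
    x = np.array([e0, 0.0])
    total = 0.0
    for _ in range(horizon):
        total += x[0] ** 2
        x = A @ x
    return total

mean = np.array([0.2, 0.05])           # initial (kp, ki), chosen to be stabilizing
sigma, lr = 0.05, 1e-4
baseline = None

for iteration in range(500):
    # Rejection sampling: only stabilizing gains are "executed on the vehicle".
    while True:
        sample = mean + sigma * rng.standard_normal(2)
        if stable(*sample):
            break
    reward = -cost(*sample)
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    # REINFORCE for a Gaussian policy: grad of log pi(sample) is (sample - mean) / sigma^2
    mean += lr * (reward - baseline) * (sample - mean) / sigma ** 2

print("tuned gains (kp, ki):", np.round(mean, 3), "stable:", stable(*mean))
```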
209

Intelligent Device Selection in Federated Edge Learning with Energy Efficiency

Peng, Cheng 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Due to the increasing demand from mobile devices for real-time responses from cloud computing services, federated edge learning (FEL) has emerged as a new computing paradigm that utilizes edge devices to achieve efficient machine learning while protecting their data privacy. Implementing efficient FEL suffers from the challenges of devices' limited computing and communication resources, as well as unevenly distributed datasets, which has inspired several existing studies focusing on device selection to optimize time consumption and data diversity. However, these studies fail to consider the energy consumption of edge devices given their limited power supply, which can seriously affect the cost-efficiency of FEL through unexpected device dropouts. To fill this gap, we propose a device selection model capturing both energy consumption and data diversity optimization, under constraints on time consumption and the amount of training data. We then solve the optimization problem by reformulating the original model and designing a novel algorithm, named E2DS, which greatly reduces the time complexity. By comparing with two classical FEL schemes, we validate the superiority of our proposed device selection mechanism for FEL with extensive experimental results. Furthermore, for each device in a real FEL environment, multiple tasks occupy the CPU at the same time, so the CPU frequency used for training fluctuates constantly, which may lead to large errors in computing energy consumption. To solve this problem, we deploy reinforcement learning to learn the frequency so that it approaches the real value. Compared to increasing data diversity, we also consider a more direct way to improve convergence speed, using loss values. We then formulate the optimization problem that minimizes the energy consumption and maximizes the loss values to select the appropriate set of devices. After reformulating the problem, we design a new algorithm, FCE2DS, which achieves better convergence speed and accuracy. Finally, we compare the performance of the proposed scheme with the previous scheme and the traditional scheme to verify its improvement in multiple aspects.
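As a rough illustration of the selection step such schemes perform each round, the sketch below greedily picks devices that trade off a loss-based utility against energy under a time budget; the utility, energy/latency model, and greedy rule are assumptions and do not reproduce E2DS or FCE2DS.

```python
import numpy as np

# Sketch of one device-selection round: pick edge devices that maximize a utility
# (per-device training loss as a proxy for useful updates) while penalizing energy
# and respecting a round-time budget. All numbers and the greedy rule are illustrative.
rng = np.random.default_rng(2)
n_devices = 20
loss_value = rng.uniform(0.1, 1.0, n_devices)    # higher loss => more to learn from
energy = rng.uniform(1.0, 5.0, n_devices)        # Joules per local training round
latency = rng.uniform(0.5, 3.0, n_devices)       # seconds per local round
time_budget = 5.0                                 # deadline for the parallel round (s)
trade_off = 0.3                                   # weight on energy in the score

score = loss_value - trade_off * energy / energy.max()
selected = [i for i in np.argsort(-score) if latency[i] <= time_budget][:8]
print("selected devices:", selected)
print("round energy:", round(float(energy[selected].sum()), 2), "J")
```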
210

Multi-criteria decision making using reinforcement learning and its application to food, energy, and water systems (FEWS) problem

Deshpande, Aishwarya 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Multi-criteria decision making (MCDM) methods have evolved over the past several decades. In today's world of rapidly growing industries, MCDM has proven to be significant in many application areas. In this study, a decision-making model is devised using reinforcement learning to solve multi-criteria optimization problems. A learning automata algorithm is used to identify an optimal solution in the presence of single and multiple environments (criteria) using Pareto optimality. The application of this model is also discussed, where it provides an optimal solution to the food, energy, and water systems (FEWS) problem.
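A minimal sketch of a linear reward-inaction learning automaton interacting with a single stochastic environment (criterion) is given below; the reward probabilities and learning rate are illustrative, and the extension to multiple environments with a Pareto check over actions is omitted.

```python
import numpy as np

# Linear reward-inaction (L_RI) learning automaton for one environment (criterion).
# Reward probabilities and the learning rate are illustrative assumptions.
rng = np.random.default_rng(3)
reward_prob = np.array([0.2, 0.5, 0.8])   # environment's reward probability per action
p = np.ones(3) / 3                        # automaton's action probabilities
lam = 0.01                                # learning rate

for step in range(20_000):
    action = rng.choice(3, p=p)
    rewarded = rng.random() < reward_prob[action]
    if rewarded:                          # L_RI updates only when rewarded
        p = (1 - lam) * p                 # shrink all probabilities...
        p[action] += lam                  # ...then boost the rewarded action

print("final action probabilities:", np.round(p, 3))
```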
