  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Towards Superintelligence-Driven Autonomous Network Operation Centers Using Reinforcement Learning

Altamimi, Basel 25 October 2021 (has links)
Today's Network Operation Centers (NOCs) consist of teams of network professionals responsible for monitoring their network's health and taking action to maintain it. Most of these NOC actions are relatively complex and executed manually; only the simplest tasks can be automated with rules-based software. But today's networks are getting larger and more complex. Deciding what action to take in the face of non-trivial problems has therefore essentially become an art that depends on the collective human intelligence of NOC technicians, specialized support teams organized by technology domain, and vendors' technical support. This model is getting increasingly expensive and inefficient, and the automation of all, or at least some, NOC tasks is now considered a desirable step towards autonomous and self-healing networks. In this work, we investigate whether such decisions can be taken by Artificial Intelligence instead of collective human intelligence, specifically by Deep Reinforcement Learning (DRL), which has been shown to outperform humans in computer games. We build an Action Recommendation Engine (ARE) based on RL, train it with expert rules or by letting it explore outcomes on its own, and show that it can learn new, more efficient strategies that outperform expert rules designed by humans by as much as 25%.
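As an illustration of the learning loop such an engine could use, the sketch below trains a tabular Q-learning agent to map coarse network-health states to remediation actions; the state set, action set, and toy reward model are illustrative assumptions, not the ARE design from the thesis.

```python
import random
from collections import defaultdict

# Hypothetical NOC state/action spaces; the thesis's actual ARE design is not reproduced here.
STATES = ["healthy", "high_latency", "packet_loss", "link_down"]
ACTIONS = ["no_op", "restart_interface", "reroute_traffic", "escalate_to_vendor"]

def simulate_outcome(state, action):
    """Toy environment: returns (reward, next_state). Stands in for real network feedback."""
    if state == "healthy":
        return (1.0 if action == "no_op" else -0.5), "healthy"
    if action == "reroute_traffic":
        return 0.8, "healthy"
    if action == "escalate_to_vendor":
        return 0.2, "healthy"
    return -0.2, state

q = defaultdict(float)                 # Q[(state, action)]
alpha, gamma, eps = 0.1, 0.9, 0.1

state = random.choice(STATES)
for step in range(10_000):
    # epsilon-greedy action selection
    if random.random() < eps:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
    reward, next_state = simulate_outcome(state, action)
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    # occasionally jump to a random state so all states keep being visited
    state = next_state if random.random() > 0.3 else random.choice(STATES)

for s in STATES:
    print(s, "->", max(ACTIONS, key=lambda a: q[(s, a)]))
```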
202

Mobile Robot Obstacle Avoidance based on Deep Reinforcement Learning

Feng, Shumin January 2018 (has links)
Obstacle avoidance is one of the core problems in the field of autonomous navigation. An obstacle avoidance approach is developed for the navigation task of a reconfigurable multi-robot system named STORM, which stands for Self-configurable and Transformable Omni-Directional Robotic Modules. Various mathematical models have been developed in previous work in this field to avoid collisions for such robots. In this work, the proposed collision avoidance algorithm is trained via Deep Reinforcement Learning, which enables the robot to learn by itself from its experiences and then fit a mathematical model by updating the parameters of a neural network. The trained neural network is capable of choosing an action directly from the input sensor data. A virtual STORM locomotion module was trained to explore a Gazebo simulation environment without collision, using the proposed DRL-based collision avoidance strategies. The mathematical model of the avoidance algorithm was derived from the simulation, applied to the prototype of the locomotion module, and validated via experiments. A universal software architecture was also designed for the STORM modules. The software architecture has extensible and reusable features that improve design efficiency and enable parallel development. / Master of Science / In this thesis, an obstacle avoidance approach is described to enable autonomous navigation of a reconfigurable multi-robot system, STORM. The Self-configurable and Transformable Omni-Directional Robotic Modules (STORM) are a novel approach towards heterogeneous swarm robotics. The system has two types of robotic modules, namely the locomotion module and the manipulation module. Each module is able to navigate and perform tasks independently. In addition, the modules are designed to autonomously dock together to perform tasks that they are individually unable to accomplish. The proposed obstacle avoidance approach is designed for the modules of STORM, but can be applied to mobile robots in general. In contrast to existing collision avoidance approaches, the proposed algorithm was trained via deep reinforcement learning (DRL). This enables the robot to learn by itself from its experiences and then fit a mathematical model by updating the parameters of a neural network. In order to avoid damage to the real robot during the learning phase, a virtual robot was trained inside a Gazebo simulation environment with obstacles. The mathematical model for the collision avoidance strategy obtained through DRL was then validated on a locomotion module prototype of STORM. This thesis also introduces the overall STORM architecture and provides a brief overview of the generalized software architecture designed for the STORM modules. The software architecture has expandable and reusable features that apply well to the swarm architecture while allowing for design efficiency and parallel development.
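For a sense of how a learned avoidance policy maps sensor data directly to actions, the sketch below defines a small network over range readings; the layer sizes, beam count, and action set are assumptions for illustration rather than the architecture trained for STORM.

```python
import torch
import torch.nn as nn

# Hypothetical policy: maps laser-scan ranges to discrete motion commands.
# Layer sizes and the action set are illustrative, not the thesis's exact design.
ACTIONS = ["forward", "turn_left", "turn_right", "stop"]

class AvoidancePolicy(nn.Module):
    def __init__(self, n_beams=24, n_actions=len(ACTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_beams, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),   # Q-values (or logits), one per action
        )

    def forward(self, ranges):
        return self.net(ranges)

policy = AvoidancePolicy()
scan = torch.rand(1, 24) * 5.0          # fake 24-beam scan, ranges in metres
q_values = policy(scan)
action = ACTIONS[int(q_values.argmax(dim=1))]
print("chosen action:", action)
```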
203

Leveraging machine learning for managing prefetchers and designing secure standard cells

Eris, Furkan 23 May 2022 (has links)
Machine Learning (ML) has gained prominence in recent years and is currently being used in a wide range of applications. Researchers have achieved impressive results, at or beyond human levels, in image processing, voice recognition, and natural language processing applications. Over the past several years, there has also been a lot of work on designing efficient hardware for ML applications. Having realized the power of ML, researchers are lately exploring the use of ML for designing computing systems themselves. In this thesis, we propose two ML-based design and management approaches: in the first, we use ML algorithms to improve hardware prefetching in processors; in the second, we leverage Reinforcement Learning (RL)-based algorithms to automatically insert nanoantennas into standard cell libraries to secure them against Hardware Trojans (HTs). In the first approach, we propose using ML to manage prefetchers and in turn improve processor performance. Classically, prefetcher improvements have focused on either adding new prefetchers to an existing hybrid prefetching system (a system made up of one or more prefetchers) or increasing the complexity of the existing prefetchers. Both approaches increase the number of prefetcher system configurations (PSCs). Here, a PSC is a particular setting for each prefetcher, such as whether it is on or off, or, for more complex prefetchers, settings such as its aggressiveness level. While the choice of PSC for the hybrid prefetching system can be statically optimized for the average case, there are still opportunities to improve performance at runtime. To this end, we propose a prefetcher manager called Puppeteer that enables dynamic configuration of existing prefetchers. Puppeteer uses a suite of decision trees to adapt PSCs at runtime. We extensively test Puppeteer using a cycle-accurate simulator across 232 traces and show up to 46.0% instructions-per-cycle (IPC) improvement over no prefetching in 1C, 25.8% in 4C, and 11.9% in 8C configurations. We design Puppeteer using pruning methods that reduce the hardware overhead to only a few KB of storage, ensuring feasibility. In the second approach, we propose SecRLCAD, an RL-based Computer-Aided-Design (CAD) flow to secure standard cell libraries. The chip supply chain has become globalized, which has raised security concerns since each step of chip design, fabrication, and testing is now prone to attacks. Prior work has shown that an HT in the form of a single capacitor with a couple of gates can be inserted during the fabrication step and later be used to gain privileged access to a processor. To combat such an inserted HT, nanoantennas can be inserted strategically in standard cells to create an optical signature of the chip. However, inserting these nanoantennas is difficult and time-consuming. To aid human designers in speeding up the design of secure standard cells, we design an RL-based flow to insert nanoantennas into each standard cell in a library. We evaluate our flow using the Nangate FreePDK 45nm library and can secure it, generating a clean library with an average area increase of 56%. / 2023-05-23T00:00:00Z
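A rough sketch of the decision-tree-style PSC selection described above, with hypothetical runtime counters and PSC labels standing in for Puppeteer's actual features and configurations:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical prefetcher system configurations (PSCs): which prefetchers are on
# and how aggressive they are. The real Puppeteer PSC encoding is not reproduced here.
PSCS = ["all_off", "next_line_only", "stride_aggressive", "stride+stream"]

# Fake training data: rows are runtime counters (e.g. L2 miss rate, memory bandwidth
# utilization, IPC of the last interval); labels are the best-performing PSC.
rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = rng.integers(0, len(PSCS), size=500)

manager = DecisionTreeClassifier(max_depth=4)   # a shallow tree keeps hardware cost low
manager.fit(X, y)

# At runtime, sample counters each interval and switch to the predicted PSC.
counters = np.array([[0.35, 0.70, 1.2]])
print("selected PSC:", PSCS[int(manager.predict(counters)[0])])
```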
204

Combinatorial Optimization with Pointer Networks and Reinforcement Learning

Holmberg, Axel, Hansson, Wilhelm January 2021 (has links)
Given the complexity and range of combinatorial optimization problems, solving them can be computationally easy or hard. There are many ways to solve them, but all available methods share a problem: they take a long time to run and have to be rerun whenever new cases are introduced. Machine learning could prove a viable approach to combinatorial optimization because models can learn and generalize, eliminating the need to run a complex algorithm every time a new instance is presented. Uniter is a management consulting firm that provides services within product modularization. Product modularization makes it possible to create many different product variations based on customer needs, and finding the best combination for a specific customer's need requires solving a combinatorial optimization problem. Motivated by Uniter's needs, this thesis sought to develop and evaluate a machine learning model based on a Pointer Network architecture and trained using Reinforcement Learning. The task was to find the combination of parts yielding the lowest cost, given a use case. Each use case had different attributes that specified the need for the final product. For each use case, the model was tasked with selecting the most appropriate combination from a set of 4000 distinct combinations. Three experiments were conducted: whether the model could suggest an optimal solution after being trained on a single use case, whether it could suggest an optimal solution for a previously seen use case, and whether it could do so for an unseen use case. A single data set was used for all experiments. The suggested model was compared to three baselines: a weighted random selection, a naive model implementing a feed-forward network, and an exhaustive search. The results showed that the proposed model could not suggest an optimal solution in any of the experiments. In most tests conducted, the proposed model was also significantly slower at suggesting a solution than any baseline. The proposed model had high accuracy in all experiments, meaning it almost exclusively suggested feasible solutions within the allowed solution space. However, when the model converged, it suggested only one combination for every use case, with the feed-forward baseline showing the same behavior. This behavior suggests that the model misinterpreted the task and identified a solution that would work in most cases instead of suggesting the optimal solution for each use case. The discussion concludes that an exhaustive search is preferable for the studied data set, and that an alternative approach using supervised learning may be a better solution.
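The sketch below shows the REINFORCE-style training loop such a model relies on, using a plain softmax scorer over candidate combinations as a stand-in for the Pointer Network; the costs, candidate count, and learning rate are illustrative assumptions.

```python
import numpy as np

# Toy setup: pick one of N candidate combinations to minimize cost for a use case.
# A plain softmax scorer stands in for the thesis's Pointer Network; the costs,
# baseline, and learning rate below are illustrative assumptions.
rng = np.random.default_rng(1)
N = 50
costs = rng.uniform(10, 100, size=N)       # cost of each combination (unknown to the agent)
theta = np.zeros(N)                        # one logit per combination
lr = 0.05

for episode in range(5000):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    choice = rng.choice(N, p=probs)
    reward = -costs[choice]                # lower cost => higher reward
    baseline = -costs.mean()               # simple baseline reduces gradient variance
    # REINFORCE: grad of log pi(choice) w.r.t. theta is one_hot(choice) - probs
    grad = -probs
    grad[choice] += 1.0
    theta += lr * (reward - baseline) * grad

print("best known combination:", costs.argmin(), "model's choice:", theta.argmax())
```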
205

Remembering the past to predict the future: a scale-invariant timeline for memory and anticipation

Goh, Wei Zhong 14 March 2022 (has links)
To guide action, animals anticipate what events will occur, and when they will occur, based on experience. How animals anticipate future events is an unsettled question. Although reinforcement learning is often used to model anticipation, it is resource-intensive outside of the simplest scenarios. In this dissertation, I show evidence of memory that is persistent and carries timing information, and specify an algorithm for how animals might anticipate the identity and timing of future events. This dissertation consists of two studies. In the first study, I found that identity and timing of remembered odors are jointly represented in the same cells in the dentate gyrus and lateral entorhinal cortex. Further, odor memories persist well after new odors emerge. The study analyzed results from an experiment conducted by Woods et al. (2020) on mice passively exposed to separate odors for a period of 20 s per exposure. The results are consistent with a memory framework known as timing using inverse Laplace transform (TILT). In the second study, I constructed a computational algorithm based on the TILT memory framework to anticipate the identity and timing of future events. The algorithm generates predictions based on memories of past events, and stored associations between cues and outcomes. The algorithm is resource-efficient even when the future depends on the indefinite past. The algorithm is scale-invariant and works well with chains of events. Together, the studies support a novel computational mechanism which anticipates what events will occur, and when they will occur. The algorithm could be applied in machine learning in cases of long-range dependence on history. These studies predict that behavioral and neural responses of animals could depend on events well into the past. / 2024-03-13T00:00:00Z
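A minimal sketch of the Laplace-domain (leaky-integrator) memory stage underlying a TILT-style model is shown below; the decay rates, time step, and toy stimulus are assumptions, and the inverse-transform step that recovers an explicit timeline is omitted.

```python
import numpy as np

# A bank of leaky integrators with log-spaced decay rates s implements a
# Laplace-domain memory: dF/dt = -s * F + f(t). Decay rates, time step, and the
# toy stimulus are illustrative; the inverse-transform step that reconstructs a
# "what happened when" timeline is omitted here.
s = np.logspace(-2, 1, 30)        # decay rates spanning sub-second to ~minute scales
F = np.zeros_like(s)              # memory state, one unit per decay rate
dt = 0.1

def step(F, stimulus):
    return F + dt * (-s * F + stimulus)

# Present an odor-like pulse, then let the memory evolve without input.
for t in np.arange(0, 60, dt):
    stimulus = 1.0 if t < 20.0 else 0.0   # 20 s exposure, as in the passive-odor paradigm
    F = step(F, stimulus)

# Slowly decaying units still carry the event long after it ended.
print("memory trace across decay rates:", np.round(F, 3))
```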
206

Multi-Agent Reinforcement Learning: Analysis and Application

Paulo Cesar Heredia (12428121) 20 April 2022 (has links)
With the increasing availability of data and the rise of networked systems such as autonomous vehicles, drones, and smart grids, the application of data-driven machine learning methods to multi-agent systems has become an important topic. In particular, reinforcement learning has gained a lot of popularity due to its similarities with optimal control, with the potential of allowing us to develop optimal control systems using only observed data and without the need for a model of a system's state dynamics. In this thesis work, we explore the application of reinforcement learning to multi-agent systems, which is known as multi-agent reinforcement learning (MARL). We have developed algorithms that address some challenges in the cooperative setting of MARL. We have also worked on better understanding the convergence guarantees of some known multi-agent reinforcement learning algorithms, which combine reinforcement learning with distributed consensus methods. And, with the aim of making MARL better suited to real-world problems, we have developed algorithms that address some practical challenges with MARL and have applied MARL to a real-world problem.

In the first part of this thesis, we focus on developing algorithms to address some open problems in MARL. One of these challenges is learning with output feedback, which is known as partial observability in the reinforcement learning literature. One of the main assumptions of reinforcement learning in the single-agent case is that the agent can fully observe the state of the plant it is controlling (we note that the "plant" is often referred to as the "environment" in the reinforcement learning literature; we use these terms interchangeably). In the single-agent case this assumption can be reasonable, since it only requires one agent to fully observe its environment. In the multi-agent setting, however, this assumption would require all agents to fully observe the state; furthermore, since each agent can affect the plant (or environment) with its actions, the assumption would also require that agents know the actions of other agents. We have also developed algorithms to address practical issues that may arise when applying reinforcement learning (RL) or MARL to large-scale real-world systems. One such algorithm is a distributed reinforcement learning algorithm that allows us to learn in cases where the states and actions are both continuous and of large dimensionality, which is the case for many real-world applications. Without the ability to handle continuous states and actions, many algorithms require discretization, which for high-dimensional systems can become impractical. We have also developed a distributed reinforcement learning algorithm that addresses the data scalability of RL. By data scalability we mean how to learn from a very large dataset that cannot be efficiently processed by a single agent with limited resources.

In the second part of this thesis, we provide a finite-sample analysis of some distributed reinforcement learning algorithms. By finite-sample analysis, we mean that we provide an upper bound on the squared error of the algorithm for a given iteration, or equivalently, since each iteration uses one data sample, an upper bound on the squared error for a given number of data samples used. This type of analysis had been missing in the MARL literature, where most works have only provided asymptotic results for their proposed algorithms, which only tell us how the algorithmic error behaves as the number of samples goes to infinity.

The third part of this thesis focuses on applications with real-world systems. We have explored a real-world problem, namely transactive energy systems (TES), which can be represented as a multi-agent system. We have applied various reinforcement learning algorithms with the aim of learning an optimal control policy for this system. Through simulations, we have compared the performance of these algorithms and illustrated the effect of partial observability (output feedback) compared to full state feedback.

In the last part we present some other work; specifically, we present a distributed observer that aims to address learning with output feedback by estimating the state. The proposed algorithm is designed so that we do not require a complete model of the state dynamics; instead, we use a parameterized model whose parameters are estimated along with the state.
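As a concrete illustration of the consensus-plus-RL algorithms analyzed in this line of work, the sketch below runs distributed TD(0) in which each agent makes a local update from its private reward and then averages parameters with its neighbours; the three-agent ring, features, and rewards are toy assumptions.

```python
import numpy as np

# Toy sketch of consensus-based multi-agent TD(0): each agent sees a private
# reward, performs a local TD update on a shared linear value function, then
# mixes its parameters with its neighbours'. The 3-agent ring, features, and
# rewards are illustrative assumptions.
rng = np.random.default_rng(0)
n_agents, n_states, n_feat = 3, 5, 4
phi = rng.random((n_states, n_feat))                    # shared state features
P = rng.dirichlet(np.ones(n_states), size=n_states)     # common transition matrix
rewards = rng.random((n_agents, n_states))              # each agent's private reward
W = np.array([[0.5, 0.25, 0.25],                        # doubly stochastic mixing matrix (ring)
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])

w = np.zeros((n_agents, n_feat))
alpha, gamma = 0.05, 0.9
state = 0
for step in range(20_000):
    next_state = rng.choice(n_states, p=P[state])
    for i in range(n_agents):
        td_error = (rewards[i, state]
                    + gamma * phi[next_state] @ w[i]
                    - phi[state] @ w[i])
        w[i] += alpha * td_error * phi[state]
    w = W @ w                                           # consensus step
    state = next_state

print("parameter disagreement across agents:", np.linalg.norm(w - w.mean(axis=0)))
```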
207

Learning enhances encoding of time and temporal surprise in primary sensory cortex

Rabinovich, Rebecca January 2022 (has links)
Primary sensory cortex has long been believed to play a straightforward role in the initial processing of sensory information. Yet the superficial layers of cortex are overall sparsely active, even during strong sensory stimulation; moreover, cortical activity is influenced by other modalities, task context, reward, and behavioral state. The experiments described in this thesis demonstrate that reinforcement learning dramatically alters representations among longitudinally imaged neurons in the superficial layers of mouse primary somatosensory cortex. Cells were confirmed to be sparsely active in naïve animals; however, learning an object detection task recruited previously unresponsive neurons, enlarging the neuronal population sensitive to tactile stimuli. In contrast, cortical responses habituated, decreasing upon repeated exposure to unrewarded stimuli. In addition, after conditioning, the cell population, as well as individual neurons, better encoded the rewarded stimuli and behavioral choice. Furthermore, in well-trained mice, the neuronal population encoded the passage of time, and we found evidence that this temporal information was contained in sequences of cell activity, meaning that different cells in the population activated at different moments within the trial. This kind of time-keeping was not observed in naïve animals, nor did it arise after repeated stimulus exposure. Finally, unexpected deviations in trial timing elicited even stronger responses than touch did. In conclusion, the superficial layers of sensory cortex exhibit a high degree of learning-dependent plasticity and are strongly modulated by non-sensory but behaviorally relevant features, such as timing and surprise.
208

Reinforcement Learning of Repetitive Tasks for Autonomous Heavy-Duty Vehicles

Lindesvik Warma, Simon January 2020 (has links)
Many industrial applications of heavy-duty autonomous vehicles involve repetitive manoeuvres, such as vehicle parking and hub-to-hub transportation. This thesis explores the possibility of using information from previous executions of specific manoeuvres, via reinforcement learning, to improve performance in future iterations. The manoeuvres are a straight-line path and a constantly curved path. A proportional-integral control strategy is designed to control the vehicle, and the controller is updated between each iteration using a policy gradient method. A rejection sampling procedure is introduced to ensure the stability of the control system; this is necessary since the general reinforcement learning and policy gradient frameworks do not consider stability. The performance of the rejection sampling procedure is improved using ideas from simulated annealing. The performance improvement of the vehicle is evaluated through simulations. Linear and nonlinear vehicle models are evaluated on a straight-line path and a constantly curved path. The simulations show that the vehicle improves its ability to track the reference path for all evaluation models and scenarios. Finally, the simulations also show that the controlled system remains stable throughout the learning process. / Autonomous vehicles are an important piece of the puzzle for future transport solutions and industrial environments, in terms of both climate and safety. Many of the manoeuvres industrial vehicles perform are repetitive, for example parking. This work explores the possibility of learning from previous attempts at a manoeuvre to improve the vehicle's ability to perform it. A proportional-integral control structure is used to steer the vehicle; the structure is a state feedback in which the controller consists of two proportional-integral controllers. The control system is initialized to be stable and the vehicle is allowed to perform one iteration of the manoeuvre. The controller is then updated between iterations using reinforcement learning, that is, information from previous attempts at the manoeuvre is used to improve the vehicle's ability to follow the reference path: the reinforcement learning prescribes how the controller should be updated based on how the vehicle performed during the previous iteration. A sampling procedure is implemented to ensure the stability of the control system, since the reinforcement learning does not take this into account; the procedure is also designed to minimize its negative effects on the learning process. The algorithm is analysed by simulating the vehicle with both linear and nonlinear evaluation models in two scenarios: a straight path and a path with constant curvature. The simulations show that the vehicle improves its ability to follow the reference paths for all evaluation models, and that the control system remains stable throughout the learning process.
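A minimal sketch of the iteration structure described here, assuming a first-order tracking-error model: PI gains are perturbed by a Gaussian search policy, unstable samples are rejected before being "executed", and the mean gains are updated with a REINFORCE-style policy gradient. The plant, cost, and hyperparameters are illustrative, not the thesis's vehicle model.

```python
import numpy as np

# Illustrative first-order tracking-error plant: e[k+1] = a*e[k] - b*u[k],
# controlled by u = kp*e + ki*z, where z integrates the error.
rng = np.random.default_rng(0)
a, b = 0.95, 0.5

def closed_loop(kp, ki):
    # state = [error, error integral]
    return np.array([[a - b * kp, -b * ki],
                     [1.0,         1.0  ]])

def stable(kp, ki):
    return np.max(np.abs(np.linalg.eigvals(closed_loop(kp, ki)))) < 1.0

def cost(kp, ki, e0=1.0, horizon=100):
    A = closed_loop(kp, ki)
    x = np.array([e0, 0.0])
    total = 0.0
    for _ in range(horizon):
        total += x[0] ** 2
        x = A @ x
    return total

mean = np.array([0.2, 0.05])           # initial (kp, ki), chosen to be stabilizing
sigma, lr = 0.05, 1e-4
baseline = None

for iteration in range(500):
    # Rejection sampling: only stabilizing gains are "executed on the vehicle".
    while True:
        sample = mean + sigma * rng.standard_normal(2)
        if stable(*sample):
            break
    reward = -cost(*sample)
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    # REINFORCE for a Gaussian policy: grad of log pi(sample) is (sample - mean) / sigma^2
    mean += lr * (reward - baseline) * (sample - mean) / sigma ** 2

print("tuned gains (kp, ki):", np.round(mean, 3), "stable:", stable(*mean))
```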
209

Intelligent Device Selection in Federated Edge Learning with Energy Efficiency

Peng, Cheng 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Due to the increasing demand from mobile devices for real-time responses from cloud computing services, federated edge learning (FEL) has emerged as a new computing paradigm that utilizes edge devices to achieve efficient machine learning while protecting their data privacy. Implementing efficient FEL suffers from the challenges of devices' limited computing and communication resources, as well as unevenly distributed datasets, which has inspired several existing studies focusing on device selection to optimize time consumption and data diversity. However, these studies fail to consider the energy consumption of edge devices given their limited power supply, which can seriously affect the cost-efficiency of FEL through unexpected device dropouts. To fill this gap, we propose a device selection model capturing both energy consumption and data diversity optimization, under constraints on time consumption and the amount of training data. We then solve the optimization problem by reformulating the original model and designing a novel algorithm, named E2DS, which greatly reduces the time complexity. By comparing with two classical FEL schemes, we validate the superiority of our proposed device selection mechanism for FEL with extensive experimental results. Furthermore, for each device in a real FEL environment, multiple tasks occupy the CPU at the same time, so the CPU frequency used for training fluctuates constantly, which may lead to large errors in computing energy consumption. To solve this problem, we deploy reinforcement learning to learn the frequency so that it approaches the real value. Compared to increasing data diversity, we also consider a more direct way to improve convergence speed, using loss values. We then formulate the optimization problem that minimizes the energy consumption and maximizes the loss values to select the appropriate set of devices. After reformulating the problem, we design a new algorithm, FCE2DS, which achieves better convergence speed and accuracy. Finally, we compare the performance of the proposed scheme with the previous scheme and the traditional scheme to verify its improvement in multiple aspects.
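As a rough illustration of the selection step such schemes perform each round, the sketch below greedily picks devices that trade off a loss-based utility against energy under a time budget; the utility, energy/latency model, and greedy rule are assumptions and do not reproduce E2DS or FCE2DS.

```python
import numpy as np

# Sketch of one device-selection round: pick edge devices that maximize a utility
# (per-device training loss as a proxy for useful updates) while penalizing energy
# and respecting a round-time budget. All numbers and the greedy rule are illustrative.
rng = np.random.default_rng(2)
n_devices = 20
loss_value = rng.uniform(0.1, 1.0, n_devices)    # higher loss => more to learn from
energy = rng.uniform(1.0, 5.0, n_devices)        # Joules per local training round
latency = rng.uniform(0.5, 3.0, n_devices)       # seconds per local round
time_budget = 5.0                                 # deadline for the parallel round (s)
trade_off = 0.3                                   # weight on energy in the score

score = loss_value - trade_off * energy / energy.max()
selected = [i for i in np.argsort(-score) if latency[i] <= time_budget][:8]
print("selected devices:", selected)
print("round energy:", round(float(energy[selected].sum()), 2), "J")
```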
210

Multi-criteria decision making using reinforcement learning and its application to food, energy, and water systems (FEWS) problem

Deshpande, Aishwarya 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Multi-criteria decision making (MCDM) methods have evolved over the past several decades. In today's world of rapidly growing industries, MCDM has proven to be significant in many application areas. In this study, a decision-making model is devised using reinforcement learning to solve multi-criteria optimization problems. A learning automata algorithm is used to identify an optimal solution in the presence of single and multiple environments (criteria) using Pareto optimality. The application of this model is also discussed, where it provides an optimal solution to the food, energy, and water systems (FEWS) problem.
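A minimal sketch of a linear reward-inaction learning automaton interacting with a single stochastic environment (criterion) is given below; the reward probabilities and learning rate are illustrative, and the extension to multiple environments with a Pareto check over actions is omitted.

```python
import numpy as np

# Linear reward-inaction (L_RI) learning automaton for one environment (criterion).
# Reward probabilities and the learning rate are illustrative assumptions.
rng = np.random.default_rng(3)
reward_prob = np.array([0.2, 0.5, 0.8])   # environment's reward probability per action
p = np.ones(3) / 3                        # automaton's action probabilities
lam = 0.01                                # learning rate

for step in range(20_000):
    action = rng.choice(3, p=p)
    rewarded = rng.random() < reward_prob[action]
    if rewarded:                          # L_RI updates only when rewarded
        p = (1 - lam) * p                 # shrink all probabilities...
        p[action] += lam                  # ...then boost the rewarded action

print("final action probabilities:", np.round(p, 3))
```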
