251

Reinforcement Learning with Gaussian Processes for Unmanned Aerial Vehicle Navigation

Gondhalekar, Nahush Ramesh 03 August 2017 (has links)
We study the problem of Reinforcement Learning (RL) for Unmanned Aerial Vehicle (UAV) navigation with the smallest number of real world samples possible. This work is motivated by applications of learning autonomous navigation for aerial robots in structural inspection. A naive RL implementation suffers from the curse of dimensionality in large continuous state spaces. Gaussian Processes (GPs) exploit spatial correlation to approximate the state-action transition dynamics or value function in large state spaces. By incorporating GPs into naive Q-learning we achieve better performance with a smaller number of samples. The evaluation is performed using simulations with an aerial robot. We also present a Multi-Fidelity Reinforcement Learning (MFRL) algorithm that leverages Gaussian Processes to learn the optimal policy in a real world environment using samples gathered from lower fidelity simulators. In MFRL, an agent uses multiple simulators of the real environment to perform actions. With multiple levels of fidelity in a simulator chain, the number of samples used in successively higher fidelity simulators can be reduced. / Master of Science / Increasing development in the field of infrastructure inspection using Unmanned Aerial Vehicles (UAVs) has been seen in recent years. This thesis presents work on UAV navigation using Reinforcement Learning (RL) with the smallest possible number of real world samples. A naive RL implementation suffers from the curse of dimensionality in large continuous state spaces. Gaussian Processes (GPs) exploit spatial correlation to approximate the state-action transition dynamics or value function in large state spaces. By incorporating GPs into naive Q-learning we achieve better performance with a smaller number of samples. The evaluation is performed using simulations with an aerial robot. We also present a Multi-Fidelity Reinforcement Learning (MFRL) algorithm that leverages Gaussian Processes to learn the optimal policy in a real world environment using samples gathered from lower fidelity simulators. In MFRL, an agent uses multiple simulators of the real environment to perform actions. With multiple levels of fidelity in a simulator chain, the number of samples used in successively higher fidelity simulators can be reduced. By developing a bidirectional simulator chain, we aim to provide a learning platform for robots to safely learn the required skills with the smallest possible number of real world samples.
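As a rough illustration of the GP-augmented Q-learning idea described above, the sketch below fits a Gaussian Process to generalize Q-value targets across a continuous state space. The toy 1-D navigation task, kernel choice, and hyperparameters are illustrative assumptions, not details taken from the thesis.

```python
# Minimal sketch: Q-learning where a Gaussian Process generalizes Q(s, a)
# across a continuous state space. The toy 1-D environment, kernel, and
# hyperparameters are illustrative assumptions, not details from the thesis.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
actions = [-1.0, +1.0]                      # move left / right along a line
gamma, epsilon, episodes = 0.95, 0.2, 30
X, y = [], []                               # (state, action) inputs, Q targets
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-2)

def q_values(s):
    """Predict Q(s, a) for all actions; zero prior before any data."""
    if not X:
        return np.zeros(len(actions))
    return gp.predict(np.array([[s, a] for a in actions]))

for _ in range(episodes):
    s = rng.uniform(-1.0, 1.0)              # random start on the line
    for _ in range(20):
        a = rng.choice(actions) if rng.random() < epsilon \
            else actions[int(np.argmax(q_values(s)))]
        s_next = np.clip(s + 0.1 * a + 0.01 * rng.normal(), -1.0, 1.0)
        r = 1.0 if s_next > 0.9 else -0.01  # reward for reaching the right end
        target = r + gamma * np.max(q_values(s_next))
        X.append([s, a]); y.append(target)
        gp.fit(np.array(X), np.array(y))    # GP generalizes targets to nearby (s, a)
        s = s_next
        if r > 0:
            break
```

Refitting the GP from scratch at every step, as done here for brevity, is costly; incremental or sparse GP updates would be the natural choice at larger scale.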
252

Optimization of Trusses and Frames using Reinforcement Learning / 強化学習を用いたトラスと骨組の最適化

Chi-Tathon, Kupwiwat 25 March 2024 (has links)
Kyoto University / New system, course-based doctorate / Doctor of Engineering / Degree No. Kō 25241 / Engineering Doctorate No. 5200 / New system || Engineering || 1992 (University Library) / Department of Architecture, Graduate School of Engineering, Kyoto University / (Chief examiner) Professor 大崎 純, Professor 西山 峰広, Associate Professor 藤田 皓平 / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Engineering / Kyoto University / DFAM
253

LEARNING-BASED OPTIMIZATION OF RESOURCE REDISTRIBUTION IN LARGE-SCALE HETEROGENEOUS DATACENTERS

Chang-Lin Chen (20370300) 04 December 2024 (has links)
This thesis addresses critical optimization challenges in large-scale, heterogeneous data centers: logical cluster formation for virtual machine placement and physical rack movement for efficient infrastructure management. As data centers grow in size and complexity, these systems face rising demands to minimize costs related to fault tolerance, reformation, and resource constraints while adapting to diverse hardware and operational requirements.
The first part focuses on logical cluster formation, where capacity guarantees must be maintained across millions of servers despite ongoing infrastructure events, such as maintenance and failures. Traditional offline methods fall short under these dynamic, large-scale conditions. To address this, a two-tier approach combining deep reinforcement learning (DRL) with mixed-integer linear programming (MILP) enables real-time resource allocation, reducing server relocations and enhancing resilience across complex server environments.
The second part tackles optimized rack placement in highly heterogeneous settings, where balancing fault tolerance, energy efficiency, and load distribution is essential. Static layouts struggle to accommodate diverse hardware configurations and fluctuating resource needs. This research proposes a scalable, tiered optimization approach using the Leader Reward method and a gradient-based heuristic to handle the computational demands of large-scale rack positioning.
By integrating DRL and heuristic techniques, this work provides a robust, scalable solution for cost efficiency and operational resilience in managing large, heterogeneous data centers, advancing intelligent data center management for modern cloud infrastructure.
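One way to picture the two-tier idea is the toy sketch below: a learned scorer (standing in for the DRL policy) shortlists servers, and a small mixed-integer program assigns the shortlisted servers to clusters under capacity constraints. It uses the PuLP library with its bundled CBC solver; the problem sizes, random scores, and cost model are illustrative assumptions rather than the setup used in the thesis.

```python
# Toy sketch of a two-tier allocation step: a (stand-in) learned scorer ranks
# candidate servers, then a small MILP assigns the shortlisted servers to
# clusters under capacity constraints. Sizes, scores, and the use of PuLP's
# bundled CBC solver are illustrative assumptions, not the thesis's setup.
import numpy as np
import pulp

rng = np.random.default_rng(1)
n_servers, n_clusters, shortlist = 20, 3, 9
capacity = rng.integers(4, 9, size=n_servers)          # server capacities
demand = np.array([10, 8, 6])                          # cluster capacity targets
scores = rng.random(n_servers)                         # stand-in for DRL policy scores

# Tier 1: the "policy" shortlists the servers worth moving (highest scores).
cand = np.argsort(scores)[-shortlist:]

# Tier 2: MILP assigns each shortlisted server to at most one cluster,
# meeting each cluster's demand while moving as few servers as possible.
prob = pulp.LpProblem("cluster_formation", pulp.LpMinimize)
x = {(i, j): pulp.LpVariable(f"x_{i}_{j}", cat="Binary")
     for i in cand for j in range(n_clusters)}
prob += pulp.lpSum(x.values())                         # minimize relocations
for i in cand:
    prob += pulp.lpSum(x[i, j] for j in range(n_clusters)) <= 1
for j in range(n_clusters):
    prob += pulp.lpSum(int(capacity[i]) * x[i, j] for i in cand) >= int(demand[j])
prob.solve(pulp.PULP_CBC_CMD(msg=0))

assignment = {i: j for (i, j), var in x.items() if var.value() == 1}
print("relocations:", len(assignment), "assignment:", assignment)
```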
254

The Impact of Threat on Behavioral and Neural Markers of Learning in Anxiety

Valdespino, Andrew 28 August 2019 (has links)
Anxiety is characterized by apprehensive expectation regarding the forecasted outcomes of choice. Decision science and in particular reinforcement learning models provide a quantitative framework to explain how the likelihood and value of such outcomes are estimated, thus allowing the measurement of parameters of decision-making that may differ between high- and low- anxiety groups. However, the role of anxiety in choice allocation is not sufficiently understood, particularly regarding the influence of transient threat on current decisions. The presence of threat appears to alter choice behavior and may differentially influence quantitatively derived parameters of learning among anxious individuals. Regarding the neurobiology of reinforcement learning, the dorsolateral prefrontal cortex (dlPFC) has been suggested to play a role in temporally integrating experienced outcomes, as well as in coordinating an overall choice action plan, both of which can be described computationally by learning rate and exploration, respectively. Accordingly, it was hypothesized that high trait anxiety would be associated with a lower reward learning rate, a higher loss learning rate, and diminished exploration of available options, and furthermore that threat would increase the magnitude of these parameters in the high anxiety group. We also hypothesized that the magnitude of neural activation (measured by functional near-infrared spectroscopy; FNIRS) across dissociable regions of the left and right dlPFC would be associated with model parameters, and that threat would further increase the magnitude of activation to model parameters. Finally, it was hypothesized that reward and loss outcomes could be differentiated based on FNIRS channel activation, and that a distinct set of channels would differentiate outcomes in high relative to low anxiety groups. To test these hypotheses, a temporal difference learning model was applied to a decision-making (bandit) task to establish differences in learning parameter magnitudes among individuals high (N=26) and low (N=20) in trait anxiety, as well as the impact of threat on learning parameters. Results indicated a positive association between anxiety and both the reward and loss learning rate parameters. However, threat was not found to impact model parameters. Imaging results indicated a positive association between exploration and the left dlPFC. Reward and loss outcomes were successfully differentiated in the high, but not low anxiety group. Results add to a growing literature suggesting anxiety is characterized by differential sensitivity to both losses and rewards in reinforcement learning contexts, and further suggests that the dlPFC plays a role in modulating exploration-based choice strategies. / Doctor of Philosophy / Anxiety is characterized by worry about possible future negative outcomes. Mathematical models in the area of learning theory allow the representation and measurement of individual differences in decision-making tendencies that contribute to negative future apprehension. Currently, the role of anxiety in the allocation of choices, and particularly the influence of threat on decision-making is poorly understood. Threat may influence learning and alter choice behavior, collectively causing negative future apprehension. 
With regard to how related decision-making is computed in the brain, the dorsolateral prefrontal cortex (dlPFC) has been suggested to play a role in tracking and integrating current and past experienced outcomes in order to coordinate an overall action plan. Outcome tracking and action plan coordination can be represented mathematically within a learning theory framework by learning rate and exploration parameters, respectively. It was hypothesized that high anxiety would be associated with a lower reward learning rate, a higher loss learning rate, and diminished exploration, and furthermore that threat would increase the magnitude of these tendencies in anxious individuals. We also hypothesized that brain activation in the dlPFC would be associated with these tendencies, and that threat would further increase activation in these brain areas. It was also hypothesized that reward and loss outcomes could be differentiated based on brain activation in the dlPFC. To test these hypotheses, a mathematical model was applied to establish differences in learning within high and low anxiety individuals, as well as to test the impact of threat on these learning tendencies. Results indicated a positive association between anxiety and the rate of learning from reward and loss outcomes. Threat was not found to impact these learning rates. A positive association was found between activation in the dlPFC and the tendency to explore. Reward and loss outcomes were successfully differentiated based on brain activation in high, but not low, anxiety individuals. Results add to a growing literature suggesting that anxiety is characterized by differential sensitivity to both losses and rewards, and further adds to our understanding of how the brain computes exploration-based choice strategies.
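For concreteness, the sketch below simulates the general form of model the abstract describes: a two-armed bandit learner with separate learning rates for reward and loss outcomes and a softmax (inverse-temperature) exploration parameter. The task structure, payoff probabilities, and parameter values are illustrative assumptions.

```python
# Minimal sketch of the kind of model described: a bandit learner with
# separate learning rates for reward and loss outcomes and a softmax
# (inverse-temperature) exploration parameter. The two-armed task, payoff
# probabilities, and parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def simulate(alpha_reward=0.3, alpha_loss=0.5, beta=3.0, trials=200):
    q = np.zeros(2)                       # value estimates for the two options
    p_win = np.array([0.7, 0.4])          # hidden payoff probabilities
    choices, outcomes = [], []
    for _ in range(trials):
        p_choice = np.exp(beta * q) / np.exp(beta * q).sum()   # softmax choice rule
        c = rng.choice(2, p=p_choice)
        r = 1.0 if rng.random() < p_win[c] else -1.0
        delta = r - q[c]                  # prediction error
        alpha = alpha_reward if r > 0 else alpha_loss
        q[c] += alpha * delta             # asymmetric TD-style update
        choices.append(c); outcomes.append(r)
    return np.array(choices), np.array(outcomes)

choices, outcomes = simulate()
print("P(choose better option):", (choices == 0).mean())
```

In a model-fitting context, parameters such as alpha_reward, alpha_loss, and beta would be estimated per participant from observed choices rather than fixed as here.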
255

Predicting Mutational Pathways of Influenza A H1N1 Virus using Q-learning

Aarathi Raghuraman, FNU 13 August 2021 (has links)
Influenza is a seasonal viral disease affecting over 1 billion people annually around the globe, as reported by the World Health Organization (WHO). The influenza virus has been around for decades, causing multiple pandemics and encouraging researchers to perform extensive analysis of its evolutionary patterns. Current research uses phylogenetic trees as the basis to guide population genetics and other phenotypic characteristics when describing the evolution of the influenza genome. Phylogenetic trees are one way of representing the evolutionary trends of sequenced genomes, but they do not capture the multidimensional complexity of mutational pathways. We suggest representing antigenic drifts within the influenza A/H1N1 hemagglutinin (HA) protein as a graph, $G = (V, E)$, where $V$ is the set of vertices representing each possible sequence and $E$ is the set of edges representing single amino acid substitutions. Each transition is characterized by a Malthusian fitness model incorporating genetic adaptation, vaccine similarity, and historical epidemiological response, using mortality as the metric where available. Applying reinforcement learning with the vertices as states, edges as actions, and fitness as the reward, we learn the high-likelihood mutational pathways and the optimal policy without exploring the entire space of the graph, $G$. Our average predicted-versus-actual sequence distance of $3.6 \pm 1.2$ amino acids indicates that our novel approach of using naive Q-learning can assist with influenza strain predictions, thus improving vaccine selection for future disease seasons. / Master of Science / Influenza is a seasonal virus affecting over 1 billion people annually around the globe, as reported by the World Health Organization (WHO). The effectiveness of influenza vaccines varies tremendously by type (A, B, C, or D) and season. Of note is the pandemic of 2009, where the influenza A H1N1 virus mutants were significantly different from the chosen vaccine composition. It is pertinent to understand and predict the underlying genetic and environmental behavior of influenza virus mutants to be able to determine the vaccine composition for future seasons, preventing another pandemic. Given the recent 2020 COVID-19 pandemic, also caused by a virus that affects the upper respiratory system, novel approaches to predict viruses need to be investigated now more than ever. Thus, in this thesis, I develop a novel approach to predicting a portion of the influenza A H1N1 viruses using machine learning.
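A toy sketch of the graph-and-Q-learning formulation follows: states are short amino-acid strings, actions are single-position substitutions, and the reward is a stand-in fitness score. The two-letter alphabet, sequence length, and similarity-based fitness function are illustrative assumptions, not the Malthusian fitness model used in the thesis.

```python
# Toy sketch of Q-learning over a mutation graph: states are short amino-acid
# strings, actions are single-position substitutions, and the reward is a
# stand-in "fitness" score. The two-letter alphabet, sequence length, and
# fitness function are illustrative assumptions, not the thesis's model.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
alphabet = "AG"                     # toy residue alphabet
target = "GGGG"                     # stand-in for a high-fitness strain

def fitness(seq):
    # Toy reward: similarity to the high-fitness sequence.
    return sum(a == b for a, b in zip(seq, target)) / len(target)

def neighbors(seq):
    # All sequences reachable by one amino-acid substitution.
    out = []
    for i, c in enumerate(seq):
        for r in alphabet:
            if r != c:
                out.append(seq[:i] + r + seq[i + 1:])
    return out

Q = defaultdict(float)              # Q[(state, next_state)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(500):
    s = "AAAA"                      # starting strain
    for _ in range(8):
        acts = neighbors(s)
        if rng.random() < epsilon:
            s_next = acts[rng.integers(len(acts))]
        else:
            s_next = max(acts, key=lambda a: Q[(s, a)])
        r = fitness(s_next)
        best_next = max(Q[(s_next, a)] for a in neighbors(s_next))
        Q[(s, s_next)] += alpha * (r + gamma * best_next - Q[(s, s_next)])
        s = s_next

# Greedy rollout along the learned high-likelihood pathway.
path, s = ["AAAA"], "AAAA"
for _ in range(4):
    s = max(neighbors(s), key=lambda a: Q[(s, a)])
    path.append(s)
print(" -> ".join(path))
```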
256

Random Access Control In Massive Cellular Internet of Things: A Multi-Agent Reinforcement Learning Approach

Bai, Jianan 14 January 2021 (has links)
Internet of things (IoT) is envisioned as a promising paradigm to interconnect an enormous number of wireless devices. However, the success of IoT is challenged by the difficulty of managing access for massive amounts of sporadic and unpredictable user traffic. This thesis focuses on contention-based random access in massive cellular IoT systems and introduces two novel frameworks to provide enhanced scalability, real-time quality of service management, and resource efficiency. First, a local communication based congestion control framework is introduced to distribute random access attempts evenly over time under bursty traffic. Second, a multi-agent reinforcement learning based preamble selection framework is designed to increase the access capacity under a fixed number of preambles. Combining the two mechanisms provides superior performance under various 3GPP-specified machine type communication evaluation scenarios in terms of achieving much lower access latency and fewer access failures. / Master of Science / In the age of internet of things (IoT), a massive number of devices are expected to connect to wireless networks in a sporadic and unpredictable manner. The wireless connection is usually established by contention-based random access, a four-step handshaking process initiated by a device through sending a randomly selected preamble sequence to the base station. While different preambles are orthogonal, a preamble collision happens when two or more devices send the same preamble to a base station simultaneously, and a device experiences access failure if the transmitted preamble cannot be successfully received and decoded. A failed device needs to wait for another random access opportunity to restart the aforementioned process, and hence the access delay and resource consumption are increased. Random access control in massive IoT systems is challenged by the increased access intensity, which results in higher collision probability. In this work, we aim to provide better scalability, real-time quality of service management, and resource efficiency in random access control for such systems. Towards this end, we introduce 1) a local communication based congestion control framework that enables a device to cooperate with neighboring devices and 2) a multi-agent reinforcement learning (MARL) based preamble selection framework that leverages the ability of MARL to form a decision-making policy from collected experience. The introduced frameworks are evaluated under the 3GPP-specified scenarios and shown to outperform the existing standard solutions in terms of achieving lower access delays with fewer access failures.
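A minimal sketch of the multi-agent preamble-selection idea is given below: each device keeps its own Q-values over the available preambles and updates them from collision feedback. The device count, preamble count, traffic model, and reward scheme are illustrative assumptions, not the 3GPP evaluation setup used in the thesis.

```python
# Minimal sketch of multi-agent preamble selection: each device keeps its own
# Q-values over the available preambles and learns from collision feedback.
# Device/preamble counts, traffic model, and rewards are illustrative assumptions.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
n_devices, n_preambles = 12, 6
alpha, epsilon, rounds = 0.1, 0.1, 2000
Q = np.zeros((n_devices, n_preambles))     # one Q-table per device (agent)

for t in range(rounds):
    active = rng.random(n_devices) < 0.5   # sporadic traffic: who transmits now
    picks = {}
    for d in np.flatnonzero(active):
        if rng.random() < epsilon:
            picks[d] = rng.integers(n_preambles)
        else:
            picks[d] = int(np.argmax(Q[d]))
    counts = Counter(picks.values())
    for d, p in picks.items():
        reward = 1.0 if counts[p] == 1 else -1.0      # collision feedback
        Q[d, p] += alpha * (reward - Q[d, p])          # stateless Q update

success = sum(counts[p] == 1 for p in picks.values()) / max(len(picks), 1)
print(f"final-round success rate: {success:.2f}")
```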
257

Machine Learning and Artificial Intelligence Application in Process Control

Wang, Xiaonian January 2024 (has links)
This thesis consists of four chapters, including two main contributions on the application of machine learning and artificial intelligence to process modeling and controller design. Chapter 2 addresses applying AI to controller design. This chapter proposes and implements a novel reinforcement learning (RL)-based controller design on chemical engineering examples. To address the issue of costly and unsafe training of model-free RL-based controllers, we propose an implementable RL-based controller design that leverages offline MPC calculations that have already been developed based on a step-response model. In this method, an RL agent is trained to imitate the MPC performance. Then, the trained agent is utilized in a model-free RL framework to interact with the actual process so as to continuously learn and optimize its performance within a safe operating range of the process. This contribution is marked as the first implementable RL-based controller for practical industrial application. Chapter 3 focuses on AI applications in process modeling. As nonlinear dynamics are widely encountered and challenging to simulate, nonlinear MPC (NMPC) is recognized as a promising tool to tackle this challenge. However, the lack of a reliable nonlinear model remains a roadblock for this technique. To address this issue, we develop a novel data-driven modeling method that utilizes a nonlinear autoencoder, resulting in a modeling technique where the nonlinearity in the model stems from the analysis of the measured variables. Moreover, a quadratic program (QP) based MPC is developed based on this model, utilizing the autoencoder as a transformation between the controller and the process. This work contributes an extension of the classic Koopman operator modeling method and a remarkable linear MPC design that can outperform other NMPCs, such as neural network-based MPC. / Thesis / Master of Applied Science (MASc)
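A rough sketch of the two-stage controller idea follows: first, an agent is fit by imitation to actions logged from an existing controller (a simple proportional law stands in for the offline MPC here), then it keeps improving online, model-free, while actions stay clipped to a safe range. The toy first-order process, the stand-in control law, and the finite-difference policy update are illustrative assumptions, not the thesis's method.

```python
# Rough sketch of the two-stage idea: (1) train an agent to imitate actions
# logged from an existing MPC, (2) keep improving it online, model-free,
# while clipping actions to a safe operating range. The toy first-order
# process, the stand-in "MPC" law, and all gains are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
setpoint, u_safe = 1.0, (-2.0, 2.0)        # target output and safe input range

def plant(y, u):
    return 0.9 * y + 0.1 * u + 0.01 * rng.normal()   # toy first-order process

def mpc_like(y):
    return np.clip(2.0 * (setpoint - y), *u_safe)    # stand-in for the offline MPC

# Stage 1: imitation -- fit a linear policy u = w * (setpoint - y) to MPC data.
errs = rng.uniform(-1, 1, 200)
w = np.polyfit(errs, [mpc_like(setpoint - e) for e in errs], 1)[0]

# Stage 2: model-free fine-tuning -- nudge the gain w by finite-difference
# hill climbing on episode cost, always acting inside the safe range.
def episode_cost(gain):
    y, cost = 0.0, 0.0
    for _ in range(50):
        u = np.clip(gain * (setpoint - y), *u_safe)
        y = plant(y, u)
        cost += (setpoint - y) ** 2
    return cost

for _ in range(30):
    delta = 0.1
    grad = (episode_cost(w + delta) - episode_cost(w - delta)) / (2 * delta)
    w -= 0.02 * grad                        # improve the imitated policy online
print("final gain:", round(w, 3), "final cost:", round(episode_cost(w), 3))
```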
258

Design of neurofuzzy controller using reinforcement learning with application to linear system and inverted pendulum

Saengdeejing, Apiwat 01 January 1998 (has links)
No description available.
259

Neuro-Symbolic Distillation of Reinforcement Learning Agents

Abir, Farhan Fuad 01 January 2024 (has links) (PDF)
In the past decade, reinforcement learning (RL) has achieved breakthroughs across various domains, from surpassing human performance in strategy games to enhancing the training of large language models (LLMs) with human feedback. However, RL has yet to gain widespread adoption in mission-critical fields such as healthcare and autonomous vehicles. This is primarily attributed to the inherent lack of trust, explainability, and generalizability of neural networks in deep reinforcement learning (DRL) agents. While neural DRL agents leverage the power of neural networks to solve specific tasks robustly and efficiently, this often comes at the cost of explainability and generalizability. In contrast, pure symbolic agents maintain explainability and trust but often underperform in high-dimensional data. In this work, we developed a method to distill explainable and trustworthy agents using neuro-symbolic AI. Neuro-symbolic distillation combines the strengths of symbolic reasoning and neural networks, creating a hybrid framework that leverages the structured knowledge representation of symbolic systems alongside the learning capabilities of neural networks. The key steps of neuro-symbolic distillation involve training traditional DRL agents, followed by extracting, selecting, and distilling their learned policies into symbolic forms using symbolic regression and tree-based models. These symbolic representations are then employed instead of the neural agents to make interpretable decisions with comparable accuracy. The approach is validated through experiments on Lunar Lander and Pong, demonstrating that symbolic representations can effectively replace neural agents while enhancing transparency and trustworthiness. Our findings suggest that this approach mitigates the black-box nature of neural networks, providing a pathway toward more transparent and trustworthy AI systems. The implications of this research are significant for fields requiring both high performance and explainability, such as autonomous systems, healthcare, and financial modeling.
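The tree-based distillation path can be pictured with the small sketch below: roll out a trained policy, collect (state, action) pairs, and fit a shallow decision tree that reproduces its decisions in a human-readable form. The toy 2-D states and the hand-coded stand-in for the trained agent are assumptions; the thesis distills actual DRL agents (e.g., on Lunar Lander and Pong) using both symbolic regression and tree-based models.

```python
# Small sketch of the tree-based distillation path: query a (stand-in) trained
# policy, collect (state, action) pairs, and fit a shallow decision tree that
# reproduces its decisions in an interpretable form. The toy 2-D states and
# the hand-coded "neural" policy are assumptions, not the thesis's agents.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

def trained_agent(state):
    # Stand-in for a trained DRL policy: push the system toward the origin.
    pos, vel = state
    return int(pos + 0.5 * vel > 0)      # action 0 = push right, 1 = push left

# Collect demonstrations from the "neural" agent.
states = rng.uniform(-1, 1, size=(2000, 2))
actions = np.array([trained_agent(s) for s in states])

# Distill into a shallow, human-readable tree.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(states, actions)
agreement = (tree.predict(states) == actions).mean()
print(f"agreement with the neural agent: {agreement:.3f}")
print(export_text(tree, feature_names=["position", "velocity"]))
```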
260

Towards Human-level Dexterity via Robot Learning

Khandate, Gagan Muralidhar January 2024 (has links)
Dexterous intelligence—the ability to perform complex interactions with multi-fingered hands—is a pinnacle of human physical intelligence and emergent higher-order cognitive skills. However, contrary to Moravec's paradox, dexterous intelligence in humans appears simple only superficially. Many millions of years were spent co-evolving the human brain and hands, including rich tactile sensing. Achieving human-level dexterity with robotic hands has long been a fundamental goal in robotics and represents a critical milestone toward general embodied intelligence. In this pursuit, computational sensorimotor learning has made significant progress, enabling feats such as arbitrary in-hand object reorientation. However, we observe that achieving higher levels of dexterity requires overcoming very fundamental limitations of computational sensorimotor learning. I develop robot learning methods for highly dexterous multi-fingered manipulation by directly addressing these limitations at their root cause. Chiefly, through key studies, this dissertation progressively builds an effective framework for reinforcement learning of dexterous multi-fingered manipulation skills. These methods adopt structured exploration, effectively overcoming the limitations of random exploration in reinforcement learning. The insights gained culminate in a highly effective reinforcement learning method that incorporates sampling-based planning for direct exploration. Additionally, this thesis explores a new paradigm of using visuo-tactile human demonstrations for dexterity, introducing corresponding imitation learning techniques.
