241 |
Use of Reinforcement Learning for Interference Avoidance or Efficient Jamming in Wireless Communications
Schutz, Zachary Alexander, 05 June 2024 (has links)
We implement reinforcement learning in the context of wireless communications in two very different settings. In the first setting, we study the use of reinforcement learning in an underwater acoustic communications network to adapt its transmission frequencies to avoid interference and potential malicious jammers. To that effect, we implement a reinforcement learning algorithm called contextual bandits. The harsh environment of an underwater channel poses a challenging problem. The channel may induce multipath and time delays, which lead to time-varying, frequency-selective attenuation. These factors are also influenced by the distance between the transmitter and receiver, the subbands in which the interference is located, and the power of the transmitter. We show that the agent is effectively able to avoid frequency bands that have degraded channel quality or that contain interference, both of which are dynamic or time-varying.
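The frequency-selection loop described above can be sketched as a small epsilon-greedy contextual bandit. Every number below (subband count, channel qualities, jammed-band placement, noise level) is a hypothetical stand-in, not a value from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
n_bands, n_contexts, n_steps = 8, 4, 5000
eps = 0.1

# Hypothetical channel: each (context, band) pair has an unknown mean quality;
# one band per context is "jammed" (very low reward).
true_quality = rng.uniform(0.3, 0.9, size=(n_contexts, n_bands))
jammed = rng.integers(0, n_bands, size=n_contexts)
true_quality[np.arange(n_contexts), jammed] = 0.05

# Per-context running-mean estimates (a tabular contextual bandit).
est = np.zeros((n_contexts, n_bands))
counts = np.zeros((n_contexts, n_bands))

for t in range(n_steps):
    ctx = rng.integers(n_contexts)          # observed context (e.g. interference state)
    if rng.random() < eps:
        band = rng.integers(n_bands)        # explore
    else:
        band = int(np.argmax(est[ctx]))     # exploit
    reward = true_quality[ctx, band] + 0.05 * rng.standard_normal()
    counts[ctx, band] += 1
    est[ctx, band] += (reward - est[ctx, band]) / counts[ctx, band]

# After training, the greedy choice should avoid the jammed band in every context.
chosen = est.argmax(axis=1)
print(all(chosen[c] != jammed[c] for c in range(n_contexts)))
```

Because the jammed band's estimated quality quickly falls below that of every clean band, the greedy choice steers around it, mirroring the avoidance behavior described in the abstract.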
In the second setting, we study the use of reinforcement learning to adapt the modulation and power scheme of a jammer seeking to disrupt a wireless communications system. To achieve this, we make use of a linear contextual bandit to learn to jam the victim system.
Prior work has shown that with the use of linear bandits, improved convergence is achieved when jamming a single-carrier system using time-domain jamming schemes. However, communications systems today typically employ orthogonal frequency division multiplexing (OFDM) to transmit data, particularly in 4G/5G networks. This work explores the use of linear Thompson Sampling (TS) to jam OFDM-modulated signals. The jammer may select from both time-domain and frequency-domain jamming schemes. We demonstrate that the linear TS algorithm outperforms a traditional reinforcement learning algorithm, upper confidence bound-1 (UCB-1), in terms of maximizing the victim's symbol error rate.
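A minimal sketch of how linear Thompson Sampling selects among actions such as jamming schemes: rewards follow an assumed linear model, and a posterior over the weight vector is sampled before each pull. The feature vectors, noise level, and sampling scale below are illustrative assumptions, not the thesis's configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_arms, n_steps = 5, 6, 3000
noise, v = 0.1, 0.3        # reward noise and posterior-sampling scale (assumed)

# Hypothetical linear reward model: E[r | arm] = x_arm . theta_true.
theta_true = rng.standard_normal(d)
arm_feats = rng.standard_normal((n_arms, d))   # feature vector per jamming scheme

A = np.eye(d)              # regularized Gram matrix
b = np.zeros(d)
rewards = []
for t in range(n_steps):
    mu = np.linalg.solve(A, b)
    theta_s = rng.multivariate_normal(mu, v**2 * np.linalg.inv(A))  # posterior sample
    arm = int(np.argmax(arm_feats @ theta_s))                       # Thompson choice
    x = arm_feats[arm]
    r = float(x @ theta_true) + noise * rng.standard_normal()
    A += np.outer(x, x)
    b += r * x
    rewards.append(float(x @ theta_true))   # noiseless mean, for evaluation only

arm_means = arm_feats @ theta_true
print(np.mean(rewards[-500:]) > arm_means.mean() + 0.3)
```

Sampling theta from the posterior (rather than taking an upper confidence bound, as UCB-1 does) randomizes exploration in proportion to the remaining uncertainty, which is the property usually credited for faster convergence.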
We also draw novel insights by observing the action states to which the reinforcement learning algorithm converges.
We then investigate the design and modification of the context vector in the hope of improving the overall performance of the bandit, such as shortening the learning period and increasing the symbol error rate inflicted on the victim. This includes running experiments on particular features and examining how the bandit weights the importance of the features in the context vector.
Lastly, we study how to jam an OFDM-modulated signal that employs forward error correction coding. We extend this to leverage reinforcement learning to jam a 5G-based system implementing some aspects of the 5G protocol. This model is then modified to introduce unreliable reward feedback, in the form of ACK/NACK observations available to the jammer, to understand how imperfect observations of errors affect the jammer's ability to learn.
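The effect of unreliable ACK/NACK feedback can be seen even in a toy bandit whose binary reward observations are flipped with some probability; the per-action error rates and the 20% flip rate below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
p_err = np.array([0.2, 0.5, 0.8])   # hypothetical symbol-error probability per jamming action
flip = 0.2                          # probability an ACK/NACK observation is wrong
n_steps = 6000

counts = np.zeros(3)
est = np.zeros(3)
for t in range(n_steps):
    a = rng.integers(3) if rng.random() < 0.1 else int(np.argmax(est))
    error = rng.random() < p_err[a]                       # true outcome at the victim
    obs = (not error) if rng.random() < flip else error   # corrupted NACK feedback
    counts[a] += 1
    est[a] += (float(obs) - est[a]) / counts[a]

# With flip < 0.5, E[obs] = flip + (1 - 2*flip) * p_err is still increasing in
# p_err, so the ranking of actions survives the noisy feedback.
best = int(np.argmax(est))
print(best)
```

Because a flip probability below one half only shrinks the gap between actions without reordering them, the learner still identifies the most damaging action, just more slowly; this is consistent with feedback noise degrading but not destroying learnability.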
We gain insights into the convergence time of the jammer and its ability to jam the victim, as well as improvements to the algorithm and the vulnerabilities of wireless communications to reinforcement learning-based jamming. / Master of Science / In this thesis we implement a class of reinforcement learning known as contextual bandits in two different applications of communications systems and jamming. In the first setting, we study the use of reinforcement learning in an underwater acoustic communications network to adapt its transmission frequencies to avoid interference and potential malicious jammers.
We show that the agent is effectively able to avoid frequency bands that have degraded channel quality or that contain interference, both of which are dynamic or time-varying.
In the second setting, we study the use of reinforcement learning to adapt the jamming type, such as using additive white Gaussian noise, and power scheme of a jammer seeking to disrupt a wireless communications system. To achieve this, we make use of a linear contextual bandit, which assumes a linear relationship between the reward function and the contexts that the jammer is able to observe for each arm.
We demonstrate that the linear algorithm is able to outperform a traditional reinforcement learning algorithm in terms of maximizing the victim's symbol error rate. We extend this work by examining the impact of the context feature vector design, LTE/5G-based protocol specifics (such as error correction coding), and imperfect reward feedback. We gain insights into the convergence time of the jammer and its ability to jam the victim, as well as improvements to the algorithm and the vulnerabilities of wireless communications to reinforcement learning-based jamming.
|
242 |
Derivative-Free Meta-Blackbox Optimization on Manifold
Sel, Bilgehan, 06 1900 (has links)
Solving a sequence of high-dimensional, nonconvex, but potentially similar optimization problems poses a significant computational challenge in various engineering applications. This thesis presents the first meta-learning framework that leverages the shared structure among sequential tasks to improve the computational efficiency and sample complexity of derivative-free optimization. Based on the observation that most practical high-dimensional functions lie on a latent low-dimensional manifold, which can be further shared among problem instances, the proposed method jointly learns the meta-initialization of a search point and a meta-manifold. This novel approach enables the efficient adaptation of the optimization process to new tasks by exploiting the learned meta-knowledge. Theoretically, the benefit of meta-learning in this challenging setting is established by proving that the proposed method achieves improved convergence rates and reduced sample complexity compared to traditional derivative-free optimization techniques. Empirically, the effectiveness of the proposed algorithm is demonstrated in two high-dimensional reinforcement learning tasks, showcasing its ability to accelerate learning and improve performance across multiple domains. Furthermore, the robustness and generalization capabilities of the meta-learning framework are explored through extensive ablation studies and sensitivity analyses. The thesis highlights the potential of meta-learning in tackling complex optimization problems and opens up new avenues for future research in this area. / Master of Science / Optimization problems are ubiquitous in various fields, from engineering to finance, where the goal is to find the best solution among a vast number of possibilities. However, solving these problems can be computationally challenging, especially when the search space is high-dimensional and the problem is nonconvex, meaning that there may be multiple locally optimal solutions. 
This thesis introduces a novel approach to tackle these challenges by leveraging the power of meta-learning, a technique that allows algorithms to learn from previous experiences and adapt to new tasks more efficiently.
The proposed framework is based on the observation that many real-world optimization problems share similar underlying structures, even though they may appear different on the surface. By exploiting this shared structure, the meta-learning algorithm can learn a low-dimensional representation of the problem space, which serves as a guide for efficiently searching for optimal solutions in new, unseen problems. This approach is particularly useful when dealing with a sequence of related optimization tasks, as it allows the algorithm to transfer knowledge from one task to another, thereby reducing the computational burden and improving the overall performance.
The effectiveness of the proposed meta-learning framework is demonstrated through rigorous theoretical analysis and empirical evaluations on challenging reinforcement learning tasks. These tasks involve high-dimensional search spaces and require the algorithm to adapt to changing environments. The results show that the meta-learning approach can significantly accelerate the learning process and improve the quality of the solutions compared to traditional optimization methods.
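The core idea above, searching a learned low-dimensional manifold instead of the full ambient space, can be sketched with a toy quadratic task family. The dimensions, the stand-in quadratic objective, and the use of the true subspace in place of a meta-learned one are all simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
D, d = 50, 2               # ambient and latent dimensions (hypothetical)

# Task family: objectives vary only on a shared d-dimensional subspace.
P = np.linalg.qr(rng.standard_normal((D, d)))[0]   # D x d orthonormal basis
c = np.array([4.0, -3.0])                          # latent optimum of this task
f = lambda x: float(np.sum((P.T @ x - c) ** 2))

def unit(v):
    return v / np.linalg.norm(v)

def random_search(f, x0, step, iters, direction):
    """Greedy derivative-free search: keep a perturbation iff it improves f."""
    x, fx = x0.copy(), f(x0)
    for _ in range(iters):
        cand = x + step * direction()
        fc = f(cand)
        if fc < fx:
            x, fx = cand, fc
    return fx

x0 = np.zeros(D)
# The naive search perturbs in all D coordinates; the meta search perturbs only
# along the (here assumed already meta-learned) manifold basis P.
f_naive = random_search(f, x0, 0.3, 150, lambda: unit(rng.standard_normal(D)))
f_meta = random_search(f, x0, 0.3, 150, lambda: P @ unit(rng.standard_normal(d)))
print(f_meta < f_naive)
```

Restricting perturbations to the shared subspace concentrates the entire search budget on the few directions that actually change the objective, which is the intuition behind the improved sample complexity.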
|
243 |
Reliable Low Latency Machine Learning for Resource Management in Wireless Networks
Taleb Zadeh Kasgari, Ali, 30 March 2022 (has links)
Next-generation wireless networks must support a plethora of new applications ranging from the Internet of Things to virtual reality. Each of these emerging applications has unique rate, reliability, and latency requirements that substantially differ from those of traditional services such as video streaming. Hence, there is a need to design an efficient resource management framework that takes into account the different components affecting resource usage, including less obvious factors, such as human behavior, that contribute to the resource usage of the system. The use of machine learning to model these components in a resource management system is a promising solution, because many hidden factors that can only be captured by an end-to-end machine learning solution might contribute to the resource usage pattern of users or machine-type devices. Therefore, machine learning algorithms can be used either for modeling a complex factor, such as the human brain's delay perception, or for designing an end-to-end resource management system. The overarching goal of this dissertation is to develop and deploy machine learning frameworks that are suitable to model the various components of a wireless resource management system that must provide reliable and low latency service to the users. First, by explicitly modeling the limitations of the human brain, a concrete measure for the delay perception of human users in a wireless network is introduced. Then, a new probabilistic model for this delay perception is learned based on the brain features of a human user. Given the learned model for the delay perception of the human brain, a brain-aware resource management algorithm is proposed for allocating radio resources to human users while minimizing the transmit power and taking into account the reliability of both machine-type devices and human users.
Next, a novel experienced deep reinforcement learning (deep-RL) framework is proposed to provide model-free resource allocation for ultra reliable low latency communication (URLLC) in the downlink of a wireless network. The proposed experienced deep-RL framework can guarantee high end-to-end reliability and low end-to-end latency, under explicit data rate constraints, for each wireless user without any models of or assumptions on the users' traffic. In particular, in order to enable the deep-RL framework to account for extreme network conditions and operate in highly reliable systems, a new approach based on generative adversarial networks (GANs) is proposed. After that, the problem of network slicing is studied in the context of a wireless system having a time-varying number of users that require two types of slices: reliable low latency (RLL) and self-managed (capacity-limited) slices. To address this problem, a novel control framework for stochastic optimization is proposed based on the Lyapunov drift-plus-penalty method. This new framework enables the system to minimize power, maintain slice isolation, and provide reliable and low latency end-to-end communication for RLL slices. Then, a novel concept of three-dimensional (3D) cellular networks, which integrate drone base stations (drone-BSs) and cellular-connected drone users (drone-UEs), is introduced. For this new 3D cellular architecture, a novel framework for network planning for drone-BSs as well as latency-minimal cell association for drone-UEs is proposed. For network planning, a tractable method for drone-BS deployment based on the notion of truncated octahedron shapes is proposed that ensures full coverage of a given space with a minimum number of drone-BSs. In addition, to characterize frequency planning in such 3D wireless networks, an analytical expression for the feasible integer frequency reuse factors is derived.
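The Lyapunov drift-plus-penalty method mentioned above can be illustrated on a single-queue toy model: in each slot, the controller picks the transmit power minimizing V·power − Q·service, trading average power against queue backlog. The arrival rate, rate function, and weight V are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
V = 10.0             # penalty weight: larger V favors lower power at the cost of backlog
arrival_rate = 0.8   # mean packet arrivals per slot (hypothetical)
powers = np.linspace(0.0, 4.0, 41)   # candidate transmit powers
service = np.log2(1.0 + powers)      # service rate offered at each power level

Q = 0.0
q_hist, p_hist = [], []
for t in range(20000):
    a = rng.poisson(arrival_rate)
    # Drift-plus-penalty rule: pick the power minimizing V*p - Q*service(p).
    k = int(np.argmin(V * powers - Q * service))
    Q = max(Q + a - service[k], 0.0)
    q_hist.append(Q)
    p_hist.append(powers[k])

# The queue stays bounded (stability) while the average power stays near the
# minimum needed to serve the arrival rate.
print(np.mean(q_hist[-5000:]), np.mean(p_hist[-5000:]))
```

Larger V pushes the chosen power toward the minimum needed for stability at the cost of a longer queue, the standard utility-versus-backlog tradeoff of this method.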
Subsequently, an optimal 3D cell association scheme is developed for which the drone-UEs' latency, considering transmission, computation, and backhaul delays, is minimized. Finally, the concept of super environments is introduced. After formulating this concept mathematically, it is shown that any two Markov decision processes (MDPs) can be members of a super environment if sufficient additional state space is added. Then the effect of this additional state space on model-free and model-based deep-RL algorithms is investigated. Next, the tradeoff caused by adding the extra state space, between the speed of convergence and the optimality of the solution, is discussed. In summary, this dissertation led to the development of machine learning algorithms for statistically modeling complex parts of the resource management system. It also developed a model-free controller that can control the resource management system reliably, with low latency, and optimally. / Doctor of Philosophy / Next-generation wireless networks must support a plethora of new applications ranging from the Internet of Things to virtual reality. Each of these emerging applications has unique requirements that substantially differ from those of traditional services such as video streaming. Hence, there is a need to design a new and efficient resource management framework that takes into account the different components affecting resource usage, including less obvious factors, such as human behavior, that contribute to the resource usage of the system. The use of machine learning to model these components in a resource management system is a promising solution because the data-driven nature of machine learning algorithms can help us model the many hidden factors that might contribute to the resource usage pattern of users or devices. These hidden factors can only be modeled using an end-to-end machine learning solution.
By end-to-end, we mean the system only relies on its observation of the quality of service (QoS) for users. Therefore, machine learning algorithms can be used either for modeling a complex factor such as the human brain's delay perception or for designing an end-to-end resource management system. The overarching goal of this dissertation is to develop and deploy machine learning frameworks that are suitable to model the various components of a wireless resource management system that must provide reliable and low latency service to the users.
|
244 |
Deep Reinforcement Learning for Next Generation Wireless Networks with Echo State Networks
Chang, Hao-Hsuan, 26 August 2021 (has links)
This dissertation considers a deep reinforcement learning (DRL) setting under the practical challenges of real-world wireless communication systems. The non-stationary and partially observable wireless environments make the learning and the convergence of the DRL agent challenging. One way to facilitate learning in partially observable environments is to combine recurrent neural networks (RNNs) and DRL to capture temporal information inherent in the system, which is referred to as a deep recurrent Q-network (DRQN). However, training a DRQN is known to be challenging, requiring a large amount of training data to achieve convergence. In many targeted wireless applications in 5G and future 6G wireless networks, the available training data is very limited. Therefore, it is important to develop DRL strategies that are capable of capturing the temporal correlation of the dynamic environment while requiring only limited training overhead. In this dissertation, we design efficient DRL frameworks by utilizing the echo state network (ESN), a special type of RNN in which only the output weights are trained. To be specific, we first introduce the deep echo state Q-network (DEQN) by adopting ESNs as the kernel of deep Q-networks. Next, we introduce a federated ESN-based policy gradient (Fed-EPG) approach that enables multiple agents to collaboratively learn a shared policy to achieve the system goal. We design computationally efficient training algorithms by utilizing the special structure of ESNs, which have the advantage of learning a good policy in a short time with limited training data. Theoretical analyses are conducted for the DEQN and Fed-EPG approaches to show their convergence properties and to provide a guide to hyperparameter tuning. Furthermore, we evaluate the performance under the dynamic spectrum sharing (DSS) scenario, a key enabling technology that aims to utilize the precious spectrum resources more efficiently.
Compared to a conventional spectrum management policy that usually grants a fixed spectrum band to a single system for exclusive access, DSS allows a secondary system to dynamically share the spectrum with the primary system. Our work sheds light on the real deployments of DRL techniques in next generation wireless systems. / Doctor of Philosophy / Model-free reinforcement learning (RL) algorithms such as Q-learning are widely used because they can learn the policy directly through interactions with the environment, without estimating a model of the environment, which is useful when the underlying system model is complex. Q-learning performs poorly for large-scale models because the training has to update every element in a large Q-table, which makes training difficult or even impossible. Therefore, deep reinforcement learning (DRL) exploits powerful deep neural networks to approximate the Q-table. Furthermore, a deep recurrent Q-network (DRQN) is introduced to facilitate learning in partially observable environments. However, DRQN training requires a large amount of training data and a long training time to achieve convergence, which is impractical in wireless systems with non-stationary environments and limited training data. Therefore, in this dissertation, we introduce two efficient DRL approaches: the deep echo state Q-network (DEQN) and the federated ESN-based policy gradient (Fed-EPG). Theoretical analyses of DEQN and Fed-EPG are conducted to provide their convergence properties and guidelines for designing the hyperparameters. We evaluate and demonstrate the performance benefits of DEQN and Fed-EPG under the dynamic spectrum sharing (DSS) scenario, a critical technology for efficiently utilizing the precious spectrum resources in 5G and future 6G wireless networks.
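The property that makes ESNs cheap to train, a fixed random reservoir with only a trained linear readout, can be sketched on a toy one-step prediction task; the reservoir size, spectral radius, and sine-wave task are illustrative rather than the DEQN setup:

```python
import numpy as np

rng = np.random.default_rng(5)
n_res = 100

# Fixed random reservoir; only the linear readout below is ever trained.
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius 0.9

def run_reservoir(u):
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in[:, 0] * u_t + W @ x)
        states.append(x.copy())
    return np.array(states)

# Toy temporal task: one-step-ahead prediction of a sine wave.
t = np.arange(600) * 0.1
u, y = np.sin(t[:-1]), np.sin(t[1:])
X = run_reservoir(u)[100:]   # drop a washout period
Y = y[100:]

# Training is a single ridge regression, with no backpropagation through time.
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ Y)
pred = X @ W_out
print(float(np.mean((pred - Y) ** 2)))
```

Training reduces to one ridge regression instead of backpropagation through time, which is the reason an ESN-based Q-network can learn from limited data and training overhead.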
|
245 |
Understanding social function in psychiatric illnesses through computational modeling and multiplayer games
Cui, Zhuoya, 26 May 2021 (has links)
Impaired social functioning conferred by mental illnesses has been consistently documented in the literature. However, studies of social abnormalities in psychiatric conditions are often challenged by the difficulties of formalizing dynamic social exchanges and quantifying their neurocognitive underpinnings. Recently, the rapid growth of computational psychiatry as a new field, along with the development of multiplayer economic paradigms, has provided powerful tools to parameterize complex interpersonal processes and identify quantitative indicators of social impairments. Utilizing these methodologies, the current set of studies examined social decision making during multiplayer economic games in participants diagnosed with depression (study 1) and combat-related post-traumatic stress disorder (PTSD, study 2), as well as an online population with elevated symptoms of borderline personality disorder (BPD, study 3). We then quantified and disentangled the impacts of multiple latent decision-making components, mainly social valuation and social learning, on maladaptive social behavior via explanatory modeling. Different underlying alterations were revealed across diagnoses. Atypical social exchange in depression and BPD was attributed to altered social valuation and altered social learning, respectively, whereas both social valuation and social learning contributed to interpersonal dysfunction in PTSD. Additionally, model-derived indices of social abnormalities positively correlated with levels of symptom severity (studies 1 and 2) and exhibited a longitudinal association with symptom change (study 1). Our findings provide mechanistic insights into interpersonal difficulties in psychiatric illnesses and highlight the importance of a computational understanding of social function, which holds potential clinical implications for differential diagnosis and precise treatment.
/ Doctor of Philosophy / People with psychiatric conditions often suffer from impaired social relationships due to an inability to engage in everyday social interactions. As different illnesses can sometimes produce the same symptoms, social impairment can also have different causes. For example, individuals who constantly avoid social activities may find them less interesting or may be attempting to avoid potential negative experiences, while those who display elevated aggression may have a strong desire for social dominance or falsely believe that others are also aggressive. However, it is hard to infer what drives these alterations just by observing the behavior. To address this question, we enrolled people with three different kinds of psychopathology to play an interactive game together with another player and mathematically modeled their latent decision-making processes. By comparing their model parameters to those of the control population, we were able to infer how people with psychopathology made decisions and which part of the decision-making process went wrong, leading to disrupted social interactions. We found that altered model parameters differed among people with major depression, post-traumatic stress disorder, and borderline personality disorder, suggesting different causes underlying the impaired social behavior observed in the game, the extent of which also positively correlated with their psychiatric symptom severity. Understanding the reasons behind social dysfunctions associated with psychiatric illnesses can help us better differentiate people with different diagnoses and design more effective treatments to restore interpersonal relationships.
|
246 |
The Impact of Threat on Behavioral and Neural Markers of Learning in Anxiety
Valdespino, Andrew, 28 August 2019 (has links)
Anxiety is characterized by apprehensive expectation regarding the forecasted outcomes of choice. Decision science, and in particular reinforcement learning models, provide a quantitative framework to explain how the likelihood and value of such outcomes are estimated, thus allowing the measurement of parameters of decision-making that may differ between high- and low-anxiety groups. However, the role of anxiety in choice allocation is not sufficiently understood, particularly regarding the influence of transient threat on current decisions. The presence of threat appears to alter choice behavior and may differentially influence quantitatively derived parameters of learning among anxious individuals. Regarding the neurobiology of reinforcement learning, the dorsolateral prefrontal cortex (dlPFC) has been suggested to play a role in temporally integrating experienced outcomes, as well as in coordinating an overall choice action plan, which can be described computationally by learning rate and exploration parameters, respectively. Accordingly, it was hypothesized that high trait anxiety would be associated with a lower reward learning rate, a higher loss learning rate, and diminished exploration of available options, and furthermore that threat would increase the magnitude of these parameters in the high-anxiety group. We also hypothesized that the magnitude of neural activation (measured by functional near-infrared spectroscopy; fNIRS) across dissociable regions of the left and right dlPFC would be associated with model parameters, and that threat would further increase the magnitude of activation to model parameters. Finally, it was hypothesized that reward and loss outcomes could be differentiated based on fNIRS channel activation, and that a distinct set of channels would differentiate outcomes in the high relative to the low anxiety group.
To test these hypotheses, a temporal difference learning model was applied to a decision-making (bandit) task to establish differences in learning parameter magnitudes among individuals high (N=26) and low (N=20) in trait anxiety, as well as the impact of threat on learning parameters.
Results indicated a positive association between anxiety and both the reward and loss learning rate parameters. However, threat was not found to impact model parameters. Imaging results indicated a positive association between exploration and the left dlPFC. Reward and loss outcomes were successfully differentiated in the high, but not low anxiety group.
Results add to a growing literature suggesting anxiety is characterized by differential sensitivity to both losses and rewards in reinforcement learning contexts, and further suggest that the dlPFC plays a role in modulating exploration-based choice strategies. / Doctor of Philosophy / Anxiety is characterized by worry about possible future negative outcomes. Mathematical models in the area of learning theory allow the representation and measurement of individual differences in the decision-making tendencies that contribute to negative future apprehension. Currently, the role of anxiety in the allocation of choices, and particularly the influence of threat on decision-making, is poorly understood. Threat may influence learning and alter choice behavior, collectively causing negative future apprehension. With regard to how related decision-making is computed in the brain, the dorsolateral prefrontal cortex (dlPFC) has been suggested to play a role in tracking and integrating current and past experienced outcomes in order to coordinate an overall action plan. Outcome tracking and action plan coordination can be represented mathematically within a learning theory framework by learning rate and exploration parameters, respectively. It was hypothesized that high anxiety would be associated with a lower reward learning rate, a higher loss learning rate, and diminished exploration, and furthermore that threat would increase the magnitude of these tendencies in anxious individuals. We also hypothesized that brain activation in the dlPFC would be associated with these tendencies, and that threat would further increase activation in these brain areas. It was also hypothesized that reward and loss outcomes could be differentiated based on brain activation in the dlPFC. To test these hypotheses, a mathematical model was applied to establish differences in learning within high and low anxiety individuals, as well as to test the impact of threat on these learning tendencies.
Results indicated a positive association between anxiety and the rate of learning to reward and loss outcomes. Threat was not found to impact these learning rates. A positive association was found between activation in the dlPFC and the tendency to explore. Reward and loss outcomes were successfully differentiated based on brain activation in high, but not low anxiety individuals. Results add to a growing literature suggesting that anxiety is characterized by differential sensitivity to both losses and rewards, and further adds to our understanding of how the brain computes exploration-based choice strategies.
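A sketch of the kind of model behind these results: a two-armed bandit learner with separate learning rates for positive and negative prediction errors and a softmax choice rule. The arm payoffs, learning rates, and inverse temperature are hypothetical values for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

def simulate(alpha_gain, alpha_loss, beta=3.0, n=2000):
    """Two-armed bandit learner with separate gain/loss learning rates."""
    # Hypothetical arms: arm 0 is safe (+0.5); arm 1 is risky (+1 or -1, 50/50).
    Q = np.zeros(2)
    risky = 0
    for t in range(n):
        p1 = 1.0 / (1.0 + np.exp(-beta * (Q[1] - Q[0])))  # softmax over two arms
        a = int(rng.random() < p1)
        r = 0.5 if a == 0 else (1.0 if rng.random() < 0.5 else -1.0)
        delta = r - Q[a]                                   # prediction error
        Q[a] += (alpha_gain if delta >= 0 else alpha_loss) * delta
        risky += a
    return risky / n   # fraction of risky choices

risk_neutral = simulate(alpha_gain=0.1, alpha_loss=0.1)
loss_averse = simulate(alpha_gain=0.1, alpha_loss=0.4)
print(risk_neutral, loss_averse)
```

Raising the loss learning rate biases the value of the variable arm downward and shifts choices toward the safe option; fitting such parameters to observed choice data is how group differences in reward and loss learning rates are quantified.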
|
247 |
Predicting Mutational Pathways of Influenza A H1N1 Virus using Q-learning
Aarathi Raghuraman, FNU, 13 August 2021 (has links)
Influenza is a seasonal viral disease affecting over 1 billion people annually around the globe, as reported by the World Health Organization (WHO). The influenza virus has been around for decades, causing multiple pandemics and encouraging researchers to perform extensive analysis of its evolutionary patterns. Current research uses phylogenetic trees as the basis to guide population genetics and other phenotypic characteristics when describing the evolution of the influenza genome. Phylogenetic trees are one form of representing the evolutionary trends of sequenced genomes, but they do not capture the multidimensional complexity of mutational pathways. We suggest representing antigenic drifts within the influenza A/H1N1 hemagglutinin (HA) protein as a graph, $G = (V, E)$, where $V$ is the set of vertices representing each possible sequence and $E$ is the set of edges representing single amino acid substitutions. Each transition is characterized by a Malthusian fitness model incorporating genetic adaptation, vaccine similarity, and historical epidemiological response, using mortality as the metric where available. Applying reinforcement learning with the vertices as states, edges as actions, and fitness as the reward, we learn the high-likelihood mutational pathways and optimal policy without exploring the entire space of the graph, $G$. Our average predicted-versus-actual sequence distance of $3.6 \pm 1.2$ amino acids indicates that our novel approach of using naive Q-learning can assist with influenza strain predictions, thus improving vaccine selection for future disease seasons. / Master of Science / Influenza is a seasonal virus affecting over 1 billion people annually around the globe, as reported by the World Health Organization (WHO). The effectiveness of influenza vaccines varies tremendously by type (A, B, C, or D) and season.
Of note is the pandemic of 2009, when the influenza A H1N1 virus mutants were significantly different from the chosen vaccine composition. It is pertinent to understand and predict the underlying genetic and environmental behavior of influenza virus mutants to be able to determine the vaccine composition for future seasons, preventing another pandemic. Given the recent 2020 COVID-19 pandemic, caused by a virus that also affects the upper respiratory system, novel approaches to predicting viruses need to be investigated now more than ever. Thus, in this thesis, I develop a novel approach to predicting a portion of the influenza A H1N1 viruses using machine learning.
|
248 |
Random Access Control In Massive Cellular Internet of Things: A Multi-Agent Reinforcement Learning Approach
Bai, Jianan, 14 January 2021 (has links)
Internet of things (IoT) is envisioned as a promising paradigm to interconnect enormous
wireless devices. However, the success of IoT is challenged by the difficulty of access management
of the massive amount of sporadic and unpredictable user traffic. This thesis focuses
on the contention-based random access in massive cellular IoT systems and introduces two
novel frameworks to provide enhanced scalability, real-time quality of service management,
and resource efficiency. First, a local communication based congestion control framework
is introduced to distribute the random access attempts evenly over time under bursty traffic.
Second, a multi-agent reinforcement learning based preamble selection framework is
designed to increase the access capacity under a fixed number of preambles. Combining the
two mechanisms provides superior performance under various 3GPP-specified machine type
communication evaluation scenarios in terms of achieving much lower access latency and
fewer access failures. / Master of Science / In the age of the internet of things (IoT), a massive number of devices are expected to be connected
to the wireless networks in a sporadic and unpredictable manner. The wireless connection
is usually established by contention-based random access, a four-step handshaking process
initiated by a device through sending a randomly selected preamble sequence to the base
station. While different preambles are orthogonal, preamble collision happens when two
or more devices send the same preamble to a base station simultaneously, and a device
experiences access failure if the transmitted preamble cannot be successfully received and
decoded. A failed device needs to wait for another random access opportunity to restart the
aforementioned process, which increases access delay and resource consumption.
The random access control in massive IoT systems is challenged by the increased access
intensity, which results in higher collision probability. In this work, we aim to provide better
scalability, real-time quality of service management, and resource efficiency in random access
control for such systems. Towards this end, we introduce 1) a local communication based
congestion control framework by enabling a device to cooperate with neighboring devices
and 2) a multi-agent reinforcement learning (MARL) based preamble selection framework by
leveraging the ability of MARL in forming the decision-making policy through the collected
experience. The introduced frameworks are evaluated under the 3GPP-specified scenarios
and shown to outperform the existing standard solutions in terms of achieving lower access
delays with fewer access failures.
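The learning-based preamble selection described above can be illustrated with a minimal sketch. This is not the thesis framework: here, independent Q-learning agents (a simplification of the MARL design) each choose one of K preambles, and a preamble succeeds only when exactly one agent selects it, mimicking collision-induced access failure. All parameter values are illustrative assumptions.

```python
import random

def train(num_agents=5, num_preambles=8, episodes=2000,
          alpha=0.1, epsilon=0.1, seed=0):
    """Toy preamble-selection game: independent epsilon-greedy Q-learners."""
    rng = random.Random(seed)
    # One Q-value table per agent over the K preamble "arms".
    q = [[0.0] * num_preambles for _ in range(num_agents)]
    successes = 0.0
    for _ in range(episodes):
        choices = []
        for a in range(num_agents):
            if rng.random() < epsilon:
                choices.append(rng.randrange(num_preambles))  # explore
            else:
                choices.append(max(range(num_preambles), key=q[a].__getitem__))
        counts = [choices.count(p) for p in range(num_preambles)]
        for a, p in enumerate(choices):
            # A collision (two or more devices on the same preamble) fails.
            reward = 1.0 if counts[p] == 1 else 0.0
            q[a][p] += alpha * (reward - q[a][p])
            successes += reward
    return successes / (episodes * num_agents)

if __name__ == "__main__":
    print(f"average access success rate: {train():.2f}")
```

Because the agents are rewarded only for uncontested preambles, exploration breaks the initial symmetry and they tend to spread across distinct preambles, raising the success rate above the uniform-random baseline.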
|
249 |
Machine Learning and Artificial Intelligence Application in Process Control / Wang, Xiaonian, January 2024 (has links)
This thesis consists of four chapters, including two main contributions on the application of machine learning and artificial intelligence to process modeling and controller design.
Chapter 2 addresses applying AI to controller design. This chapter proposes and implements a novel reinforcement learning (RL)-based controller design on chemical engineering examples. To address the costly and unsafe training of model-free RL-based controllers, we propose an implementable RL-based controller design that leverages offline MPC calculations that have already been developed based on a step-response model. In this method, an RL agent is first trained to imitate the MPC's performance. The trained agent is then deployed in a model-free RL framework to interact with the actual process, continuously learning and optimizing its performance within a safe operating range of the process. This contribution constitutes the first implementable RL-based controller for practical industrial application.
Chapter 3 focuses on AI applications in process modeling. As nonlinear dynamics are widely encountered and challenging to simulate, nonlinear MPC (NMPC) is recognized as a promising tool for tackling this challenge. However, the lack of a reliable nonlinear model remains a roadblock for this technique. To address this issue, we develop a novel data-driven modeling method built on a nonlinear autoencoder, yielding a modeling technique in which the nonlinearity stems from the analysis of the measured variables. Moreover, a quadratic program (QP)-based MPC is developed on top of this model, using the autoencoder as a transformation between the controller and the process. This work contributes an extension of the classic Koopman operator modeling method and a linear MPC design that can outperform other NMPC approaches, such as neural-network-based MPC. / Thesis / Master of Applied Science (MASc)
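The "train an agent to imitate an existing controller" step described for Chapter 2 can be sketched as follows. This is a minimal illustration, not the thesis implementation: the toy scalar plant, the hand-written stand-in for the offline MPC, and the least-squares fit of a linear policy to logged (state, action) pairs are all assumptions.

```python
# Toy linear plant x_next = a*x + b*u (coefficients are illustrative).
A_PLANT, B_PLANT = 0.9, 0.5

def expert_controller(x):
    """Stand-in for the offline MPC: a fixed law driving the state to zero."""
    return -1.2 * x

def imitate(states):
    """Fit a linear policy u = k*x by least squares on expert demonstrations."""
    actions = [expert_controller(x) for x in states]
    return (sum(x * u for x, u in zip(states, actions))
            / sum(x * x for x in states))

def rollout(k, x0=1.0, steps=20):
    """Run the imitated policy on the plant and return the final state."""
    x = x0
    for _ in range(steps):
        x = A_PLANT * x + B_PLANT * (k * x)
    return x

if __name__ == "__main__":
    k = imitate([-2.0, -1.0, 0.5, 1.5, 3.0])
    print(f"learned gain k = {k:.2f}, final state = {rollout(k):.4e}")
```

Since the stand-in expert is itself linear, the fit recovers its gain exactly and the imitated closed loop is stable; in the thesis setting the imitation target is the offline MPC and the trained agent is subsequently refined online.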
|
250 |
ACADIA: Efficient and Robust Adversarial Attacks Against Deep Reinforcement Learning / Ali, Haider, 05 January 2023 (has links)
Existing adversarial algorithms for Deep Reinforcement Learning (DRL) have largely focused on identifying an optimal time to attack a DRL agent. However, little work has explored injecting efficient adversarial perturbations into DRL environments. We propose a suite of novel DRL adversarial attacks, called ACADIA, representing AttaCks Against Deep reInforcement leArning. ACADIA provides a set of efficient and robust perturbation-based adversarial attacks that disturb the DRL agent's decision-making based on novel combinations of techniques utilizing momentum, adaptive optimizers such as ADAM and Root Mean Square Propagation (RMSProp), and initial randomization. DRL attacks with this integration of techniques have not been studied in the existing Deep Neural Network (DNN) and DRL research. We consider two well-known DRL algorithms, Deep Q-Network (DQN) and Proximal Policy Optimization (PPO), in Atari games and MuJoCo environments, where both targeted and non-targeted attacks are considered with or without the state-of-the-art defenses in DRL (i.e., RADIAL and ATLA). Our results demonstrate that the proposed ACADIA outperforms existing gradient-based counterparts under a wide range of experimental settings. ACADIA is nine times faster than the state-of-the-art Carlini and Wagner (CW) method with better performance under DRL defenses. / Master of Science / Artificial Intelligence (AI) techniques such as Deep Neural Networks (DNNs) and Deep Reinforcement Learning (DRL) are prone to adversarial attacks. For example, a perturbed stop sign can force a self-driving car's AI algorithm to increase its speed rather than stop the vehicle. There has been little work developing attacks and defenses against DRL. In DRL, a DNN-based policy decides which action to take based on an observation of the environment and receives a reward as feedback. We perturb that observation to attack the DRL agent.
There are two main aspects to developing an attack on DRL. The first is identifying an optimal time to attack (when to attack); the second is identifying an efficient method of attack (how to attack). To address the second aspect, we propose a suite of novel DRL adversarial attacks, called ACADIA, representing AttaCks Against Deep reInforcement leArning. We consider two well-known DRL algorithms, Deep Q-Network (DQN) and Proximal Policy Optimization (PPO), in the Atari games and MuJoCo DRL environments, where both targeted and non-targeted attacks are considered with or without state-of-the-art defenses. Our results demonstrate that the proposed ACADIA outperforms state-of-the-art perturbation methods under a wide range of experimental settings. ACADIA is nine times faster than the state-of-the-art Carlini and Wagner (CW) method with better performance under DRL defenses.
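ACADIA's exact attacks are not reproduced here; the following toy sketch only illustrates the general ingredient the abstract names, a momentum-accumulated sign-gradient perturbation of the agent's observation, applied to an assumed two-weight logistic "policy". The weights, epsilon budget, and step counts are all illustrative.

```python
import math

W = [2.0, -1.0]  # weights of a toy logistic policy pi(a=1 | s)

def policy_prob(s):
    """Probability the toy policy assigns to action 1 given observation s."""
    z = sum(w * x for w, x in zip(W, s))
    return 1.0 / (1.0 + math.exp(-z))

def momentum_attack(s, eps=0.8, steps=10, mu=0.9):
    """Iterative sign-gradient perturbation with momentum (FGSM-family)."""
    target = 1 if policy_prob(s) >= 0.5 else 0  # action we try to flip
    alpha = eps / steps
    g = [0.0] * len(s)
    adv = list(s)
    for _ in range(steps):
        p = policy_prob(adv)
        # Gradient of log pi(target | adv) w.r.t. the observation.
        if target == 1:
            grad = [w * (1.0 - p) for w in W]
        else:
            grad = [-w * p for w in W]
        g = [mu * gi + gr for gi, gr in zip(g, grad)]       # momentum
        adv = [x - alpha * math.copysign(1.0, gi)           # descend target's
               for x, gi in zip(adv, g)]                    # log-probability
    # Project back into the eps-ball around the clean observation.
    return [min(max(ai, si - eps), si + eps) for ai, si in zip(adv, s)]

if __name__ == "__main__":
    s = [1.0, 0.5]
    adv = momentum_attack(s)
    print("clean action:", policy_prob(s) >= 0.5,
          "perturbed action:", policy_prob(adv) >= 0.5)
```

The momentum term smooths the gradient direction across iterations, which is the same stabilizing idea momentum-based DNN attacks rely on; the epsilon-ball projection keeps the perturbation bounded.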
|