Global ETD Search

171	Modeling, Analysis and Control of Nonlinear Switching Systems Kaisare, Niket S. 22 December 2004 (has links) The first part of this two-part thesis examines the reverse-flow operation of auto-thermal methane reforming in a microreactor. A theoretical study is undertaken to explain the physical origins of the experimentally observed improvements in the performance of the reverse-flow operation compared to the unidirectional operation. First, a scaling analysis is presented to understand the effect of various time scales existing within the microreactor, and to obtain guidelines for the optimal reverse-flow operation. Then, the effect of kinetic parameters, transport properties, reactor design and operating conditions on the reactor operation is parametrically studied through numerical simulations. The reverse-flow operation is shown to be more robust than the unidirectional operation with respect to both optimal operating conditions as well as variations in hydrogen throughput requirements. A rational scheme for improved catalyst placement in the microreactor, which exploits the spatial temperature profiles in the reactor, is also presented. Finally, a design modification of the microreactor called "opposed-flow" reactor, which retains the performance benefits of the reverse-flow operation without requiring the input / output port switching, is suggested. In the second part of this thesis, a novel simulation-based Approximate Dynamic Programming (ADP) framework is presented for optimal control of switching between multiple metabolic states in a microbial bioreactor. The cybernetic modeling framework is used to capture these cellular metabolic switches. Model Predictive Control, one of the most popular advanced control methods, is able to drive the reactor to the desired steady state. However, the nonlinearity and switching nature of the system cause computational and performance problems with MPC. 
The proposed ADP has an advantage over MPC, as the closed-loop optimal policy is computed offline in the form of a so-called value or cost-to-go function. Through the use of an approximation of the value function, the infinite horizon problem is converted into an equivalent single-stage problem, which can be solved online. Various issues in the implementation of ADP are also addressed. Reinforcement learning Partial oxidation Microreactor Reverse flow Dynamic programming
172	Development and evaluation of an arterial adaptive traffic signal control system using reinforcement learning Xie, Yuanchang 15 May 2009 (has links) This dissertation develops and evaluates a new adaptive traffic signal control system for arterials. This control system is based on reinforcement learning, which is an important research area in distributed artificial intelligence and has been extensively used in many applications including real-time control. In this dissertation, a systematic comparison between the reinforcement learning control methods and existing adaptive traffic control methods is first presented from the theoretical perspective. This comparison shows both the connections between them and the benefits of using reinforcement learning. A Neural-Fuzzy Actor-Critic Reinforcement Learning (NFACRL) method is then introduced for traffic signal control. NFACRL integrates fuzzy logic and neural networks into reinforcement learning and can better handle the curse of dimensionality and generalization problems associated with ordinary reinforcement learning methods. This NFACRL method is first applied to isolated intersection control. Two different implementation schemes are considered. The first scheme uses a fixed phase sequence and variable cycle length, while the second one optimizes phase sequence in real time and is not constrained to the concept of cycle. Both schemes are further extended for arterial control, with each intersection being controlled by one NFACRL controller. Different strategies used for coordinating reinforcement learning controllers are reviewed, and a simple but robust method is adopted for coordinating traffic signals along the arterial. The proposed NFACRL control system is tested at both isolated intersection and arterial levels based on VISSIM simulation. The testing is conducted under different traffic volume scenarios using real-world traffic data collected during morning, noon, and afternoon peak periods. 
The performance of the NFACRL control system is compared with that of the optimized pre-timed and actuated control. Testing results based on VISSIM simulation show that the proposed NFACRL control has very promising performance. It outperforms optimized pre-timed and actuated control in most cases for both isolated intersection and arterial control. At the end of this dissertation, issues on how to further improve the NFACRL method and implement it in the real world are discussed. Reinforcement Learning Adaptive Traffic Signal Control Arterial Dynamic Programming Agent
173	Discretization and Approximation Methods for Reinforcement Learning of Highly Reconfigurable Systems Lampton, Amanda K. 2009 December 1900 (has links) There are a number of techniques that are used to solve reinforcement learning problems, but very few that have been developed for and tested on highly reconfigurable systems cast as reinforcement learning problems. A reconfigurable system is a vehicle (air, ground, or water) or collection of vehicles that can change its geometrical features, i.e. shape or formation, to perform tasks that the vehicle could not otherwise accomplish. These systems tend to be optimized for several operating conditions, and then controllers are designed to reconfigure the system from one operating condition to another. Q-learning, an unsupervised episodic learning technique that solves the reinforcement learning problem, is an attractive control methodology for reconfigurable systems. It has been successfully applied to a myriad of control problems, and there are a number of variations that were developed to avoid or alleviate some limitations in earlier versions of this approach. This dissertation describes the development of three modular enhancements to the Q-learning algorithm that solve some of the unique problems that arise when working with this class of systems, such as the complex interaction of reconfigurable parameters and computationally intensive models of the systems. A multi-resolution state-space discretization method is developed that adaptively rediscretizes the state-space by progressively finer grids around one or more distinct Regions Of Interest within the state or learning space. A genetic algorithm that autonomously selects the basis functions to be used in the approximation of the action-value function is applied periodically throughout the learning process. 
Policy comparison is added to monitor the state of the policy encoded in the action-value function to prevent unnecessary episodes at each level of discretization. This approach is validated on several problems including an inverted pendulum, reconfigurable airfoil, and reconfigurable wing. Results show that the multi-resolution state-space discretization method reduces the number of state-action pairs required to achieve a specific goal, often by an order of magnitude, and that the policy comparison prevents unnecessary episodes once the policy has converged to a usable policy. Results also show that the genetic algorithm is a promising candidate for the selection of basis functions for function approximation of the action-value function. multi-resolution discretization reinforcement learning highly reconfigurable system morphing
174	A unifying framework for computational reinforcement learning theory Li, Lihong, January 2009 (has links) Thesis (Ph. D.)--Rutgers University, 2009. / "Graduate Program in Computer Science." Includes bibliographical references (p. 238-261).
175	Autonomous qualitative learning of distinctions and actions in a developing agent Mugan, Jonathan William 23 November 2010 (has links) How can an agent bootstrap up from a pixel-level representation to autonomously learn high-level states and actions using only domain general knowledge? This thesis attacks a piece of this problem and assumes that an agent has a set of continuous variables describing the environment and a set of continuous motor primitives, and poses a solution for the problem of how an agent can learn a set of useful states and effective higher-level actions through autonomous experience with the environment. There exist methods for learning models of the environment, and there also exist methods for planning. However, for autonomous learning, these methods have been used almost exclusively in discrete environments. This thesis proposes attacking the problem of learning high-level states and actions in continuous environments by using a qualitative representation to bridge the gap between continuous and discrete variable representations. In this approach, the agent begins with a broad discretization and initially can only tell if the value of each variable is increasing, decreasing, or remaining steady. The agent then simultaneously learns a qualitative representation (discretization) and a set of predictive models of the environment. The agent then converts these models into plans to form actions. The agent then uses those learned actions to explore the environment. The method is evaluated using a simulated robot with realistic physics. The robot is sitting at a table that contains one or two blocks, as well as other distractor objects that are out of reach. The agent autonomously explores the environment without being given a task. After learning, the agent is given various tasks to determine if it learned the necessary states and actions to complete them. 
The results show that the agent was able to use this method to autonomously learn to perform the tasks. Artificial intelligence Robotics Machine learning Reinforcement learning Discretization Qualitative learning
176	Model-based active learning in hierarchical policies Cora, Vlad M. 05 1900 (has links) Hierarchical task decompositions play an essential role in the design of complex simulation and decision systems, such as the ones that arise in video games. Game designers find it very natural to adopt a divide-and-conquer philosophy of specifying hierarchical policies, where decision modules can be constructed somewhat independently. The process of choosing the parameters of these modules manually is typically lengthy and tedious. The hierarchical reinforcement learning (HRL) field has produced elegant ways of decomposing policies and value functions using semi-Markov decision processes. However, there is still a lack of demonstrations in larger nonlinear systems with discrete and continuous variables. To narrow this gap between industrial practices and academic ideas, we address the problem of designing efficient algorithms to facilitate the deployment of HRL ideas in more realistic settings. In particular, we propose Bayesian active learning methods to learn the relevant aspects of either policies or value functions by focusing on the most relevant parts of the parameter and state spaces respectively. To demonstrate the scalability of our solution, we have applied it to The Open Racing Car Simulator (TORCS), a 3D game engine that implements complex vehicle dynamics. The environment is a large topological map roughly based on downtown Vancouver, British Columbia. Higher level abstract tasks are also learned in this process using a model-based extension of the MAXQ algorithm. Our solution demonstrates how HRL can be scaled to large applications with complex, discrete and continuous non-linear dynamics. Hierarchical Reinforcement Learning Decision Theory Bayesian Active Learning Robotics
177	A service-oriented approach to topology formation and resource discovery in wireless ad-hoc networks Gonzalez Valenzuela, Sergio 05 1900 (has links) The past few years have witnessed a significant evolution in mobile computing and communications, in which new trends and applications have transformed the traditional role of computer networks into that of distributed service providers. In this thesis we explore an alternative way to form wireless ad-hoc networks whose topologies can be customized as required by the users’ software applications. In particular, we investigate the applicability of mobile codes to networks created by devices equipped with Bluetooth technology. Computer simulation results suggest that our proposed approach can achieve this task effectively, while matching the level of efficiency seen in other salient proposals in this area. This thesis also addresses the issue of service discovery in mobile ad-hoc networks. We propose the use of a directory whose network location varies in an attempt to reduce traffic overhead driven by users’ hosts looking for service information. We refer to this scheme as the Service Directory Placement Algorithm, or SDPA. We formulate the directory relocation problem as a Markov Decision Process that is solved by using Q-learning. Performance evaluations through computer simulations reveal bandwidth overhead reductions that range between 40% and 48% when compared with a basic broadcast flooding approach for networks comprising hosts moving at pedestrian speeds. We then extend our proposed approach and introduce a multi-directory service discovery system called the Service Directory Placement Protocol, or SDPP. Our findings reveal bandwidth overhead reductions typically ranging from 15% to 75% in networks comprising slow-moving hosts with restricted memory availability. 
In the fourth and final part of this work, we present the design foundations and architecture of a middleware system called WISEMAN – WIreless Sensors Employing Mobile Agents. We employ WISEMAN for dispatching and processing mobile programs in Wireless Sensor Networks (WSNs). Our proposed system enables the dynamic creation of semantic relationships between network nodes that cooperate to provide an aggregate service. We present discussions on the advantages of our proposed approach, and in particular, how WISEMAN facilitates the realization of service-oriented tasks in WSNs. Service discovery Topology formation Reinforcement learning Mobile computing
178	RELPH: A Computational Model for Human Decision Making Mohammadi Sepahvand, Nazanin January 2013 (has links) The updating process, which consists of building mental models and adapting them to the changes occurring in the environment, is impaired in neglect patients. A simple rock-paper-scissors experiment was conducted in our lab to examine updating impairments in neglect patients. The results of this experiment demonstrate a significant difference between the performance of healthy and brain damaged participants. While healthy controls did not show any difficulty learning the computer’s strategy, right brain damaged patients failed to learn the computer’s strategy. A computational modeling approach is employed to help us better understand the reason behind this difference and thus learn more about the updating process in healthy people and its impairment in right brain damaged patients. Broadly, we hope to learn more about the nature of the updating process, in general. Also the hope is that knowing what must be changed in the model to “brain-damage” it can shed light on the updating deficit in right brain damaged patients. To do so I adapted a pattern detection method named “ELPH” to a reinforcement-learning human decision making model called “RELPH”. This model is capable of capturing the behavior of both healthy and right brain damaged participants in our task according to our defined measures. Indeed, this thesis is an effort to discuss the possible differences among these groups employing this computational model. computational Modeling Updating neglect reinforcement learning Psychology (Behavioural Neuroscience)
179	Reinforcement Learning and Simulation-Based Search in Computer Go Silver, David Unknown Date No description available.
180	Dynamic Tuning of PI-Controllers based on Model-free Reinforcement Learning Methods Abbasi Brujeni, Lena Unknown Date No description available. Reinforcement Learning Process Control Dynamic tuning PI-controllers Self-tuning
