Global ETD Search

71	Feature Adaptation Algorithms for Reinforcement Learning with Applications to Wireless Sensor Networks And Road Traffic Control Prabuchandran, K J January 2016
Many sequential decision making problems under uncertainty arising in engineering, science and economics are modelled as Markov Decision Processes (MDPs). In the setting of MDPs, the goal is to find a state-dependent optimal sequence of actions that minimizes a certain long-term performance criterion. The standard dynamic programming approach to solving an MDP for the optimal decisions requires a complete model of the MDP and is computationally feasible only for MDPs with small state-action spaces. Reinforcement learning (RL) methods, on the other hand, are model-free, simulation-based approaches for solving MDPs. In many real-world applications, one is faced with MDPs that have large state-action spaces and whose model is unknown but whose outcomes can be simulated. In order to solve such (large) MDPs, one either resorts to function approximation in conjunction with RL methods or develops application-specific RL methods. A solution based on RL methods with function approximation comes with the associated problem of choosing the right features for approximation, while a solution based on application-specific RL methods relies primarily on exploiting the problem structure. In this thesis, we investigate the problem of choosing the right features for RL methods based on function approximation and develop novel RL algorithms that adaptively obtain the best features for approximation. Subsequently, we also develop problem-specific RL methods for applications arising in the areas of wireless sensor networks and road traffic control. In the first part of the thesis, we consider the problem of finding the best features for value function approximation in reinforcement learning for the long-run discounted cost objective. 
We quantify the error in the approximation for any given feature and approximation parameter by the mean square Bellman error (MSBE) objective and develop an online algorithm to optimize the MSBE. Subsequently, we propose the first online actor-critic scheme with adaptive bases to find a locally optimal (control) policy for an MDP under the weighted discounted cost objective. The actor performs gradient search in the space of policy parameters using simultaneous perturbation stochastic approximation (SPSA) gradient estimates. This gradient computation, however, requires estimates of the value function of the policy. The value function is approximated using a linear architecture and its estimate is obtained from the critic. The error in the approximation of the value function, however, results in sub-optimal policies. Thus, we obtain the best features by performing gradient descent on the Grassmannian of features to minimize an MSBE objective. We provide a proof of convergence of our control algorithm to a locally optimal policy and show numerical results illustrating the performance of our algorithm. In our next work, we develop an online actor-critic control algorithm with adaptive feature tuning for MDPs under the long-run average cost objective. In this setting, a gradient search in the policy parameters is performed using policy gradient estimates to improve the performance of the actor. The computation of this gradient, however, requires estimates of the differential value function of the policy. In order to obtain good estimates of the differential value function, the critic adaptively tunes the features to obtain the best representation of the value function using gradient search in the Grassmannian of features. We prove that our actor-critic algorithm converges to a locally optimal policy. Experiments on two different MDP settings show performance improvements resulting from our feature adaptation scheme. 
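As a rough illustration of the SPSA gradient estimates mentioned above, the following minimal sketch applies a two-sided SPSA estimate to a toy quadratic cost. It is not the thesis's actor-critic algorithm: the cost function `quadratic_cost`, the constant step size, the perturbation size `c`, and all function names are assumptions made for this example (SPSA is usually run with diminishing step sizes).

```python
import random

def spsa_gradient(f, theta, c, rng):
    """Two-sided SPSA estimate of the gradient of f at theta."""
    # Rademacher perturbation: each component is +1 or -1 with equal probability.
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = f(plus) - f(minus)
    # Only two evaluations of f are needed, whatever the dimension of theta.
    return [diff / (2.0 * c * d) for d in delta]

def quadratic_cost(theta):
    """Hypothetical stand-in for a policy's long-run cost; minimized at the origin."""
    return sum(t * t for t in theta)

def spsa_descent(f, theta, step=0.1, c=0.01, iters=500, seed=0):
    """Gradient descent driven by SPSA estimates (constant step size is an assumption)."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iters):
        grad = spsa_gradient(f, theta, c, rng)
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta

theta_start = [1.0, -2.0, 0.5]
theta_final = spsa_descent(quadratic_cost, theta_start)
```

The appeal of SPSA in this setting is that each estimate costs two evaluations of the objective regardless of the number of policy parameters, which matters when each evaluation is a simulation run.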
In the second part of the thesis, we develop problem-specific RL solution methods for the two aforementioned applications. In both applications, the state-action space of the formulated MDPs is large. However, by utilizing the problem structure, we develop scalable RL algorithms. In the wireless sensor networks application, we develop RL algorithms to find optimal energy management policies (EMPs) for energy harvesting (EH) sensor nodes. First, we consider the case of a single EH sensor node and formulate the problem of finding an optimal EMP in the discounted cost MDP setting. We then propose two RL algorithms to maximize network performance. In simulations, our algorithms outperform the algorithms in the literature. Our RL algorithms for the single EH sensor node do not scale when there are multiple sensor nodes. In our second work, we consider the problem of finding optimal energy sharing policies that maximize the network performance of a system comprising multiple sensor nodes and a single EH source. We develop efficient energy sharing algorithms, namely Q-learning algorithms with exploration mechanisms based on the ε-greedy method and on upper confidence bounds (UCB). We extend these algorithms by incorporating state and action space aggregation to tackle state-action space explosion in the MDP. We also develop a cross-entropy-based method that incorporates policy parameterization in order to find near-optimal energy sharing policies. Through numerical experiments, we show that our algorithms yield energy sharing policies that outperform the heuristic greedy method. In the context of road traffic control, optimal control of traffic lights at junctions, or traffic signal control (TSC), is essential for reducing the average delay experienced by road users. This problem is hard to solve when simultaneously considering all the junctions in the road network. 
So, we propose a decentralized multi-agent reinforcement learning (MARL) algorithm for solving this problem by treating each junction in the road network as a separate agent (controller) to obtain dynamic TSC policies. We propose two approaches to minimize the average delay. In the first approach, each agent decides the signal duration of its phases in a round-robin (RR) manner using the multi-agent Q-learning algorithm. We show through simulations in VISSIM (a microscopic traffic simulator) that our round-robin MARL algorithms perform significantly better than both the standard fixed signal timing (FST) algorithm and the saturation balancing (SAT) algorithm on two real road networks. In the second approach, instead of optimizing green light duration, each agent optimizes the order of the phase sequence. We then employ our MARL algorithms by suitably changing the state-action space and cost structure of the MDP. We show through simulations in VISSIM that our non-round-robin MARL algorithms perform significantly better than the FST, SAT and round-robin MARL algorithms of the first approach. However, our round-robin MARL algorithms are more practically viable, as they conform to the psychology of road users.
Wireless Sensor Networks; Road Traffic Control; Reinforcement Learning Algorithms; Markov Decision Processes (MDPs); Sensor Networks; Traffic Signal Control (TSC); Reinforcement Learning; Energy Harvesting Sensor Nodes; Stochastic Approximation; Grassmannian Search; Computer Science
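The abstract above relies on Q-learning with ε-greedy exploration, both for energy sharing and, in multi-agent form, for traffic signal control. The sketch below shows only the generic tabular technique on a hypothetical four-state chain, with reward maximization; the environment, the reward, and the values of `alpha`, `gamma`, and `epsilon` are assumptions for illustration and are not taken from the thesis.

```python
import random

N_STATES, GOAL = 4, 3          # toy chain 0-1-2-3; state 3 is terminal
ACTIONS = (0, 1)               # 0 = move left, 1 = move right

def step(state, action):
    """Deterministic toy dynamics: reward 1 only on reaching the goal."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily (random tie-break)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = epsilon_greedy(q[state], epsilon, rng)
            nxt, reward = step(state, action)
            # Standard Q-learning target: reward plus discounted best next value.
            # The terminal row stays at zero, so the target there is just the reward.
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

A UCB variant would replace `epsilon_greedy` with a rule that adds a visit-count-based bonus to each action value, and a multi-agent version would keep one such table per junction.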
72	Analyses of Bus Travel Time Reliability and Transit Signal Priority at the Stop-To-Stop Segment Level Feng, Wei 02 June 2014
Transit travel time is affected by many factors, including traffic signals and traffic conditions. Transit agencies have implemented strategies such as transit signal priority (TSP) to reduce transit travel time and improve service reliability. However, due to the lack of empirical data, the joint impact of these factors and improvement strategies on bus travel time has not been studied at the stop-to-stop segment level. This study integrates three databases available along an urban arterial corridor in Portland, Oregon. Data sources include stop-level bus automatic vehicle location (AVL) and automatic passenger count (APC) data provided by the Tri-County Metropolitan Transportation District of Oregon (TriMet), the Sydney Coordinated Adaptive Traffic System (SCATS) signal phase log data, and intersection vehicle count data provided by the City of Portland. Based on the collection and integration of these fine-granularity empirical data, this research uses multiple linear regression models to understand and quantify the joint impact of intersection signal delay, traffic conditions and bus stop location on bus travel time and its variability at stop-to-stop segments. Results indicate that intersection signal delay is the key factor affecting bus travel time variability. The amount of signal delay is nearly linearly associated with intersection red phase duration. Results show that the effect of traffic conditions (volumes) on bus travel time varies significantly by intersection and time of day. This study also proposes new performance measures for evaluating the effectiveness of TSP systems. Relationships between TSP requests (made when buses are late) and TSP phases were studied by comparing TSP phase start and end times with bus arrival times at intersections. 
Results show that green extension phases were rarely used by buses that requested TSP and that most green extension phases were granted too late. Early green effectiveness (the percentage of effective early green phases) is much higher than green extension effectiveness. The estimated average bus and passenger time savings from an early green phase are also greater than the average time savings from a green extension phase. On average, the estimated delay for vehicles on the side street due to a TSP phase is less than the time saved for buses and automobiles on the major street. Results from this study can inform cities and transit agencies on how to improve transit operations. Developing appropriate strategies, such as adjusting bus stop consolidation near intersections and optimizing bus operating schedules according to intersection signal timing characteristics, can further reduce bus travel time delay and improve TSP effectiveness.
Transportation
73	Aspekte der Verkehrstelematik – ausgewählte Veröffentlichungen 2013 Krimmling, Jürgen, Jaekel, Birgit, Lehnert, Martin 22 May 2019
The fourth volume of the Verkehrstelematik (traffic telematics) series presents the intermodal research topics and practical applications of the Chair of Traffic Control Systems and Process Automation at the "Friedrich List" Faculty of Transport and Traffic Sciences of Technische Universität Dresden, through selected publications from 2013. The research focuses on energy-optimal control in rail transport and the associated driver assistance systems on the one hand, and on road traffic management on the other. Energy-optimal control in rail transport is implemented by means of driver assistance systems not only in railway operations, for example on the trains of the Harz-Elbe-Express, but also on tram and underground vehicles. In addition, the methods and procedures of energy-optimal control are used in approaches to modularized traffic management for railways. VAMOS, the traffic management system of the city of Dresden, forms the basis of the second research focus. Here, the volume examines ways to simulate road traffic microscopically while incorporating VAMOS traffic state data. It also investigates how reliably road closures can be identified from the floating car data available in the system. Five further articles on the two research focuses were written for the international conference "3rd Models and Technologies for Intelligent Transportation Systems (3. MT-ITS)", organized by the chair. These contributions have already been published in the conference proceedings as the third volume of the Verkehrstelematik series.
info:eu-repo/classification/ddc/380 ddc:380 info:eu-repo/classification/ddc/620 ddc:620
