• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 27
  • 13
  • 6
  • 2
  • 1
  • 1
  • Tagged with
  • 74
  • 74
  • 31
  • 20
  • 14
  • 13
  • 13
  • 11
  • 11
  • 10
  • 9
  • 9
  • 9
  • 9
  • 7
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Resource Allocation for Sequential Decision Making Under Uncertainaty : Studies in Vehicular Traffic Control, Service Systems, Sensor Networks and Mechanism Design

Prashanth, L A January 2013 (has links) (PDF)
A fundamental question in a sequential decision making setting under uncertainty is “how to allocate resources amongst competing entities so as to maximize the rewards accumulated in the long run?”. The resources allocated may be either abstract quantities such as time or concrete quantities such as manpower. The sequential decision making setting involves one or more agents interacting with an environment to procure rewards at every time instant and the goal is to find an optimal policy for choosing actions. Most of these problems involve multiple (infinite) stages and the objective function is usually a long-run performance objective. The problem is further complicated by the uncertainties in the sys-tem, for instance, the stochastic noise and partial observability in a single-agent setting or private information of the agents in a multi-agent setting. The dimensionality of the problem also plays an important role in the solution methodology adopted. Most of the real-world problems involve high-dimensional state and action spaces and an important design aspect of the solution is the choice of knowledge representation. The aim of this thesis is to answer important resource allocation related questions in different real-world application contexts and in the process contribute novel algorithms to the theory as well. The resource allocation algorithms considered include those from stochastic optimization, stochastic control and reinforcement learning. A number of new algorithms are developed as well. The application contexts selected encompass both single and multi-agent systems, abstract and concrete resources and contain high-dimensional state and control spaces. The empirical results from the various studies performed indicate that the algorithms presented here perform significantly better than those previously proposed in the literature. Further, the algorithms presented here are also shown to theoretically converge, hence guaranteeing optimal performance. We now briefly describe the various studies conducted here to investigate problems of resource allocation under uncertainties of different kinds: Vehicular Traffic Control The aim here is to optimize the ‘green time’ resource of the individual lanes in road networks that maximizes a certain long-term performance objective. We develop several reinforcement learning based algorithms for solving this problem. In the infinite horizon discounted Markov decision process setting, a Q-learning based traffic light control (TLC) algorithm that incorporates feature based representations and function approximation to handle large road networks is proposed, see Prashanth and Bhatnagar [2011b]. This TLC algorithm works with coarse information, obtained via graded thresholds, about the congestion level on the lanes of the road network. However, the graded threshold values used in the above Q-learning based TLC algorithm as well as several other graded threshold-based TLC algorithms that we propose, may not be optimal for all traffic conditions. We therefore also develop a new algorithm based on SPSA to tune the associated thresholds to the ‘optimal’ values (Prashanth and Bhatnagar [2012]). Our thresh-old tuning algorithm is online, incremental with proven convergence to the optimal values of thresholds. Further, we also study average cost traffic signal control and develop two novel reinforcement learning based TLC algorithms with function approximation (Prashanth and Bhatnagar [2011c]). Lastly, we also develop a feature adaptation method for ‘optimal’ feature selection (Bhatnagar et al. [2012a]). This algorithm adapts the features in a way as to converge to an optimal set of features, which can then be used in the algorithm. Service Systems The aim here is to optimize the ‘workforce’, the critical resource of any service system. However, adapting the staffing levels to the workloads in such systems is nontrivial as the queue stability and aggregate service level agreement (SLA) constraints have to be complied with. We formulate this problem as a constrained hidden Markov process with a (discrete) worker parameter and propose simultaneous perturbation based simulation optimization algorithms for this purpose. The algorithms include both first order as well as second order methods and incorporate SPSA based gradient estimates in the primal, with dual ascent for the Lagrange multipliers. All the algorithms that we propose are online, incremental and are easy to implement. Further, they involve a certain generalized smooth projection operator, which is essential to project the continuous-valued worker parameter updates obtained from the SASOC algorithms onto the discrete set. We validate our algorithms on five real-life service systems and compare their performance with a state-of-the-art optimization tool-kit OptQuest. Being ��times faster than OptQuest, our scheme is particularly suitable for adaptive labor staffing. Also, we observe that it guarantees convergence and finds better solutions than OptQuest in many cases. Wireless Sensor Networks The aim here is to allocate the ‘sleep time’ (resource) of the individual sensors in an intrusion detection application such that the energy consumption from the sensors is reduced, while keeping the tracking error to a minimum. We model this sleep–wake scheduling problem as a partially-observed Markov decision process (POMDP) and propose novel RL-based algorithms -with both long-run discounted and average cost objectives -for solving this problem. All our algorithms incorporate function approximation and feature-based representations to handle the curse of dimensionality. Further, the feature selection scheme used in each of the proposed algorithms intelligently manages the energy cost and tracking cost factors, which in turn, assists the search for the optimal sleeping policy. The results from the simulation experiments suggest that our proposed algorithms perform better than a recently proposed algorithm from Fuemmeler and Veeravalli [2008], Fuemmeler et al. [2011]. Mechanism Design The setting here is of multiple self-interested agents with limited capacities, attempting to maximize their individual utilities, which often comes at the expense of the group’s utility. The aim of the resource allocator here then is to efficiently allocate the resource (which is being contended for, by the agents) and also maximize the social welfare via the ‘right’ transfer of payments. In other words, the problem is to find an incentive compatible transfer scheme following a socially efficient allocation. We present two novel mechanisms with progressively realistic assumptions about agent types aimed at economic scenarios where agents have limited capacities. For the simplest case where agent types consist of a unit cost of production and a capacity that does not change with time, we provide an enhancement to the static mechanism of Dash et al. [2007] that effectively deters misreport of the capacity type element by an agent to receive an allocation beyond its capacity, which thereby damages other agents. Our model incorporates an agent’s preference to harm other agents through a additive factor in the utility function of an agent and the mechanism we propose achieves strategy proofness by means of a novel penalty scheme. Next, we consider a dynamic setting where agent types evolve and the individual agents here again have a preference to harm others via capacity misreports. We show via a counterexample that the dynamic pivot mechanism of Bergemann and Valimaki [2010] cannot be directly applied in our setting with capacity-limited alim¨agents. We propose an enhancement to the mechanism of Bergemann and V¨alim¨aki [2010] that ensures truth telling w.r.t. capacity type element through a variable penalty scheme (in the spirit of the static mechanism). We show that each of our mechanisms is ex-post incentive compatible, ex-post individually rational, and socially efficient
72

Feature Adaptation Algorithms for Reinforcement Learning with Applications to Wireless Sensor Networks And Road Traffic Control

Prabuchandran, K J January 2016 (has links) (PDF)
Many sequential decision making problems under uncertainty arising in engineering, science and economics are often modelled as Markov Decision Processes (MDPs). In the setting of MDPs, the goal is to and a state dependent optimal sequence of actions that minimizes a certain long-term performance criterion. The standard dynamic programming approach to solve an MDP for the optimal decisions requires a complete model of the MDP and is computationally feasible only for small state-action MDPs. Reinforcement learning (RL) methods, on the other hand, are model-free simulation based approaches for solving MDPs. In many real world applications, one is often faced with MDPs that have large state-action spaces whose model is unknown, however, whose outcomes can be simulated. In order to solve such (large) MDPs, one either resorts to the technique of function approximation in conjunction with RL methods or develops application specific RL methods. A solution based on RL methods with function approximation comes with the associated problem of choosing the right features for approximation and a solution based on application specific RL methods primarily relies on utilizing the problem structure. In this thesis, we investigate the problem of choosing the right features for RL methods based on function approximation as well as develop novel RL algorithms that adaptively obtain best features for approximation. Subsequently, we also develop problem specie RL methods for applications arising in the areas of wireless sensor networks and road traffic control. In the first part of the thesis, we consider the problem of finding the best features for value function approximation in reinforcement learning for the long-run discounted cost objective. We quantify the error in the approximation for any given feature and the approximation parameter by the mean square Bellman error (MSBE) objective and develop an online algorithm to optimize MSBE. Subsequently, we propose the first online actor-critic scheme with adaptive bases to find a locally optimal (control) policy for an MDP under the weighted discounted cost objective. The actor performs gradient search in the space of policy parameters using simultaneous perturbation stochastic approximation (SPSA) gradient estimates. This gradient computation however requires estimates of the value function of the policy. The value function is approximated using a linear architecture and its estimate is obtained from the critic. The error in approximation of the value function, however, results in sub-optimal policies. Thus, we obtain the best features by performing a gradient descent on the Grassmannian of features to minimize a MSBE objective. We provide a proof of convergence of our control algorithm to a locally optimal policy and show numerical results illustrating the performance of our algorithm. In our next work, we develop an online actor-critic control algorithm with adaptive feature tuning for MDPs under the long-run average cost objective. In this setting, a gradient search in the policy parameters is performed using policy gradient estimates to improve the performance of the actor. The computation of the aforementioned gradient however requires estimates of the differential value function of the policy. In order to obtain good estimates of the differential value function, the critic adaptively tunes the features to obtain the best representation of the value function using gradient search in the Grassmannian of features. We prove that our actor-critic algorithm converges to a locally optimal policy. Experiments on two different MDP settings show performance improvements resulting from our feature adaptation scheme. In the second part of the thesis, we develop problem specific RL solution methods for the two aforementioned applications. In both the applications, the size of the state-action space in the formulated MDPs is large. However, by utilizing the problem structure we develop scalable RL algorithms. In the wireless sensor networks application, we develop RL algorithms to find optimal energy management policies (EMPs) for energy harvesting (EH) sensor nodes. First, we consider the case of a single EH sensor node and formulate the problem of finding an optimal EMP in the discounted cost MDP setting. We then propose two RL algorithms to maximize network performance. Through simulations, our algorithms are seen to outperform the algorithms in the literature. Our RL algorithms for the single EH sensor node do not scale when there are multiple sensor nodes. In our second work, we consider the problem of finding optimal energy sharing policies that maximize the network performance of a system comprising of multiple sensor nodes and a single energy harvesting (EH) source. We develop efficient energy sharing algorithms, namely Q-learning algorithm with exploration mechanisms based on the -greedy method as well as upper confidence bound (UCB). We extend these algorithms by incorporating state and action space aggregation to tackle state-action space explosion in the MDP. We also develop a cross entropy based method that incorporates policy parameterization in order to find near optimal energy sharing policies. Through numerical experiments, we show that our algorithms yield energy sharing policies that outperform the heuristic greedy method. In the context of road traffic control, optimal control of traffic lights at junctions or traffic signal control (TSC) is essential for reducing the average delay experienced by the road users. This problem is hard to solve when simultaneously considering all the junctions in the road network. So, we propose a decentralized multi-agent reinforcement learning (MARL) algorithm for solving this problem by considering each junction in the road network as a separate agent (controller) to obtain dynamic TSC policies. We propose two approaches to minimize the average delay. In the first approach, each agent decides the signal duration of its phases in a round-robin (RR) manner using the multi-agent Q-learning algorithm. We show through simulations over VISSIM (microscopic traffic simulator) that our round-robin MARL algorithms perform significantly better than both the standard fixed signal timing (FST) algorithm and the saturation balancing (SAT) algorithm over two real road networks. In the second approach, instead of optimizing green light duration, each agent optimizes the order of the phase sequence. We then employ our MARL algorithms by suitably changing the state-action space and cost structure of the MDP. We show through simulations over VISSIM that our non-round robin MARL algorithms perform significantly better than the FST, SAT and the round-robin MARL algorithms based on the first approach. However, on the other hand, our round-robin MARL algorithms are more practically viable as they conform with the psychology of road users.
73

Analyses of Bus Travel Time Reliability and Transit Signal Priority at the Stop-To-Stop Segment Level

Feng, Wei 02 June 2014 (has links)
Transit travel time is affected by many factors including traffic signals and traffic condition. Transit agencies have implemented strategies such as transit signal priority (TSP) to reduce transit travel time and improve service reliability. However, due to the lack of empirical data, the joint impact of these factors and improvement strategies on bus travel time has not been studied at the stop-to-stop segment level. This study utilizes and integrates three databases available along an urban arterial corridor in Portland, Oregon. Data sources include stop-level bus automatic vehicle location (AVL) and automatic passenger count (APC) data provided by the Tri-County Metropolitan Transportation District of Oregon (TriMet), the Sydney Coordinated Adaptive Traffic System (SCATS) signal phase log data, and intersection vehicle count data provided by the City of Portland. Based on the unique collection and integration of these fine granularity empirical data, this research utilizes multiple linear regression models to understand and quantify the joint impact of intersection signal delay, traffic conditions and bus stop location on bus travel time and its variability at stop-to-stop segments. Results indicate that intersection signal delay is the key factor that affects bus travel time variability. The amount of signal delay is nearly linearly associated with intersection red phase duration. Results show that the effect of traffic conditions (volumes) on bus travel time varies significantly by intersection and time of day. This study also proposed new and useful performance measures for evaluating the effectiveness of TSP systems. Relationships between TSP requests (when buses are late) and TSP phases were studied by comparing TSP phase start and end times with bus arrival times at intersections. Results show that green extension phases were rarely used by buses that requested TSP and that most green extension phases were granted too late. Early green effectiveness (percent of effective early green phases) is much higher than green extension effectiveness. The estimated average bus and passenger time savings from an early green phase are also greater compared to the average time savings from a green extension phase. On average, the estimated delay for vehicles on the side street due to a TSP phase is less than the time saved for buses and automobiles on the major street. Results from this study can be used to inform cities and transit agencies on how to improve transit operations. Developing appropriate strategies, such as adjusting bus stop consolidation near intersections and optimizing bus operating schedules according to intersection signal timing characteristics, can further reduce bus travel time delay and improve TSP effectiveness.
74

Aspekte der Verkehrstelematik – ausgewählte Veröffentlichungen 2013

Krimmling, Jürgen, Jaekel, Birgit, Lehnert, Martin 22 May 2019 (has links)
Der vierte Band der Schriftenreihe Verkehrstelematik stellt die intermodalen Forschungsthemen und ihre praktischen Anwendungen der Professur für Verkehrsleitsysteme und -prozessautomatisierung an der Fakultät Verkehrswissenschaften „Friedrich List“ der Technischen Universität Dresden mit ausgewählten Veröffentlichungen des Jahres 2013 vor. Die Schwerpunkte der Forschungsarbeit liegen einerseits im Bereich der energieoptimalen Steuerung im Schienenverkehr und zugehörigen Fahrerassistenzsystemen, andererseits auf dem Verkehrsmanagement des Straßenverkehrs. Die energieoptimale Steuerung im Schienenverkehr wird mittels Fahrerassistenzsystemen nicht nur im Eisenbahnverkehr, beispielsweise auf den Zügen des Harz-Elbe-Express, sondern auch auf Straßenbahn- und U-Bahn-Fahrzeugen umgesetzt. Darüber hinaus werden die Methoden und Verfahren der energieoptimalen Steuerung im Rahmen von Ansätzen für ein modularisiertes Verkehrsmanagement bei Eisenbahnen verwendet. Das Verkehrsmanagementsystem der Stadt Dresden VAMOS bildet die Basis für den zweiten Forschungsschwerpunkt. Hierbei werden im vorliegenden Band einerseits Möglichkeiten untersucht, den Straßenverkehr mikroskopisch zu simulieren und dabei die Verkehrszustandsdaten des VAMOS einfließen zu lassen. Andererseits wird der Frage nachgegangen, in wie weit man anhand der im System vorhandenen Floating-Car-Daten Straßensperrungen sicher identifizieren kann. Zu beiden Forschungsschwerpunkten sind fünf weitere Artikel im Rahmen der vom Lehrstuhl organisierten internationalen Konferenz „3rd Models and Technologies for Intelligent Transportation Systems (3. MT-ITS)“ entstanden. Diese Beiträge sind im Konferenzband als dritter Band der vorliegenden Schriftenreihe Verkehrstelematik bereits erschienen.

Page generated in 0.3392 seconds