Global ETD Search

1	Development of a behaviorally induced system optimal travel demand management system Hu, Xianbiao, Chiu, Yi-Chang, Shelton, Jeff 30 March 2016 The basic design concept of most advanced traveler information systems (ATIS) is to present generic information to travelers, leaving travelers to react to the information in their own way. This passive way of managing traffic by providing generic traffic information makes it difficult to predict the outcome and may even incur an adverse effect, such as overreaction (also referred to as the herding effect). Active traffic and demand management (ATDM) is another approach that has received continual attention from both academic research and real-world practice, aiming to effectively influence people's travel demand, provide more travel options, coordinate between travelers, and reduce the need for travel. The research discussed in this article deals with how to provide users with a travel option that aims to minimize the marginal system impact that results from this routing. The goal of this research is to take better advantage of the available real-time traffic information provided by ATIS, to further improve the system-level traffic condition from User Equilibrium (UE), or a real-world traffic system that is worse than UE, toward System Optimal (SO), and to avoid passively managing traffic. A behaviorally induced, system optimal travel demand management model is presented to achieve this goal through incremental routing. Both analytical derivation and numerical analysis have been conducted on the Tucson network in Arizona, as well as on the Capital Area Metropolitan Planning Organization (CAMPO) network in Austin, TX. The outcomes of both studies show that the proposed modeling framework is promising for improving network traffic conditions toward SO and that it results in substantial economic savings. behaviorally induced system optimal dynamic traffic assignment system optimal travel behaviour travel demand management (TDM)
2	Design and implementation of a vehicle navigation application using a constrained system optimal algorithm (Σχεδίαση και υλοποίηση εφαρμογής πλοήγησης οχημάτων με τη χρήση αλγόριθμου βέλτιστου για το σύστημα υπό περιορισμούς) Πλέσσας, Αθανάσιος 07 February 2008 During the last decade, vehicle route guidance systems have spread significantly. 
These systems combine the available technological features with the geographical representation of the road network, the current position of the vehicle and, often, traffic data, to propose to drivers the route they should follow in order to reach their destination faster. Route guidance applications offer the chance to manage traffic in a way that increases road network capacity and therefore decreases congestion, without requiring a costly expansion of the road infrastructure. In contrast to the user optimal model that is followed by typical route guidance systems and provides no guarantee of traffic improvement, the system optimal model has been proposed for this purpose. The model proposes paths with the goal of improving the traffic condition of the network; however, its application is unrealistic, since the proposed routes may be much longer than expected. In this thesis a third approach is studied: a constrained system optimal algorithm. The algorithm combines the two navigation models with the goal of reducing congestion while remaining fair to drivers in the routes it proposes. After the theoretical study of the problem, the implementation of a route recommendation system that incorporates the constrained system optimal algorithm is presented. Vehicle navigation 629.045 Vehicle route guidance Constrained system optimal
3	Developing the Analysis Methodology and Platform for Behaviorally Induced System Optimal Traffic Management Hu, Xianbiao January 2013 Traffic congestion has been imposing a tremendous burden on society as a whole. For decades, the most widely applied solution has been building more roads to better accommodate traffic demand, which turns out to be of limited effect. Active Traffic and Demand Management (ATDM) has recently been getting more attention and is considered here, as it leverages market-ready technologies and innovative operational approaches to manage traffic congestion within the existing infrastructure. The key to a successful Active Traffic and Demand Management strategy is to effectively induce travelers' behavior to change. In spite of the increased attention and application throughout the U.S. and even the world, most ATDM strategies were implemented on-site through small-scale pilot studies. A systematic framework for analysis and evaluation of such a system, able to track the changes in travelers' behavior and the benefit brought about by such changes, has not been established; nor has the effect of its strategies been quantitatively evaluated. In order to effectively evaluate the system benefit and to analyze the behavior changes quantitatively, a systematic framework capable of supporting both macroscopic and microscopic analysis should be established. Such a system should be carefully calibrated to reflect the traffic condition in reality, as only after the calibration can the baseline model be used as the foundation for other scenarios in which alternative design or management strategies are incorporated, so that the behavior changes and system benefit can be computed accurately by comparing the alternative scenarios with the baseline scenario. No traffic management strategy can be effective unless traveler route choice behavior in the urban traffic network is fully understood. 
Theoretical research assumes all users are homogeneous in their route choice decision and will always pick the route with the lowest travel cost, which is not necessarily the case in reality. Researchers in Minnesota found that only 34% of drivers strictly traveled on the shortest path. Drivers usually base their decisions on several dimensions, and a full understanding of route choice behavior in the urban traffic network is essential. Most current Advanced Traveler Information Systems (ATIS) offer the capability to provide pre-trip and/or en route real-time information, allowing travelers to quickly assess and react to unfolding traffic conditions. The basic design concept is to present generic information to drivers, leaving drivers to react to the information in their own way. This "passive" way of managing traffic by providing generic traffic information makes the outcome difficult to predict and may even incur an adverse effect, such as overreaction (also known as herding effects). Furthermore, other questions remain on how to utilize the real-time information better and guide the traffic flow more effectively towards a better solution, and most current research fails to take the traveler's external cost into consideration. Motivated by those concerns, this research presents a behaviorally induced system optimal model, aimed at further improving the system-level traffic condition towards System Optimal through incremental routing, and establishes the analysis methodology and evaluation framework to quantify the behavior change and the system benefits. In this process, the traffic models involved are carefully calibrated, first using a two-stage calibration model which is capable of matching not only the traffic counts, but also the time-dependent speed profiles of the calibrated links. 
To the best of our knowledge, this research is the first to offer a methodology that uses field-observed data to estimate the departure profile of the Origin-Destination (OD) matrices. Also proposed in this dissertation is a Constrained K Shortest Paths algorithm (CKSP) that addresses route overlap and travel time deviation issues. This proposed algorithm can generate K Shortest Paths between two given nodes and provide sound route options to drivers in order to assist their route choice decision process. Thirdly, a behaviorally induced system optimal model, which includes the development of a marginal cost calculation algorithm, a time-dependent shortest path search algorithm, and schedule delay as well as optimal path finding models, is presented to improve the traffic flow from an initial traffic condition, which could be User Equilibrium (UE) or any other non-UE or non-System-Optimal (SO) condition, towards System Optimal. Case studies are conducted for each individual research component and show promising results. The goal of establishing this framework is to better capture and evaluate the effects of behaviorally induced system optimal traffic management strategies on the overall system performance. To realize this goal, the three research models are integrated to constitute a comprehensive platform that is capable not only of effectively guiding the traffic flow improvement towards System Optimal, but also of accurately evaluating the system benefit from the macroscopic perspective and quantitatively analyzing the behavior changes microscopically. A comprehensive case study on the traffic network in Tucson, Arizona, has been conducted using the DynusT (Dynamic Urban Simulation for Transportation) Dynamic Traffic Assignment (DTA) simulation software; the outcome of this study shows that the proposed modeling framework is promising for improving network traffic condition towards System Optimal, resulting in substantial economic savings. 
Incentive K Shortest Path OD Calibration System Optimal Travel Behavior Civil Engineering Active Traffic and Demand Management
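The abstract above describes a Constrained K Shortest Paths (CKSP) algorithm that limits route overlap and travel time deviation but gives no details. The following is only a minimal sketch of that idea, not the dissertation's algorithm: it enumerates loop-free paths in order of increasing travel time and keeps a path only if it passes two filters. The function names, the thresholds `max_overlap` and `max_deviation`, the overlap measure (shared travel time divided by the candidate's travel time) and the small example network are all assumptions made for illustration.

```python
import heapq


def overlap_ratio(graph, links_a, links_b):
    """Share of path A's travel time spent on links it has in common with path B."""
    total = sum(graph[u][v] for u, v in links_a)
    shared = sum(graph[u][v] for u, v in links_a & links_b)
    return shared / total if total else 0.0


def constrained_k_shortest_paths(graph, source, target, k,
                                 max_overlap=0.5, max_deviation=0.3):
    """Return up to k (travel_time, path) pairs, shortest first.

    graph: dict mapping node -> {neighbour: non-negative travel time}.
    A candidate is kept only if it is at most (1 + max_deviation) times the
    shortest travel time and overlaps every kept path by at most max_overlap.
    """
    accepted = []
    best = None
    heap = [(0.0, [source])]  # partial loop-free paths, cheapest first
    while heap and len(accepted) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            if best is None:
                best = cost
            if cost > (1 + max_deviation) * best:
                break  # every remaining path is at least this long
            links = set(zip(path, path[1:]))
            if all(overlap_ratio(graph, links, set(zip(p, p[1:]))) <= max_overlap
                   for _, p in accepted):
                accepted.append((cost, path))
            continue
        for nxt, travel_time in graph.get(node, {}).items():
            if nxt not in path:  # keep paths loop-free
                heapq.heappush(heap, (cost + travel_time, path + [nxt]))
    return accepted


# Hypothetical four-node network, travel times in minutes.
EXAMPLE_NETWORK = {
    "A": {"B": 4, "C": 5},
    "B": {"C": 2, "D": 4},
    "C": {"D": 4},
    "D": {},
}

if __name__ == "__main__":
    for cost, path in constrained_k_shortest_paths(
            EXAMPLE_NETWORK, "A", "D", k=3, max_overlap=0.3):
        print(cost, " -> ".join(path))
```

The exhaustive best-first enumeration is chosen for brevity and is practical only on small networks; a production implementation would more likely build on a dedicated K shortest paths method.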
4	An adaptive strategy for providing dynamic route guidance under non-recurrent traffic congestion Lee, Sang-Keon 06 June 2008 Traffic congestion on urban road networks has been recognized as one of the most serious problems with which modern cities are confronted. It is generally anticipated that Dynamic Route Guidance Systems (DRGS) will play an important role in reducing urban traffic congestion and improving traffic flows and safety. One of the most critical issues in designing these systems is the development of optimal routing strategies that would maximize the benefits to the overall system as well as to individual users. Infrastructure-based DRGS have the advantage of pursuing a system optimal routing strategy, which is more essential under abnormal traffic conditions such as non-recurrent congestion and natural disasters. However, user compliance could be a problem under such a strategy, particularly when some equipped drivers are urged not to choose the minimum travel time path for the sake of improving the total network travel time. On the other hand, in-vehicle DRGS can utilize user-specified route selection criteria to avoid the "Braess Paradox" under normal traffic conditions. However, they may be of little use under abnormal traffic conditions and high DRGS market penetration. In the comparative analysis between the system optimal strategy and the user equilibrium strategy, significant differences were found within the mid-range of traffic demand. The maximum total travel time difference occurs when the level of traffic demand is half of the system capacity. At this point, the system optimal route guidance strategy can save more than 11% of the total travel time of the user equilibrium route guidance strategy. The research proposes an adaptive routing strategy as an efficient form of dynamic route guidance under non-recurrent traffic congestion. 
Computation results show that there is no need to implement the system optimal routing strategy at the initial stage of the incident. However, it is critical to use the system optimal routing strategy as the freeway and arterial become congested and the queue delay on the freeway increases. The adaptive routing strategy is evaluated using the INTEGRATION traffic simulation model. According to simulation results using an ideal network, the travel time saving ratio is at its maximum when both arterial and freeway have normal traffic demand under an incident. In the case of a realistic network, the adaptive routing strategy also proved to reduce total travel time by between 3% and 10% compared with the traditional user equilibrium routing strategy. The reduction of total travel time increases as the incident duration increases. Consequently, it is concluded that the adaptive routing strategy for DRGS is more efficient than using the user equilibrium routing strategy alone. / Ph. D. adaptive strategy user equilibrium Dynamic Route Guidance System ATIS System Optimal integrated traffic simulation non-recurrent congestion LD5655.V856 1996.L45
5	Regret minimisation and system-efficiency in route choice / Minimização de Regret e eficiência do sistema em escala de rotas Ramos, Gabriel de Oliveira January 2018 Multiagent reinforcement learning (MARL) is a challenging task, where self-interested agents concurrently learn policies that maximise their utilities. Learning here is difficult because agents must adapt to each other, which makes their objective a moving target. As a side effect, no convergence guarantees exist for the general MARL setting. This thesis exploits a particular MARL problem, namely route choice (where selfish drivers aim at choosing routes that minimise their travel costs), to deliver convergence guarantees. We are particularly interested in guaranteeing convergence to two fundamental solution concepts: the user equilibrium (UE, when no agent benefits from unilaterally changing its route) and the system optimum (SO, when average travel time is minimum). The main goal of this thesis is to show that, in the context of route choice, MARL can be guaranteed to converge to the UE as well as to the SO under certain conditions. Firstly, we introduce a regret-minimising Q-learning algorithm, which we prove converges to the UE. Our algorithm works by estimating the regret associated with agents’ actions and using such information as the reinforcement signal for updating the corresponding Q-values. We also establish a bound on the agents’ regret. We then extend this algorithm to deal with non-local information provided by a navigation service. Using such information, agents can improve their regret estimates, thus performing empirically better. 
Finally, in order to mitigate the effects of selfishness, we also present a generalised marginal-cost tolling scheme in which drivers are charged in proportion to the cost they impose on others. We then devise a toll-based Q-learning algorithm, which we prove converges to the SO and is fairer than existing tolling schemes. Multiagent systems Informatics : Transport Multiagent reinforcement learning Route choice User equilibrium System optimal Regret minimisation Action regret Travel information Marginal-cost tolling
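The link between marginal-cost tolls and the system optimum mentioned in the abstract above can be illustrated with a small worked example. This is not the thesis's Q-learning algorithm: it is a deterministic sketch on a hypothetical two-route network (a constant-time route A and a congestible route B) that finds the split at which perceived route costs equalise, first without tolls and then with each route tolled at flow times the derivative of its travel time. All function names and the network itself are assumptions for illustration.

```python
def equilibrium_share(cost_a, cost_b, tol=1e-9):
    """Share of drivers on route B at which no driver gains by switching.

    cost_a and cost_b map a route's flow (a fraction of total demand) to the
    cost a driver perceives on it; both are assumed non-decreasing in flow.
    """
    if cost_b(1.0) <= cost_a(0.0):
        return 1.0  # B is no worse than A even when everyone uses it
    if cost_b(0.0) >= cost_a(1.0):
        return 0.0  # A is no worse than B even when everyone uses it
    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # bisection on the share of drivers using B
        mid = (lo + hi) / 2
        if cost_b(mid) < cost_a(1 - mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical network: route A always takes 1 time unit,
# route B takes as long as the share of drivers using it.
def time_a(flow):
    return 1.0


def time_b(flow):
    return flow


# Marginal-cost toll = flow * d(travel time)/d(flow): the delay one extra
# driver imposes on everyone already on the route.
def toll_a(flow):
    return flow * 0.0


def toll_b(flow):
    return flow * 1.0


def average_time(share_b):
    return (1 - share_b) * time_a(1 - share_b) + share_b * time_b(share_b)


if __name__ == "__main__":
    untolled = equilibrium_share(time_a, time_b)
    tolled = equilibrium_share(lambda f: time_a(f) + toll_a(f),
                               lambda f: time_b(f) + toll_b(f))
    print("untolled share on B:", untolled, "average time:", average_time(untolled))
    print("tolled share on B:", tolled, "average time:", average_time(tolled))
```

Without tolls every driver ends up on route B and the average travel time is 1; with the toll the split moves to about one half on each route and the average travel time drops to about 0.75, which is the minimum for this network.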
6	Optimal Integrated Dynamic Traffic Assignment and Signal Control for Evacuation of Large Traffic Networks with Varying Threat Levels Nassir, Neema January 2013 This research contributes to the state of the art and state of the practice in solving a very important and computationally challenging problem in the areas of urban transportation systems, operations research, disaster management, and public policy. A very active topic of research during the past few decades, the problem of developing an efficient and practical strategy for evacuating real-sized urban traffic networks in case of disasters from different causes, quickly enough to be employed in immediate disaster management scenarios, has been identified by many researchers as one of the most challenging and yet vital problems. More specifically, this research develops fast methods to find the optimal integrated strategy for traffic routing and traffic signal control to evacuate real-sized urban networks in the most efficient manner. In this research a solution framework is proposed, developed and tested which is capable of solving these problems in very short computational time. An efficient relaxation-based decomposition method is proposed, implemented for two integrated evacuation routing and signal control model formulations, proven to be optimal for both formulations, and verified to reduce the computational complexity of the optimal integrated routing and signal control problem. The efficiency of the proposed decomposition method is gained by reducing the integrated optimal routing and signal control problem to a relaxed optimal routing problem. This has been achieved through an insight into intersection flows in the optimal routing solution: in at least one of the optimal solutions of the routing problem, each street during each time interval only carries vehicles in at most one direction. 
This property, being essential to the proposed decomposition method, is called "unidirectionality" in this dissertation. The conditions under which this property exists in the optimal evacuation routing solution are identified, and the existence of unidirectionality is proven for: (1) the common Single-Destination System-Optimal Dynamic Traffic Assignment (SD-SODTA) problem, with the objective to minimize the total time spent in the threat area; and (2) the single-destination evacuation problem with varying threat levels, with traffic models that have no spatial queue propagation. The proposed decomposition method has been implemented in compliance with two widely accepted traffic flow models, the Cell Transmission Model (CTM) and the Point Queue (PQ) model. In each case, the decomposition method finds the optimal solution for the integrated routing and signal control problem. Both traffic models have been coded and applied to a realistic real-size evacuation scenario with promising results. One important feature that is explored is the incorporation of evacuation safety aspects in the optimization model. An index of the threat level is associated with each link that reflects the adverse effects of traveling in a given threat zone on the safety and health of evacuees during the process of evacuation. The optimization problem is then formulated to minimize the total exposure of evacuees to the threat. A hypothetical large-scale chlorine gas spill in a highly populated urban area (downtown Tucson, Arizona) has been modeled for testing the evacuation models where the network has varying threat levels. In addition to the proposed decomposition method, an efficient network-flow solution algorithm is also proposed to find the optimal routing of traffic in networks with several threat zones, where the threat levels may be non-uniform across different zones. 
The proposed method can be categorized in the class of "negative cycle canceling" algorithms for solving minimum cost flow problems. The unique feature in the proposed algorithm is introducing a multi-source shortest path calculation which enables the efficient detection and cancellation of negative cycles. The proposed method is proven to find the optimal solution, and it is also applied to and verified for a mid-size test network scenario. Evacuation Network Flow Algorithms Point-Queue and Spatial-Queue Traffic Signal Optimization Civil Engineering Cell Transmission Model
7	Optimal predictive control of thermal storage in hollow core ventilated slab systems Ren, Mei Juan January 1997 The energy crisis, together with greater environmental awareness, has increased interest in the construction of low energy buildings. Fabric thermal storage systems provide a promising approach for reducing building energy use and cost, and consequently, the emission of environmental pollutants. Hollow core ventilated slab systems are a form of fabric thermal storage system that, through the coupling of the ventilation air with the mass of the slab, are effective in utilizing the building fabric as a thermal store. However, the benefit of such systems can only be realized through the effective control of the thermal storage. This thesis investigates an optimum control strategy for hollow core ventilated slab systems that reduces the energy cost of the system without prejudicing the building occupants' thermal comfort. The controller uses the predicted ambient temperature and solar radiation, together with a model of the building, to predict the energy costs of the system and the thermal comfort conditions in the occupied space. The optimum control strategy is identified by exercising the model with a numerical optimization method, such that the energy costs are minimized without violating the building occupants' thermal comfort. The thesis describes the use of an Auto Regressive Moving Average model to predict the ambient conditions for the next 24 hours. A dynamic lumped parameter thermal network model of the building is also described, together with its validation. The implementation of a Genetic Algorithm search method for optimizing the control strategy is described, and its performance in finding an optimum solution is analysed. The characteristics of the optimum schedule of control setpoints are investigated for each season, from which a simplified time-stage control strategy is derived. 
The effects of weather prediction errors on the optimum control strategy are investigated and the performance of the optimum controller is analysed and compared to a conventional rule-based control strategy. The on-line implementation of the optimal predictive controller would require the accurate estimation of parameters for modelling the building, which could form part of future work. 621.042
10	A Comparative Evaluation of FDSA, GA, and SA Non-linear Programming Algorithms and Development of System-optimal Methodology for Dynamic Pricing on I-95 Express Graham, Don 01 January 2013 As urban population across the globe increases, the demand for adequate transportation grows. Several strategies have been suggested as a solution to the congestion which results from this high demand outpacing the existing supply of transportation facilities. High-Occupancy Toll (HOT) lanes have become increasingly popular as a feature of today’s highway system. The I-95 Express HOT lane in Miami, Florida, which is currently being expanded from a single phase (Phase I) into two phases, is one such HOT facility. With the growing abundance of such facilities comes the need for in-depth study of demand patterns and development of an appropriate pricing scheme which reduces congestion. This research develops a method for dynamic pricing on the I-95 HOT facility so as to minimize total travel time and reduce congestion. We apply non-linear programming (NLP) techniques and the finite difference stochastic approximation (FDSA), genetic algorithm (GA) and simulated annealing (SA) stochastic algorithms to formulate and solve the problem within a cell transmission framework. The solution produced is the optimal flow and optimal toll required to minimize total travel time and thus is the system-optimal solution. We perform a comparative evaluation of the FDSA, GA and SA algorithms used to solve the NLP, and the ANOVA results show that there are differences in the performance of these algorithms in solving this problem and reducing travel time. 
We then conclude by demonstrating that econometric forecasting methods utilizing vector autoregressive (VAR) techniques can be applied to successfully forecast demand for Phase 2 of the 95 Express, which is planned for 2014. Dynamic pricing congestion pricing hot lanes non linear programming network stochastic approximation algorithms fdsa ga sa spsa i 95 express anova demand forecast vector auto regression system optimal Civil Engineering Engineering
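Finite difference stochastic approximation (FDSA), one of the three solvers compared in the abstract above, can be sketched in a few lines. The version below is a generic textbook-style form, not the dissertation's implementation: each coordinate is perturbed in turn, a central difference of two noisy evaluations estimates the gradient, and the step and perturbation sizes decay over iterations. The gain parameters (`a`, `c`, `alpha`, `gamma`), the function names and the toy noisy "total travel time" objective with its optimum at tolls of 2.0 and 3.5 are all assumptions made for illustration; the thesis evaluates travel time within a cell transmission framework instead.

```python
import random


def fdsa_minimize(loss, theta0, iterations=500,
                  a=0.5, c=0.1, alpha=0.602, gamma=0.101):
    """Minimise a noisy loss with finite difference stochastic approximation.

    loss: callable taking a list of floats and returning a noisy float.
    The gains a_k = a / (k + 1)**alpha and c_k = c / (k + 1)**gamma shrink
    the step size and the perturbation size as the iteration count k grows.
    """
    theta = list(theta0)
    for k in range(iterations):
        a_k = a / (k + 1) ** alpha
        c_k = c / (k + 1) ** gamma
        gradient = []
        for i in range(len(theta)):
            plus = list(theta)
            minus = list(theta)
            plus[i] += c_k
            minus[i] -= c_k
            # Two-sided difference along coordinate i (two loss evaluations).
            gradient.append((loss(plus) - loss(minus)) / (2 * c_k))
        theta = [t - a_k * g for t, g in zip(theta, gradient)]
    return theta


def make_noisy_travel_time(seed):
    """Hypothetical objective: total travel time as a function of two tolls."""
    rng = random.Random(seed)

    def travel_time(tolls):
        signal = (tolls[0] - 2.0) ** 2 + (tolls[1] - 3.5) ** 2 + 10.0
        return signal + rng.gauss(0.0, 0.05)  # measurement noise

    return travel_time


if __name__ == "__main__":
    tolls = fdsa_minimize(make_noisy_travel_time(seed=0), [0.0, 0.0])
    print("estimated tolls:", tolls)
```

FDSA needs two loss evaluations per coordinate per iteration, so its cost grows with the number of decision variables; that trade-off is one reason such a method would be compared against population-based searches such as GA and SA.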
