  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Commande sous contraintes et incertitudes des réseaux de transport / Control under constraints and uncertainties of transportation networks

Sleiman, Mohamad 12 December 2018
Transport has always been one of the key components of urban life and of its economic development. Since the second half of the last century, rising average living standards and household vehicle ownership have given most people access to travel by private car. The result has been a race between the growth of road traffic and the quantitative and qualitative improvement of the road system. This volume of travel degrades traffic flow and gives rise to congestion. Congestion now occurs almost daily in road networks. It wastes time, increases energy consumption, and causes nuisance and environmental damage. The solution to road congestion does not always lie in greater investment in transport infrastructure: available land is exhausted, and developing road infrastructure is expensive. The current trend is therefore toward better use of existing infrastructure. In particular, traffic signals play an important role among the approaches for avoiding congestion, and the design of better traffic signal control has been the subject of much research aimed at improving circulation across large-scale networks.

In this thesis, we focus on upstream (a priori) action that prevents congestion by forcing the number of vehicles not to exceed the maximum capacities of the lanes in the transport network. After describing networks of signalized intersections, we give a non-exhaustive review of the methods developed for managing and regulating intersections. We then propose three control strategies that address the control problem in different ways. The first draws on the theory of dissipative systems, the second stabilizes the system in the sense of Lyapunov around its nominal state, and the third stabilizes it in finite time (during peak hours). The proposed controllers respect the constraints on both the state and the control, and they take into account the uncertainties in the system. Finally, the existence of each proposed controller is characterized by the feasibility of certain linear matrix inequalities (LMIs), checked using the CVX tool under MATLAB, and the performance of each controller is evaluated by simulation.
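To make the LMI step in the abstract above concrete, here is the textbook form of Lyapunov stabilization by state feedback written as an LMI feasibility problem. It is only an illustration: the continuous-time linear model $\dot{x} = Ax + Bu$ and the symbols $A$, $B$, $K$, $P$, $Q$, $Y$ are assumptions made for this sketch, and the thesis's own LMIs, which also encode state and control constraints and uncertainties, are not given in the abstract.

```latex
% Assumed model: \dot{x} = A x + B u with state feedback u = K x.
% The closed loop is asymptotically stable iff some P \succ 0 satisfies
\[
  (A + BK)^{\top} P + P\,(A + BK) \prec 0 .
\]
% This is bilinear in (P, K). With the change of variables
% Q = P^{-1}, \; Y = K Q it becomes an LMI in (Q, Y):
\[
  Q \succ 0, \qquad
  A Q + Q A^{\top} + B Y + Y^{\top} B^{\top} \prec 0 ,
\]
% and any feasible pair gives the gain K = Y Q^{-1}.
```

A semidefinite-programming tool such as CVX can then test feasibility of the second pair of inequalities directly, which is the sense in which existence of a controller is "characterized by the feasibility of certain LMIs".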
22

Dynamic Cycle Time in Traffic Signal of Cyclic Max-Pressure Control

Zoabi, Razi; Haddad, Jack 23 June 2023
In this paper, a new cyclic structure for max-pressure, travel-time-based traffic signal control is developed to seek optimal coordination in large-scale urban networks. The focus of the paper is on the dynamic manipulation of cycle lengths within the cyclic structure. Following a decentralized approach, which requires only local information to set proper phase durations, the control strategy aims to maximize overall network throughput. Previous work on cyclic max-pressure control introduced a cyclic notion to actuate the controller in a cyclic manner. However, no guidance has been provided on the optimal cycle length to choose for each intersection in a network, and, given the dynamic and stochastic nature of trips, it is not clear what the main phases of the intersections are or how to coordinate them. The developed cyclic max-pressure control schemes are compared with an existing cyclic scheme from the literature. Simulation results show that the newly proposed cyclic structure of the time-based approach offers better decision-making.
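The idea of a cyclic max-pressure controller that uses only local information can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scheme: it uses the classic queue-difference pressure rather than the travel-time-based pressure of the paper, it keeps the cycle length fixed instead of adapting it, and the names `phase_pressure`, `green_splits`, `queues`, `sat_flow`, and `min_green` are hypothetical.

```python
from typing import Dict, List, Tuple

Movement = Tuple[str, str]  # (upstream link, downstream link)


def phase_pressure(phase: List[Movement], queues: Dict[str, float],
                   sat_flow: Dict[Movement, float]) -> float:
    """Saturation-flow-weighted sum of (upstream - downstream) queue lengths."""
    return sum(sat_flow[m] * (queues[m[0]] - queues[m[1]]) for m in phase)


def green_splits(phases: Dict[str, List[Movement]], queues: Dict[str, float],
                 sat_flow: Dict[Movement, float], cycle: float,
                 min_green: float) -> Dict[str, float]:
    """Serve every phase once per cycle: each gets min_green, and the
    remaining time is shared in proportion to its positive pressure."""
    spare = cycle - min_green * len(phases)
    if spare < 0:
        raise ValueError("cycle is too short for the minimum greens")
    pressures = {name: max(phase_pressure(p, queues, sat_flow), 0.0)
                 for name, p in phases.items()}
    total = sum(pressures.values())
    if total == 0.0:
        # No phase has positive pressure: split the spare time evenly.
        return {name: min_green + spare / len(phases) for name in phases}
    return {name: min_green + spare * w / total
            for name, w in pressures.items()}


if __name__ == "__main__":
    phases = {"NS": [("n_in", "s_out")], "EW": [("e_in", "w_out")]}
    queues = {"n_in": 10.0, "s_out": 2.0, "e_in": 4.0, "w_out": 0.0}
    sat_flow = {("n_in", "s_out"): 1.0, ("e_in", "w_out"): 1.0}
    print(green_splits(phases, queues, sat_flow, cycle=60.0, min_green=10.0))
```

Only the queues on links adjacent to the intersection enter the computation, which is what makes the rule decentralized. A dynamic-cycle variant would additionally choose `cycle` at each intersection, which this sketch does not attempt.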
23

Resource Allocation for Sequential Decision Making Under Uncertainty: Studies in Vehicular Traffic Control, Service Systems, Sensor Networks and Mechanism Design

Prashanth, L A January 2013
A fundamental question in a sequential decision making setting under uncertainty is “how to allocate resources amongst competing entities so as to maximize the rewards accumulated in the long run?”. The resources allocated may be either abstract quantities such as time or concrete quantities such as manpower. The sequential decision making setting involves one or more agents interacting with an environment to procure rewards at every time instant, and the goal is to find an optimal policy for choosing actions. Most of these problems involve multiple (infinite) stages, and the objective function is usually a long-run performance objective. The problem is further complicated by the uncertainties in the system, for instance, the stochastic noise and partial observability in a single-agent setting or private information of the agents in a multi-agent setting. The dimensionality of the problem also plays an important role in the solution methodology adopted. Most real-world problems involve high-dimensional state and action spaces, and an important design aspect of the solution is the choice of knowledge representation. The aim of this thesis is to answer important resource allocation related questions in different real-world application contexts and, in the process, contribute novel algorithms to the theory as well. The resource allocation algorithms considered include those from stochastic optimization, stochastic control and reinforcement learning. A number of new algorithms are developed as well. The application contexts selected encompass both single and multi-agent systems, abstract and concrete resources, and contain high-dimensional state and control spaces. The empirical results from the various studies performed indicate that the algorithms presented here perform significantly better than those previously proposed in the literature. Further, the algorithms presented here are also shown to theoretically converge, hence guaranteeing optimal performance.
We now briefly describe the various studies conducted here to investigate problems of resource allocation under uncertainties of different kinds.
Vehicular Traffic Control: The aim here is to optimize the ‘green time’ resource of the individual lanes in road networks so as to maximize a certain long-term performance objective. We develop several reinforcement learning based algorithms for solving this problem. In the infinite horizon discounted Markov decision process setting, a Q-learning based traffic light control (TLC) algorithm that incorporates feature based representations and function approximation to handle large road networks is proposed, see Prashanth and Bhatnagar [2011b]. This TLC algorithm works with coarse information, obtained via graded thresholds, about the congestion level on the lanes of the road network. However, the graded threshold values used in the above Q-learning based TLC algorithm, as well as in several other graded threshold-based TLC algorithms that we propose, may not be optimal for all traffic conditions. We therefore also develop a new algorithm based on SPSA to tune the associated thresholds to the ‘optimal’ values (Prashanth and Bhatnagar [2012]). Our threshold tuning algorithm is online and incremental, with proven convergence to the optimal values of the thresholds. Further, we also study average cost traffic signal control and develop two novel reinforcement learning based TLC algorithms with function approximation (Prashanth and Bhatnagar [2011c]). Lastly, we also develop a feature adaptation method for ‘optimal’ feature selection (Bhatnagar et al. [2012a]). This algorithm adapts the features so as to converge to an optimal set of features, which can then be used in the algorithm.
Service Systems: The aim here is to optimize the ‘workforce’, the critical resource of any service system.
However, adapting the staffing levels to the workloads in such systems is nontrivial, as the queue stability and aggregate service level agreement (SLA) constraints have to be complied with. We formulate this problem as a constrained hidden Markov process with a (discrete) worker parameter and propose simultaneous perturbation based simulation optimization algorithms for this purpose. The algorithms include both first order and second order methods and incorporate SPSA based gradient estimates in the primal, with dual ascent for the Lagrange multipliers. All the algorithms that we propose are online, incremental and easy to implement. Further, they involve a certain generalized smooth projection operator, which is essential to project the continuous-valued worker parameter updates obtained from the SASOC algorithms onto the discrete set. We validate our algorithms on five real-life service systems and compare their performance with a state-of-the-art optimization toolkit, OptQuest. Being faster than OptQuest, our scheme is particularly suitable for adaptive labor staffing. Also, we observe that it guarantees convergence and finds better solutions than OptQuest in many cases.
Wireless Sensor Networks: The aim here is to allocate the ‘sleep time’ (resource) of the individual sensors in an intrusion detection application such that the energy consumption of the sensors is reduced, while keeping the tracking error to a minimum. We model this sleep–wake scheduling problem as a partially-observed Markov decision process (POMDP) and propose novel RL-based algorithms, with both long-run discounted and average cost objectives, for solving this problem. All our algorithms incorporate function approximation and feature-based representations to handle the curse of dimensionality.
Further, the feature selection scheme used in each of the proposed algorithms intelligently manages the energy cost and tracking cost factors, which, in turn, assists the search for the optimal sleeping policy. The results from the simulation experiments suggest that our proposed algorithms perform better than a recently proposed algorithm from Fuemmeler and Veeravalli [2008], Fuemmeler et al. [2011].
Mechanism Design: The setting here is of multiple self-interested agents with limited capacities, attempting to maximize their individual utilities, which often comes at the expense of the group’s utility. The aim of the resource allocator here is to efficiently allocate the resource (which is being contended for by the agents) and also maximize the social welfare via the ‘right’ transfer of payments. In other words, the problem is to find an incentive compatible transfer scheme following a socially efficient allocation. We present two novel mechanisms with progressively realistic assumptions about agent types, aimed at economic scenarios where agents have limited capacities. For the simplest case, where agent types consist of a unit cost of production and a capacity that does not change with time, we provide an enhancement to the static mechanism of Dash et al. [2007] that effectively deters misreport of the capacity type element by an agent to receive an allocation beyond its capacity, which thereby damages other agents. Our model incorporates an agent’s preference to harm other agents through an additive factor in the agent’s utility function, and the mechanism we propose achieves strategy proofness by means of a novel penalty scheme. Next, we consider a dynamic setting where agent types evolve and the individual agents here again have a preference to harm others via capacity misreports. We show via a counterexample that the dynamic pivot mechanism of Bergemann and Välimäki [2010] cannot be directly applied in our setting with capacity-limited agents.
We propose an enhancement to the mechanism of Bergemann and Välimäki [2010] that ensures truth telling w.r.t. the capacity type element through a variable penalty scheme (in the spirit of the static mechanism). We show that each of our mechanisms is ex-post incentive compatible, ex-post individually rational, and socially efficient.
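The SPSA-based tuning mentioned in the abstract above can be illustrated with a minimal sketch of simultaneous perturbation stochastic approximation. Everything specific here is an assumption for illustration: the function name `spsa_minimize`, the gain constants, and especially `toy_cost`, a noiseless quadratic standing in for the simulated long-run traffic cost that the thesis would evaluate. The thesis's actual threshold-tuning algorithm is not reproduced.

```python
import random
from typing import Callable, List


def spsa_minimize(cost: Callable[[List[float]], float], theta: List[float],
                  iterations: int = 2000, a: float = 0.1, c: float = 0.1,
                  stability: float = 10.0, seed: int = 0) -> List[float]:
    """Minimize `cost` using two cost evaluations per iteration,
    whatever the dimension of theta."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        a_k = a / (k + 1 + stability) ** 0.602   # decaying step size
        c_k = c / (k + 1) ** 0.101               # decaying perturbation size
        # Perturb all coordinates at once with random +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = cost(plus) - cost(minus)
        # Gradient estimate for coordinate i is diff / (2 * c_k * delta_i).
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta


def toy_cost(thresholds: List[float]) -> float:
    """Hypothetical stand-in cost, minimized at thresholds (6, 14)."""
    return (thresholds[0] - 6.0) ** 2 + (thresholds[1] - 14.0) ** 2


if __name__ == "__main__":
    print(spsa_minimize(toy_cost, [10.0, 10.0]))
```

The appeal for simulation-based tuning is that each iteration needs only two cost evaluations regardless of how many thresholds are tuned, whereas a coordinate-wise finite-difference gradient would need two per threshold.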
24

Feature Adaptation Algorithms for Reinforcement Learning with Applications to Wireless Sensor Networks And Road Traffic Control

Prabuchandran, K J January 2016
Many sequential decision making problems under uncertainty arising in engineering, science and economics are modelled as Markov Decision Processes (MDPs). In the setting of MDPs, the goal is to find a state-dependent optimal sequence of actions that minimizes a certain long-term performance criterion. The standard dynamic programming approach to solving an MDP for the optimal decisions requires a complete model of the MDP and is computationally feasible only for small state-action MDPs. Reinforcement learning (RL) methods, on the other hand, are model-free, simulation-based approaches for solving MDPs. In many real-world applications, one is often faced with MDPs that have large state-action spaces whose model is unknown but whose outcomes can be simulated. In order to solve such (large) MDPs, one either resorts to the technique of function approximation in conjunction with RL methods or develops application-specific RL methods. A solution based on RL methods with function approximation comes with the associated problem of choosing the right features for approximation, and a solution based on application-specific RL methods primarily relies on utilizing the problem structure. In this thesis, we investigate the problem of choosing the right features for RL methods based on function approximation and develop novel RL algorithms that adaptively obtain the best features for approximation. Subsequently, we also develop problem-specific RL methods for applications arising in the areas of wireless sensor networks and road traffic control.
In the first part of the thesis, we consider the problem of finding the best features for value function approximation in reinforcement learning for the long-run discounted cost objective. We quantify the error in the approximation for any given feature and approximation parameter by the mean square Bellman error (MSBE) objective and develop an online algorithm to optimize MSBE.
Subsequently, we propose the first online actor-critic scheme with adaptive bases to find a locally optimal (control) policy for an MDP under the weighted discounted cost objective. The actor performs gradient search in the space of policy parameters using simultaneous perturbation stochastic approximation (SPSA) gradient estimates. This gradient computation, however, requires estimates of the value function of the policy. The value function is approximated using a linear architecture, and its estimate is obtained from the critic. The error in approximation of the value function, however, results in sub-optimal policies. Thus, we obtain the best features by performing a gradient descent on the Grassmannian of features to minimize an MSBE objective. We provide a proof of convergence of our control algorithm to a locally optimal policy and show numerical results illustrating the performance of our algorithm.
In our next work, we develop an online actor-critic control algorithm with adaptive feature tuning for MDPs under the long-run average cost objective. In this setting, a gradient search in the policy parameters is performed using policy gradient estimates to improve the performance of the actor. The computation of this gradient, however, requires estimates of the differential value function of the policy. In order to obtain good estimates of the differential value function, the critic adaptively tunes the features to obtain the best representation of the value function using gradient search in the Grassmannian of features. We prove that our actor-critic algorithm converges to a locally optimal policy. Experiments on two different MDP settings show performance improvements resulting from our feature adaptation scheme.
In the second part of the thesis, we develop problem-specific RL solution methods for the two aforementioned applications. In both applications, the size of the state-action space in the formulated MDPs is large.
However, by utilizing the problem structure we develop scalable RL algorithms. In the wireless sensor networks application, we develop RL algorithms to find optimal energy management policies (EMPs) for energy harvesting (EH) sensor nodes. First, we consider the case of a single EH sensor node and formulate the problem of finding an optimal EMP in the discounted cost MDP setting. We then propose two RL algorithms to maximize network performance. Through simulations, our algorithms are seen to outperform the algorithms in the literature. Our RL algorithms for the single EH sensor node do not scale when there are multiple sensor nodes. In our second work, we consider the problem of finding optimal energy sharing policies that maximize the network performance of a system comprising multiple sensor nodes and a single energy harvesting (EH) source. We develop efficient energy sharing algorithms, namely a Q-learning algorithm with exploration mechanisms based on the ε-greedy method as well as the upper confidence bound (UCB). We extend these algorithms by incorporating state and action space aggregation to tackle state-action space explosion in the MDP. We also develop a cross entropy based method that incorporates policy parameterization in order to find near-optimal energy sharing policies. Through numerical experiments, we show that our algorithms yield energy sharing policies that outperform the heuristic greedy method.
In the context of road traffic control, optimal control of traffic lights at junctions, or traffic signal control (TSC), is essential for reducing the average delay experienced by road users. This problem is hard to solve when simultaneously considering all the junctions in the road network. So, we propose a decentralized multi-agent reinforcement learning (MARL) algorithm for solving this problem by considering each junction in the road network as a separate agent (controller) to obtain dynamic TSC policies.
We propose two approaches to minimize the average delay. In the first approach, each agent decides the signal duration of its phases in a round-robin (RR) manner using the multi-agent Q-learning algorithm. We show through simulations over VISSIM (a microscopic traffic simulator) that our round-robin MARL algorithms perform significantly better than both the standard fixed signal timing (FST) algorithm and the saturation balancing (SAT) algorithm over two real road networks. In the second approach, instead of optimizing green light duration, each agent optimizes the order of the phase sequence. We then employ our MARL algorithms by suitably changing the state-action space and cost structure of the MDP. We show through simulations over VISSIM that our non-round-robin MARL algorithms perform significantly better than the FST, SAT and round-robin MARL algorithms based on the first approach. On the other hand, our round-robin MARL algorithms are more practically viable, as they conform with the psychology of road users.
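The Q-learning building block used by each junction agent can be sketched as a tabular, cost-minimizing update with ε-greedy exploration. This is a minimal illustration under assumptions, not the thesis's MARL algorithm: the state labels, the action set of candidate green durations, the cost signal, and the names `epsilon_greedy` and `q_update` are all hypothetical, and any coordination between neighbouring agents is left out.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def epsilon_greedy(q: QTable, state: Hashable, actions: Sequence[Hashable],
                   epsilon: float, rng: random.Random) -> Hashable:
    """Explore with probability epsilon, else pick the lowest-cost action."""
    if rng.random() < epsilon:
        return rng.choice(list(actions))
    return min(actions, key=lambda a: q[(state, a)])


def q_update(q: QTable, state: Hashable, action: Hashable, cost: float,
             next_state: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """One Q-learning step for a discounted-cost (minimization) objective."""
    best_next = min(q[(next_state, a)] for a in actions)
    target = cost + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


if __name__ == "__main__":
    rng = random.Random(0)
    q: QTable = defaultdict(float)
    greens = [10, 20, 30]  # hypothetical candidate green durations (seconds)
    action = epsilon_greedy(q, "low_queue", greens, epsilon=0.1, rng=rng)
    # Suppose the simulator reports a delay cost of 5.0 and a new state.
    q_update(q, "low_queue", action, 5.0, "high_queue", greens)
    print(action, q[("low_queue", action)])
```

In a round-robin setting an action would be the green duration of the phase whose turn it is; in a non-round-robin setting the action would instead be which phase to serve next, with the same update rule applied to the changed state-action space.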
