371 |
MILATRAS: MIcrosimulation Learning-based Approach to TRansit ASsignment
Wahba, Mohamed Medhat Amin Abdel-Latif 26 February 2009
Public transit is considered a cost-effective alternative for mitigating the effects of traffic gridlock through the implementation of innovative service designs and the deployment of new smart systems for operations control and traveller information. Public transport planners use transit assignment models to predict passenger loads and levels of service.
Existing transit assignment approaches have limitations in evaluating the effects of information technologies, since they are sensitive neither to the types of information that may be provided to travellers nor to the traveller’s response to that information. Moreover, they are not adequate for evaluating the impacts of Intelligent Transportation Systems (ITS) deployments on service reliability, which in turn affects passengers’ behaviour.
This dissertation presents an innovative transit assignment framework, namely the MIcrosimulation Learning-based Approach to TRansit ASsignment – MILATRAS. MILATRAS uses learning and adaptation to represent the dynamic feedback of passengers’ trip choices and their adaptation to service performance. Individual passengers adjust their behaviour (i.e. trip choices) according to their experience with the transit system performance. MILATRAS introduces the concept of ‘mental model’ to maintain and distinguish between the individual’s experience with service performance and the information provided about system conditions.
A dynamic transit path choice model is developed using concepts of Markovian Decision Process (MDP) and Reinforcement Learning (RL). It addresses the departure time and path choices with and without information provision. A parameter-calibration procedure using a generic optimization technique (Genetic Algorithms) is also proposed. A proof-of-concept prototype has been implemented; it investigates the impact of different traveller information provision scenarios on departure time and path choices, and network performance. A large-scale application, including parameter calibration, is conducted for the Toronto Transit Commission (TTC) network.
MILATRAS implements a microsimulation, stochastic (nonequilibrium-based) approach for modelling within-day and day-to-day variations in the transit assignment process, where aggregate travel patterns can be extracted from individual choices. MILATRAS addresses many limitations of existing transit assignment models by exploiting methodologies already established in the areas of traffic assignment and travel behaviour modelling. Such approaches include the microsimulation of transportation systems, learning-based algorithms for modelling travel behaviour, agent-based representation for travellers, and the adoption of Geographical Information Systems (GIS).
This thesis presents a significant step towards advancing the modelling of the transit assignment problem by providing a detailed operational specification for an integrated dynamic modelling framework, MILATRAS.
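As a rough illustration of the day-to-day learning idea described in this abstract, the sketch below lets one simulated passenger revise travel-time estimates for competing paths from experience. It is a minimal sketch and not the MILATRAS model: the path names, mean travel times, learning rate `alpha`, exploration rate `epsilon`, and noise level `noise_sd` are all hypothetical, and the single-state update is a simplification of a full MDP formulation.

```python
import random


def simulate_path_learning(mean_times, days=200, alpha=0.2, epsilon=0.1,
                           noise_sd=3.0, seed=0):
    """Day-to-day path-choice learning for one simulated passenger.

    mean_times: dict mapping a (hypothetical) path name to its mean
    travel time in minutes. Returns the learned estimate per path.
    """
    rng = random.Random(seed)
    # Start from an optimistic estimate of 0 minutes so that the greedy
    # rule tries every path at least once.
    estimates = {path: 0.0 for path in mean_times}
    for _ in range(days):
        if rng.random() < epsilon:
            # Explore: pick any path at random.
            path = rng.choice(sorted(estimates))
        else:
            # Exploit: pick the path currently believed to be fastest.
            path = min(estimates, key=estimates.get)
        # Experienced travel time: mean plus noise, floored at 1 minute.
        experienced = max(1.0, rng.gauss(mean_times[path], noise_sd))
        # Move the estimate a fraction alpha towards today's experience.
        estimates[path] += alpha * (experienced - estimates[path])
    return estimates


if __name__ == "__main__":
    learned = simulate_path_learning(
        {"bus": 30.0, "subway": 20.0, "streetcar": 40.0})
    for path, minutes in sorted(learned.items()):
        print(f"{path}: {minutes:.1f} min")
```

Aggregate loads in a microsimulation of this kind would come from running many such passengers side by side and counting their choices per path and day; the sketch covers only the individual update.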
|
373 |
ON DEVELOPMENTAL VARIATION IN HIERARCHICAL SYMBIOTIC POLICY SEARCH
Kelly, Stephen 16 August 2012
A hierarchical symbiotic framework for policy search with genetic programming (GP) is evaluated in two control-style temporal sequence learning domains. The symbiotic formulation assumes each policy takes the form of a cooperative team between multiple symbiont programs. An initial cycle of evolution establishes a diverse range of host behaviours with limited capability. The second cycle uses these initial policies as meta actions for reuse by symbiont programs. The relationship between development and ecology is explored by explicitly altering the interaction between learning agent and environment at fixed points throughout evolution. In both task domains, this developmental diversity significantly improves performance. Specifically, ecologies designed to promote good specialists in the first developmental phase and then good generalists result in much stronger organisms from the perspective of generalization ability and efficiency. Conversely, when there is no diversity in the interaction between task environment and policy learner, the resulting hierarchy is not as robust or general.
The relative contribution from each cycle of evolution in the resulting hierarchical policies is measured from the perspective of multi-level selection. These multi-level policies are shown to be significantly better than the sum of contributing meta actions.
|
374 |
Game-independent AI agents for playing Atari 2600 console games
Naddaf, Yavar Unknown Date
No description available.
|
375 |
Gradient Temporal-Difference Learning Algorithms
Maei, Hamid Reza Unknown Date
No description available.
|
376 |
Statistical analysis of L1-penalized linear estimation with applications
Ávila Pires, Bernardo Unknown Date
No description available.
|
377 |
Using behaviour patterns to generate scripts for computer role-playing games
Cutumisu, Maria Unknown Date
No description available.
|
378 |
Learning with ALiCE II
Lockery, Daniel Alexander 14 September 2007
The problem considered in this thesis is the development of an autonomous prototype robot capable of gathering sensory information from its environment, allowing it to provide feedback on the condition of specific targets to aid in the maintenance of hydro equipment. The context for the solution to this problem is the power grid environment operated by the local hydro utility. The intent is to monitor power line structures by travelling along the skywire located at the top of towers, providing a view of everything beneath it including, for example, insulators, conductors, and towers. The contribution of this thesis is a novel robot design with the potential to prevent hazardous situations, together with the use of reinforcement learning algorithms modified with rough coverage feedback to establish behaviours.
|
379 |
DRARS, A Dynamic Risk-Aware Recommender System
Bouneffouf, Djallel 19 December 2013
The immense quantity of information generated and managed daily by information systems and their users inevitably leads to the problem of information overload. In this context, traditional recommender systems provide relevant information to users. However, with the recent spread of mobile devices (smartphones and tablets), users are progressively migrating towards pervasive environments. The problem with traditional recommendation approaches is that they do not use all the available information to produce recommendations. More contextual information could be used in the recommendation process to yield more accurate recommendations. Context-aware recommender systems (CARS) combine the characteristics of context-aware systems and recommender systems in order to provide personalized information to users in ubiquitous environments. From this perspective, where everything concerning the user is dynamic, including the content they handle and their environment, two main questions must be addressed: (i) How can the dynamicity of the user's content be taken into account? and (ii) How can intrusiveness be avoided, particularly in critical situations? In response to these questions, we developed a dynamic, risk-aware recommender system called DRARS (Dynamic Risk-Aware Recommender System), which models context-aware recommendation as a bandit problem. This system combines a content-based filtering technique with a contextual bandit algorithm.
We showed that DRARS improves the strategy of the UCB (Upper Confidence Bound) algorithm, the best algorithm currently available, by computing the optimal exploration value to maintain a trade-off between exploration and exploitation based on the risk level of the user's current situation. We conducted experiments in an industrial setting with real data and real users, and showed that taking the risk level of the user's situation into account significantly increased the performance of the recommender system.
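To make the exploration/exploitation trade-off in this abstract concrete, the sketch below shows a UCB1-style arm selection whose exploration bonus shrinks as the situation becomes riskier. It is a sketch under stated assumptions, not the DRARS algorithm: the function name, the `risk` score in [0, 1], the constant `c_max`, and the linear scaling `c_max * (1 - risk)` are all hypothetical, chosen only to illustrate the idea of risk-dependent exploration.

```python
import math


def risk_aware_ucb_choice(counts, rewards, risk, c_max=2.0):
    """Pick an arm index using UCB1 with a risk-scaled exploration bonus.

    counts[i]  : number of times item i has been recommended
    rewards[i] : total reward (for example, clicks) collected from item i
    risk       : hypothetical risk level of the current situation, in [0, 1]
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be in [0, 1]")
    # Recommend every item once before relying on confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    # Assumption: exploration shrinks linearly as risk grows, so a fully
    # critical situation (risk = 1) is handled by pure exploitation.
    c = c_max * (1.0 - risk)

    def score(arm):
        mean = rewards[arm] / counts[arm]
        bonus = math.sqrt(c * math.log(total) / counts[arm])
        return mean + bonus

    return max(range(len(counts)), key=score)


if __name__ == "__main__":
    counts, rewards = [100, 5], [60.0, 2.0]
    print(risk_aware_ucb_choice(counts, rewards, risk=0.0))
    print(risk_aware_ucb_choice(counts, rewards, risk=1.0))
```

With the counts above, the rarely tried item wins when risk is 0 because its confidence bonus is large, while at risk 1 the bonus vanishes and the item with the higher observed mean is chosen.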
|
380 |
Spectral Approaches to Learning Predictive Representations
Boots, Byron 01 September 2012
A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must obtain an accurate environment model, and then plan to maximize reward. However, for complex domains, specifying a model by hand can be a time-consuming process. This motivates an alternative approach: learning a model directly from observations. Unfortunately, learning algorithms often recover a model that is too inaccurate to support planning or too large and complex for planning to succeed; or, they require excessive prior domain knowledge or fail to provide guarantees such as statistical consistency. To address this gap, we propose spectral subspace identification algorithms which provably learn compact, accurate, predictive models of partially observable dynamical systems directly from sequences of action-observation pairs. Our research agenda includes several variations of this general approach: spectral methods for classical models like Kalman filters and hidden Markov models, batch algorithms and online algorithms, and kernel-based algorithms for learning models in high- and infinite-dimensional feature spaces. All of these approaches share a common framework: the model’s belief space is represented as predictions of observable quantities and spectral algorithms are applied to learn the model parameters. Unlike the popular EM algorithm, spectral learning algorithms are statistically consistent, computationally efficient, and easy to implement using established matrix algebra techniques. We evaluate our learning algorithms on a series of prediction and planning tasks involving simulated data and real robotic systems.
|