61 |
Conception et commande d'une structure de locomotion compliante pour le franchissement d'obstacle / Design and control of a compliant locomotion structure for obstacle crossing
Bouton, Arthur 16 November 2017
Achieving high-performance locomotion on rough terrain remains a challenge for robotic systems of every kind. Hybrid “wheel-leg” robots, which try to combine the energy efficiency of wheels with the agility of legs, are one example with potentially very promising capabilities. Unfortunately, controlling such structures quickly becomes problematic because of kinematic redundancies and, above all, because of the difficulty of knowing the exact ground geometry as the robot advances. This thesis therefore proposes an answer to the complexity of reconfigurable rolling systems through a synergistic approach between compliance and actuation. To this end, we propose to exploit an ideally orthogonal decomposition between the different forms of compliance that make up the robot’s suspension. Actuation within the structure is dedicated to controlling the vertical forces exerted on the wheels, while the horizontal displacements of the wheels result from a passive stiffness combined with a local modulation of the drive speeds. The robot’s posture is controlled by servoing the vertical forces supplied by series-elastic actuation. This guarantees a spontaneous adaptation of wheel height while retaining control over the load distribution. The feasibility of such a locomotion system is validated through a prototype resting on four compliant “wheel-legs”. Designed entirely within this study, the prototype approximates the proposed functional decomposition while meeting the constraints of fabrication and robustness.
Building on the functional decomposition proposed for the structure, two control methods are presented for crossing obstacles: the first exploits the inertia of the chassis to locally modify the vertical forces applied to the wheels, while the second selects a force-distribution mode suited to maintaining quasi-static progress in all circumstances. For the latter, two synthesis methods are addressed: one using a “Q-learning” algorithm and the other by determining parameterized expert rules. These controllers, validated by dynamic simulations in varied situations, rely exclusively on proprioceptive data immediately available from measurements of the structure’s joint variables. In this way, the robot reacts directly on contact with obstacles, without needing to know the ground geometry in advance. / Efficient locomotion on rough terrain is still a challenge for robotic systems of all kinds. “Wheel-on-leg” robots, which try to combine the energy efficiency of wheels with the agility of legs, are an example with potentially very promising capabilities. Unfortunately, control of such structures turns out to be problematic because of the kinematic redundancies and, above all, the difficulty of precisely evaluating the ground geometry as the robot advances. This thesis proposes a solution to the complexity of reconfigurable rolling systems through a synergistic approach between compliance and actuation. To this end, we propose to exploit an ideally orthogonal decomposition between the different movements that the compliant elements of the robot suspension allow.
The structure’s actuation is thus dedicated to controlling the vertical forces applied to the wheels, while the horizontal wheel displacements result from a passive stiffness combined with a local modulation of wheel speed. The robot posture is controlled by servoing the vertical forces provided by series elastic actuation. This ensures a spontaneous adaptation of wheel heights while retaining control over the load distribution. The feasibility of such a locomotion system is validated through a prototype based on four compliant “wheel-legs”. Designed entirely as part of this study, the prototype approximates the proposed functional decomposition while meeting the fabrication and robustness constraints. We also present two control methods for crossing obstacles that take advantage of the functional decomposition proposed for the structure. The first exploits the chassis inertia to locally modify the vertical forces applied to the wheels, while the second selects appropriate force distributions so that a quasi-static advance can be maintained in all circumstances. Two approaches are given for synthesizing the latter controller: either a “Q-learning” algorithm or the determination of parameterized expert rules. Validated by dynamic simulations in various situations, these controllers rely only on proprioceptive data immediately provided by measurements of the joint variables. This way, the robot reacts directly when it touches obstacles, without having to know the ground geometry in advance.
|
62 |
Solution Of Delayed Reinforcement Learning Problems Having Continuous Action Spaces
Ravindran, B 03 1900
No description available.
|
63 |
Plánování cesty robotu pomocí posilovaného učení / Robot path planning by means of reinforcement learning
Veselovský, Michal January 2013
This thesis deals with path planning for an autonomous robot in an environment with static obstacles. It includes an analysis of different approaches to path planning, a description of methods that use reinforcement learning, and experiments with them. The main outputs of the thesis are working path-planning algorithms based on Q-learning, verification of their functionality, and a comparison between them.
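The kind of Q-learning path planner this abstract describes can be illustrated with a minimal sketch. It is not the thesis's implementation: the grid map, the reward values (-1 per step, -5 for hitting an obstacle or wall, +10 at the goal), the hyperparameters and all function names are assumptions chosen for illustration.

```python
import random

MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(grid, goal, state, action):
    """Apply one move; hitting a wall or a '#' obstacle leaves the robot in place."""
    r, c = state[0] + MOVES[action][0], state[1] + MOVES[action][1]
    if not (0 <= r < len(grid) and 0 <= c < len(grid[0])) or grid[r][c] == "#":
        return state, -5.0  # assumed collision penalty
    return (r, c), (10.0 if (r, c) == goal else -1.0)


def train(grid, goal, episodes=3000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy and random start cells."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4
         for r, row in enumerate(grid) for c, cell in enumerate(row) if cell != "#"}
    starts = [s for s in q if s != goal]
    for _ in range(episodes):
        state = rng.choice(starts)
        for _ in range(4 * len(q)):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(4)
            else:
                action = max(range(4), key=lambda a: q[state][a])
            nxt, reward = step(grid, goal, state, action)
            # The goal is terminal, so it contributes no future value.
            best_next = 0.0 if nxt == goal else max(q[nxt])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = nxt
            if state == goal:
                break
    return q


def greedy_path(q, grid, start, goal, max_steps=50):
    """Follow the highest-valued action from start until the goal is reached."""
    path, state = [start], start
    while state != goal and len(path) <= max_steps:
        action = max(range(4), key=lambda a: q[state][a])
        state, _ = step(grid, goal, state, action)
        path.append(state)
    return path


# Hypothetical 3x4 map: '.' is free space, '#' is a static obstacle.
GRID = ["....",
        ".##.",
        "...."]
q_table = train(GRID, goal=(2, 3))
path = greedy_path(q_table, GRID, start=(0, 0), goal=(2, 3))
print(path)
```

Because the map is static and the transitions are deterministic, a plain lookup table is enough here; larger or continuous environments would need function approximation instead.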
|
64 |
A Modified Q-Learning Approach for Predicting Mortality in Patients Diagnosed with Sepsis
Dunn, Noah M. 15 April 2021
No description available.
|
65 |
Where not what: the role of spatial-motor processing in decision-making
Banks, Parker January 2021
Decision-making is comprised of an incredibly varied set of behaviours. However, all vertebrates tend to repeat previously rewarding actions and avoid those that have led to loss, behaviours known collectively as the win-stay, lose-shift strategy. This response strategy is supported by the sensorimotor striatum and nucleus accumbens, structures also implicated in spatial processing and the integration of sensory information in order to guide motor action. Therefore, choices may be represented as spatial-motor actions whose value is determined by the rewards and punishments associated with that action. In this dissertation I demonstrate that the location of choices relative to previous rewards and punishments, rather than their identities, determines their value. Chapters 2 and 4 demonstrate that the location of rewards and punishments drives future decisions to win-stay or lose-shift towards that location. Even when choices differ in colour or shape, choice value is determined by location, not visual identity. Chapter 3 compares decision-making when two, six, twelve, or eighteen choices are present, finding that the value of a win or loss is not tied to a single location, but is distributed throughout the choice environment. Finally, Chapter 5 provides anatomical support for the spatial-motor basis of choice. Specifically, win-stay responses are associated with greater oscillatory activity than win-shift responses in the motor cortex corresponding to the hand used to make a choice, whereas lose-shift responses are accompanied by greater activation of frontal systems compared to lose-stay responses. The win-stay and lose-shift behaviours activate structures known to project to different regions of the striatum. Overall, this dissertation provides behavioural evidence that choice location, not visual identity, determines choice value. / Thesis / Doctor of Philosophy (PhD)
|
66 |
Deep Q Learning with a Multi-Level Vehicle Perception for Cooperative Automated Highway Driving
Hamilton, Richard January 2021
Autonomous vehicles, commonly known as “self-driving cars”, are increasingly becoming of interest for researchers due to their potential to mitigate traffic accidents and congestion. Using reinforcement learning, previous research has demonstrated that a DQN agent can be trained to effectively navigate a simulated two-lane environment via cooperative driving, in which a model of V2X technology allows an AV to receive information from surrounding vehicles (termed Primary Perceived Vehicles) to make driving decisions. Results have demonstrated that the DQN agent can learn to navigate longitudinally and laterally, but with a prohibitively high collision rate of 1.5% - 4.8% and an average speed of 13.4 m/s. In this research, the impact of including information from traffic vehicles that are outside of those that immediately surround the AV (termed Secondary Perceived Vehicles) as inputs to a DQN agent is investigated. Results indicate that while including velocity and distance information from SPVs does not improve the collision rate and average speed of the driving algorithm, it does yield a lower standard deviation of speed during episodes, indicating lower acceleration. This effect, however, is lost when the agent is tested under constant traffic flow scenarios (as opposed to fluctuating driving conditions). Taken together, it is concluded that while the SPV inclusion does not have an impact on collision rate and average speed, its ability to achieve the same performance with lower acceleration can significantly improve fuel economy and drive quality. These findings give a better understanding of how additional vehicle information during cooperative driving affects automated driving. / Thesis / Master of Applied Science (MASc)
|
67 |
Deep Reinforcement Learning for Card Games
Tegnér Mohringe, Oscar, Cali, Rayan January 2022
This project investigates how reinforcement learning (RL) techniques can be applied to the card game Limit Texas Hold’em. RL is a type of machine learning that can learn to optimally solve problems that can be formulated as a Markov decision process. We considered two RL algorithms: Deep Q-Learning (DQN), for its popularity within the RL community, and Deep Monte-Carlo (DMC), for its success in other card games. The goal was to investigate how different parameters affect their performance and, if possible, to achieve human-level performance. To this end, a subset of the parameters used by these methods was varied and the impact on overall learning performance was investigated. With both DQN and DMC we were able to isolate parameters that had a significant impact on performance. While both methods failed to reach human performance, both showed obvious signs of learning. The DQN algorithm’s biggest flaw was that it tended to fall into simplified strategies in which it would stick to using only one action. The pitfall for DMC was that the algorithm has high variance and therefore needs many samples to train. Despite this flaw, however, the algorithm seemingly developed a primitive strategy. We believe that with some modifications to the methods, better results could be achieved. / This project seeks to investigate how different reinforcement learning (RL) techniques can be implemented for the card game Limit Texas Hold’em. RL is a type of machine learning that can learn to optimally solve problems that can be formulated as a Markov decision process. We considered two algorithms: Deep Q-Learning (DQN), chosen for its popularity, and Deep Monte-Carlo (DMC), chosen for its earlier success in other card games. The goal was to investigate how different parameters affect the learning process and, if possible, to reach human performance. To achieve this, a subset of the parameters used by these methods was selected.
These parameters were changed step by step and their effect on overall learning performance was measured. With both DQN and DMC we managed to isolate parameters that had a significant effect on performance. Although both methods failed to reach human performance, both showed signs of learning. The biggest problem with DQN was that the method tended to get stuck in simple strategies in which it chose only a single move. For DMC, the problem was that the method has high variance, which means it needs a lot of time to train. Even so, the method managed to develop a primitive strategy. We believe that, with a few modifications, the methods could reach a better result. / Bachelor’s degree project in electrical engineering 2022, KTH, Stockholm
|
68 |
Complementary Layered Learning
Mondesire, Sean 01 January 2014
Layered learning is a machine learning paradigm used to develop autonomous robotic-based agents by decomposing a complex task into simpler subtasks and learning each sequentially. Although the paradigm continues to have success in multiple domains, performance can be unexpectedly unsatisfactory. Using Boolean-logic problems and autonomous agent navigation, we show that poor performance is due to the learner either forgetting how to perform earlier learned subtasks too quickly (favoring plasticity) or having difficulty learning new things (favoring stability). We demonstrate that this imbalance can hinder learning so that task performance is no better than that of a suboptimal learning technique, monolithic learning, which does not use decomposition. Through the resulting analyses, we have identified factors that can lead to imbalance and their negative effects, providing a deeper understanding of stability and plasticity in decomposition-based approaches such as layered learning. To combat the negative effects of the imbalance, a complementary learning system is applied to layered learning. The new technique augments the original learning approach with dual storage region policies to prevent useful information from being removed from an agent’s policy prematurely. Through multi-agent experiments, a 28% task performance increase is obtained with the proposed augmentations over the original technique.
|
69 |
Control of an Inverted Pendulum Using Reinforcement Learning Methods
Kärn, Joel January 2021
In this paper the two reinforcement learning algorithms Q-learning and deep Q-learning (DQN) are used to balance an inverted pendulum. In order to compare the two, both algorithms are optimized to some extent by evaluating different values for some of their parameters. Since the difference between Q-learning and DQN is a deep neural network (DNN), some benefits of a DNN are then discussed. The conclusion is that this particular problem is simple enough for the Q-learning algorithm to work well, and that Q-learning is preferable even though the DQN algorithm solves the problem in fewer episodes. This is due to the stability of the Q-learning algorithm and because more time is required to find a suitable DNN and evaluate appropriate parameters for the DQN algorithm than to find the proper parameters for the Q-learning algorithm. / In this report, two reinforcement learning algorithms, Q-learning and deep Q-learning (DQN), are used to balance an inverted pendulum. To compare them, the algorithms are optimized to some extent by testing different values for some of their parameters. Since the difference between Q-learning and DQN is a deep neural network (DNN), the benefit of a DNN is discussed. The conclusion is that for such a simple problem the Q-learning algorithm works well and is preferable, even though the DQN algorithm solves the problem in fewer episodes. This is because of the Q-learning algorithm’s stability and because more time is needed to find a suitable DNN and suitable parameters for the DQN algorithm than to find good parameters for the Q-learning algorithm. / Bachelor’s degree project in electrical engineering 2021, KTH, Stockholm
|
70 |
Energy Efficiency of 5G Radio Access Networks
Peesapati, Saivenkata Krishna Gowtam January 2020
The roll-out of fifth-generation (5G) wireless networks alongside existing generations, characterized by a dense deployment of base stations (BSs) to serve an ever-increasing number of users and services, leads to a drastic increase in overall network energy consumption (EC). This can lead to an unprecedented rise in operational expenditure (OPEX) for network operators and an increased global carbon footprint. Present-day networks are dimensioned according to peak traffic demand and are hence under-utilized because of daily traffic variations. Therefore, to save energy, BSs can be put to sleep at different levels following the daily load variations. Selecting the right sleep level at the right instant is important for adapting the availability of resources to the traffic load, so as to maximize energy savings without degrading network performance. Previous studies focused on selecting sleep modes (SMs) to maximize energy savings or sleep duration for a given configuration and set of network resources. However, adaptive BS configuration combined with SMs has not been investigated. In this thesis, the goal is to consider the design of the wireless network resources to cover an area with a given traffic demand in combination with sleep mode management. To achieve this, a novel EC model is proposed to capture the activity time of a 5G BS in a multi-cell environment. The activity factor of a BS is defined as the fraction of time the BS is transmitting over a fixed period and depends on the amount of BS resources. The new model captures the variation in power consumption by configuring three BS resources: 1) the active array size, 2) the bandwidth, and 3) the spatial multiplexing factor. We then implement a Q-learning algorithm to adapt these resources to the traffic demand and to select sleep levels.
Our results show that the difference in the average daily EC of the BSs considered can be as high as 60% depending on the deployment area. Furthermore, the EC of a BS can be reduced by 57% during the low-traffic hours by using deeper sleep levels, compared to the baseline scenario with no sleep modes. Implementing the resource adaptation algorithm further reduces the average EC of the BS by up to 20% compared to the case without resource adaptation. However, the energy efficiency (EE) gain obtained by the algorithm depends on its convergence, which varies with the distribution of the users in the cell, the peak traffic demand, and the BS resources available. Our results show that by combining resource adaptation with deep sleep levels, one can obtain significant energy savings under variable traffic load. However, to ensure the reliability of the results obtained, we emphasize the need to guarantee the convergence of the algorithm before its use for resource adaptation. / In recent years, interest in the energy efficiency (EE) of mobile communication systems has grown because of rising energy consumption (EC). With fifth-generation mobile systems, which are characterized by more complex and powerful base stations (BSs) serving an ever-increasing number of users and services, the total EC of the network risks increasing further. This can lead to a marked rise in operational expenditure (OPEX) for network operators and an increased global carbon footprint. Many studies have shown that today’s networks are often over-dimensioned and that radio resources are under-utilized because of variations in daily traffic demand. By adapting the BS radio resources to the traffic demand, one can ensure that user requirements are met while reducing the total EC. This study proposes an activity-based method for evaluating the EC of a BS.
The activity factor of a BS is defined as the fraction of time during which the BS is active (transmitting data) over a fixed period, and it depends on the amount of radio resources. To quantify the EC of a BS, a new model is proposed that computes the input power to the BS as a function of the power radiated by the BS. The new model captures the variation in energy consumption with three main radio resources: 1) the number of transmit antennas, 2) the bandwidth, and 3) the spatial multiplexing factor (the number of users scheduled simultaneously). A Q-learning algorithm is then implemented to adapt these resources to the experienced traffic demand and to the sleep modes that the BS can switch to when it is inactive. A sleep mode means that some of the BS hardware is switched off. The results show that by identifying the right type of BS based on local traffic conditions, energy savings as high as 60% can be obtained. Furthermore, the EC of a BS can be reduced by 57% during the time of day when traffic is lowest by using deeper sleep modes, compared with the baseline scenario without sleep modes. Implementing the Q-learning algorithm, which adapts the available radio resources to the traffic demand, reduces the average EC of the BS by up to a further 20%. However, the EE gain obtained by the algorithm depends largely on its convergence, which varies with the distribution of users in the cell, the peak traffic demand, and the radio resources available to the BS. The results show that combining resource adaptation with sleep modes yields significant energy savings under varying traffic load. To ensure the reliability of the results obtained, however, the need to guarantee the convergence of the algorithm before it is used for resource adaptation is emphasized.
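The activity factor defined in this abstract (the fraction of a fixed period during which the BS transmits) can be made concrete with a small worked sketch. The two-state power model below (one power level while transmitting, another while idle or asleep) is a deliberate simplification and is not the EC model proposed in the thesis; the function names and all wattage figures are hypothetical.

```python
def activity_factor(busy_seconds, period_seconds):
    """Fraction of the observation period during which the BS is transmitting."""
    if period_seconds <= 0:
        raise ValueError("period must be positive")
    if not 0 <= busy_seconds <= period_seconds:
        raise ValueError("busy time must lie within the period")
    return busy_seconds / period_seconds


def energy_wh(activity, period_seconds, p_active_w, p_idle_w):
    """Energy over the period, assuming only two power states (a simplification)."""
    average_power_w = activity * p_active_w + (1.0 - activity) * p_idle_w
    return average_power_w * period_seconds / 3600.0


# Hypothetical hour in which the BS transmits for 15 minutes.
eta = activity_factor(busy_seconds=900, period_seconds=3600)

# Assumed power levels: 800 W transmitting, 500 W idle without sleep, 100 W in deep sleep.
energy_no_sleep = energy_wh(eta, 3600, p_active_w=800.0, p_idle_w=500.0)    # 575.0 Wh
energy_deep_sleep = energy_wh(eta, 3600, p_active_w=800.0, p_idle_w=100.0)  # 275.0 Wh

saving = 1.0 - energy_deep_sleep / energy_no_sleep
print(f"activity factor {eta:.2f}, saving from deep sleep {saving:.0%}")
```

With these assumed numbers, the lower the activity factor, the more the idle-state power dominates the total, which is why sleep levels matter most during low-traffic hours.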
|