51

Coordinating transportation services in a hospital environment using Deep Reinforcement Learning

Lundström, Caroline, Hedberg, Sara January 2018 (has links)
Artificial Intelligence has in recent years become a popular subject, thanks largely to progress in the area of Machine Learning and particularly to the achievements of Deep Learning. When Reinforcement Learning is combined with Deep Learning, an agent can learn a successful behavior for a given environment, which has opened up a new domain of optimization. This thesis evaluates whether a Deep Reinforcement Learning agent can learn to aid transportation services in a hospital environment. A Deep Q-learning Network (DQN) algorithm is implemented, and its performance is compared with that of a linear regression agent, a random agent, and a smart agent. The results indicate that it is possible for an agent to learn to aid transportation services in a hospital environment, although it does not outperform linear regression on the most difficult task. For the more complex tasks, the learning process of the agent is unstable, and implementing a Double Deep Q-learning Network may stabilize it. The overall conclusion is that Deep Reinforcement Learning can perform well on these types of problems, and further applied research may lead to greater innovations.
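The abstract contrasts the DQN it implements with a Double Deep Q-learning Network as a possible stabilizer. A minimal sketch of how the two bootstrap targets differ (the Q-values below are invented for illustration, not taken from the thesis):

```python
import numpy as np

# Hypothetical Q-value outputs for four actions in the next state,
# from the online network and a lagged target network.
q_online = np.array([1.0, 2.5, 0.3, 1.8])
q_target = np.array([0.9, 1.7, 0.5, 2.2])

reward, gamma = 1.0, 0.99

# Standard DQN target: the target network both selects and evaluates
# the next action, which tends to overestimate values.
dqn_target = reward + gamma * q_target.max()

# Double DQN: the online network selects the action, the target
# network evaluates it, reducing the overestimation bias.
a = int(q_online.argmax())
ddqn_target = reward + gamma * q_target[a]

print(round(dqn_target, 3))   # 3.178
print(round(ddqn_target, 3))  # 2.683
```

When the online and target networks disagree about the best next action, the Double DQN target is lower, which is the mechanism behind its more stable learning curves.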
52

Soluções de coexistência LTE/Wi-Fi em banda não licenciada / LTE/Wi-Fi coexistence solutions in unlicensed spectrum

Santana, Pedro Maia de 07 December 2017 (has links)
This work studies the application of LTE networks in the ISM (Industrial, Scientific and Medical) spectrum and its impact on technologies that commonly coexist in the same frequency range. First, a theoretical overview is given of the regulations governing the use of unlicensed spectrum. Next, the main LTE coexistence solutions in this band are presented, highlighting the mechanism recently standardized by 3GPP, LTE-LBT, and technologies from pioneering companies in the area, such as the LTE-DC solution. As a practical complement to the initial theoretical investigation, performance analyses of these solutions are carried out using the ns-3 simulator. The novelty of the work lies in a proposed solution for the Carrier-Sensing Adaptive Transmission (CSAT) mechanism. This solution, based on machine learning, aims to improve the joint performance of the systems coexisting in the ISM band. The work also proposes a solution for LTE-DC self-coexistence based on a game-theoretic approach. These solutions are compared with the classical ones, and their gains are demonstrated in scenarios defined by global standardization bodies.
53

Conception et commande d'une structure de locomotion compliante pour le franchissement d'obstacle / Design and control of a compliant locomotion structure for obstacle crossing

Bouton, Arthur 16 November 2017 (has links)
Achieving efficient locomotion on rough terrain is still a challenge for robotic systems of all kinds. Hybrid "wheel-on-leg" robots, which try to combine the energy efficiency of wheels with the agility of legs, are an example with potentially very promising capabilities. Unfortunately, the control of such structures quickly becomes problematic because of kinematic redundancies and, above all, the difficulty of precisely evaluating the ground geometry as the robot advances. This thesis proposes a response to the complexity of reconfigurable rolling systems through a synergistic approach combining compliance and actuation. To this end, we exploit an ideally orthogonal decomposition between the different forms of compliance that make up the robot's suspension. Actuation within the structure is dedicated to controlling the vertical forces exerted on the wheels, while the horizontal displacements of the wheels result from a passive stiffness combined with local modulation of the drive speeds. The posture of the robot is controlled by servoing the vertical forces provided by series-elastic actuation, which guarantees spontaneous adaptation of the wheel heights while retaining control over the load distribution. The feasibility of such a locomotion system is validated through a prototype based on four compliant "wheel-legs". Designed entirely as part of this study, the prototype approximates the proposed functional decomposition while meeting realization and robustness constraints.
Building on this functional decomposition, two control schemes are presented for crossing obstacles: the first exploits the inertia of the chassis to locally modify the vertical forces applied to the wheels, while the second selects a force-distribution mode suited to maintaining quasi-static progress in all circumstances. For the latter controller, two synthesis methods are explored: one via a Q-learning algorithm and the other by deriving parameterized expert rules. These controllers, validated in dynamic simulations across varied situations, rely exclusively on proprioceptive data immediately available from measurements of the structure's joint variables. In this way, the robot reacts directly on contact with obstacles, without needing to know the ground geometry in advance.
54

Solution Of Delayed Reinforcement Learning Problems Having Continuous Action Spaces

Ravindran, B 03 1900 (has links) (PDF)
No description available.
55

Implementing an OpenAI Gym for Machine Learning of Microgrid Electricity Trading

Lundholm, André January 2021 (has links)
Society is moving away from centralized power generation towards decentralized systems. Instead of buying from large companies that generate electricity from fossil fuels, many renewable alternatives have arrived. Since consumers can generate solar power with solar panels, they can also become producers. This creates a large market for trading electricity between consumers instead of companies, forming a so-called microgrid. The purpose of this thesis is to find a solution for buying and selling on these microgrids. Using a Q-learning solution built on the OpenAI Gym toolkit together with a microgrid simulation, the thesis aims to answer the following questions: to what extent can Q-learning be used to buy and sell energy in a microgrid system, how long does the buy-and-sell algorithm take to train, and does latency affect the feasibility of Q-learning for microgrids. To answer these questions, the latency and training time of the Q-learning solution were measured, and a neural network solution was created for comparison. While some of the results were not fully reliable, some conclusions could still be drawn. First, the extent to which Q-learning can be used to buy and sell is quite good judging by the accuracy result of 97%, although this depends on the microgrid simulation being accurate. The buy-and-sell algorithm took about 12 seconds to train. The latency is effectively zero with the Q-learning solution, so its feasibility is high. From these findings, I conclude that a Q-learning OpenAI Gym solution is viable.
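As a rough illustration of the setup described above, here is a toy trading environment following the classic OpenAI Gym interface (`reset()` returning an observation, `step(action)` returning `(obs, reward, done, info)`). All prices, quantities, and the battery model are invented for illustration and are not the thesis's simulation:

```python
import random

class MicrogridTradingEnv:
    """Toy microgrid trading environment with a Gym-style interface.
    Prices and storage dynamics here are invented for illustration."""

    ACTIONS = ("buy", "sell", "hold")

    def __init__(self, horizon=24):
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.stored = 5.0                  # kWh in the battery
        self.price = 1.0                   # current unit price
        return (self.t, self.stored, self.price)

    def step(self, action):
        assert 0 <= action < len(self.ACTIONS)
        reward = 0.0
        if self.ACTIONS[action] == "buy":
            self.stored += 1.0
            reward = -self.price           # pay the market price
        elif self.ACTIONS[action] == "sell" and self.stored >= 1.0:
            self.stored -= 1.0
            reward = self.price            # earn the market price
        self.t += 1
        self.price = 1.0 + 0.5 * random.random()   # stochastic price
        done = self.t >= self.horizon
        return (self.t, self.stored, self.price), reward, done, {}

env = MicrogridTradingEnv()
obs = env.reset()
obs, reward, done, _ = env.step(1)         # sell one unit at price 1.0
print(reward)  # 1.0
```

A tabular Q-learning agent can then be trained against this interface exactly as against any registered Gym environment, which is the point of wrapping the microgrid simulation this way.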
56

Plánování cesty robotu pomocí posilovaného učení / Robot path planning by means of reinforcement learning

Veselovský, Michal January 2013 (has links)
This thesis deals with path planning for an autonomous robot in an environment with static obstacles. It includes an analysis of different approaches to path planning, a description of methods utilizing reinforcement learning, and experiments with them. The main outputs of the thesis are working path-planning algorithms based on Q-learning, verification of their functionality, and a mutual comparison.
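A minimal sketch of the kind of Q-learning path planner the abstract describes, on a small grid with one static obstacle. Grid size, rewards, and hyperparameters are illustrative choices, not the thesis's values:

```python
import random

# Tabular Q-learning on a 4x4 grid: start at (0, 0), reach (3, 3),
# avoid the blocked cell at (1, 1).
SIZE, START, GOAL, OBSTACLE = 4, (0, 0), (3, 3), (1, 1)
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2

Q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}

def step(state, a):
    nxt = (state[0] + MOVES[a][0], state[1] + MOVES[a][1])
    if not all(0 <= x < SIZE for x in nxt) or nxt == OBSTACLE:
        return state, -1.0, False            # blocked: stay put, penalty
    if nxt == GOAL:
        return nxt, 10.0, True
    return nxt, -0.1, False                  # small per-step cost

rng = random.Random(0)
for _ in range(2000):                        # training episodes
    s, done = START, False
    while not done:
        greedy = max(range(4), key=lambda i: Q[s][i])
        a = rng.randrange(4) if rng.random() < EPS else greedy
        nxt, r, done = step(s, a)
        target = r + (0.0 if done else GAMMA * max(Q[nxt]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = nxt

s, path = START, [START]                     # greedy rollout
for _ in range(30):
    if s == GOAL:
        break
    s, _, _ = step(s, max(range(4), key=lambda i: Q[s][i]))
    path.append(s)
print(path[-1])  # the learned greedy policy reaches the goal
```

The agent needs no map of the obstacles: it learns purely from the penalties received when bumping into them, which is the property that makes Q-learning attractive for path planning among static obstacles.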
57

A Modified Q-Learning Approach for Predicting Mortality in Patients Diagnosed with Sepsis

Dunn, Noah M. 15 April 2021 (has links)
No description available.
58

Where not what: the role of spatial-motor processing in decision-making

Banks, Parker January 2021 (has links)
Decision-making comprises an incredibly varied set of behaviours. However, all vertebrates tend to repeat previously rewarding actions and avoid those that have led to loss, behaviours known collectively as the win-stay, lose-shift strategy. This response strategy is supported by the sensorimotor striatum and nucleus accumbens, structures also implicated in spatial processing and in integrating sensory information to guide motor action. Choices may therefore be represented as spatial-motor actions whose value is determined by the rewards and punishments associated with that action. In this dissertation I demonstrate that the location of choices relative to previous rewards and punishments, rather than their identities, determines their value. Chapters 2 and 4 demonstrate that the location of rewards and punishments drives future decisions to win-stay or lose-shift towards that location. Even when choices differ in colour or shape, choice value is determined by location, not visual identity. Chapter 3 compares decision-making when two, six, twelve, or eighteen choices are present, finding that the value of a win or loss is not tied to a single location but is distributed throughout the choice environment. Finally, Chapter 5 provides anatomical support for the spatial-motor basis of choice. Specifically, win-stay responses are associated with greater oscillatory activity than win-shift responses in the motor cortex corresponding to the hand used to make a choice, whereas lose-shift responses are accompanied by greater activation of frontal systems compared to lose-stay responses. The win-stay and lose-shift behaviours activate structures known to project to different regions of the striatum. Overall, this dissertation provides behavioural evidence that choice location, not visual identity, determines choice value. / Thesis / Doctor of Philosophy (PhD)
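The win-stay, lose-shift strategy described above can be sketched in a few lines: repeat the last choice after a reward, switch after a loss. The two-option task and its reward probabilities below are invented for illustration:

```python
import random

def simulate(trials=1000, seed=1):
    """Win-stay, lose-shift agent on a two-option probabilistic task."""
    rng = random.Random(seed)
    p_reward = {"left": 0.7, "right": 0.3}   # illustrative probabilities
    choice, wins = "left", 0
    for _ in range(trials):
        rewarded = rng.random() < p_reward[choice]
        wins += rewarded
        if not rewarded:                     # lose-shift: switch options
            choice = "right" if choice == "left" else "left"
        # win-stay: otherwise keep the same choice
    return wins / trials

# Expected long-run reward rate: the agent spends 70% of trials on the
# richer option, giving about 0.7*0.7 + 0.3*0.3 = 0.58.
print(simulate())
```

Note that the strategy is purely reactive: the "choice" here is an abstract option, but nothing would change if it were a spatial location, which is exactly the substitution the dissertation investigates.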
59

Deep Q Learning with a Multi-Level Vehicle Perception for Cooperative Automated Highway Driving

Hamilton, Richard January 2021 (has links)
Autonomous vehicles, commonly known as “self-driving cars”, are increasingly becoming of interest for researchers due to their potential to mitigate traffic accidents and congestion. Using reinforcement learning, previous research has demonstrated that a DQN agent can be trained to effectively navigate a simulated two-lane environment via cooperative driving, in which a model of V2X technology allows an AV to receive information from surrounding vehicles (termed Primary Perceived Vehicles) to make driving decisions. Results have demonstrated that the DQN agent can learn to navigate longitudinally and laterally, but with a prohibitively high collision rate of 1.5% - 4.8% and an average speed of 13.4 m/s. In this research, the impact of including information from traffic vehicles that are outside of those that immediately surround the AV (termed Secondary Perceived Vehicles) as inputs to a DQN agent is investigated. Results indicate that while including velocity and distance information from SPVs does not improve the collision rate and average speed of the driving algorithm, it does yield a lower standard deviation of speed during episodes, indicating lower acceleration. This effect, however, is lost when the agent is tested under constant traffic flow scenarios (as opposed to fluctuating driving conditions). Taken together, it is concluded that while the SPV inclusion does not have an impact on collision rate and average speed, its ability to achieve the same performance with lower acceleration can significantly improve fuel economy and drive quality. These findings give a better understanding of how additional vehicle information during cooperative driving affects automated driving. / Thesis / Master of Applied Science (MASc)
60

Deep Reinforcement Learning for Card Games

Tegnér Mohringe, Oscar, Cali, Rayan January 2022 (has links)
This project investigates how reinforcement learning (RL) techniques can be applied to the card game Limit Texas Hold'em. RL is a type of machine learning that can learn to optimally solve problems that can be formulated as a Markov Decision Process. We considered two RL algorithms: Deep Q-Learning (DQN), chosen for its popularity within the RL community, and Deep Monte-Carlo (DMC), chosen for its success in other card games. The goal was to investigate how different parameters affect their performance and, if possible, to achieve human-level play. To this end, a subset of the parameters used by these methods was varied and their impact on overall learning performance was investigated. With both DQN and DMC we were able to isolate parameters that had a significant impact on performance. While both methods failed to reach human-level play, both showed clear signs of learning. The DQN algorithm's biggest flaw was its tendency to fall into simplified strategies where it would stick to a single action. The pitfall for DMC was its high variance, which means it needs many samples to train. Despite this flaw, the algorithm seemingly developed a primitive strategy. We believe that, with some modifications to the methods, better results could be achieved. / Bachelor's degree project in electrical engineering 2022, KTH, Stockholm
