  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
391

Neural network control of nonstrict feedback and nonaffine nonlinear discrete-time systems with application to engine control

Vance, Jonathan Blake, January 2007 (has links) (PDF)
Thesis (Ph. D.)--University of Missouri--Rolla, 2007. / Vita. The entire thesis text is included in the file. Title from title screen of thesis/dissertation PDF file (viewed March 26, 2008). Includes bibliographical references.
392

PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates

Pazis, Jason January 2015 (has links)
As the reinforcement learning community has shifted its focus from heuristic methods to methods that have performance guarantees, PAC-optimal exploration algorithms have received significant attention. Unfortunately, the majority of current PAC-optimal exploration algorithms are inapplicable in realistic scenarios: 1) They scale poorly to domains of realistic size. 2) They are only applicable to discrete state-action spaces. 3) They assume that experience comes from a single, continuous trajectory. 4) They assume that value function updates are instantaneous. The goal of this work is to bridge the gap between theory and practice by introducing an efficient and customizable PAC-optimal exploration algorithm that is able to explore in multiple continuous- or discrete-state MDPs simultaneously. Our algorithm does not assume that value function updates can be completed instantaneously, and it maintains PAC guarantees in real-time environments. Not only do we extend the applicability of PAC-optimal exploration algorithms to new, realistic settings, but even when instant value function updates are possible, our bounds present a significant improvement over previous single-MDP exploration bounds, and a drastic improvement over previous concurrent PAC bounds. We also present Bellman error MDPs, a new analysis methodology for online and offline reinforcement learning algorithms, and TCE, a new, fine-grained metric for the cost of exploration. / Dissertation
393

Developing basic soccer skills using reinforcement learning for the RoboCup small size league

Yoon, Moonyoung 03 1900 (has links)
Thesis (MSc)--Stellenbosch University, 2015. / ENGLISH ABSTRACT: This study started as part of a research project at Stellenbosch University (SU) that aims at building a team of soccer-playing robots for the RoboCup Small Size League (SSL). In the RoboCup SSL the Decision-Making Module (DMM) plays an important role for it makes all decisions for the robots in the team. This research focuses on the development of some parts of the DMM for the team at SU. A literature study showed that the DMM is typically developed in a hierarchical structure where basic soccer skills form the fundamental building blocks and high-level team behaviours are implemented using these basic soccer skills. The literature study also revealed that strategies in the DMM are usually developed using a hand-coded approach in the RoboCup SSL domain, i.e., a specific and fixed strategy is coded, while in other leagues a Machine Learning (ML) approach, Reinforcement Learning (RL) in particular, is widely used. This led to the following research objective of this thesis, namely to develop basic soccer skills using RL for the RoboCup Small Size League. A second objective of this research is to develop a simulation environment to facilitate the development of the DMM. A high-level simulator was developed and validated as a result. The temporal-difference value iteration algorithm with state-value functions was used for RL, along with a Multi-Layer Perceptron (MLP) as a function approximator. Two types of important soccer skills, namely shooting skills and passing skills, were developed using the RL and MLP combination. Nine experiments were conducted to develop and evaluate these skills in various playing situations. The results showed that the learning was very effective, as the learning agent executed the shooting and passing tasks satisfactorily, and further refinement is thus possible. In conclusion, RL combined with MLP was successfully applied in this research to develop two important basic soccer skills for robots in the RoboCup SSL. These form a solid foundation for the development of a complete DMM along with the simulation environment established in this research. / AFRIKAANSE OPSOMMING: Hierdie studie het ontstaan as deel van 'n navorsingsprojek by Stellenbosch Universiteit wat daarop gemik was om 'n span sokkerrobotte vir die RoboCup Small Size League (SSL) te ontwikkel. Die besluitnemingsmodule (BM) speel 'n belangrike rol in die RoboCup SSL, aangesien dit besluite vir die robotte in die span maak. Hierdie navorsing fokus op ontwikkeling van enkele komponente van die BM vir die span by SU. 'n Literatuurstudie het getoon dat die BM tipies ontwikkel word volgens 'n hiërargiese struktuur waarin basiese sokkervaardighede die fundamentele boublokke vorm en hoëvlak spangedrag word dan gerealiseer deur hierdie basiese vaardighede te gebruik. Die literatuur het ook getoon dat strategieë in die BM van die RoboCup SSL domein gewoonlik ontwikkel word deur 'n hand-gekodeerde benadering, dit wil sê, 'n baie spesifieke en vaste strategie word gekodeer, terwyl masjienleer (ML) en versterkingsleer (VL) wyd in ander ligas gebruik word. Dit het gelei tot die navorsingsdoelwit in hierdie tesis, naamlik om basiese sokkervaardighede vir robotte in die RoboCup SSL te ontwikkel. 'n Tweede doelwit was om 'n simulasie-omgewing te ontwikkel wat weer die ontwikkeling van die BM sou fasiliteer. Hierdie simulator is suksesvol ontwikkel en gevalideer. 
Die tydwaarde-verskil iterariewe algoritme met toestandwaarde-funksies is gebruik vir VL saam met 'n multi-laag perseptron (MLP) vir funksiebenaderings. Twee belangrike sokkervaardighede, naamlik doelskop- en aangeevaardighede is met hierdie kombinasie van VL en MLP ontwikkel. Nege eksperimente is uitgevoer om hierdie vaardighede in verskillende speelsituasies te ontwikkel en te evalueer. Volgens die resultate was die leerproses baie effektief, aangesien die leer-agent die doelskiet- en aangeetake bevredigend uitgevoer het, en verdere verfyning is dus moontlik. Die gevolgtrekking is dat VL gekombineer met MLP suksesvol toegepas is in hierdie navorsingswerk om twee belangrike, basiese sokkervaardighede vir robotte in die RoboCup SSL te ontwikkel. Dit vorm 'n sterk fondament vir die ontwikkeling van 'n volledige BM tesame met die simulasie-omgewing wat in hierdie werk daargestel is.
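
As an illustration of the technique this abstract names — temporal-difference learning of a state-value function with a multi-layer perceptron as the function approximator — a minimal Python sketch follows. It is not the thesis code: the environment interface (reset/step/sample_action), network size, and learning rates are assumptions made purely for illustration.

```python
import numpy as np


class MLPValueFunction:
    """One-hidden-layer MLP approximating the state-value function V(s)."""

    def __init__(self, n_inputs, n_hidden=32, lr=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def value(self, s):
        h = np.tanh(s @ self.W1 + self.b1)
        return float(h @ self.W2 + self.b2)

    def update(self, s, target):
        """One gradient step on 0.5 * (V(s) - target)^2."""
        h = np.tanh(s @ self.W1 + self.b1)
        v = float(h @ self.W2 + self.b2)
        err = v - target                              # dLoss/dV
        dh = (err * self.W2[:, 0]) * (1.0 - h ** 2)   # back-propagate through tanh
        self.W2 -= self.lr * np.outer(h, err)
        self.b2 -= self.lr * err
        self.W1 -= self.lr * np.outer(s, dh)
        self.b1 -= self.lr * dh


def td0_episode(env, vf, gamma=0.99):
    """Run one episode, moving V(s) towards r + gamma * V(s') after every step.

    `env` is a hypothetical simulator exposing reset() -> state,
    sample_action(state) -> action, and step(action) -> (next_state, reward, done).
    """
    s = np.asarray(env.reset(), dtype=float)
    done = False
    while not done:
        a = env.sample_action(s)                      # action selection left to the policy
        s_next, r, done = env.step(a)
        s_next = np.asarray(s_next, dtype=float)
        target = r if done else r + gamma * vf.value(s_next)
        vf.update(s, target)
        s = s_next
```

A greedy skill policy would then pick, among the actions the simulator allows, the one whose predicted successor state scores highest under `vf`.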
394

Co-aprendizado entre motoristas e controladores semafóricos em simulação microscópica de trânsito / Co-learning between drivers and traffic lights in microscopic traffic simulation

Lemos, Liza Lunardi January 2018 (has links)
Um melhor uso da infraestrutura da rede de transporte é um ponto fundamental para atenuar os efeitos dos congestionamentos no trânsito. Este trabalho utiliza aprendizado por reforço multiagente (MARL) para melhorar o uso da infraestrutura e, consequentemente, mitigar tais congestionamentos. A partir disso, diversos desafios surgem. Primeiro, a maioria da literatura assume que os motoristas aprendem (semáforos não possuem nenhum tipo de aprendizado) ou os semáforos aprendem (motoristas não alteram seus comportamentos). Em segundo lugar, independentemente do tipo de classe de agentes e do tipo de aprendizado, as ações são altamente acopladas, tornando a tarefa de aprendizado mais difícil. Terceiro, quando duas classes de agentes co-aprendem, as tarefas de aprendizado de cada agente são de natureza diferente (do ponto de vista do aprendizado por reforço multiagente). Finalmente, é utilizada uma modelagem microscópica, que modela os agentes com um alto nível de detalhes, o que não é trivial, pois cada agente tem seu próprio ritmo de aprendizado. Portanto, este trabalho não propõe somente a abordagem de co-aprendizado em agentes que atuam em ambiente compartilhado, mas também argumenta que essa tarefa precisa ser formulada de forma assíncrona. Além disso, os agentes motoristas podem atualizar os valores das ações disponíveis ao receber informações de outros motoristas. Os resultados mostram que a abordagem proposta, baseada no co-aprendizado, supera outras políticas em termos de tempo médio de viagem. Além disso, quando o co-aprendizado é utilizado, as filas de veículos parados nos semáforos são menores. / A better use of transport network infrastructure is a key point in mitigating the effects of traffic congestion. This work uses multiagent reinforcement learning (MARL) to improve the use of infrastructure and, consequently, to reduce such congestion. From this, several challenges arise. First, most literature assumes that drivers learn (traffic lights do not have any type of learning) or the traffic lights learn (drivers do not change their behaviors). Second, regardless of the type of agent class and the type of learning, the actions are highly coupled, making the learning task more difficult. Third, when two classes of agents co-learn, the learning tasks of each agent are of a different nature (from the point of view of multiagent reinforcement learning). Finally, a microscopic modeling is used, which models the agents with a high level of detail, which is not trivial, since each agent has its own learning pace. Therefore, this work not only proposes the co-learning approach for agents that act in a shared environment, but also argues that this task needs to be formulated asynchronously. In addition, driver agents can update the values of the available actions by receiving information from other drivers. The results show that the proposed approach, based on co-learning, outperforms other policies regarding average travel time. Also, when co-learning is used, queues of stopped vehicles at traffic lights are shorter.
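
To make the co-learning setup concrete, here is a hedged sketch of two classes of independent, tabular Q-learners (drivers and traffic lights) that update asynchronously, each at its own pace. The simulator wrapper (`sim`), its methods, and the reward definitions are hypothetical placeholders, not the dissertation's implementation.

```python
import random
from collections import defaultdict


class QAgent:
    """Tabular Q-learner; driver and traffic-light agents are both instances,
    differing only in action sets, observations, and rewards (assumption)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)            # (state, action) -> estimated value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


def co_learning_step(sim, drivers, lights):
    """One step of the shared simulation in which both agent classes may learn.

    `sim` is a hypothetical wrapper around a microscopic traffic simulator.
    An agent only updates when its own decision episode closes (a driver
    finishing a trip, a light finishing a signal plan), so learning is
    asynchronous: each agent learns at its own pace.
    """
    for agent in drivers + lights:
        agent_state = sim.observe(agent)
        sim.apply(agent, agent.act(agent_state))
    sim.advance()                              # move vehicles, switch phases, ...
    for agent in drivers + lights:
        if sim.episode_closed(agent):
            s, a, r, s_next = sim.last_transition(agent)
            agent.learn(s, a, r, s_next)
```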
395

“Vanilla, Vanilla... but what about Pistachio?” A Computational Cognitive Clinical Neuroscience Approach to the Exploration-Exploitation Dilemma

Cogliati Dezza, Irene 28 November 2018 (has links) (PDF)
On 24 November 1859, Charles Darwin published the first edition of The Origin of Species. One hundred fifty-nine years later, our understanding of human and animal adaptation to the surrounding environment remains a major scientific challenge. How do humans and animals generate apt decision strategies in order to achieve this adaptation? How does their brain efficiently carry out complex computations in order to produce such adaptive behaviors? Although an exhaustive answer to these questions continues to feel out of reach, the investigation of adaptive processing is relevant to understanding the mind/brain relationship and to elucidating scenarios where mind/brain interactions are corrupted, such as in psychiatric disorders. Additionally, understanding how the brain efficiently scales problems when producing complex and adaptive behaviors can inspire and contribute to resolving Artificial Intelligence (AI) problems (e.g. scaling problems, generalization, etc.) and, consequently, to the development of intelligent machines. During my PhD, I investigated adaptive behaviors at the behavioral, cognitive, and neural levels. I strongly believe that, as Marr already pointed out, in order to understand how our brain-machine works we need to investigate the phenomenon at three different levels: behavioral, algorithmic, and neural implementation. For this reason, throughout my doctoral work I took advantage of computational modeling methods together with cognitive neuroscience techniques in order to investigate the underlying mechanisms of adaptive behaviors. / Doctorat en Sciences psychologiques et de l'éducation / info:eu-repo/semantics/nonPublished
396

Safety-aware apprenticeship learning

Zhou, Weichao 03 July 2018 (has links)
It is well acknowledged in the AI community that finding a good reward function for reinforcement learning is extremely challenging. Apprenticeship learning (AL) is a class of “learning from demonstration” techniques where the reward function of a Markov Decision Process (MDP) is unknown to the learning agent and the agent uses inverse reinforcement learning (IRL) methods to recover the expert policy from a set of expert demonstrations. However, because the agent learns exclusively from observations, there is no verification of, or guarantee for, the learnt policy's satisfaction of a given constraint on the probability of the agent running into unwanted situations. In this dissertation, we study the problem of how to guide AL to learn a policy that is inherently safe while still meeting its learning objective. Combining formal methods with imitation learning, we propose a Counterexample-Guided Apprenticeship Learning algorithm. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure both safety and performance of the learnt policy: given a formal safety specification defined in probabilistic temporal logic, the learnt policy is guaranteed to satisfy it. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.
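
The counterexample-guided loop described above can be sketched as follows. The helpers `apprenticeship_learning` and `model_check` are hypothetical placeholders standing in for an IRL-based learner and a probabilistic model checker; only the control flow — learn, model-check against a PCTL property, feed counterexamples back as constraints — follows the abstract.

```python
def safe_apprenticeship_learning(mdp, expert_demos, pctl_spec, max_iters=50):
    """Counterexample-guided apprenticeship learning (control-flow sketch).

    `apprenticeship_learning` (an IRL-based learner over linear reward features)
    and `model_check` (a probabilistic model checker for PCTL properties) are
    hypothetical placeholders supplied by the surrounding system.
    """
    constraints = []                           # counterexamples gathered so far
    for _ in range(max_iters):
        # Learn reward weights and a policy from demonstrations, subject to the
        # constraints induced by previously found counterexamples.
        policy = apprenticeship_learning(mdp, expert_demos, constraints)

        # Check the Markov chain induced by the policy against the PCTL safety
        # property, e.g. "the probability of reaching an unsafe state is <= 0.05".
        is_safe, counterexample = model_check(mdp, policy, pctl_spec)
        if is_safe:
            return policy                      # safe and expert-matching policy
        constraints.append(counterexample)     # steer the next iteration away
    raise RuntimeError("no policy satisfying the safety specification was found")
```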
397

Do contingency-conflicting elements drop out of equivalence classes? Re-testing Sidman's (2000) theory

Silguero, Russell V. 12 1900 (has links)
Sidman's (2000) theory of stimulus equivalence states that all positive elements in a reinforcement contingency enter an equivalence class. The theory also states that if an element from an equivalence class conflicts with a programmed reinforcement contingency, the conflicting element will drop out of the equivalence class. Minster et al. (2006) found evidence suggesting that a conflicting element does not drop out of an equivalence class. In an effort to explain maintained accuracy on programmed reinforcement contingencies, the authors seem to suggest that participants will behave in accordance with a particular partitioning of the equivalence class which continues to include the conflicting element. This hypothesis seems to explain their data well, but their particular procedures are not a good test of the notion of "dropping out" due to the pre-establishment of equivalence classes before the conflicting member entered the class. The current experiment first developed unpartitioned equivalence classes and only later exposed participants to reinforcement contingencies that conflicted with pre-established equivalence classes. The results are consistent with the notion that a partition developed such that the conflicting element had dropped out of certain subclasses of the original equivalence class. The notion of a partitioning of an equivalence class seems to provide a fuller description of the phenomenon Sidman (1994, 2000) described as "dropping out" of an equivalence class.
398

User experience driven CPU frequency scaling on mobile devices : towards better energy efficiency

Seeker, Volker Günter January 2017 (has links)
With the development of modern smartphones, mobile devices have become ubiquitous in our daily lives. With high processing capabilities and a vast number of applications, users now need them for both business and personal tasks. Unfortunately, battery technology did not scale with the same speed as computational power. Hence, modern smartphone batteries often last for less than a day before they need to be recharged. One of the most power-hungry components is the central processing unit (CPU). Multiple techniques are applied to reduce CPU energy consumption. Among them is dynamic voltage and frequency scaling (DVFS). This technique reduces energy consumption by dynamically changing the CPU supply voltage depending on the currently running workload. Reducing voltage, however, also makes it necessary to reduce the clock frequency, which can have a significant impact on task performance. Current DVFS algorithms deliver a good user experience; however, as experiments conducted later in this thesis will show, they do not deliver optimal energy efficiency for an interactive mobile workload. This thesis presents methods and tools to determine where energy can be saved during mobile workload execution when using DVFS. Furthermore, an improved DVFS technique is developed that achieves a higher energy efficiency than the current standard. One important question when developing a DVFS technique is: How much can you slow down a task to save energy before the negative effect on performance becomes intolerable? The ultimate goal when optimising a mobile system is to provide a high quality of experience (QOE) to the end user. In that context, task slowdowns become intolerable when they have a perceptible effect on QOE. Experiments conducted in this thesis answer this question by identifying workload periods in which performance changes are directly perceptible by the end user and periods where they are imperceptible, namely interaction lags and interaction idle periods. Interaction lags are the time it takes the system to process a user interaction and display a corresponding response. Idle periods are the periods between interactions where the user perceives the system as idle and ready for the next input. By knowing where those periods are and how they are affected by frequency changes, a more energy-efficient DVFS governor can be developed. This thesis begins by introducing a methodology that measures the duration of interaction lags as perceived by the user. It uses them as an indicator to benchmark the quality of experience for a workload execution. A representative benchmark workload is generated comprising 190 minutes of interactions collected from real users. In conjunction with this QOE benchmark, a DVFS Oracle study is conducted. It is able to find a frequency profile for an interactive mobile workload which has the maximum energy savings achievable without a perceptible performance impact on the user. The developed Oracle performance profile achieves a QOE which is indistinguishable from always running on the fastest frequency while needing 45% less energy. Furthermore, this Oracle is used as a baseline to evaluate how well current mobile frequency governors are performing. It shows that none of these governors perform particularly well and that up to 32% energy savings are possible. Equipped with a benchmark and an optimisation baseline, a user-perception-aware DVFS technique is developed in the second part of this thesis. 
Initially, a runtime heuristic is introduced that detects interaction lags as the user would perceive them. Using this heuristic, a reinforcement-learning-driven governor is developed that learns good frequency settings for interaction-lag and idle periods from sample observations. It consumes up to 22% less energy than current standard governors on mobile devices while maintaining a low impact on QOE.
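
A minimal sketch of such a learning governor — one that learns a frequency choice separately for interaction-lag and idle periods from observed energy and lag — might look like the Python below. The frequency table, perceptibility threshold, reward weighting, and bandit-style update are assumptions for illustration, not the thesis implementation.

```python
import random
from collections import defaultdict

FREQUENCIES_MHZ = [300, 600, 900, 1200, 1500, 1800]   # assumed frequency levels
PERCEPTIBLE_LAG_MS = 100                               # assumed perceptibility threshold


class LearningGovernor:
    """Q-learning governor with separate estimates for 'lag' and 'idle' periods."""

    def __init__(self, alpha=0.2, epsilon=0.1):
        self.q = defaultdict(float)                    # (period_type, freq) -> value
        self.alpha, self.epsilon = alpha, epsilon

    def choose_frequency(self, period_type):
        if random.random() < self.epsilon:             # occasional exploration
            return random.choice(FREQUENCIES_MHZ)
        return max(FREQUENCIES_MHZ, key=lambda f: self.q[(period_type, f)])

    def update(self, period_type, freq, energy_mj, lag_ms):
        # Reward trades energy against user-perceptible slowdown: lag is only
        # penalised during interaction-lag periods and only above the threshold.
        penalty = 0.0
        if period_type == "lag" and lag_ms > PERCEPTIBLE_LAG_MS:
            penalty = 10.0 * (lag_ms - PERCEPTIBLE_LAG_MS)
        reward = -energy_mj - penalty
        key = (period_type, freq)
        self.q[key] += self.alpha * (reward - self.q[key])   # bandit-style update
```

Each detected period is treated here as a one-shot decision, which is why the update has no next-state bootstrap; a full RL formulation over sequences of periods is equally possible.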
399

Desenvolvimento de um framework para utilização do GR-Learning em problemas de otimização combinatória / Development of a framework for applying GR-Learning to combinatorial optimization problems

Silva, Alexsandro Trindade Sales da 20 July 2016 (has links)
The use of metaheuristics for solving combinatorial optimization problems belonging to the NP-hard class is becoming increasingly common, and according to Temponi (2007 apud RIBEIRO, 1996) a metaheuristic should be modeled according to the problem it was designed to solve. This most often requires many changes when the same metaheuristic has to be applied to various types of combinatorial optimization problems. In this work we propose a framework for using the hybrid metaheuristic proposed by Almeida (2014), which combines Reactive GRASP with a reinforcement learning technique (called GR-Learning). Specifically, the Q-learning algorithm was used to learn, over the iterations, which value of the parameter α (alpha) to use during the construction phase of GRASP. GR-Learning was applied to the p-centers problem in the context of public security in the city of Mossoró/RN. To validate the effectiveness of the proposed framework, it was used to solve two classical combinatorial optimization problems: the Hub Location Problem (HLP) and the Cutting Stock Problem (CSP). To validate the results obtained, we used instances with results known in the literature and, in addition, created an instance with data from the Brazilian airline industry. The results showed that the proposed framework was quite competitive when compared to the results of various algorithms known in the literature, as it obtained the optimal value in almost all HLP instances as well as new values (better than those obtained with other algorithms known in the literature) for some CSP instances. / A utilização de metaheurísticas para resolução de problemas de otimização combinatória pertencentes à classe NP-Difícil vem se tornando cada vez mais comum, e segundo Temponi (2007 apud RIBEIRO, 1996) uma metaheurística deve ser modelada de acordo com o problema que ela foi projetada para resolver. Isto na maioria das vezes requer muitas alterações quando se tem que aplicar uma mesma metaheurística a diversos tipos de problemas de otimização combinatória. Neste trabalho foi proposto um framework para utilização de uma metaheurística híbrida proposta por Almeida (2014) que utilizou a metaheurística GRASP Reativo juntamente com uma técnica de aprendizagem por reforço (denominada GR-Learning). Especificamente, o algoritmo Q-learning foi utilizado para aprender, com o passar das iterações, qual valor para o parâmetro α (alfa) utilizar durante a fase de construção da GRASP. O GR-Learning foi utilizado para resolver o problema dos p-Centros aplicado a Segurança Pública na Cidade de Mossoró/RN. Para validar a eficácia do framework proposto, o mesmo foi utilizado para resolver dois problemas clássicos de otimização combinatória: o Problema de Localização de Hubs (do inglês Hub Location Problem - HLP) e o Problema de Corte e Estoque – PCE (do inglês Cutting Stock Problem - CSP). Para validação dos resultados obtidos foram utilizadas instâncias com resultados já conhecidos na literatura e adicionalmente foi criada uma instância com dados do setor aeroviário Brasileiro. Os resultados obtidos mostraram que o framework proposto foi bastante competitivo quando comparado a outros resultados de diversos algoritmos já conhecidos na literatura, pois obteve o valor ótimo em quase todas as instâncias do HLP como também novos valores (melhores que os obtidos com outros algoritmos já conhecidos na literatura) para algumas instâncias do CSP
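
The role of Q-learning described in the abstract — learning, as iterations pass, which value of the parameter α to use in the GRASP construction phase — can be sketched as a simple value-learning loop over a small set of candidate α values. The candidate values, reward definition, and the caller-supplied construction/local-search/cost routines are assumptions for illustration only.

```python
import random

ALPHAS = [0.1, 0.3, 0.5, 0.7, 0.9]      # assumed candidate values for the RCL parameter


def gr_learning(construct, local_search, cost, iterations=1000, lr=0.1, epsilon=0.15):
    """Reactive-GRASP-style loop in which Q-learning picks alpha each iteration.

    `construct(alpha)` builds a greedy-randomised solution, `local_search(sol)`
    improves it, and `cost(sol)` evaluates it; all three are supplied by the
    caller for the concrete problem (p-centers, HLP, CSP, ...).  Minimisation
    is assumed.
    """
    q = {a: 0.0 for a in ALPHAS}        # learned quality estimate of each alpha
    best_sol, best_cost = None, float("inf")

    for _ in range(iterations):
        # Epsilon-greedy choice of alpha based on what has worked so far.
        if random.random() < epsilon:
            alpha = random.choice(ALPHAS)
        else:
            alpha = max(ALPHAS, key=q.get)

        sol = local_search(construct(alpha))
        c = cost(sol)
        if c < best_cost:
            best_sol, best_cost = sol, c

        # Reward alphas that produce solutions close to the incumbent, then move
        # the estimate of the chosen alpha towards that reward.
        reward = best_cost / c if c > 0 else 1.0
        q[alpha] += lr * (reward - q[alpha])

    return best_sol, best_cost
```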
400

Aprendizado em sistemas multiagente através de coordenação oportunista. / Towards joint learning in multiagent systems through opportunistic coordination

Oliveira, Denise de January 2009 (has links)
O tamanho da representação de ações e estados conjuntos é um fator chave que limita o uso de algoritmos de aprendizado por reforço multiagente em problemas complexos. Este trabalho propõe o opportunistic Coordination Learning (OPPORTUNE), um método de aprendizado por reforço multiagente para lidar com grandes cenários. Visto que uma solução centralizada não é praticável em grandes espaços de estado-ação, um modo de reduzir a complexidade do problema é decompô-lo em subproblemas utilizando cooperação entre agentes independentes em algumas partes do ambiente. No método proposto, agentes independentes utilizam comunicação e um mecanismo de cooperação que permite que expandam suas percepções sobre o ambiente e executem ações cooperativas apenas quando isso é melhor que agir de modo individual. O OPPORTUNE foi testado e comparado em dois cenários: jogo de perseguição e controle de tráfego urbano. / The size of the representation of joint states and actions is a key factor that limits the use of standard multiagent reinforcement learning algorithms in complex problems. This work proposes opportunistic Coordination Learning (OPPORTUNE), a multiagent reinforcement learning method to cope with large scenarios. Because a centralized solution becomes impractical in large state-action spaces, one way of reducing the complexity is to decompose the problem into sub-problems using cooperation between independent agents in some parts of the environment. In the proposed method, independent agents use communication and a cooperation mechanism that allows them to extend their perception of the environment and to perform cooperative actions only when this is better than acting individually. OPPORTUNE was tested and compared in two scenarios: a pursuit game and urban traffic control.
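
A rough Python sketch of the opportunistic idea — independent Q-learners that only act jointly when the learned value of a communicated joint action beats acting alone — is given below. The joint-action representation and the coordination rule are simplifying assumptions; the OPPORTUNE method itself is more elaborate.

```python
import random
from collections import defaultdict


class OpportunisticAgent:
    """Independent Q-learner that coordinates with a neighbour only opportunistically."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)          # individual (state, action) values
        self.joint_q = defaultdict(float)    # (state, neighbour_state, joint_action) values
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state, neighbour_state, joint_actions):
        """Return (own action, coordinated?); coordinate only when the best known
        joint action looks better than the best individual one."""
        if random.random() < self.epsilon:
            return random.choice(self.actions), False
        solo = max(self.actions, key=lambda a: self.q[(state, a)])
        joint = max(joint_actions,
                    key=lambda ja: self.joint_q[(state, neighbour_state, ja)])
        if self.joint_q[(state, neighbour_state, joint)] > self.q[(state, solo)]:
            return joint[0], True            # joint action = (own part, neighbour's part)
        return solo, False

    def learn(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def learn_joint(self, state, neighbour_state, joint_action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        key = (state, neighbour_state, joint_action)
        self.joint_q[key] += self.alpha * (target - self.joint_q[key])
```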
