11

Reinforcement Programming: A New Technique in Automatic Algorithm Development

White, Spencer Kesson 03 July 2006 (has links) (PDF)
Reinforcement programming is a new technique for using computers to automatically create algorithms. Using the principles of reinforcement learning and Q-learning, reinforcement programming learns programs from example inputs and outputs. State representations, actions, a transition function, and rewards are defined, and the system is trained until it converges on a policy that can be implemented directly as a computer program. The efficiency of reinforcement programming is demonstrated by comparing a generalized in-place iterative sort learned through genetic programming with a sorting algorithm of the same type created using reinforcement programming. The sort learned by reinforcement programming is a novel algorithm. In the cases attempted, reinforcement programming is more efficient and yields a more effective solution than genetic programming. As additional examples, reinforcement programming is used to learn three binary addition problems.
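As an illustration of the training loop such a system relies on, here is a minimal tabular Q-learning sketch: states, actions, a transition function, and a reward are supplied, and the converged greedy policy can be read off as a deterministic program. All names here (transition, reward, terminal) are illustrative assumptions, not the thesis's actual definitions.

```python
import random
from collections import defaultdict

def q_learn(states, actions, transition, reward, terminal,
            episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)
    for _ in range(episodes):
        s = random.choice(states)
        while not terminal(s):
            # epsilon-greedy exploration over the provided action set
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s2 = transition(s, a)
            # standard Q-learning update (Watkins, 1989)
            best_next = max(Q[(s2, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (reward(s, a, s2) + gamma * best_next - Q[(s, a)])
            s = s2
    # the converged greedy policy is deterministic and can be emitted as a program
    return {s: max(actions, key=lambda act: Q[(s, act)]) for s in states}
```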
12

Predicting Mutational Pathways of Influenza A H1N1 Virus using Q-learning

Aarathi Raghuraman, FNU 13 August 2021 (has links)
Influenza is a seasonal viral disease affecting over 1 billion people annually around the globe, as reported by the World Health Organization (WHO). The influenza virus has been around for decades, causing multiple pandemics and prompting researchers to perform extensive analysis of its evolutionary patterns. Current research uses phylogenetic trees as the basis to guide population genetics and other phenotypic characteristics when describing the evolution of the influenza genome. Phylogenetic trees are one way of representing the evolutionary trends of sequenced genomes, but they do not capture the multidimensional complexity of mutational pathways. We suggest representing antigenic drift within the influenza A/H1N1 hemagglutinin (HA) protein as a graph, $G = (V, E)$, where $V$ is the set of vertices representing each possible sequence and $E$ is the set of edges representing single amino acid substitutions. Each transition is characterized by a Malthusian fitness model incorporating genetic adaptation, vaccine similarity, and historical epidemiological response, using mortality as the metric where available. Applying reinforcement learning with the vertices as states, edges as actions, and fitness as the reward, we learn the high-likelihood mutational pathways and an optimal policy without exploring the entire space of the graph $G$. Our average predicted-versus-actual sequence distance of $3.6 \pm 1.2$ amino acids indicates that our novel approach of using naive Q-learning can assist with influenza strain prediction, thus improving vaccine selection for future disease seasons. / Master of Science / Influenza is a seasonal virus affecting over 1 billion people annually around the globe, as reported by the World Health Organization (WHO). The effectiveness of influenza vaccines varies tremendously by type (A, B, C, or D) and season. Of note is the 2009 pandemic, where the influenza A H1N1 virus mutants were significantly different from the chosen vaccine composition. Understanding and predicting the underlying genetic and environmental behavior of influenza virus mutants is essential for determining the vaccine composition for future seasons and preventing another pandemic. Given the recent COVID-19 pandemic of 2020, caused by another virus that affects the upper respiratory system, novel approaches to predicting viral evolution need to be investigated now more than ever. Thus, in this thesis, I develop a novel approach to predicting a portion of the influenza A H1N1 viruses using machine learning.
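A hedged sketch of how Q-learning can be run directly on such a sequence graph: vertices are states, single-substitution neighbors are the available actions, and the reward is a fitness score. The fitness() callable stands in for the thesis's Malthusian model (genetic adaptation, vaccine similarity, mortality); it and the other names are assumptions for illustration only.

```python
import random
from collections import defaultdict

def learn_mutational_pathway(start, neighbors, fitness,
                             episodes=2000, horizon=20,
                             alpha=0.2, gamma=0.9, epsilon=0.2):
    """neighbors(seq) -> iterable of sequences one substitution away."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = start
        for _ in range(horizon):
            moves = list(neighbors(s))
            if not moves:
                break
            # epsilon-greedy choice of the next sequence (edge = action)
            if random.random() < epsilon:
                s2 = random.choice(moves)
            else:
                s2 = max(moves, key=lambda m: Q[(s, m)])
            best = max((Q[(s2, m)] for m in neighbors(s2)), default=0.0)
            Q[(s, s2)] += alpha * (fitness(s2) + gamma * best - Q[(s, s2)])
            s = s2
    return Q  # greedy walks over Q trace high-likelihood mutational pathways
```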
13

Machine Learning Simulation: Torso Dynamics of Robotic Biped

Renner, Michael Robert 22 August 2007 (has links)
Military, medical, exploratory, and commercial robots have much to gain from exchanging wheels for legs. However, the equations of motion of dynamic bipedal walker models are highly coupled and non-linear, making the selection of an appropriate control scheme difficult. A temporal-difference reinforcement learning method known as Q-learning develops complex control policies through environmental exploration and exploitation. As a proof of concept, Q-learning was applied in simulation to a benchmark single-pendulum swing-up/balance task; the value function was first approximated with a look-up table and then with an artificial neural network. We then applied Evolutionary Function Approximation for Reinforcement Learning to control the swing leg and torso of a 3-degree-of-freedom active dynamic bipedal walker in simulation. The model began each episode in a stationary vertical configuration. At each time step the learning agent was rewarded for horizontal hip displacement scaled by torso altitude (promoting faster walking while maintaining an upright posture), and one of six coupled torque activations was applied through two first-order filters. Over the course of 23 generations, an approximation of the value function was evolved that enabled walking at an average speed of 0.36 m/s. The agent oscillated the torso forward and then backward at each step, driving the walker forward for forty-two steps in thirty seconds without falling over. This work represents a foundation for improvements in anthropomorphic bipedal robots, exoskeleton mechanisms to assist walking, and smart prosthetics. / Master of Science
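For the benchmark task mentioned above, a look-up-table Q-learner fits in a few lines. This sketch discretizes angle and angular velocity and uses bang-bang torques; the physical constants, discretization, and reward are assumptions for illustration, not the thesis's settings.

```python
import math
import random
from collections import defaultdict

GRAV, LEN, DT = 9.81, 1.0, 0.05      # gravity, pendulum length, time step (assumed)
TORQUES = [-2.0, 0.0, 2.0]           # bang-bang torque actions

def step(theta, omega, torque):
    # theta = 0 is upright; gravity destabilizes the pendulum away from upright
    omega += (GRAV / LEN * math.sin(theta) + torque) * DT
    omega = max(-8.0, min(8.0, omega))
    theta = (theta + omega * DT + math.pi) % (2 * math.pi) - math.pi
    return theta, omega

def bucket(theta, omega):
    # coarse discretization so the value function fits in a look-up table
    return round(theta, 1), round(omega, 1)

Q = defaultdict(float)
for _ in range(3000):                 # training episodes
    theta, omega = math.pi, 0.0       # start hanging straight down
    for _ in range(200):
        s = bucket(theta, omega)
        a = (random.choice(TORQUES) if random.random() < 0.1
             else max(TORQUES, key=lambda u: Q[(s, u)]))
        theta, omega = step(theta, omega, a)
        s2 = bucket(theta, omega)
        r = -theta ** 2               # reward peaks when balanced upright
        best = max(Q[(s2, u)] for u in TORQUES)
        Q[(s, a)] += 0.1 * (r + 0.99 * best - Q[(s, a)])
```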
14

Uma contribuição à solução do problema dos k-servos usando aprendizagem por reforço / A Contribution to the Solution of the k-Server Problem Using Reinforcement Learning

Lima Júnior, Manoel Leandro de 06 April 2005 (has links)
This work proposes a new online algorithm for the k-Server Problem (PKS). Its performance is compared with that of other algorithms from the literature, namely the Harmonic and Work Function algorithms, which have been shown to be competitive and therefore serve as meaningful baselines; an algorithm that performs well against them tends to be competitive as well, although proving this formally is beyond the scope of this work. The proposed algorithm is based on reinforcement learning techniques: the problem is modeled as a multi-stage decision process, to which the Q-learning algorithm, one of the most popular methods for deriving optimal policies in this kind of decision problem, is applied. However, the storage structure that reinforcement learning uses to obtain the optimal policy grows with the number of states and actions, which in turn is proportional to the number n of nodes and the number k of servers. When this growth is analyzed mathematically it turns out to be exponential, restricting the method to smaller problems with few nodes and servers. This issue, known as the curse of dimensionality and introduced by Bellman, makes the algorithm infeasible for certain problem instances because the computational resources needed to produce its output are exhausted. So that the proposed solution, based exclusively on reinforcement learning, is not restricted to small applications, an alternative, hierarchical solution is proposed for more realistic problems involving larger numbers of nodes and servers. It combines two methods for solving the PKS: reinforcement learning, applied to a reduced set of nodes obtained through an aggregation process, and a greedy method, applied to the node subsets produced by that aggregation, where servers are scheduled according to the shortest distance to the point of demand.
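To make the curse of dimensionality concrete (the thesis's exact growth expression is garbled in this copy): assuming a state pairs a configuration of k servers on n nodes with a request node, the Q-table needs on the order of C(n, k) * n states, each with k actions, which explodes quickly:

```python
from math import comb

# Q-table size under the assumed state encoding (configuration, request node)
for n, k in [(10, 2), (50, 5), (100, 10)]:
    states = comb(n, k) * n      # server configurations x possible request nodes
    print(f"n={n:3d} k={k:2d}  Q-table entries = {states * k:,}")
```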
15

SistEX - Um Sistema Dinâmico para Detectar a Experiência do Aluno / SistEX - A Dynamic System to Detect the Student Experience

Possobom, Camila Cerezer 15 April 2014 (has links)
The widespread use of virtual learning environments (VLEs) offers great potential for developing applications that meet needs in education. In u-learning environments, the goal is to gather information about users' needs and preferences, build their context, and adapt content to each user's profile; in most traditional VLEs, such as Moodle, this process is generally not considered. Given the importance of a dynamic application that can continually adapt to students' levels of knowledge, this dissertation proposes a module called SistEX (A Dynamic System for Detecting Student Experience). The adaptations in the environment use adaptive hypermedia; the information collected is the user's level of knowledge, obtained through questionnaires. In addition, the Q-learning algorithm, drawn from intelligent tutoring systems (ITS), was adapted to support the user's learning process. Software and user testing showed that the SistEX environment worked satisfactorily, based on assessments by the users who tested the module and its operation. The System Usability Scale (SUS) questionnaire applied to the module yielded a score within a range considered good, meeting the objectives proposed in this work, even though some limitations and difficulties were identified during development.
16

Uma implementação paralela híbrida para o problema do caixeiro viajante usando algoritmos genéticos, GRASP e aprendizagem por reforço / A Hybrid Parallel Implementation for the Traveling Salesman Problem Using Genetic Algorithms, GRASP, and Reinforcement Learning

Santos, João Paulo Queiroz dos 06 March 2009 (has links)
Metaheuristics are well-known techniques for solving optimization problems classified as NP-complete and have been successful in obtaining good-quality approximate solutions. They use non-deterministic approaches to generate solutions close to the optimum, without guaranteeing that the global optimum is found. Motivated by the difficulty of these problems, this work develops parallel hybrid methods combining reinforcement learning with the metaheuristics GRASP and genetic algorithms, aiming to obtain better solutions more efficiently. Rather than using the Q-learning algorithm only as a technique for generating the initial solutions of the metaheuristics, it is also applied cooperatively and competitively with the genetic algorithm and GRASP in a parallel implementation. The implementations produced satisfactory results, both for cooperation and competition among the Q-learning, GRASP, and genetic algorithms themselves and for cooperation and competition among groups of these three algorithms. In some instances the global optimum was found; in others, the implementations came close to it. A performance analysis of the proposed approach showed good behavior on the criteria that establish the efficiency and speedup (the gain in speed from parallel processing) of the implementations.
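A hedged sketch of the kind of coupling described above: a Q-table over city-to-city transitions is trained on negative edge length and then sampled to seed the metaheuristics (a GRASP construction phase or a GA's initial population). The interface below is illustrative, not the thesis's implementation.

```python
import random
from collections import defaultdict

def q_tours(dist, n, episodes=500, alpha=0.3, gamma=0.9, epsilon=0.2):
    """dist: n x n distance matrix. Learns Q over (city, next_city) moves."""
    Q = defaultdict(float)
    for _ in range(episodes):
        start = random.randrange(n)
        tour, visited = [start], {start}
        while len(tour) < n:
            s = tour[-1]
            choices = [c for c in range(n) if c not in visited]
            # epsilon-greedy selection of the next city to visit
            if random.random() < epsilon:
                a = random.choice(choices)
            else:
                a = max(choices, key=lambda c: Q[(s, c)])
            remaining = [c for c in choices if c != a]
            best = max((Q[(a, c)] for c in remaining), default=0.0)
            # shorter edges earn higher (less negative) rewards
            Q[(s, a)] += alpha * (-dist[s][a] + gamma * best - Q[(s, a)])
            tour.append(a)
            visited.add(a)
    return Q  # greedy rollouts of Q yield initial solutions for GRASP or a GA
```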
17

AI-driven admission control : with Deep Reinforcement Learning / AI-driven antagningskontroll : med Djup Förstärkningslärande

Ai, Lingling January 2021 (has links)
5G is expected to provide a high-performance, highly efficient network for prominent industry verticals, with ubiquitous access to a wide range of services and orders-of-magnitude improvement over 4G. Network slicing, which allocates network resources according to users' specific requirements, is a key feature for meeting the diversity of requirements in a 5G network, but it also demands more orchestration and complicates monitoring and admission control. Although the problem of admission control has been studied extensively, that research takes measurements for granted: a fixed high monitoring frequency can waste system resources, while a low monitoring frequency (a low level of observability) can leave too little information for good admission control decisions. To achieve efficient admission control in 5G, we investigate the impact of configurable observability, i.e. controlling the observed information by configuring the measurement frequency. In general, more measurements provide more information about the monitored system, enabling a capable decision-maker to make better decisions, but they also incur more monitoring overhead. The objective is therefore to minimize monitoring overhead while retaining enough information to make proper admission control decisions, dynamically deciding which measurements to monitor and at what frequencies. This thesis proposes a Deep Reinforcement Learning (DRL) method for efficient admission control in a simulated 5G end-to-end network comprising a core network, a radio access network, and four dynamic UEs. The proposed method is evaluated against baseline methods on several performance metrics, and the results are discussed. In the experiments the method learns from interaction with the simulated environment, performs well at admission control, and uses low measurement frequencies. After 11000 steps of learning, the proposed DRL agents generally outperform the threshold-based baseline agent, which takes admission decisions based on combined threshold conditions on RTT and throughput. Furthermore, the DRL agents that take non-zero measurement costs into account use much lower measurement frequencies than DRL agents that treat measurement costs as zero.
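One way to read the cost-aware behavior reported above is through the reward shaping: the agent chooses both an admission decision and a measurement frequency, and pays a cost per measurement so that useless monitoring is penalized. The sketch below is an assumption about the shaping, with illustrative weights, not the thesis's actual reward function.

```python
def reward(admitted_ok, violations, meas_freq,
           w_qos=1.0, w_sla=5.0, w_meas=0.1):
    """Hypothetical per-step reward for cost-aware admission control."""
    qos_term = w_qos * admitted_ok       # successfully served admitted requests
    sla_term = -w_sla * violations       # RTT/throughput SLA breaches
    meas_term = -w_meas * meas_freq      # per-measurement monitoring overhead
    return qos_term + sla_term + meas_term

# An agent trained with w_meas > 0 learns to lower meas_freq once extra samples
# stop improving its admission decisions, matching the reported behaviour of
# the cost-aware DRL agents; with w_meas = 0 there is no pressure to do so.
```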
18

Distributed Optimisation in Multi-Agent Systems Through Deep Reinforcement Learning

Eriksson, Andreas, Hansson, Jonas January 2019 (has links)
The increased availability of computing power has made reinforcement learning a popular field of science in recent years. Recently, reinforcement learning has been used in applications such as decreasing energy consumption in data centers, diagnosing patients in medical care, and text-to-speech software. This project investigates how well two reinforcement learning algorithms, Q-learning and deep Q-learning, can serve as a high-level planner for controlling robots inside a warehouse. A virtual warehouse was created, and the two algorithms were tested. The reliability of both algorithms was found to be insufficient for real-world applications, but the deep Q-learning algorithm showed great potential, and further research is encouraged.
19

Machine Learning for Traffic Control of Unmanned Mining Machines : Using the Q-learning and SARSA algorithms / Maskininlärning för Trafikkontroll av Obemannade Gruvmaskiner : Med användning av algoritmerna Q-learning och SARSA

Gustafsson, Robin, Fröjdendahl, Lucas January 2019 (has links)
Manual configuration of rules for unmanned mining machine traffic control can be time-consuming and therefore expensive. This paper presents a machine learning approach to automatic configuration of traffic-control rules in mines with autonomous mining machines, using Q-learning and SARSA. The results show that automation might cut the time needed to configure traffic rules from 1-2 weeks to at most approximately 6 hours, which would decrease the cost of deployment. Tests show that, in the worst case, the developed solution is able to run continuously for 24 hours 82% of the time, compared with the 100% accuracy of the manual configuration. The conclusion is that machine learning can plausibly be used for automatic configuration of traffic rules; further work on raising the accuracy to 100% is needed before it can replace manual configuration. It remains to be examined whether this conclusion holds in more complex environments with larger layouts and more machines.
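The difference between the two algorithms compared here comes down to one line of the update: Q-learning bootstraps off the greedy next action (off-policy), while SARSA uses the action actually taken next (on-policy). A minimal sketch, with Q as a dict keyed by (state, action):

```python
def q_learning_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    # off-policy: target uses the best available next action
    target = r + gamma * max(Q.get((s2, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    # on-policy: target uses a2, the action the agent actually takes next
    target = r + gamma * Q.get((s2, a2), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
```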
20

Distributed Deep Reinforcement Learning for a Multi-Robot Warehouse System

Stenberg, Holger, Wahréus, Johan January 2021 (has links)
This project concerns optimizing the behavior of multiple dispatching robots in a virtual warehouse environment. Q-learning and deep Q-learning algorithms, two established methods in reinforcement learning, were used for this purpose. Simulations were run during the project, implementing and comparing different algorithms in environments with up to four robots. The efficiency of a given algorithm was assessed primarily by the number of packages it enabled the robots to deliver and how fast the solution converged. The simulation results revealed that a Q-learning algorithm could efficiently solve problems in environments with up to two active robots. To solve more complex problems in environments with more than two robots, deep Q-learning had to be implemented to avoid prolonged computations and excessive memory usage. / Bachelor's thesis in electrical engineering 2021, KTH, Stockholm
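A rough illustration of why the tabular method stops scaling past two robots and a deep network takes over: the joint state space grows exponentially with robot count, and a deep Q-network replaces the table with a function approximator trained on bootstrapped targets. The grid size, network shape, and batch dimensions below are assumptions for illustration, not the project's actual setup.

```python
# With G grid cells per robot and R robots, a joint Q-table needs ~G**R rows:
G = 100                              # grid cells per robot (assumed)
for R in (1, 2, 3, 4):
    print(f"{R} robot(s): ~{G ** R:,} joint states")

# Minimal DQN-style target computation with PyTorch (a sketch without a
# separate target network, not the authors' architecture):
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 5))
s, s2 = torch.randn(32, 8), torch.randn(32, 8)   # batches of joint states
a = torch.randint(0, 5, (32,))                   # actions taken
r, gamma = torch.randn(32), 0.99                 # rewards, discount
q_sa = net(s).gather(1, a.unsqueeze(1)).squeeze(1)
target = r + gamma * net(s2).max(dim=1).values.detach()
loss = nn.functional.mse_loss(q_sa, target)      # TD error as regression loss
loss.backward()
```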
