About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Využití opakovaně posilovaného učení pro řízení čtyřnohého robotu / Using Reinforcement Learning for Four-Legged Robot Control

Ondroušek, Vít January 2011
The Ph.D. thesis focuses on using reinforcement learning for four-legged robot control. The main aim is to create an adaptive control system for a walking robot that can plan its gait with the Q-learning algorithm. This is achieved with a three-layered architecture based on the DEDS paradigm. A small set of elementary reactive behaviors forms the basis of the proposed solution, and a set of composite control laws is designed by activating these behaviors simultaneously. Both types of controllers can operate on flat as well as rugged terrain. A model of all behaviors reachable through activations of these controllers is built using an appropriate discretization of the continuous state space; the Q-learning algorithm uses this model to find optimal robot-control strategies. The capabilities of the control unit are demonstrated on three complex tasks: rotating the robot, walking in a straight line, and walking on an inclined plane. These tasks are solved in spatial dynamic simulations of a four-legged robot with three degrees of freedom per leg. The resulting gaits are evaluated with standardized quantitative indicators. Video files showing the elementary and composite controllers in action, as well as the resulting gaits, are an integral part of the thesis.
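For readers unfamiliar with the method, a minimal tabular Q-learning sketch of the kind this abstract builds on is shown below; the environment interface (reset, step, actions) and all hyperparameters are illustrative assumptions, not the thesis's actual discretization of behaviors.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning; env is any object exposing reset() -> state,
    step(action) -> (next_state, reward, done), and actions(state) -> list."""
    Q = defaultdict(float)                     # Q[(state, action)] -> value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy choice among the available controllers
            if random.random() < epsilon:
                action = random.choice(env.actions(state))
            else:
                action = max(env.actions(state), key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # one-step backup toward the greedy value of the next state
            best_next = max(Q[(next_state, a)] for a in env.actions(next_state))
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```

Here a "state" would be a discretized robot configuration and an "action" one of the composite controllers; the learned table then yields the gait policy.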
32

Estratégia para otimização de offloading entre as redes móveis VLC e LTE baseada em q-learning / Strategy for offloading optimization between the mobile networks VLC and LTE based on Q-learning

SOUTO, Anderson Vinicius de Freitas 31 August 2018
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / The increase in data-traffic consumption is driven by the growing number of devices such as smartphones and tablets, since there is a need to be connected with everything and everyone. Applications such as video streaming and online gaming demand higher data rates; this demand overloads radio-frequency-based mobile networks and may culminate in a shortage of RF spectrum. This work therefore seeks to optimize offloading between LTE and VLC using a reinforcement-learning methodology called Q-learning. The algorithm takes as input environment variables related to signal quality, user density, and user speed in order to learn and select the best connection. Simulation results show the efficiency of the proposed methodology compared with the RSS scheme predominant in the literature: QoS metrics show that it supports higher data rates and yields an 18% improvement in service interruptions as the number of users in the system grows.
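A hedged sketch of how Q-learning can drive such a network selection: the state buckets (signal quality, user density, user speed), the reward shaping, and the hyperparameters are illustrative assumptions rather than the dissertation's actual design.

```python
import random

RATS = ["LTE", "VLC"]                         # candidate radio access technologies

def select_rat(Q, state, epsilon=0.1):
    """Epsilon-greedy choice of network for a discretized state."""
    if random.random() < epsilon:
        return random.choice(RATS)
    return max(RATS, key=lambda a: Q.get((state, a), 0.0))

def q_update(Q, state, action, reward, next_state, alpha=0.2, gamma=0.9):
    best_next = max(Q.get((next_state, a), 0.0) for a in RATS)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Illustrative state and reward: coarse buckets of the three inputs the
# abstract names, and throughput minus a penalty for switching networks
# (which discourages needless handovers / service interruptions).
state = ("rss_high", "density_mid", "speed_low")
def reward(throughput_mbps, switched, handover_penalty=5.0):
    return throughput_mbps - (handover_penalty if switched else 0.0)
```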
33

Machine Learning Methods for Network Intrusion Detection and Intrusion Prevention Systems

Stefanova, Zheni Svetoslavova 03 July 2018
Given the continuing advancement of networking applications and our increased dependence upon software-based systems, there is a pressing need to develop improved security techniques for defending modern information technology (IT) systems from malicious cyber-attacks. Anyone can be impacted by such activities, including individuals, corporations, and governments. Furthermore, the sustained expansion of the network user base and its associated set of applications is also introducing additional vulnerabilities which can lead to criminal breaches and loss of critical data. As a result, the broader cybersecurity problem area has emerged as a significant concern, with many solution strategies being proposed for both intrusion detection and prevention. In general, the cybersecurity dilemma can be treated as a conflict-resolution setup entailing a security system and a minimum of two decision agents with competing goals (e.g., the attacker and the defender). On the one hand, the defender is focused on guaranteeing that the system operates at or above an adequate (specified) level; conversely, the attacker is focused on trying to interrupt or corrupt the system's operation. In light of the above, this dissertation introduces novel methodologies to build appropriate strategies for system administrators (defenders). In particular, detailed mathematical models of security systems are developed to analyze overall performance and predict the likely behavior of the key decision makers influencing the protection structure. The initial objective is to create a reliable intrusion detection mechanism that identifies malicious attacks at a very early stage, in order to minimize potentially critical consequences and damage to system privacy and stability. A further key objective is to develop effective intrusion prevention (response) mechanisms. Along these lines, a machine learning based solution framework is developed consisting of two modules. The first module prepares the system for analysis and detects whether or not there is a cyber-attack. The second module analyzes the type of breach and formulates an adequate response; a decision agent in this module investigates the environment and makes appropriate decisions under uncertainty. This agent starts by conducting its analysis in a completely unknown milieu but continually learns to adjust its decision making based upon the provided feedback. The overall system is designed to operate in an automated manner without any intervention from administrators or other cybersecurity personnel; human input is essentially only required to modify some key model (system) parameters and settings. Overall, the framework developed in this dissertation provides a solid foundation from which to develop improved threat detection and protection mechanisms for static setups, with further extensibility for handling streaming data.
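A minimal sketch of the two-module structure the abstract describes, assuming a scikit-learn classifier for detection and a simple Q-learning response agent; the feature set, attack labels, and response actions are illustrative assumptions, not the dissertation's actual framework.

```python
import random
from sklearn.ensemble import RandomForestClassifier

class DetectionModule:
    """Module 1: flags whether a traffic sample is an attack."""
    def __init__(self):
        self.clf = RandomForestClassifier(n_estimators=100)
    def fit(self, X, y):                      # y: 0 = benign, 1 = attack
        self.clf.fit(X, y)
    def is_attack(self, x):
        return bool(self.clf.predict([x])[0])

class ResponseAgent:
    """Module 2: Q-learning agent mapping an attack type to a response,
    improving from feedback as the abstract's decision agent does."""
    RESPONSES = ["drop", "rate_limit", "quarantine", "alert_only"]
    def __init__(self, alpha=0.1, epsilon=0.1):
        self.Q, self.alpha, self.epsilon = {}, alpha, epsilon
    def act(self, attack_type):
        if random.random() < self.epsilon:
            return random.choice(self.RESPONSES)
        return max(self.RESPONSES, key=lambda r: self.Q.get((attack_type, r), 0.0))
    def learn(self, attack_type, response, reward):
        # single-step (bandit-style) update from environment feedback
        old = self.Q.get((attack_type, response), 0.0)
        self.Q[(attack_type, response)] = old + self.alpha * (reward - old)
```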
34

A Q-Learning Approach to Minefield Characterization from Unmanned Aerial Vehicles

Daugherty, Stephen Greyson January 2012
<p>The treasure hunt problem to determine how a computational agent can maximize its ability to detect and/or classify multiple targets located in a region of interest (ROI) populated with multiple obstacles. One particular instance of this problem involves optimizing the performance of a sensor mounted on an unmanned aerial vehicle (UAV) flying over a littoral region in order to detect mines buried underground. </p><p>Buried objects (including non-metallic ones) have an effect on the thermal conductivity and heat retention of the soil in which they reside. Because of this, objects that are not very deep below the surface often create measurable thermal anomalies on the surface soil. Because of this, infrared (IR) sensors have the potential to find mines and minelike objects (referred to in this thesis as clutters).</p><p>As the sensor flies over the ROI, sensor data is obtained. The sensor receives the data as pixellated infrared light signatures. Using this, ground temperature measurements are recorded and used to generate a two-dimensional thermal profile of the field of view (FOV) and map that profile onto the geography of the ROI.</p><p>The input stream of thermal data is then passed to an image processor that estimates the size and shape of the detected target. Then a Bayesian Network (BN) trained from a database of known mines and clutters is used to provide the posterior probability that the evidence obtained by the IR sensor for each detected target was the result of a mine or a clutter. The output is a confidence level (CL), and each target is classified as a mine or a clutter according to the most likely explanation (MLE) for the sensor evidence. Though the sensor may produce incomplete, noisy data, inferences from the BN attenuate the problem.</p><p>Since sensor performance depends on altitude and environmental conditions, the value of the IR information can be further improved by choosing the flight path intelligently. This thesis assumes that the UAV is flying through an environmentally homogeneous ROI and addresses the question of how the optimal altitude can be determined for any given multi-dimensional environmental state. </p><p>In general, high altitudes result in poor resolution, whereas low altitudes result in very limited FOVs. The problem of weighing these tradeoffs can be addressed by creating a scoring function that is directly dependent on a comparison between sensor outputs and ground truth. The scoring function provides a flexible framework through which multiple mission objectives can be addressed by assigning different weights to correct detections, correct non-detections, false detections, and false non-detections.</p><p>The scoring function provides a metric of sensor performance that can be used as feedback to optimize the sensor altitude as a function of the environmental conditions. In turn, the scoring function can be empirically evaluated over a number of different altitudes and then converted to empirical Q scores that also weigh future rewards against immediate ones. These values can be used to train a neural network (NN). The NN filters the data and interpolates between discrete Q-values to provide information about the optimal sensor altitude.</p><p>The research described in this thesis can be used to determine the optimal control policy for an aircraft in two different situations. 
The global maximum of the Q-function can be used to determine the altitude at which a UAV should cruise over an ROI for which the environmental conditions are known a priori. Alternatively, the local maxima of the Q-function can be used to determine the altitude to which a UAV should move if the environmental variables change during flight. </p><p>This thesis includes the results of computer simulations of a sensor flying over an ROI. The ROI is populated with targets whose characteristics are based on actual mines and minelike objects. The IR sensor itself is modeled by using a BN to create a stochastic simulation of the sensor performance. The results demonstrate how Q-learning can be applied to signals from a UAV-mounted IR sensor whose data stream is preprocessed by a BN classifier in order to determine an optimal flight policy for a given set of environmental conditions.</p> / Thesis
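A hedged sketch of the scoring-plus-interpolation idea: outcomes are scored with mission-specific weights, and a small regressor interpolates empirical Q-scores across altitude and environmental features. The weights, features, and network size are illustrative assumptions, not the thesis's calibrated values.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def score(tp, tn, fp, fn, w=(1.0, 0.2, -0.5, -1.0)):
    """Weighted sum over correct detections, correct non-detections,
    false detections, and false non-detections."""
    return w[0] * tp + w[1] * tn + w[2] * fp + w[3] * fn

# Rows: [altitude_m, temperature_C, humidity]; y: empirical Q-scores
# from simulated sorties (counts here are made up for illustration).
X = np.array([[50, 20, 0.3], [100, 20, 0.3], [150, 20, 0.3]])
y = np.array([score(8, 40, 3, 2), score(10, 38, 2, 1), score(6, 41, 5, 4)])

nn = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000).fit(X, y)

# Interpolate between the discrete altitudes to pick the cruise height
# for the current environmental state.
env = [20, 0.3]
altitudes = np.linspace(50, 150, 101)
best = altitudes[np.argmax([nn.predict([[a, *env]])[0] for a in altitudes])]
```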
35

Hierarchical reinforcement learning in adversarial environments

Kwok, Hing-Wah, Computer Science & Engineering, Faculty of Engineering, UNSW January 2009
One of the known downfalls of reinforcement learning is the amount of time required to learn an optimal policy, especially in environments with large state spaces or multiple agents. It is also known that standard Q-learning develops a deterministic policy, so in games where a stochastic policy is required (such as rock, paper, scissors) a Q-learning opponent can be defeated without much difficulty once learning has ceased. Initially we investigated the impact of the MAXQ hierarchical reinforcement learning algorithm in an adversarial environment. We found that state-space abstraction was difficult to conduct, especially when an unpredictable or co-evolving opponent was involved, and we noticed that discounted learning was required to keep the domains zero-sum. We also found that a speed increase could be obtained through the use of hierarchy in the adversarial environment. We then investigated whether similar learning-speed increases could be brought to adversarial reinforcement learning through this hierarchical methodology. Applying the hierarchical decomposition to Bowling's Win or Learn Fast (WoLF) algorithm, we were able to maintain the accelerated learning rate while retaining the stochastic elements of the WoLF algorithm. We assessed the impact of the adversarial component of the hierarchy at both the higher and lower tiers of the hierarchical tree. Finally, we introduce the idea of pivot points. A pivot point is the last possible moment at which a decision can be deferred before it must be made and one's strategy revealed to the opponent, thereby maximizing the opponent's confusion. Through the use of these pivot points, which could only have been discovered through the use of hierarchy, we were able to perform improved state-space abstraction, since no decision regarding the opponent needed to be made until a pivot point was reached.
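A minimal sketch of the WoLF step-size rule referenced above: the agent compares the value of its current mixed policy against a running average policy and learns cautiously when winning, quickly when losing. The data structures and step sizes here are illustrative assumptions.

```python
def wolf_step_size(state, Q, pi, avg_pi, actions, d_win=0.01, d_lose=0.04):
    """Return the policy-update step size for this state.
    Q: dict of Q[(state, action)]; pi, avg_pi: dicts of action
    probabilities keyed (state, action). "Winning" means the current
    policy scores better than the historical average policy."""
    v_pi  = sum(pi[(state, a)] * Q.get((state, a), 0.0) for a in actions)
    v_avg = sum(avg_pi[(state, a)] * Q.get((state, a), 0.0) for a in actions)
    return d_win if v_pi > v_avg else d_lose
```

The returned step size would then be used to shift pi(state, ·) toward the greedy action; using a small step while winning is what keeps the learned policy stochastic rather than collapsing to a deterministic, exploitable one.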
36

Proposta de arquitetura em hardware para FPGA da técnica Q-learning de aprendizagem por reforço / Proposal of an FPGA hardware architecture for the Q-learning reinforcement learning technique

Silva, Lucileide Medeiros Dantas da 18 November 2016
Q-learning is an off-policy reinforcement-learning technique whose main advantage is the possibility of obtaining an optimal policy by interacting with the environment without needing a model of it. This work proposes a parallel fixed-point Q-learning architecture implemented in FPGA (Field Programmable Gate Array) reconfigurable hardware, with the aim of optimizing the system's processing time. Convergence results for the algorithm are presented, and processing time and occupied area are analyzed for different scenarios and various fixed-point formats. Implementation details of the architecture are also described. The project was developed on Xilinx's System Generator platform, targeting the Virtex-6 xc6vcx240t-1ff1156 FPGA.
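A hedged software sketch of the fixed-point flavor of the update: Q-values are kept as integers in a Qm.n format so the same arithmetic maps directly onto FPGA multipliers and shifters. The 16-bit Q8.8 format and power-of-two learning rate are illustrative choices, not necessarily the dissertation's.

```python
FRAC_BITS = 8
ONE = 1 << FRAC_BITS                       # 1.0 in Q8.8

def to_fix(x):
    return int(round(x * ONE))

def mul_fix(a, b):
    # Fixed-point multiply; >> floors, the usual cheap choice in hardware.
    return (a * b) >> FRAC_BITS

ALPHA = to_fix(0.125)                      # power-of-two: a bare shift on FPGA
GAMMA = to_fix(0.9375)

def q_update(q_sa, reward, q_next_max):
    """q_sa += alpha * (r + gamma * max_a' Q(s',a') - q_sa),
    with every operand already in Q8.8 integer form."""
    td_error = reward + mul_fix(GAMMA, q_next_max) - q_sa
    return q_sa + mul_fix(ALPHA, td_error)
```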
37

WiFi and LTE Coexistence in the Unlicensed Spectrum

Rupasinghe, Nadisanka 26 March 2015
Today, smartphones have revolutionized the wireless communication industry, ushering in an era of mobile data. To cater for the ever-increasing data traffic demand, more spectrum resources are of utmost importance, and sharing under-utilized spectrum bands is an effective solution. In particular, the 4G broadband Long Term Evolution (LTE) technology and its foreseen 5G successor will benefit immensely if their operation can be extended to the under-utilized unlicensed spectrum. In this thesis, we first analyze WiFi 802.11n and LTE coexistence performance in the unlicensed spectrum, considering multi-layer cell layouts through system-level simulations. We consider a time-division duplexing (TDD) LTE system with an FTP traffic model for performance evaluation. Simulation results show that WiFi performance is far more vulnerable to LTE interference, while LTE performance degrades only slightly. Based on these initial findings, we propose a Q-learning-based dynamic duty-cycle selection technique for configuring LTE transmission gaps, so that satisfactory throughput is maintained for both the LTE and WiFi systems. Simulation results show that the proposed approach can enhance overall capacity by 19% and WiFi capacity by 77%, enabling effective coexistence of LTE and WiFi systems in the unlicensed band.
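A minimal sketch of the dynamic duty-cycle idea, reduced to a stateless (bandit-style) Q-update for clarity: the agent picks the fraction of each frame LTE may transmit, leaving gaps for WiFi, and is rewarded by a weighted sum of the two throughputs. The duty-cycle grid, weights, and hyperparameters are illustrative assumptions.

```python
import random

DUTY_CYCLES = [0.2, 0.4, 0.6, 0.8]        # fraction of each frame used by LTE

Q = {d: 0.0 for d in DUTY_CYCLES}         # value of each duty-cycle choice

def reward(lte_tput, wifi_tput, w_lte=0.5, w_wifi=0.5):
    """Joint objective: keep both systems' throughput satisfactory."""
    return w_lte * lte_tput + w_wifi * wifi_tput

def select(epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(DUTY_CYCLES)
    return max(DUTY_CYCLES, key=Q.get)

def update(duty, r, alpha=0.1):
    # stateless Q-update: move the estimate toward the observed reward
    Q[duty] += alpha * (r - Q[duty])
```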
38

Návrh simulátoru autonomního dopravního prostředku / Design of autonomous vehicle simulator

Machač, Petr January 2020
This thesis deals with simulation tools for developing control algorithms for autonomous cars. It can essentially be divided into two parts: a theoretical survey and a practical development part. The former gives an overview of available tools for simulating autonomous vehicles, both open-source and commercial. The theoretical part further describes the principles of, and tools (engines) for, solving dynamic equations on a computer. Emphasis is placed on the Box2D physics engine, which, as specified in the thesis assignment, is used in the second part to develop a custom environment simulating an autonomous car.
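A hedged sketch of the stepping pattern such a Box2D-based simulator uses, via the pybox2d bindings: build a world, add a dynamic body for the vehicle, and advance the dynamics at a fixed rate. The top-down (zero-gravity) setup, shapes, and forces are illustrative placeholders, not the thesis's actual environment.

```python
from Box2D import b2World

world = b2World(gravity=(0, 0), doSleep=True)     # top-down view: no gravity
car = world.CreateDynamicBody(position=(0, 0))
car.CreatePolygonFixture(box=(2.0, 1.0), density=1.0, friction=0.3)

TIME_STEP, VEL_ITERS, POS_ITERS = 1.0 / 60, 8, 3  # 60 Hz physics
for _ in range(600):                              # 10 simulated seconds
    car.ApplyForceToCenter((10.0, 0.0), True)     # crude constant throttle
    world.Step(TIME_STEP, VEL_ITERS, POS_ITERS)
    # a control algorithm under test would read car.position / car.angle here
```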
39

Reinforcement Learning with Auxiliary Memory

Suggs, Sterling 08 June 2021
Deep reinforcement learning algorithms typically require vast amounts of data to train to a useful level of performance. Each time new data is encountered, the network must inefficiently update all of its parameters. Auxiliary memory units can help deep neural networks train more efficiently by separating computation from storage, and providing a means to rapidly store and retrieve precise information. We present four deep reinforcement learning models augmented with external memory, and benchmark their performance on ten tasks from the Arcade Learning Environment. Our discussion and insights will be helpful for future RL researchers developing their own memory agents.
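A minimal sketch of the external-memory idea, assuming a soft-attention key-value store of the kind often paired with RL agents; the slot count, dimensions, and ring-buffer write rule are illustrative assumptions, not the specific models benchmarked in the thesis.

```python
import numpy as np

class KeyValueMemory:
    """Auxiliary memory: store experience outside the network weights
    and retrieve it with soft attention over keys."""
    def __init__(self, slots=128, key_dim=32, value_dim=32):
        self.keys = np.zeros((slots, key_dim))
        self.values = np.zeros((slots, value_dim))
        self.next = 0
    def write(self, key, value):
        i = self.next % len(self.keys)            # ring-buffer overwrite
        self.keys[i], self.values[i] = key, value
        self.next += 1
    def read(self, query):
        # cosine-similarity attention over stored keys
        norms = np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-8
        sims = self.keys @ query / norms
        w = np.exp(sims) / np.exp(sims).sum()     # softmax attention weights
        return w @ self.values                    # blended retrieved value
```

Rapid writes let the agent store precise information immediately, while the parametric network is updated slowly, which is the separation of computation from storage the abstract describes.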
40

[pt] ESTUDO DE TÉCNICAS DE APRENDIZADO POR REFORÇO APLICADAS AO CONTROLE DE PROCESSOS QUÍMICOS / [en] STUDY OF REINFORCEMENT LEARNING TECHNIQUES APPLIED TO THE CONTROL OF CHEMICAL PROCESSES

30 December 2021
[en] Industry 4.0 has driven the development of new technologies to meet current market demands. One of these is the incorporation of computational-intelligence techniques into the daily practice of the chemical industry. In this context, the present work evaluated the performance of controllers based on reinforcement learning in industrial chemical processes. The control strategy directly affects the safety and cost of the process: the better its performance, the lower the production of effluents and the consumption of inputs and energy. The reinforcement-learning algorithms showed excellent results for the first case study, a CSTR with Van de Vusse kinetics. However, implementing these algorithms in the Tennessee Eastman Process plant showed that further study is needed: the weak or nonexistent Markov property, the high dimensionality, and the peculiarities of the plant made it difficult for the developed controllers to obtain satisfactory results. For case study 1, the algorithms Q-Learning, Actor-Critic TD, DQL, DDPG, SAC, and TD3 were evaluated; for case study 2, CMA-ES, TRPO, PPO, DDPG, SAC, and TD3 were evaluated.
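A hedged sketch of how such an algorithm comparison is commonly scripted with stable-baselines3; "VanDeVusse-v0" is a hypothetical Gym environment id standing in for the CSTR model, not a published package, and the timestep budget is illustrative.

```python
import gymnasium as gym
from stable_baselines3 import SAC, DDPG, TD3

# Assumed custom environment wrapping the CSTR dynamics; it would have
# to be registered with gymnasium before gym.make() can find it.
env = gym.make("VanDeVusse-v0")

for algo in (SAC, DDPG, TD3):
    model = algo("MlpPolicy", env, verbose=0)     # same policy class for fairness
    model.learn(total_timesteps=50_000)
    model.save(f"{algo.__name__.lower()}_cstr")   # evaluate each saved agent later
```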
