71

卷積深度Q-學習之ETF自動交易系統 / Convolutional Deep Q-learning for ETF Automated Trading System

陳非霆, Chen, Fei-Ting Unknown Date (has links)
This paper builds a trading system with a DCQN model that combines reinforcement learning with a convolutional neural network (CNN), in the hope that the system can decide on its own whether to buy or sell ETFs. Because ETFs are derivative financial products with high stability and relatively high transaction fees, the system does not trade in real time: it acts once every 20 trading days, predicting the value of each action from the data of the preceding 20 trading days so as to maximize future rewards. DQN is a reinforcement learning model that uses deep learning to predict action values; it combines reinforcement learning's mechanism for self-updating action values with the strong learning ability of deep learning, and it achieved good results in our experiments.
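The abstract describes a DQN whose Q-function is a convolutional network fed with a 20-trading-day window. A minimal sketch of that idea in PyTorch follows; the layer sizes, the five price/volume input features, the three actions (hold/buy/sell) and the reward handling are assumptions for illustration, not the thesis' actual DCQN architecture.

```python
# Sketch only: a 1-D CNN Q-network over a 20-trading-day window with a
# one-step DQN temporal-difference loss. All sizes are assumed, not the thesis'.
import torch
import torch.nn as nn

WINDOW = 20          # decisions are made once per 20 trading days
N_FEATURES = 5       # e.g. open, high, low, close, volume (assumed)
N_ACTIONS = 3        # 0 = hold, 1 = buy, 2 = sell (assumed)

class ConvQNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(N_FEATURES, 16, kernel_size=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3), nn.ReLU(),
        )
        self.head = nn.Linear(32 * (WINDOW - 4), N_ACTIONS)

    def forward(self, x):                 # x: (batch, N_FEATURES, WINDOW)
        h = self.conv(x).flatten(1)
        return self.head(h)               # one Q-value per action

def td_loss(q_net, target_net, batch, gamma=0.99):
    """One-step DQN temporal-difference loss on a replay-buffer batch."""
    s, a, r, s_next = batch               # a: int64 action indices, shape (B,)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values
    return nn.functional.mse_loss(q_sa, target)
```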
72

Distributed spectrum sensing and interference management for cognitive radios with low capacity control channels

Van Den Biggelaar, Olivier 05 October 2012 (has links)
Cognitive radios have been proposed as a new technology to counteract the spectrum scarcity issue and increase spectral efficiency. In cognitive radios, the sparsely used assigned frequency bands are opened to secondary users, provided that the interference induced on the primary licensees is negligible. Cognitive radios operate in two steps: the radios first sense the available frequency bands by detecting the presence of primary users, and then communicate using the bands identified as not in use by the primary users.

In this thesis we investigate how to improve the efficiency of cognitive radio networks when multiple cognitive radios cooperate to sense the spectrum or control their interference. A major challenge in the design of cooperating devices lies in the need for exchange of information between these devices. We therefore identify three specific types of control-information exchange whose efficiency can be improved. Specifically, we first study how cognitive radios can efficiently exchange sensing information with a coordinator node when the reporting channels are noisy. We then propose distributed learning algorithms that allocate the primary-network sensing times and the secondary transmission powers within the secondary network. Both distributed allocation algorithms reduce the need for information exchange compared to centralized allocation algorithms. / Doctorate in Engineering Sciences / info:eu-repo/semantics/nonPublished
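As a purely generic illustration of the cooperative-sensing step mentioned above (not the algorithms developed in the thesis), the sketch below has each radio run a simple energy detector and a coordinator fuse the one-bit decisions by majority vote; the sample counts, the detection threshold and the noiseless reporting channel are all simplifying assumptions.

```python
# Generic cooperative spectrum sensing: per-radio energy detection plus
# hard-decision (majority-vote) fusion at a coordinator node.
import numpy as np

def energy_detect(samples, noise_power, threshold_factor=1.5):
    """Return 1 if the measured energy suggests a primary user is present."""
    energy = np.mean(np.abs(samples) ** 2)
    return int(energy > threshold_factor * noise_power)

def fuse_decisions(decisions):
    """Coordinator: declare the band occupied if a majority of radios say so."""
    return int(sum(decisions) > len(decisions) / 2)

rng = np.random.default_rng(0)
noise_power, signal_power, n_samples, n_radios = 1.0, 0.8, 200, 5
primary_present = True                     # ground truth for this toy run

decisions = []
for _ in range(n_radios):
    noise = rng.normal(0.0, np.sqrt(noise_power), n_samples)
    signal = rng.normal(0.0, np.sqrt(signal_power), n_samples) if primary_present else 0.0
    decisions.append(energy_detect(noise + signal, noise_power))

print("band occupied:", fuse_decisions(decisions))
```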
73

[en] PESSIMISTIC Q-LEARNING: AN ALGORITHM TO CREATE BOTS FOR TURN-BASED GAMES / [pt] Q-LEARNING PESSIMISTA: UM ALGORITMO PARA GERAÇÃO DE BOTS DE JOGOS EM TURNOS

ADRIANO BRITO PEREIRA 25 January 2017 (has links)
[pt] Este documento apresenta um novo algoritmo de aprendizado por reforço, o Q-Learning Pessimista. Nossa motivação é resolver o problema de gerar bots capazes de jogar jogos baseados em turnos e contribuir para obtenção de melhores resultados através dessa extensão do algoritmo Q-Learning. O Q-Learning Pessimista explora a flexibilidade dos cálculos gerados pelo Q-Learning tradicional sem a utilização de força bruta. Para medir a qualidade do bot gerado, consideramos qualidade como a soma do potencial de vitória e empate em um jogo. Nosso propósito fundamental é gerar bots de boa qualidade para diferentes jogos. Desta forma, podemos utilizar este algoritmo para famílias de jogos baseados em turno. Desenvolvemos um framework chamado Wisebots e realizamos experimentos com alguns cenários aplicados aos seguintes jogos tradicionais: TicTacToe, Connect-4 e CardPoints. Comparando a qualidade do Q-Learning Pessimista com a do Q-Learning tradicional, observamos ganhos de 0,8 por cento no TicTacToe, obtendo um algoritmo que nunca perde. Observamos também ganhos de 35 por cento no Connect-4 e de 27 por cento no CardPoints, elevando ambos da faixa de 50 por cento a 60 por cento para 90 por cento a 100 por cento de qualidade. Esses resultados ilustram o potencial de melhoria com o uso do Q-Learning Pessimista, sugerindo sua aplicação aos diversos tipos de jogos de turnos. / [en] This document presents a new reinforcement learning algorithm, Pessimistic Q-Learning. Our motivation is to solve the problem of generating bots able to play turn-based games and to obtain better results through this extension of the Q-Learning algorithm. Pessimistic Q-Learning exploits the flexibility of the value estimates produced by traditional Q-Learning without resorting to brute force. To measure the quality of a generated bot, we define quality as the sum of its potential to win or draw a game. Our fundamental purpose is to generate good-quality bots for different games, so that the algorithm can be applied to families of turn-based games. We developed a framework called Wisebots and conducted experiments with several scenarios applied to the traditional games TicTacToe, Connect-4 and CardPoints. Comparing the quality of Pessimistic Q-Learning with that of traditional Q-Learning, we observed gains that bring quality to 100 per cent in TicTacToe, obtaining an algorithm that never loses. We also observed gains of 35 per cent in Connect-4 and 27 per cent in CardPoints, raising both from the 60-80 per cent quality range to the 90-100 per cent range. These results illustrate the potential for improvement with Pessimistic Q-Learning and suggest its application to various types of turn-based games.
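The abstract does not spell out the pessimistic update rule, so the sketch below only shows the classic tabular Q-learning backup that the thesis extends, together with one conceivable "pessimistic" variant (our assumption, not the thesis' definition) that bootstraps from the minimum instead of the maximum of the successor action values.

```python
# Tabular Q-learning backup for a turn-based game, with an optional
# pessimistic bootstrap. The pessimistic option here is illustrative only.
from collections import defaultdict

Q = defaultdict(float)          # Q[(state, action)] -> estimated value

def q_update(s, a, r, s_next, next_actions, alpha=0.1, gamma=0.95,
             pessimistic=False):
    if not next_actions:                      # terminal position
        target = r
    else:
        values = [Q[(s_next, a2)] for a2 in next_actions]
        # standard Q-learning is optimistic about the next turn (max);
        # a pessimistic backup assumes the next turn goes against us (min)
        bootstrap = min(values) if pessimistic else max(values)
        target = r + gamma * bootstrap
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```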
74

Deep Reinforcement Learning for the Optimization of Combining Raster Images in Forest Planning

Wen, Yangyang January 2021 (has links)
Raster images represent the treatment options for how the forest will be cut. Economic benefits from cutting the forest are generated after a treatment is selected and executed. Existing raster images contain many small clusters, which is the principal cause of overhead. If we can fully explore the relationships among the raster images and combine the old data sets with an optimization algorithm to generate a new raster image, the result can surpass the existing raster images and create higher economic benefits.

The question of this project is whether we can create a dynamic model that treats the pixel being updated as an agent selecting options for an empty raster image in response to neighborhood environmental and landscape parameters. The project explores whether it is realistic to use deep reinforcement learning to generate new and superior raster images, and aims to assess the feasibility, usefulness, and effectiveness of deep reinforcement learning algorithms in optimizing existing treatment options.

The problem was modeled as a Markov decision process in which the pixel to be updated acts as an agent on the empty raster image and determines the choice of treatment option for the current empty pixel. The project used a deep Q-learning neural network to compute the Q-values, and the temporal-difference reinforcement learning algorithm was applied to predict future rewards and update the model parameters.

After the modeling was completed, a model-usefulness experiment was set up to test the usefulness of the model, a parameter-correlation experiment was set up to test the correlation between the parameters and the benefit of the model, and finally the trained model was used to generate a larger raster image to test its effectiveness.
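A schematic sketch of the pixel-as-agent formulation described above, using epsilon-greedy treatment selection and a one-step temporal-difference update with linear function approximation rather than the thesis' deep Q-network; the feature vector, reward and treatment count are placeholders.

```python
# Pixel-as-agent MDP sketch: state = features of the pixel's neighbourhood,
# action = treatment option, learned with semi-gradient Q-learning.
import numpy as np

N_TREATMENTS = 4        # assumed number of treatment options
N_FEATURES = 8          # assumed size of the neighbourhood feature vector
W = np.zeros((N_TREATMENTS, N_FEATURES))    # linear Q(s, a) = W[a] @ s
rng = np.random.default_rng(1)

def choose_treatment(state, epsilon=0.1):
    if rng.random() < epsilon:                      # explore
        return int(rng.integers(N_TREATMENTS))
    return int(np.argmax(W @ state))                # exploit

def td_update(state, action, reward, next_state, done,
              alpha=0.01, gamma=0.99):
    """Semi-gradient one-step Q-learning update for one pixel assignment."""
    target = reward if done else reward + gamma * np.max(W @ next_state)
    W[action] += alpha * (target - W[action] @ state) * state
```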
75

Simulated Fixed-Wing Aircraft Attitude Control using Reinforcement Learning Methods

David Jona Richter (11820452) 20 December 2021 (has links)
Autonomous transportation is a research field that has gained huge interest in recent years, with autonomous electric or hydrogen cars coming ever closer to everyday use. Cars are not the only subject of autonomy research, though: the field of aviation is also being explored for fully autonomous flight. One very important aspect of making autonomous flight a reality is attitude control, the control of roll, pitch, and sometimes yaw. Traditional approaches to automated attitude control use PID (proportional-integral-derivative) controllers, which rely on hand-tuned parameters to fulfill the task. In this work, however, the use of reinforcement learning algorithms for attitude control is explored. With the surge of ever more powerful artificial neural networks, which have proven to be universally usable function approximators, deep reinforcement learning also becomes an intriguing option.

A software toolkit is developed and used to allow multiple flight simulators to train agents with reinforcement learning as well as deep reinforcement learning. Experiments are run with different hyperparameters, algorithms, state representations, and reward functions to explore possible options for autonomous attitude control using reinforcement learning.
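As a reference point for the hand-tuned baseline mentioned above, a minimal PID controller for a single attitude axis might look as follows; the gains and time step are placeholder values, not taken from this work.

```python
# Minimal PID controller for one attitude axis (e.g. roll).
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        """Return the control command for the current attitude error."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

roll_pid = PID(kp=0.8, ki=0.05, kd=0.2, dt=0.02)           # example gains
command = roll_pid.update(setpoint=0.0, measurement=5.0)   # degrees of roll
```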
76

Policy-based Reinforcement learning control for window opening and closing in an office building

Kaisaravalli Bhojraj, Gokul, Markonda, Yeswanth Surya Achyut January 2020 (has links)
The level of indoor comfort can be strongly influenced by the window opening and closing behavior of the occupant in an office building. If not properly managed, this behavior affects not only the comfort level but also the energy consumption. Occupant behavior is not easy to predict and control in a conventional way. Nowadays, for a system to be called smart it must learn user behavior, as this gives valuable information to the controlling system. To control a window efficiently, we propose reinforcement learning (RL) in this thesis, which should be able to learn user behavior and maintain an optimal indoor climate. The model-free nature of RL offers flexibility in developing an intelligent control system in a simpler way than conventional techniques. The data in this thesis are taken from an office building in Beijing. Value-based reinforcement learning has previously been implemented for controlling the window; in this thesis we apply policy-based RL (the REINFORCE algorithm) and compare our results with the value-based approach (Q-learning), thereby getting a better idea of which suits the task at hand and exploring how the two behave. Based on our work, we find that policy-based RL provides a good trade-off between maintaining an optimal indoor temperature and learning the occupant's behavior, which is important for a system to be called smart.
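A compact sketch of the REINFORCE update for the binary open/close decision, assuming a logistic policy over a small hand-picked state vector; the features, reward signal and learning rate are illustrative only and not the thesis' configuration.

```python
# REINFORCE (Monte-Carlo policy gradient) with a Bernoulli/logistic policy
# over assumed features [indoor temperature, outdoor temperature, bias].
import numpy as np

theta = np.zeros(3)                       # policy parameters

def policy(state):
    """Probability of opening the window under the current parameters."""
    return 1.0 / (1.0 + np.exp(-theta @ state))

def reinforce_update(episode, lr=0.01, gamma=0.99):
    """episode: list of (state, action, reward) with action in {0, 1}."""
    global theta
    G, returns = 0.0, []
    for _, _, r in reversed(episode):     # discounted return-to-go
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    for (state, action, _), G in zip(episode, returns):
        p_open = policy(state)
        grad_log = (action - p_open) * state   # grad of log pi(a|s)
        theta = theta + lr * G * grad_log
```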
77

A Comparative Study of Reinforcement-based and Semi-classical Learning in Sensor Fusion

Bodén, Johan January 2021 (has links)
Reinforcement learning has proven very useful in certain areas, such as games. However, the approach has been seen as quite limited. Reinforcement-based learning has, for instance, not been commonly used for classification tasks, since the agent only receives feedback on how well it did for an action performed on a specific input. This slows the convergence rate compared to other classification approaches, which have the input and the corresponding output to train on. Nevertheless, this thesis investigates whether reinforcement-based learning can successfully be employed on a classification task. Moreover, as sensor fusion is an expanding field which can, for instance, assist autonomous vehicles in understanding their surroundings, it is also interesting to see how sensor fusion, i.e. fusion between lidar and RGB images, can increase the performance of a classification task. In this thesis, a reinforcement-based learning approach is compared to a semi-classical approach. A deep Q-learning network was chosen as an example of a reinforcement learning model, and a support vector machine classifier built on top of a deep neural network was chosen as an example of a semi-classical model. These frameworks are compared with and without sensor fusion to see whether fusion improves their performance. Experiments show that the evaluated reinforcement-based learning approach underperforms in terms of the metrics, mainly because of its slow learning process, in comparison to the semi-classical approach. On the other hand, using reinforcement-based learning to carry out a classification task can still be advantageous in some cases, as it performs fairly well in terms of the metrics presented in this work, e.g. F1-score, and for instance on imbalanced datasets. As for the impact of sensor fusion, a notable improvement can be seen: for example, when training the deep Q-learning model for 50 episodes, the F1-score increased by 0.1329, which is especially notable considering that most of the lidar data used in the fusion is lost, since this work projects the 3D lidar data onto the same 2D plane as the RGB images.
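The projection of lidar onto the RGB image plane mentioned at the end of the abstract can be illustrated with a simple pinhole-camera early-fusion step; the intrinsics, image size and channel stacking below are assumptions, not the thesis' exact pipeline.

```python
# Early fusion sketch: project lidar points (already in the camera frame)
# into a sparse depth image and stack it with the RGB channels.
import numpy as np

H, W = 96, 128
fx, fy, cx, cy = 100.0, 100.0, W / 2, H / 2     # assumed camera intrinsics

def lidar_to_depth_image(points_cam):
    """points_cam: (N, 3) points (x right, y down, z forward), in metres."""
    depth = np.zeros((H, W), dtype=np.float32)
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    valid = z > 0.1                               # keep points in front of camera
    u = (fx * x[valid] / z[valid] + cx).astype(int)
    v = (fy * y[valid] / z[valid] + cy).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    depth[v[inside], u[inside]] = z[valid][inside]
    return depth

def fuse(rgb, points_cam):
    """Return an (H, W, 4) array: RGB plus the projected lidar depth channel."""
    depth = lidar_to_depth_image(points_cam)
    return np.dstack([rgb.astype(np.float32), depth])
```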
78

Offline Reinforcement Learning for Remote Electrical Tilt Optimization : An application of Conservative Q-Learning / Offline förstärkningsinlärning för fjärran antennlutningsoptimering : En tillämpning av konservativ Q-inlärning

Kastengren, Marcus January 2021 (has links)
In telecom networks, adjusting the tilt of antennas in an optimal manner, the so-called remote electrical tilt (RET) optimization, is a method to ensure quality of service (QoS) for network users. Tilt adjustments made during operation in real-world networks are usually executed through a suboptimal policy, and a significant amount of data is collected during the execution of such a policy. The policy collecting the data is known as the behavior policy, and the collected data can be used to learn improved tilt-update policies in an offline manner. In this thesis the RET optimization problem is formulated in an offline reinforcement learning (RL) setting, where the objective is to learn an optimal policy from batches of data collected by the behavior policy. Offline RL is a challenging problem in which traditional RL algorithms can fail to learn policies that perform well when evaluated online. In this thesis Conservative Q-Learning (CQL) is applied to tackle the challenges of offline RL, with the purpose of learning improved policies for tilt adjustment from data in a simulated environment. Experiments are made with different types of function approximators to model the Q-function: specifically, an artificial neural network (ANN) and a linear model. With linear function approximation, two novel algorithms that combine the properties of CQL and the classic Least Squares Policy Iteration (LSPI) algorithm are proposed; they are also used for learning RET adjustment policies. In online evaluation in the simulator, one of the proposed algorithms with simple linear function approximation achieves results similar to CQL with the more complex artificial-neural-network function approximator. These versions of CQL outperform both the behavior policy and the naive Deep Q-Networks (DQN) method.
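For reference, the conservative penalty that CQL adds to the ordinary temporal-difference loss (for discrete actions, following Kumar et al.'s CQL(H) formulation) can be sketched as follows; the network, batch format and alpha value are illustrative, not the thesis' configuration.

```python
# CQL(H) for discrete actions: Bellman error plus a penalty that pushes down
# Q on all actions (logsumexp) while pushing up Q on the logged actions.
import torch
import torch.nn as nn

def cql_loss(q_net, target_net, batch, alpha=1.0, gamma=0.99):
    s, a, r, s_next = batch                            # a: int64, shape (B,)
    q_all = q_net(s)                                   # (B, n_actions)
    q_sa = q_all.gather(1, a.unsqueeze(1)).squeeze(1)

    with torch.no_grad():                              # standard TD target
        td_target = r + gamma * target_net(s_next).max(dim=1).values
    bellman_error = nn.functional.mse_loss(q_sa, td_target)

    conservative = (torch.logsumexp(q_all, dim=1) - q_sa).mean()
    return bellman_error + alpha * conservative
```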
79

Model-Free Reinforcement Learning for Hierarchical OO-MDPs

Goldblatt, John Dallan 23 May 2022 (has links)
No description available.
80

Stuck state avoidance through PID estimation training of Q-learning agent / Förhindrande av odefinierade tillstånd vid Q-learning träning genom PID estimering

Moritz, Johan, Winkelmann, Albin January 2019 (has links)
Reinforcement learning is conceptually based on an agent learning through interaction with its environment. This trial-and-error learning method makes the process prone to situations in which the agent gets stuck in a dead end from which it cannot keep learning. This thesis studies a method to diminish the risk that a wheeled inverted pendulum (WIP) falls over during training, by having a Q-learning based agent estimate a PID controller before training it on the balancing problem. We show that our approach is equally stable compared to a Q-learning agent without estimation training, while the WIP falls over less than half as many times during training. Both agents succeed in balancing the WIP for a full hour in repeated tests.
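One conceivable realization of the PID-estimation phase (an assumption on our part, not necessarily the exact scheme used in the thesis) is to pretrain the Q-network so that its greedy action imitates a reference PID controller's discretized command before continuing with ordinary Q-learning on the balancing task; all gains, ranges and network sizes below are placeholders.

```python
# Pretraining sketch: sample pendulum states, let a reference controller pick
# a discretized torque, and fit the Q-network so argmax Q matches that choice
# (Q-values used as classification logits).
import torch
import torch.nn as nn

ACTIONS = torch.tensor([-1.0, 0.0, 1.0])        # discretized wheel torque (assumed)
q_net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, len(ACTIONS)))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
kp, kd = 5.0, 1.0        # proportional/derivative gains standing in for the PID

for _ in range(500):                            # pretraining iterations
    state = torch.rand(64, 2) * 2 - 1           # [tilt angle, tilt rate] in [-1, 1]
    pid_cmd = -(kp * state[:, 0] + kd * state[:, 1])
    # label = the discrete action closest to the controller's command
    labels = torch.argmin((pid_cmd.unsqueeze(1) - ACTIONS).abs(), dim=1)
    loss = nn.functional.cross_entropy(q_net(state), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
# the agent then continues with ordinary Q-learning on the balancing task,
# starting from a policy that already mimics the reference controller
```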
