Global ETD Search

1	Dynamic generalisation of continuous action spaces in reinforcement learning : a neurally inspired approach Smith, Andrew James January 2002 This thesis is about the dynamic generalisation of continuous action spaces in reinforcement learning problems. The standard Reinforcement Learning (RL) account provides a principled and comprehensive means of optimising a scalar reward signal in a Markov Decision Process. However, the theory itself does not directly address the imperative issue of generalisation which naturally arises as a consequence of large or continuous state and action spaces. A current thrust of research is aimed at fusing the generalisation capabilities of supervised (and unsupervised) learning techniques with the RL theory. An example par excellence is Tesauro’s TD-Gammon. Although much effort has gone into researching ways to represent and generalise over the input space, much less attention has been paid to the action space. This thesis first considers the motivation for learning real-valued actions, and then proposes a set of key properties desirable in any candidate algorithm addressing generalisation of both input and action spaces. These properties include: provision of adaptive and online generalisation, adherence to the standard theory with a central focus on estimating expected reward, provision for real-valued states and actions, and full support for a real-valued discounted reward signal. Of particular interest are issues pertaining to robustness in non-stationary environments, scalability, and efficiency for real-time learning in applications such as robotics. Since exploring the action space is discovered to be a potentially costly process, the system should also be flexible enough to enable maximum reuse of learned actions. A new approach is proposed which succeeds for the first time in addressing all of the key issues identified. 
The algorithm, which is based on the ubiquitous self-organising map, is analysed and compared with other techniques including those based on the backpropagation algorithm. The investigation uncovers some important implications of the differences between these two particular approaches with respect to RL. In particular, the distributed representation of the multi-layer perceptron is judged to be something of a double-edged sword offering more sophisticated and more scalable generalising power, but potentially causing problems in dynamic or non-equiprobable environments, and tasks involving a highly varying input-output mapping. The thesis concludes that the self-organising map can be used in conjunction with current RL theory to provide real-time dynamic representation and generalisation of continuous action spaces. The proposed model is shown to be reliable in non-stationary, unpredictable and noisy environments and judged to be unique in addressing and satisfying a number of desirable properties identified as important to a large class of RL problems. 004
2	Extensions Of S-spaces Losert, Bernd 01 January 2013 Given a convergence space X, a continuous action of a convergence semigroup S on X, and a compactification Y of X, under what conditions on X and the action on X is it possible to extend the action to a continuous action on Y? Similarly, given a Cauchy space X, a Cauchy continuous action of a Cauchy semigroup S on X, and a completion Y of X, under what conditions on X and the action on X is it possible to extend the action to a Cauchy continuous action on Y? We answer the first question for some particular compactifications, such as the one-point compactification and the star compactification, as well as for the class of regular compactifications. We answer the second question for the class of regular strict completions. Using these results, we give sufficient conditions under which the pseudoquotient of a compactification/completion of a space is the compactification/completion of the pseudoquotient of the given space. Convergence space Cauchy space convergence semigroup Cauchy semigroup continuous action Cauchy continuous action compactification completion pseudoquotients Mathematics
3	Inteligentní řídící člen aktivního magnetického ložiska / Inteligent Controller of Active Magnetic Bearing Turek, Milan January 2011 The PhD thesis describes the control design of an active magnetic bearing. An active magnetic bearing is a nonlinear, unstable system, so classical control design methods for linear time-invariant systems cannot be applied to it directly. Methods of nonlinear control design are not universal either, and their application is not an easy task. The thesis describes the use of a simple nonlinear compensation which linearizes the response of the active magnetic bearing and allows classical control design methods for linear time-invariant systems to be used. It is shown that the CARLA method can significantly improve the parameters of the designed controller. The first part of the thesis describes the derivation of a model of the controlled active magnetic bearing and the nonlinear compensation which linearizes the response of the controlled bearing to the input signal. The following part describes methods of state control design, selected methods of robust control design, and the most common artificial intelligence methods used for control design and implementation. The next part describes the hardware of the experimental device and its parameters. It also contains the experimental derivation of a model of the electromagnetic force, because the parameters are not available from the manufacturer. The last part describes the control design of the active magnetic bearing. Several different approaches are described, ranging from a completely experimental approach, through the Ziegler-Nichols method and state control design, to methods for robust control design. The CARLA method is used heavily during the design; owing to its principle, it is well suited to online learning in a real controller.
4	Solution Of Delayed Reinforcement Learning Problems Having Continuous Action Spaces Ravindran, B 03 1900 No description available. Machine Learning Self Organizing Systems Reinforcement Learning Q-learning Continuous Action Spaces Q-functions Delayed Reinforcement Learning Computer Science
5	[pt] CONJUNTOS ONLINE PARA APRENDIZADO POR REFORÇO PROFUNDO EM ESPAÇOS DE AÇÃO CONTÍNUA / [en] ONLINE ENSEMBLES FOR DEEP REINFORCEMENT LEARNING IN CONTINUOUS ACTION SPACES RENATA GARCIA OLIVEIRA 01 February 2022 This work seeks to use ensembles of deep reinforcement learning algorithms from a new perspective. In the literature, the ensemble technique is used to improve performance, but, for the first time, this research aims to use ensembles to minimize the dependence of deep reinforcement learning performance on hyperparameter fine-tuning, in addition to making learning more precise and robust. Two approaches are researched; one considers pure action aggregation, while the other also takes the value functions into account. In the first approach, an online learning framework based on the ensemble's continuous action choice history is created, aiming to flexibly integrate different scoring and aggregation methods for the agents' actions. In essence, the framework uses past performance to combine only the actions of the best policies. In the second approach, the policies are evaluated using their expected performance as estimated by their value functions. Specifically, we weigh the ensemble's value functions by their expected accuracy as calculated by the temporal difference error. Value functions with lower error have higher weight. To measure the influence on the hyperparameter tuning effort, groups consisting of a mix of different amounts of well and poorly parameterized algorithms were created. To evaluate the methods, classic environments such as the inverted pendulum, cart pole and double cart pole are used as benchmarks. In validation, the Half Cheetah v2, a biped robot, and Swimmer v2 simulation environments showed superior and consistent results, demonstrating the ability of the ensemble technique to minimize the effort needed to tune the algorithms' hyperparameters. 
[en] REINFORCEMENT LEARNING [en] ENSEMBLE LEARNING [en] CONTINUOUS ACTION ENSEMBLE [en] DEEP DETERMINISTIC POLICY GRADIENT [en] HYPERPARAMETER OPTIMIZATION
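The second approach in record 5 (value functions with lower temporal difference error receive higher weight) can be illustrated with a minimal sketch. This is not the thesis's implementation: the softmax over negative absolute TD error, the temperature parameter, and the function names td_error_weights and aggregate_actions are all assumptions made for illustration; the abstract only states that lower error means higher weight.

```python
import math


def td_error_weights(td_errors, temperature=1.0):
    """Map per-agent TD errors to weights that sum to 1.

    Assumption: a softmax over the negative absolute TD error.
    The source only says that lower error gives higher weight.
    """
    scores = [-abs(e) / temperature for e in td_errors]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def aggregate_actions(actions, weights):
    """Weighted average of the agents' continuous action vectors."""
    dim = len(actions[0])
    return [sum(w * a[i] for w, a in zip(weights, actions)) for i in range(dim)]


# Hypothetical ensemble of three agents, each proposing a 2-D action.
weights = td_error_weights([0.1, 0.5, 2.0])
action = aggregate_actions([[0.2, -0.4], [0.3, -0.1], [0.9, 0.8]], weights)
```

A weighted average is only one possible aggregation rule; the abstract mentions that different scoring and aggregation methods can be plugged into the framework.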
6	Entwicklung und Validierung einer Simulationsbasis zum Test von Reglern raumlufttechnischer Anlagen / Development and Validation of a Simulation Basis for Testing Controllers of Ventilation and Air-Conditioning Systems Le, Huu-Thoi 19 January 2004 Nowadays the simulation of buildings and their technical systems is becoming increasingly important for diagnosing and assessing how the systems operate and for predicting energy demand. The accuracy achieved depends on the degree of complexity of the simulation program used. Model development and validation are therefore a very important part of a software development process in order to ensure reliability. Numerous simulation models are available at the Institut für Thermodynamik und Technische Gebäudeausrüstung. In the course of this work, further required models (hygric behaviour of walls (simplified method), finned-tube heat exchanger, heat regenerator, and others) were developed and integrated into the TRNSYS program, and the existing models were refined with regard to their accuracy. These are in particular the models for split systems with continuous and discontinuous control, with detailed consideration of system behaviour under both full-load and part-load operation. This made it possible for the first time to describe the entire split-system installation in detail. To carry out the analytical validation, analytical models were developed for a split system with continuous and discontinuous control under predefined boundary conditions. The existing simulation models were also used in the analytical validation, so that most components and the TRNSYS simulation program could be verified. This validation was carried out within the framework of IEA-SHC/HVAC BESTEST TASK 22. Since various research institutions took part in this TASK, each with a different simulation program, it offered the best opportunity to carry out comparative tests. 
When one program yields results that differ significantly from the others, this is not always due to program errors. However, collective experience from this TASK shows that, where deviations occurred, errors or questionable algorithms were found in most cases. Once the TRNSYS simulation program had been validated, a concept was drawn up for fault detection and diagnosis of the control strategies of ventilation and air-conditioning systems (raumlufttechnische Anlagen, RLTA). The method allows both the elimination of possible faults in the planning phase, when the control strategies are designed, and the testing of existing control strategies. This increases reliability and thus the safety of system operation. Finally, the method serves as a tool for optimising the operation of RLTA. The control behaviour was presented and discussed on the basis of typical cases. With the help of the method for fault detection and diagnosis of the operation of RLTA, existing control strategies could be tested and improved. Gebäude- Anlagen- Simulation Modellbildung und Betriebsver Building System Simulation Control Behaviour HVAC Systems ddc:620 rvk:ZI 8740 Building control technology Air-conditioning technology Simulation Validation
7	Entwicklung und Validierung einer Simulationsbasis zum Test von Reglern raumlufttechnischer Anlagen / Development and Validation of a Simulation Basis for Testing Controllers of Ventilation and Air-Conditioning Systems Le, Huu-Thoi 11 February 2004 Nowadays the simulation of buildings and their technical systems is becoming increasingly important for diagnosing and assessing how the systems operate and for predicting energy demand. The accuracy achieved depends on the degree of complexity of the simulation program used. Model development and validation are therefore a very important part of a software development process in order to ensure reliability. Numerous simulation models are available at the Institut für Thermodynamik und Technische Gebäudeausrüstung. In the course of this work, further required models (hygric behaviour of walls (simplified method), finned-tube heat exchanger, heat regenerator, and others) were developed and integrated into the TRNSYS program, and the existing models were refined with regard to their accuracy. These are in particular the models for split systems with continuous and discontinuous control, with detailed consideration of system behaviour under both full-load and part-load operation. This made it possible for the first time to describe the entire split-system installation in detail. To carry out the analytical validation, analytical models were developed for a split system with continuous and discontinuous control under predefined boundary conditions. The existing simulation models were also used in the analytical validation, so that most components and the TRNSYS simulation program could be verified. This validation was carried out within the framework of IEA-SHC/HVAC BESTEST TASK 22. Since various research institutions took part in this TASK, each with a different simulation program, it offered the best opportunity to carry out comparative tests. 
When one program yields results that differ significantly from the others, this is not always due to program errors. However, collective experience from this TASK shows that, where deviations occurred, errors or questionable algorithms were found in most cases. Once the TRNSYS simulation program had been validated, a concept was drawn up for fault detection and diagnosis of the control strategies of ventilation and air-conditioning systems (raumlufttechnische Anlagen, RLTA). The method allows both the elimination of possible faults in the planning phase, when the control strategies are designed, and the testing of existing control strategies. This increases reliability and thus the safety of system operation. Finally, the method serves as a tool for optimising the operation of RLTA. The control behaviour was presented and discussed on the basis of typical cases. With the help of the method for fault detection and diagnosis of the operation of RLTA, existing control strategies could be tested and improved. info:eu-repo/classification/ddc/620 ddc:620
