Global ETD Search

1	APRENDIZAGEM POR REFORÇO E PROGRAMACÃO DINÂMICA ADAPTATIVA PARA PROJETO E AVALIAÇÃO DO DESEMPENHO DE ALGORITMOS DLQR EM SISTEMAS MIMO / LEARNING BY STRENGTHENING AND ADAPTIVE DYNAMIC PROGRAMMING FOR DESIGN AND EVALUATION OF PERFORMANCE DLQR ALGORITHMS IN MIMO SYSTEMS Lopes, Leandro Rocha 04 April 2011 Made available in DSpace on 2016-08-17T14:53:16Z (GMT). No. of bitstreams: 1 Leandro Rocha Lopes.pdf: 1075564 bytes, checksum: 01e184ed6d7c65323c0dfc1515da19a3 (MD5) Previous issue date: 2011-04-04 / Driven by growing technological development and the resulting industrial applications, high-performance control techniques and reinforcement learning are being developed not only to solve new problems but also to improve the performance of controllers already implemented in real-world systems. The reinforcement learning (RL) and discrete linear quadratic regulator (DLQR) approaches are connected through adaptive dynamic programming (ADP) methods. This connection is oriented toward the design of optimal controllers for multivariable (MIMO) systems. The proposed method for tuning DLQR controllers provides guidelines for constructing biased heuristics that are applied to the selection of the weighting matrices of the instantaneous reward. The performance of the heuristics associated with the tuning of discrete linear controllers is investigated, together with the convergence aspects related to the QR variations in the heuristic dynamic programming (HDP) and action-dependent heuristic dynamic programming (ADHDP) algorithms. The algorithms and the tuning are evaluated by their ability to establish the optimal control policy that maps the Z-plane in a third-order multivariable dynamic system. Dynamic Programming Optimal Control HDP Q-Function ADHDP Multivariable Systems Convergence DLQR
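The link between HDP and the DLQR described in this abstract can be illustrated with a minimal sketch, assuming a known linear model: for a quadratic value function V(x) = x'Px, each HDP-style value update reduces to one step of the discrete Riccati recursion. The third-order, two-input system, the weighting matrices Q and R, and all function names below are hypothetical choices for illustration and are not taken from the thesis.

```python
# Value iteration on the quadratic value function V(x) = x' P x for a DLQR problem.
# The plant (A, B) and the weights (Q, R) are illustrative assumptions.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y, sign=1.0):
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inv2(M):
    # Inverse of a 2x2 matrix (two control inputs are assumed).
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def hdp_riccati_step(P, A, B, Q, R):
    # One update P <- Q + A'PA - A'PB (R + B'PB)^-1 B'PA, plus the greedy gain for P.
    At, Bt = transpose(A), transpose(B)
    S = inv2(add(R, matmul(matmul(Bt, P), B)))
    K = matmul(S, matmul(matmul(Bt, P), A))
    AtPA = matmul(matmul(At, P), A)
    AtPBK = matmul(matmul(matmul(At, P), B), K)
    return add(Q, add(AtPA, AtPBK, -1.0)), K

def solve_dlqr_by_value_iteration(A, B, Q, R, tol=1e-10, max_iter=10000):
    n = len(A)
    P = [[0.0] * n for _ in range(n)]  # start from V = 0
    for iteration in range(1, max_iter + 1):
        P_next, _ = hdp_riccati_step(P, A, B, Q, R)
        change = max(abs(p - q) for rp, rq in zip(P_next, P) for p, q in zip(rp, rq))
        P = P_next
        if change < tol:
            break
    _, K = hdp_riccati_step(P, A, B, Q, R)  # gain associated with the final P
    return P, K, iteration

def quad(x, M):
    # x' M x for a column vector x stored as a list of one-element rows.
    return matmul(matmul(transpose(x), M), x)[0][0]

A = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.1, 0.0, 1.05]]
B = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
Q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0], [0.0, 1.0]]

P, K, iterations = solve_dlqr_by_value_iteration(A, B, Q, R)

# Compare the predicted cost x0' P x0 with the cost accumulated under u = -K x.
x0 = [[1.0], [-1.0], [0.5]]
predicted_cost = quad(x0, P)
x, simulated_cost = x0, 0.0
for _ in range(300):
    u = [[-v for v in row] for row in matmul(K, x)]
    simulated_cost += quad(x, Q) + quad(u, R)
    x = add(matmul(A, x), matmul(B, u))
final_state_norm = sum(v[0] ** 2 for v in x) ** 0.5

print("iterations:", iterations)
print("predicted cost:", predicted_cost, "simulated cost:", simulated_cost)
```

Changing Q and R in this sketch changes both the converged gain and the number of iterations needed, which is the kind of tuning and convergence relationship the abstract refers to. The model-free ADHDP variant would estimate a Q-function from data instead of using A and B directly.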
2	Using Reinforcement Learning to Evaluate Player Pair Performance in Ice Hockey Ljung, Dennis January 2021 A recent study using reinforcement learning with a Q-function to quantify the impact of individual player actions in ice hockey has shown promising results. The model takes into account the context of the actions and captures internal dynamic features of the play that simple common metrics, e.g., counting goals or assists, do not. It also performs look-ahead, which is important in a low-scoring game like ice hockey. However, it does not capture the chemistry between players, i.e., how well the players play together, which is important in a team sport like ice hockey. In this paper, we therefore extend this earlier work on individual player performance with new metrics for the impact of player pairs when they are on the ice together. Our resulting top pairings are compared to the NHL's official statistics, and an extended analysis investigates the relationship with time on ice, which provides insights that could be of relevance to coaches. Computer Engineering Datorteknik
3	Bi-fractional transforms in phase space Agyo, Sanfo David January 2016 The displacement operator is related to the displaced parity operator through a two-dimensional Fourier transform. Both operators are important in phase space, and the trace of each with the density operator gives the Wigner function (displaced parity operator) and the Weyl function (displacement operator). The generalisation of the parity-displacement operator relationship considered here is called the bi-fractional displacement operator, O(α, β; θα, θβ). Additionally, the bi-fractional displacement operators lead to the novel concept of bi-fractional coherent states. The generalisation from the Fourier transform to the fractional Fourier transform can be applied to other phase space functions. The case of the Wigner-Weyl function is considered and a generalisation is given, which is called the bi-fractional Wigner function, H(α, β; θα, θβ). Furthermore, the Q-function and P-function are also generalised to give the bi-fractional Q-functions and bi-fractional P-functions respectively. The generalisation is likewise applied to the Moyal star product and the Berezin formalism for products of non-commuting operators. These are called the bi-fractional Moyal star product and the bi-fractional Berezin formalism. Finally, the analysis and implications of these bi-fractional transforms for the Heisenberg uncertainty principle and photon statistics, together with future applications, are discussed.
4	Bi-fractional transforms in phase space Agyo, Sanfo D. January 2016 The displacement operator is related to the displaced parity operator through a two-dimensional Fourier transform. Both operators are important in phase space, and the trace of each with the density operator gives the Wigner function (displaced parity operator) and the Weyl function (displacement operator). The generalisation of the parity-displacement operator relationship considered here is called the bi-fractional displacement operator, O(α, β; θα, θβ). Additionally, the bi-fractional displacement operators lead to the novel concept of bi-fractional coherent states. The generalisation from the Fourier transform to the fractional Fourier transform can be applied to other phase space functions. The case of the Wigner-Weyl function is considered and a generalisation is given, which is called the bi-fractional Wigner function, H(α, β; θα, θβ). Furthermore, the Q-function and P-function are also generalised to give the bi-fractional Q-functions and bi-fractional P-functions respectively. The generalisation is likewise applied to the Moyal star product and the Berezin formalism for products of non-commuting operators. These are called the bi-fractional Moyal star product and the bi-fractional Berezin formalism. Finally, the analysis and implications of these bi-fractional transforms for the Heisenberg uncertainty principle and photon statistics, together with future applications, are discussed.
5	Value Function Estimation in Optimal Control via Takagi-Sugeno Models and Linear Programming Díaz Iza, Henry Paúl 23 March 2020 This thesis employs dynamic programming and reinforcement learning techniques to obtain optimal policies for controlling nonlinear systems with discrete and continuous states and actions. Initially, the basic concepts of dynamic programming and reinforcement learning are reviewed for systems with a finite number of states. The extension of these techniques to systems with a large number of states, or with continuous states, is then analysed using function approximators. The contributions of the thesis are: (1) A combined identification and Q-function fitting methodology, which involves the identification of a Takagi-Sugeno model, the computation of (sub)optimal controllers from linear matrix inequalities, and the subsequent data-based fitting of the Q-function via monotonic optimisation. (2) A methodology for learning controllers using approximate dynamic programming via linear programming (ADP-LP). The methodology makes the ADP-LP approach workable in practical control applications with continuous state and input spaces. It estimates a lower bound and an upper bound of the optimal value function through function approximators. Guidelines are provided for data and regressor regularisation in order to obtain satisfactory results while avoiding unbounded or ill-conditioned solutions. (3) A methodology based on approximate dynamic programming via linear programming that obtains a better approximation of the optimal value function in a specific region of the state space. The methodology gradually learns a policy using only the data available in the exploration region. The exploration progressively enlarges the learning region until a converged policy is obtained. / This work was supported by the National Department of Higher Education, Science, Technology and Innovation of Ecuador (SENESCYT), and by the Spanish Ministry of Economy and the European Union, grant DPI2016-81002-R (AEI/FEDER, UE). The author also received a grant for a predoctoral stay, Programa de Becas Iberoamérica-Santander Investigación 2018, from Santander Bank. / Díaz Iza, HP. (2020). Value Function Estimation in Optimal Control via Takagi-Sugeno Models and Linear Programming [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/139135 Optimal control Linear programming Approximate dynamic programming Control applications Neural networks Reinforcement learning Takagi-Sugeno Linear matrix inequality Intelligent control INGENIERIA DE SISTEMAS Y AUTOMATICA
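The lower-bound estimate mentioned in this abstract can be illustrated with the generic linear program used in ADP-LP. This is a sketch of the textbook formulation, not necessarily the exact program used in the thesis. It assumes a deterministic system, a stage cost \ell(x,u) \ge 0 to be minimised, a discount factor \gamma \in (0,1), transition data (x_i, u_i, x_i^{+}), a linear-in-parameters approximator V_\theta(x) = \theta^{\top}\phi(x), and positive state weights c_i; all of these symbols are assumptions introduced for illustration.

```latex
\begin{aligned}
\max_{\theta}\quad & \sum_{i=1}^{N} c_i\, \theta^{\top}\phi(x_i) \\
\text{s.t.}\quad & \theta^{\top}\phi(x_i) \;\le\; \ell(x_i,u_i) + \gamma\, \theta^{\top}\phi(x_i^{+}),
\qquad i = 1,\dots,N .
\end{aligned}
```

Any function that satisfies the Bellman inequality V \le TV for every state and action is a lower bound on the optimal value function, so maximising the weighted sum pushes V_\theta up toward it. With finite data the inequality is enforced only at the samples, and directions of \theta that the data do not constrain can make the program unbounded, which is consistent with the abstract's emphasis on data and regressor regularisation.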
6	Performance analysis of spectrum sensing techniques for cognitive radio systems Gismalla Yousif, Ebtihal January 2013 Cognitive radio is a technology that aims to maximize the usage of the licensed frequency spectrum. It aims to provide services for license-exempt users by making use of dynamic spectrum access (DSA) and opportunistic spectrum sharing (OSS) strategies. Cognitive radios are defined as intelligent wireless devices capable of adapting their communication parameters in order to operate within underutilized bands while avoiding interference to licensed users. An underused band of frequencies in a specific location or time is known as a spectrum hole. Reliable spectrum sensing algorithms are therefore crucial for locating spectrum holes and for facilitating the evolution of cognitive radio networks. Since a large and growing body of literature has focused mainly on the conventional time domain (TD) energy detector, this thesis investigates the problem of spectrum sensing within the context of a frequency domain (FD) approach. The purpose of this study is to investigate detection based on methods of nonparametric power spectrum estimation. The considered methods are the periodogram, Bartlett's method, Welch overlapped segments averaging (WOSA) and the multitaper estimator (MTE). Another major motivation is that the MTE is strongly recommended for cognitive radio applications. This study aims to derive the detector performance measures for each case. Another aim is to investigate and highlight the main differences between the TD and the FD approaches. The performance is addressed for independent and identically distributed (i.i.d.) Rayleigh channels and for the general Rician and Nakagami fading channels. 
For each of the investigated detectors, the analytical models are obtained by studying the characteristics of the Hermitian quadratic form representation of the decision statistic, and the matrix of the Hermitian form is identified. The results of the study reveal the high accuracy of the derived mathematical models. Moreover, the TD detector is found to differ from the FD detector in a number of aspects. One principal and general conclusion is that all the investigated FD methods provide a reduced probability of false alarm when compared with the TD detector. Also, for the periodogram, the probability of sensing errors is independent of the length of the observations, whereas in the time domain the probability of false alarm increases with the sample size. The probability of false alarm is further reduced when diversity reception is employed. Furthermore, compared to the periodogram, both Bartlett's method and Welch's method provide better performance in terms of a lower probability of false alarm and an increased probability of detection for a given probability of false alarm. The performance of both Bartlett's method and WOSA is sensitive to the number of segments, and WOSA is also sensitive to the overlapping factor. Finally, the performance of the MTE depends on the number of discrete prolate spheroidal (Slepian) sequences employed, and the MTE outperforms the periodogram, Bartlett's method and WOSA, as it provides the minimal probability of false alarm.
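Two of the nonparametric estimators named in this abstract can be sketched in a few lines, using only the standard library. The example below is an illustration under stated assumptions, not the detectors analysed in the thesis: it uses real white Gaussian noise, a plain O(N²) DFT, and hypothetical function names. It shows that the periodogram ordinates sum to the time-domain energy (Parseval's relation) and that averaging over segments, as in Bartlett's method, reduces the spread of the spectrum estimate.

```python
import cmath
import random

def dft(x):
    # Direct discrete Fourier transform; adequate for short illustrative records.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def periodogram(x):
    # Periodogram ordinates |X_k|^2 / N.
    n = len(x)
    return [abs(c) ** 2 / n for c in dft(x)]

def bartlett(x, num_segments):
    # Bartlett's method: average the periodograms of non-overlapping segments.
    seg_len = len(x) // num_segments
    spectra = [periodogram(x[i * seg_len:(i + 1) * seg_len])
               for i in range(num_segments)]
    return [sum(vals) / num_segments for vals in zip(*spectra)]

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

rng = random.Random(0)  # fixed seed so the run is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(256)]

time_energy = sum(v * v for v in noise)  # time-domain energy statistic
p_full = periodogram(noise)              # one periodogram over all 256 samples
p_bartlett = bartlett(noise, 8)          # 8 segments of 32 samples each

print("time-domain energy:      ", time_energy)
print("sum of periodogram bins: ", sum(p_full))
print("spread of periodogram:   ", variance(p_full))
print("spread of Bartlett est.: ", variance(p_bartlett))
```

The number of segments trades frequency resolution for a smoother estimate, which matches the abstract's remark that Bartlett's method and WOSA are sensitive to the number of segments.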
