1 |
Performance improvement for stochastic systems using state estimation. Zhou, Yuyang. January 2018.
Recent developments in practical control engineering have heightened the need for performance enhancement. A well-designed controller should not only drive the controlled variables to their set-point values but also address broader measures of system performance such as product quality and efficiency. Since noise is unavoidable in industrial processes, the randomness of the tracking error is a critical aspect of performance that can be improved further. Moreover, because many industrial controllers cannot be retuned once their parameters have been fixed, it is crucial to design a control algorithm that minimises the randomness of the tracking error without altering the existing closed-loop control. To achieve these objectives, this thesis proposes a class of novel algorithms for different types of systems with unmeasurable states. Without changing the existing closed-loop proportional-integral (PI) controller, an additional compensative controller is introduced to reduce the randomness of the tracking error. The PI controller therefore always guarantees the basic tracking property, while the compensative signal can be removed at any time without affecting normal operation. Rather than using only the output information, as the PI controller does, the compensative controller minimises the randomness of the tracking error using estimated state information; since most system states are unmeasurable, suitable filters are employed to estimate them. Based on stochastic system control theory, different criteria are used to characterise the randomness of different classes of systems, so a brief review of the basic concepts of stochastic system control is included in the thesis. More specifically, the thesis covers overshoot minimisation for linear deterministic systems, minimum variance control for linear Gaussian stochastic systems, and minimum entropy control for non-linear and non-Gaussian stochastic systems. Furthermore, the stability of each system is analysed in the mean-square sense. Simulation results are given to illustrate the effectiveness of the presented control methods. Finally, the work of the thesis is summarised and future work addressing the limitations of the proposed algorithms is outlined.
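As a rough illustration of the control structure this abstract describes, the sketch below simulates an assumed linear Gaussian plant under a fixed PI loop and adds a removable compensative signal computed from Kalman-filtered state estimates. The plant model, noise levels, PI gains, and the crude gain search are illustrative assumptions, not the thesis's actual algorithms.

```python
# Minimal sketch: fixed PI loop + removable compensative signal from Kalman estimates.
# All models and gains are illustrative, chosen only to show the structure.
import numpy as np

# Illustrative linear Gaussian plant: x_{k+1} = A x_k + B u_k + w_k,  y_k = C x_k + v_k
A = np.array([[0.7, 0.2],
              [0.0, 0.5]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)          # process-noise covariance
R = np.array([[0.04]])        # measurement-noise covariance


def simulate(K_comp, steps=2000, setpoint=1.0, kp=0.8, ki=0.3):
    """Run the closed loop with the fixed PI law plus compensative gain K_comp."""
    rng = np.random.default_rng(0)            # same noise realisation for every candidate
    x = np.zeros((2, 1))                      # true (unmeasured) state
    x_hat = np.zeros((2, 1))                  # Kalman estimate
    P = np.eye(2)
    integ, errors = 0.0, []
    for _ in range(steps):
        y = (C @ x).item() + rng.normal(0.0, np.sqrt(R[0, 0]))   # noisy output
        # Kalman correction with the new measurement
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        x_hat = x_hat + K * (y - (C @ x_hat).item())
        P = (np.eye(2) - K @ C) @ P
        # fixed PI law (output only) plus the removable compensative signal (estimated states)
        e = setpoint - y
        integ += e
        u = kp * e + ki * integ + (K_comp @ x_hat).item()
        # Kalman prediction and true plant update
        x_hat = A @ x_hat + B * u
        P = A @ P @ A.T + Q
        x = A @ x + B * u + rng.multivariate_normal(np.zeros(2), Q).reshape(2, 1)
        errors.append(e)
    return float(np.var(errors[steps // 2:]))  # tracking-error randomness after the transient


# Crude search over a few compensative gains; zero gain recovers the original PI loop.
candidates = [np.array([[k1, k2]]) for k1 in (0.0, -0.15, -0.3) for k2 in (0.0, -0.15, -0.3)]
best = min(candidates, key=simulate)
print("tracking-error variance, PI only:       ", simulate(np.zeros((1, 2))))
print("tracking-error variance, best candidate:", simulate(best), "with gain", best)
```

The example uses error variance as the randomness measure, matching the minimum variance case; the thesis also considers overshoot and entropy criteria for the other system classes.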
2 |
Intrinsic exploration for reinforcement learning beyond rewards. Creus-Castanyer, Roger. 07 1900.
In reinforcement learning, a reward function is used to guide the agent's behavior towards task-specific objectives. However, such extrinsic rewards often fall short in complex environments due to the significant human effort required for their design. This thesis explores intrinsic rewards as an alternative, focusing on their potential to enable agents to learn autonomously and explore in an unsupervised manner.
First, we identify a fundamental issue with many intrinsic rewards: their non-stationarity, which complicates the optimization process. To mitigate this, we propose Stationary Objectives For Exploration (SOFE), which transforms non-stationary rewards into stationary ones through augmented state representations and achieves performance gains across various intrinsic reward methods and environments.
Secondly, we present S-Adapt, a novel approach to adaptive intrinsic motivation based on entropy control. This adaptive mechanism, framed as a multi-armed bandit problem, empowers agents to exhibit emergent behaviors in diverse settings without extrinsic rewards.
Finally, we introduce RLeXplore, a comprehensive framework that standardizes the implementation of eight state-of-the-art intrinsic reward methods. This framework addresses the lack of consistency in the optimization and implementation details of intrinsic rewards, thereby accelerating research progress in intrinsically motivated RL.
Collectively, these contributions advance the understanding and application of intrinsic motivation in RL, demonstrating its viability for developing more autonomous agent behavior across a spectrum of challenging environments.
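As a toy illustration of the non-stationarity issue and the state-augmentation idea behind SOFE, the sketch below uses an assumed count-based bonus over a handful of discrete states; the bonus, the one-hot encoding, and the augmentation format are illustrative assumptions and are not taken from the thesis.

```python
# Toy illustration: a count-based bonus 1/sqrt(N(s)) changes as training progresses, so it
# is non-stationary as a function of the state alone; appending the visit counts to the
# observation makes the same bonus a fixed function of the augmented input.
from collections import defaultdict
import numpy as np

counts = defaultdict(int)          # visit counts N(s), updated online during training

def intrinsic_reward(state):
    """Non-stationary in `state`: the returned value shrinks as counts grow."""
    counts[state] += 1
    return 1.0 / np.sqrt(counts[state])

def augmented_obs(state, num_states=5):
    """Stationary view: the bonus is a fixed function of (state, counts)."""
    count_vec = np.array([counts[s] for s in range(num_states)], dtype=np.float64)
    return np.concatenate([np.eye(num_states)[state], count_vec])

# Visiting the same state yields a shrinking bonus over time (a drifting target)...
print([round(intrinsic_reward(0), 3) for _ in range(4)])   # [1.0, 0.707, 0.577, 0.5]
# ...but the augmented observation exposes the counts, so the mapping from input to bonus
# no longer drifts: the bonus is read directly off the count of the visited state.
print(augmented_obs(0))   # one-hot state followed by the current visit counts
```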