In this thesis, a Reinforcement Learning (RL) method called Sarsa is used to dynamically tune a PI-controller for a Continuous Stirred Tank Heater (CSTH) experimental setup. The proposed approach uses an approximate model to train the RL agent in a simulation environment before implementation on the real plant, so that the agent starts from a reasonably stable initial policy. Learning without any information about the process dynamics is not practically feasible, both because of the large amount of data (and hence time) that the RL algorithm requires and because of safety concerns.
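As a concrete illustration, the following is a minimal sketch of the on-policy Sarsa(0) update underlying such an agent. It is written in tabular form for readability (the thesis itself uses the function approximation described below), and the state discretization, action set, and parameter values are illustrative assumptions, not the thesis's exact design.

```python
# Minimal tabular Sarsa(0) sketch; sizes and hyperparameters are assumptions.
import numpy as np

n_states, n_actions = 10, 3        # assumed coarse discretization of the tracking error
alpha, gamma, epsilon = 0.1, 0.95, 0.1
Q = np.zeros((n_states, n_actions))

def epsilon_greedy(s):
    """Pick a random action with probability epsilon, else the greedy one."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def sarsa_step(s, a, r, s_next):
    """One Sarsa(0) update: Q(s,a) += alpha * [r + gamma * Q(s',a') - Q(s,a)]."""
    a_next = epsilon_greedy(s_next)
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
    return a_next  # on-policy: the action used in the target is also executed next
```

Because Sarsa is on-policy, the action used in the update target is the action the agent actually takes next, so the learned policy reflects the exploration that occurs during training.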
The process in this thesis is modeled with a First Order Plus Time Delay (FOPTD) transfer function, because most chemical processes can be adequately represented by this class of models. The presence of the delay term makes these models inherently more challenging for RL methods.
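For reference, the standard FOPTD form is shown below, where K denotes the steady-state gain, \tau the time constant, and \theta the time delay; the thesis's identified parameter values are not restated here.

```latex
% Standard FOPTD transfer function.
G(s) = \frac{K\, e^{-\theta s}}{\tau s + 1}
```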
RL methods must be combined with generalization techniques to handle the continuous state space. Here, a parameterized quadratic function approximation is combined with a k-nearest neighbor function approximation, used for the regions close to and far from the origin, respectively. Applying either of these generalization methods on its own has drawbacks, hence their combination is used to overcome these flaws, as sketched below.
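The following is a minimal sketch of such a hybrid approximator, assuming a scalar state (e.g., the tracking error), a simple switch-over radius, and plain quadratic features; the thesis's exact features, threshold, and update rules may differ.

```python
# Sketch: quadratic value function near the origin, k-NN averaging elsewhere.
import numpy as np

class HybridValueFunction:
    def __init__(self, k=5, radius=0.5):
        self.k = k
        self.radius = radius       # assumed switch-over distance from the origin
        self.w = np.zeros(3)       # weights for [1, e, e^2] quadratic features
        self.memory = []           # stored (state, value) samples for k-NN

    def _features(self, e):
        return np.array([1.0, e, e * e])

    def value(self, e):
        if abs(e) < self.radius:   # near the origin: parameterized quadratic
            return float(self.w @ self._features(e))
        if not self.memory:
            return 0.0
        dists = np.array([abs(e - s) for s, _ in self.memory])
        idx = np.argsort(dists)[: self.k]   # far from origin: k-NN average
        return float(np.mean([self.memory[i][1] for i in idx]))

    def update(self, e, target, lr=0.05):
        if abs(e) < self.radius:
            phi = self._features(e)
            self.w += lr * (target - self.w @ phi) * phi  # gradient step on the quadratic
        else:
            self.memory.append((e, target))               # store sample for k-NN
```

The intuition behind the split is that the quadratic fit interpolates smoothly where data are dense (near the set-point), while k-NN avoids extrapolating the quadratic into sparsely visited regions far from the origin.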
The proposed RL-based PI-controller is initially trained in the simulation environment. Thereafter, the policy of the simulation-based RL agent is used as the starting policy of the RL agent during implementation on the experimental setup. Because of the existing plant-model mismatch, the performance of the RL-based PI-controller under this initial policy is not as good as the simulation results; however, further training on the real plant improves the performance significantly. For comparison, IMC-tuned PI-controllers, which are among the most commonly used feedback controllers, are also evaluated; their performance likewise degrades because of the inevitable plant-model mismatch. Improving these IMC-tuned PI-controllers would require re-tuning them based on a more precise model of the process.
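For reference, one common IMC-PI tuning rule for a FOPTD model is shown below (based on a first-order Taylor approximation of the delay), where \lambda is the designer-chosen closed-loop filter time constant; the thesis's precise tuning rule may differ.

```latex
% A common IMC-PI rule for G(s) = K e^{-\theta s} / (\tau s + 1).
K_c = \frac{\tau}{K(\lambda + \theta)}, \qquad \tau_I = \tau
```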
Experimental tests are carried out for both set-point tracking and disturbance rejection; in both cases, the adaptability of the RL-based PI-controller is clearly evident.
Finally, when a disturbance enters the process, the performance of the proposed model-free self-tuning PI-controller initially degrades more than that of the existing IMC-tuned controllers. However, the adaptability of the RL-based PI-controller provides a good solution to this problem: after being trained to handle disturbances in the process, an improved control policy is obtained that successfully returns the output to the set-point.
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:AEU.10048/1122
Date | 06 1900 |
Creators | Abbasi Brujeni, Lena |
Contributors | Dr. Jong Min Lee (Chemical and Materials Engineering), Dr. Sirish L. Shah (Chemical and Materials Engineering), Dr. Vinay Prasad (Chemical and Materials Engineering), Dr. Richard Sutton (Computing Science) |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | English
Subjects | Process Control
Type | Thesis |
Format | 597017 bytes, application/pdf |