
Non-linear neurocontrol of chemical processes using reinforcement learning

Thesis (MScEng)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: The difficulties of chemical process control using plain Proportional-Integral-Derivative
(PID) methods include interaction between the manipulated and controlled process
variables as well as difficulty in tuning. One way of eliminating these problems is to
use a centralized non-linear control solution such as a feed-forward neural network.
While many ways exist to train such neurocontrollers, one of the promising
active research areas is reinforcement learning. The main attraction of
neurocontrol trained by reinforcement learning is that no expert knowledge of
the system is necessary - all control knowledge is gained by interaction with the
plant model.
This work uses episodic reinforcement learning to train controllers using two
types of process model - non-linear dynamic models and non-linear autoregressive
models. The first was termed model-based training and the second data-based learning.
By testing the controllers obtained during data-based learning on the original
model, the effect of plant model mismatch and therefore real-world applicability
could be seen. In addition, two reinforcement learning algorithms, Policy Gradients
with Parameter-based Exploration (PGPE) and the Covariance Matrix Adaptation
Evolution Strategy (CMA-ES), were compared with one another. Set point tracking
was facilitated by the use of integral error feedback.
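
To make the idea of integral error feedback concrete, here is a minimal sketch under stated assumptions. The abstract does not give the thesis's network architecture, so the layout below (the set point error and its running integral fed as inputs to a one-hidden-layer feed-forward network), the toy first-order plant, the names `controller` and `run_episode`, and all weights are hypothetical illustrations, not the author's implementation. The episode return is the kind of scalar that an episodic learner such as PGPE or CMA-ES could maximize over the network weights.

```python
import math


def controller(weights, error, integral_error):
    """Feed-forward controller with one tanh hidden layer (hypothetical layout).

    weights = (hidden_weights, output_weights); each hidden unit is a tuple
    (w_error, w_integral, bias). The output layer is linear.
    """
    hidden_weights, output_weights = weights
    hidden = [math.tanh(w_e * error + w_i * integral_error + b)
              for (w_e, w_i, b) in hidden_weights]
    return sum(w * h for w, h in zip(output_weights, hidden))


def run_episode(weights, set_point=1.0, disturbance=0.5, steps=200, dt=0.1):
    """Run one episode on a toy plant dx/dt = -x + u + d (an assumption).

    Returns (episode_return, final_error), where the return is the negative
    integrated absolute set point error over the episode.
    """
    x = 0.0
    integral_error = 0.0
    episode_return = 0.0
    for _ in range(steps):
        error = set_point - x
        integral_error += error * dt          # running integral of the error
        u = controller(weights, error, integral_error)
        x += dt * (-x + u + disturbance)      # explicit Euler step of the plant
        episode_return -= abs(error) * dt
    return episode_return, set_point - x


# Hand-picked weights for illustration only (not learned, not from the thesis).
with_integral = ([(1.0, 1.0, 0.0)], [2.0])     # network sees error and integral
without_integral = ([(1.0, 0.0, 0.0)], [2.0])  # integral input weighted by zero

return_i, final_error_i = run_episode(with_integral)
return_p, final_error_p = run_episode(without_integral)
print("with integral feedback:   ", return_i, final_error_i)
print("without integral feedback:", return_p, final_error_p)
```

With the integral input available, the network output can settle at a non-zero value while the error itself is zero, which is what removes the steady-state offset caused by the constant disturbance in this toy setting.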
Two control case studies were conducted to test the effectiveness of each type
of controller and algorithm, and allowed comparison to multi-loop feedback control.
The first is a ball mill grinding circuit pilot plant model with 5 degrees of freedom,
and the second a 41-stage binary distillation column with 7 degrees of freedom.
The ball mill case study showed that centralized non-linear feedback control
using neural networks can improve on even highly optimized PI control methods,
with the proposed integral error-feedback neural network architecture working very
well at tracking the set point. CMA-ES produced better results than PGPE, being
able to find up to 20% better solutions. When compared to PI control, the ball
mill neurocontrol solution had a 6% higher productivity and showed more than 10%
improvement of the product size set point tracking. In the case of some plant-model
mismatch (88% fit), the data-based ball mill neurocontroller still achieved better set
point tracking and disturbance handling than PI control, but productivity did not
improve.
The distillation case study showed less positive results. While reinforcement
learning was able to learn successful controllers in the case of no plant-model mismatch
and outperform LV- and (L/D)(V/B)-based PI control, the best-performing
neurocontroller still performed up to 20% worse than DB-based PI control. Once
again, CMA-ES showed better performance than PGPE, with the latter even failing to
find feasible control solutions.
While on-line learning in the ball mill study was not possible because of stability
issues, on-line adaptation in the distillation case study succeeded with the use of a
partial neurocontroller. The learner was able to achieve, with a success rate of
just over 50%, greater than 95% purity in both distillate and bottoms within 2,000
minutes of interacting with the plant.
Overall, reinforcement learning showed that, when there is sufficient room for
improvement over existing control implementations, it can make for a very good
replacement control solution even when no model is available. Future work should
focus on evaluating these techniques in lab-scale control studies. / AFRIKAANS SUMMARY: The problems of process control using ordinary Proportional-Integral-Derivative
(PID) methods include interaction between the manipulated and controlled process
variables, as well as difficulty in tuning. One way to eliminate these problems
is to use a centralized non-linear solution, such as a feed-forward neural
network.
There are many ways to train such neurocontrollers, of which reinforcement
learning is one of the more innovative. The biggest attraction of reinforcement
learning is that no expert knowledge of the system is needed - all control
knowledge is gained by interaction with the plant model.
This work uses episodic reinforcement learning to train controllers using
two types of process model - non-linear dynamic models and non-linear
autoregressive models. The first was termed model-based training and the second
data-based training. By testing the controllers obtained during data-based
training on the original model, the effect of the mismatch between plant and
model could be seen, giving an indication of real-world applicability. Two
reinforcement learning algorithms were compared with each other - Policy
Gradients with Parameter-based Exploration (PGPE) and the Covariance Matrix
Adaptation Evolution Strategy (CMA-ES). Set point tracking was facilitated by
integral error feedback.
Two case studies were carried out to test the effectiveness of each type of
controller and algorithm, by comparison with PI feedback control. The first is a
ball mill pilot plant with 5 degrees of freedom and the second a binary
distillation column with 7 degrees of freedom.
The ball mill case study showed that centralized non-linear feedback control
using neural networks can improve on even highly optimized PI control methods.
Compared to PI control, the ball mill neurocontrol solution maintained a 6%
higher productivity and showed more than 10% improvement in product size set
point tracking. In the case of a 12% plant-model mismatch, the data-based ball
mill neurocontroller still showed better set point tracking and disturbance
handling than PI control, although productivity did not improve. In both cases
the integral error solution proved successful, and CMA-ES performed up to 20%
better than PGPE.
The distillation case study showed that the success of the ball mill case study
does not necessarily extend to other plants. Although reinforcement learning was
able to learn successful controllers in the case of no plant-model mismatch, the
best-performing neurocontroller still performed up to 20% worse than DB-based PI
control. Once again, CMA-ES performed better than PGPE, with the latter even
failing to find working solutions.
Although instability made on-line adaptation impossible in the ball mill case
study, on-line adaptation in the distillation case study was made possible by
the use of a partial neurocontroller. The learner was able to achieve, with a
success rate of just over 50%, more than 95% purity in both outlet streams
within 2,000 minutes of interaction with the plant.
In the end, reinforcement learning showed that, when there is sufficient room
for improvement over existing control implementations, it can be a very good
replacement even when no model is available. Future work should focus on
lab-scale applications of these techniques.

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:sun/oai:scholar.sun.ac.za:10019.1/17871
Date: 12 1900
Creators: Hunter, Stephen Leon
Contributors: Aldrich, C., Stellenbosch University. Faculty of Engineering. Dept. of Process Engineering.
Publisher: Stellenbosch : Stellenbosch University
Source Sets: South African National ETD Portal
Language: en_ZA
Detected Language: Unknown
Type: Thesis
Format: 111 p. : ill.
Rights: Stellenbosch University
