
Process monitoring and fault diagnosis using random forests

Thesis (PhD (Process Engineering))--University of Stellenbosch, 2010. / Dissertation presented for the Degree of DOCTOR OF PHILOSOPHY (Extractive Metallurgical Engineering) in the Department of Process Engineering at the University of Stellenbosch

ENGLISH ABSTRACT: Fault diagnosis is an important component of process monitoring, relevant in the greater context of developing safer, cleaner and more cost-efficient processes. Data-driven unsupervised (or feature-extraction) approaches to fault diagnosis exploit the many measurements available on modern plants. Some current unsupervised approaches are hampered by their linearity assumptions, motivating the investigation of nonlinear methods. The diversity of data structures also motivates the investigation of novel feature extraction methodologies in process monitoring.

Random forests are recently proposed statistical inference tools that derive their predictive accuracy from the nonlinear nature of their constituent decision trees and the power of ensembles. Random forest committees provide more than just predictions: model information on data proximities can be exploited to provide random forest features. Variable importance measures show which variables are closely associated with a chosen response variable, while partial dependencies indicate how those important variables relate to that response variable.
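
The abstract does not spell out how proximities become features. One common construction, assumed here and not confirmed as the thesis's exact procedure, is Breiman's unsupervised random forest: the real data are contrasted with a synthetic copy whose columns are permuted independently, the proximity of two samples is the fraction of trees in which they reach the same leaf, and the dissimilarity sqrt(1 - proximity) is embedded with classical multidimensional scaling. The sketch below uses scikit-learn; the function name `rf_proximity_features` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rf_proximity_features(X, n_components=2, n_trees=200, seed=0):
    """Sketch of unsupervised random forest features (assumed construction).

    Returns (features, proximity) for the rows of X.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Synthetic class: every column permuted independently, which keeps the
    # marginal distributions but destroys the correlations between variables.
    X_syn = np.column_stack([rng.permutation(col) for col in X.T])
    X_all = np.vstack([X, X_syn])
    y_all = np.concatenate([np.ones(len(X)), np.zeros(len(X_syn))])

    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X_all, y_all)

    # leaves[i, t] is the index of the leaf that sample i reaches in tree t.
    leaves = forest.apply(X)
    # Proximity: fraction of trees in which two samples share a leaf.
    prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

    # Classical MDS with squared dissimilarities 1 - proximity.
    d2 = 1.0 - prox
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ d2 @ J                       # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:n_components]
    features = evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))
    return features, prox
```

For example, `features, prox = rf_proximity_features(X)` returns a two-column feature matrix for a data matrix `X` of normal operating data. Because these features are defined through leaf co-membership rather than Euclidean distance, they need not preserve the geometry of the original variable space.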

The purpose of this study was therefore to investigate the feasibility of a new unsupervised method based on random forests as a viable addition to the family of statistical process monitoring tools. The hypothesis investigated was that unsupervised process monitoring and fault diagnosis can be improved by using features extracted from data with random forests, with further interpretation of fault conditions aided by random forest tools. The experimental results presented in this work support this hypothesis.

An initial study was performed to assess the quality of random forest features. Random forest features were shown to be generally difficult to interpret in terms of the geometry of the original variable space. Random forest mapping and demapping models were shown to be very accurate on training data, but to extrapolate poorly to unseen data that do not fall within regions populated by training data.
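
The weak extrapolation, and the constrained responses invoked later to explain residual-based detection, follow from how regression trees predict: each leaf returns an average of training targets, so a forest's output cannot leave the range of the training responses. The scikit-learn sketch below illustrates this on a hypothetical linear relation y = 2x, trained only on x in [0, 1]; the function name `extrapolation_demo` is illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def extrapolation_demo(seed=0):
    """Fit a forest on x in [0, 1] with y = 2x, then query inside and far outside."""
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(0.0, 1.0, size=(200, 1))
    y_train = 2.0 * x_train.ravel()             # hypothetical linear relation

    forest = RandomForestRegressor(n_estimators=100, random_state=seed)
    forest.fit(x_train, y_train)

    # One query inside the training region, two far outside it.
    x_query = np.array([[0.5], [5.0], [-3.0]])
    return forest.predict(x_query), y_train


preds, y_train = extrapolation_demo()
# Every leaf value is an average of training targets, so predictions
# stay within [min(y_train), max(y_train)] no matter how far x moves.
print(preds)
```

Queried at x = 5, where the true value is 10, the forest still returns a value of at most 2, the upper end of the training targets. A demapping model built this way therefore cannot reproduce a sample lying outside the training region, which is consistent with such samples showing up as large reconstruction errors.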

Random forest feature extraction was applied to unsupervised fault diagnosis for steady-state process data and compared to linear and nonlinear methods. Random forest results were comparable to those of existing techniques, with the majority of random forest detections due to variable reconstruction errors. Further investigation revealed that the success of residual-based detection with random forests originates from the constrained responses and poor generalization of decision trees. Random forest variable importance measures and partial dependencies were incorporated into a visualization tool to allow for the interpretation of fault conditions.
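
Variable importance and partial dependence can be illustrated on a hypothetical synthetic response. The sketch below reads scikit-learn's impurity-based `feature_importances_` and computes a one-dimensional partial dependence by hand: variable j is fixed at each grid value for all samples, and the forest's predictions are averaged. The data, the response and the helper names `partial_dependence_1d` and `importance_pd_demo` are assumptions for illustration; the importance measure used in the thesis (for example, a permutation-based one) may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def partial_dependence_1d(model, X, j, grid):
    """Average prediction when variable j is set to each grid value for all rows,
    with every other variable left at its observed value."""
    X = np.asarray(X, dtype=float)
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v
        values.append(model.predict(X_mod).mean())
    return np.array(values)


def importance_pd_demo(seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(400, 3))
    # Hypothetical response: strong effect of x0, weak effect of x1, none of x2.
    y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.05 * rng.normal(size=400)

    model = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X, y)
    grid = np.linspace(0.1, 0.9, 5)
    return model.feature_importances_, partial_dependence_1d(model, X, 0, grid)


importances, pd_x0 = importance_pd_demo()
print("impurity-based importances:", importances)
print("partial dependence of y on x0:", pd_x0)
```

On this data the importance of x0 dominates, and its partial dependence rises roughly linearly with slope 3, matching the generating relation. In a fault-interpretation setting the response would instead be a quantity chosen by the analyst, for instance a label separating normal from faulty samples; that choice is an assumption, not a detail given in the abstract.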

A dynamic change point detection application with random forests proved more successful than an existing approach based on principal component analysis, with the success of the random forest method again residing in reconstruction errors.
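
A distance-to-model diagnostic built from mapping and demapping models can be sketched as follows, under stated assumptions: one random forest maps the process variables X to features T, a second forest maps T back to a reconstruction X_hat, and the squared prediction error SPE (the sum over variables of (x - x_hat)^2) is compared with a control limit. Because forests fit training data almost perfectly, the limit here is set from a separate set of normal validation data (99th percentile). For self-containment, T is taken as the first two principal component scores rather than random forest features, and the class `RFDistanceToModel`, the data and the bias fault are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


class RFDistanceToModel:
    """Hypothetical mapping/demapping monitor: X -> T -> X_hat, then SPE."""

    def __init__(self, n_trees=100, seed=0):
        self.mapper = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        self.demapper = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        self.limit = None

    def fit(self, X_train, T_train):
        self.mapper.fit(X_train, T_train)      # mapping: variables -> features
        self.demapper.fit(T_train, X_train)    # demapping: features -> variables
        return self

    def spe(self, X):
        """Squared prediction error between X and its reconstruction."""
        X_hat = self.demapper.predict(self.mapper.predict(X))
        return ((X - X_hat) ** 2).sum(axis=1)

    def set_limit(self, X_normal, quantile=0.99):
        # Training SPE is optimistically small for forests, so use held-out normal data.
        self.limit = np.quantile(self.spe(X_normal), quantile)
        return self


def dtm_demo(seed=0):
    rng = np.random.default_rng(seed)

    def normal_data(n):
        t = rng.normal(size=(n, 2))                         # two latent factors
        X = np.column_stack([t[:, 0], t[:, 1], t[:, 0] + t[:, 1]])
        return X + 0.05 * rng.normal(size=X.shape)

    X_train, X_val, X_test = normal_data(300), normal_data(200), normal_data(50)
    X_fault = X_test.copy()
    X_fault[:, 2] += 6.0                                    # hypothetical bias fault

    # Stand-in features: first two principal component scores, NOT random forest features.
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    T_train = (X_train - mu) @ Vt[:2].T

    monitor = RFDistanceToModel(seed=seed).fit(X_train, T_train).set_limit(X_val)
    false_alarm_rate = np.mean(monitor.spe(X_test) > monitor.limit)
    detection_rate = np.mean(monitor.spe(X_fault) > monitor.limit)
    return false_alarm_rate, detection_rate
```

With a +6 bias on the third variable, nearly all faulty samples exceed the limit while few normal test samples do. Each reconstruction is an average of training rows, so it stays on the correlation structure seen in training; a fault that breaks that structure cannot be absorbed and appears in the residual.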

The addition of random forest fault diagnosis and change point detection algorithms to a suite of abnormal event detection techniques is recommended. The distance-to-model diagnostic based on random forest mapping and demapping proved successful in this work, and the theoretical understanding gained supports the application of this method to further data sets.

AFRIKAANS SUMMARY (English translation): Fault diagnosis is an important component of process monitoring, and is relevant within the greater context of developing safer, cleaner and more cost-effective processes. Data-driven unsupervised or feature-extraction approaches to fault diagnosis exploit the many measurements available on modern process plants. Some current unsupervised approaches are hampered by assumptions of linearity, which motivates the investigation of nonlinear methods. The diversity of data structures further motivates the investigation of new feature extraction methods in process monitoring.

Random forests are a new statistical inference technique whose accuracy can be attributed to the nonlinear nature of their decision tree members and the capability of ensembles. Random forest committees provide more than just predictions; model information on data point proximities can be exploited to provide random forest features. Variable importance measures show which measurements are closely related to a chosen output variable, while partial dependencies indicate the relationship of an important measurement to the chosen output variable.

The purpose of this study was therefore to investigate the feasibility of a new unsupervised method for process monitoring based on random forests. The hypothesis investigated states: unsupervised process monitoring and fault diagnosis can be improved by using features extracted with random forests, with further interpretation of fault conditions supported by additional random forest techniques. Experimental results presented in this work support this hypothesis.

An initial study was carried out to assess the quality of random forest features. It was found that random forest features are difficult to interpret in terms of the geometry of the original measurement space. It was further found that random forest mapping and demapping are very accurate for training data, but show poor extrapolation properties for unseen data that fall in regions outside those of the training data.

Random forest feature extraction was applied to unsupervised fault diagnosis for steady-state processes and compared with linear and nonlinear methods. Results with random forests are comparable to those of existing methods, and the majority of random forest detections can be attributed to measurement reconstruction errors. Further investigation showed that the success of residual detection rests on the constrained output values and poor generalization properties of decision trees. Random forest variable importance measures and partial dependencies were incorporated into a visualization technique that enables the interpretation of fault conditions.

A dynamic application of change point detection with random forests proved more successful than an existing method based on principal component analysis. The success of the random forest method can again be attributed to reconstruction residuals.

A recommendation arising from this study is that the random forest change point and fault detection methods be added to a similar set of methods. This work found that the distance-to-model diagnostic based on random forest mapping and demapping is successful for fault detection. The theoretical insights uncovered support the application of these methods to further data sets.

Identifier: oai:union.ndltd.org:netd.ac.za/oai:union.ndltd.org:sun/oai:scholar.sun.ac.za:10019.1/5360
Date: 12 1900
Creators: Auret, Lidia
Contributors: Aldrich, C., University of Stellenbosch. Faculty of Engineering. Dept. of Process Engineering.
Publisher: Stellenbosch : University of Stellenbosch
Source Sets: South African National ETD Portal
Language: English
Detected Language: English
Type: Thesis
Format: 214 p. : ill.
Rights: University of Stellenbosch
