The use of classification methods for gross error detection in process data
Gerber, Egardt 12 1900
Thesis (MScEng)-- Stellenbosch University, 2013. / ENGLISH ABSTRACT: All process measurements contain some element of error. Typically, a distinction is made between
random errors, with zero expected value, and gross errors with non-zero magnitude. Data Reconciliation
(DR) and Gross Error Detection (GED) comprise a collection of techniques designed to attenuate
measurement errors in process data in order to reduce the effect of the errors on subsequent use of the
data. DR proceeds by finding the optimum adjustments so that reconciled measurement data satisfy
imposed process constraints, such as material and energy balances. The DR solution is optimal under
the assumed statistical random error model, typically Gaussian with zero mean and known covariance.
The presence of outliers and gross errors in the measurements or imposed process constraints invalidates
the assumptions underlying DR, so that the DR solution may become biased. GED is required to detect,
identify and remove or otherwise compensate for the gross errors. Typically GED relies on formal
hypothesis testing of constraint residuals or measurement adjustment-based statistics derived from the
assumed random error statistical model.
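
The DR and GED steps described above can be illustrated with a minimal Python sketch. It assumes linear constraints of the form A x = 0 and a known, diagonal error covariance Sigma; the function names, the three-stream splitter and the numerical values are hypothetical and are not taken from the thesis. The test statistic shown is one adjustment-based form of the measurement test.

```python
import numpy as np

def reconcile(y, A, Sigma):
    """Weighted least-squares DR for linear constraints A x = 0."""
    r = A @ y                                   # constraint residuals
    V = A @ Sigma @ A.T                         # covariance of the residuals
    a = Sigma @ A.T @ np.linalg.solve(V, r)     # measurement adjustments
    return y - a, a                             # reconciled values, adjustments

def measurement_test(a, A, Sigma, z_crit=1.96):
    """Standardise each adjustment and flag those exceeding z_crit."""
    V = A @ Sigma @ A.T
    W = Sigma @ A.T @ np.linalg.solve(V, A @ Sigma)  # covariance of the adjustments
    z = np.abs(a) / np.sqrt(np.diag(W))
    return z, z > z_crit

# Hypothetical splitter: stream 1 = stream 2 + stream 3 (assumed example)
A = np.array([[1.0, -1.0, -1.0]])
Sigma = np.diag([0.01, 0.01, 0.01])   # assumed standard deviation of 0.1 per stream
y = np.array([10.0, 6.0, 5.0])        # measurements violate the balance by 1.0

x_hat, a = reconcile(y, A, Sigma)
z, flagged = measurement_test(a, A, Sigma)
print(x_hat, z, flagged)
```

With a single constraint, every measurement in this example receives the same test statistic, so the imbalance can be detected but not attributed to one stream. More constraints, or additional information, are needed before a gross error can be identified.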
Classification methodologies are methods by which observations are classified as belonging to one of
several possible groups. For the GED problem, artificial neural networks (ANNs) have been applied
historically to resolve the classification of a data set as either containing or not containing a gross error.
The hypothesis investigated in this thesis is that classification methodologies, specifically classification
trees (CT) and linear or quadratic classification functions (LCF, QCF), may provide an alternative to the
classical GED techniques.
This hypothesis is tested via the modelling of a simple steady-state process unit with associated
simulated process measurements. DR is performed on the simulated process measurements in order to
satisfy one linear and two nonlinear material conservation constraints. Selected features from the DR
procedure and process constraints are incorporated into two separate input vectors for classifier
construction. The performance of the classification methodologies developed on each input vector is
compared with the classical measurement test in order to address the posed hypothesis.
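
The simulate-then-classify workflow described above can be sketched as follows. This is only an illustration under stated assumptions: the four-stream linear network, the error magnitudes, the choice of absolute constraint residuals as the input vector and all function names are hypothetical, and none of them reproduce the thesis process, its nonlinear constraints or its results. The classifier is a two-class linear classification function with a pooled covariance estimate and equal priors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear network (NOT the thesis process):
# node 1: x1 = x2 + x3, node 2: x2 = x4
A = np.array([[1.0, -1.0, -1.0, 0.0],
              [0.0, 1.0, 0.0, -1.0]])
x_true = np.array([10.0, 6.0, 4.0, 6.0])
sigma = 0.1   # assumed random-error standard deviation
delta = 1.0   # assumed gross error magnitude

def simulate(n):
    """Return input vectors (absolute constraint residuals) and labels (1 = gross error)."""
    y = x_true + sigma * rng.standard_normal((n, 4))
    labels = rng.integers(0, 2, size=n)        # about half the samples carry a gross error
    location = rng.integers(0, 4, size=n)      # randomly chosen biased measurement
    y[np.arange(n), location] += delta * labels
    return np.abs(y @ A.T), labels

def fit_lcf(X, g):
    """Fit a two-class linear classification function (pooled covariance, equal priors)."""
    m0, m1 = X[g == 0].mean(axis=0), X[g == 1].mean(axis=0)
    X0, X1 = X[g == 0] - m0, X[g == 1] - m1
    S = (X0.T @ X0 + X1.T @ X1) / (len(X) - 2)   # pooled within-class covariance
    w = np.linalg.solve(S, m1 - m0)              # discriminant direction
    c = w @ (m0 + m1) / 2.0                      # threshold midway between class means
    return w, c

def predict(X, w, c):
    """Classify each row as 1 (gross error present) or 0 (absent)."""
    return (X @ w > c).astype(int)

X_train, g_train = simulate(2000)
X_test, g_test = simulate(2000)
w, c = fit_lcf(X_train, g_train)
accuracy = np.mean(predict(X_test, w, c) == g_test)
print(f"hold-out accuracy: {accuracy:.3f}")
```

Because the simulated error component of each measurement is known, every sample can be labelled exactly, which is what makes supervised classifier construction possible in this setting.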
General trends in the results are as follows:
- The power to detect and/or identify a gross error is a strong function of both the magnitude and the
location of the gross error, for all the classification methodologies and for the measurement test.
- For some locations there exist large differences between the power to detect a gross error and the
power to identify it correctly. This is consistent over all the classifiers and their associated
measurement tests, and indicates significant smearing of gross errors.
- In general, the classification methodologies have higher power for equivalent type I error than
the measurement test.
- The measurement test is superior for small magnitude gross errors, and for specific locations,
depending on which classification methodology it is compared with.
There is significant scope to extend the work to more complex processes and constraints, including
dynamic processes with multiple gross errors in the system. Further investigation into the optimal
selection of input vector elements for the classification methodologies is also required. / AFRIKAANSE OPSOMMING (English translation): All process measurements contain some degree of measurement error. The error component of a process
measurement is often expressed as consisting of a random error with zero expected value, together with a
non-random error of significant magnitude. Data Reconciliation (DR) and Gross Error Detection (GED) are a
collection of techniques aimed at reducing the effect of such errors in process data on the subsequent use
of the data. DR is carried out by making the optimal changes to the original process measurements so that
the adjusted measurements obey certain process models, typically mass and energy balances. The DR solution
is optimal provided that the statistical assumptions regarding the random error component in the process
data are valid. It is typically assumed that the error component is normally distributed, with zero
expected value and a given covariance matrix.
When non-random errors are present in the data, the results of DR may be biased. GED is therefore needed
to find (Detection) and identify (Identification) non-random errors. GED usually relies on the statistical
properties of the measurement adjustments made by the DR procedure, or of the residuals of the model
equations, to test formal hypotheses regarding the presence of non-random errors.
Classification techniques are used to determine the class membership of observations. For the GED
problem, artificial neural networks (ANNs) have historically been applied to solve the Detection and
Identification problems. The hypothesis of this thesis is that classification techniques, specifically
classification trees (CT) and linear and quadratic classification functions (LCF and QCF), can be applied
successfully to solve the GED problem.
The hypothesis is investigated by means of a simulation of a simple steady-state process unit that is
subject to one linear and two nonlinear equations. Artificial process measurements are generated using
random numbers so that the error component of each process measurement is known. DR is applied to the
artificial data, and the DR results are used to create two different input vectors for the classification
techniques. The performance of the classification methods is compared with the measurement test of
classical GED in order to address the stated hypothesis. The underlying trends in the results are as follows:
- The ability to detect and identify a non-random error depends strongly on the magnitude as well as the
location of the error, for all the classification techniques as well as the measurement test.
- For certain locations of the non-random error there is a large difference between the ability to detect
the error and the ability to identify it, which indicates smearing of the error. All the classification
techniques as well as the measurement test exhibit this property.
- In general, the classification methods show greater success than the measurement test.
- The measurement test is more successful for relatively small non-random errors, as well as for certain
locations of the non-random error, depending on the classification technique in question.
There are several ways to extend the scope of this investigation. More complex, non-steady-state
processes with strongly nonlinear process models and multiple non-random errors can be investigated. The
possibility also exists of improving the performance of the classification methods through the appropriate
choice of input vector elements.