Variable selection and neural networks for high-dimensional data analysis: application in infrared spectroscopy and chemometrics

This thesis focuses on the application of chemometrics in the field of
analytical chemistry. Chemometrics (or multivariate analysis) consists in finding a relationship
between two groups of variables, often called dependent and independent variables.
In infrared spectroscopy, for instance, the goal is to predict a quantitative variable, such as
the concentration of a component present in the studied product, from spectral data measured
at several hundred or even several thousand wavelengths or wavenumbers. Measuring that variable
directly is delicate, since it requires a chemical analysis and a qualified operator.
In this research we propose a chemometric methodology for handling chemical (spectrophotometric)
data, which are often high-dimensional.
To handle these data, we first propose a new incremental (step-by-step) method for selecting
spectral variables. It combines three principles: linear or non-linear regression,
an incremental procedure for variable selection, and the use of a validation set. This procedure
makes it possible, on the one hand, to benefit from the advantages of non-linear methods for
predicting chemical data (the relationship between dependent and independent variables is often
non-linear) and, on the other hand, to avoid overfitting, one of the most crucial problems
encountered with non-linear models. Secondly, we propose to improve this method through a
judicious choice of the first selected variable, which has a strong influence on the final
prediction performance. The idea is to use a measure of the mutual information between
the independent and dependent variables to select the first variable; the incremental
(step-by-step) method is then used to select the following ones. The variable selected
by mutual information can be readily interpreted from a spectrochemical point of view, and
does not depend on how the data are distributed between the training and validation sets.
By contrast, traditional linear chemometric methods such as principal component regression (PCR)
or partial least squares regression (PLSR) produce new variables that have no interpretation
from the spectrochemical point of view.
Four real-life datasets (wine, orange juice, milk powder and apples) are presented to
show the efficiency and advantages of both proposed procedures compared with the traditional
linear chemometric methods in common use, such as multiple linear regression (MLR), PCR and PLSR.
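
The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes a plain least-squares model as a stand-in for the linear or non-linear regressors, a simple histogram estimate of mutual information, and synthetic data in place of spectra. The names `fit_linear`, `mutual_information` and `select_variables` are hypothetical.

```python
import math
import random
from collections import Counter

def fit_linear(X, y):
    """Least squares with intercept, solved via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    n, p = len(A), len(A[0])
    M = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)]
         + [sum(A[k][i] * y[k] for k in range(n))] for i in range(p)]
    for i in range(p):
        M[i][i] += 1e-9  # assumption: tiny ridge term to avoid a singular system
    for i in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(i, p), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, p):
            f = M[r][i] / M[i][i]
            for c in range(i, p + 1):
                M[r][c] -= f * M[i][c]
    w = [0.0] * p
    for i in reversed(range(p)):  # back-substitution
        w[i] = (M[i][p] - sum(M[i][c] * w[c] for c in range(i + 1, p))) / M[i][i]
    return w

def rmse(w, X, y):
    preds = [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y))

def mutual_information(x, y, bins=5):
    """Histogram estimate of I(x; y) in nats (an assumed, simple estimator)."""
    def binned(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0
        return [min(int((a - lo) / width), bins - 1) for a in v]
    bx, by = binned(x), binned(y)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

def select_variables(X_tr, y_tr, X_val, y_val, max_vars=5):
    n_feat = len(X_tr[0])
    col = lambda X, j: [row[j] for row in X]
    sub = lambda X, idx: [[row[j] for j in idx] for row in X]
    # Stage 1: first variable = highest mutual information with the target.
    first = max(range(n_feat),
                key=lambda j: mutual_information(col(X_tr, j), y_tr))
    selected = [first]
    best_err = rmse(fit_linear(sub(X_tr, selected), y_tr),
                    sub(X_val, selected), y_val)
    # Stage 2: add one variable at a time while validation error improves.
    while len(selected) < max_vars:
        cands = []
        for j in range(n_feat):
            if j in selected:
                continue
            idx = selected + [j]
            w = fit_linear(sub(X_tr, idx), y_tr)
            cands.append((rmse(w, sub(X_val, idx), y_val), j))
        if not cands:
            break
        err, j = min(cands)
        if err >= best_err:  # stopping on the validation set limits overfitting
            break
        selected.append(j)
        best_err = err
    return selected, best_err

# Synthetic stand-in for spectra: 10 variables, only 3 and 7 are informative.
random.seed(0)
def make_data(n):
    X = [[random.uniform(-1, 1) for _ in range(10)] for _ in range(n)]
    y = [2 * r[3] - r[7] + random.gauss(0, 0.05) for r in X]
    return X, y

X_tr, y_tr = make_data(200)
X_val, y_val = make_data(100)
selected, best_err = select_variables(X_tr, y_tr, X_val, y_val)
print(selected, best_err)
```

In this sketch each candidate is scored on the validation set rather than the training set, so a variable is kept only if it improves prediction on data the model was not fitted to. Replacing `fit_linear` with a non-linear regressor would leave the selection loop unchanged.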

Identifier: oai:union.ndltd.org:BICfB/oai:ucl.ac.be:ETDUCL:BelnUcetd-11182003-100404
Date: 24 November 2003
Creators: Benoudjit, Nabil
Publisher: Universite catholique de Louvain
Source Sets: Bibliothèque interuniversitaire de la Communauté française de Belgique
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://edoc.bib.ucl.ac.be:81/ETD-db/collection/available/BelnUcetd-11182003-100404/
Rights: restricted. I agree that the text of the thesis (hereinafter "the work"), subject to the parts covered by confidentiality, may be published in the UCL electronic thesis collection. To this end, I grant UCL a licence covering: the right to fix and reproduce the work on an electronic medium (ETD/db software); the right to communicate the work to the public. This licence, free of charge and non-exclusive, is valid for the entire duration of literary and artistic property rights, including any extensions, and for the whole world. I retain all other rights for the reproduction and communication of the thesis, as well as the right to use it in future work. I certify that I have obtained, in accordance with copyright legislation and the requirements of image rights, all the authorisations necessary for the reproduction in my thesis of images, texts and/or any work protected by copyright, and that I have obtained the authorisations necessary for their communication to third parties. Where a third party holds an intellectual property right over all or part of my thesis, I certify that I have obtained that party's written authorisation to exercise the rights mentioned above.
