This thesis deals with the development of estimation algorithms with embedded feature selection the context of high dimensional data, in the supervised and unsupervised frameworks. The contributions of this work are materialized by two algorithms, GLOSS for the supervised domain and Mix-GLOSS for unsupervised counterpart. Both algorithms are based on the resolution of optimal scoring regression regularized with a quadratic formulation of the group-Lasso penalty which encourages the removal of uninformative features. The theoretical foundations that prove that a group-Lasso penalized optimal scoring regression can be used to solve a linear discriminant analysis bave been firstly developed in this work. The theory that adapts this technique to the unsupervised domain by means of the EM algorithm is not new, but it has never been clearly exposed for a sparsity-inducing penalty. This thesis solidly demonstrates that the utilization of group-Lasso penalized optimal scoring regression inside an EM algorithm is possible. Our algorithms have been tested with real and artificial high dimensional databases with impressive resuits from the point of view of the parsimony without compromising prediction performances.
Identifer | oai:union.ndltd.org:CCSD/oai:tel.archives-ouvertes.fr:tel-00868847 |
Date | 07 June 2013 |
Creators | Sanchez Merchante, Luis Francisco |
Publisher | Université de Technologie de Compiègne |
Source Sets | CCSD theses-EN-ligne, France |
Language | English |
Detected Language | English |
Type | PhD thesis |
Page generated in 0.008 seconds