
Data-Driven Supervised Classifiers in High-Dimensional Spaces: Application on Gene Expression Data

Several ready-to-use supervised classifiers predict well in large-sample settings, but the same generally cannot be expected in high-dimensional settings. One explanation is that classical supervised learning theory was not developed for high-dimensional spaces, which leaves many classifiers struggling against the curse of dimensionality. A rise in parsimonious classification procedures, particularly techniques that incorporate feature selectors, can be observed. Such a technique can be interpreted as a two-step procedure: an arbitrary selector first obtains a feature subset independently of a ready-to-use model, and unlabelled instances are then classified within the selected subset. The two-step procedure is often heavily motivated, yet its theoretical and algorithmic descriptions are frequently overlooked. In this thesis, we aim to describe the theoretical and algorithmic framework for employing a feature selector as a pre-processing step for the Support Vector Machine, and to assess its validity in high-dimensional settings. The validity of the proposed classifier is evaluated on predictive performance through a comparative study with a state-of-the-art algorithm designed for advanced learning tasks. The chosen algorithm employs feature relevance during training, which makes it suitable for high-dimensional settings. The results suggest that the proposed classifier predicts better than the Support Vector Machine in lower input dimensions; beyond a certain threshold of input dimensions, however, its performance tends to converge quickly towards that of the Support Vector Machine. Additionally, the thesis could not conclude that either the chosen state-of-the-art algorithm or the proposed classifier strictly outperforms the other. Nonetheless, the state-of-the-art algorithm delivers a more balanced performance across both labels.
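
The two-step procedure described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the synthetic data, the filter score (a standardized difference of class means), the function names, and the plain stochastic subgradient solver that stands in for a library Support Vector Machine. The thesis's actual selector and solver are not stated in this record.

```python
import random
import statistics


def make_data(n, p, shift, rng):
    """Synthetic data (assumption): only features 0 and 1 carry class signal."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        row[0] += shift * label
        row[1] += shift * label
        X.append(row)
        y.append(label)
    return X, y


def select_features(X, y, k):
    """Step 1: filter selector, independent of the classifier.

    Scores each feature by |mean(+1) - mean(-1)| / (std(+1) + std(-1))
    and returns the indices of the k highest-scoring features.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == -1]
        spread = statistics.pstdev(pos) + statistics.pstdev(neg) + 1e-12
        scores.append(abs(statistics.mean(pos) - statistics.mean(neg)) / spread)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, subset):
    """Restrict every instance to the selected feature subset."""
    return [[row[j] for j in subset] for row in X]


def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=50, seed=0):
    """Step 2: linear SVM fitted by stochastic subgradient descent
    on the L2-regularized hinge loss (a stand-in for a library solver)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this instance
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(w, b, row):
    return 1 if sum(wj * xj for wj, xj in zip(w, row)) + b >= 0 else -1


rng = random.Random(0)
X_train, y_train = make_data(40, 200, 3.0, rng)  # n = 40 instances, p = 200 features
X_test, y_test = make_data(40, 200, 3.0, rng)

subset = select_features(X_train, y_train, k=2)
w, b = train_linear_svm(project(X_train, subset), y_train)
predictions = [predict(w, b, row) for row in project(X_test, subset)]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print("selected features:", subset, "test accuracy:", accuracy)
```

The selector is fitted on the training instances only, and the same subset is then applied to the unlabelled instances, which keeps the two steps separate as the abstract describes.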

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-532347
Date January 2024
Creators Efrem, Nabiel H.
Publisher Uppsala universitet, Statistiska institutionen
Source Sets DiVA Archive at Uppsala University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
