Since their introduction in 1995, Support Vector Machines (SVM) have come to be a widely employed machine learning model for binary classification, owing to their explainable architecture, efficient forward inference, and good ability to generalize. A common desire, not only for SVMs but for machine learning classifiers in general, is to have the model do feature selection, using only a limited subset of the available attributes in its predictions. Various alterations to the SVM problem formulation exist that address this, and in this report we compare a range of such SVM models. We compare how the accuracy and feature selection compare between the models for different datasets, both real and synthetic, and we also investigate the impact of dataset size on the aforementioned quantities. Our conclusions are that models trained to classify samples based on a smaller subset of features, tend to perform at a comparable level to dense models, with particular advantage when the dataset is small. Furthermore, as the training dataset grows in size, the number of selected features also increases, giving a more complex classifier when prompted with a larger data supply.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-348822 |
Date | January 2024 |
Creators | Johansson, Adam, Mattsson, Anton |
Publisher | KTH, Skolan för teknikvetenskap (SCI) |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | TRITA-SCI-GRU ; 2024:128 |
Page generated in 0.0021 seconds