This study explores binary classification with Support Vector Machines as a means of predicting a satisfaction score from customer surveys in the customer support domain. Standard feature selection methods and their impact on results are evaluated, and a feature scoring metric, Log Odds Ratio, is implemented to address asymmetrical class distributions. Results show that the implemented feature selection and scoring methods improve performance significantly. Results also show that it is possible to obtain decent predictive values on test data from a limited number of training observations. However, a real-world application example yields mixed results, as there is a significant error rate when discriminating the minority class. We also show the negative effects of using common metrics such as accuracy and F-measure to optimize models when dealing with high-skew data in a classification context.
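The abstract's closing point about accuracy and F-measure on high-skew data can be illustrated with a small sketch. The 95/5 class split, the label encoding, and the always-predict-majority baseline below are hypothetical and chosen only for illustration; they are not taken from the thesis, and the helper functions are not the author's implementation.

```python
# Illustrative sketch only: the data and the baseline are assumptions,
# not results or methods from the thesis.

def accuracy(y_true, y_pred):
    """Fraction of observations whose predicted label equals the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def recall(y_true, y_pred, label):
    """Fraction of observations truly in `label` that were predicted as `label`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(1 for p in relevant if p == label) / len(relevant)

def precision(y_true, y_pred, label):
    """Fraction of observations predicted as `label` that truly are `label`."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == label]
    if not predicted:
        return 0.0
    return sum(1 for t in predicted if t == label) / len(predicted)

def f1(y_true, y_pred, label):
    """Harmonic mean of precision and recall, treating `label` as positive."""
    p = precision(y_true, y_pred, label)
    r = recall(y_true, y_pred, label)
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Hypothetical skewed survey data: 1 = satisfied (majority), 0 = dissatisfied (minority).
y_true = [1] * 95 + [0] * 5
# Trivial baseline that always predicts the majority class.
y_pred = [1] * 100

acc = accuracy(y_true, y_pred)
f1_majority = f1(y_true, y_pred, 1)
minority_recall = recall(y_true, y_pred, 0)
# Balanced accuracy: the mean of the per-class recalls.
balanced_acc = (recall(y_true, y_pred, 1) + recall(y_true, y_pred, 0)) / 2

print(f"accuracy={acc:.3f} f1(majority)={f1_majority:.3f}")
print(f"minority recall={minority_recall:.3f} balanced accuracy={balanced_acc:.3f}")
```

Under these assumptions the baseline scores high on accuracy and on F-measure with the majority class as positive, yet it never identifies a minority-class observation. A model tuned on either metric alone could therefore look strong while failing on the class of interest.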
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-300165 |
Date | January 2016 |
Creators | Hedlund, Henrik |
Publisher | Uppsala universitet, Institutionen för lingvistik och filologi |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |