In this thesis, we examine the machine learning models logistic regression, multilayer perceptron and random forests in the purpose of discriminate between good and bad credit applicants. In addition to these models we address the problem of imbalanced data with the Synthetic Minority Over-Sampling Technique (SMOTE). The data available have 273 286 entries and contains information about the invoice of the applicant and the credit decision process as well as information about the applicant. The data was collected during the period 2015-2017. With AUC-values at about 73%some patterns are found that can discriminate between customers that are likely to pay their invoice and customers that are not. However, the more advanced models only performed slightly better than the logistic regression.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-138968 |
Date | January 2017 |
Creators | Sandberg, Martina |
Publisher | Linköpings universitet, Statistik och maskininlärning |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.0016 seconds