Study of Single and Ensemble Machine Learning Models on Credit Data to Detect Underlying Non-performing Loans

In this paper, we compare the performance of two feature dimension reduction methods, the LASSO and PCA. Both a simulation study and an empirical study show that the LASSO is superior to PCA at selecting significant variables. We apply Logistic Regression (LR), Artificial Neural Network (ANN), Support Vector Machine (SVM), and Decision Tree (DT) models, together with the corresponding ensemble machines constructed by bagging and adaptive boosting (AdaBoost). Three experiments are conducted to explore the impact of class-imbalanced data on all models. The empirical study indicates that when the percentage of performing loans exceeds 83.3%, the trained models should be applied with caution. On class-balanced data, ensemble machines do perform better than single machines, and the weaker the single machine, the more pronounced the improvement.
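The caution about class imbalance can be illustrated with a minimal sketch. It is not taken from the thesis: the portfolio below is hypothetical, and the function name `baseline_metrics` is an assumption made for illustration. It shows that at a share of about 83.3% performing loans (a 5:1 ratio), a trivial classifier that labels every loan as performing already reaches about 83.3% accuracy while detecting no non-performing loans at all.

```python
def baseline_metrics(y_true, y_pred):
    """Return (accuracy, NPL recall), where label 1 marks a non-performing loan."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    # Recall on the NPL class: share of true NPLs that were flagged as NPL.
    npl_total = sum(1 for t in y_true if t == 1)
    npl_found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    npl_recall = npl_found / npl_total
    return accuracy, npl_recall


# Hypothetical portfolio: 5000 performing loans (0) and 1000 NPLs (1).
y_true = [0] * 5000 + [1] * 1000
# Trivial classifier: predict "performing" for every loan.
y_pred = [0] * len(y_true)

accuracy, npl_recall = baseline_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f}, NPL recall={npl_recall:.1f}")
```

Under these assumptions, accuracy alone says little about how well non-performing loans are detected, which is one reason to report class-specific measures such as recall when the data are imbalanced.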

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-297080

Machine learning

Feature Dimension Reduction

NPL

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-297080
Date	January 2016
Creators	Li, Qiongzhu
Publisher	Uppsala universitet, Statistiska institutionen
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
