Return to search

Bank Customer Churn Prediction : A comparison between classification and evaluation methods

This study aims to assess which supervised statistical learning method; random forest, logistic regression or K-nearest neighbor, that is the best at predicting banks customer churn. Additionally, the study evaluates which cross-validation set approach; k-Fold cross-validation or leave-one-out cross-validation that yields the most reliable results. Predicting customer churn has increased in popularity since new technology, regulation and changed demand has led to an increase in competition for banks. Thus, with greater reason, banks acknowledge the importance of maintaining their customer base.   The findings of this study are that unrestricted random forest model estimated using k-Fold is to prefer out of performance measurements, computational efficiency and a theoretical point of view. Albeit, k-Fold cross-validation and leave-one-out cross-validation yield similar results, k-Fold cross-validation is to prefer due to computational advantages.   For future research, methods that generate models with both good interpretability and high predictability would be beneficial. In order to combine the knowledge of which customers end their engagement as well as understanding why. Moreover, interesting future research would be to analyze at which dataset size leave-one-out cross-validation and k-Fold cross-validation yield the same results.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-411918
Date January 2020
CreatorsTandan, Isabelle, Goteman, Erika
PublisherUppsala universitet, Statistiska institutionen, Uppsala universitet, Statistiska institutionen
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0024 seconds