This study investigates the performance of Logistic Regression, k-Nearest Neighbors (KNN), and Random Forest algorithms in predicting customer churn within an e-commerce platform. The choice of the mentioned algorithms was due to the unique characteristics of the dataset and the unique perception and value provided by each algorithm. Iterative models ‘examinations, encompassing preprocessing techniques, feature engineering, and rigorous evaluations, were conducted. Logistic Regression showcased moderate predictive capabilities but lagged in accurately identifying potential churners due to its assumptions of linearity between log odds and predictors. KNN emerged as the most accurate classifier, achieving superior sensitivity and specificity (98.22% and 96.35%, respectively), outperforming other models. Random Forest, with sensitivity and specificity (91.75% and 95.83% respectively) excelled in specificity but slightly lagged in sensitivity. Feature importance analysis highlighted "Tenure" as the most impactful variable for churn prediction. Preprocessing techniques differed in performance across models, emphasizing the importance of tailored preprocessing. The study's findings underscore the significance of continuous model refinement and optimization in addressing complex business challenges like customer churn. The insights serve as a foundation for businesses to implement targeted retention strategies, mitigating customer attrition, and promote growth in e-commerce platforms.
Identifer | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:du-48495 |
Date | January 2024 |
Creators | Aljifri, Ahmed |
Publisher | Högskolan Dalarna, Institutionen för information och teknik |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Page generated in 0.0034 seconds