Return to search

Employee Turnover Prediction - A Comparative Study of Supervised Machine Learning Models

Background: In every organization, employees are an essential resource. For several reasons, employees are neglected by the organizations, which leads to employee turnover. Employee turnover causes considerable losses to the organization. Using machine learning algorithms and with the data in hand, a prediction of an employee’s future in an organization is made. Objectives: The aim of this thesis is to conduct a comparison study utilizing supervised machine learning algorithms such as Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost to predict an employee’s future in a company. Using evaluation metrics models are assessed in order to discover the best efficient model for the data in hand. Methods: The quantitative research approach is used in this thesis, and data is analyzed using statistical analysis. The labeled data set comes from Kaggle and includes information on employees at a company. The data set is used to train algorithms. The created models will be evaluated on the test set using evaluation measures including Accuracy, Precision, Recall, F1 Score, and ROC curve to determine which model performs the best at predicting employee turnover. Results: Among the studied features in the data set, there is no feature that has a significant impact on turnover. Upon analyzing the results, the XGBoost classifier has better mean accuracy with 85.3%, followed by the Random Forest classifier with 83% accuracy than the other two algorithms. XGBoost classifier has better precision with 0.88, followed by Random Forest Classifier with 0.82. Both the Random Forest classifier and XGBoost classifier showed a 0.69 Recall score. XGBoost classifier had the highest F1 Score with 0.77, followed by the Random Forest classifier with 0.75. In the ROC curve, the XGBoost classifier had a higher area under the curve(AUC) with 0.88. Conclusions: Among the studied four machine learning algorithms, Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost, the XGBoost classifier is the most optimal with a good performance score respective to the tested performance metrics. No feature is found majorly affect employee turnover.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:bth-23399
Date January 2022
CreatorsKovvuri, Suvoj Reddy, Dommeti, Lydia Sri Divya
PublisherBlekinge Tekniska Högskola, Institutionen för datavetenskap
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0026 seconds