Return to search

Using XGBoost to classify theBeihang Keystroke Dynamics Database

Keystroke Dynamics enable biometric security systems by collecting and analyzing computer keyboard usage data. There are different approaches to classifying keystroke data and a method that has been gaining a lot of attention in the machine learning industry lately is the decision tree framework of XGBoost. XGBoost has won several Kaggle competitions in the last couple of years, but its capacity in the keystroke dynamics field has not yet been widely explored. Therefore, this thesis has attempted to classify the existing Beihang Keystroke Dynamics Database using XGBoost. To do this, keystroke features such as dwell time and flight time were extracted from the dataset, which contains 47 usernames and passwords. XGBoost was then applied to a binary classification problem, where the model attempts to distinguish keystroke feature sequences from genuine users from those of `impostors'. In this way, the ratio of inaccurately and accurately labeled password inputs can be analyzed. The result showed that, after tuning of the hyperparameters, the XGBoost yielded Equal Error Rates (EER) at best 0.31 percentage points better than the SVM used in the original study of the database at 11.52%, and a highest AUC of 0.9792. The scores achieved by this thesis are however significantly worse than a lot of others in the same field, but so were the results in the original study. The results varied greatly depending on user tested. These results suggests that XGBoost may be a useful tool, that should be tuned, but that a better dataset should be used to sufficiently benchmark the tool. Also, the quality of the model is greatly affected by variance among the users. For future research purposes, one should make sure that the database used is of good quality. To create a security system utilizing XGBoost, one should be careful of the setting and quality requirements when collecting training data

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-357420
Date January 2018
CreatorsBlomqvist, Johanna
PublisherUppsala universitet, Datalogi
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess
RelationUPTEC F, 1401-5757 ; 18049

Page generated in 0.0017 seconds