Return to search

Prediction of Code Lifetime

There are several previous studies in which machine learning algorithms are used to predict how fault-prone a piece of code is. This thesis takes on a slightly different approach by attempting to predict how long a piece of code will remain unmodified after being written (its “lifetime”). This is based on the hypothesis that frequently modified code is more likely to contain weaknesses, which may make lifetime predictions useful for code evaluation purposes. In this thesis, the predictions are made with machine learning algorithms which are trained on open source code examples from GitHub. Two different machine learning algorithms are used: the multilayer perceptron and the support vector machine. A piece of code is described by three groups of features: code contents, code properties obtained from static code analysis, and metadata from the version control system Git. In a series of experiments it is shown that the support vector machine is the best performing algorithm and that all three feature groups are useful for predicting lifetime. Both the multilayer perceptron and the support vector machine outperform a baseline prediction which always outputs the mean lifetime of the training set. This indicates that lifetime to some extent can be predicted based on information extracted from the code. However, lifetime prediction performance is shown to be highly dataset dependent with large error magnitudes.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-135060
Date January 2017
CreatorsNordfors, Per
PublisherLinköpings universitet, Statistik, Linköpings universitet, Tekniska fakulteten
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0015 seconds