As software technologies grow, companies increasingly develop automated solutions to save time and money. Such solutions have seen tremendous growth in the software industry and have benefited from extensive machine learning research. Although automated bug classification has been researched extensively, more precise methods have yet to be developed as new data is collected. An automated bug classifier processes the content of a bug report and assigns it to the person or department that should fix the problem. A bug report typically contains an unstructured text field in which the problem is described in detail, and much research has addressed information extraction from such fields. This thesis uses a topic modeling technique, Latent Dirichlet Allocation (LDA), and a numerical statistic, Term Frequency-Inverse Document Frequency (TF-IDF), to generate two different feature sets from the unstructured text fields of the bug reports. A third feature set was created by concatenating the TF-IDF and LDA features. The class distribution of the data used in this thesis changes over time. To explore whether time has an impact on the prediction, the age of the bug report was introduced as a feature, and the importance of this feature, when used along with the LDA and TF-IDF features, was also explored. The generated feature vectors were used as predictors to train three different classification models: multinomial logistic regression, dense neural networks, and DO-probit. The classifiers' predictions of the correct department to handle a bug were evaluated on accuracy and F1-score. For comparison, the predictions from a Support Vector Machine (SVM) with a linear kernel were treated as the baseline. The best results for the multinomial logistic regression and dense neural network classifiers were obtained when the TF-IDF features of the bug reports were used as predictors.
Among the three classifiers trained, the dense neural network performed best, though it did not outperform the SVM baseline. Using age as a feature did not significantly improve the predictive performance of the classifiers, but it did reveal some interesting patterns in the data. Further research on other ways of using the age of the bug reports could be promising.
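As an illustration of the feature construction described above, the following is a minimal sketch of how TF-IDF vectors might be computed from tokenized bug report text and extended with the report age as one extra column. It is not the thesis's implementation. The abstract does not state which TF-IDF weighting was used, so the smoothed inverse document frequency here is an assumption, as are the function names `tfidf_vectors` and `add_age_feature`, the encoding of age as a number of days, and the three sample reports, which are invented for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Assumed weighting: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty reports
        # Term frequency (relative count) times inverse document frequency.
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


def add_age_feature(vectors, ages_in_days):
    """Append the report age as one additional numeric column."""
    return [vec + [float(age)] for vec, age in zip(vectors, ages_in_days)]


# Hypothetical bug reports and ages, invented for illustration only.
reports = [
    "login page crashes on submit".split(),
    "database timeout on nightly sync".split(),
    "login button missing on mobile page".split(),
]
ages = [12, 340, 45]

vocab, X_text = tfidf_vectors(reports)
X = add_age_feature(X_text, ages)
```

Each row of `X` is then a predictor vector for one bug report. LDA topic proportions, if available, could be concatenated to each row in the same way to form the combined feature set.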
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-166224 |
Date | January 2020 |
Creators | Adhikarla, Sridhar |
Publisher | Linköpings universitet, Institutionen för datavetenskap, Linköpings universitet, Filosofiska fakulteten |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |