
NLP-based Failure log Clustering to Enable Batch Log Processing in Industrial DevOps Setting

The rapid development, updating, and maintenance of industrial software systems have increased the need for testing software artifacts. The resulting proliferation of test data forces some medium and large companies to automate test analysis. One way to automate it is to group test results into subsets of similar outcomes and analyze each subset as a batch. The first step is therefore to find a precise and reliable categorization mechanism based on structural similarities and error categories. Because the errors and the number of subgroups are not known in advance, the method must not require prior knowledge of the target subsets, which makes clustering a suitable choice. This work presents an approach for grouping test results and accelerating test analysis: it applies five clustering algorithms (K-means, Agglomerative, DBSCAN, Fuzzy c-means, and Spectral) to test results from an industrial context and compares their execution time and output quality. One of the primary obstacles in this study is that the test results are unstructured text, which necessitates feature selection methods; the study therefore employs three approaches (TF-IDF, FastText, and BERT). The research was conducted as a series of experiments in a controlled and isolated environment, using test process results from Westermo Technologies AB, as part of the AIDOaRT Project, in order to establish an acceptable way of clustering industrial test results. The thesis concludes that K-means and Agglomerative yield the highest performance and evaluation scores, and that K-means is superior in execution time.
In addition, a Focus Group meeting was organized to examine the results qualitatively from the perspective of engineers and experts; in their view, clustering the results increases the speed of test analysis and decreases the review workload.
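The pipeline summarized in the abstract (turn each failure log into a TF-IDF vector, then cluster the vectors with K-means) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the smoothed IDF variant, the deterministic farthest-point initialization, and the five sample log lines are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """L2-normalised TF-IDF vectors (as dicts) for whitespace-tokenised docs."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(tokenised)
    # Document frequency: in how many logs each term occurs.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Assumed smoothed IDF: log((1 + n) / (1 + df)) + 1.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))

def mean(vectors):
    """Component-wise mean of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans(vectors, k, max_iter=50):
    """Plain K-means; returns one cluster label per input vector."""
    # Deterministic farthest-point initialisation (an assumption, chosen
    # so the sketch gives the same result on every run).
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels

# Hypothetical failure-log lines, invented for this example.
logs = [
    "ERROR timeout waiting for link up on port eth1",
    "ERROR timeout waiting for link up on port eth3",
    "AssertionError expected vlan 10 but got vlan 20",
    "AssertionError expected vlan 30 but got vlan 40",
    "ERROR timeout waiting for link up on port eth7",
]
labels = kmeans(tfidf_vectors(logs), k=2)
print(labels)
```

On this toy input the three timeout lines receive one label and the two assertion lines the other, so each group could be reviewed as a batch. Note that K-means needs the number of clusters `k` up front; here it is simply fixed at 2.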

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:mdh-59203
Date: January 2022
Creators: Homayouni, Ali
Publisher: Mälardalens universitet, Akademin för innovation, design och teknik
Source Sets: DiVA Archive at Upsalla University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
