Global ETD Search

1. Automated Duplicate Bug Reports Detection - An Experiment at Axis Communication AB
Kang, Li (January 2017)

Context. Bug tracking systems play an important role in software maintenance because they allow users to submit bug reports. However, a submitted bug report is often a duplicate: when several users report the same problem, the resulting reports are called duplicate issue reports. As a result, a bug tracking system accumulates a considerable number of duplicate bug reports. Automating duplicate bug report detection can make software maintenance more productive. Each new bug report is compared directly with the existing reports to find similar ones, so people no longer need to spend time reading, understanding, and searching. There has been considerable recent research on such solutions, but accuracy and recall rate during duplicate detection still leave much room for improvement. In addition, very few tools have been evaluated in an industrial setting.

Objectives. First, we aim to characterize automated duplicate bug report detection methods by exploring their categories, identifying the proposed evaluation methods, and specifying the performance differences between the categories. Second, we propose a method that leverages recent advances in semantic models, namely Doc2vec, and present its overall framework: preprocessing, training a semantic model, calculating and ranking similarity, and retrieving duplicate bug reports. Finally, we conduct an experiment to evaluate the proposed method and compare it with the selected best methods for duplicate bug report detection.

Methods. We conducted a systematic mapping study to classify and analyze existing research on automated duplicate bug report detection. To evaluate our proposed method, we conducted an experiment on a defined set of bug reports from the internal bug report database of Axis Communication AB.

Results. We classified automated duplicate bug report detection techniques into three categories: the Top-N recommendation and ranking approach, the binary classification approach, and the decision-making approach. Recall-rate@k is the most common evaluation metric, and the Top-N recommendation and ranking approach performs best among the identified approaches. In the experiment, our proposed approach achieved a significantly higher recall rate than both the combination of TF-IDF with Word2vec and the combination of TF-IDF with LSI. Our combined Doc2vec and TF-IDF approach achieves a recall rate@1-10 of 18.66%-42.88% on the TROUBLE data, an improvement of 1.63%-9.42% over the state of the art.

Conclusions. Through a systematic mapping study, we identified and classified 44 research papers on automated duplicate bug report detection. We give an overview of the state of the art, identify evaluation metrics, examine the scientific evidence behind the reported results, and identify needs for future research. We also implemented a bug tracking system with a duplicate bug report detection module. For each report, the module produces a list of the Top-N related bug reports, each with a numerical similarity score. The experiment showed that our proposed approach, the combination of Doc2vec and TF-IDF, produces the best recall rate.

Keywords: Similar Bugs, Paragraph Vector, Information Retrieval, Recommendation Systems, Software Engineering
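The framework's ranking step (calculating similarity and retrieving the Top-N candidates) can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the thesis implementation. The abstract does not say how the Doc2vec and TF-IDF scores are combined, so a weighted sum with a hypothetical weight `alpha` is assumed. The document embeddings are hard-coded stand-ins for vectors that a trained Doc2vec model (for example gensim's `Doc2Vec`) would produce. The tokenizer and the smoothed IDF formula are assumptions as well.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase and keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(docs):
    # Smoothed IDF over the existing reports (an assumed variant).
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    # Sparse TF-IDF vector; terms absent from the existing reports are dropped.
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def sparse_cosine(a, b):
    # Cosine similarity between two sparse {term: weight} vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def dense_cosine(u, v):
    # Cosine similarity between two dense embedding vectors.
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def top_n(query, reports, query_emb, report_embs, alpha=0.5, n=10):
    # Hypothetical combination: alpha * TF-IDF cosine + (1 - alpha) * embedding cosine.
    docs = [tokenize(r) for r in reports]
    idf = idf_table(docs)
    q_vec = tfidf(tokenize(query), idf)
    scored = []
    for i, doc in enumerate(docs):
        lexical = sparse_cosine(q_vec, tfidf(doc, idf))
        semantic = dense_cosine(query_emb, report_embs[i])
        scored.append((i, alpha * lexical + (1 - alpha) * semantic))
    # Highest combined similarity first; keep the Top-N candidates.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy data: three existing reports and invented 2-d stand-ins for Doc2vec vectors.
reports = [
    "App crashes when camera stream starts",
    "Login page typo",
    "Crash on camera stream start after update",
]
report_embs = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
ranked = top_n("camera stream crash", reports, [0.85, 0.15], report_embs, n=2)
print(ranked)
```

With `alpha=1.0` the ranking reduces to plain TF-IDF cosine similarity. With `alpha=0.0` it uses embedding similarity alone. This makes it easy to compare the combined score against each component.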
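Recall-rate@k, the most common metric in the mapping study above, is usually defined as follows: a new report counts as a hit when at least one of its known duplicates appears among its top k retrieved candidates. The sketch below assumes that common definition, which the thesis may refine. The report IDs and candidate lists are invented for illustration.

```python
def recall_rate_at_k(rankings, duplicates, k):
    """Share of query reports with at least one known duplicate in their top-k list."""
    if not rankings:
        return 0.0
    hits = 0
    for query_id, candidates in rankings.items():
        known = duplicates.get(query_id, set())
        # A query counts as a hit if any of its first k candidates is a true duplicate.
        if any(c in known for c in candidates[:k]):
            hits += 1
    return hits / len(rankings)


# Invented ranked candidate lists for three new reports, and their true duplicates.
rankings = {
    "B7": ["B3", "B1", "B5"],
    "B8": ["B2", "B4", "B6"],
    "B9": ["B5", "B3", "B2"],
}
duplicates = {"B7": {"B1"}, "B8": {"B6"}, "B9": {"B4"}}

for k in (1, 2, 3):
    print(f"recall-rate@{k} = {recall_rate_at_k(rankings, duplicates, k):.2f}")
```

On this toy data, recall-rate@1, @2, and @3 are 0, 1/3, and 2/3. Under this definition the metric can never decrease as k grows, which is consistent with reporting it as a rising range over k = 1 to 10.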
2. Klasifikace žánrů pomocí strojového učení / Genres classification by means of machine learning
Bílek, Jan (January 2018)

In this thesis, we compare the bag-of-words approach with doc2vec document embeddings on the task of classifying book genres. We create three datasets with different text lengths by extracting short snippets from books in the Project Gutenberg repository. Each dataset consists of more than 200,000 documents spanning 14 different genres. For 3,200-character documents, we achieve an F1-score of 0.862 by stacking models trained on both bag-of-words and doc2vec representations. We also explore the relationships between documents, genres, and words using similarity metrics on their vector representations, and we report typical words for each genre. As part of the thesis, we also present an online web app for book genre classification.
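The abstract reports an F1-score but does not say how it is averaged over the 14 genres. The sketch below assumes macro averaging, the unweighted mean of per-genre F1 scores. The genre labels are invented toy data, not taken from the thesis datasets.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted genre was wrong
            fn[truth] += 1  # true genre was missed
    scores = []
    for label in labels:
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0


# Invented toy labels for four snippets.
y_true = ["science fiction", "science fiction", "romance", "horror"]
y_pred = ["science fiction", "romance", "romance", "horror"]
print(round(macro_f1(y_true, y_pred), 3))
```

Micro-averaged F1 would instead pool true positives, false positives, and false negatives across all genres before computing a single score. For single-label classification like this, micro-averaged F1 equals accuracy.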
