• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • No language data
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Source code search for automatic bug localization

Shayan Ali A Akbar (9761117) 14 December 2020 (has links)
This dissertation advances the state-of-the-art in information retrieval (IR) based automatic bug localization for large software systems. We present techniques from three generations of IR based bug localization and compare their performances on our large and diverse bug localization dataset --- the Bugzbook dataset. The three generations span over fifteen years of research in mining software repositories for bug localization and include: (1) the generation of simple bag-of-words (BoW) based techniques, (2) the generation in which software-centric information such as bug and code change histories as well as structured information embedded in bug reports and code files are exploited to improve retrieval, and (3) the third and most recent generation in which order and semantic relationships between terms are modeled to improve the performance of bug localization systems. The dissertation also presents a novel technique called SCOR (Source Code Retrieval with Semantics and Order) which combines Markov Random Fields (MRF) based term-term ordering dependencies with semantic word vectors obtained from neural network based word embedding algorithms, such as word2vec, to better localize bugs in code files. The results presented in this dissertation show that while term-term ordering and semantic relationships significantly improve the performance when they are modeled separately in retrieval systems, the best precisions in retrieval are obtained when they are modeled together in a single retrieval system. We also show that the semantic representations of software terms learned by training the word embedding algorithm on a corpus of software repositories can be used to perform search in new software code repositories not present in the training corpus of the word embedding algorithm.<br>

Page generated in 0.0665 seconds