1. Conclusion stability for natural language based mining of design discussions
Mahadi, Alvi (11 February 2021)
Developer discussions range from in-person hallway chats to comment chains on bug reports. Being able to identify discussions that touch on software design would help with documenting and refactoring software. Design mining is the application of machine learning techniques to correctly label a given discussion artifact, such as a pull request, as pertaining (or not) to design. In this work we demonstrate a simple example of how design mining works. We first replicate an existing state-of-the-art design mining study to show that conclusion stability is poor across different artifact types and different projects. We then introduce two techniques, augmentation and context specificity, that greatly improve the conclusion stability and cross-project relevance of design mining. Our new approach achieves an AUC-ROC of 0.88 on within-dataset classification and 0.84 on cross-dataset classification. / Graduate
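As a minimal sketch of design mining framed as binary text classification scored with AUC-ROC, the block below trains a toy multinomial naive Bayes model with add-one smoothing and ranks unseen artifacts. Naive Bayes is swapped in only because it fits in the standard library; it is not the model used in this thesis. The names `NaiveBayesDesignClassifier`, `tokenize` and `auc_roc`, and all training sentences, are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A labelled artifact: (text, label), where label 1 = design, 0 = not design.
Example = Tuple[str, int]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesDesignClassifier:
    """Hypothetical stand-in classifier: multinomial naive Bayes, add-one smoothing."""

    def fit(self, data: Sequence[Example]) -> "NaiveBayesDesignClassifier":
        self.word_counts: Dict[int, Counter] = {0: Counter(), 1: Counter()}
        self.doc_counts: Counter = Counter()
        for text, label in data:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        self.totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        return self

    def score(self, text: str) -> float:
        """Log-odds that `text` is a design artifact; higher means more design-like."""
        # Smoothed log prior ratio P(design) / P(not design).
        log_odds = math.log((self.doc_counts[1] + 1) / (self.doc_counts[0] + 1))
        v = len(self.vocab)
        for token in tokenize(text):
            if token not in self.vocab:
                continue  # words never seen in training carry no evidence
            p_design = (self.word_counts[1][token] + 1) / (self.totals[1] + v)
            p_other = (self.word_counts[0][token] + 1) / (self.totals[0] + v)
            log_odds += math.log(p_design / p_other)
        return log_odds


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (design, non-design) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one design and one non-design example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data.
train = [
    ("refactor the class hierarchy to use the strategy pattern", 1),
    ("should this module depend on an interface or a concrete class", 1),
    ("fix typo in readme", 0),
    ("bump version number for release", 0),
]
test = [
    ("use an interface and the strategy pattern", 1),
    ("fix the version typo", 0),
]

clf = NaiveBayesDesignClassifier().fit(train)
test_scores = [clf.score(text) for text, _ in test]
print(auc_roc([label for _, label in test], test_scores))  # 1.0
```

Training on one project or artifact type and scoring another with the same `auc_roc` call gives the shape of the within-dataset versus cross-dataset comparison the abstract describes; on realistic data, the gap between those two numbers is where poor conclusion stability would show up.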
2. Exploring Design Discussions With Semi-Supervised Topic Modelling
Lasrado, Roshan N. (11 August 2022)
Stack Overflow is a rich source of questions and answers (discussions) about software development. One topic of discussion is software design, such as the correct use of design patterns or best practices in data access. Since design is a relatively abstract topic in software engineering, researchers have long sought to characterize and model design knowledge. However, these approaches typically require significant expert input to contextualize the abstract design information. In this study, we explore whether combining expert input with Stack Overflow data is an effective way to identify design topics. Being able to identify and classify this design knowledge would enable its discovery and sharing, helping developers better leverage Stack Overflow for crowdsourcing their design decisions. We first perform inductive coding of design-tagged Stack Overflow questions and answers to identify the design concepts that developers discuss. We report on areas where inter-rater agreement was a challenge, including abstraction levels. Since inductive coding is expensive, we then apply a semi-supervised approach, Anchored CorEx. We find that it outperforms LDA and offers superior interpretability and the ability to incorporate expert domain knowledge. We use Anchored CorEx to identify how design is discussed on Stack Overflow and how it is leveraged in GitHub projects. We conclude by describing how our experience with the semi-supervised Anchored CorEx approach leads us to believe that approaches like it, which combine domain knowledge with scalability, are key to analyzing large SE text repositories. / Graduate
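The abstract notes that inter-rater agreement was hard to reach during inductive coding, but it does not say which agreement statistic was used. As a hedged illustration, Cohen's kappa (a common choice for two coders, assumed here) corrects raw agreement for the agreement expected by chance. The function name `cohens_kappa` and the design-concept codes below are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders who each assigned one label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both coders must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both coders.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each label, product of the two coders' marginal rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both coders use one identical label")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical codes for five Stack Overflow posts.
coder_1 = ["design pattern", "design pattern", "data access", "data access", "architecture"]
coder_2 = ["design pattern", "data access", "data access", "data access", "architecture"]
print(round(cohens_kappa(coder_1, coder_2), 4))  # 0.6875
```

The kappa of 0.6875 is lower than the raw 80% agreement because part of that agreement would be expected by chance, given how often each coder used each label.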