<p> This dissertation addresses problems of program comprehension in support of the evolution of large-scale software systems. The research concerns how software engineers locate features and concepts, and categorize changes, within very large bodies of source code and their versioned histories. More specifically, advanced Information Retrieval (IR) and Natural Language Processing (NLP) techniques are utilized and enhanced to support various software engineering tasks. This research is not aimed at directly improving IR or NLP approaches; rather, it is aimed at understanding how additional information can be leveraged to improve the final results. The work advances the field by investigating approaches to augment and re-document source code with different types of abstract behavior information. The hypothesis is that enriching the source code corpus with meaningful descriptive information, and integrating this orthogonal information (semantic and structural) that is extracted from source code, will improve the results of the IR methods for indexing and querying information. Moreover, adding this new information to a corpus is a form of supervision. That is, a priori knowledge is often used to direct and supervise machine-learning and IR approaches. </p><p> The main contributions of this dissertation involve improving on the results of previous work in feature location and source code querying. The dissertation demonstrates that the addition of statically derived information from source code (e.g., method stereotypes) can improve the results of IR methods applied to the problem of feature location. Further contributions include showing the effects of excluding certain textual information (comments and function calls) from source code indexing for feature/concept location. 
Moreover, the dissertation demonstrates an IR-based method of natural language topic extraction that assists developers in gaining an overview of past maintenance activities based on software repository commits. </p><p> The ultimate goal of this work is to reduce the costs, effort, and time of software maintenance by improving the results of previous work in feature location and source code querying, and by supporting a new platform for enhancing program comprehension and facilitating software engineering research.</p>
Identifier | oai:union.ndltd.org:PROQUEST/oai:pqdtoai.proquest.com:3618939 |
Date | 13 June 2014 |
Creators | Alhindawi, Nouh |
Publisher | Kent State University |
Source Sets | ProQuest.com |
Language | English |
Detected Language | English |
Type | thesis |