Bioconductor is the second-largest repository of R software packages and is used primarily for the analysis of genomic and biological data. With downloads numbering in the millions in recent years, the repository's widespread adoption can be attributed to its diverse selection of community-created packages, written in the programming language R, that provide statistical methods for analysing and modelling data. However, as these packages evolve, their APIs undergo changes that can break existing user code. Fixing these API breaking changes whenever a package is updated can be frustrating and time-consuming, especially since a large fraction of the user community consists of researchers who do not necessarily have a software engineering background. In that context, we first present a tool that detects syntactic API breaking changes between two released versions of a library written in R through static analysis of the package source code. This tool can be useful to R package developers, who can report or handle the breaking changes in their releases more comprehensively, and to R package users, who want to be aware of the API differences between two releases before upgrading the libraries in their code. Using this tool together with manual inspection, we also conducted an empirical study of breaking changes and backward incompatibility in Bioconductor packages. We studied the 100 most downloaded packages in the repository and found that 28% of all package releases are backward incompatible. We also found that 55% of these breaking changes go undocumented and that developers do not follow semantic versioning for 22% of the releases. Finally, we manually inspected 10 library releases that contained breaking changes and found that 2% of the APIs affected 31 client projects. / Master of Science / Bioconductor is a software repository that consists of over 2000 software libraries.
These libraries provide users with reusable functions, or APIs, to perform statistical and graphical data analysis. The developers of these libraries generally make periodic updates to the library source code and its functions for maintenance purposes. However, when clients install these library updates, their existing code may no longer compile, run, or behave as it did before, because of the changes made to the libraries' APIs. A library release that contains changes that can potentially break older code is considered backward incompatible. Without proper documentation from the library developers, fixing these issues can be time-consuming, as the client might have to inspect the changes in the library's source code manually. To tackle this issue, we first present a tool that can analyse two versions of a library and identify a subset of the breaking changes in the API. This helps both the users and the developers of the libraries to be aware of any breaking changes in a new release. Afterwards, we conduct a study of the Bioconductor ecosystem to see how serious the problem of backward incompatibility really is, by studying the 100 most downloaded packages in the repository. We see that 28% of the releases across these 100 packages are backward incompatible.
Since clients are likely to use multiple libraries at once, a rate this high can cause frequent issues in client code. We then check how often developers follow correct release protocols when updating their libraries. These include versioning the releases correctly, so that users know which releases may be backward incompatible, and documenting any breaking changes in a NEWS file that users have access to. In that respect, we find that 22% of the releases are not versioned correctly and roughly 55% of the breaking changes in the API are not documented. Finally, we investigate how frequently these breaking changes actually affect client code. Here, we manually inspect 10 releases with a high number of breaking changes from a selected subset and find 31 client projects that use these APIs and would break upon a library update.
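To make the notion of a syntactic breaking change concrete, the following is a minimal sketch in Python, not the thesis tool itself, which analyses R source code. It assumes each release has already been reduced to a hypothetical mapping from exported function name to its ordered parameter names, and it flags only three illustrative cases (removed functions, removed parameters, reordered parameters). The record does not state which change types the actual tool detects, so these cases are assumptions.

```python
from typing import Dict, List

# Hypothetical representation of a release's API:
# exported function name -> ordered list of parameter names.
Api = Dict[str, List[str]]


def breaking_changes(old: Api, new: Api) -> List[str]:
    """List illustrative syntactic breaking changes going from `old` to `new`."""
    changes: List[str] = []
    for name, old_params in sorted(old.items()):
        if name not in new:
            # Any client call to this function would fail after upgrading.
            changes.append(f"removed function: {name}")
            continue
        new_params = new[name]
        for param in old_params:
            if param not in new_params:
                # Calls that pass this argument by name would fail.
                changes.append(f"removed parameter: {name}({param})")
        # Compare the relative order of parameters present in both versions;
        # a different order can silently change positional calls.
        surviving = [p for p in old_params if p in new_params]
        new_order = [p for p in new_params if p in old_params]
        if surviving != new_order:
            changes.append(f"reordered parameters: {name}")
    return changes


# Made-up function names, used only for illustration.
old_api = {"normalize": ["x", "method"], "plotCounts": ["obj", "gene"]}
new_api = {"normalize": ["x"], "plot_counts": ["obj", "gene"]}

for change in breaking_changes(old_api, new_api):
    print(change)
# removed parameter: normalize(method)
# removed function: plotCounts
```

A rename shows up here as a removal plus an unrelated addition, which is one reason a release with such changes counts as backward incompatible even when the functionality is preserved.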
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/113116 |
Date | 10 January 2023 |
Creators | Chowdhury, Hemayet Ahmed |
Contributors | Computer Science and Applications, Meng, Na, Gulzar, Muhammad Ali, Li, Song |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Language | English |
Detected Language | English |
Type | Thesis |
Format | ETD, application/pdf, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |