• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • No language data
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Investigating the Reproducbility of NPM packages

Goswami, Pronnoy 19 May 2020 (has links)
The meteoric increase in the popularity of JavaScript and a large developer community has led to the emergence of a large ecosystem of third-party packages available via the Node Package Manager (NPM) repository which contains over one million published packages and witnesses a billion daily downloads. Most of the developers download these pre-compiled published packages from the NPM repository instead of building these packages from the available source code. Unfortunately, recent articles have revealed repackaging attacks to the NPM packages. To achieve such attacks the attackers primarily follow three steps – (1) download the source code of a highly depended upon NPM package, (2) inject malicious code, and (3) then publish the modified packages as either misnamed package (i.e., typo-squatting attack) or as the official package on the NPM repository using compromised maintainer credentials. These attacks highlight the need to verify the reproducibility of NPM packages. Reproducible Build is a concept that allows the verification of build artifacts for pre-compiled packages by re-building the packages using the same build environment configuration documented by the package maintainers. This motivates us to conduct an empirical study (1) to examine the reproducibility of NPM packages, (2) to assess the influence of any non-reproducible packages, and (3) to explore the reasons for non-reproducibility. Firstly, we downloaded all versions/releases of 226 most-depended upon NPM packages, and then built each version with the available source code on Github. Secondly, we applied diffoscope, a differencing tool to compare the versions we built against the version downloaded from the NPM repository. Finally, we did a systematic investigation of the reported differences. At least one version of 65 packages was found to be non-reproducible. Moreover, these non- reproducible packages have been downloaded millions of times per week which could impact a large number of users. Based on our manual inspection and static analysis, most reported differences were semantically equivalent but syntactically different. Such differences result due to non-deterministic factors in the build process. Also, we infer that semantic differences are introduced because of the shortcomings in the JavaScript uglifiers. Our research reveals challenges of verifying the reproducibility of NPM packages with existing tools, reveal the point of failures using case studies, and sheds light on future directions to develop better verification tools. / Master of Science / Software packages are distributed as pre-compiled binaries to facilitate software development. There are various package repositories for various programming languages such as NPM (JavaScript), pip (Python), and Maven (Java). Developers install these pre-compiled packages in their projects to implement certain functionality. Additionally, these package repositories allow developers to publish new packages and help the developer community to reduce the delivery time and enhance the quality of the software product. Unfortunately, recent articles have revealed an increasing number of attacks on the package repositories. Moreover, developers trust the pre-compiled binaries, which often contain malicious code. To address this challenge, we conduct our empirical investigation to analyze the reproducibility of NPM packages for the JavaScript ecosystem. Reproducible Builds is a concept that allows any individual to verify the build artifacts by replicating the build process of software packages. For instance, if the developers could verify that the build artifacts of the pre-compiled software packages available in the NPM repository are identical to the ones generated when they individually build that specific package, they could mitigate and be aware of the vulnerabilities in the software packages. The build process is usually described in configuration files such as package.json and DOCKERFILE. We chose the NPM registry for our study because of three primary reasons – (1) it is the largest package repository, (2) JavaScript is the most widely used programming language, and (3) there is no prior dataset or investigation that has been conducted by researchers. We took a two-step approach in our study – (1) dataset collection, and (2) source-code differencing for each pair of software package versions. For the dataset collection phase, we downloaded all available releases/versions of 226 popularly used NPM packages and for the code-differencing phase, we used an off-the-shelf tool called diffoscope. We revealed some interesting findings. Firstly, at least one of the 65 packages as found to be non-reproducible, and these packages have millions of downloads per week. Secondly, we found 50 package-versions to have divergent program semantics which high- lights the potential vulnerabilities in the source-code and improper build practices. Thirdly, we found that the uglification of JavaScript code introduces non-determinism in the build process. Our research sheds light on the challenges of verifying the reproducibility of NPM packages with the current state-of-the-art tools and the need to develop better verification tools in the future. To conclude, we believe that our work is a step towards realizing the reproducibility of NPM packages and making the community aware of the implications of non-reproducible build artifacts.

Page generated in 0.0449 seconds