Return to search

Enabling Large-Scale Mining Software Repositories (MSR) Studies Using Web-Scale Platforms

The Mining Software Repositories (MSR) field analyzes software data to uncover knowledge and assist software developments. Software projects and products continue to grow in size and complexity. In-depth analysis of these large systems and their evolution is needed to better understand the characteristics of such large-scale systems and projects. However, classical software analysis platforms (e.g., Prolog-like, SQL-like, or specialized programming scripts) face many challenges when performing large-scale MSR studies. Such software platforms rarely scale easily out of the box. Instead, they often require analysis-specific one-time ad hoc scaling tricks and designs that are not reusable for other types of analysis and that are costly to maintain. We believe that the web community has faced many of the scaling challenges facing the software engineering community, as they cope with the enormous growth of the web data. In this thesis, we report on our experience in using MapReduce and Pig, two web-scale platforms, to perform large MSR studies. Through our case studies, we carefully demonstrate the benefits and challenges of using web platforms to prepare (i.e., Extract, Transform, and Load, ETL) software data for further analysis. The results of our studies show that: 1) web-scale platforms provide an effective and efficient platform for large-scale MSR studies; 2) many of the web community’s guidelines for using web-scale platforms must be modified to achieve the optimal performance for large-scale MSR studies. This thesis will help other software engineering researchers who want to scale their studies. / Thesis (Master, Computing) -- Queen's University, 2010-05-28 00:37:19.443

Identiferoai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:OKQ.1974/5693
Date31 May 2010
CreatorsShang, Weiyi
ContributorsQueen's University (Kingston, Ont.). Theses (Queen's University (Kingston, Ont.))
Source SetsLibrary and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada
LanguageEnglish, English
Detected LanguageEnglish
TypeThesis
RightsThis publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
RelationCanadian theses

Page generated in 0.0022 seconds