Return to search

Validating the Quality of a Big Data Java Corpus

Recent research within the field of Software Engineering have used GitHub, the largest hub for open source projects with almost 20 million users and 57 million repositories, to mine large amounts of source code to get more trustworthy results when developing machine and deep learning models. Mining GitHub comes with many challenges since the dataset is large and the data does not only contain quality software projects. In this project, we try to mine projects from GitHub based on earlier research by others and try to validate the quality by comparing the projects with a small subset of quality projects with the help of software complexity metrics.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:lnu-75410
Date January 2018
CreatorsPalmqvist, Simon
PublisherLinnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM)
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0024 seconds