Validating the Quality of a Big Data Java Corpus

Recent research in software engineering has used GitHub, the largest hub for open-source projects with almost 20 million users and 57 million repositories, to mine large amounts of source code and obtain more trustworthy results when developing machine learning and deep learning models. Mining GitHub comes with many challenges, since the dataset is large and contains more than just quality software projects. In this project, we mine projects from GitHub following earlier research by others and try to validate their quality by comparing them, using software complexity metrics, with a small subset of quality projects.

http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-75410

mining software repositories

GitHub

GHTorrent

Chidamber & Kemerer metrics

software complexity

Computer Sciences

Datavetenskap (datalogi)

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:lnu-75410
Date	January 2018
Creators	Palmqvist, Simon
Publisher	Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM)
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
