Return to search

N-Grams as a Measure of Naturalness and Complexity

We live in a time where software is used everywhere. It is used even for creating other software by helping developers with writing or generating new code. To do this properly, metrics to measure software quality are being used to evaluate the final code. However, they are sometimes too costly to compute, or simply don't have the expected effect. Therefore, new and better ways of software evaluation are needed. In this research, we are investigating the usage of the statistical approaches used commonly in the natural language processing (NLP) area. In order to introduce and evaluate new metrics, a Java N-gram language model is created from a large Java language code corpus. Naturalness, a method-level metric, is introduced and calculated for chosen projects. The correlation with well-known software complexity metrics are calculated and discussed. The results, however, show that the metric, in the form that we have defined it, is not suitable for software complexity evaluation since it is highly correlated with a well-known metric (token count), which is much easier to compute. Different definition of the metric is suggested, which could be a target of future study and research.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:lnu-90006
Date January 2019
CreatorsRandák, Richard
PublisherLinnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM)
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0022 seconds