Recent advances in Artificial Intelligence (AI) have directly impacted and benefited many fields, such as Education, Healthcare, and Entertainment. Computer Science and Software Engineering have also been affected by and benefited from these advances: today, AI-powered services such as OpenAI’s ChatGPT, GitHub Copilot, and Hugging Face’s HuggingChat are widely used as aids to write, compare, or analyze source code for different types of applications. One lingering question about these services is how good they are in terms of code quality, standardization, and readiness for use. In most cases, source code retrieved from these services requires modifications to fulfill its original purpose effectively.
This work presents an experiment that analyzes how state-of-the-art Large Language Models (LLMs) perform when generating test scripts for a target application. More specifically, we set up a controlled environment with a backend application, developed in Python, and used ten different large language models to generate test scripts for it. We then evaluated the results using code metrics, as well as metrics related to test execution, to assess the quality of the generated test code. The following models were used: GPT3.5-turbo, GPT-4, GPT4.0-turbo, Codellama-70B, Google Gemma-7b-it, Llama2-13B, Llama2-70B, Mistral-7B, Mixtral8x7B, and NeuralHermes2.5-7B.
The results of the experiment revealed that GPT4.0-turbo outperformed the other models, both when the target application was fully working and when we intentionally introduced bugs into it. Although the experiments in this work were performed on a simple backend application, they show how the selected models perform on specific code metrics in this simple scenario.
Our intention is that this work will serve as an inspiration for further investigation, specifically regarding code metrics and coding standards within Automated Software Testing.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:ltu-107550 |
Date | January 2024 |
Creators | Silva, Rafael |
Publisher | Luleå tekniska universitet, Institutionen för system- och rymdteknik |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |