
Can AI models solve the programming challenge Advent of Code? : Evaluating state of the art large language models

Large Language Models were developed during the 2010s, and chatbots like ChatGPT quickly became popular. The continued development of LLMs led to tools with specific use cases, one of which is software development. In this study, eight different LLMs are tested on their ability to solve the programming challenge Advent of Code. Advent of Code consists of 25 problems, each with two parts. Each LLM is given five attempts to solve each problem by generating Python code, and after each attempt it receives feedback on any issues with its solution. The results show that ChatGPT-4 and GitHub Copilot generated the most correct solutions, with ChatGPT-4 producing the most correct solutions on the first attempt. The quality of the generated code is also examined using SonarQube, and ChatGPT-4 performs best in this regard as well. Of the tools tested in this study, Google's Gemini and Gemini Advanced produced the fewest correct solutions. Based on these results, it is clear that the LLMs are good at generating code, but that Advent of Code 2023 as a whole is too difficult for them to solve. Even so, the tools demonstrate that they can be useful aids for programmers.
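The abstract describes an evaluation loop in which each model gets up to five attempts per problem, with feedback returned after every failed attempt. A minimal sketch of such a loop is shown below; the thesis does not publish its harness here, so ask_llm, run_candidate, and the prompt wording are hypothetical placeholders rather than the author's actual implementation.

```python
# Hypothetical sketch of a five-attempt evaluation loop with feedback,
# as described in the abstract. ask_llm and run_candidate stand in for
# a real model API call and a sandboxed run of the generated code.

def ask_llm(prompt: str) -> str:
    """Placeholder: send the prompt to a model and return generated Python code."""
    raise NotImplementedError

def run_candidate(code: str, puzzle_input: str) -> tuple[bool, str]:
    """Placeholder: run the generated code on the puzzle input.
    Returns (succeeded, feedback), where feedback describes any error
    or incorrect answer."""
    raise NotImplementedError

def solve_with_feedback(problem_text: str, puzzle_input: str, max_attempts: int = 5) -> bool:
    """Give the model up to max_attempts tries, feeding back issues each round."""
    prompt = f"Solve this Advent of Code problem in Python:\n{problem_text}"
    for attempt in range(1, max_attempts + 1):
        code = ask_llm(prompt)
        ok, feedback = run_candidate(code, puzzle_input)
        if ok:
            return True
        # Append the failure details so the next attempt can address them.
        prompt += f"\n\nAttempt {attempt} failed: {feedback}\nPlease fix the solution."
    return False
```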

Identifier oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-531628
Date January 2024
Creators Sandström, Johannes
Publisher Uppsala universitet, Avdelningen Vi3
Source Sets DiVA Archive at Upsalla University
Language English
Detected Language English
Type Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format application/pdf
Rights info:eu-repo/semantics/openAccess
Relation UPTEC IT, 1401-5749 ; 24032
