Large language models (LLMs) were developed during the 2010s, and chatbots such as ChatGPT quickly became popular. The continued development of LLMs has led to tools aimed at specific use cases, one of which is software development. In this study, eight different LLMs are tested on their ability to solve the programming challenge Advent of Code 2023, which consists of 25 problems, each with two parts. Each LLM is given five attempts per problem to produce a working solution in Python, and after each attempt the tool receives feedback on any issues with its solution. The results show that ChatGPT-4 and GitHub Copilot generated the most correct solutions, with ChatGPT-4 producing the most correct solutions on the first attempt. Code quality is also examined using SonarQube, and ChatGPT-4 performs best in this regard as well. Of the tools tested in this study, Google's Gemini and Gemini Advanced produced the fewest correct solutions. The results indicate that these LLMs are good at generating code, but that Advent of Code 2023 is too difficult for them to solve in full. Even so, these tools demonstrate that they can be useful to programmers.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-531628 |
Date | January 2024 |
Creators | Sandström, Johannes |
Publisher | Uppsala universitet, Avdelningen Vi3 |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |
Relation | UPTEC IT, 1401-5749 ; 24032 |