
FixEval: Execution-based Evaluation of Program Fixes for Competitive Programming Problems

In the software life cycle, source code repositories serve as vast storage areas for program code, supporting its maintenance and version control throughout development. It is not uncommon for these repositories to contain programs with hidden errors that manifest only under specific input conditions, causing the program to deviate from its intended behavior. These errors, often unintended by developers, can be difficult to identify and correct, and the growing complexity of software design has increased the time and resources required to locate and fix them. Although techniques exist to repair faulty code automatically, a single bug admits a vast space of possible fixes, and there is a scarcity of tools and datasets for evaluating the corrected code effectively. This study presents FIXEVAL, a benchmark of buggy code submissions to competitive programming problems paired with their corresponding fixes. FIXEVAL provides an extensive test suite that not only gauges the accuracy of model-generated fixes but also assesses a program's functional correctness. The suite also reports on time and memory limits and on acceptance based on specific execution outcomes. We use state-of-the-art language models trained on programming languages as baselines and compare them using match-based metrics (essentially token similarity) and execution-based metrics (which assess functional behavior). Our results indicate that match-based metrics may not reflect the functional correctness of model-generated fixes, whereas execution-based evaluation offers a comprehensive, solution-specific assessment. Consequently, we believe FIXEVAL paves the way for practical automated program repair and for the assessment of model-generated code. The dataset and models for all of our experiments are publicly available at https://github.com/mahimanzum/FixEval.
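The contrast between match-based and execution-based evaluation can be illustrated with a small sketch. This is not FixEval's actual harness: the toy problem (sum two integers), the test cases, and the names `token_similarity`, `judge`, `REFERENCE`, and `CANDIDATES` are hypothetical, and memory limits are omitted for brevity. A candidate that differs from the reference fix by a single token scores high on token similarity yet fails the tests, while a structurally different rewrite scores lower but is accepted.

```python
import difflib
import re
import subprocess
import sys

# Hypothetical toy problem: read two integers from one line, print their sum.
TESTS = [("1 2\n", "3"), ("-5 5\n", "0"), ("100 23\n", "123")]

REFERENCE = "a, b = map(int, input().split())\nprint(a + b)\n"
CANDIDATES = {
    # Nearly identical tokens, but the operator is wrong.
    "near_copy": "a, b = map(int, input().split())\nprint(a - b)\n",
    # Different tokens, same behaviour.
    "rewrite": "import sys\nprint(sum(int(x) for x in sys.stdin.read().split()))\n",
}


def token_similarity(src_a: str, src_b: str) -> float:
    """Match-based score: similarity ratio of the two token sequences."""
    tokens_a = re.findall(r"\w+|[^\w\s]", src_a)
    tokens_b = re.findall(r"\w+|[^\w\s]", src_b)
    return difflib.SequenceMatcher(None, tokens_a, tokens_b).ratio()


def judge(src: str, tests, time_limit: float = 2.0) -> str:
    """Execution-based verdict: run the program on every test case in order."""
    for stdin, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", src],
                input=stdin,
                capture_output=True,
                text=True,
                timeout=time_limit,  # the child is killed if it runs too long
            )
        except subprocess.TimeoutExpired:
            return "Time Limit Exceeded"
        if result.returncode != 0:
            return "Runtime Error"
        if result.stdout.strip() != expected:
            return "Wrong Answer"
    return "Accepted"


if __name__ == "__main__":
    for name, src in CANDIDATES.items():
        score = token_similarity(REFERENCE, src)
        print(f"{name}: similarity={score:.2f}, verdict={judge(src, TESTS)}")
```

Running it, `near_copy` reaches a token similarity of about 0.95 but is judged Wrong Answer, while `rewrite` has a lower similarity and is Accepted, which is the gap between the two kinds of metric described above.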
Master of Science

Think of source code repositories as big digital libraries where computer programs are stored and kept up to date. Sometimes these programs contain hidden mistakes, which we call bugs or errors, that show up only under certain conditions and make the program behave differently than intended. As software gets more complex, finding and fixing these mistakes takes more time and effort. There are ways to fix such errors automatically, but a single bug can be fixed in so many different ways that judging the fixes is like looking for a needle in a haystack. That's why there are few tools for checking whether automatic fixes are right. Enter FIXEVAL: our new benchmark of faulty programs from coding competitions paired with their fixes. It includes a set of tests that check whether a fixed program actually works and report on how it performs, such as its running time and memory use. We used the latest language models for code to fix programs and compared them in two ways: by comparing the text of the fixed code with a known correct fix, and by actually running it. Our findings? Looking at how the code is written isn't enough; we need to test how it works in action. We believe FIXEVAL is a big step toward making sure automatic code fixes are correct. The dataset and models for all of our experiments are publicly available at https://github.com/mahimanzum/FixEval.

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/116666
Date: 14 November 2023
Creators: Haque, Md Mahim Anjum
Contributors: Computer Science and Applications; Brown, Dwayne Christian; Lourentzou, Ismini; Tilevich, Eli
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Language: English
Type: Thesis
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
