1

Comparative Analysis of ChatGPT-4 and Gemini Advanced in Erroneous Code Detection and Correction

Sun, Erik Wen Han, Grace, Yasine January 2024
This thesis investigates the capabilities of two advanced Large Language Models (LLMs), OpenAI’s ChatGPT-4 and Google’s Gemini Advanced, in the domain of software engineering. While LLMs are widely used across various applications, including text summarization and synthesis, their potential for detecting and correcting programming errors has not been thoroughly explored. This study aims to fill this gap by conducting a comprehensive literature search and an experimental comparison of ChatGPT-4 and Gemini Advanced using the QuixBugs and LeetCode benchmark datasets, with a specific focus on the Python and Java programming languages. The research evaluates the models’ abilities to detect and correct bugs using metrics such as Accuracy, Recall, Precision, and F1-score. Experimental results show that ChatGPT-4 consistently outperforms Gemini Advanced in both the detection and correction of bugs. These findings provide valuable insights that could guide further research in the field of LLMs.
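The four metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes bug detection is treated as a binary classification task (buggy vs. not buggy) summarized by confusion-matrix counts; the names `Confusion` and `detection_metrics` are hypothetical, and nothing here is taken from the thesis's own evaluation code or results.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary bug-detection task (assumed setup)."""
    tp: int  # buggy programs flagged as buggy
    fp: int  # correct programs flagged as buggy
    tn: int  # correct programs flagged as correct
    fn: int  # buggy programs flagged as correct


def detection_metrics(c: Confusion) -> dict:
    """Compute accuracy, precision, recall, and F1 from the counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Placeholder counts for illustration only, not results from the thesis.
    print(detection_metrics(Confusion(tp=8, fp=2, tn=6, fn=4)))
```

F1 is the harmonic mean of precision and recall, so it stays low whenever either of the two is low, which makes it a useful single number when buggy and correct programs are unevenly represented.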
2

A Method for Automated Assessment of Large Language Model Chatbots : Exploring LLM-as-a-Judge in Educational Question-Answering Tasks

Duan, Yuyao, Lundborg, Vilgot January 2024
This study introduces an automated evaluation method for large language model (LLM) based chatbots in educational settings, utilizing LLM-as-a-Judge to assess their performance. Our results demonstrate the efficacy of this approach in evaluating the accuracy of three LLM-based chatbots (Llama 3 70B, ChatGPT 4, Gemini Advanced) across two subjects: history and biology. The analysis reveals promising performance across different subjects. On a scale from 1 to 5 describing the correctness of the judge itself, the LLM judge’s average scores for correctness when evaluating each chatbot on history-related questions are 3.92 (Llama 3 70B), 4.20 (ChatGPT 4), and 4.51 (Gemini Advanced); for biology-related questions, the average scores are 4.04 (Llama 3 70B), 4.28 (ChatGPT 4), and 4.09 (Gemini Advanced). This underscores the potential of leveraging the LLM-as-a-Judge strategy to evaluate the correctness of responses from other LLMs.
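The per-chatbot, per-subject averages reported above could be aggregated along the following lines. This is only a sketch under assumptions: each rating is taken to be a `(chatbot, subject, score)` record with a score between 1 and 5, the function name `average_judge_scores` is hypothetical, and the sample records are placeholders, not data from the study.

```python
from collections import defaultdict
from statistics import mean


def average_judge_scores(records):
    """Average 1-to-5 correctness scores per (chatbot, subject) pair.

    `records` is an iterable of (chatbot, subject, score) tuples; this
    record layout is an assumption made for illustration.
    """
    grouped = defaultdict(list)
    for chatbot, subject, score in records:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range 1-5: {score}")
        grouped[(chatbot, subject)].append(float(score))
    # Round to two decimals, matching the precision used in the abstract.
    return {key: round(mean(scores), 2) for key, scores in grouped.items()}


if __name__ == "__main__":
    # Placeholder ratings for two hypothetical chatbots.
    sample = [
        ("bot_a", "history", 4),
        ("bot_a", "history", 5),
        ("bot_b", "history", 3),
    ]
    print(average_judge_scores(sample))
```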
