Citation Evaluation Using Large Language Models (LLMs): Can LLMs evaluate citations in scholarly documents? An experimental study on ChatGPT
Zeeb, Ahmad; Olsson, Philip (January 2024)
This study investigates the capacity of Large Language Models (LLMs), specifically ChatGPT 3.5 and 4, to evaluate citations in scholarly papers. Given the importance of accurate citations in academic writing, the goal was to determine how well these models can assist in verifying them. A series of experiments was conducted on a dataset of our own creation, covering the three main citation categories: Direct Quotation, Paraphrasing, and Summarising, along with subcategories such as minimal and long source text. In the preliminary experiment, ChatGPT 3.5 achieved perfect accuracy, while ChatGPT 4 showed a tendency towards false positives. Further experiments with an extended dataset revealed that ChatGPT 4 excels at correctly identifying valid citations, particularly in longer and more complex texts, but is also more prone to incorrect predictions. ChatGPT 3.5, by contrast, performed more evenly across text lengths, and both models reached an accuracy of 90.7%. The reliability experiments indicated that ChatGPT 4 is more consistent in its responses than ChatGPT 3.5, although it also had a higher rate of consistently wrong predictions. These findings highlight the potential of LLMs to assist scholars in citation verification and suggest a hybrid approach, using ChatGPT 4 for initial scans and ChatGPT 3.5 for final verification, as a path towards automating the process. The study also contributes a dataset that can be expanded and reused, offering a resource for future research in this domain.
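The hybrid approach proposed in the abstract could be structured as a two-stage pipeline: one model performs an initial scan and a second model re-checks only the citations the first accepts. The sketch below illustrates that control flow only; the two `check_with_*` functions are hypothetical placeholders standing in for real ChatGPT 4 and ChatGPT 3.5 calls (the abstract does not specify prompts or APIs), and the substring heuristic inside them is purely for demonstration.

```python
def check_with_gpt4(source_text: str, citation: str) -> bool:
    """Placeholder for a ChatGPT 4 call used for the initial scan.

    In a real pipeline this would send the source passage and the
    citation to the model and parse a valid/invalid verdict. Here a
    trivial substring check stands in so the sketch is runnable.
    """
    return citation.lower() in source_text.lower()


def check_with_gpt35(source_text: str, citation: str) -> bool:
    """Placeholder for a ChatGPT 3.5 call used for final verification."""
    return citation.lower() in source_text.lower()


def verify_citation(source_text: str, citation: str) -> bool:
    # Stage 1: initial scan. A citation rejected here is not re-checked.
    if not check_with_gpt4(source_text, citation):
        return False
    # Stage 2: final verification by the second model.
    return check_with_gpt35(source_text, citation)
```

A citation is accepted only if both stages agree, which is one way the second model could compensate for the first model's tendency towards false positives.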