
Leveraging Linguistic Insights for Uncertainty Calibration of ChatGPT and Evaluating Crowdsourced Annotations

The quality of crowdsourced annotations has long been a challenge due to variability in annotators' backgrounds, task complexity, the subjective nature of many labeling tasks, and other factors. It is therefore crucial to evaluate these annotations to ensure their reliability. Traditionally, human experts evaluate the quality of crowdsourced annotations, but this approach has challenges of its own. This paper proposes leveraging large language models such as ChatGPT-4 to evaluate the existing crowdsourced MAVEN dataset and explores their potential as an alternative solution. However, because of the stochastic nature of LLMs, it is important to discern when to trust and when to question their responses. To address this, we introduce a novel approach that applies Rubin's framework for identifying linguistic cues within LLM responses and uses them as indicators of the model's certainty. Our findings reveal that ChatGPT-4 successfully identified 63% of the incorrect labels, highlighting the potential for improving label quality through human-AI collaboration on these identified inaccuracies. This study underscores the promising role of LLMs in evaluating crowdsourced data annotations, offering a way to improve the accuracy and fairness of crowdsourced annotations while saving time and cost.
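To illustrate the idea described in the abstract, the sketch below (not the thesis's actual implementation) shows how hedging versus assertive cues in an LLM response might be mapped to a coarse certainty level; the cue lists and the simple counting rule are assumptions for illustration only, not the lexicon derived from Rubin's framework.

# Minimal sketch: map hedging vs. assertive cues in an LLM response to a
# coarse certainty level. Cue lists are illustrative assumptions, not the
# actual lexicon used in the thesis.

HEDGE_CUES = {"might", "may", "possibly", "perhaps", "seems", "appears",
              "could be", "not certain", "unsure"}
ASSERTIVE_CUES = {"definitely", "clearly", "certainly", "undoubtedly",
                  "without doubt"}

def certainty_level(response: str) -> str:
    """Return a coarse certainty label ('high', 'low', or 'neutral')
    based on simple cue counting in the response text."""
    text = response.lower()
    hedges = sum(text.count(cue) for cue in HEDGE_CUES)
    assertions = sum(text.count(cue) for cue in ASSERTIVE_CUES)
    if hedges > assertions:
        return "low"      # hedged response: treat the label check as uncertain
    if assertions > hedges:
        return "high"     # assertive response: more likely to be trustworthy
    return "neutral"

if __name__ == "__main__":
    reply = "The event label seems incorrect; it could be 'Attack' instead."
    print(certainty_level(reply))  # -> "low": flag this item for human review

In a human-AI collaboration setting, responses scored "low" would be routed to human reviewers, while "high"-certainty disagreements with the crowdsourced label could be prioritized as likely annotation errors.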

DOI: 10.25394/pgs.26214551.v1
Identifier: oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/26214551
Date: 09 July 2024
Creators: Venkata Divya Sree Pulipati (18469230)
Source Sets: Purdue University
Detected Language: English
Type: Text, Thesis
Rights: CC BY 4.0
Relation: https://figshare.com/articles/thesis/Leveraging_Linguistic_Insights_for_Uncertainty_Calibration_of_ChatGPT_and_Evaluating_Crowdsourced_Annotations/26214551
