
Leveraging Linguistic Insights for Uncertainty Calibration of ChatGPT and Evaluating Crowdsourced Annotations

The quality of crowdsourced annotations has long been a challenge due to variability in annotators' backgrounds, task complexity, the subjective nature of many labeling tasks, and other factors. It is therefore crucial to evaluate these annotations to ensure their reliability. Traditionally, human experts evaluate the quality of crowdsourced annotations, but this approach has challenges of its own. This paper proposes leveraging large language models such as ChatGPT-4 to evaluate the existing crowdsourced MAVEN dataset and explores their potential as an alternative solution. However, because of the stochastic nature of LLMs, it is important to discern when to trust and when to question their responses. To address this, we introduce a novel approach that applies Rubin's framework for identifying linguistic cues within LLM responses and uses them as indicators of the model's certainty. Our findings reveal that ChatGPT-4 successfully identified 63% of the incorrect labels, highlighting the potential for improving label quality through human-AI collaboration on these identified inaccuracies. This study underscores the promising role of LLMs in evaluating crowdsourced data annotations, offering a way to improve the accuracy and fairness of crowdsourced annotations while saving time and cost.
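To illustrate the idea described in the abstract, the sketch below (not the thesis's actual implementation) shows how hedging versus assertive cues in an LLM response might be mapped to a coarse certainty level; the cue lists and the simple counting rule are assumptions for illustration only, not the lexicon derived from Rubin's framework.

# Minimal sketch: map hedging vs. assertive cues in an LLM response to a
# coarse certainty level. Cue lists are illustrative assumptions, not the
# actual lexicon used in the thesis.

HEDGE_CUES = {"might", "may", "possibly", "perhaps", "seems", "appears",
              "could be", "not certain", "unsure"}
ASSERTIVE_CUES = {"definitely", "clearly", "certainly", "undoubtedly",
                  "without doubt"}

def certainty_level(response: str) -> str:
    """Return a coarse certainty label ('high', 'low', or 'neutral')
    based on simple cue counting in the response text."""
    text = response.lower()
    hedges = sum(text.count(cue) for cue in HEDGE_CUES)
    assertions = sum(text.count(cue) for cue in ASSERTIVE_CUES)
    if hedges > assertions:
        return "low"      # hedged response: treat the label check as uncertain
    if assertions > hedges:
        return "high"     # assertive response: more likely to be trustworthy
    return "neutral"

if __name__ == "__main__":
    reply = "The event label seems incorrect; it could be 'Attack' instead."
    print(certainty_level(reply))  # -> "low": flag this item for human review

In a human-AI collaboration setting, responses scored "low" would be routed to human reviewers, while "high"-certainty disagreements with the crowdsourced label could be prioritized as likely annotation errors.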

DOI: 10.25394/pgs.26214551.v1
Identifier: oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/26214551
Date: 09 July 2024
Creators: Venkata Divya Sree Pulipati (18469230)
Source Sets: Purdue University
Detected Language: English
Type: Text, Thesis
Rights: CC BY 4.0
Relation: https://figshare.com/articles/thesis/Leveraging_Linguistic_Insights_for_Uncertainty_Calibration_of_ChatGPT_and_Evaluating_Crowdsourced_Annotations/26214551
