
The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring

Since researchers began investigating automated scoring systems for writing assessments, they have examined the relationships between human and machine scoring and have proposed evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of, and relationships among, indices of inter-rater agreement used to assess the relatedness of human and automated essay scoring, and to examine the impact of rater variability on inter-rater agreement. To carry out these investigations, my study consists of two parts: an empirical study and a simulation study. Based on the results of the empirical study, the overall effects for inter-rater agreement were .63 and .99 for exact and adjacent proportions of agreement, .48 for kappas, and between .75 and .78 for correlations. Additionally, significant differences existed between 6-point scales and the other scales (i.e., 3-, 4-, and 5-point scales) for correlations, kappas, and proportions of agreement. Moreover, based on the results for the simulated data, the highest agreement and lowest discrepancies were achieved in the matched rater-distribution pairs. Specifically, the means of the exact and adjacent proportions of agreement, kappa and weighted kappa values, and correlations were .58, .95, .42, .78, and .78, respectively, while the average standardized mean difference was .0005 in the matched rater-distribution pairs. Acceptable values of inter-rater agreement as evaluation criteria for automated essay scoring, the impact of rater variability on inter-rater agreement, and relationships among inter-rater agreement indices are discussed.

A Dissertation submitted to the Department of Educational Psychology and Learning Systems in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester 2017. / November 10, 2017. / Automated Essay Scoring, Inter-Rater Agreement, Meta-Analysis, Rater Variability / Includes bibliographical references. / Betsy Jane Becker, Professor Directing Dissertation; Fred Huffer, University Representative; Insu Paek, Committee Member; Qian Zhang, Committee Member.
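
The abstract names several agreement indices (exact and adjacent proportions of agreement, kappa, weighted kappa, correlation, and standardized mean difference). The sketch below is not the author's analysis code; it is a minimal illustration, assuming integer scores on a common rating scale and using hypothetical human and machine ratings, of how such indices are commonly computed for a human-machine score pair.

```python
import numpy as np

def agreement_indices(human, machine, n_categories):
    """Common inter-rater agreement indices for two integer score vectors
    (scores assumed to run from 0 to n_categories - 1)."""
    h = np.asarray(human)
    m = np.asarray(machine)
    n = len(h)

    # Exact and adjacent (within one score point) proportions of agreement
    exact = np.mean(h == m)
    adjacent = np.mean(np.abs(h - m) <= 1)

    # Cohen's kappa: exact agreement corrected for chance agreement
    cats = np.arange(n_categories)
    p_h = np.array([np.mean(h == c) for c in cats])   # human marginals
    p_m = np.array([np.mean(m == c) for c in cats])   # machine marginals
    p_e = np.sum(p_h * p_m)                           # chance agreement
    kappa = (exact - p_e) / (1 - p_e)

    # Quadratic-weighted kappa: partial credit for near-miss scores
    w = 1 - (np.subtract.outer(cats, cats) ** 2) / (n_categories - 1) ** 2
    obs = np.zeros((n_categories, n_categories))
    for hs, ms in zip(h, m):
        obs[hs, ms] += 1 / n                          # observed proportions
    exp = np.outer(p_h, p_m)                          # expected proportions
    weighted_kappa = (np.sum(w * obs) - np.sum(w * exp)) / (1 - np.sum(w * exp))

    # Pearson correlation between human and machine scores
    r = np.corrcoef(h, m)[0, 1]

    # Standardized mean difference using the pooled standard deviation
    pooled_sd = np.sqrt((h.var(ddof=1) + m.var(ddof=1)) / 2)
    smd = (h.mean() - m.mean()) / pooled_sd

    return dict(exact=exact, adjacent=adjacent, kappa=kappa,
                weighted_kappa=weighted_kappa, r=r, smd=smd)

# Hypothetical ratings on a 4-point scale (scores 0-3)
human   = [2, 3, 1, 0, 2, 3, 1, 2, 0, 3]
machine = [2, 2, 1, 0, 3, 3, 1, 2, 1, 3]
print(agreement_indices(human, machine, n_categories=4))
```

In this framing, exact agreement, kappa, and weighted kappa reward matching or near-matching scores, the correlation captures linear association regardless of score shifts, and the standardized mean difference captures systematic severity or leniency between the two raters, which is why a matched pair of rater distributions would be expected to yield high agreement and a near-zero mean difference.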

Identifier: oai:union.ndltd.org:fsu.edu/oai:fsu.digital.flvc.org:fsu_605041
Contributors: Yun, Jiyeo (author), Becker, Betsy Jane, 1956- (professor directing dissertation), Huffer, Fred W. (Fred William) (university representative), Paek, Insu (committee member), Zhang, Qian (committee member), Florida State University (degree granting institution), College of Education (degree granting college), Department of Educational Psychology and Learning Systems (degree granting department)
Publisher: Florida State University
Source Sets: Florida State University
Language: English
Detected Language: English
Type: Text, doctoral thesis
Format: 1 online resource (118 pages), computer, application/pdf
