
Subscore equating with the random groups design

Demand for subscore reporting is increasing in the testing industry. Many testing programs already include subscores in their score reports or are considering plans to report them. However, relatively few studies have been conducted on subscore equating. The purpose of this dissertation is to address the need for subscore equating and to evaluate the performance of various equating methods for subscores.
Assuming the random groups design and number-correct scoring, this dissertation analyzed two sets of real data as well as simulated data with four study factors: test dimensionality, subtest length, form difference in difficulty, and sample size. The equating methods considered were linear equating, equipercentile equating, equipercentile equating with log-linear presmoothing, equipercentile equating with cubic-spline postsmoothing, IRT true score equating using a three-parameter logistic model (3PL) with separate calibration (3PsepT), IRT observed score equating using 3PL with separate calibration (3PsepO), IRT true score equating using 3PL with simultaneous calibration (3PsimT), IRT observed score equating using 3PL with simultaneous calibration (3PsimO), IRT true score equating using a bifactor model (BF) with simultaneous calibration (BFT), and IRT observed score equating using BF with simultaneous calibration (BFO). These methods were compared to identity equating (that is, no equating) and evaluated with respect to systematic, random, and total errors of equating.
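As an illustration of the two simplest methods listed above, the following is a minimal Python sketch of linear and unsmoothed equipercentile equating under the random groups design. It is not the dissertation's code: the function names and inputs are hypothetical, the percentile ranks use the common uniform continuization of integer scores, and every Form Y score point is assumed to have a nonzero frequency so that the interpolation is well defined.

```python
import numpy as np

def linear_equate(x_scores, y_scores, x):
    """Linear equating: match the mean and SD of Form X to those of Form Y."""
    x_scores = np.asarray(x_scores, dtype=float)
    y_scores = np.asarray(y_scores, dtype=float)
    slope = y_scores.std() / x_scores.std()
    return y_scores.mean() + slope * (np.asarray(x, dtype=float) - x_scores.mean())

def relative_freq(scores, max_score):
    """Relative frequency of each integer score 0..max_score."""
    counts = np.bincount(np.asarray(scores, dtype=int), minlength=max_score + 1)
    return counts / counts.sum()

def equipercentile_equate(x_scores, y_scores, max_x, max_y):
    """Form Y equivalents of the integer Form X scores 0..max_x (no smoothing)."""
    fx = relative_freq(x_scores, max_x)
    fy = relative_freq(y_scores, max_y)
    # Percentile rank (as a proportion) at each integer x: F(x - 1) + f(x) / 2.
    pr_x = np.cumsum(fx) - 0.5 * fx
    # Continuized Form Y distribution: piecewise linear through (y + 0.5, F(y)).
    # Assumption: all Form Y frequencies are positive, so cdf_y strictly increases.
    cdf_y = np.concatenate(([0.0], np.cumsum(fy)))
    grid_y = np.arange(max_y + 2) - 0.5
    # Invert the Form Y distribution at the Form X percentile ranks.
    return np.interp(pr_x, cdf_y, grid_y)
```

Under this sketch, equating a form to itself returns the original scores, which is a convenient sanity check before applying it to two randomly equivalent groups.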
The main findings of this dissertation were as follows: (1) reporting subscores without equating would provide misleading information in terms of score profiles; (2) reporting subscores without a pre-specified test specification would raise practical issues such as constructing alternate subtest forms with comparable difficulty, conducting equating between forms with different lengths, and deciding on an appropriate score scale to be reported; (3) the best performing subscore equating method, overall, was 3PsepO followed by equipercentile equating with presmoothing, and the worst performing method was BFT; (4) simultaneous calibration involving other subtest items in the calibration process yielded larger bias but smaller random error than did separate calibration, indicating that borrowing information from other subtests increased bias but decreased random error in subscore equating; (5) BFO performed the best when the test was multidimensional, the form difference was small, the subtest length was short, or the sample size was small; (6) equating results for BFT and BFO were affected by the magnitude of the factor loadings and the variability of the estimated general and specific factors; and (7) smoothing improved equating results, in general.
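The trade-off in finding (4) can be read through the usual decomposition of equating error in simulation studies. The notation below is an assumption for illustration, since the abstract does not state the exact indices used: $\hat{e}_r(x)$ is the estimated Form Y equivalent of score $x$ in replication $r$ of $R$, $\bar{e}(x)$ is its mean over replications, and $e(x)$ is the criterion equivalent.

```latex
\begin{align*}
\mathrm{Bias}(x) &= \bar{e}(x) - e(x),
  \qquad \bar{e}(x) = \frac{1}{R}\sum_{r=1}^{R} \hat{e}_r(x), \\
\mathrm{SE}(x) &= \sqrt{\frac{1}{R}\sum_{r=1}^{R}\bigl[\hat{e}_r(x) - \bar{e}(x)\bigr]^2}, \\
\mathrm{RMSE}(x)^2 &= \frac{1}{R}\sum_{r=1}^{R}\bigl[\hat{e}_r(x) - e(x)\bigr]^2
  = \mathrm{Bias}(x)^2 + \mathrm{SE}(x)^2 .
\end{align*}
```

Under these definitions, a method that borrows information from other subtests can have a larger bias term and still a smaller total error, provided its random error term shrinks by more.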

Identifier: oai:union.ndltd.org:uiowa.edu/oai:ir.uiowa.edu:etd-6475
Date: 01 May 2016
Creators: Lim, Euijin
Contributors: Lee, Won-Chan
Publisher: University of Iowa
Source Sets: University of Iowa
Language: English
Detected Language: English
Type: dissertation
Format: application/pdf
Source: Theses and Dissertations
Rights: Copyright 2016 Euijin Lim
