Obtaining norm-referenced scores from criterion-referenced tests: An analysis of estimation errors

One customized testing model equates a criterion-referenced test (CRT) to a norm-referenced test (NRT) so that performance on the CRT can produce an estimate of performance on the NRT. The error associated with these estimated norms is not well understood. The purpose of this study was to examine the extent and nature of the error present in these normative scores. In two subject areas and at three grade levels, actual NRT scores were compared to NRT scores that were estimated from a CRT. The estimation error was analyzed for individual scores and for group means at different parts of the score distribution. For individuals, the mean absolute difference between the actual and estimated NRT scores was approximately five raw score points on a 60-item reading subtest and approximately two points on a 30-item mathematics subtest. A comparison of the standard errors of substitution showed that individual differences were similar whether a parallel form or a CRT estimate was substituted for the NRT score. For groups of examinees, the bias in estimating NRT scores from a CRT is shown by the mean difference between the estimated and actual NRT scores. For all subtests, mean differences were less than one score point, indicating that group data can be accurately obtained through the use of this model. To examine the accuracy of estimation at different parts of the score distribution, the data were divided into three score groups (low, middle, and high) and, subsequently, into deciles. After correcting for a regression effect, mean group differences between actual NRT scores and those estimated from a CRT were fairly consistent for groups at different parts of the distribution. Individual scores, however, were most accurate at the upper end of the score distribution, with accuracy declining as the score level decreased. In conclusion, this study offers evidence that NRT scores can be estimated from performance on a CRT with reasonable accuracy. However, the generalizability of these results to other sets of tests or other populations is unknown. It is recommended that similar research be pursued under varying conditions.
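
The two error summaries named in the abstract, the mean difference (group-level bias) and the mean absolute difference (typical individual error), can be illustrated with a minimal Python sketch. The function name `estimation_error_summary` and the score lists are hypothetical and made up purely for illustration; they are not data or code from the study, and the sketch does not reproduce the study's equating procedure or its standard error of substitution.

```python
from statistics import mean


def estimation_error_summary(actual, estimated):
    """Return (mean difference, mean absolute difference) of estimated minus actual scores."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("need two non-empty score lists of equal length")
    # Signed error for each examinee: estimated NRT score minus actual NRT score.
    diffs = [e - a for a, e in zip(actual, estimated)]
    bias = mean(diffs)                 # group-level bias; signed errors can cancel
    mad = mean(abs(d) for d in diffs)  # typical size of an individual's error
    return bias, mad


# Made-up raw scores for illustration only (assumption, not study data).
actual_nrt = [42, 35, 51, 28, 47]
estimated_nrt = [45, 31, 50, 33, 44]

bias, mad = estimation_error_summary(actual_nrt, estimated_nrt)
print(f"mean difference: {bias:.2f}, mean absolute difference: {mad:.2f}")
```

With these invented values the signed errors cancel, so the mean difference is 0.00 while the mean absolute difference is 3.20. This mirrors the general pattern the abstract describes, in which group means can be estimated closely even when individual scores are off by several points.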

Identifier: oai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:dissertations-8179
Date: 01 January 1991
Creators: Tucker, Charlene Gower
Publisher: ScholarWorks@UMass Amherst
Source Sets: University of Massachusetts, Amherst
Language: English
Detected Language: English
Type: text
Source: Doctoral Dissertations Available from Proquest
