
A comparison of calibration methods and proficiency estimators for creating IRT vertical scales

The main purpose of this study was to construct different vertical scales based on various combinations of calibration methods and proficiency estimators to investigate the impact these choices may have on three properties of the resulting vertical scales: grade-to-grade growth, grade-to-grade variability, and the separation of grade distributions. Calibration methods investigated were concurrent calibration, separate calibration, and fixed a-, b-, and c-parameters for common items with simple prior updates (FSPU). Proficiency estimators investigated were the Maximum Likelihood Estimator (MLE) with pattern scores, Expected A Posteriori (EAP) with pattern scores, pseudo-MLE with summed scores, pseudo-EAP with summed scores, and the Quadrature Distribution (QD). The study used datasets from the Iowa Tests of Basic Skills (ITBS) Vocabulary, Reading Comprehension (RC), Math Problem Solving and Data Interpretation (MPD), and Science tests for grades 3 through 8.
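To make the contrast between the pattern-score estimators concrete, the sketch below (not taken from the dissertation; item parameters, the scaling constant D = 1.7, and the examinee responses are illustrative assumptions) scores one response pattern under a 3PL model with both MLE and EAP. The EAP estimate is a posterior mean under a standard normal prior, which is why EAP-based scales typically show smaller within-grade spread than MLE-based scales.

```python
"""Minimal sketch contrasting MLE and EAP proficiency estimation for one
response pattern under a 3PL IRT model. All numbers are hypothetical."""
import numpy as np
from scipy.optimize import minimize_scalar

D = 1.7  # conventional scaling constant for the logistic 3PL (assumed here)

def p_correct(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def log_likelihood(theta, a, b, c, u):
    """Log-likelihood of a 0/1 response pattern u at ability theta."""
    p = p_correct(theta, a, b, c)
    return np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))

def mle_pattern(a, b, c, u):
    """MLE with pattern scores: maximize the pattern log-likelihood."""
    res = minimize_scalar(lambda t: -log_likelihood(t, a, b, c, u),
                          bounds=(-6, 6), method="bounded")
    return res.x

def eap_pattern(a, b, c, u, n_quad=61):
    """EAP with pattern scores: posterior mean over quadrature points,
    assuming a standard normal prior on theta."""
    nodes = np.linspace(-6, 6, n_quad)
    prior = np.exp(-0.5 * nodes**2)
    like = np.array([np.exp(log_likelihood(t, a, b, c, u)) for t in nodes])
    post = prior * like
    return np.sum(nodes * post) / np.sum(post)

# Hypothetical 5-item test and one examinee's scored responses
a = np.array([1.0, 1.2, 0.8, 1.5, 0.9])    # discrimination
b = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])  # difficulty
c = np.array([0.2, 0.2, 0.2, 0.2, 0.2])    # pseudo-guessing
u = np.array([1, 1, 1, 0, 0])

print("MLE:", round(mle_pattern(a, b, c, u), 3))
print("EAP:", round(eap_pattern(a, b, c, u), 3))
```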
The following conclusions were drawn for each research question. With respect to the comparison of the three calibration methods, for the RC and Science tests, concurrent calibration, compared to FSPU and separate calibration, showed less growth and more slowly decreasing growth in the lower grades, less decrease in variability over grades, and less separation of the lower-grade distributions in terms of horizontal distances. For the Vocabulary and MPD tests, differences in both grade-to-grade growth and separation of grade distributions were trivial. With respect to the comparison of the five proficiency estimators, for all content areas, within-grade SDs followed the ordering pseudo-MLE ≥ MLE > QD > EAP ≥ pseudo-EAP, and effect sizes followed the reverse ordering pseudo-EAP ≥ EAP > QD > MLE ≥ pseudo-MLE. However, the degree of decrease in variability over grades was similar across proficiency estimators. With respect to the comparison of the four content areas, the Vocabulary and MPD tests, compared to the RC and Science tests, showed less but steadier growth and less decrease in variability over grades. For separation of grade distributions, the large growth suggested by the larger mean differences for the RC and Science tests was reduced when effect sizes were used to standardize the differences.
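The effect size referred to above standardizes an adjacent-grade mean difference by within-grade variability; the sketch below assumes the common vertical-scaling convention of dividing by the pooled within-grade SD, and the scale-score summaries are hypothetical, not taken from the ITBS results.

```python
"""Minimal sketch of a grade-to-grade effect size under an assumed
pooled-SD convention; the numbers are hypothetical."""
import math

def grade_to_grade_effect_size(mean_lower, sd_lower, mean_upper, sd_upper):
    """Adjacent-grade mean difference divided by the pooled within-grade SD."""
    pooled_sd = math.sqrt((sd_lower**2 + sd_upper**2) / 2.0)
    return (mean_upper - mean_lower) / pooled_sd

# Hypothetical grade 3 and grade 4 scale-score means and SDs
print(round(grade_to_grade_effect_size(180.0, 25.0, 195.0, 27.0), 2))
```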

Identifier: oai:union.ndltd.org:uiowa.edu/oai:ir.uiowa.edu:etd-1348
Date: 01 January 2007
Creators: Kim, Jungnam
Contributors: Frisbie, David A., Kolen, Michael J.
Publisher: University of Iowa
Source Sets: University of Iowa
Language: English
Detected Language: English
Type: dissertation
Format: application/pdf
Source: Theses and Dissertations
Rights: Copyright 2007 Jungnam Kim
