Incorporating off-grade items within an on-grade item pool is often seen in K-12 testing programs. Doing so may improve measurement precision, test length, and content blueprint fulfillment, especially for high- and low-performing examinees, but it may also raise concerns when too many off-grade items appear on tests that are primarily designed to measure grade-level standards. This dissertation investigates how practical constraints such as the number of on-grade items, the proportion and range of off-grade items, and the stopping rules affect item pool characteristics and item pool performance in adaptive testing.
This study includes simulation conditions built from four study factors: (1) three on-grade pool sizes (150, 300, and 500 items), (2) three proportions of off-grade items in the item pool (small, moderate, and large), (3) two ranges of off-grade items (one grade level and two grade levels), and (4) two stopping rules (variable-length and fixed-length) with two SE threshold levels. All results are averaged across 200 replications for each simulation condition.
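As a rough illustration of the design described above, the four factors can be crossed into a grid of conditions. This is a minimal sketch, not the dissertation's simulation code: the variable names and string labels are assumptions, the two SE threshold levels are left out because the abstract does not say how they are crossed with the stopping rules, and any baseline conditions without off-grade items are not represented.

```python
from itertools import product

# Factor levels as named in the abstract; labels are illustrative only.
pool_sizes = [150, 300, 500]                       # on-grade items
off_grade_proportions = ["small", "moderate", "large"]
off_grade_ranges = [1, 2]                          # grade levels
stopping_rules = ["variable-length", "fixed-length"]

# Fully cross the four factors (SE thresholds deliberately omitted).
conditions = [
    {
        "pool_size": n,
        "off_grade_proportion": p,
        "off_grade_range": r,
        "stopping_rule": s,
    }
    for n, p, r, s in product(
        pool_sizes, off_grade_proportions, off_grade_ranges, stopping_rules
    )
]

replications = 200  # per condition, as stated in the abstract
print(len(conditions), "crossed conditions x", replications, "replications")
```

Under these assumptions the grid has 3 x 3 x 2 x 2 cells; the actual number of conditions in the study may differ once the SE thresholds and any baseline pools are taken into account.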
The item pool characteristics are summarized using descriptive statistics and histograms of item difficulty (the b-parameters), descriptive statistics and plots of test information functions (TIFs), and the standard errors of the ability estimate (SEEs). The item pool performance is evaluated based on the descriptive statistics of measurement precision, test length and exposure properties, content blueprint fulfillment, and mean proportion of off-grade items for each test.
The results show that there are some situations in which incorporating off-grade items would be beneficial, for example, when a testing organization with a small item pool is attempting to improve item pool performance for high- and low-performing examinees. The results also show that the practical constraints of incorporating off-grade items, ordered from most to least impact on item pool characteristics and item pool performance, are: 1) whether off-grade items are incorporated into a small or a large baseline pool; 2) broadening the range of off-grade items from one grade level to two grade levels; 3) increasing the proportion of off-grade items in the item pool; and 4) applying variable- or fixed-length computerized adaptive testing (CAT). The results indicate that broadening the range of off-grade items yields larger improvements in measurement precision and content blueprint fulfillment than increasing the proportion of off-grade items. This study could serve as guidance for testing organizations when considering the benefits and limitations of incorporating off-grade items into on-grade item pools.
Identifier | oai:union.ndltd.org:uiowa.edu/oai:ir.uiowa.edu:etd-8480 |
Date | 01 August 2019 |
Creators | Liu, Xiangdong |
Contributors | Welch, Catherine J., Dunbar, Stephen B. |
Publisher | University of Iowa |
Source Sets | University of Iowa |
Language | English |
Detected Language | English |
Type | dissertation |
Format | application/pdf |
Source | Theses and Dissertations |
Rights | Copyright © 2019 Xiangdong Liu |