Examinee-selected item assessment designs are rarely used, and research into them has revealed certain challenges. This study re-examines the key issues surrounding examinee-selected items more comprehensively, under a modern model for constructed-response scoring. Specifically, data were simulated under the hierarchical rater model with signal detection theory rater components (HRM-SDT; DeCarlo, Kim, and Johnson, 2011), and a variety of examinee item-selection mechanisms were considered. These conditions ranged from a hypothetical baseline, in which examinees choose randomly and with equal frequency from a pair of item prompts, to the perhaps more realistic and certainly more troublesome condition in which examinees select items based on the very subject-area proficiency that the instrument is intended to measure. Under the HRM-SDT, examinee, item, and rater parameters were recovered well in the former condition, but serious problems with item and rater parameter estimation arose in the latter. Additional conditions were considered, as were competing psychometric models for estimating examinee proficiency. Finally, practical implications of using examinee-selected item designs are discussed, along with future directions for research.
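The two extreme selection conditions described in the abstract can be illustrated with a small sketch. This is not the simulation code from the thesis, and it does not implement the HRM-SDT. It only shows how a random, equal-frequency choice between two prompts differs from a choice driven by proficiency. The standard normal proficiency distribution, the logistic selection function, the `slope` value, and all function names are assumptions made for illustration.

```python
import math
import random


def simulate_selection(n_examinees, mechanism, slope=1.5, seed=0):
    """Return a list of (theta, choice) pairs for a two-prompt design.

    choice is 0 or 1, indexing the prompt the examinee selects.
    All distributional choices here are illustrative assumptions.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_examinees):
        # Latent proficiency, assumed standard normal.
        theta = rng.gauss(0.0, 1.0)
        if mechanism == "random":
            # Baseline: both prompts are equally likely for every examinee.
            p_second = 0.5
        elif mechanism == "proficiency":
            # Assumed logistic link: more proficient examinees are more
            # likely to pick the second prompt.
            p_second = 1.0 / (1.0 + math.exp(-slope * theta))
        else:
            raise ValueError(f"unknown mechanism: {mechanism!r}")
        choice = 1 if rng.random() < p_second else 0
        results.append((theta, choice))
    return results


def mean_theta_by_choice(results):
    """Average proficiency of the examinees who chose each prompt."""
    means = {}
    for item in (0, 1):
        thetas = [theta for theta, choice in results if choice == item]
        means[item] = sum(thetas) / len(thetas)
    return means


baseline = mean_theta_by_choice(simulate_selection(5000, "random"))
selective = mean_theta_by_choice(simulate_selection(5000, "proficiency"))
print("random selection:     ", baseline)
print("proficiency selection:", selective)
```

Under the random mechanism the two groups of examinees have nearly the same average proficiency, whereas under the proficiency-driven mechanism the groups differ systematically. That confounding of item choice with proficiency is the kind of situation in which the abstract reports problems with item and rater parameter estimation.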
Identifier | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/D8X929DC |
Date | January 2013 |
Creators | Patterson, Brian Francis |
Source Sets | Columbia University |
Language | English |
Detected Language | English |
Type | Theses |