As the content and format of educational assessments evolve, the need for valid and workable standard setting methods grows as well. Although numerous standard setting methods are available for multiple-choice items, the pool of methods to choose from is much smaller when constructed-response items or performance assessments are considered.

In this study, four standard setting methods were evaluated. Two of the methods, the Work Classification method and the Analytic method, were used with the simulation component of a licensing examination; the other two, the Item Cluster method and the Direct Consensus method, were used with the multiple-choice component. The Item Cluster and Direct Consensus methods had each been the subject of research on two previous occasions, and the aims of the current study were to make modifications suggested by those findings and to seek replication of the trends found earlier. The Work Classification and Analytic methods, while bearing some similarity to existing methods, are new approaches configured specifically to reflect the features of the simulations under consideration in the study.

The results for each method were evaluated in terms of three sources of validity evidence (procedural, internal, and external), and the methods for each item type were contrasted with each other to assess their relative strengths and weaknesses. For the methods used with the simulations, the Analytic method had a procedural advantage with respect to the time required, but panelists felt more positively about the Work Classification method. Internally, interrater reliability was lower for the Analytic method. Externally, the consistency of cut scores between methods was good for two of the three simulations; the larger difference on the third simulation may be explainable by other factors.

For the methods used with the multiple-choice items, this study's findings support most of those from earlier research. Procedurally, the Direct Consensus method is more efficient. Internally, there was less consistency across panels with the Direct Consensus method. Externally, the Direct Consensus method produced higher cut scores. Suggestions for future research on all four methods are given.
Identifier | oai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:dissertations-3763
Date | 01 January 2003
Creators | Pitoniak, Mary Jean |
Publisher | ScholarWorks@UMass Amherst |
Source Sets | University of Massachusetts, Amherst |
Language | English |
Detected Language | English |
Type | text |
Source | Doctoral Dissertations Available from Proquest |