Investigation of the validity of the Angoff standard setting procedure for multiple-choice items

Setting passing standards is one of the major challenges in implementing valid assessments for high-stakes decision making in testing situations such as licensing and certification. If high-stakes pass-fail decisions are to be made from test scores, the passing standards must be valid for the assessment itself to be valid. Multiple-choice test items continue to play an important role in measurement, and the Angoff (1971) procedure continues to be widely used to set standards on multiple-choice examinations. This study focuses on the internal consistency, or underlying validity, of Angoff standard setting ratings. The Angoff procedure requires judges to estimate the proportion of borderline candidates who would answer each test question correctly. If judges can successfully estimate the difficulty of items for borderline candidates, that suggests the procedure has an underlying validity. This study examines this question by evaluating the relationships among Angoff standard setting ratings and actual candidate performance on professional certification tests. For each test, a borderline group of candidates was defined as those scoring near the cutscore. The analyses focus on three aspects of judges' ratings with respect to item difficulties for the borderline group: accuracy, correlation, and variability. The results of this study provide some evidence for the validity of the Angoff standard setting procedure. For two of the three examinations studied, judges were accurate and consistent in rating the difficulty of items for borderline candidates. However, the study also shows that the procedure may be less successful in some applications. These results indicate that the procedure can be valid, but that its validity should be checked for each application. Practitioners should not assume that the Angoff method is valid. The results of this study also show some limitations to the procedure even when the overall results are positive.
Judges are less successful at rating very difficult or very easy test items. The validity of the Angoff procedure may be enhanced by further study of methods designed to ameliorate those limitations.
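
The kind of comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's actual analysis: the function names, the toy data, the band width used to define "near the cutscore", and the specific statistics chosen for accuracy (mean absolute error), correlation (Pearson), and variability (standard deviation across judges) are all assumptions made for illustration.

```python
from statistics import mean, stdev

def angoff_cutscore(ratings):
    """ratings[j][i] is judge j's estimate of the proportion of borderline
    candidates who would answer item i correctly (hypothetical layout)."""
    n_items = len(ratings[0])
    item_means = [mean([judge[i] for judge in ratings]) for i in range(n_items)]
    # The cutscore is the sum over items of the mean rating across judges.
    return sum(item_means), item_means

def borderline_p_values(responses, cutscore, band):
    """responses[c][i] is 1 if candidate c answered item i correctly, else 0.
    The borderline group is assumed to be candidates whose total score
    lies within +/- band points of the cutscore."""
    group = [r for r in responses if abs(sum(r) - cutscore) <= band]
    if not group:
        raise ValueError("no candidates fall within the borderline band")
    n_items = len(responses[0])
    return [mean([r[i] for r in group]) for i in range(n_items)]

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def summarize(ratings, responses, band=1.0):
    """Compare mean Angoff ratings with observed borderline item difficulty."""
    cutscore, item_means = angoff_cutscore(ratings)
    p_vals = borderline_p_values(responses, cutscore, band)
    n_items = len(item_means)
    return {
        "cutscore": cutscore,
        # Accuracy: average gap between rated and observed difficulty.
        "mean_abs_error": mean([abs(m - p) for m, p in zip(item_means, p_vals)]),
        # Correlation: do judges rank items in the same order as the data?
        "correlation": pearson(item_means, p_vals),
        # Variability: disagreement among judges on each item.
        "judge_sd_by_item": [stdev([j[i] for j in ratings]) for i in range(n_items)],
    }

# Toy data (invented for illustration): 3 judges, 4 items, 6 candidates.
example_ratings = [
    [0.8, 0.6, 0.4, 0.7],
    [0.9, 0.5, 0.3, 0.6],
    [0.7, 0.7, 0.5, 0.8],
]
example_responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
]
print(summarize(example_ratings, example_responses, band=1.0))
```

In this sketch a small mean absolute error together with a high correlation would correspond to judges being accurate and consistent, while large per-item standard deviations would flag items on which judges disagree. The band width in particular is a free choice here and would need to be justified for any real examination.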

Identifier: oai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:dissertations-1915
Date: 01 January 2000
Creators: Mattar, John D
Publisher: ScholarWorks@UMass Amherst
Source Sets: University of Massachusetts, Amherst
Language: English
Detected Language: English
Type: text
Source: Doctoral Dissertations Available from Proquest
