
Comparing scoring instruments for the performance assessment of professional competencies

Performance assessments of professionals are commonly scored with rating scales, but checklists are used with the Objective Structured Clinical Examination (OSCE). Theory suggests checklists are too detailed (Norman, 2005a), and studies show that generic rating scales (e.g., Hodges, Regehr, McNaughton, Tiberius, & Hanson, 1999) are more discriminating and reliable. However, generic scales may represent clinical expertise too narrowly, resulting in less valid scores.
Study One asked how case-specific rating scale scores and decisions compared to checklist scores and decisions in representing professional competency. Study Two asked the same question about a skill-specific rating scale.
Data were from a medical licensure OSCE. Participants were 1,587 test takers and 190 physician raters. Two patient cases (Depression and Delirium) were used. In each case, two physicians scored each encounter: one completed a checklist and the other a rating scale, and both completed a patient interaction rating scale (PIRS).
Results from both studies showed that internal consistency was higher for the rating scales (e.g., in Study One, alpha = .87 and .79 for the rating scales versus alpha = .55 and .64 for the checklists). Item-total score correlations (ITCs), computed with each instrument treated as an item within the OSCE, were also higher for the rating scales than for the checklists in both studies. Logistic regression analyses predicting pass/fail decisions from a three-variable expertise model explained more variance for the rating scale decisions than for the checklist decisions in both studies (e.g., RL² = 16.4% and 16.4% for the rating scales versus RL² = 12.7% and 5.7% for the checklists in Study One). The highest Pearson correlations (corrected for attenuation) were between the rating scale scores and the respective PIRS scores.

In conclusion, these rating scale scores are more discriminating and reliable than checklist scores, but their correlations with PIRS scores indicate that they may not measure the intended dimension of the clinical expertise construct. Evidence for a validity argument was strongest for the case-specific rating scale for Depression, raising the question of whether rating scale methodology is appropriate for all OSCE cases.
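The reliability and attenuation statistics reported above follow standard psychometric formulas. As an illustration only (not the author's analysis code, and the function names are hypothetical), a minimal numpy sketch of Cronbach's alpha and the correction for attenuation:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (examinees x items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)      # variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def disattenuated_r(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Pearson correlation corrected for attenuation, given the
    reliabilities r_xx and r_yy of the two measures."""
    return r_xy / np.sqrt(r_xx * r_yy)
```

For example, an observed correlation of .50 between two measures with reliabilities of .64 each corrects to .50 / sqrt(.64 * .64) ≈ .78, which is why the corrected rating-scale/PIRS correlations can exceed the raw ones.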

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/29493
Date: January 2007
Creators: Smee, Sydney M
Publisher: University of Ottawa (Canada)
Source Sets: Université d’Ottawa
Language: English
Detected Language: English
Type: Thesis
Format: 195 p.
