
A Psychometric Evaluation of Script Concordance Tests for Measuring Clinical Reasoning

Purpose: Script concordance tests (SCTs) are assessments purported to measure clinical data interpretation. The aims of this research were to (1) test the psychometric properties of SCT items, (2) directly examine the construct validity of SCTs, and (3) explore the concurrent validity of six SCT scoring methods while also considering validity at the item difficulty and item type levels.
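For context, SCT items present a clinical vignette followed by new information, and examinees rate on a Likert scale how that information affects a proposed hypothesis; responses are then scored against an expert panel. Below is a minimal Python sketch of the conventional aggregate (partial-credit) scoring method; the five-point anchor scale and the panel data are illustrative assumptions, and the six scoring variants evaluated in this study are not reproduced here.

```python
def aggregate_sct_score(examinee_response: int, panel_responses: list[int]) -> float:
    """Aggregate (partial-credit) SCT scoring: an examinee earns credit
    proportional to the number of panelists who chose the same anchor,
    scaled so the modal panel answer is worth 1 point.

    Anchors here are a 5-point scale from -2 to +2 (an illustrative
    assumption, not the anchors used in the study).
    """
    counts = {r: panel_responses.count(r) for r in set(panel_responses)}
    modal_count = max(counts.values())
    return counts.get(examinee_response, 0) / modal_count

# Example: 10-member panel; the modal anchor (+1, chosen by 5 panelists)
# earns full credit, minority anchors earn fractional credit.
panel = [1, 1, 1, 1, 1, 0, 0, 0, -1, 2]
print(aggregate_sct_score(1, panel))   # 1.0
print(aggregate_sct_score(0, panel))   # 0.6
print(aggregate_sct_score(-2, panel))  # 0.0
```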
Methods: SCT scores from a problem-solving SCT (SCT-PS; n=522) and an emergency medicine SCT (SCT-EM; n=1040) were used to investigate the aims of this research. An item analysis was conducted to optimize the SCT datasets, to categorize items into levels of difficulty and type, and to test for gender biases. A confirmatory factor analysis tested whether SCT scores conformed to a theorized unidimensional factor structure. Exploratory factor analyses examined the effects of six SCT scoring methods on construct validity. The concurrent validity of each scoring method was also tested via a one-way multivariate analysis of variance (MANOVA) and Pearson's product-moment correlations. Repeated-measures analysis of variance (ANOVA) and one-way ANOVA tested the discriminatory power of the SCTs according to item difficulty and type.
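As a rough illustration of the factor-analytic step, the sketch below fits a single-factor exploratory model with the Python factor_analyzer package; the synthetic score matrix, item count, and variable names are assumptions, not the study's data or exact procedure.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Illustrative stand-in for an examinee-by-item SCT score matrix
# (rows: examinees, columns: items); the real analyses used the
# SCT-PS (n=522) and SCT-EM (n=1040) datasets.
rng = np.random.default_rng(0)
scores = pd.DataFrame(rng.normal(size=(522, 20)),
                      columns=[f"item{i}" for i in range(1, 21)])

# Exploratory factor analysis extracting a single factor; weak
# loadings and a small share of explained variance would argue
# against a unidimensional structure.
fa = FactorAnalyzer(n_factors=1, rotation=None)
fa.fit(scores)
print(fa.loadings_.round(2))               # per-item factor loadings
_, prop_var, _ = fa.get_factor_variance()  # proportion of total variance explained
print(prop_var)
```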
Results: Item analysis identified no gender biases. A combination of moderate model-fit indices and poor factor loadings from the confirmatory factor analysis suggested that the SCTs under investigation did not conform to a unidimensional factor structure. Exploratory factor analyses of six different scoring methods repeatedly revealed weak factor loadings, and the extracted factors consistently explained only a small portion of the total variance. Results of the concurrent validity study showed that all six scoring methods discriminated between medical training levels despite lower reliability coefficients for the 3-point scoring methods. In addition, examinees tested as MS4s significantly (p<0.001) outperformed their own MS2 SCT scores in all difficulty categories. Cross-sectional analysis of the SCT-EM data revealed significant differences (p<0.001) between experienced EM physicians, EM residents, and MS4s at each level of difficulty. When considering item type, diagnostic and therapeutic items differentiated between all three training levels, while investigational items could not readily distinguish between MS4s and EM residents.
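The reliability and group-difference comparisons rest on standard formulas; the sketch below computes Cronbach's alpha for an examinees-by-items score matrix and a one-way ANOVA across training levels with SciPy. The score vectors are hypothetical placeholders, not the study's results.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for an examinees-by-items score matrix."""
    k = item_scores.shape[1]
    item_var = item_scores.var(axis=0, ddof=1).sum()
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Cross-sectional comparison of total SCT scores by training level
# (hypothetical placeholder vectors for MS4s, residents, and
# experienced physicians).
ms4 = np.array([55.0, 60.0, 58.0, 57.0])
resident = np.array([62.0, 66.0, 64.0, 63.0])
attending = np.array([70.0, 73.0, 71.0, 72.0])
f_stat, p_value = stats.f_oneway(ms4, resident, attending)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```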
Conclusions: The results of this research contest the assertion that SCTs measure a single common construct. These findings raise questions about the latent constructs measured by SCTs and challenge the overall utility of SCT scores. The outcomes of the concurrent validity study provide evidence that multiple scoring methods reasonably differentiate between medical training levels. Concurrent validity was also observed when considering item difficulty and item type.

Identifier: oai:union.ndltd.org:IUPUI/oai:scholarworks.iupui.edu:1805/3877
Date: 29 January 2014
Creators: Wilson, Adam Benjamin
Contributors: Pike, Gary R. (Gary Robert), 1952-, Humbert, Aloysius J., Brokaw, James J., Seifert, Mark F.
Source Sets: Indiana University-Purdue University Indianapolis
Language: en_US
Detected Language: English
Type: Thesis
