A comparison of the sample invariance of item statistics from the classical test model, item response model, and structural equation model: A case study of real response data.

The sample dependency of item statistics and the practical importance of alternative models for test scores are evaluated in this case study using real data. The hypothesized superiority of the item response model (IRM), attributed to the sample invariance of its item statistics, is tested against a classical test theory (CTT) model and a structural equation model (SEM) for responses to the Center for Epidemiologic Studies Depression (CES-D) scale. Sample invariance of item statistics is tested across 10 random samples of 500 people each, and across groups differing in gender, age, and health status. Practical implications are considered by comparing score reliability and individual rankings under each test model. Item estimates from a 2-parameter logistic IRM were compared with classical item difficulty and discrimination estimates, and with item regression path estimates from a single-factor SEM. An intraclass correlation coefficient (ICC) was calculated to evaluate the level of absolute agreement between item statistics in each condition. Item statistics from all test models were very similar across random samples, indicating a high level of invariance; of these, the IRM threshold parameters were the least sensitive to sampling. Greater variability was found among item statistics from all test models across groups that differed in age, health, or gender, with the IRM discrimination estimates the most stable across contrasting groups. Rankings assigned to individuals were most similar when CTT scores and linearly transformed IRM scores were compared; the largest variation in individual rankings occurred when SEM factor scores were compared with CTT scores in the higher score ranges. The reliability estimate for factor scores based on the SEM was highest overall. However, IRM optimal scores, and the modified reliability estimate based on them, provided a more accurate estimate of average measurement error. This evidence supports the hypothesis of improved score precision when tests are constructed and scored using IRM techniques. Nevertheless, rankings based on individual CES-D scores were very similar across the CTT, IRM, and SEM techniques, so CTT and SEM scoring are reasonable alternatives to the IRM when norm-referenced score interpretations are based on CES-D scores.
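The thesis itself does not present code; the short Python sketch below only illustrates the kind of agreement analysis the abstract describes. It computes classical item difficulty and corrected item-total discrimination in two samples, then evaluates their absolute agreement with a single-measure, absolute-agreement intraclass correlation (ICC(A,1) in McGraw and Wong's notation). The function names, the simulated 20-item 0-3 responses, and the two-sample design are illustrative assumptions, not the author's analysis.

```python
import numpy as np

def ctt_item_stats(responses):
    # Classical item difficulty (item mean) and corrected item-total
    # discrimination (correlation of each item with the rest-score,
    # i.e. the total score excluding that item).
    difficulty = responses.mean(axis=0)
    total = responses.sum(axis=1)
    rest = total[:, None] - responses
    discrimination = np.array([
        np.corrcoef(responses[:, j], rest[:, j])[0, 1]
        for j in range(responses.shape[1])
    ])
    return difficulty, discrimination

def icc_a1(x):
    # Two-way random-effects, absolute-agreement, single-measure ICC(A,1):
    # rows of x are items (targets), columns are samples (raters).
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Illustrative data: two hypothetical random samples of 500 respondents
# answering 20 items scored 0-3, as on the CES-D. With purely random
# responses the ICC will be near zero; real item responses would show
# the high agreement reported in the abstract.
rng = np.random.default_rng(0)
sample1 = rng.integers(0, 4, size=(500, 20))
sample2 = rng.integers(0, 4, size=(500, 20))

diff1, _ = ctt_item_stats(sample1)
diff2, _ = ctt_item_stats(sample2)
print("ICC(A,1) for item difficulties:",
      icc_a1(np.column_stack([diff1, diff2])))
```

ICC(A,1) penalizes mean shifts between samples as well as reordering of the items, which is why the abstract speaks of absolute agreement rather than mere consistency between the estimates.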

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/9132
Date: January 2001
Creators: Breithaupt, Krista J.
Contributors: Zumbo, Bruno
Publisher: University of Ottawa (Canada)
Source Sets: Université d'Ottawa
Detected Language: English
Type: Thesis
Format: 161 p.
