This study investigated the effects of test length (10, 20 and 30 items), scoring schema (proportion of dichotomous and polytomous scoring) and item analysis model (IRT and Rasch) on the ability estimates, test information levels and optimization criteria of mixed item format tests. Polytomous item responses to 30 items for 1000 examinees were simulated using the generalized partial-credit model and SAS software. Portions of the data were re-coded dichotomously over 11 structured proportions to create 33 sets of test responses including mixed item format tests. MULTILOG software was used to calculate the examinee ability estimates, standard errors, item and test information, reliability and fit indices. A comparison of IRT and Rasch item analysis procedures was made using SPSS software across ability estimates and standard errors of ability estimates using a 3 x 11 x 2 fixed factorial ANOVA. Effect sizes and power were reported for each procedure. Scheffe post hoc procedures were conducted on significant factors. Test information was analyzed and compared across the range of ability levels for all 66 design combinations. The results indicated that both test length and the proportion of items scored polytomously had a significant impact on the amount of test information produced by mixed item format tests. Generally, tests with 100% of the items scored polytomously produced the highest overall information. This seemed to be especially true for examinees with lower ability estimates. Optimality comparisons were made between IRT and Rasch procedures based on standard error rates for the ability estimates, marginal reliabilities and fit indices (-2LL). The only significant differences reported involved the standard error rates for both the IRT and Rasch procedures. This result must be viewed in light of the fact that the effect size reported was negligible. Optimality was found to be highest when longer tests and higher proportions of polytomous scoring were applied.
Some indications were given that IRT procedures may produce slightly improved results in gathering available test information. Overall, significant differences were not found between the IRT and Rasch procedures when analyzing the mixed item format tests. Further research should be conducted in the areas of test difficulty, examinee test scores, and automated partial-credit scoring along with a comparison to other traditional psychometric measures and how they address challenges related to the mixed item format tests.
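The simulation design described in the abstract (30 polytomous items, 1000 examinees, 3 test lengths crossed with 11 scoring proportions) can be illustrated with a minimal Python sketch. This is not the study's SAS code. The number of score categories (4), the parameter distributions, the 0% to 100% proportion grid in steps of 10%, the use of the first items of the 30-item pool for shorter tests, and the re-coding rule (full credit scored 1, anything else 0) are all assumptions made for illustration, since the abstract does not state them.

```python
import numpy as np


def gpcm_probs(theta, a, b):
    """Category probabilities for one item under the generalized partial-credit model.

    theta: (n,) abilities; a: item discrimination; b: (m,) step parameters.
    Returns an (n, m + 1) array whose rows sum to 1.
    """
    # Category k has logit sum_{v<=k} a * (theta - b_v); category 0 has an empty sum.
    steps = a * (theta[:, None] - b[None, :])
    z = np.concatenate([np.zeros((theta.size, 1)), np.cumsum(steps, axis=1)], axis=1)
    z -= z.max(axis=1, keepdims=True)  # subtract the row maximum for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)


def simulate_gpcm(n_examinees, n_items, n_steps, rng):
    """Simulate polytomous responses scored 0..n_steps (parameter ranges are assumed)."""
    theta = rng.standard_normal(n_examinees)
    a = rng.uniform(0.5, 2.0, n_items)
    b = rng.standard_normal((n_items, n_steps))
    data = np.empty((n_examinees, n_items), dtype=int)
    for j in range(n_items):
        cum = np.cumsum(gpcm_probs(theta, a[j], b[j]), axis=1)
        u = rng.random(n_examinees)
        # Inverse-CDF sampling: count how many cumulative thresholds u exceeds.
        data[:, j] = (u[:, None] > cum[:, :-1]).sum(axis=1)
    return data


def recode_mixed(data, n_dichotomous, max_score):
    """Re-code the first n_dichotomous items as 0/1 (assumed rule: full credit = 1)."""
    out = data.copy()
    out[:, :n_dichotomous] = (data[:, :n_dichotomous] == max_score).astype(int)
    return out


rng = np.random.default_rng(0)
N_STEPS = 3  # 4 score categories per item (assumption)
data = simulate_gpcm(n_examinees=1000, n_items=30, n_steps=N_STEPS, rng=rng)

# 3 test lengths x 11 dichotomous-scoring percentages = 33 response sets.
datasets = {}
for length in (10, 20, 30):
    for pct in range(0, 101, 10):
        n_dich = length * pct // 100
        datasets[(length, pct)] = recode_mixed(data[:, :length], n_dich, N_STEPS)
```

Each entry of `datasets` is keyed by test length and the percentage of items scored dichotomously. Fitting each of the 33 sets with two item analysis models would give the 66 design combinations mentioned in the abstract.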
Identifier | oai:union.ndltd.org:unt.edu/info:ark/67531/metadc4316 |
Date | 08 1900 |
Creators | Kinsey, Tari L. |
Contributors | Schumacker, Randall E., Gates, Gordon S., Henson, Robin K. |
Publisher | University of North Texas |
Source Sets | University of North Texas |
Language | English |
Detected Language | English |
Type | Thesis or Dissertation |
Format | Text |
Rights | Public, Copyright, Kinsey, Tari L., Copyright is held by the author, unless otherwise noted. All rights reserved. |