The effect of test length, IRT model, type of aberrance, and level of aberrance on the distribution and effectiveness of three appropriateness indices.

This study had two purposes. The first was to investigate the characteristics of the distributions of Lz, ECIZ4, and W3 for non-aberrant response patterns across combinations of test length (40 items and 80 items) and IRT model (the 2PLM and the 3PLM). The second was to investigate the effectiveness of the three indices in twenty-four combinations of two test lengths, two IRT models, two types of aberrance, and three levels of aberrance. To investigate the distributions of the appropriateness indices for non-aberrant response patterns, data were generated by computer to simulate various measurement conditions. Item parameters were generated within specified ranges to produce similar tests for the two test lengths and two IRT models. Simulated examinees were drawn from the normal (0,1) distribution. Two thousand non-aberrant response vectors were generated for each of the four conditions (test length by IRT model), and the three appropriateness indices, Lz, ECIZ4, and W3, were calculated for each examinee. This procedure was replicated fifty times for each of the four combinations of test length and IRT model. Of the three indices, ECIZ4 produced the most stable distributions over replications. To examine the effect of test length and IRT model on the characteristics of the distributions, the mean, standard deviation, skewness, and kurtosis were computed for each index in each combination of test length and IRT model over the fifty replications. There were no significant effects of either test length or IRT model on the means of the three indices. Based on skewness and kurtosis, the distribution of ECIZ4 most closely approximated normality, while the distribution of W3 was least normal. To establish false positive rates, the tails of the distributions of each index were then examined at P$\sb $, P$\sb $, P$\sb $, and P$\sb{25}$ for each of the four conditions. Of the three indices, ECIZ4 seemed least affected and W3 most affected by test length, IRT model, and the interaction of test length and IRT model.
To investigate the effectiveness of the indices, aberrant response patterns were generated for the twenty-four combinations of the four variables (2 test lengths x 2 models x 2 types of aberrance x 3 levels of aberrance). Four thousand simulated examinees were generated for each of the twenty-four combinations, and each index was computed for each examinee. The detection rates of the indices were then computed and compared across the twenty-four conditions. Overall, the 80-item test produced somewhat better detection rates than the 40-item test, and the 2PLM produced better rates than the 3PLM. Spuriously low scores tended to produce slightly higher detection rates than spuriously high scores under most conditions. Higher levels of aberrance tended to produce higher detection rates, although for some conditions there was little difference between 15% and 30% aberrance. Lz and ECIZ4 tended to produce better detection rates than W3; however, no detection rates seemed to be as high as those reported in previous research. (Abstract shortened by UMI.)
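
The simulation design described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's procedure. It covers only Lz, using the standardized log-likelihood form commonly given for that index, and makes several assumptions that are not in the abstract: the item parameter ranges, the fixed guessing parameter of 0.2, the logistic model without the 1.7 scaling constant, the use of true rather than estimated ability, the 5th-percentile cutoff, and the way spuriously low patterns are produced (a random 30% of items forced to incorrect). All function and variable names are hypothetical.

```python
import math
import random


def prob_correct(theta, a, b, c=0.0):
    """3PLM probability of a correct response; c = 0 reduces it to the 2PLM."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(theta, items, rng):
    """Draw one 0/1 response vector for an examinee with ability theta."""
    return [1 if rng.random() < prob_correct(theta, *item) else 0
            for item in items]


def lz(responses, theta, items):
    """Standardized log-likelihood person-fit index (Lz)."""
    l0 = expected = variance = 0.0
    for u, item in zip(responses, items):
        p = prob_correct(theta, *item)
        q = 1.0 - p
        l0 += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (l0 - expected) / math.sqrt(variance)


def make_spuriously_low(responses, fraction, rng):
    """Assumed aberrance scheme: force a random fraction of items to incorrect."""
    out = list(responses)
    for i in rng.sample(range(len(out)), round(fraction * len(out))):
        out[i] = 0
    return out


rng = random.Random(0)

# Assumed parameter ranges for a 40-item 3PLM test: (a, b, c) per item.
items = [(rng.uniform(0.5, 2.0), rng.uniform(-2.0, 2.0), 0.2)
         for _ in range(40)]

# 2000 simulated examinees drawn from the normal (0,1) distribution.
thetas = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Lz for non-aberrant patterns, then for patterns with 30% aberrance.
normal = [lz(simulate_responses(t, items, rng), t, items) for t in thetas]
aberrant = [
    lz(make_spuriously_low(simulate_responses(t, items, rng), 0.30, rng),
       t, items)
    for t in thetas
]

# Assumed cutoff: the 5th percentile of the non-aberrant distribution,
# which fixes the false positive rate at about 5%.
cutoff = sorted(normal)[int(0.05 * len(normal))]
false_positive_rate = sum(z < cutoff for z in normal) / len(normal)
detection_rate = sum(z < cutoff for z in aberrant) / len(aberrant)

print(f"false positive rate: {false_positive_rate:.3f}")
print(f"detection rate:      {detection_rate:.3f}")
```

Setting the cutoff from the lower tail of the non-aberrant distribution mirrors the abstract's approach of fixing false positive rates first and then comparing detection rates. Rerunning with 80 items or with c = 0 would give the other test length and the 2PLM.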

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/5594
Date: January 1990
Creators: Noonan, Brian W.
Contributors: Boss, M.
Publisher: University of Ottawa (Canada)
Source Sets: Université d’Ottawa
Detected Language: English
Type: Thesis
Format: 154 p.