Differential Item Functioning Analyses for Mixed Response Data Using IRT Likelihood-Ratio Test, Logistic Regression, and GLLAMM Procedures

With the common use of polytomously scored items alongside dichotomously scored items in educational tests, the two item formats are likely to appear together in a single test. Several procedures are available to detect differential item functioning (DIF) in dichotomously scored items, and most of them have been extended to polytomously scored items. DIF analyses, however, are usually conducted for one format or the other. In this study, DIF analyses were conducted for a mixed test composed of both dichotomously and polytomously scored items, in addition to a dichotomous test composed only of dichotomously scored items and a polytomous test composed only of polytomously scored items. Three DIF detection procedures were applied to simulation data and to 10th grade Spring 2004 Florida Comprehensive Assessment Test (FCAT) data: the IRT likelihood-ratio test procedure as an item response (IR) based approach, the logistic regression procedure as a non-item response (non-IR) based approach, and the generalized linear latent and mixed modeling (GLLAMM) procedure as an alternative approach.

The simulation conditions for the dichotomous, polytomous, and mixed tests were sample size (N = 600, N = 1200, and N = 2400), sample size ratio between the reference group (R) and focal group (F) (N = 300R/300F = 600, N = 400R/200F = 600, N = 600R/600F = 1200, N = 800R/400F = 1200, N = 1200R/1200F = 2400, and N = 1600R/800F = 2400), and DIF magnitude (0.32, 0.43, and 0.53). For the polytomous test, a DIF condition (low-shift, high-shift, and balanced) was considered in addition. The simulation was replicated 100, 200, 300, 400, and 500 times for each condition for the IRT likelihood-ratio test and logistic regression procedures. It was found that 200 replications provided more stable results than 100 replications, and that 300, 400, and 500 replications did not further improve stability.

The precision of item parameter estimation by the IRTLRDIF program, which was run to conduct the IRT likelihood-ratio test DIF analyses for the simulation data, was evaluated using root mean squared error (RMSE), squared bias, and standard error (SE) for all three test types. The significance of the main and two-way interaction effects of sample size, sample size ratio, and DIF magnitude on the mean RMSE, mean squared bias, and mean SE was also tested. The results of the item parameter stability study indicated that sample size affected the precision of the item parameter estimates in all three test types: item parameters were estimated better for larger sample sizes. Sample size ratio affected the precision of both the item discrimination and item difficulty parameter estimates in the dichotomous test, the item discrimination parameter in the polytomous test, and the first between-category threshold parameter in the mixed test. Samples with an equal sample size ratio yielded slightly better estimates than samples with an unequal ratio. DIF magnitude, on the other hand, did not affect the precision of item parameter estimation in any of the three test types. In general, the IRTLRDIF program recovered the item parameters well for the dichotomous, polytomous, and mixed tests.
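As a minimal sketch of the recovery indices used above, the following Python function computes RMSE, squared bias, and SE for one item parameter across simulation replications, assuming the standard definitions of these indices; the function name and data layout are illustrative, not taken from the dissertation. The three indices are linked: squared RMSE equals squared bias plus squared SE, so a parameter can be recovered poorly either through systematic bias or through sampling variability.

    import numpy as np

    def recovery_indices(estimates, true_value):
        # estimates:  one item parameter's estimates, one per replication
        # true_value: the generating (true) value used in the simulation
        est = np.asarray(estimates, dtype=float)
        bias = est.mean() - true_value                     # mean signed error
        se = est.std()                                     # spread around the mean estimate
        rmse = np.sqrt(np.mean((est - true_value) ** 2))   # overall recovery error
        # decomposition check: rmse**2 == bias**2 + se**2 (up to rounding)
        return rmse, bias ** 2, se

    # e.g., 200 replicated estimates of a discrimination parameter with true a = 1.2:
    # rmse, sq_bias, se = recovery_indices(a_hat, 1.2)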
A Type I error and power study was also conducted to evaluate the performance of the IRT likelihood-ratio test and logistic regression procedures in detecting DIF in the dichotomous, polytomous, and mixed tests. In the dichotomous test, the power of both procedures was above 0.80 under the medium and large sample size and large DIF magnitude conditions, and increased as sample size or DIF magnitude increased; the Type I error rates of both procedures increased as well, but in general remained in good control for this test type. For the IRT likelihood-ratio test procedure, the polytomous test produced power similar to the dichotomous test across all DIF conditions. The power of the logistic regression procedure, however, was unacceptably low for all DIF conditions, especially the balanced DIF condition; it provided good power only under the large sample size and large DIF magnitude condition. In the mixed test, the IRT likelihood-ratio test and logistic regression procedures were very powerful under the large sample size or large DIF magnitude conditions, and the Type I error rates were within the expected value under these conditions.

For the GLLAMM procedure, one typical dataset was randomly chosen out of the 500 datasets for each simulation condition in the dichotomous, polytomous, and mixed tests. The item parameter stability of the GLLAMM procedure in the STATA program was compared with that of the IRTLRDIF program. Overall, GLLAMM provided item parameter estimates closer to their true values than IRTLRDIF at most conditions for all three test types.

DIF analyses were also conducted for 2004 FCAT science data from 10th grade students, composed of 41 dichotomously and 4 polytomously scored items, using the IRT likelihood-ratio test and logistic regression procedures across different sample sizes and sample size ratios (N = 7761 (calibration sample), N = 300R/300F = 600, N = 400R/200F = 600, N = 600R/600F = 1200, N = 800R/400F = 1200, N = 1200R/1200F = 2400, and N = 1600R/800F = 2400). Gender DIF was examined for the calibration sample and for subsamples drawn from it, with male students as the reference group and female students as the focal group. Several items were flagged for DIF in the calibration sample; only 3 of these items were found to have moderate DIF by the IRT likelihood-ratio test procedure, and all flagged items were found to have negligible DIF by the logistic regression procedure. These 3 DIF items were detected by the IRT likelihood-ratio test, logistic regression, and GLLAMM procedures in the subsamples of the calibration sample under all sample size and sample size ratio conditions.
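To make the logistic regression procedure concrete, the sketch below shows the commonly used nested-model formulation of the test for a single dichotomous item: a baseline model with the matching criterion only is compared against an augmented model that adds group membership and a criterion-by-group interaction, giving a 2-df likelihood-ratio chi-square covering uniform and nonuniform DIF. This is a minimal sketch assuming Python with statsmodels; the helper name lr_dif_test and the use of McFadden's pseudo-R-squared change as the effect size are illustrative stand-ins, not the dissertation's exact implementation or its negligible/moderate/large cutoffs.

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    def lr_dif_test(item, total, group):
        # item:  0/1 responses to the studied item
        # total: matching criterion (e.g., total test score)
        # group: 0 = reference group, 1 = focal group
        item, total, group = (np.asarray(v, dtype=float) for v in (item, total, group))
        z = (total - total.mean()) / total.std()        # standardize the criterion
        base = sm.Logit(item, sm.add_constant(z)).fit(disp=0)
        X = sm.add_constant(np.column_stack([z, group, z * group]))
        full = sm.Logit(item, X).fit(disp=0)
        g2 = 2.0 * (full.llf - base.llf)                # LR chi-square, 2 df
        p = stats.chi2.sf(g2, df=2)
        delta_r2 = full.prsquared - base.prsquared      # pseudo-R^2 change (effect size)
        return g2, p, delta_r2

Pairing the chi-square with an effect-size grade is what keeps trivially small effects from being labeled DIF in large samples, which is consistent with the FCAT pattern reported above, where items could be statistically flagged yet classified as negligible by the logistic regression procedure.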
A Dissertation submitted to the Department of Educational Psychology and Learning Systems in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Degree Awarded: Spring Semester, 2007. / Date of Defense: June 12, 2006. / DIF, Mixed Tests, Item Response Based DIF Procedure, Testing, Multilevel IRT DIF Procedure, Measurement / Includes bibliographical references. / Akihito Kamata, Professor Directing Dissertation; Janice Flake, Outside Committee Member; Albert C. Oosterhof, Committee Member; Richard L. Tate, Committee Member.

Identifier: oai:union.ndltd.org:fsu.edu/oai:fsu.digital.flvc.org:fsu_168368
Contributors: Atar, Burcu (authoraut), Kamata, Akihito (professor directing dissertation), Flake, Janice (outside committee member), Oosterhof, Albert C. (committee member), Tate, Richard L. (committee member), Department of Educational Psychology and Learning Systems (degree granting department), Florida State University (degree granting institution)
Publisher: Florida State University
Source Sets: Florida State University
Language: English
Detected Language: English
Type: Text
Format: 1 online resource, computer, application/pdf