In recent years, public attention has become focused on the issue of test and item bias in standardized tests. Since the 1980s, the Mantel-Haenszel (MH; Holland & Thayer, 1986) and Logistic Regression (LR; Swaminathan & Rogers, 1990) procedures have been developed to detect item bias, or differential item functioning (DIF). In this study the effectiveness of the MH and LR procedures was compared under a variety of conditions, using simulated data. The ability of the MH and LR to detect DIF was tested at sample sizes of 100/100, 200/200, 400/400, 600/600, and 800/800. The simulated test had 66 items: the first 33 items had item discrimination ("a") set at 0.80, and the second 33 items had "a" set at 1.20. The pseudo-guessing parameter ("c") was 0.15 for all items. The item difficulty ("b") parameter ranged from -2.00 to 2.00 in increments of 0.125 for the first 33 items, and again for the second 33 items. Both the MH and LRU detected DIF with a high degree of success whenever sample size was large (600 or more per group), especially when effect size, however measured, was also large. The LRU outperformed the MH marginally under almost every condition of the study. However, the LRU also had a higher false-positive rate than the MH, a finding consistent with previous studies (Pang et al., 1994; Tian et al., 1994a, 1994b). Since the "a" and "b" parameters which underlie the computation of the three measures of effect size used in the study are not always determinable in data derived from real-world test administrations, it may be that the $\Delta_{\rm MH}$ is the best available measure of effect size for real-world test items. (Abstract shortened by UMI.)
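The abstract only outlines the simulation design and the MH effect-size measure. The following Python sketch illustrates, under stated assumptions, how one study condition of this kind could be generated under the 3PL model and how the Mantel-Haenszel delta ($\Delta_{\rm MH} = -2.35 \ln \alpha_{\rm MH}$) is computed. The item parameters mirror those quoted in the abstract, but the random seed, the 400/400 condition, the 0.6 difficulty shift used to inject DIF, and the function names (`simulate_3pl`, `mh_delta`) are hypothetical illustrations, not the author's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Item parameters mirroring the design quoted in the abstract.
n_items = 66
a = np.concatenate([np.full(33, 0.80), np.full(33, 1.20)])       # discrimination
b = np.tile(np.arange(-2.00, 2.00 + 1e-9, 0.125), 2)             # difficulty, 33 values twice
c = np.full(n_items, 0.15)                                        # pseudo-guessing

def simulate_3pl(theta, a, b, c, rng):
    """Generate 0/1 responses under the 3PL model:
    P(X=1 | theta) = c + (1 - c) / (1 + exp(-1.7 * a * (theta - b)))."""
    z = 1.7 * a[None, :] * (theta[:, None] - b[None, :])
    p = c[None, :] + (1.0 - c[None, :]) / (1.0 + np.exp(-z))
    return (rng.random(p.shape) < p).astype(int)

def mh_delta(ref, foc, item):
    """Mantel-Haenszel common odds ratio for one studied item, stratified by
    total test score, converted to the ETS delta metric: -2.35 * ln(alpha_MH)."""
    num = den = 0.0
    strata = np.unique(np.concatenate([ref.sum(1), foc.sum(1)]))
    for k in strata:
        r = ref[ref.sum(1) == k, item]
        f = foc[foc.sum(1) == k, item]
        n_k = len(r) + len(f)
        if len(r) == 0 or len(f) == 0:
            continue
        A, B = r.sum(), len(r) - r.sum()      # reference group: right / wrong
        C, D = f.sum(), len(f) - f.sum()      # focal group: right / wrong
        num += A * D / n_k
        den += B * C / n_k
    alpha = num / den if den > 0 else np.nan
    return -2.35 * np.log(alpha)

# One hypothetical 400/400 condition; uniform DIF injected in item 0 by
# shifting its difficulty for the focal group.
theta_ref = rng.normal(0, 1, 400)
theta_foc = rng.normal(0, 1, 400)
b_foc = b.copy()
b_foc[0] += 0.6                               # assumed DIF effect size
ref = simulate_3pl(theta_ref, a, b, c, rng)
foc = simulate_3pl(theta_foc, a, b_foc, c, rng)
print("delta_MH for studied item:", round(mh_delta(ref, foc, 0), 2))
```

A negative $\Delta_{\rm MH}$ here indicates the item is harder for the focal group after conditioning on total score; repeating such runs across sample sizes is the kind of comparison the abstract describes.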
Identifier | oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/10019 |
Date | January 1995 |
Creators | Hadley, Patrick. |
Contributors | Boss, Marvin |
Publisher | University of Ottawa (Canada) |
Source Sets | Université d’Ottawa |
Detected Language | English |
Type | Thesis |
Format | 95 p. |