21 |
Measurement equivalence of the Center for Epidemiological Studies Depression Scale in racially/ethnically diverse older adults
Kim, Giyeon, 01 June 2007
This dissertation was designed to examine the measurement equivalence of the Center for Epidemiological Studies Depression (CES-D) Scale across White, African American, and Mexican American elders. The specific aims were to identify race/ethnicity-related, sociodemographic-related, and acculturation- and instrument-language-related measurement bias in the CES-D. Three studies were conducted to accomplish these aims, drawing on two existing national datasets: the New Haven Established Populations for Epidemiologic Studies of the Elderly (EPESE) for the White and African American samples and the Hispanic EPESE (H-EPESE) for the Mexican American sample. Differential item functioning (DIF) analyses were conducted using both confirmatory factor analysis (CFA) and item response theory (IRT) methods. Study 1 focused on the role of race/ethnicity in measurement bias in the CES-D.
Results from Study 1 showed a lack of measurement equivalence of the CES-D for Mexican Americans in comparisons with both Whites and Blacks. Race/ethnicity-specific items were also identified in Study 1: two interpersonal-relation items for Blacks and four positive-affect items for Mexican Americans. Study 2 focused on identifying sociodemographic-related measurement bias in responses to the CES-D across these racial/ethnic groups. Results from Study 2 showed that gender and educational attainment contributed to item bias in the CES-D. Interactions of gender and educational level with race/ethnicity were also found: Mexican American women and less educated Blacks had a greater predisposition to endorse the 'crying' item. Focusing on Mexican American elders, Study 3 examined how level of acculturation and instrument language influence responses to the CES-D, and identified acculturation- and language-biased items.
Study 3 also suggested that the acculturation bias was entirely explained by whether the CES-D was administered in English or in Spanish. Possible reasons for item bias on the CES-D are discussed in the context of sociocultural differences in each substudy. Findings from this dissertation provide a broader understanding of sociocultural group differences in depressive symptom measures among racially/ethnically diverse older adults and carry research and practice implications for the use of standard depression screening tools.
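The IRT-based DIF analyses above rest on comparing item characteristic curves estimated separately for each group; one classical summary is the normal-weighted area between the two curves, in the spirit of Raju's area measures. The following is an illustrative sketch only, with invented item parameters, not the dissertation's actual procedure:

```python
import numpy as np

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve: P(correct | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def weighted_area_dif(a_ref, b_ref, a_foc, b_foc, n_points=2001):
    """N(0,1)-weighted unsigned area between reference- and focal-group ICCs,
    approximated on a theta grid (a larger value means more DIF)."""
    theta = np.linspace(-6.0, 6.0, n_points)
    weight = np.exp(-0.5 * theta**2) / np.sqrt(2.0 * np.pi)  # standard normal pdf
    gap = np.abs(icc_2pl(theta, a_ref, b_ref) - icc_2pl(theta, a_foc, b_foc))
    return float(np.sum(gap * weight) * (theta[1] - theta[0]))

# Identical curves yield (near) zero area; a difficulty shift yields a clear gap.
no_dif = weighted_area_dif(1.2, 0.0, 1.2, 0.0)
dif = weighted_area_dif(1.2, 0.0, 1.2, 0.8)
```

In practice the parameters would come from group-specific IRT calibrations placed on a common scale; here they are simply asserted.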
|
22 |
Differential item functioning in the Peabody Picture Vocabulary Test - Third Edition: partial correlation versus expert judgment
Conoley, Colleen Adele, 30 September 2004
This study had three purposes: (1) to identify differential item functioning (DIF) on the PPVT-III (Forms A and B) using a partial correlation method, (2) to find a consistent pattern among items identified as underestimating ability in each ethnic minority group, and (3) to compare findings from an expert judgment method and a partial correlation method. Hispanic, African American, and White subjects were provided by American Guidance Service (AGS) from the standardization sample of the PPVT-III; English language learners (ELL) of Mexican descent were recruited from school districts in Central and South Texas. Content raters were all self-selected volunteers; each had an advanced degree and a career in education, but no special expertise with ELL or ethnic minorities was required. Two groups of teachers participated as judges for this study: the "expert" group was selected for its special knowledge of ELL students of Mexican descent, while the control group consisted of regular education teachers with limited exposure to ELL. Using the partial correlation method, DIF was detected within each group comparison. In all comparisons except the ELL group on Form A, the numbers of items with significant positive and significant negative partial correlations did not differ. On Form A, the ELL comparison yielded more items with negative than with positive correlations, χ²(1) = 5.538, p = .019. Among the items flagged as underestimating the ability of the ELL group, no consistent pattern could be detected. Moreover, none of the expert judges could adequately predict which items would underestimate ability for the ELL group, despite their expertise. The discussion addresses possible consequences of item placement and offers recommendations regarding further research and use of the PPVT-III.
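As a toy illustration of the partial-correlation idea (not the study's implementation): the index is the correlation between group membership and an item score after partialling the total test score out of both, so a nonzero value at matched ability flags DIF. All data and effect sizes below are simulated:

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y with z partialled out of both."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))

rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, n)            # 0 = reference, 1 = focal
ability = rng.normal(0, 1, n)            # same distribution in both groups
# A biased item: harder for the focal group even at equal ability.
p_item = 1 / (1 + np.exp(-(ability - 0.7 * group)))
item = (rng.random(n) < p_item).astype(float)
# Crude total score: the studied item plus 20 DIF-free "items".
total = item + rng.binomial(20, 1 / (1 + np.exp(-ability)))

r = partial_corr(group, item, total)     # negative: item disadvantages the focal group
```

The sign convention here (negative = focal group disadvantaged) is an artifact of coding the focal group as 1.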
|
23 |
Using Hierarchical Generalized Linear Modeling for Detection of Differential Item Functioning in a Polytomous Item Response Theory Framework: An Evaluation and Comparison with Generalized Mantel-Haenszel
Ryan, Cari Helena, 16 May 2008
In the field of education, decisions are influenced by the results of various high-stakes measures, and investigating the presence of differential item functioning (DIF) in a set of items helps ensure that results from these measures are valid. For example, if an item measuring math self-efficacy is identified as having DIF, then some characteristic other than the latent trait of interest (e.g., gender) may be affecting an examinee's score on that particular item. Hierarchical generalized linear modeling (HGLM) enables the modeling of items nested within examinees, with person-level predictors added at level 2 for DIF detection. Unlike traditional DIF detection methods that require a reference and a focal group, HGLM can model a continuous person-level predictor: instead of dichotomizing a continuous variable associated with DIF into focal and reference groups, the continuous variable can be entered directly at level 2. Further benefits of HGLM are discussed in this study. This study extends work by Williams and Beretvas (2006), who illustrated the use of HGLM with polytomous items (PHGLM) for DIF detection, compared the PHGLM with the generalized Mantel-Haenszel (GMH), and found that the two performed similarly. A Monte Carlo simulation study was conducted to evaluate HGLM's power to detect DIF and its associated Type I error rates, using the constrained form of Muraki's Rating Scale Model (Muraki, 1990) as the generating model. The two methods were compared when DIF was associated with a continuous variable that was dichotomized for the GMH but used as a continuous person-level predictor with the PHGLM. Of additional interest was the comparison of HGLM's performance with that of the GMH under a variety of DIF and sample size conditions.
Results showed that sample size, sample size ratio, and DIF magnitude substantially influenced the power of both the GMH and HGLM. The power of the GMH was comparable to that of HGLM in conditions with large sample sizes, and on average both DIF detection methods showed good Type I error control.
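The computational core of the entry's argument, that a continuous person characteristic can enter the model directly rather than being split into focal and reference groups, can be sketched for a single dichotomous item with an ordinary logistic regression (the HGLM adds the multilevel structure on top of this). Data, coefficient values, and the use of true ability as the matching variable are all simulation conveniences, not the study's setup:

```python
import numpy as np
from scipy.optimize import minimize

def fit_logit(X, y):
    """Maximum-likelihood logistic regression via BFGS (minimal sketch)."""
    def nll(beta):
        z = X @ beta
        # Negative Bernoulli log-likelihood with a logit link, numerically stable.
        return np.sum(np.logaddexp(0, z) - y * z)
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x

rng = np.random.default_rng(1)
n = 5000
ability = rng.normal(0, 1, n)
covariate = rng.normal(0, 1, n)      # continuous person-level predictor (e.g., a trait score)
# Uniform DIF: the item response depends on the covariate at equal ability.
p = 1 / (1 + np.exp(-(0.2 + 1.0 * ability + 0.5 * covariate)))
y = (rng.random(n) < p).astype(float)

X = np.column_stack([np.ones(n), ability, covariate])
beta = fit_logit(X, y)               # roughly recovers [0.2, 1.0, 0.5]
```

A significantly nonzero coefficient on the continuous covariate plays the role that a nonzero level-2 group effect plays in the HGLM formulation.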
|
24 |
The Impact of Multidimensionality on the Detection of Differential Bundle Functioning Using SIBTEST
Raiford-Ross, Terris, 12 February 2008
In response to public concern over fairness in testing, conducting a differential item functioning (DIF) analysis is now standard practice for many large-scale testing programs (e.g., the Scholastic Aptitude Test, intelligence tests, licensing exams). As highlighted by the Standards for Educational and Psychological Testing, the legal and ethical need to avoid bias when measuring examinee abilities is essential to fair testing practices (AERA-APA-NCME, 1999). Likewise, the development of statistical and substantive methods for investigating DIF is crucial to the goal of designing fair and valid educational and psychological tests. Douglas, Roussos, and Stout (1996) introduced the concept of item-bundle DIF and the implications of differential bundle functioning (DBF) for identifying the underlying causes of DIF. Since then, several studies have demonstrated DIF/DBF analyses within the framework of "unintended" multidimensionality (Oshima & Miller, 1992; Russell, 2005). Russell (2005), in particular, examined the effect of secondary traits on DBF/DTF detection. Like Russell, this study created item bundles by including multidimensional items on a simulated test designed in theory to be unidimensional. Simulating reference group members with higher mean ability than the focal group on the nuisance secondary dimension produced DIF for each of the multidimensional items, which, when examined together, produced differential bundle functioning. The purpose of this Monte Carlo simulation study was to assess the Type I error and power performance of SIBTEST (Simultaneous Item Bias Test; Shealy & Stout, 1993a) for DBF analysis under various conditions with simulated data. The variables of interest included sample size and the ratio of reference to focal group sample sizes, the correlation between the primary and secondary dimensions, the magnitude of DIF/DBF, and angular item direction.
Results showed SIBTEST to be quite powerful in detecting DBF and to control Type I error for almost all of the simulated conditions: power rates were .80 or above for 84% of all conditions, and the average Type I error rate was approximately .05. Furthermore, the combined effect of the studied variables on SIBTEST power and Type I error rates provides much-needed guidance for further use of SIBTEST in identifying potential sources of differential item/bundle functioning.
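The bundle statistic behind these results is, at heart, a focal-group-weighted difference in mean bundle scores between reference and focal examinees matched on the valid-subtest score. The real SIBTEST adds a regression correction for measurement error in the matching score, which this deliberately simplified sketch omits; all data are simulated:

```python
import numpy as np

def sim_responses(theta, difficulties, rng):
    """Rasch-style dichotomous responses: P(correct) = logistic(theta - b)."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulties[None, :])))
    return (rng.random(p.shape) < p).astype(int)

def beta_uni(ref_bundle, ref_total, foc_bundle, foc_total):
    """Simplified SIBTEST-style beta-hat: focal-weighted mean difference in
    bundle scores across matched valid-subtest strata (no regression correction)."""
    num = den = 0.0
    for k in np.union1d(ref_total, foc_total):
        r = ref_bundle[ref_total == k]
        f = foc_bundle[foc_total == k]
        if len(r) == 0 or len(f) == 0:
            continue  # stratum occupied by only one group carries no comparison
        num += len(f) * (r.mean() - f.mean())
        den += len(f)
    return num / den

rng = np.random.default_rng(2)
n = 3000
theta_r, theta_f = rng.normal(0, 1, n), rng.normal(0, 1, n)
valid = np.linspace(-1.5, 1.5, 20)        # 20 DIF-free matching items
bundle = np.zeros(3)                      # 3-item suspect bundle, difficulty 0
ref_total = sim_responses(theta_r, valid, rng).sum(axis=1)
foc_total = sim_responses(theta_f, valid, rng).sum(axis=1)
ref_bundle = sim_responses(theta_r, bundle, rng).sum(axis=1)
foc_bundle = sim_responses(theta_f, bundle + 0.5, rng).sum(axis=1)  # biased against focal

b_hat = beta_uni(ref_bundle, ref_total, foc_bundle, foc_total)  # positive: bundle favors reference
```

A positive beta-hat here reflects the simulated 0.5-logit disadvantage for the focal group on every bundle item.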
|
25 |
Using Three Different Categorical Data Analysis Techniques to Detect Differential Item Functioning
Stephens-Bonty, Torie Amelia, 16 May 2008
Diversity in the population, along with the diversity of testing applications, has resulted in smaller identified groups of test takers; in addition, computer adaptive testing sometimes results in a relatively small number of items being used for a particular assessment. Statistical techniques that can effectively detect differential item functioning (DIF) when the sample is small and/or the assessment is short are therefore needed, and identifying empirically biased items is a crucial step in creating equitable and construct-valid assessments. Parshall and Miller (1995) compared the conventional asymptotic Mantel-Haenszel (MH) with the exact test (ET) for the detection of DIF with small sample sizes. Several studies have since compared the performance of the MH to logistic regression (LR) under a variety of conditions; both Swaminathan and Rogers (1990) and Hidalgo and López-Pina (2004) demonstrated that the MH and LR were comparable in their detection of items with DIF. This study extended that work by comparing the performance of the MH, the ET, and LR when the sample size is small and the test length is short. The purpose of this Monte Carlo simulation study was to expand on the research of Parshall and Miller (1995) by examining power, and power combined with effect-size measures, for each of the three DIF detection procedures. The following variables were manipulated: focal group sample size, percentage of items with DIF, and magnitude of DIF. For each condition, a small reference group of 200 was used along with a short, 10-item test. The results demonstrated that, in general, LR was slightly more powerful in detecting items with DIF. In most conditions, however, power was well below the acceptable rate of 80%. As the size of the focal group and the magnitude of DIF increased, the three procedures were more likely to reach acceptable power, and all three demonstrated the highest power for the most discriminating item.
Collectively, these results inform DIF detection with small samples and short tests.
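For readers unfamiliar with the first of the three techniques, here is a compact sketch of the Mantel-Haenszel DIF chi-square (the asymptotic MH referenced above) on simulated data. Sample sizes are deliberately large so the flagged item stands out clearly, unlike the small-sample conditions this study targets:

```python
import numpy as np

def mh_chi_square(item, group, total):
    """Mantel-Haenszel DIF chi-square with continuity correction,
    stratifying examinees on total score (group: 0 = reference, 1 = focal)."""
    num = var = 0.0
    for k in np.unique(total):
        s = total == k
        g, x = group[s], item[s]
        n_r, n_f = int(np.sum(g == 0)), int(np.sum(g == 1))
        t, m1 = n_r + n_f, int(x.sum())
        if t < 2 or n_r == 0 or n_f == 0:
            continue  # stratum carries no between-group information
        a = x[g == 0].sum()                 # reference-group correct count
        num += a - n_r * m1 / t             # observed minus expected under no DIF
        var += n_r * n_f * m1 * (t - m1) / (t**2 * (t - 1))
    return (abs(num) - 0.5) ** 2 / var

rng = np.random.default_rng(3)
n = 1000                                    # per group
theta = rng.normal(0, 1, 2 * n)
group = np.repeat([0, 1], n)
b = np.tile(np.linspace(-1, 1, 10), (2 * n, 1))
b[group == 1, 0] += 1.0                     # uniform DIF on item 0, against the focal group
resp = (rng.random(b.shape) < 1 / (1 + np.exp(-(theta[:, None] - b)))).astype(int)
total = resp.sum(axis=1)

chi_dif = mh_chi_square(resp[:, 0], group, total)   # large: item flagged
chi_ok = mh_chi_square(resp[:, 1], group, total)    # small: no evidence of DIF
```

The statistic is referred to a chi-square distribution with one degree of freedom, so values above about 3.84 are significant at the .05 level.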
|
26 |
WAIS-III verbalinių subtestų užduočių diferencinė analizė / Analysis of Differential Item Functioning in WAIS-III Verbal Subtests
Malakauskaitė, Rima, 23 June 2014
The WAIS-III (Wechsler Adult Intelligence Scale - Third Edition) is one of the most widely used intelligence measures in the world and has been adapted in Lithuania. One source of test bias is differential item functioning: differences in item performance between groups of examinees with the same ability. The aim of this study was to assess item bias in the verbal subtests of the Lithuanian adaptation of the WAIS-III.
Differential item functioning was examined by comparing groups by sex (172 women and 128 men) and by education (209 participants with secondary or lower education and 89 with further or university education) across six WAIS-III subtests (Picture Completion, Vocabulary, Similarities, Arithmetic, Information, and Comprehension). Of the 148 items analyzed, 20 functioned differently for men and women and 19 functioned differently across education groups, with most of the differences found in the Vocabulary and Information subtests. Most of the differences were uniform DIF (differences in item difficulty), and their number was very similar in the sex and education comparisons. Among the items showing nonuniform DIF (differences in discrimination), considerably more discriminated ability better for women than for men, and for participants with secondary or lower education.
|
27 |
Differences in Reading Strategies and Differential Item Functioning on PCAP 2007 Reading Assessment
Scerbina, Tanya, 29 November 2012
Item data from the Pan-Canadian Assessment Program (PCAP) 2007 reading assessment and contextual data on reading strategies were analyzed to investigate the relationship between self-reported reading strategies and item difficulty. Students who reported using higher- or lower-order strategies were identified through a factor analysis. The purpose of this study was to investigate whether students with the same underlying reading ability who reported using different reading strategies found the items differentially difficult. Differential item functioning (DIF) analyses identified items on which students who tended to use higher-order reading strategies excelled but which students who preferred lower-order strategies found more difficult; these were selected-response items. The opposite pattern was found for constructed-response items. The results suggest that DIF analyses can be used to investigate which reading strategies are related to item difficulty when controlling for students' level of ability.
|
29 |
Methods in creating alternate assessments: Calibrating a mathematics alternate assessment designed for students with disabilities using general education student data
Jung, Eunju, 1974- 12 1900
xvi, 116 p. A print copy of this thesis is available through the UO Libraries. Search the library catalog for the location and call number. / A significant challenge in developing alternate assessments is obtaining suitable sample sizes. This study investigated whether the psychometric characteristics of mathematics alternate assessment items created for "2%" students (students assessed against modified achievement standards) in grade 8 can be meaningfully estimated with data obtained from general education students in lower grades. Participants included 23 2% students in grade 8 and 235 general education students in grades 6-8. The 2% students were identified through the Student Performance Test (10 standard items and 10 2% items) and the Teacher Perception Survey. Performance on the 10 2% items by the 2% students and the general education students was analyzed to address two questions: (a) are there grade levels at which the item parameters estimated from general education students in grades 6-8 do not differ from those obtained from the 2% student sample? and (b) are there grade levels at which the estimated ability of general education students in grades 6-8 does not differ from that of the 2% student sample in grade 8?
Results indicated that the item response patterns of 2% students in grade 8 were comparable to those of general education students in grades 6 and 7. Additionally, 2% students in grade 8 showed comparable mathematics performance on the 2% items when compared to general education students in grades 6 and 7. Considering the content exposure of students in lower grades, this study concluded that data from general education students in grade 7 would be more appropriate than data from students in grade 6 for designing an alternate assessment for 2% students in grade 8. The general conclusion is that using data obtained from general education students in lower grade levels may be an appropriate and efficient method of designing alternate assessment items. / Advisers: Dr. Beth Ham, Co-Chair; Dr. Paul Yovanoff, Co-Chair
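A toy version of the entry's core comparison — do item statistics estimated from a larger, different sample line up with those from the small target sample? — can be run with classical proportion-correct difficulties. The sample sizes mirror the study, but all responses below are simulated and the actual study used IRT calibration, not classical p-values:

```python
import numpy as np

def item_p_values(responses):
    """Classical item difficulty: proportion correct per item."""
    return responses.mean(axis=0)

def sim_responses(n, difficulties, rng):
    """Rasch-style responses for n examinees with N(0,1) abilities."""
    theta = rng.normal(0, 1, n)
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulties[None, :])))
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(4)
diffs = np.linspace(-2, 2, 10)             # 10 items, easy to hard
target = sim_responses(23, diffs, rng)     # small target sample (cf. 23 students)
general = sim_responses(235, diffs, rng)   # larger comparison sample

p_t, p_g = item_p_values(target), item_p_values(general)
agreement = np.corrcoef(p_t, p_g)[0, 1]    # do the items order the same way?
shift = np.abs(p_t - p_g).mean()           # average offset in difficulty
```

High agreement with a small shift is the kind of evidence that supports borrowing calibration data from a larger group, though a tiny target sample leaves the comparison noisy.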
|
30 |
Extending the Model with Internal Restrictions on Item Difficulty (MIRID) to Study Differential Item Functioning
Li, Yong "Isaac", 05 April 2017
Differential item functioning (DIF) is a psychometric issue routinely considered in educational and psychological assessment. However, it has not been studied in the context of a recently developed componential statistical model, the model with internal restrictions on item difficulty (MIRID; Butter, De Boeck, & Verhelst, 1998). Because the MIRID requires test questions that measure either a single cognitive process or multiple cognitive processes, it creates a complex environment for which traditional DIF methods may be inappropriate. This dissertation sought to extend the MIRID framework to detect DIF at both the item-group level and the individual-item level. Such a model-based approach can increase the interpretability of DIF statistics by focusing on item characteristics as potential sources of DIF; in particular, group-level DIF may reveal comparative group strengths in certain secondary constructs. A simulation study examined parameter recovery, Type I error rates, and power of the proposed approach under different conditions. The factors manipulated included sample size, magnitude of DIF, distributional characteristics of the groups, and the MIRID DIF models corresponding to discrete sources of differential functioning. The impact of studying DIF with mismatched models was also investigated.
The results of the recovery study for the MIRID DIF model indicate that the four delta (i.e., nonzero DIF) parameters were underestimated, whereas the item locations of the four associated items were overestimated. Bias and RMSE were significantly greater when delta was larger; larger sample sizes reduced RMSE substantially, while the effects of the impact factor were neither strong nor consistent. Hypothesiswise and adjusted experimentwise Type I error rates were controlled in the smaller-delta conditions but not in the larger-delta conditions, as estimates of zero-value DIF parameters were significantly different from zero, and the detection power of the DIF model was weak. Estimates of the delta parameters of the three group-level DIF models — the MIRID differential functioning in components (DFFc), the MIRID differential functioning in item families (DFFm), and the MIRID differential functioning in component weights (DFW) — were acceptable in general: they showed good hypothesiswise and adjusted experimentwise Type I error control across all conditions and overall achieved excellent detection power.
When the proposed models were fit to mismatched data, the false detection rates mostly exceeded the Bradley criterion because the zero-value DIF parameters in the mismatched model were not estimated adequately, especially in the larger-delta conditions. Recovery of item locations and component weights was also inadequate in the larger-delta conditions; estimation of these parameters was adversely affected, to varying degrees, by the DIF effect simulated in the mismatched data. To study DIF in MIRID data with this model-based approach, therefore, more research is needed to determine the appropriate procedure or model to implement, especially for item-level differential functioning.
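For orientation, the MIRID's defining restriction expresses each composite item's difficulty as a weighted combination of its item family's component difficulties plus a scale constant; the group-level DIF extensions studied here can be pictured as adding a shift for the focal group. The notation below is schematic, not the dissertation's exact parameterization:

```latex
% MIRID restriction: difficulty of composite item i (family f, components m = 1..M),
% with component weights \sigma_m and scale constant \tau
\beta_i = \sum_{m=1}^{M} \sigma_m \, \beta_{fm} + \tau
% Schematic DIF extension: an extra item-level shift \delta_i for the focal group (g = 1)
\beta_i^{(g)} = \sum_{m=1}^{M} \sigma_m \, \beta_{fm} + \tau + g \, \delta_i
```

In the group-level variants, the shift attaches instead to a component difficulty, an item family, or a component weight, which is what distinguishes the DFFc, DFFm, and DFW models described above.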
|