11

Regularization Methods for Detecting Differential Item Functioning:

Jiang, Jing January 2019 (has links)
Thesis advisor: Zhushan Mandy Li / Differential item functioning (DIF) occurs when examinees of equal ability from different groups have different probabilities of correctly responding to certain items. DIF analysis aims to identify potentially biased items to ensure the fairness and equity of instruments, and has become a routine procedure in developing and improving assessments. This study proposed a DIF detection method using regularization techniques, which allows simultaneous investigation of all items on a test for both uniform and nonuniform DIF. To evaluate the performance of the proposed DIF detection models and understand the factors that influence it, comprehensive simulation studies and empirical data analyses were conducted. Under varied conditions of test length, sample size, sample size ratio, percentage of DIF items, DIF type, and DIF magnitude, the operating characteristics of three regularized logistic regression models, each characterized by its penalty function: lasso, elastic net, and adaptive lasso, were examined and compared. Selection of the optimal tuning parameter was investigated using two well-known information criteria, AIC and BIC, as well as cross-validation. The results revealed that BIC outperformed the other selection criteria: it not only flagged high-impact DIF items precisely but also prevented over-identification of DIF items, yielding few false alarms. Among the regularization models, the adaptive lasso achieved superior performance to the other two in most conditions. The performance of the regularized DIF detection model with the adaptive lasso was then compared to two commonly used DIF detection approaches: the logistic regression method and the likelihood ratio test. Finally, the proposed model was applied to empirical datasets to demonstrate its applicability in real settings. / Thesis (PhD) — Boston College, 2019. / Submitted to: Boston College. Lynch School of Education. / Discipline: Educational Research, Measurement and Evaluation.
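The penalized model at the core of this approach can be sketched briefly. The following is a minimal illustration, not the author's implementation: for each item, an L1-penalized (lasso) logistic regression is fit with a matching variable (here the rest score), a group indicator, and their interaction; coefficients that survive the shrinkage flag uniform DIF (group main effect) or nonuniform DIF (interaction). Array names and the use of scikit-learn are assumptions; note that the thesis penalizes only the DIF parameters, whereas this sketch lets the penalty shrink all coefficients.

    # Hypothetical sketch: lasso logistic regression screen for DIF on one item.
    # `responses` is an (n_examinees, n_items) 0/1 array; `group` is a 0/1
    # reference/focal indicator. Both names are illustrative.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def lasso_dif_screen(responses, group, item, C=0.1):
        y = responses[:, item]
        rest = responses.sum(axis=1) - y          # matching score without the item
        rest = (rest - rest.mean()) / rest.std()  # standardize the matching score
        X = np.column_stack([rest, group, rest * group])
        model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        model.fit(X, y)
        b_ability, b_group, b_interaction = model.coef_[0]
        # Nonzero b_group suggests uniform DIF; nonzero b_interaction,
        # nonuniform DIF.
        return b_group, b_interaction

In practice the penalty strength (C here, the inverse of the regularization weight) would be chosen over a grid, and the study's results favor BIC for that choice.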
12

Assessing the Differential Functioning of Items and Tests of a Polytomous Employee Attitude Survey

Swander, Carl Joseph 06 April 1999 (has links)
Dimensions of a polytomous employee attitude survey were examined for the presence of differential item functioning (DIF) and differential test functioning (DTF) using Raju, van der Linden, and Fleer's (1995) differential functioning of items and tests (DFIT) framework. Comparisons were made between managers and non-managers on the 'Management' dimension and between medical staff and nursing staff on both the 'Management' and 'Quality of Care and Service' dimensions. Two of the 21 items in the manager/non-manager comparison showed significant DIF, supporting the generalizability of Lynch, Barnes-Farrell, and Kulikowich (1998). No items in the medical staff/nursing staff comparisons showed DIF. The DTF results indicated that in two of the three comparisons, one item could be removed to create dimensions free of DTF. Based on the current findings, implications and directions for future research are discussed. / Master of Science
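For orientation, the noncompensatory DIF (NCDIF) index that the DFIT framework computes per item is the focal-group average of the squared gap between the item response functions implied by the two groups' calibrations. A hedged numerical sketch, shown for a dichotomous 2PL item for brevity (the polytomous case used in this thesis replaces probabilities with expected item scores); all names are illustrative:

    # Hypothetical sketch of the DFIT NCDIF index for one item (2PL form).
    import numpy as np

    def p_2pl(theta, a, b):
        # Two-parameter logistic item response function.
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def ncdif(theta_focal, a_ref, b_ref, a_foc, b_foc):
        # Squared gap between focal- and reference-calibrated response
        # functions, averaged over the focal group's ability values.
        gap = p_2pl(theta_focal, a_foc, b_foc) - p_2pl(theta_focal, a_ref, b_ref)
        return np.mean(gap ** 2)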
13

The Effects Of Differential Item Functioning On Predictive Bias

Bryant, Damon 01 January 2004 (has links)
The purpose of this research was to investigate the relation between measurement bias at the item level (differential item functioning, DIF) and predictive bias at the test-score level. DIF was defined as a difference in the probability of answering a test item correctly for examinees of the same ability but from different subgroups. Predictive bias was defined as a difference in subgroup regression intercepts and/or slopes in predicting a criterion. Data were simulated by computer. Two hypothetical subgroups (a reference group and a focal group) were used. The predictor was a composite score on a dimensionally complex test with 60 items. Sample size (35, 70, and 105 per group), validity coefficient (.3 or .5), and the mean subgroup difference on the predictor (0, .33, .66, and 1 standard deviation, SD) and the criterion (0 and .35 SD) were manipulated, as were the percentage of items showing DIF (0%, 15%, and 30%) and the effect size of DIF (small = .3, medium = .6, large = .9). Each of the 432 conditions in the 3 × 2 × 4 × 2 × 3 × 3 design was replicated 500 times. For each replication, a predictive bias analysis was conducted, with detection of predictive bias against each subgroup as the dependent variable. The percentage and effect size of DIF were hypothesized to influence the detection of predictive bias; hypotheses were also advanced about the influence of sample size and mean subgroup differences on the predictor and criterion. Results indicated that DIF was not related to the probability of detecting predictive bias against either subgroup. Results were inconsistent with the notion that measurement bias and predictive bias are mutually supportive, i.e., that the presence (or absence) of one type of bias is evidence for the presence (or absence) of the other. Sample size and mean differences on the predictor/criterion had direct and indirect effects on the probability of detecting predictive bias against both reference and focal groups. Implications for future research are discussed.
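The predictive bias analysis referenced here is conventionally a moderated (step-down) regression: the criterion is regressed on the predictor, the group indicator, and their product, and the group terms are tested jointly. A minimal sketch under that assumption, with hypothetical array names:

    # Hypothetical sketch: joint test for subgroup intercept/slope differences.
    import numpy as np
    import statsmodels.api as sm

    def predictive_bias_test(test, criterion, group):
        X_full = sm.add_constant(np.column_stack([test, group, test * group]))
        X_restricted = sm.add_constant(test)
        full = sm.OLS(criterion, X_full).fit()
        restricted = sm.OLS(criterion, X_restricted).fit()
        # F test of the joint null that subgroup intercepts and slopes are equal.
        f_stat, p_value, _ = full.compare_f_test(restricted)
        return f_stat, p_value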
14

Gender and Ethnicity-Based Differential Item Functioning on the Myers-Briggs Type Indicator

Gratias, Melissa B. 07 May 1997 (has links)
Item response theory (IRT) methodologies were employed to examine the Myers-Briggs Type Indicator (MBTI) for differential item functioning (DIF) on the basis of crossed gender and ethnicity variables. White males were the reference group, and the focal groups were black females, black males, and white females. The MBTI was predicted to show DIF in all comparisons; in particular, DIF on the Thinking-Feeling scale was hypothesized, especially in the comparisons between white males and black females and between white males and white females. A sample of 10,775 managers who took the MBTI at assessment centers provided the data for the present study. The Mantel-Haenszel procedure and an IRT-based area technique were the methods of DIF detection. Results showed several biased items on all scales for all comparisons. Ethnicity-based bias was seen in the white male vs. black female and white male vs. black male comparisons. Gender-based bias was seen particularly in the white male vs. white female comparisons. However, the Thinking-Feeling scale showed the least DIF of all scales across comparisons, and only one of the items differentially scored by gender was found to be biased. Findings indicate that the gender-based differential scoring system is not defensible in managerial samples, and that further research is needed on differential item functioning with regard to ethnicity. / Master of Science
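The Mantel-Haenszel procedure named above pools a 2 x 2 table (group by correct/incorrect) across strata of the matching variable, usually total score. A hedged sketch of the common odds ratio it estimates, with hypothetical array names:

    # Hypothetical sketch: Mantel-Haenszel common odds ratio for one item.
    import numpy as np

    def mantel_haenszel_or(item, group, total):
        # item: 0/1 responses; group: 0 = reference, 1 = focal; total: match score.
        num = den = 0.0
        for s in np.unique(total):
            m = total == s
            ref, foc = m & (group == 0), m & (group == 1)
            a = np.sum(item[ref] == 1)  # reference correct
            b = np.sum(item[ref] == 0)  # reference incorrect
            c = np.sum(item[foc] == 1)  # focal correct
            d = np.sum(item[foc] == 0)  # focal incorrect
            n = m.sum()
            num += a * d / n
            den += b * c / n
        return num / den  # alpha_MH; ETS delta scale is -2.35 * ln(alpha_MH)

Values far from 1 (equivalently, a large delta on the ETS scale) indicate DIF against one group.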
15

A Monte Carlo Study Investigating the Influence of Item Discrimination, Category Intersection Parameters, and Differential Item Functioning in Polytomous Items

Thurman, Carol Jenetha 21 October 2009 (has links)
The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the groups. Determining whether a between-group difference in performance on an item is due to differences in ability or to some form of unfairness in the item is more complex for a polytomous item, with its many score categories, than for a dichotomous item; effective DIF detection methods must be able to locate DIF within each of these score categories. The Mantel, generalized Mantel-Haenszel (GMH), and logistic regression (LR) procedures are three of several DIF detection methods able to test for DIF in polytomous items. There have been relatively few studies on the effectiveness of polytomous procedures in detecting DIF, and of those, only a very small percentage have examined the efficiency of the Mantel, GMH, and LR procedures when item discrimination magnitudes and category intersection parameters vary and when there are different patterns of DIF (e.g., balanced versus constant) within score categories. This Monte Carlo simulation study compared the Type I error and power of the Mantel, GMH, and OLR (the LR method applied to ordinal data) procedures under variation in (1) the item discrimination parameters, (2) the category intersection parameters, (3) DIF patterns within score categories, and (4) the average latent traits of the reference and focal groups. Results showed that high item discrimination levels were directly related to increased DIF detection rates. The location of the difficulty parameters also had a direct effect on DIF detection rates. Additionally, depending on item difficulty, DIF magnitudes and patterns within score categories affected DIF detection rates, and DIF detection power increased as DIF magnitudes became larger. The GMH outperformed the Mantel and OLR procedures and is recommended for use with polytomous data when item discrimination varies across items.
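The OLR variant can be sketched as a likelihood ratio comparison of nested ordinal models: one with only the matching variable, and one adding the group and interaction terms. This is an assumed implementation using statsmodels' OrderedModel, with hypothetical array names:

    # Hypothetical sketch: ordinal logistic regression DIF test for one
    # polytomous item via a likelihood ratio test.
    import numpy as np
    from scipy import stats
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    def olr_dif_test(item, group, total):
        # item: ordered category scores; group: 0/1; total: matching score.
        base = np.column_stack([total])
        full = np.column_stack([total, group, total * group])
        llf0 = OrderedModel(item, base, distr="logit").fit(method="bfgs", disp=False).llf
        llf1 = OrderedModel(item, full, distr="logit").fit(method="bfgs", disp=False).llf
        lr = 2 * (llf1 - llf0)
        # 2 df: group main effect (uniform DIF) + interaction (nonuniform DIF).
        return lr, stats.chi2.sf(lr, df=2)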
16

Rural Opioid and Other Drug Use Disorder Diagnosis: Assessing Measurement Invariance and Latent Classification of DSM-IV Abuse and Dependence Criteria

Brooks, Billy 01 August 2015 (has links)
The rates of non-medical prescription drug use in the United States (U.S.) have increased dramatically in the last two decades, leading to a more than 300% increase in deaths from overdose, surpassing motor vehicle accidents as the leading cause of injury deaths. In rural areas, deaths from unintentional overdose have increased by more than 250% since 1999 while urban deaths have increased at a fraction of this rate. The objective of this research was to test the hypothesis that cultural, economic, and environmental factors prevalent in rural America affect the rate of substance use disorder (SUD) in that population, and that diagnosis of these disorders across rural and urban populations may not be generalizable due to these same effects. This study applies measurement invariance analysis and factor analysis techniques: item response theory (IRT), multiple indicators, multiple causes (MIMIC), and latent class analysis (LCA), to the DSM-IV abuse and dependency diagnosis instrument. The sample used for the study was a population of adult past-year illicit drug users living in a rural or urban area drawn from the 2011-2012 National Survey on Drug Use and Health data files (N = 3,369| analyses 1 and 2; N = 12,140| analysis 3). Results of the IRT and MIMIC analyses indicated no significant variance in DSM item function across rural and urban sub-groups; however, several socio-demographic variables including age, race, income, and gender were associated with bias in the instrument. Latent class structures differed across the sub-groups in quality and number, with the rural sample fitting a 3-class structure and the urban fitting 6-class model. Overall the rural class structure exhibited less diversity and lower prevalence of SUD in multiple drug categories (e.g. cocaine, hallucinogens, and stimulants). This result suggests underlying elements affecting SUD patterns in the two populations. These findings inform the development of surveillance instruments, clinical services, and public health programming tailored to specific communities.
17

Differential item functioning procedures for polytomous items when examinee sample sizes are small

Wood, Scott William 01 May 2011 (has links)
As part of test score validity, differential item functioning (DIF) is a quantitative characteristic used to evaluate potential item bias. In applications where a small number of examinees take a test, the statistical power of DIF detection methods may suffer. Researchers have proposed modifications to DIF detection methods to account for small focal group sample sizes when items are dichotomously scored; these modifications, however, have not been applied to polytomously scored items. Simulated polytomous item response strings were used to study the Type I error rates and statistical power of three popular DIF detection methods (the Mantel test/Cox's β, the Liu-Agresti statistic, and HW3) and three modifications proposed for contingency tables (empirical Bayesian, randomization, and log-linear smoothing). The simulation considered two small-sample conditions: 40 reference group and 40 focal group examinees, and 400 reference group and 40 focal group examinees. To compare statistical power, it was first necessary to calculate the Type I error rates of the DIF detection methods and their modifications. Under most simulation conditions, the unmodified, randomization-based, and log-linear smoothing-based Mantel and Liu-Agresti tests yielded Type I error rates around 5%. The HW3 statistic yielded higher Type I error rates than expected in the 40-reference-examinee conditions, rendering power calculations for those cases meaningless. Results suggested that the unmodified Mantel and Liu-Agresti tests yielded the highest statistical power for the pervasive-constant and pervasive-convergent patterns of DIF, as compared with the other DIF method alternatives. Power improved by several percentage points when log-linear smoothing was applied to the contingency tables before using the Mantel or Liu-Agresti tests; it did not improve when Bayesian methods or randomization tests were applied. ANOVA tests showed that statistical power was higher with 400 reference examinees than with 40, when impact was present among examinees than when it was not, and when the studied item was excluded from the anchor test than when it was included. Statistical power was generally too low to merit practical use of these methods in isolation, at least under the conditions of this study.
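Of the modifications studied, the randomization approach is the most generic: permute the group labels, recompute the DIF statistic, and compare the observed value to the resulting empirical null. A hedged sketch that wraps any DIF statistic function, with hypothetical names:

    # Hypothetical sketch: permutation (randomization) p-value for a DIF statistic.
    import numpy as np

    def randomization_p_value(dif_stat, item, group, total, n_perm=2000, seed=0):
        rng = np.random.default_rng(seed)
        observed = dif_stat(item, group, total)
        null = np.empty(n_perm)
        for i in range(n_perm):
            # Reassign group labels at random under the no-DIF null.
            null[i] = dif_stat(item, rng.permutation(group), total)
        # Empirical p-value with the standard +1 correction.
        return (1 + np.sum(null >= observed)) / (n_perm + 1)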
18

Using MIMIC Methods to Detect and Identify Sources of DIF among Multiple Groups

Chun, Seokjoon 24 September 2014 (has links)
This study investigated the efficacy of multiple indicators, multiple causes (MIMIC) methods in detecting uniform and nonuniform differential item functioning (DIF) among multiple groups where the underlying causes of DIF differed. Three implementations of MIMIC DIF detection were studied: sequential free baseline, free baseline, and constrained baseline. In addition, the robustness of the MIMIC methods to violation of their assumption of equal factor variance across comparison groups was investigated. We found that the sequential free baseline method provided Type I error and power rates similar to those of the free baseline method with a designated anchor, and much better rates than the constrained baseline method, across the four groups resulting from the co-occurrence of background variables. When the equal factor variance assumption was violated, however, the MIMIC methods yielded inflated Type I error rates. The MIMIC procedure also had problems correctly identifying the sources of DIF, so further methodological development is needed.
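In outline, a MIMIC DIF model lets the grouping covariate affect an item both indirectly, through the latent trait, and directly; a nonzero direct effect signals DIF. A hedged sketch of the equations in generic notation (not the author's):

    y_j^* = \lambda_j \eta + \beta_j x + \omega_j (x \cdot \eta) + \varepsilon_j,
    \eta = \gamma x + \zeta

Here \beta_j \neq 0 indicates uniform DIF and \omega_j \neq 0 nonuniform DIF on item j; the constrained, free, and sequential free baseline methods differ in which of these direct effects are fixed to zero when the baseline model is estimated.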
19

Gender and Posttraumatic Stress Disorder Screening in the Military: A Measurement Study

Oliver, Mark Allan 01 August 2010 (has links)
The Primary Care Posttraumatic Stress Disorder (PC-PTSD) screen (Prins et al., 2003) is used by the Department of Defense to identify military members who are at increased risk of PTSD, and has been offered to all returning deployers since 2005. However, validation studies of PC-PTSD scores from military samples have seldom employed a significant number of female subjects, and no published studies have examined the screen for gender bias. Ruling out bias is important because routine under-identification of PTSD risk in any group could hinder access to needed assessment and/or care. With the proportion of women in the military at a historic high (Women's Research & Education Institute, 2007), it is imperative that the PC-PTSD be analyzed to ensure measurement equivalence across gender. Using a large sample of male and female veterans returning from deployment, the validity of the PC-PTSD scores was first examined through a differential item functioning (DIF) analysis across male and female subgroups. Then, using a clinical diagnosis as the criterion, both logistic regression and diagnostic likelihood ratio methods were employed to assess differential predictive validity by gender. Finally, confirmatory factor analysis (CFA) was used to examine convergent and divergent validity in a two-factor model containing both PC-PTSD and depression screen responses. Results revealed no statistically significant gender-related DIF or differential prediction of PTSD by PC-PTSD scores. Good convergent and divergent validity were also observed in the CFA. The results generally support the continued use of the PC-PTSD with both male and female military veterans returning from deployment. Limitations of the study and recommendations for future research are discussed.
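The diagnostic likelihood ratio method mentioned above reduces to comparing DLR+ and DLR- computed separately by gender. A minimal sketch under that reading, with hypothetical 0/1 arrays:

    # Hypothetical sketch: diagnostic likelihood ratios for a screen.
    import numpy as np

    def dlr(screen_pos, diagnosis):
        sens = screen_pos[diagnosis == 1].mean()       # sensitivity
        spec = 1 - screen_pos[diagnosis == 0].mean()   # specificity
        return sens / (1 - spec), (1 - sens) / spec    # (DLR+, DLR-)

Similar DLRs for men and women would support equivalent predictive validity across gender, which is what the study reports.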
20

Differential item functioning in the Peabody Picture Vocabulary Test - Third Edition: partial correlation versus expert judgment

Conoley, Colleen Adele 30 September 2004 (has links)
This study had three purposes: (1) to identify differential item functioning (DIF) on the PPVT-III (Forms A and B) using a partial correlation method, (2) to find a consistent pattern in items identified as underestimating ability in each ethnic minority group, and (3) to compare findings from an expert judgment method and a partial correlation method. Hispanic, African American, and white subjects were provided by American Guidance Service (AGS) from the standardization sample of the PPVT-III; English language learners (ELL) of Mexican descent were recruited from school districts in Central and South Texas. Content raters were all self-selected volunteers; each had an advanced degree and a career in education, and none had special expertise regarding ELL or ethnic minority students. Two groups of teachers participated as judges. The 'expert' group was selected for their special knowledge of ELL students of Mexican descent; the control group consisted of regular education teachers with limited exposure to ELL students. Using the partial correlation method, DIF was detected within each group comparison. In all cases except the ELL comparison on Form A of the PPVT-III, there were no significant differences between the numbers of items with significant positive correlations and those with significant negative correlations. On Form A, the ELL comparison yielded more items with negative than positive correlations [χ²(1) = 5.538, p = .019]. Among the items flagged as underestimating the ability of the ELL group, no consistent trend could be detected. Also, none of the expert judges could adequately predict the items that would underestimate ability for the ELL group, despite their expertise. Discussion includes possible consequences of item placement and recommendations for further research and use of the PPVT-III.
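The partial correlation method used here correlates item performance with group membership after partialling out the total score; one standard way to compute it is to residualize both variables on the total and correlate the residuals. A hedged sketch with hypothetical array names:

    # Hypothetical sketch: partial correlation DIF index for one item.
    import numpy as np
    from scipy import stats

    def partial_corr_dif(item, group, total):
        def residuals(y, x):
            slope, intercept = np.polyfit(x, y, 1)  # simple linear regression
            return y - (slope * x + intercept)
        # Correlate what is left of item and group after removing total score.
        r, p = stats.pearsonr(residuals(item, total), residuals(group, total))
        return r, p

Under this coding, a significant negative correlation would flag an item as underestimating the focal group's ability relative to the matching score.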
