21

Measurement equivalence of the center for epidemiological studies depression scale in racially/ethnically diverse older adults

Kim, Giyeon 01 June 2007 (has links)
This dissertation study was designed to examine measurement equivalence of the Center for Epidemiological Studies Depression (CES-D) Scale across White, African American, and Mexican American elders. Specific aims were to identify race/ethnicity-, sociodemographic-, and acculturation- and instrument language-related measurement bias in the CES-D. Three studies were conducted in this dissertation to accomplish these aims. Two existing national datasets were used: the New Haven Established Populations for Epidemiologic Studies of the Elderly (EPESE) for the White and African American samples and the Hispanic Established Populations for Epidemiologic Studies of the Elderly (H-EPESE) for the Mexican American sample. Differential item functioning (DIF) analyses were conducted using both confirmatory factor analysis (CFA) and item response theory (IRT) methods. Study 1 focused on the role of race/ethnicity in measurement bias in the CES-D. Results from Study 1 showed a lack of measurement equivalence of the CES-D among Mexican Americans in comparisons with both Whites and Blacks. Race/ethnicity-specific items were also identified in Study 1: two interpersonal relation items in Blacks and four positive affect items in Mexican Americans. Study 2 focused on identifying sociodemographic-related measurement bias in responses to the CES-D among diverse racial/ethnic groups. Results from Study 2 showed that gender and educational attainment affected item bias in the CES-D. Interactions of gender and educational level with race/ethnicity were also found in Study 2: Mexican American women and lower-educated Blacks had a greater predisposition to endorse the 'crying' item. Focusing on Mexican American elders, Study 3 examined how level of acculturation and instrument language influence responses to the CES-D. In Study 3, acculturation- and instrument language-biased items were identified in Mexican American elders. Study 3 also suggested that acculturation bias was entirely explained by whether the CES-D was administered in the English or the Spanish version. Possible reasons for item bias on the CES-D are discussed in the context of sociocultural differences in each substudy. Findings from this dissertation provide a broader understanding of sociocultural group differences in depressive symptom measures among racially/ethnically diverse older adults and yield research and practice implications for the use of standard screening tools for depression.
22

A comparability analysis of the National Nurse Aide Assessment Program

Jones, Peggy K 01 June 2006 (has links)
When an exam is administered across dual platforms, such as paper-and-pencil and computer-based testing simultaneously, individual items may become more or less difficult in the computer-based test (CBT) version as compared to the paper-and-pencil (P&P) version, possibly resulting in a shift in the overall difficulty of the test (Mazzeo & Harvey, 1988). Using 38,955 examinees' response data across five forms of the National Nurse Aide Assessment Program (NNAAP) administered in both CBT and P&P modes, three methods of differential item functioning (DIF) detection were used to detect item DIF across platforms: Mantel-Haenszel (MH), logistic regression (LR), and the 1-parameter logistic model (1-PL). These methods were compared to determine whether they detect DIF equally in all items on the NNAAP forms. Data were reported by agreement of methods, that is, an item flagged by multiple DIF methods. A kappa statistic was calculated to provide an index of agreement between paired methods of the LR, MH, and the 1-PL based on the inferential tests. Finally, to determine what impact, if any, these DIF items may have on the test as a whole, the test characteristic curves for each test form and examinee group were displayed. Results indicated that items behaved differently across modes and that an examinee's odds of answering an item correctly were influenced by the mode of test administration for several items, ranging from 23% of the items on Forms W and Z (MH) to 38% of the items on Form X (1-PL), with an average of 29%. The test characteristic curves for each test form were examined by examinee group, and it was concluded that the impact of the DIF items on the test was not consequential. Each of the three methods detected items exhibiting DIF in each test form (ranging from 14 to 23 items). The kappa statistic demonstrated a strong degree of agreement between paired methods of analysis for each test form and each DIF method pairing (good to excellent agreement in all pairings). Findings indicated that while items did exhibit DIF, there was no substantial impact at the test level.
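For readers unfamiliar with the procedures named in this abstract, a minimal sketch of the Mantel-Haenszel statistic and a kappa agreement index follows. It assumes dichotomous items and the raw total score as the matching variable; the function names and data layout are illustrative, not the NNAAP study's actual implementation.

```python
import numpy as np

def mantel_haenszel_dif(scores, group, item):
    """Mantel-Haenszel DIF for one dichotomous item.

    scores : (n,) total test scores used as the matching variable
    group  : (n,) 0 = reference, 1 = focal
    item   : (n,) 0/1 responses to the studied item
    Returns the continuity-corrected MH chi-square and the common
    odds-ratio estimate alpha_MH.
    """
    num, den = 0.0, 0.0                  # for alpha_MH
    a_sum, e_sum, v_sum = 0.0, 0.0, 0.0  # for the chi-square
    for k in np.unique(scores):
        s = scores == k
        ref, foc = s & (group == 0), s & (group == 1)
        A, B = item[ref].sum(), (1 - item[ref]).sum()
        C, D = item[foc].sum(), (1 - item[foc]).sum()
        N = A + B + C + D
        if N < 2 or ref.sum() == 0 or foc.sum() == 0:
            continue                     # stratum carries no information
        num += A * D / N
        den += B * C / N
        m1, m0, nR, nF = A + C, B + D, A + B, C + D
        a_sum += A
        e_sum += nR * m1 / N
        v_sum += nR * nF * m1 * m0 / (N**2 * (N - 1))
    chi2 = (abs(a_sum - e_sum) - 0.5) ** 2 / v_sum
    return chi2, num / den

def cohens_kappa(flags1, flags2):
    """Agreement between two methods' DIF flags (boolean arrays over items)."""
    f1, f2 = np.asarray(flags1), np.asarray(flags2)
    po = np.mean(f1 == f2)                       # observed agreement
    p1, p2 = f1.mean(), f2.mean()
    pe = p1 * p2 + (1 - p1) * (1 - p2)           # chance agreement
    return (po - pe) / (1 - pe)
```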
23

Differential item functioning in the Peabody Picture Vocabulary Test - Third Edition: partial correlation versus expert judgment

Conoley, Colleen Adele 30 September 2004 (has links)
This study had three purposes: (1) to identify differential item functioning (DIF) on the PPVT-III (Forms A & B) using a partial correlation method, (2) to find a consistent pattern in items identified as underestimating ability in each ethnic minority group, and (3) to compare findings from an expert judgment method and a partial correlation method. Hispanic, African American, and White subjects were provided by American Guidance Service (AGS) from the standardization sample of the PPVT-III; English language learners (ELL) of Mexican descent were recruited from school districts in Central and South Texas. Content raters were all self-selected volunteers, each with an advanced degree, a career in education, and no special expertise with ELL or ethnic minorities. Two groups of teachers participated as judges for this study: the "expert" group was selected for its special knowledge of ELL students of Mexican descent, while the control group consisted of regular education teachers with limited exposure to ELL. Using the partial correlation method, DIF was detected within each group comparison. In all cases except the ELL comparison on Form A of the PPVT-III, there were no significant differences in the numbers of items with significant positive versus significant negative correlations. On Form A, the ELL group comparison indicated more items with negative than positive correlations [χ²(1) = 5.538, p = .019]. Among the items flagged as underestimating the ability of the ELL group, no consistent trend could be detected. It was also found that none of the expert judges could adequately predict the items that would underestimate ability for the ELL group, despite their expertise. Discussion includes possible consequences of item placement and recommendations regarding further research and use of the PPVT-III.
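The partial correlation method can be read as testing whether group membership still predicts an item response once the total score, serving as an ability proxy, is partialled out. The sketch below is one plausible rendering under that reading; the helper name and the use of the raw total score are assumptions, not the dissertation's exact procedure.

```python
import numpy as np
from scipy import stats

def partial_corr_dif(item, group, total):
    """Partial correlation of item score with group membership,
    controlling for total test score.

    A significant nonzero partial correlation flags the item for DIF;
    a negative sign (with focal group coded 1) suggests the item
    underestimates focal-group ability relative to the total score."""
    r_ig = np.corrcoef(item, group)[0, 1]
    r_it = np.corrcoef(item, total)[0, 1]
    r_gt = np.corrcoef(group, total)[0, 1]
    r_p = (r_ig - r_it * r_gt) / np.sqrt((1 - r_it**2) * (1 - r_gt**2))
    n = len(item)
    t = r_p * np.sqrt((n - 3) / (1 - r_p**2))  # df = n - 3 (one covariate)
    p = 2 * stats.t.sf(abs(t), df=n - 3)
    return r_p, p
```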
24

Using Hierarchical Generalized Linear Modeling for Detection of Differential Item Functioning in a Polytomous Item Response Theory Framework: An Evaluation and Comparison with Generalized Mantel-Haenszel

Ryan, Cari Helena 16 May 2008 (has links)
In the field of education, decisions are influenced by the results of various high-stakes measures. Investigating the presence of differential item functioning (DIF) in a set of items helps ensure that results from these measures are valid. For example, if an item measuring math self-efficacy is identified as having DIF, this indicates that some characteristic (e.g., gender) other than the latent trait of interest may be affecting an examinee's score on that particular item. Hierarchical generalized linear modeling (HGLM) enables the modeling of items nested within examinees, with person-level predictors added at level 2 for DIF detection. Unlike traditional DIF detection methods that require a reference and a focal group, HGLM allows the modeling of a continuous person-level predictor: instead of dichotomizing a continuous variable associated with DIF into focal and reference groups, the continuous variable can be added at level 2. Further benefits of HGLM are discussed in this study. This study extends work by Williams and Beretvas (2006), which illustrated the use of HGLM with polytomous items (PHGLM) for DIF detection, compared the PHGLM with the generalized Mantel-Haenszel (GMH), and found that the two performed similarly. A Monte Carlo simulation study was conducted to evaluate HGLM's power to detect DIF and its associated Type I error rates, using the constrained form of Muraki's rating scale model (Muraki, 1990) as the generating model. The two methods were compared when DIF was associated with a continuous variable, which was dichotomized for the GMH and used as a continuous person-level predictor with PHGLM. Of additional interest was the comparison of HGLM's performance with that of the GMH under a variety of DIF and sample-size conditions. Results showed that sample size, sample-size ratio, and DIF magnitude substantially influenced power for both GMH and HGLM. The power of the GMH was comparable to that of HGLM in conditions with large sample sizes, and both DIF detection methods showed good Type I error control on average.
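A generating model of the kind named here, Muraki's rating scale model, can be simulated in a few lines. The following sketch uses illustrative parameter values rather than those of the study, and adopts the usual convention that the first category intersection is zero.

```python
import numpy as np

rng = np.random.default_rng(42)

def rsm_probs(theta, b_i, d, a=1.0):
    """Category probabilities under a rating scale model:
    P(X=c|theta) is proportional to exp(sum_{v<=c} a*(theta - b_i + d_v)),
    with d[0] = 0 by convention.
    theta : scalar ability; b_i : item location; d : category intersections."""
    steps = a * (theta - b_i + d)
    logits = np.cumsum(steps)
    logits -= logits.max()            # numerical stability before exponentiating
    p = np.exp(logits)
    return p / p.sum()

def simulate_responses(thetas, b, d, a=1.0):
    """Simulate polytomous responses for all examinees and items."""
    X = np.empty((len(thetas), len(b)), dtype=int)
    for j, theta in enumerate(thetas):
        for i, b_i in enumerate(b):
            p = rsm_probs(theta, b_i, d, a)
            X[j, i] = rng.choice(len(p), p=p)
    return X

# Example: 500 examinees, 10 four-category items (values are illustrative)
thetas = rng.normal(0, 1, 500)
b = rng.uniform(-1, 1, 10)
d = np.array([0.0, 0.8, 0.0, -0.8])   # intersections shared across items
data = simulate_responses(thetas, b, d)
```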
25

The Impact of Multidimensionality on the Detection of Differential Bundle Functioning Using SIBTEST.

Raiford-Ross, Terris 12 February 2008 (has links)
In response to public concern over fairness in testing, conducting a differential item functioning (DIF) analysis is now standard practice for many large-scale testing programs (e.g., the Scholastic Aptitude Test, intelligence tests, licensing exams). As highlighted by the Standards for Educational and Psychological Testing, the legal and ethical need to avoid bias when measuring examinee abilities is essential to fair testing practices (AERA-APA-NCME, 1999). Likewise, the development of statistical and substantive methods of investigating DIF is crucial to the goal of designing fair and valid educational and psychological tests. Douglas, Roussos, and Stout (1996) introduced the concept of item bundle DIF and the implications of differential bundle functioning (DBF) for identifying the underlying causes of DIF. Since then, several studies have demonstrated DIF/DBF analyses within the framework of "unintended" multidimensionality (Oshima & Miller, 1992; Russell, 2005). Russell (2005), in particular, examined the effect of secondary traits on DBF/DTF detection. Like Russell, this study created item bundles by including multidimensional items on a simulated test designed, in theory, to be unidimensional. Simulating reference group members to have a higher mean ability than the focal group on the nuisance secondary dimension resulted in DIF for each of the multidimensional items, which, when examined together, produced differential bundle functioning. The purpose of this Monte Carlo simulation study was to assess the Type I error and power performance of SIBTEST (Simultaneous Item Bias Test; Shealy & Stout, 1993a) for DBF analysis under various conditions with simulated data. The variables of interest included sample size and the ratio of reference to focal group sample sizes, the correlation between primary and secondary dimensions, the magnitude of DIF/DBF, and angular item direction. Results showed SIBTEST to be quite powerful in detecting DBF and to control Type I error for almost all of the simulated conditions. Specifically, power rates were .80 or above for 84% of all conditions, and the average Type I error rate was approximately .05. Furthermore, the combined effect of the studied variables on SIBTEST power and Type I error rates provides much-needed guidance for further use of SIBTEST in identifying potential sources of differential item/bundle functioning.
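As a concrete anchor for the beta-uni statistic SIBTEST computes, here is a deliberately simplified version: it weights reference-focal differences in suspect-bundle scores across matching-subtest strata, but omits Shealy and Stout's regression correction for measurement error in the matching score, so it should be read as a didactic approximation only.

```python
import numpy as np

def sibtest_beta(bundle, matching, group):
    """Simplified SIBTEST beta-uni statistic for a suspect item bundle.

    bundle   : (n,) examinee scores on the suspect bundle
    matching : (n,) scores on the valid matching subtest
    group    : (n,) 0 = reference, 1 = focal
    NOTE: omits the regression correction of Shealy & Stout (1993),
    so this is an uncorrected, illustrative estimate."""
    beta, var = 0.0, 0.0
    n_total = len(bundle)
    for k in np.unique(matching):
        ref = (matching == k) & (group == 0)
        foc = (matching == k) & (group == 1)
        nR, nF = ref.sum(), foc.sum()
        if nR < 2 or nF < 2:
            continue                      # stratum too thin to use
        p_k = (nR + nF) / n_total         # weight by stratum size
        beta += p_k * (bundle[ref].mean() - bundle[foc].mean())
        var += p_k**2 * (bundle[ref].var(ddof=1) / nR +
                         bundle[foc].var(ddof=1) / nF)
    z = beta / np.sqrt(var)               # approx. N(0,1) under no DBF
    return beta, z
```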
26

Using Three Different Categorical Data Analysis Techniques to Detect Differential Item Functioning

Stephens-Bonty, Torie Amelia 16 May 2008 (has links)
Diversity in the population, along with diversity in testing usage, has resulted in smaller identified groups of test takers. In addition, computer adaptive testing sometimes results in a relatively small number of items being used for a particular assessment. Statistical techniques that can effectively detect differential item functioning (DIF) when the population is small and/or the assessment is short are therefore needed. Identification of empirically biased items is a crucial step in creating equitable and construct-valid assessments. Parshall and Miller (1995) compared the conventional asymptotic Mantel-Haenszel (MH) with the exact test (ET) for the detection of DIF with small sample sizes. Several studies have since compared the performance of MH to logistic regression (LR) under a variety of conditions; both Swaminathan and Rogers (1990) and Hidalgo and López-Pina (2004) demonstrated that MH and LR were comparable in their detection of items with DIF. This study followed by comparing the performance of MH, ET, and LR when both the sample size is small and the test length is short. The purpose of this Monte Carlo simulation study was to expand on the research done by Parshall and Miller (1995) by examining power, and power with effect-size measures, for each of the three DIF detection procedures. The following variables were manipulated: focal group sample size, percent of items with DIF, and magnitude of DIF. For each condition, a small reference group of 200 was used, along with a short, 10-item test. The results demonstrated that, in general, LR was slightly more powerful in detecting items with DIF. In most conditions, however, power was well below the acceptable rate of 80%. As the size of the focal group and the magnitude of DIF increased, the three procedures were more likely to reach acceptable power. All three procedures also demonstrated the highest power for the most discriminating item. Collectively, the results from this research provide information in the area of small sample sizes and DIF detection.
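Of the three procedures compared above, logistic regression is the most compact to illustrate. The sketch below follows the common Swaminathan and Rogers (1990) formulation, testing uniform and nonuniform DIF through nested-model likelihood-ratio tests; the effect-size measures the study examines are not shown, and the function name and data layout are assumptions.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def lr_dif(item, total, group):
    """Logistic-regression DIF for one dichotomous item.

    item  : (n,) 0/1 responses
    total : (n,) total test scores (ability proxy)
    group : (n,) 0 = reference, 1 = focal
    Returns p-values for uniform DIF (group main effect) and
    nonuniform DIF (group x ability interaction)."""
    X1 = sm.add_constant(np.column_stack([total]))
    X2 = sm.add_constant(np.column_stack([total, group]))
    X3 = sm.add_constant(np.column_stack([total, group, total * group]))
    m1 = sm.Logit(item, X1).fit(disp=0)   # ability only
    m2 = sm.Logit(item, X2).fit(disp=0)   # + group
    m3 = sm.Logit(item, X3).fit(disp=0)   # + interaction
    p_uniform = stats.chi2.sf(2 * (m2.llf - m1.llf), df=1)
    p_nonuniform = stats.chi2.sf(2 * (m3.llf - m2.llf), df=1)
    return p_uniform, p_nonuniform
```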
27

Analysis of Differential Item Functioning in WAIS-III Verbal Subtests

Malakauskaitė, Rima 23 June 2014 (has links)
The WAIS-III (Wechsler Adult Intelligence Scale, Third Edition) is one of the most widely used intelligence measurement instruments in the world and has been adapted in Lithuania. One source of test bias is differential item functioning: differences in item performance between groups when comparing individuals with the same abilities. The aim of this study was to assess the bias of items in the verbal portion of the Lithuanian adaptation of the WAIS-III. Differential item functioning was examined by comparing groups by sex (172 women and 128 men) and by education (209 participants with secondary or lower education and 89 with further or university education) across six WAIS-III subtests (Picture Completion, Vocabulary, Similarities, Arithmetic, Information, and Comprehension). Of the 148 items analyzed, 20 functioned differently for men and women and 19 functioned differently across education groups, with most differences found in the Vocabulary and Information subtests. Most of the differences were uniform DIF (differences in item difficulty), and their number was very similar in the sex and education comparisons. Among items showing nonuniform DIF, considerably more discriminated ability better for women than for men, and for participants with secondary or lower education.
28

Differences in Reading Strategies and Differential Item Functioning on PCAP 2007 Reading Assessment

Scerbina, Tanya 29 November 2012 (has links)
Pan-Canadian Assessment Program (PCAP) 2007 reading item data and contextual data on reading strategies were analyzed to investigate the relationship between self-reported reading strategies and item difficulty. Students who reported using higher- or lower-order strategies were identified through a factor analysis. The purpose of this study was to investigate whether students with the same underlying reading ability, but who reported using different reading strategies, found the items differentially difficult. Differential item functioning (DIF) analyses identified the items on which students who tended to use higher-order reading strategies excelled: these were selected-response items, which students who preferred lower-order strategies found more difficult. The opposite pattern was found for constructed-response items. The results suggest that DIF analyses can be used to investigate which reading strategies are related to item difficulty when controlling for students' level of ability.
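The grouping step described here, deriving higher- versus lower-order strategy groups from questionnaire responses via factor analysis, might look roughly like the following; the single-factor model and the median split are stand-ins for the study's actual factor-analytic classification, and the data layout is assumed.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def strategy_groups(strategy_items):
    """Assign students to higher- (1) vs lower-order (0) strategy groups
    from self-report questionnaire ratings via a one-factor model.
    strategy_items : (n_students, n_questions) array of ratings.
    The median split is an illustrative stand-in for the study's
    classification; the resulting indicator can serve as the DIF grouping."""
    fa = FactorAnalysis(n_components=1, random_state=0)
    scores = fa.fit_transform(strategy_items).ravel()
    return (scores > np.median(scores)).astype(int)
```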
30

Methods in creating alternate assessments: Calibrating a mathematics alternate assessment designed for students with disabilities using general education student data

Jung, Eunju, 1974- 12 1900 (has links)
A significant challenge in developing alternate assessments is obtaining suitable sample sizes. This study investigated whether the psychometric characteristics of mathematics alternate assessment items created for "2%" students in grade 8 (students with disabilities eligible for alternate assessments based on modified achievement standards) can be meaningfully estimated with data obtained from general education students in lower grades. Participants included 23 2% students in grade 8 and 235 general education students in grades 6-8. The twenty-three 2% students were identified through the Student Performance Test (10 standard items and 10 2% items) and the Teacher Perception Survey. Performance on the 10 2% items by the 2% students and the general education students was analyzed to address two questions: (a) are there grade levels at which the item parameters estimated from general education students in grades 6-8 do not differ from those obtained using the 2% student sample? and (b) are there grade levels at which the estimated ability of general education students in grades 6-8 does not differ from that of the 2% student sample in grade 8? Results indicated that the item response patterns of 2% students in grade 8 were comparable to those of general education students in grades 6 and 7. Additionally, 2% students in grade 8 showed comparable mathematics performance on the 2% items when compared to general education students in grades 6 and 7. Considering the content exposure of students in lower grades, the study concluded that data from general education students in grade 7 would be more appropriate for designing alternate assessments for 2% students in grade 8 than data from students in grade 6. The general conclusion is that using data obtained from general education students in lower grade levels may be an appropriate and efficient method of designing alternate assessment items. / Advisers: Dr. Beth Ham, Co-Chair; Dr. Paul Yovanoff, Co-Chair
