
Evaluation of Measurement Invariance in IRT Using Limited Information Fit Statistics/Indices: A Monte Carlo Study

Measurement invariance analysis is important when test scores are used to make group-wise comparisons. Multiple-group item response theory (IRT) modeling is one of the commonly used methods for examining measurement invariance. One essential step in the multiple-group modeling method is the evaluation of overall model-data fit. A family of limited information fit statistics has recently been developed for assessing overall model-data fit in IRT. Previous studies evaluated the performance of limited information fit statistics using
single-group data, and found that these fit statistics performed better than the traditional full information fit statistics when data
were sparse. However, no study has investigated the performance of the limited information fit statistics within the multiple-group
modeling framework. This study aims to examine the performance of the limited information fit statistic M₂ and the corresponding M₂-based descriptive fit indices in conducting measurement invariance analysis within the multiple-group IRT framework. A Monte Carlo study was conducted to examine the sampling distributions of M₂ and the M₂-based descriptive fit indices, and their sensitivity to lack of measurement invariance under various conditions. The manipulated factors included sample size, model type, dimensionality, type and number of differential item functioning (DIF)
items, and latent trait distributions. Results showed that M₂ followed an approximately chi-square distribution when the model was correctly specified, as expected. The type I error rates of M₂ were reasonable under large sample sizes (1000/2000). When the model was misspecified, the power of M₂ was a function of sample size and the number of DIF items. For example, the power of M₂ for rejecting the U2PL Scalar Model increased from 29.2% to 99.9% when the number of uniform DIF items increased from one to six, given sample sizes of 1000/2000. With six uniform DIF items (30% of the studied items), the power of M₂ increased from 42.4% to 99.9% when sample sizes changed from 250/500 to 1000/2000.
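
For context, the reference distribution against which M₂ is evaluated, together with one common M₂-based descriptive index, can be sketched as follows. This is the single-group, dichotomous-item case, with degrees of freedom counted from the univariate and bivariate margins; it is a reminder of the general form of the statistic, not the exact bookkeeping used for the multiple-group models in this study.

```latex
M_2 \;\dot{\sim}\; \chi^2_{df},
\qquad df = \Big( I + \binom{I}{2} \Big) - q,
\qquad \mathrm{RMSEA}_2 = \sqrt{ \frac{ \max(M_2 - df,\, 0) }{ N \cdot df } }
```

Here I is the number of items (so I + I(I-1)/2 counts the univariate and bivariate margins from which M₂ is computed), q is the number of free model parameters, and N is the sample size.
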
When the difference in M₂ (ΔM₂) was used to compare two correctly specified nested models, the sampling distribution of ΔM₂ deviated from the reference chi-square distribution at both tails, especially under small sample sizes.
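
A minimal sketch of this ΔM₂ difference test, assuming the M₂ statistics and their degrees of freedom have already been obtained from the two fitted nested models (the function name and the numeric values below are illustrative only, not results from the study):

```python
from scipy.stats import chi2

def delta_m2_test(m2_constrained, df_constrained, m2_general, df_general):
    """Chi-square difference test based on the M2 statistics of two nested
    multiple-group IRT models (e.g., Metric vs. Configural)."""
    d_m2 = m2_constrained - m2_general  # constrained model fits no better
    d_df = df_constrained - df_general  # df gained from equality constraints
    p = chi2.sf(d_m2, d_df)             # upper-tail p from chi-square(d_df)
    return d_m2, d_df, p

# Hypothetical values for illustration only:
d_m2, d_df, p = delta_m2_test(m2_constrained=312.4, df_constrained=140,
                              m2_general=285.1, df_general=120)
print(f"Delta M2 = {d_m2:.2f} on {d_df} df, p = {p:.4f}")
```

The simulation results below suggest treating small-sample p-values from this reference distribution with caution, since the sampling distribution of ΔM₂ departs from chi-square at both tails.
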
The type I error rates of the ΔM₂ test became closer to the expectation as sample sizes increased. For example, both the Metric and Configural Models were correctly specified when the test included no DIF item. Given an alpha level of .05, the type I error rates for the comparison between the Metric and Configural Models were slightly inflated with n=250/500 (8.72%), and became closer to the alpha level with n=1000/2000 (5.3%). When at least one of the models was misspecified, the power of ΔM₂ increased as the number of DIF items or the sample sizes became larger. For example, the Metric Model was misspecified when nonuniform DIF items existed. Given sample sizes of 1000/2000 and an alpha level of .05, the power of ΔM₂ for the comparison between the Metric and Configural Models increased from 52.55% to 99.39% when the number of nonuniform DIF items changed from one to six. With one nonuniform DIF item in the test, the power of ΔM₂ was only 17.05% given an alpha level of .05 and sample sizes of 250/500, but increased to 52.55% given sample sizes of 1000/2000. The descriptive fit
indices and their differences between nested models were also affected by the number of DIF items. When there was no DIF item, all fit indices indicated good model-data fit, and the differences in the five fit indices between nested models were all very small (<.008) across different sample sizes. When DIF items existed, the means of the descriptive fit indices and their differences between nested models increased as the number of DIF items increased. The findings from this study provide suggestions for implementing the limited information fit statistics/indices in measurement invariance analysis within the multiple-group IRT framework.

A Dissertation submitted to the Department of Educational Psychology and Learning Systems in partial
fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester 2016. / October 31, 2016. / Includes bibliographical references. / Yanyun Yang, Professor Co-Directing Dissertation; Insu Paek, Professor Co-Directing Dissertation;
Fred W. Huffer, University Representative; Betsy J. Becker, Committee Member; Salih Binici, Committee Member.

Identifier: oai:union.ndltd.org:fsu.edu/oai:fsu.digital.flvc.org:fsu_405570
Contributors: Cui, Mengyao (author); Yang, Yanyun (professor co-directing dissertation); Paek, Insu (professor co-directing dissertation); Huffer, Fred W. (Fred William) (university representative); Becker, Betsy Jane, 1956- (committee member); Binici, Salih (committee member); Florida State University (degree granting institution); College of Education (degree granting college); Department of Educational Psychology and Learning Systems (degree granting department)
Publisher: Florida State University
Source Sets: Florida State University
Language: English
Detected Language: English
Type: Text
Format: 1 online resource (170 pages), computer, application/pdf
Rights: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). The copyright in theses and dissertations completed at Florida State University is held by the students who author them.
