Small sample IRT item parameter estimates

Item response theory (IRT) has great potential for solving many measurement problems. Specific IRT applications succeed only when the fit between the model and the test data is satisfactory. But model fit is not the only concern. Many tests are administered to relatively small numbers of examinees, and when sample sizes are small, item parameter estimates are of limited usefulness. There appear to be a number of ways in which estimation might be improved. The purpose of this study was to investigate IRT parameter estimation using several promising small-sample procedures. Computer simulation was used to generate the data. Two item banks were created with items described by a three-parameter logistic model. Tests of 30 and 60 items were simulated; examinee samples of 100, 200, and 500 were used in item calibration. Four promising models and associated estimation procedures were selected: (1) the one-parameter logistic (Rasch) model, (2) a modified one-parameter model in which a constant value for the "guessing parameter" was assumed, (3) a non-parametric three-parameter model (called "Testgraf"), and (4) a one-parameter Bayesian model (with a variety of priors on the item difficulty parameter). Several criteria were used in evaluating the estimates. The main results were that (1) the modified one-parameter model seemed to lead consistently to better estimates of item difficulty and examinee ability than the Rasch model and the non-parametric three-parameter model with their related estimation procedures (this finding was observed across both test lengths and all three sample sizes, and seemed to hold for both normal and rectangular distributions of ability), (2) the Bayesian estimation procedures with reasonable priors led to results comparable to those of the modified one-parameter model, and (3) Testgraf typically produced the poorest results for the smallest sample of 100.
Future studies seem justified to (1) replicate the findings with more relevant evaluation criteria, (2) determine the source of the problem with Testgraf and small samples/short tests, and (3) further investigate the utility of Bayesian estimation procedures.
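
The simulation design described above can be illustrated with a minimal sketch. It is not the dissertation's code. It assumes the standard three-parameter logistic form, P(correct) = c + (1 - c) / (1 + exp(-D a (theta - b))), with the conventional scaling constant D = 1.7. The function names, the parameter ranges used to build the item bank, and the bounds of the rectangular ability distribution are all hypothetical choices made for illustration; the abstract does not state them.

```python
import math
import random


def p_correct_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: pseudo-guessing parameter; D: scaling constant (assumed 1.7).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def make_item_bank(n_items, seed=1):
    """Draw hypothetical (a, b, c) triples; the ranges are illustrative only."""
    rng = random.Random(seed)
    return [
        (rng.uniform(0.5, 1.5), rng.gauss(0.0, 1.0), rng.uniform(0.10, 0.25))
        for _ in range(n_items)
    ]


def simulate_responses(n_examinees, items, ability="normal", seed=0):
    """Return an n_examinees x n_items matrix of 0/1 item scores."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_examinees):
        if ability == "normal":
            theta = rng.gauss(0.0, 1.0)
        else:
            # Rectangular (uniform) ability; the bounds are an assumption.
            theta = rng.uniform(-2.0, 2.0)
        row = [
            1 if rng.random() < p_correct_3pl(theta, a, b, c) else 0
            for (a, b, c) in items
        ]
        data.append(row)
    return data


# One cell of the design: a 30-item test calibrated on 100 examinees.
bank = make_item_bank(30)
responses = simulate_responses(100, bank, ability="normal")
```

Under this parameterization, the modified one-parameter model in the abstract corresponds to fixing a at a common value and c at a constant for every item, so that only b is estimated per item. That reading is an interpretation of the abstract's wording, not a detail it confirms.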

Identifier: oai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:dissertations-1535
Date: 01 January 1997
Creators: Setiadi, Hari
Publisher: ScholarWorks@UMass Amherst
Source Sets: University of Massachusetts, Amherst
Language: English
Detected Language: English
Type: text
Source: Doctoral Dissertations Available from Proquest