1.
Stratified item selection and exposure control in unidimensional adaptive testing in the presence of two-dimensional data. Kalinowski, Kevin E.
It is not uncommon to use unidimensional item response theory (IRT) models to estimate ability from multidimensional data. It is therefore important to understand the implications of summarizing multiple dimensions of ability in a single parameter estimate, especially when effects are confounded in computerized adaptive testing (CAT). Previous studies have investigated the effects of different IRT models and ability estimators by manipulating the relationships between item and person parameters. In all cases, however, maximum information was used as the item selection method. Because maximum information is heavily influenced by the item discrimination parameter, investigating a-stratified item selection methods is worthwhile. The current Monte Carlo study compared maximum information, a-stratification, and a-stratification with b-blocking item selection methods, alone and in combination with the Sympson-Hetter exposure control strategy. The six testing conditions were crossed with three levels of interdimensional item difficulty correlation and four levels of interdimensional examinee ability correlation. Measures of fidelity, estimation bias, error, and item usage were used to evaluate the effectiveness of the methods. Results showed that either stratified item selection strategy is warranted when the goal is precise ability estimation using unidimensional CAT in the presence of two-dimensional data. If the goal also includes limiting bias of the estimate, Sympson-Hetter exposure control should be included. Results also confirmed that Sympson-Hetter is effective in optimizing item pool usage. Given these results, existing unidimensional CAT implementations might consider employing a stratified item selection routine plus Sympson-Hetter exposure control rather than recalibrating the item pool under a multidimensional model.
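The selection strategies being compared are straightforward to sketch. The following Python fragment is a minimal illustration, not the study's simulation code: the Item class, the four-stratum split, and the fixed per-item exposure parameter k are assumptions made here for brevity (in practice the Sympson-Hetter parameters are calibrated iteratively in simulation, and the test advances through the strata from least to most discriminating).

```python
import random
from dataclasses import dataclass

@dataclass(eq=False)
class Item:
    a: float        # discrimination
    b: float        # difficulty
    k: float = 1.0  # Sympson-Hetter exposure parameter P(administer | selected)

def build_strata(pool, n_strata=4):
    """a-stratification with b-blocking: sort the pool by difficulty,
    cut it into blocks of n_strata items, then within each block assign
    items to strata in ascending order of a.  Every stratum then spans
    the full difficulty range, with stratum 0 holding the least
    discriminating items (administered first)."""
    strata = [[] for _ in range(n_strata)]
    by_b = sorted(pool, key=lambda it: it.b)
    for start in range(0, len(by_b), n_strata):
        block = sorted(by_b[start:start + n_strata], key=lambda it: it.a)
        for s, item in enumerate(block):
            strata[s].append(item)
    return strata

def select_item(stratum, theta, administered, passed_over):
    """Within the current stratum, take the unused item whose difficulty
    is closest to the interim ability estimate, then apply the
    Sympson-Hetter filter: administer with probability k, otherwise set
    the item aside for this examinee and try the next candidate."""
    candidates = sorted(
        (it for it in stratum if it not in administered and it not in passed_over),
        key=lambda it: abs(it.b - theta))
    for item in candidates:
        if random.random() <= item.k:
            return item
        passed_over.add(item)  # skipped for this examinee only
    return None
```

Plain a-stratification is the same idea without the b-blocking step: items are assigned to strata directly by ascending a, which risks uneven difficulty coverage across strata when a and b are correlated.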
2.
Comparison of Linear and Adaptive Versions of the Turkish Pupil Monitoring System (PMS) Mathematics Assessment. Gokce, Semirhan. 01 July 2012.
Before the developments in computer technology, linear test administrations within the classical test theory framework were most common in testing practice. These tests contain a set of predefined items spanning a large range of difficulty values in order to collect information from students at various ability levels. However, placing very easy and very difficult items in the same test not only wastes time and effort but also introduces extraneous variables into the measurement process, such as guessing and careless errors induced by boredom or frustration. An alternative to administering a linear test is to adapt the difficulty of the test to the ability level of each examinee, an approach known as computerized adaptive testing. Computerized adaptive tests use item response theory as their measurement framework and rely on algorithms for item selection, ability estimation, the starting rule, and test termination.
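These four components fit together in a short loop. The sketch below is an illustrative Python outline, not the PMS system's code: it reuses the Item class from the earlier sketch, uses maximum-information selection under a 2PL model as a stand-in selection rule, and assumes an answer(item) callback; one possible estimate_ability is sketched further below.

```python
import math

def prob_2pl(theta, a, b):
    """2PL item response function: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information contributed by a 2PL item at ability theta."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def run_cat(pool, answer, theta0=0.0, max_items=30, se_stop=0.30):
    """Skeleton of one CAT administration: starting rule (theta0),
    item selection (maximum information here), ability estimation,
    and termination (fixed precision or maximum length).
    answer(item) returns the examinee's scored response, 1 or 0."""
    theta, administered, order, responses = theta0, set(), [], []
    while len(order) < max_items:
        remaining = [it for it in pool if it not in administered]
        if not remaining:
            break
        item = max(remaining, key=lambda it: information(theta, it.a, it.b))
        administered.add(item)
        order.append(item)
        responses.append(answer(item))
        theta, se = estimate_ability(order, responses)
        if se <= se_stop:   # stop once the estimate is precise enough
            break
    return theta, order
```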
The present study aims to determine the applicability of computerized adaptive testing (CAT) to the Turkish Pupil Monitoring System's (PMS) mathematics assessments. To that end, a live CAT study using only multiple-choice items was designed to investigate whether comparable ability estimates could be obtained. Afterwards, a Monte Carlo simulation study and a post-hoc simulation study were designed to determine the optimum CAT algorithm for the Turkish PMS mathematics assessments. In the simulation studies, both multiple-choice and open-ended items were used, and different scenarios were tested with respect to starting rules, termination criteria, ability estimation methods, and the presence of exposure and content controls.
The results of the study indicate that using the Weighted Maximum Likelihood (WML) ability estimation method, an easy initial item difficulty as the starting rule, and a fixed test-reliability termination criterion (a standard error of 0.30) yields the optimum CAT algorithm for the Turkish PMS mathematics assessment. Additionally, item exposure and content control strategies have a positive impact on providing comparable ability estimates.
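One concrete, hypothetical realization of the reported configuration, reusing prob_2pl and information from the sketch above: Warm's weighted maximum likelihood estimate computed by grid search, with the 0.30 standard error stopping rule already wired into run_cat. For 2PL items, Warm's correction reduces to maximizing the log-likelihood plus half the log of the test information; the study's open-ended items would need a polytomous model and the general form of the correction, which this sketch does not cover.

```python
import math

def estimate_ability(items, responses, lo=-4.0, hi=4.0, step=0.01):
    """Warm's weighted maximum likelihood (WML) estimate for 2PL items
    via grid search: maximize log L(theta) + 0.5 * log I(theta).
    Returns the estimate and its standard error 1 / sqrt(I(theta_hat))."""
    best_theta, best_obj = lo, float("-inf")
    n_points = int(round((hi - lo) / step)) + 1
    for i in range(n_points):
        theta = lo + i * step
        log_lik = info = 0.0
        for item, u in zip(items, responses):
            p = prob_2pl(theta, item.a, item.b)
            log_lik += math.log(p if u else 1.0 - p)
            info += information(theta, item.a, item.b)
        obj = log_lik + 0.5 * math.log(info)  # Warm's weighting for the 2PL
        if obj > best_obj:
            best_obj, best_theta = obj, theta
    se = 1.0 / math.sqrt(sum(information(best_theta, it.a, it.b)
                             for it in items))
    return best_theta, se
```

With an easy start, a run would then look like run_cat(pool, answer, theta0=-1.0, se_stop=0.30); the theta0 value is an illustrative guess at what "easy initial item difficulty" might translate to, not a figure from the study.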