  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Operational characteristics of mixed-format multistage tests using the 3PL testlet response theory model

Hembry, Ian Fredrick 19 September 2014
Multistage tests (MSTs) have received renewed interest in recent years as an effective compromise between fixed-length linear tests and computerized adaptive tests. Most MST studies have scored the assessments using item response theory (IRT) methods. Many assessments are currently being developed as mixed-format assessments that administer both standalone items and testlets, which are clusters of items associated with a common stimulus. By its nature, a testlet creates a dependency among its items that violates local independence, a fundamental assumption of IRT models. Using dichotomous IRT methods on a mixed-format, testlet-based assessment therefore knowingly violates local independence. By combining the score points within a testlet, researchers have successfully applied polytomous IRT models. However, such models lose information because they do not use the unique response patterns provided by each item within a testlet. The three-parameter logistic testlet response theory (3PL-TRT) model is a measurement model developed to retain the uniqueness of each item's response pattern while accounting for the local dependency exhibited by a testlet, known as the testlet effect. Because few studies have examined mixed-format MST administration under the 3PL-TRT model, the dissertation performed a simulation to investigate the administration of mixed-format, testlet-based MSTs under the 3PL-TRT model. Simulee responses were generated from the 3PL-TRT calibrated item parameters of a real large-scale, passage-based standardized assessment. The manipulated testing conditions comprised four panel designs, two test lengths, three routing procedures, and three conditions of local item dependence. The study found functionally no bias across testing conditions. All conditions showed adequate measurement properties, but a few differences did occur between some of the testing conditions.
The measurement precision was affected by panel design, test length, and the magnitude of local item dependence. The three-stage MSTs consistently showed slightly lower measurement precision than the two-stage MSTs. As expected, the longer test length conditions had better measurement precision than the shorter test length conditions. Conditions with the largest magnitude of local item dependence showed the worst measurement precision. The routing procedure had little impact on measurement effectiveness.
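As an illustration of the kind of response generation the abstract describes, the following is a minimal sketch of simulating one simulee's responses under a 3PL testlet model. It assumes the commonly used parameterization in which a person-specific testlet effect (here `gamma`) is subtracted inside the logistic term and is drawn from a normal distribution with mean zero. The function names, the item tuple layout, and the parameter values are hypothetical and are not taken from the dissertation.

```python
import math
import random


def p_correct_3pl_trt(theta, a, b, c, gamma=0.0):
    """Probability of a correct response under an assumed 3PL testlet model.

    theta: ability; a: discrimination; b: difficulty; c: pseudo-guessing;
    gamma: person-specific testlet effect (0.0 for a standalone item).
    """
    z = a * (theta - b - gamma)
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def simulate_responses(theta, items, testlet_sd, rng):
    """Simulate 0/1 responses for one simulee.

    items: list of (a, b, c, testlet_id), with testlet_id None for
           standalone items (hypothetical layout).
    testlet_sd: dict mapping testlet_id to the SD of its testlet effect;
                a larger SD means stronger local item dependence.
    """
    # One testlet effect per testlet for this simulee, shared by its items.
    gammas = {t: rng.gauss(0.0, sd) for t, sd in testlet_sd.items()}
    responses = []
    for a, b, c, testlet_id in items:
        gamma = gammas.get(testlet_id, 0.0)
        p = p_correct_3pl_trt(theta, a, b, c, gamma)
        responses.append(1 if rng.random() < p else 0)
    return responses


# Hypothetical mixed-format form: two standalone items and one 3-item testlet.
example_items = [
    (1.0, -0.5, 0.2, None),
    (1.2, 0.3, 0.2, None),
    (0.9, 0.0, 0.2, "passage_1"),
    (1.1, 0.4, 0.2, "passage_1"),
    (1.3, -0.2, 0.2, "passage_1"),
]
example_responses = simulate_responses(
    0.5, example_items, {"passage_1": 1.0}, random.Random(42)
)
```

Because all items in a testlet share the same draw of `gamma`, their responses are correlated beyond what `theta` explains, which is the local dependence the abstract refers to.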
2

Random or fixed testlet effects : a comparison of two multilevel testlet models

Chen, Tzu-An, 1978- 10 December 2010
This simulation study compared the performance of two multilevel measurement testlet (MMMT) models: Beretvas and Walker’s (2008) two-level MMMT model and Jiao, Wang, and Kamata’s (2005) three-level model. Several conditions were manipulated (including testlet length, sample size, and the pattern of the testlet effects) to assess the impact on the estimation of fixed and random effect parameters. While testlets, in which items share the same stimulus, are common in educational tests, testlet item scores violate the assumption of local item independence underlying item response theory (IRT). Modeling local item dependence (LID) has been widely discussed in previous studies (for example, Bradlow, Wainer, and Wang, 1999; Wang, Bradlow, and Wainer, 2002; Wang, Cheng, and Wilson, 2005). More recently, Jiao et al. (2005) proposed a three-level MMMT (MMMT-3r) in which items are modeled as nested within testlets (level two) and testlets are in turn nested within persons (level three). Testlet effects are typically modeled as random in previous studies involving LID. However, item effects (difficulties) are commonly modeled as fixed under IRT models: that is, persons with the same ability level are assumed to have the same probability of answering an item correctly. Therefore, it is also important that a testlet effects model permit modeling of item effects as fixed. Moreover, modeling testlet effects as random implies that testlets are being sampled from a larger population of testlets. However, as with item effects, researchers are typically more interested in the particular set of items or testlets being used in an assessment. Given the interest of the researcher or psychometrician using a testlet response model, it seems more useful to use a model that permits modeling testlet effects as fixed. An alternative MMMT that permits modeling testlet effects as fixed and/or randomly varying has been proposed (Beretvas and Walker, 2008).
The MMMT-2f and MMMT-2r models treat testlet effects as item-set-specific but not person-specific. However, no simulation has been conducted to assess how this proposed model performs. The current study compared the performance of the MMMT-2f and MMMT-2r with that of the MMMT-3r. Results of the present simulation study showed that the MMMT-2r yielded the lowest parameter bias in estimating fixed item effects, fixed testlet effects, and random testlet effects for conditions with a nonzero, equal pattern of random testlet effect variances, even when the MMMT-2r was not the generating model. However, random effects estimation did not perform well when unequal random testlet effect variances were generated. Fit indices did not perform well either, as other studies have found. It should be emphasized that model differences were of very little practical significance. From a modeling perspective, the MMMT-2r allows the greatest flexibility in modeling testlet effects as fixed, random, or both.
3

Effects of sample size, ability distribution, and the length of Markov Chain Monte Carlo burn-in chains on the estimation of item and testlet parameters

Orr, Aline Pinto 25 July 2011
Item response theory (IRT) models are the basis of modern educational measurement. To increase testing efficiency, modern tests make ample use of testlets, groups of questions associated with a single stimulus. This violates the IRT assumption of local independence. However, a set of measurement models known as testlet response theory (TRT) has been developed to address such dependency issues. This study investigates the effects of varying sample sizes and Markov chain Monte Carlo burn-in chain lengths on the accuracy of estimation of a TRT model’s item and testlet parameters. The following outcome measures are examined: descriptive statistics, Pearson product-moment correlations between known and estimated parameters, and indices of measurement effectiveness for final parameter estimates.
4

Ability parameter recovery of a computerized adaptive test based on Rasch testlet models

Pak, Seohong 15 December 2017
The purpose of this study was to investigate the effects of various testlet characteristics on ability parameter recovery in a computerized adaptive test (CAT). Given the popularity of CATs and the frequency with which testlets appear in exams, whether in a mixed format or not, it was important to evaluate various conditions in a testlet-based CAT fitted with testlet response theory models. The manipulated factors of this study were testlet size, testlet effect size, testlet composition, and exam format. The performance of each condition was compared against the true thetas, which were 81 equally spaced points from -3.0 to +3.0. For each condition, 1,000 replications were conducted and evaluated with respect to overall bias, overall standard error, overall RMSE, conditional bias, conditional standard error, conditional RMSE, and conditional passing rate. The conditional results were presented in pre-specified intervals. Several significant conclusions were drawn. Overall, the mean theta estimates over 1,000 replications were close to the true thetas regardless of the manipulated conditions. In terms of aggregated overall RMSE, predictable relationships were found for all four study factors: a larger amount of error was associated with a longer testlet, a bigger effect size, a random composition, and a testlet-only exam format. However, when the aggregated overall bias was considered, only two effects were observed: a large difference among the three testlet length conditions, and almost no difference between the two testlet composition conditions. As expected, conditional SEMs for all conditions showed a U-shape across the theta scale. A noticeable discrepancy occurred only within the testlet length condition: more error was associated with the longest testlet length than with the short and medium length conditions.
Conditional passing rate showed little discrepancy among conditions within each factor, so no particular association was found. In general, a short testlet length, a small testlet effect size, a homogeneous difficulty composition, and a mixed format were better in terms of the smaller amount of error found in this study. Beyond these expected findings, some interaction effects were also observed. When a medium or large (i.e., greater than .50) testlet effect was suspected, it was better to have a short testlet. It was also found that using a mixed-format exam increased the accuracy of the random difficulty composition. However, this study was limited by several other factors that were held constant across the conditions: a fixed-length exam, no content balancing, and uniform testlet effects. Consequently, plans for improving generalizability were also discussed.
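The recovery criteria named in the abstract can be sketched as follows. This is a minimal illustration, not the dissertation's code: it assumes that bias is the mean estimate minus the true theta, that the standard error is the standard deviation of the estimates across replications, and that RMSE is the root mean squared deviation from the true theta. The function names are hypothetical.

```python
import math


def theta_grid(lo=-3.0, hi=3.0, n=81):
    """Return n equally spaced true-theta points from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


def recovery_stats(true_theta, estimates):
    """Return (bias, se, rmse) for one true theta across replications.

    All three use the number of replications as the divisor, so that
    rmse**2 equals bias**2 + se**2 (an assumed convention).
    """
    r = len(estimates)
    mean_est = sum(estimates) / r
    bias = mean_est - true_theta
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / r)
    rmse = math.sqrt(sum((e - true_theta) ** 2 for e in estimates) / r)
    return bias, se, rmse


grid = theta_grid()
# Hypothetical estimates from four replications at a true theta of 1.0.
bias, se, rmse = recovery_stats(1.0, [0.8, 1.1, 1.3, 1.2])
```

Conditional results are obtained by calling `recovery_stats` separately at each grid point (or within each interval of grid points), and overall results by averaging across the grid.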
