1

Random or fixed testlet effects : a comparison of two multilevel testlet models

Chen, Tzu-An, 1978- 10 December 2010 (has links)
This simulation study compared the performance of two multilevel measurement testlet (MMMT) models: Beretvas and Walker's (2008) two-level MMMT model and Jiao, Wang, and Kamata's (2005) three-level model. Several conditions were manipulated (including testlet length, sample size, and the pattern of the testlet effects) to assess their impact on the estimation of fixed and random effect parameters. While testlets, in which items share the same stimulus, are common in educational tests, testlet item scores violate the assumption of local item independence (LID) underlying item response theory (IRT). Modeling LID has been widely discussed in previous studies (for example, Bradlow, Wainer, and Wang, 1999; Wang, Bradlow, and Wainer, 2002; Wang, Cheng, and Wilson, 2005). More recently, Jiao et al. (2005) proposed a three-level MMMT (MMMT-3r) in which items are modeled as nested within testlets (level two) and testlets as nested within persons (level three). Testlet effects are typically modeled as random in previous studies involving LID. However, item effects (difficulties) are commonly modeled as fixed under IRT models: that is, persons with the same ability level are assumed to have the same probability of answering an item correctly. Therefore, it is also important that a testlet effects model permit modeling of item effects as fixed. Moreover, modeling testlet effects as random implies that testlets are sampled from a larger population of testlets. However, as with item effects, researchers are typically more interested in the particular set of items or testlets used in an assessment. Given the interest of the researcher or psychometrician using a testlet response model, it seems more useful to adopt a model that permits modeling testlet effects as fixed. An alternative MMMT that permits modeling testlet effects as fixed and/or randomly varying has been proposed (Beretvas and Walker, 2008). The MMMT-2f and MMMT-2r models treat testlet effects as item-set-specific but not person-specific. However, no simulation had been conducted to assess how this proposed model performs. The current study compared the performance of the MMMT-2f and MMMT-2r with that of the MMMT-3r. Results of the present simulation study showed that the MMMT-2r yielded the least biased estimates of fixed item effects, fixed testlet effects, and random testlet effects under conditions with a nonzero, equal pattern of random testlet effect variances, even when the MMMT-2r was not the generating model. However, random effects were not estimated well when unequal random testlet effect variances were generated, and, as other studies have found, fit indices did not perform well either. It should be emphasized that the model differences were of very little practical significance. From a modeling perspective, however, the MMMT-2r does allow the greatest flexibility in modeling testlet effects as fixed, random, or both.
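To make the contrast concrete, here is a minimal Rasch-form sketch of the two parameterizations; the notation is simplified for illustration and is not quoted from the cited papers.

```latex
% Minimal Rasch-form sketch of the contrast (illustrative notation only).

% MMMT-3r (Jiao et al., 2005): a person-specific random testlet effect,
% with items nested in testlets and testlets nested in persons.
\[
  \operatorname{logit} P(y_{pi}=1)
    = \theta_p - \beta_i + \gamma_{p,d(i)},
  \qquad \gamma_{p,d(i)} \sim N\!\left(0, \sigma^2_{d(i)}\right)
\]

% MMMT-2f (Beretvas & Walker, 2008): an item-set-specific testlet effect
% delta_{d(i)} shared by all persons, i.e., fixed rather than random.
\[
  \operatorname{logit} P(y_{pi}=1)
    = \theta_p - \beta_i - \delta_{d(i)}
\]

% MMMT-2r additionally allows delta_{d(i)} to vary randomly, so fixed and
% random testlet effects can be combined within one model.
```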
2

Effects of sample size, ability distribution, and the length of Markov Chain Monte Carlo burn-in chains on the estimation of item and testlet parameters

Orr, Aline Pinto 25 July 2011 (has links)
Item response theory (IRT) models are the basis of modern educational measurement. To increase testing efficiency, modern tests make ample use of groups of questions associated with a single stimulus (testlets). This violates the IRT assumption of local independence. However, a set of measurement models, testlet response theory (TRT), has been developed to address such dependency issues. This study investigates the effects of varying sample sizes and Markov chain Monte Carlo burn-in chain lengths on the accuracy of estimation of a TRT model's item and testlet parameters. The following outcome measures are examined: descriptive statistics, Pearson product-moment correlations between known and estimated parameters, and indices of measurement effectiveness for the final parameter estimates.
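As a minimal illustration of the burn-in idea manipulated in this study, the Python sketch below fits a single Rasch item difficulty by random-walk Metropolis sampling and discards the initial draws. The data, prior, and tuning constants are hypothetical choices for the sketch, not the study's actual TRT estimation procedure.

```python
# Toy burn-in illustration: a random-walk Metropolis sampler for one Rasch
# item difficulty b, with the first `burn_in` draws discarded before
# summarizing the chain.
import numpy as np

rng = np.random.default_rng(0)

# Simulate responses to one item from persons with known abilities.
theta = rng.normal(0.0, 1.0, size=500)    # person abilities
b_true = 0.4                              # true item difficulty
p = 1.0 / (1.0 + np.exp(-(theta - b_true)))
y = rng.binomial(1, p)

def log_posterior(b):
    """Bernoulli (Rasch) log-likelihood plus a N(0, 2^2) prior on b."""
    eta = theta - b
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    logprior = -b**2 / (2 * 2.0**2)
    return loglik + logprior

n_draws, burn_in = 5000, 1000             # burn-in length is the study's factor
draws = np.empty(n_draws)
b = 0.0                                   # arbitrary starting value
for t in range(n_draws):
    proposal = b + rng.normal(0.0, 0.2)   # random-walk proposal
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(b):
        b = proposal
    draws[t] = b

kept = draws[burn_in:]                    # discard burn-in before inference
print(f"posterior mean {kept.mean():.3f} vs true {b_true}")
```

If the burn-in is too short, early draws still reflect the arbitrary starting value and bias the posterior summaries, which is exactly the trade-off the study varies.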
3

Ability parameter recovery of a computerized adaptive test based on Rasch testlet models

Pak, Seohong 15 December 2017 (has links)
The purpose of this study was to investigate the effects of various testlet characteristics on ability parameter recovery under a computerized adaptive test (CAT). Given the popularity of CATs and the increasing use of testlets in exams, whether in a mixed format or not, it was important to evaluate various conditions in a testlet-based CAT fitted with testlet response theory models. The manipulated factors of this study were testlet size, testlet effect size, testlet composition, and exam format. The performance of each condition was compared against true thetas at 81 equally spaced points from -3.0 to +3.0. For each condition, 1,000 replications were conducted, and results were evaluated with respect to overall bias, overall standard error, overall RMSE, conditional bias, conditional standard error, conditional RMSE, and conditional passing rate. The conditional results were presented in pre-specified intervals. Several significant conclusions were drawn. Overall, the mean theta estimates over 1,000 replications were close to the true thetas regardless of the manipulated conditions. In terms of aggregated overall RMSE, predictable relationships were found for the four study factors: a larger amount of error was associated with a longer testlet, a bigger effect size, a random composition, and a testlet-only exam format. However, when the aggregated overall bias was considered, only two effects were observed: a large difference among the three testlet length conditions, and almost no difference between the two testlet composition conditions. As expected, conditional SEMs for all conditions showed a U-shape across the theta scale. A noticeable discrepancy occurred only within the testlet length condition: more error was associated with the longest testlet length than with the short and medium length conditions. The conditional passing rate showed little discrepancy among conditions within each factor, so no particular association was found. In general, in terms of the amount of error found in this study, a short testlet length, a small testlet effect size, a homogeneous difficulty composition, and a mixed format are preferable. Beyond these findings, some interaction effects were also observed. When a medium or large (i.e., greater than .50) testlet effect was suspected, it was better to have a short testlet. It was also found that using a mixed-format exam increased the accuracy of the random difficulty composition. However, this study was limited by several factors that were held constant across conditions: a fixed-length exam, no content balancing, and uniform testlet effects. Consequently, plans for improvements in terms of generalization were also discussed.
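The recovery statistics named above can be illustrated with a short sketch. The snippet below simulates hypothetical theta estimates with tail-inflated noise (an assumption chosen only to mimic the reported U-shaped conditional SEM) and computes overall and conditional bias, SE, and RMSE; it does not simulate the CAT itself.

```python
# Sketch of overall and conditional recovery statistics over replications.
import numpy as np

rng = np.random.default_rng(1)

true_theta = np.linspace(-3.0, 3.0, 81)            # 81 equally spaced points
n_reps = 1000
# Hypothetical estimates: truth plus noise that grows toward the tails.
noise_sd = 0.25 + 0.05 * true_theta**2
est = true_theta + rng.normal(0.0, noise_sd, size=(n_reps, 81))

err = est - true_theta                             # (rep, theta-point) errors
cond_bias = err.mean(axis=0)                       # conditional bias
cond_se = est.std(axis=0, ddof=1)                  # conditional standard error
cond_rmse = np.sqrt((err**2).mean(axis=0))         # conditional RMSE

overall_bias = err.mean()
overall_rmse = np.sqrt((err**2).mean())
print(f"overall bias {overall_bias:+.4f}, overall RMSE {overall_rmse:.4f}")
print(f"conditional RMSE at theta=-3: {cond_rmse[0]:.3f}, at 0: {cond_rmse[40]:.3f}")
```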
4

Operational characteristics of mixed-format multistage tests using the 3PL testlet response theory model

Hembry, Ian Fredrick 19 September 2014 (has links)
Multistage tests (MSTs) have received renewed interest in recent years as an effective compromise between fixed-length linear tests and computerized adaptive tests. Most MST studies have scored the assessments using item response theory (IRT) methods. Many assessments are currently being developed as mixed-format assessments that administer both standalone items and clusters of items associated with a common stimulus, called testlets. By the nature of a testlet, a dependency arises between the items within the testlet that violates the local independence of items, a fundamental assumption of IRT models. Using dichotomous IRT methods on a mixed-format, testlet-based assessment therefore knowingly violates local independence. By combining the score points within a testlet, researchers have successfully applied polytomous IRT models; however, such models lose information by not using the unique response pattern provided by each item within a testlet. The three-parameter logistic testlet response theory (3PL-TRT) model is a measurement model developed to retain the uniqueness of each item's response pattern while accounting for the local dependency exhibited by a testlet, or testlet effect. Because few studies have examined mixed-format MST administration under the 3PL-TRT model, this dissertation performed a simulation to investigate the administration of a mixed-format, testlet-based MST under the 3PL-TRT model. Simulee responses were generated from 3PL-TRT-calibrated item parameters from a real large-scale, passage-based standardized assessment. The manipulated testing conditions considered four panel designs, two test lengths, three routing procedures, and three conditions of local item dependence. The study found functionally no bias across testing conditions. All conditions showed adequate measurement properties, but a few differences occurred between some of the testing conditions. Measurement precision was affected by panel design, test length, and the magnitude of local item dependence. The three-stage MSTs consistently showed slightly lower measurement precision than the two-stage MSTs. As expected, the longer test length conditions had better measurement precision than the shorter ones. Conditions with the largest magnitude of local item dependency showed the worst measurement precision. The routing procedure had little impact on measurement effectiveness.
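For reference, the 3PL-TRT model is commonly written as below. This is a standard presentation of the model (notation varies across sources) rather than the dissertation's own equations.

```latex
% A common presentation of the 3PL testlet response theory model.
\[
  P(y_{ij}=1 \mid \theta_j)
    = c_i + (1-c_i)\,
      \frac{\exp\!\left[a_i\left(\theta_j - b_i - \gamma_{j,d(i)}\right)\right]}
           {1 + \exp\!\left[a_i\left(\theta_j - b_i - \gamma_{j,d(i)}\right)\right]},
  \qquad \gamma_{j,d(i)} \sim N\!\left(0, \sigma^2_{d(i)}\right)
\]
% a_i, b_i, c_i are item i's discrimination, difficulty, and pseudo-guessing
% parameters; gamma_{j,d(i)} is person j's effect for the testlet d(i)
% containing item i, fixed at 0 for standalone items. The variance
% sigma^2_{d(i)} captures the magnitude of local item dependence.
```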
5

Decision consistency and accuracy indices for the bifactor and testlet response theory models

LaFond, Lee James 01 July 2014 (has links)
The primary goal of this study was to develop a new procedure for estimating decision consistency and accuracy indices using the bifactor and testlet response theory (TRT) models. This study is the first to investigate decision consistency and accuracy from a multidimensional perspective, and the results showed that the bifactor model at least behaved in a way that met the author's expectations and represents a potentially useful procedure. The TRT model, on the other hand, did not meet the author's expectations and generally showed poor model performance. The multidimensional decision consistency and accuracy indices proposed in this study appear to perform well, at least for the bifactor model, in the case of a substantial testlet effect. For practitioners examining a test containing testlets for decision consistency and accuracy, a recommended first step is to check for dimensionality. If the testlets show a significant degree of multidimensionality, then use of the proposed multidimensional indices can be recommended, as the simulation study showed an improved level of performance over unidimensional IRT models. If there is not a significant degree of multidimensionality, however, the unidimensional IRT models and indices would perform as well as, or even better than, the multidimensional models. Another goal of this study was to compare methods of numerical integration used in the calculation of decision consistency and accuracy indices. This study investigated a new method (the M method) that samples ability estimates through a Monte Carlo approach. In summary, the M method appears to be just as accurate as the other commonly used methods of numerical integration, while having some practical advantages over the D and P methods: it is not nearly as computationally intensive as the D method, and unlike the P method it does not require large sample sizes. In addition, the P method has a conceptual disadvantage in that the conditioning variable, in theory, should be the true theta, not an estimated theta. The M method avoids both of these issues and seems to provide equally accurate estimates of decision consistency and accuracy indices, which makes it a strong option, particularly in multidimensional cases.
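A rough sketch of the Monte Carlo integration idea behind the M method follows. It uses a unidimensional 2PL test with a hypothetical cut score rather than the study's bifactor or TRT setup, and it approximates decision consistency and accuracy by sampling abilities instead of using fixed quadrature points.

```python
# Monte Carlo sketch of decision consistency and accuracy for a 2PL test
# with a pass/fail cut score. All parameters and cut values are hypothetical.
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 2PL item bank: discriminations a, difficulties b.
a = rng.uniform(0.8, 2.0, size=40)
b = rng.normal(0.0, 1.0, size=40)
cut_score = 24                        # pass if raw score >= 24 of 40

def pass_prob(theta, n_mc=1000):
    """P(raw score >= cut | theta), approximated by simulating responses."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    scores = rng.binomial(1, p, size=(n_mc, a.size)).sum(axis=1)
    return (scores >= cut_score).mean()

# M-method idea: integrate over ability by sampling thetas rather than
# evaluating a fixed quadrature grid.
thetas = rng.normal(0.0, 1.0, size=200)
true_cut_theta = 0.0                  # hypothetical true-ability cut

consistency = accuracy = 0.0
for th in thetas:
    phi = pass_prob(th)
    consistency += phi**2 + (1 - phi)**2           # same decision twice
    accuracy += phi if th >= true_cut_theta else (1 - phi)
consistency /= thetas.size
accuracy /= thetas.size
print(f"decision consistency {consistency:.3f}, accuracy {accuracy:.3f}")
```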
6

以相關係數探討題組型試題之鑑別度 / An exploratory study of discrimination index of testlet by using correlation coefficient

李昕儀 Unknown Date (has links)
Testlet items are answered on the basis of a newly provided situation and materials, and they can measure students' comprehension, application, analysis, and evaluation abilities. In general, the items within a testlet are related to some degree. Testlets have been a common item type in the Basic Competence Test for Junior High School Students in recent years, yet current definitions of the discrimination index apply only to single items. Applied directly to testlets, these definitions not only fail to yield a discrimination index for the testlet itself but also ignore the relations among items within the testlet. Because little research exists on testlet discrimination, this thesis investigates the discrimination index from the viewpoint of the multiple correlation coefficient, providing a new direction for further study. The thesis first analyzes the discrimination index of independent items and then extends the results to testlets. For independent items, it proves that the point-biserial-based discrimination index is a special case of the correlation-based index. For testlets, after collecting test data, regression analysis is used to compute the discrimination index of the testlet itself. To assess each item's contribution to the testlet's discrimination after removing the influence of the preceding items in the same testlet, the thesis proposes the new concepts of "net score" and "net discrimination" and finds a close relation between the testlet discrimination index and the items' net discrimination indices; it also provides a statistical method for testing whether each item's net discrimination is significant. Finally, using the English items of the first Basic Competence Test for Junior High School Students in 2010 as an example, the results are applied to compute the discrimination indices of independent items and, for testlets, the testlet discrimination index, each item's discrimination index, and each item's net discrimination index, with comparisons to other studies of item discrimination.
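A small sketch of the correlation-based quantities described above follows. The simulated data, the choice of a three-item testlet, and the semi-partial-correlation reading of "net discrimination" are assumptions for illustration, not the thesis's exact derivations.

```python
# Sketch: point-biserial discrimination for a single item, multiple-
# correlation discrimination for a whole testlet, and a "net discrimination"
# for each item after regressing out the preceding items in the testlet.
import numpy as np

rng = np.random.default_rng(3)

n = 2000
theta = rng.normal(size=n)
# Hypothetical test: a 3-item testlet plus 7 independent Rasch-like items.
b = rng.normal(0.0, 0.8, size=10)
items = rng.binomial(1, 1 / (1 + np.exp(-(theta[:, None] - b))))
total = items.sum(axis=1)
testlet = items[:, :3]                    # first three items form the testlet

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

# Single-item discrimination: point-biserial correlation with total score.
print("item 1 discrimination:", round(corr(items[:, 0], total), 3))

# Testlet discrimination: multiple correlation R from regressing the total
# score on the testlet's item scores.
X = np.column_stack([np.ones(n), testlet])
beta, *_ = np.linalg.lstsq(X, total, rcond=None)
print("testlet discrimination (multiple R):", round(corr(X @ beta, total), 3))

# Net discrimination of item k: correlate the total score with the part of
# item k not explained by the preceding items in the same testlet.
for k in range(1, 3):
    Xprev = np.column_stack([np.ones(n), testlet[:, :k]])
    bprev, *_ = np.linalg.lstsq(Xprev, testlet[:, k], rcond=None)
    resid = testlet[:, k] - Xprev @ bprev  # the item's "net score"
    print(f"item {k+1} net discrimination:", round(corr(resid, total), 3))
```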
7

題組測驗效果之統計分析 / A Statistical Analysis of Testlets

施焱騰 Unknown Date (has links)
Within the framework of classical test theory, we endow testlets with an appropriate probability model and investigate computational formulas for the difficulty index and the discrimination index. Data from the English items of the second Basic Competence Test for Junior High School Students in 2007 are used for empirical verification, and the results are compared with those obtained by the traditional method.
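For context, the classical indices being generalized here are standardly defined as follows; this is a sketch of the textbook definitions, not the testlet-level formulas derived in the thesis.

```latex
% Textbook definitions of the classical indices.

% Difficulty index of item i: proportion correct among N examinees.
\[
  P_i = \frac{1}{N} \sum_{j=1}^{N} y_{ij}
\]

% Two common discrimination indices: the upper-lower group difference
% (e.g., top vs. bottom 27% by total score) and the point-biserial
% correlation between item score and total score X.
\[
  D_i = P_i^{\mathrm{upper}} - P_i^{\mathrm{lower}},
  \qquad
  r_{pb,i} = \frac{\bar{X}_1 - \bar{X}_0}{s_X} \sqrt{p_i (1 - p_i)}
\]
```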
8

Bayesian Model Checking Methods for Dichotomous Item Response Theory and Testlet Models

Combs, Adam 02 April 2014 (has links)
No description available.
