1 |
A comparison of equating/linking using the Stocking-Lord method and concurrent calibration with mixed-format tests in the non-equivalent groups common-item design under IRT
Tian, Feng, January 2011
Thesis advisor: Larry Ludlow / There has been a steady increase in the use of mixed-format tests, that is, tests consisting of both multiple-choice and constructed-response items, in both classroom and large-scale assessments. This calls for appropriate equating methods for such tests. As Item Response Theory (IRT) has rapidly become the mainstream theoretical basis for measurement, different equating methods under IRT have also been developed. This study used simulated data to investigate the performance of two IRT equating methods: linking following separate calibration (the Stocking-Lord method) and concurrent calibration. The findings show that concurrent calibration generally recovers the item parameters better and, more importantly, produces more accurate estimated scores than linking following separate calibration. Limitations and directions for future research are discussed. / Thesis (PhD), Boston College, 2011. / Submitted to: Boston College. Lynch School of Education. / Discipline: Educational Research, Measurement, and Evaluation.
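To make the "linking following separate calibration" step concrete, here is a minimal sketch of the Stocking-Lord criterion. It is not the thesis's implementation: it assumes dichotomous 2PL items only (the thesis concerns mixed-format tests, which also need a polytomous model), a logistic metric without the 1.7 scaling constant, an equally weighted theta grid, and a simple derivative-free pattern search in place of whatever optimizer the author used. All function names and item parameters are hypothetical.

```python
import math

def p2pl(theta, a, b):
    """2PL probability of a correct response (tanh form avoids overflow)."""
    return 0.5 * (1.0 + math.tanh(0.5 * a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p2pl(theta, a, b) for a, b in items)

def sl_loss(A, B, old_items, new_items, grid):
    """Stocking-Lord criterion: squared TCC difference over the common items
    after rescaling new-form parameters with a* = a / A and b* = A * b + B."""
    rescaled = [(a / A, A * b + B) for a, b in new_items]
    return sum((tcc(t, old_items) - tcc(t, rescaled)) ** 2 for t in grid)

def stocking_lord(old_items, new_items, grid, A=1.0, B=0.0, step=0.5, tol=1e-6):
    """Minimise the criterion over (A, B) with a compass (pattern) search.
    This optimizer is an assumption chosen to keep the sketch dependency-free."""
    best = sl_loss(A, B, old_items, new_items, grid)
    while step > tol:
        improved = False
        for dA, dB in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand_A, cand_B = A + dA, B + dB
            if cand_A <= 0:  # the slope of the linear transformation must be positive
                continue
            loss = sl_loss(cand_A, cand_B, old_items, new_items, grid)
            if loss < best:
                A, B, best, improved = cand_A, cand_B, loss, True
        if not improved:
            step /= 2.0
    return A, B

# Hypothetical common-item estimates on the old-form scale.
old_items = [(1.0, -1.0), (1.2, -0.5), (0.8, 0.0), (1.5, 0.5), (1.1, 1.0)]
# The same items expressed on a new-form scale related by A = 1.2, B = 0.3.
true_A, true_B = 1.2, 0.3
new_items = [(a * true_A, (b - true_B) / true_A) for a, b in old_items]
grid = [-4.0 + 0.2 * k for k in range(41)]

A_hat, B_hat = stocking_lord(old_items, new_items, grid)
```

Concurrent calibration, by contrast, estimates all items from both forms in a single run so that no separate (A, B) transformation is required.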
2 |
The impact of equating method and format representation of common items on the adequacy of mixed-format test equating using nonequivalent groups
Hagge, Sarah Lynn, 01 July 2010
Mixed-format tests containing both multiple-choice and constructed-response items are widely used in educational testing. Such tests combine the broad content coverage and efficient scoring of multiple-choice items with the assessment of higher-order thinking skills thought to be provided by constructed-response items. However, the combination of both item formats on a single test complicates the use of psychometric procedures. The purpose of this dissertation was to examine how characteristics of mixed-format tests and the composition of the common-item set affect the accuracy of equating results in the common-item nonequivalent groups design.
Operational examinee item responses for two classes of data were considered in this dissertation: (1) operational test forms and (2) pseudo-test forms that were assembled from portions of operational test forms. Analyses were conducted on three mixed-format tests from the Advanced Placement Examination program: English Language, Spanish Language, and Chemistry.
For the operational test form analyses, two factors were investigated: (1) the difference in proficiency between the old and new form groups of examinees and (2) the relative difficulty of multiple-choice and constructed-response items. For the pseudo-test form analyses, two additional factors were investigated: (1) format representativeness of the common-item set and (2) statistical representativeness of the common-item set. For each study condition, four equating methods were considered: two traditional methods (frequency estimation and chained equipercentile equating) and two item response theory (IRT) methods (IRT true score and IRT observed score equating).
There were five main findings from the operational and pseudo-test form analyses. (1) As the difference in proficiency between old and new form groups of examinees increased, bias also tended to increase. (2) Relative to the criterion equating relationship for a given equating method, increases in bias were typically largest for frequency estimation and smallest for the IRT equating methods. However, it is important to note that the criterion equating relationship was different for each equating method. Additionally, only one smoothing value was analyzed for the traditional equating methods. (3) Standard errors of equating tended to be smallest for IRT observed score equating and largest for chained equipercentile equating. (4) Results for the operational and pseudo-test analyses were similar when the pseudo-tests were constructed to be similar to the operational test forms. (5) Results were mixed regarding which common-item set composition resulted in the least bias.
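As an illustration of one of the four methods named in the abstract above, the following is a minimal sketch of IRT true score equating. It assumes dichotomous 2PL items whose parameters are already on a common scale; the dissertation's mixed-format forms would also need a polytomous model for the constructed-response items, which is omitted here. The function names, the bisection bracket, and the item parameters are hypothetical.

```python
import math

def p2pl(theta, a, b):
    """2PL probability of a correct response (tanh form avoids overflow)."""
    return 0.5 * (1.0 + math.tanh(0.5 * a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct (true) score at theta."""
    return sum(p2pl(theta, a, b) for a, b in items)

def theta_for_true_score(score, items, lo=-10.0, hi=10.0, iters=100):
    """Invert the TCC by bisection; the TCC is increasing when every a > 0.
    The [-10, 10] bracket is an assumption that suits moderate scores."""
    if not 0 < score < len(items):
        raise ValueError("true score must lie strictly between 0 and the item count")
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if tcc(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def true_score_equate(score, new_items, old_items):
    """Find the theta that yields `score` on the new form, then return
    the true score that the same theta implies on the old form."""
    theta = theta_for_true_score(score, new_items)
    return tcc(theta, old_items)

# Hypothetical forms: the new form is uniformly harder by 0.5 logits.
old_form = [(1.0, -1.0), (1.2, -0.3), (0.9, 0.4), (1.4, 1.0)]
new_form = [(a, b + 0.5) for a, b in old_form]

equated = true_score_equate(2.0, new_form, old_form)
```

Because the new form is harder in this made-up example, a true score of 2 on it corresponds to a true score above 2 on the old form. IRT observed score equating differs in that it equates model-implied observed score distributions rather than true scores.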
3 |
Evaluating equating properties for mixed-format tests
He, Yi, 01 May 2011
Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are used in many testing programs. The use of multiple formats presents a number of measurement challenges, one of which is how to adequately equate mixed-format tests under the common-item nonequivalent groups (CINEG) design, especially when, due to practical constraints, the common-item set contains only MC items. The purpose of this dissertation was to evaluate how equating properties were preserved for mixed-format tests under the CINEG design.
Real data analyses were conducted on 22 equating linkages of 39 mixed-format tests from the Advanced Placement (AP) Examination program. Four equating methods were used: the frequency estimation (FE) method, the chained equipercentile (CE) method, item response theory (IRT) true score equating, and IRT observed score equating. In addition, cubic spline postsmoothing was used with the FE and CE methods. The factors of investigation were the correlation between MC and CR scores, the proportion of common items, the proportion of MC-item score points, and the similarity between alternate forms. Results were evaluated using three equating properties: first-order equity, second-order equity, and the same distributions property.
The main findings from this dissertation were as follows: (1) Between the two IRT equating methods, true score equating better preserved first-order equity than observed score equating, and observed score equating better preserved second-order equity and the same distributions property than true score equating. (2) Between the two traditional methods, CE better preserved first-order equity than FE, but in terms of preserving second-order equity and the same distributions property, CE and FE produced similar results. (3) Smoothing helped to improve the preservation of second-order equity and the same distributions property. (4) A higher MC-CR correlation was associated with better preservation of first-order equity for both IRT methods. (5) A higher MC-CR correlation was associated with better preservation of second-order equity for IRT true score equating. (6) A higher MC-CR correlation was associated with better preservation of the same distributions property for IRT observed score equating. (7) The proportion of common items, the proportion of MC score points, and the similarity between forms were not found to be associated with the preservation of the equating properties. These results are interpreted in the context of research literature in this area and suggestions for future research are provided.
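First-order equity, one of the three properties named in the abstract above, holds when the expected equated score on the new form matches the expected score on the old form at every proficiency level. The sketch below shows how such a gap could be computed at a single theta. It assumes dichotomous 2PL items and uses the Lord-Wingersky recursion for the conditional number-correct distribution; the dissertation's actual indices, models, and software are not given in the abstract, and all names and parameter values here are hypothetical.

```python
import math

def p2pl(theta, a, b):
    """2PL probability of a correct response (tanh form avoids overflow)."""
    return 0.5 * (1.0 + math.tanh(0.5 * a * (theta - b)))

def lord_wingersky(probs):
    """Distribution of the number-correct score, given each item's
    probability of a correct response at a fixed theta."""
    dist = [1.0]  # with zero items, a score of 0 has probability 1
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for score, mass in enumerate(dist):
            nxt[score] += mass * (1.0 - p)  # item answered incorrectly
            nxt[score + 1] += mass * p      # item answered correctly
        dist = nxt
    return dist

def first_order_equity_gap(theta, new_items, old_items, eq_table):
    """E[eq(X) | theta] - E[Y | theta], where eq_table[x] is the old-form
    equivalent of new-form raw score x. A gap of zero at every theta
    corresponds to first-order equity."""
    new_dist = lord_wingersky([p2pl(theta, a, b) for a, b in new_items])
    expected_equated = sum(mass * eq_table[x] for x, mass in enumerate(new_dist))
    expected_old = sum(p2pl(theta, a, b) for a, b in old_items)
    return expected_equated - expected_old

# Hypothetical three-item form; with identical forms and an identity
# equating table the gap should vanish.
items = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.7)]
identity_table = [0.0, 1.0, 2.0, 3.0]
gap = first_order_equity_gap(0.3, items, items, identity_table)
```

The same conditional distributions, taken over both forms and compared after equating, are the raw material for checking second-order equity (conditional standard errors) and the same distributions property.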