With the widespread use of technology in the assessment field, many testing programs use both computer-based tests (CBTs) and paper-and-pencil tests (PPTs). Both the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) and the International Guidelines on Computer-Based and Internet Delivered Testing (International Test Commission, 2005) have called for studies on the equivalence of scores from different modes to support the uses and interpretations of scores across modes. In the early childhood literature, however, studies of administration mode effects are limited and have produced mixed results. In addition, little research has examined construct comparability and score comparability together.
The purpose of this study was to examine comparability in two stages. The first stage consisted of a series of analyses performed to investigate construct comparability through methods such as Confirmatory Factor Analysis (CFA), Multivariate Analysis of Variance (MANOVA), Classical Test Theory (CTT) and Item Response Theory (IRT). The second stage included summary analyses performed to investigate score comparability by evaluating the means, standard deviations, score distributions and reliabilities for the overall test scores. Correlations between scores from the two modes and Test Characteristic Curves (TCCs) for the two modes were also evaluated. Results indicated that, in general, the constructs and scores were comparable between PPTs and CBTs. The item- and domain-level analyses suggested that several items and domains were influenced slightly differently by mode, while scores at the total test level were not affected by mode. This information could be useful for test developers when deciding which items to include in both modes.
The current study sought to address gaps in the existing literature. First, this study examined how young test takers perform in a CBT testing environment. This work adds to the previous literature because young test takers now have more access to technology than they did when many earlier studies were conducted. Second, this study discussed potential sources of mode effects, such as test items and the characteristics of test takers in early elementary grades, another area in which comparability research is lacking. Third, the study evaluated comparability comprehensively in two stages.
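As a rough illustration of the second-stage summaries named in the abstract (means, standard deviations, reliabilities and TCCs), the sketch below computes them for a single administration mode. It is not the author's procedure: the abstract does not state which reliability coefficient or IRT model was used, so coefficient alpha and the two-parameter logistic (2PL) model are assumptions here, and the function names and the toy response matrix are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of examinee rows of item scores.

    Assumption: alpha stands in for whatever reliability estimate
    the study actually reported.
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def summarize_mode(scores):
    """Mean, SD and alpha of total scores for one mode (PPT or CBT)."""
    totals = [sum(row) for row in scores]
    return {
        "mean": mean(totals),
        "sd": stdev(totals),
        "alpha": cronbach_alpha(scores),
    }


def tcc_2pl(theta, a, b):
    """Expected total score at ability theta under an assumed 2PL model.

    a and b are per-item discrimination and difficulty parameters.
    """
    return sum(
        1.0 / (1.0 + math.exp(-aj * (theta - bj))) for aj, bj in zip(a, b)
    )


# Hypothetical toy data: four examinees, two dichotomous items.
toy_scores = [[1, 1], [0, 0], [1, 1], [0, 0]]
summary = summarize_mode(toy_scores)
expected_score = tcc_2pl(0.0, a=[1.0, 1.0], b=[0.0, 0.0])
```

Running the same summaries on the PPT and CBT response matrices, and evaluating `tcc_2pl` over a grid of `theta` values with each mode's item parameters, would give the side-by-side quantities that a score comparability check compares.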
Identifier | oai:union.ndltd.org:uiowa.edu/oai:ir.uiowa.edu:etd-8103 |
Date | 01 December 2018 |
Creators | Lin, Ye |
Contributors | Welch, Catherine J., Dunbar, Stephen B. |
Publisher | University of Iowa |
Source Sets | University of Iowa |
Language | English |
Detected Language | English |
Type | dissertation |
Format | application/pdf |
Source | Theses and Dissertations |
Rights | Copyright © 2018 Ye Lin |