The shortcomings of the proportion above cut (PAC) statistic, used so prominently in the educational landscape, render it a very problematic measure for making correct inferences from student test data. The limitations of PAC-based statistics are more pronounced in cross-test comparisons because of their dependence on cut-score locations. A better alternative is to use mean-based statistics that can translate to parametric effect-size measures. However, these statistics can also be problematic: when Gaussian assumptions are not met, reasonable transformations of a score scale produce non-monotonic outcomes.
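The dependence of PAC-based gaps on cut-score location can be illustrated with a minimal sketch. The two normal distributions and the cut scores below are hypothetical values chosen for illustration, not data from the study; the underlying distributions stay fixed while the PAC gap changes with the cut.

```python
from statistics import NormalDist

# Hypothetical score distributions for two groups (illustrative values only,
# not taken from the study): same spread, means half a standard deviation apart.
group_a = NormalDist(mu=0.0, sigma=1.0)
group_b = NormalDist(mu=-0.5, sigma=1.0)


def pac(dist, cut):
    """Proportion of the distribution scoring above the cut score."""
    return 1.0 - dist.cdf(cut)


def pac_gap(cut):
    """Difference in proportion above cut between group A and group B."""
    return pac(group_a, cut) - pac(group_b, cut)


# The distributions never change, yet the PAC gap depends on where the cut falls.
for cut in (-1.0, 0.0, 1.5):
    print(f"cut={cut:+.1f}  PAC gap={pac_gap(cut):.3f}")
```

In this sketch the gap is widest for a cut between the two means and shrinks as the cut moves into either tail, even though the distance between the two distributions is constant.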
The present study develops a distribution-wide approach to summarize trend, gap, and gap trend (TGGT) measures. This approach counters the limitations of PAC-based measures and mean-based statistics, and it addresses TGGT-related statistics in a manner more closely tied to both the data and questions regarding student achievement. This distribution-wide approach encompasses graphics such as percentile trend displays and probability-probability (P-P) plots fashioned after Receiver Operating Characteristic (ROC) curve methodology. The latter follows the P-P plot framework proposed by Ho (2008) as a way to examine trends and gaps with more consideration given to questions of scale and policy decisions. The extension in this study involves three main components: (1) incorporating Bayesian inference, (2) using a multivariate structure for longitudinal data, and (3) accounting for measurement error at the individual level. The analysis is based on mathematics assessment data spanning Grades 3 through 7 from a large Midwestern school district. Findings suggest that PP-based effect sizes provide a useful framework to measure aggregate test score change and achievement gaps. The distribution-wide perspective adds insight by examining both visually and numerically how trends and gaps are affected throughout the score distribution. Two notable findings using the PP-based effect sizes were that (1) achievement gaps were very similar between the Focal and Audit tests, and (2) trend measures were significantly larger for the Audit test. Additionally, measurement error corrections using the multivariate Bayesian CTT approach yielded effect sizes that were disattenuated relative to those based on observed scores. The ordinal-based effect size statistics were also generally larger than their parametric-based counterparts, and this disattenuation was practically equivalent to that seen by accounting for measurement error. Finally, the rank-based estimator of P(X>Y) via estimated true scores had smaller standard errors than its parametric-based counterpart.
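As an illustration of the rank-based quantity P(X>Y) and the P-P curve mentioned above, the following is a minimal sketch on observed scores. It is not the Bayesian, measurement-error-corrected estimator developed in the dissertation; the function names and the sample scores are hypothetical, and ties are counted as one half by assumption.

```python
from bisect import bisect_left, bisect_right


def prob_x_greater_y(x, y):
    """Rank-based estimate of P(X > Y); ties count as one half (an assumption)."""
    y_sorted = sorted(y)
    total = 0.0
    for xi in x:
        below = bisect_left(y_sorted, xi)          # y values strictly below xi
        ties = bisect_right(y_sorted, xi) - below  # y values equal to xi
        total += below + 0.5 * ties
    return total / (len(x) * len(y))


def pp_points(x, y):
    """Empirical P-P curve: (F_Y(c), F_X(c)) at every observed score c."""
    xs, ys = sorted(x), sorted(y)
    cuts = sorted(set(x) | set(y))
    return [(bisect_right(ys, c) / len(ys), bisect_right(xs, c) / len(xs))
            for c in cuts]


# Hypothetical scores for two groups (illustrative only).
x_scores = [3, 5, 7, 9]
y_scores = [1, 2, 5, 6]

p = prob_x_greater_y(x_scores, y_scores)
curve = pp_points(x_scores, y_scores)
print(p)
print(curve)
```

Because both outputs depend only on the ordering of scores, they are unchanged by any strictly increasing transformation of the score scale, which is the property that motivates ordinal effect sizes over mean-based ones.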
Identifier | oai:union.ndltd.org:uiowa.edu/oai:ir.uiowa.edu:etd-3226 |
Date | 01 May 2012 |
Creators | Denbleyker, John Nickolas |
Contributors | Ho, Andrew D., Brennan, Robert L. |
Publisher | University of Iowa |
Source Sets | University of Iowa |
Language | English |
Detected Language | English |
Type | dissertation |
Format | application/pdf |
Source | Theses and Dissertations |
Rights | Copyright 2012 John Nickolas Denbleyker |