1 | Methods for determining whether subscore reporting is warranted in large-scale achievement assessments. Babenko, Oksana Illivna. Unknown Date.
No description available.
2 | Subscore equating with the random groups design. Lim, Euijin. 01 May 2016.
There is increasing demand for subscore reporting in the testing industry. Many testing programs already include subscores in their score reports or are considering plans to report them. However, relatively few studies have been conducted on subscore equating. The purpose of this dissertation is to establish the necessity of subscore equating and to evaluate the performance of various equating methods for subscores.
Assuming the random groups design and number-correct scoring, this dissertation analyzed two sets of real data as well as simulated data generated under four study factors: test dimensionality, subtest length, form difference in difficulty, and sample size. The equating methods considered were linear equating; equipercentile equating; equipercentile equating with log-linear presmoothing; equipercentile equating with cubic-spline postsmoothing; IRT true score and observed score equating using a three-parameter logistic (3PL) model with separate calibration (3PsepT, 3PsepO); IRT true score and observed score equating using the 3PL model with simultaneous calibration (3PsimT, 3PsimO); and IRT true score and observed score equating using a bifactor (BF) model with simultaneous calibration (BFT, BFO). All methods were compared to identity equating and evaluated with respect to systematic, random, and total equating error.
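For concreteness, here is a minimal sketch of equipercentile equating under the random groups design, the non-IRT baseline among the methods above. The score scale, sample sizes, and function names are illustrative rather than taken from the dissertation, and production implementations add the pre- or postsmoothing steps named above:

```python
import numpy as np

def percentile_ranks(scores, max_score):
    """Percentile rank of each integer score, using the usual
    midpoint convention for discrete score scales."""
    n = len(scores)
    ranks = np.empty(max_score + 1)
    for x in range(max_score + 1):
        below = np.sum(scores < x)
        at = np.sum(scores == x)
        ranks[x] = 100.0 * (below + 0.5 * at) / n
    return ranks

def equipercentile_equate(scores_x, scores_y, max_score):
    """Equate Form X number-correct scores to the Form Y scale:
    each X score maps to the Y score with the same percentile rank.
    The Y percentile-rank function is inverted by interpolation."""
    pr_x = percentile_ranks(scores_x, max_score)
    pr_y = percentile_ranks(scores_y, max_score)
    return np.interp(pr_x, pr_y, np.arange(max_score + 1))

# Random groups: two samples, one form each, on a 10-item subtest.
rng = np.random.default_rng(0)
form_x = rng.binomial(10, 0.55, size=2000)  # Form X slightly harder
form_y = rng.binomial(10, 0.60, size=2000)
print(equipercentile_equate(form_x, form_y, max_score=10))
```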
The main findings of this dissertation were as follows: (1) reporting subscores without equating would provide misleading information in the form of distorted score profiles; (2) reporting subscores without a pre-specified test specification would raise practical issues, such as constructing alternate subtest forms of comparable difficulty, equating forms of different lengths, and deciding on an appropriate reporting scale; (3) overall, the best-performing subscore equating method was 3PsepO, followed by equipercentile equating with presmoothing, and the worst-performing method was BFT; (4) simultaneous calibration, which involves the other subtests' items in the calibration process, yielded larger bias but smaller random error than separate calibration, indicating that borrowing information from other subtests trades bias for random error in subscore equating; (5) BFO performed best when the test was multidimensional, the form difference was small, the subtest was short, or the sample was small; (6) equating results for BFT and BFO were affected by the magnitude and variability of the estimated loadings on the general and specific factors; and (7) in general, smoothing improved equating results.
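The bias-versus-random-error trade-off in finding (4) rests on the standard error decomposition over simulation replications, sketched below; the array shapes are assumptions for illustration:

```python
import numpy as np

def equating_errors(estimates, criterion):
    """Decompose equating error at each score point over R replications.

    estimates : (R, K) array of estimated equated scores
    criterion : (K,) array of criterion ("population") equated scores

    Total error satisfies RMSE^2 = bias^2 + SE^2.
    """
    bias = estimates.mean(axis=0) - criterion   # systematic error
    se = estimates.std(axis=0)                  # random error
    rmse = np.sqrt(bias ** 2 + se ** 2)         # total error
    return bias, se, rmse
```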
3 | An investigation into the psychometric properties of the proportional reduction of mean squared error and augmented scores. Stephens, Christopher Neil. 01 December 2012.
Augmentation procedures are designed to provide better estimates for a given test or subtest through the use of collateral information. The main purpose of this dissertation was to apply Haberman's and Wainer's augmentation procedures to a large-scale, standardized achievement test, both to understand how the reliability and correlation components that combine to form the proportional reduction of mean squared error (PRMSE) statistic relate to each other, and to compare the practical effects of the two procedures.
Haberman's and Wainer's augmentation procedures were applied to data from a large-scale, standardized achievement test covering three content areas, reading, language arts, and mathematics, at both 4th and 8th grade. Each test could be broken down into between two and five content-area subtests, depending on the content area, and each data set contained between 2,500 and 3,000 examinees. The PRMSE statistic was computed on all of the data sets to evaluate the two augmentation procedures. Following the augmentation analysis, the relationship between the reliability of the subtest to be augmented and that subtest's correlation with the rest of the test was investigated using a pseudo-simulated data set that varied those two quantities. Lastly, both augmentation procedures were applied to the data sets, and the augmented scores were analyzed to determine the magnitude of their effects.
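For readers unfamiliar with the statistic: Haberman's criterion turns on exactly the reliability-correlation relationship investigated here. The sketch below is a simplified version under classical test theory assumptions; the disattenuation step ignores the part-whole overlap between subscore and total-score errors, which a careful implementation corrects for, and the numbers are illustrative:

```python
import math

def prmse_subscore(rel_s):
    """PRMSE when the observed subscore predicts its own true
    subscore: under classical test theory this is simply the
    subscore's reliability."""
    return rel_s

def prmse_total(corr_sx, rel_s, rel_x):
    """PRMSE when the total score predicts the true subscore.

    corr_sx      : observed subscore-total correlation
    rel_s, rel_x : reliabilities of subscore and total score

    The true-score correlation is estimated by the standard
    disattenuation formula (an approximation here, since the
    subscore's error is part of the total score's error).
    """
    true_corr = min(corr_sx / math.sqrt(rel_s * rel_x), 1.0)
    return true_corr ** 2 * rel_x

# A subscore has added value only if it predicts its own true score
# better than the total score does. Illustrative numbers:
rel_s, rel_x, corr_sx = 0.70, 0.92, 0.78
print(prmse_subscore(rel_s))                          # 0.70
print(round(prmse_total(corr_sx, rel_s, rel_x), 2))   # ~0.87: total wins
```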
The main findings based on the real-data and pseudo-simulated analyses were as follows: (1) the more items a subtest contained, the better both the estimates and the augmentation procedures performed; (2) there was virtually no difference between the Haberman and Wainer augmentation procedures, except under certain correlational relationships; and (3) using either augmentation procedure had a significant effect, though the effect lessened as reliability increased. The limitations of the study and possible directions for future research are also discussed.
4 | Using collateral information in the estimation of sub-scores: a fully Bayesian approach. Tao, Shuqin. 01 July 2009.
Educators and administrators often use sub-scores derived from state accountability assessments to diagnose learning and instruction and to inform curriculum planning. However, observed sub-scores have several psychometric limitations, two of which were the focus of the present study: (1) limited reliability due to the short length of subtests, and (2) for most existing assessments, little information in each sub-score that is distinct from the others.
The present study evaluated the extent to which these limitations might be overcome by incorporating collateral information into sub-score estimation. Three sources of collateral information were investigated: (1) information from other sub-scores, (2) the schools that students attended, and (3) school-level scores on the same test from previous cohorts of students at each school. Kelley's and Shin's methods were implemented in a fully Bayesian framework and adapted to incorporate differing levels of collateral information. Results were evaluated against three criteria: signal-to-noise ratio, standard error of estimation, and the sub-score separation index. The data came from state accountability assessments.
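Kelley's method is the simplest of these estimators: shrink each observed sub-score toward a mean in proportion to its unreliability. A minimal sketch, with the school mean standing in for the collateral information; the fully Bayesian versions studied here estimate these quantities jointly rather than plugging them in, and all numbers are illustrative:

```python
import numpy as np

def kelley_estimate(x, reliability, group_mean):
    """Kelley's regressed score estimate: weight the observed score
    by its reliability and the group mean by the unreliability.
    Shrinking toward the school mean (rather than the grand mean)
    is one way to bring collateral information into the estimate."""
    return reliability * x + (1.0 - reliability) * group_mean

scores = np.array([12.0, 15.0, 8.0])   # observed sub-scores, 20-item subtest
rel = 0.65                             # short subtest -> modest reliability
school_mean = 11.0
print(kelley_estimate(scores, rel, school_mean))
# [11.65 13.6   9.05] -- estimates pulled toward the school mean
```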
Consistent with the literature, using information from other sub-scores produced sub-scores with enhanced precision but reduced profile variability. This finding suggests that collateral information internal to the test can enhance sub-score reliability, but at the expense of the distinctness of each individual sub-score. Using information about the schools that students attended led to a small gain in sub-score precision without loss of distinctness; such information also showed potential to improve sub-score validity by addressing Simpson's paradox when sub-score correlations were not invariant across schools. Using previous-year school-level sub-score information showed potential to enhance both precision and distinctness for school-level sub-scores, although not for student-level sub-scores. School-level sub-scores were found to exhibit satisfactory psychometric properties and thus to have value for evaluating school curricular effectiveness. Issues concerning the validity, interpretability, and suitability of using such collateral information are discussed in the context of state accountability assessments.