1. A comparison of Van der Linden's conditional equipercentile equating method with other equating methods under the random groups design
Shin, Seonho, 01 July 2011
To ensure test security and fairness, alternative forms of the same test are administered in practice. However, alternative forms generally do not have the same difficulty, even though they are designed to be as parallel as possible. Equating adjusts for differences in difficulty among forms of the test. Six traditional equating methods are considered in this study: equipercentile equating without smoothing, equipercentile equating with pre-smoothing and with post-smoothing, IRT true-score (IRT-T) and observed-score (IRT-O) equating, and kernel equating. A common feature of all the traditional procedures is that the end result of equating is a single transformation (or conversion table) used for all examinees who take the same test. Van der Linden has proposed conditional equipercentile (or local) equating (CEE) to reduce the equating error contained in the traditional procedures by equating at the individual level. CEE is conceptually closest to IRT-T in that it equates with respect to a type of true score (θ, or proficiency), but it shares similarities with IRT-O in that it uses an estimated observed-score distribution for each individual θ to equate scores equipercentile-wise.
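The equipercentile idea shared by several of these methods can be sketched in a few lines: under the random groups design, a Form X score is mapped to the Form Y score that has the same percentile rank. The sketch below uses hypothetical simulated data and my own function names, and applies no smoothing; it is an illustration of the general technique, not of the specific procedures compared in the study.

```python
import numpy as np

def percentile_ranks(scores, max_score):
    # Percentile rank at each integer score point (midpoint convention:
    # proportion below the score plus half the proportion at the score)
    counts = np.bincount(scores, minlength=max_score + 1)
    rel = counts / counts.sum()
    return np.cumsum(rel) - rel / 2

def equipercentile_equate(x_scores, y_scores, max_score):
    # Map each Form X score to the Form Y score with the same percentile rank,
    # interpolating the inverse of the Form Y percentile-rank function
    pr_x = percentile_ranks(x_scores, max_score)
    pr_y = percentile_ranks(y_scores, max_score)
    return np.interp(pr_x, pr_y, np.arange(max_score + 1))

rng = np.random.default_rng(0)
x = rng.binomial(40, 0.60, size=2000)  # random group taking Form X (easier)
y = rng.binomial(40, 0.55, size=2000)  # random group taking Form Y (harder)
eq = equipercentile_equate(x, y, 40)   # Form Y equivalents of scores 0..40
```

Because Form X is easier, a mid-scale Form X score converts to a lower Form Y equivalent, which is exactly the difficulty adjustment equating is meant to provide.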
No real-data study has yet compared van der Linden's CEE with each of the traditional equating procedures; indeed, even for the traditional procedures, no study has compared all six simultaneously. In addition to van der Linden's CEE, two variations of CEE are considered: CEE using maximum likelihood estimation (CEE-MLE) and CEE using the true characteristic curve (CEE-TCC). The focus of this study is on comparing results from CEE vis-à-vis the traditional procedures, rather than answering a “best-procedure” question, which would require a common conception of “true” equating. Although the results of the traditional equating methods are quite similar, kernel equating and equipercentile equating with log-linear presmoothing generally show better fit to the original form's statistical moments under various data conditions. Although IRT-T and IRT-O are usually found to be least favorable in terms of statistical moments under all circumstances, their equated raw-score difference distributions are more stable than those of the other traditional methods. The number of examinees at a particular score point was found to influence results for CEE less than it does for the traditional equatings. CEE-EAP and CEE-MLE are very similar to one another, and their equated score difference distributions are similar to those of IRT-O. CEE-TCC involves part of the IRT-T procedure and hence behaves somewhat similarly to IRT-T. Although CEE results are less desirable in terms of maintaining statistical moments, the equated score differences are more consistent and stable than for the traditional equating methods.
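The conditional observed-score distribution that CEE equates at each θ can be computed from the item response probabilities with the Lord-Wingersky recursion. A minimal sketch, with an illustrative three-item example of my own (the function name and numbers are not taken from the study):

```python
def conditional_score_dist(p):
    """Distribution of the number-correct score, given the per-item
    probabilities p of a correct response at a fixed theta
    (Lord-Wingersky recursion)."""
    dist = [1.0]  # before any items, P(score = 0) = 1
    for pj in p:
        new = [0.0] * (len(dist) + 1)
        for score, prob in enumerate(dist):
            new[score] += prob * (1.0 - pj)  # item answered incorrectly
            new[score + 1] += prob * pj      # item answered correctly
        dist = new
    return dist

# At a fixed theta, suppose three items have these correct-response probabilities:
dist = conditional_score_dist([0.8, 0.6, 0.4])
```

In CEE, a distribution of this kind, estimated at each examinee's θ on both forms, replaces the single marginal score distribution that traditional equipercentile equating uses for everyone.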
2. A comparison of statistics for selecting smoothing parameters for loglinear presmoothing and cubic spline postsmoothing under a random groups design
Liu, Chunyan, 01 May 2011
Smoothing techniques are designed to improve the accuracy of equating functions. The main purpose of this dissertation was to propose a new statistic (CS) and compare it to existing model selection strategies in selecting smoothing parameters for polynomial loglinear presmoothing (C) and cubic spline postsmoothing (S) for mixed-format tests under a random groups design. For polynomial loglinear presmoothing, CS was compared to seven existing model selection strategies in selecting the C parameters: likelihood ratio chi-square test (G2), Pearson chi-square test (PC), likelihood ratio chi-square difference test (G2diff), Pearson chi-square difference test (PCdiff), Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and Consistent Akaike Information Criterion (CAIC). For cubic spline postsmoothing, CS was compared to the ± 1 standard error of equating (± 1 SEE) rule.
In this dissertation, both pseudo-test data (long and short forms of a Biology test and of an Environmental Science test) and simulated data were used to evaluate the performance of the CS statistic and the existing model selection strategies. For both types of data, sample sizes of 500, 1000, 2000, and 3000 were investigated. In addition, No Equating Needed conditions and Equating Needed conditions were investigated for the simulated data. For polynomial loglinear presmoothing, the mean absolute difference (MAD), average squared bias (ASB), average squared error (ASE), and mean squared error (MSE) were computed to evaluate all model selection strategies against three sets of criteria: the cumulative relative frequency distribution (CRFD), the relative frequency distribution (RFD), and the equipercentile equating relationship. For cubic spline postsmoothing, the model selection procedures were evaluated only on the MAD, ASB, ASE, and MSE of equipercentile equating.
The main findings based on the pseudo-test data and simulated data were as follows: (1) As sample sizes increased, the average C values increased and the average S values decreased for all model selection strategies. (2) For polynomial loglinear presmoothing, compared to the results without smoothing, all model selection strategies always introduced bias of RFD and significantly reduced the standard errors and mean squared errors of RFD; only AIC reduced the MSE of CRFD and MSE of equipercentile equating across all sample sizes and all test forms; the best CS procedure tended to yield an equivalent or smaller MSE of equipercentile equating than the AIC and G2diff statistics. (3) For cubic spline postsmoothing, both the ± 1 SEE rule and the CS procedure tended to perform reasonably well in reducing the ASE and MSE of equipercentile equating. (4) Among all existing model selection strategies, the ± 1 SEE rule in postsmoothing tended to perform better than any of the seven existing model selection strategies in presmoothing in terms of the reduction of random error and total error. (5) Pseudo-test data and simulated data tended to yield similar results. The limitations of the study and possible future research are discussed in the dissertation.
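The presmoothing step whose degree C is being selected above can be sketched as a Poisson loglinear fit to the score frequencies, with AIC (one of the seven strategies compared) picking C. This is a simplified illustration on hypothetical binomial data; `fit_loglinear` is my own name, and the dissertation's CS statistic is not implemented here.

```python
import numpy as np
from scipy.optimize import minimize

def fit_loglinear(freqs, degree):
    """Fit log(m_x) = b0 + b1*x + ... + bC*x^C by Poisson maximum likelihood.
    Returns fitted frequencies and AIC (log(n_x!) constants are dropped,
    which is harmless when comparing models on the same data)."""
    x = np.arange(len(freqs), dtype=float)
    X = np.vander(x / x.max(), degree + 1, increasing=True)  # scaled polynomial basis

    def nll(b):  # negative Poisson log-likelihood
        mu = np.exp(X @ b)
        return np.sum(mu - freqs * np.log(mu))

    def grad(b):  # its exact gradient
        return X.T @ (np.exp(X @ b) - freqs)

    b0 = np.zeros(degree + 1)
    b0[0] = np.log(freqs.mean() + 1e-9)
    res = minimize(nll, b0, jac=grad, method="BFGS")
    fitted = np.exp(X @ res.x)
    aic = 2 * nll(res.x) + 2 * (degree + 1)  # AIC = -2*loglik + 2*k
    return fitted, aic

rng = np.random.default_rng(1)
freqs = np.bincount(rng.binomial(30, 0.6, size=1000), minlength=31).astype(float)
aics = {C: fit_loglinear(freqs, C)[1] for C in range(2, 7)}
best_C = min(aics, key=aics.get)  # AIC-selected smoothing degree
```

A useful property of this fit is that the smoothed distribution preserves the first C raw-score moments of the observed distribution, which is why moment-based criteria like those above are natural for evaluating it.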
3. Contributions to Kernel Equating
Andersson, Björn, January 2014
The statistical practice of equating is needed when scores on different versions of the same standardized test are to be compared. This thesis makes four contributions to kernel equating, an observed-score equating framework. Paper I introduces the open source R package kequate, which enables the equating of observed scores using the kernel method of test equating in all common equating designs. The package is designed for ease of use and integrates well with other packages. The non-equivalent groups with covariates and item response theory observed-score kernel equating methods are currently not available in any other software package. In Paper II an alternative bandwidth selection method for the kernel method of test equating is proposed. The new method is designed for use with non-smooth data, such as when the observed data are used directly without pre-smoothing. Previously used bandwidth selection methods disregarded the variability from the bandwidth selection when calculating the asymptotic standard errors. Here, the bandwidth selection is accounted for, and updated asymptotic standard error derivations are provided. Item response theory observed-score kernel equating for the non-equivalent groups with anchor test design is introduced in Paper III. Multivariate observed-score kernel equating functions are defined and their asymptotic covariance matrices are derived. An empirical example in the form of a standardized achievement test is used, and the item response theory methods are compared to previously used log-linear methods. In Paper IV, Wald tests for equating differences in item response theory observed-score kernel equating are conducted using the results from Paper III. Simulations are performed to evaluate the empirical significance level and power under different settings, showing that the Wald test is more powerful than the Hommel multiple hypothesis testing method.
Data from a psychometric licensure test and a standardized achievement test are used to exemplify the hypothesis testing procedure. The results show that the Wald test can lead to conclusions different from those of the Hommel procedure.
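The continuization step that distinguishes kernel equating can be sketched directly: the discrete score distribution is replaced by a Gaussian-kernel mixture whose mean and variance match the original (the shrinkage constant `a` below accomplishes this). This is a minimal sketch of the continuized CDF under a hypothetical five-point score distribution, not the kequate implementation.

```python
import numpy as np
from scipy.stats import norm

def kernel_cdf(x, score_points, probs, h):
    """CDF of the Gaussian-kernel continuization X(h) = a*(X + h*V) + (1-a)*mu,
    where V is standard normal; the constant a preserves the mean and
    variance of the discrete distribution for any bandwidth h."""
    mu = np.sum(score_points * probs)
    var = np.sum((score_points - mu) ** 2 * probs)
    a = np.sqrt(var / (var + h ** 2))
    z = (np.asarray(x)[:, None] - a * score_points[None, :] - (1 - a) * mu) / (a * h)
    return (norm.cdf(z) * probs[None, :]).sum(axis=1)

# Hypothetical discrete score distribution on 0..4, bandwidth h = 0.6
points = np.arange(5.0)
probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
grid = np.linspace(-2.0, 6.0, 81)
F = kernel_cdf(grid, points, probs, h=0.6)  # smooth, strictly increasing CDF
```

With both forms continuized this way, the kernel equating function is the composition of one continuized CDF with the inverse of the other, and the bandwidth h is exactly the quantity whose selection Paper II revisits.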
4. Observed score equating with covariates
Bränberg, Kenny, January 2010
In test score equating the focus is on finding the relationship between the scales of different test forms. This can be done only if data are collected in such a way that the effect of differences in ability between groups taking different test forms can be separated from the effect of differences in test form difficulty. In standard equating procedures this problem is solved by using common examinees or common items. With common examinees, as in the equivalent groups design, the single group design, and the counterbalanced design, the examinees taking the test forms are either exactly the same, i.e., each examinee takes both test forms, or random samples from the same population. Common (anchor) items are usually used when the samples taking the different test forms are assumed to come from different populations. The thesis consists of four papers, and the main theme in three of them is the use of covariates, i.e., background variables correlated with the test scores, in observed score equating. We show how covariates can be used to adjust for systematic differences between samples in a non-equivalent groups design when there are no anchor items, and how they can be used to decrease the equating error in an equivalent groups design or a non-equivalent groups design. Paper I is the only paper whose focus is something other than the incorporation of covariates in equating. It is an introduction to test score equating and presents the author's thoughts on its foundations. A number of different definitions of test score equating exist in the literature; some of these are presented, and the similarities and differences between them are discussed. An attempt is also made to clarify the connection between the definitions and the most commonly used equating functions. In Paper II a model is proposed for observed score linear equating with background variables.
The idea presented in the paper is to adjust for systematic differences in ability between groups in a non-equivalent groups design by using information from background variables correlated with the observed test scores. It is assumed that, conditional on the background variables, the two samples can be seen as random samples from the same population; the background variables thus explain the systematic differences in ability between the populations. The proposed model consists of a linear regression model connecting the observed scores with the background variables and a linear equating function connecting observed scores on one test form to observed scores on the other test form. Maximum likelihood estimators of the model parameters are derived under an assumption of normally distributed test scores, and data from two administrations of the Swedish Scholastic Assessment Test are used to illustrate the use of the model. In Paper III we use the model presented in Paper II with two different data collection designs: the non-equivalent groups design (with and without anchor items) and the equivalent groups design. Simulated data are used to examine the effect of including covariates on the estimators, in terms of bias, variance, and mean squared error. With the equivalent groups design, the results show that using covariates can increase the accuracy of the equating. With the non-equivalent groups design, the results show that using an anchor test together with covariates is the most efficient way of reducing the mean squared error of the estimators. Furthermore, with no anchor test, the background variables can be used to adjust for the systematic differences between the populations and produce unbiased estimators of the equating relationship, provided that the “right” variables are used, i.e., the variables that explain those differences.
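The covariate-adjustment idea in Papers II and III can be caricatured in a few lines: regress each form's scores on the covariates, evaluate both regressions at a common covariate value to get adjusted group means, and feed the adjusted moments into an ordinary linear equating function. This is a deliberately simplified sketch (OLS per form with mean-only adjustment, rather than the paper's joint maximum likelihood estimators; all names and data are hypothetical).

```python
import numpy as np

def linear_equate(mu_x, sd_x, mu_y, sd_y):
    # Linear equating: map Form X scores to the Form Y scale by matching
    # means and standard deviations
    return lambda x: mu_y + (sd_y / sd_x) * (x - mu_x)

def adjusted_mean(scores, covariates, at):
    # OLS regression of scores on the covariate, evaluated at a common
    # covariate value 'at': the covariate-adjusted group mean
    Z = np.column_stack([np.ones(len(scores)), covariates])
    beta, *_ = np.linalg.lstsq(Z, scores, rcond=None)
    return np.concatenate([[1.0], np.atleast_1d(at)]) @ beta

rng = np.random.default_rng(2)
cov_x = rng.normal(0.5, 1, 1500)  # group taking Form X is more able on average
cov_y = rng.normal(0.0, 1, 1500)
x = 20 + 4 * cov_x + rng.normal(0, 2, 1500)  # Form X scores
y = 18 + 4 * cov_y + rng.normal(0, 2, 1500)  # Form Y is 2 points harder
at = np.concatenate([cov_x, cov_y]).mean()   # pooled covariate value
eq = linear_equate(adjusted_mean(x, cov_x, at), x.std(),
                   adjusted_mean(y, cov_y, at), y.std())
```

Equating the raw group means here would attribute the ability gap between the groups to form difficulty; conditioning on the covariate recovers the 2-point form difference, which is the bias-removal role the abstract describes.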
In Paper IV we explore the idea of using covariates as a substitute for an anchor test in a non-equivalent groups design within the framework of Kernel Equating. Kernel Equating can be seen as a method comprising five steps: presmoothing, estimation of score probabilities, continuization, equating, and calculation of the standard error of equating. For each of these steps we give the theoretical results when observations on covariates are used in place of scores on an anchor test. It is shown that the method developed for Post-Stratification Equating in the non-equivalent groups with anchor test design can be used, but with observations on the covariates instead of scores on an anchor test. The method is illustrated using data from the Swedish Scholastic Assessment Test.
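For the estimation-of-score-probabilities step, the substitution described in Paper IV can be sketched as post-stratification: each group's score distribution is re-weighted so that its covariate strata match a common target population, exactly as Post-Stratification Equating would do with anchor-score strata. A minimal sketch with a single binary covariate (function name, strata, and data are hypothetical):

```python
import numpy as np

def poststratified_probs(scores, strata, target_weights, max_score):
    """Score probabilities re-weighted so this sample's covariate strata
    match the target-population weights (Post-Stratification Equating
    with a covariate playing the role of the anchor score)."""
    dist = np.zeros(max_score + 1)
    for s, w in target_weights.items():
        sub = scores[strata == s]  # examinees in covariate stratum s
        counts = np.bincount(sub, minlength=max_score + 1)
        dist += w * counts / counts.sum()
    return dist

rng = np.random.default_rng(3)
strata = rng.integers(0, 2, size=2000)  # e.g. a binary background variable
scores = rng.binomial(30, np.where(strata == 1, 0.7, 0.5))
# Target population: equal weight on both strata (the synthetic population)
probs = poststratified_probs(scores, strata, {0: 0.5, 1: 0.5}, 30)
```

Doing this for both forms puts their score distributions on a common synthetic population, after which the remaining Kernel Equating steps (continuization, equating, standard errors) proceed as usual.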