281 | The conceptualization and development of a high-stakes video listening test within an AUA framework in a military context. Powers, Nancy Ellen, January 2012.
No description available.
282 | What's in a grade? A mixed methods investigation of teacher assessment of grammatical ability in L2 academic writing. Neumann, Heike, January 2011.
No description available.
283 | An exploratory study of senior high school students' experiences of physical motion during examinations. Lawrence, Abigail, January 2015.
No description available.
284 | Measuring morphological awareness across languages. Quiroga Villalba, Jorge, January 2013.
No description available.
285 | A study of the role of the 'teacher factor' in washback. Wang, Jing, January 2011.
No description available.
286 | Issues and arguments in the measurement of second language pronunciation. Isaacs, Talia, January 2011.
No description available.
287 | The impact of judges' consensus on the accuracy of anchor-based judgmental estimates of multiple-choice test item difficulty: The case of the NATABOC Examination. DiBartolomeo, Matthew, 01 January 2010.
Multiple factors have led testing agencies to consider more carefully the manner and frequency in which pretest item data are collected and analyzed. One potentially promising development is judges' estimates of item difficulty. Accurate estimates of item difficulty may be used to reduce pretest sample sizes, supplement insufficient pretest sample sizes, aid in test form construction and equating, calibrate test item writers who may be asked to produce items to meet statistical specifications, inform standard setting, aid in preparing randomly equivalent blocks of pretest items, and/or help set item response theory prior distributions. Two groups of 11 and eight judges, respectively, provided estimates of difficulty for the same set of 33 multiple-choice items from the National Athletic Trainers' Association Board of Certification (NATABOC) Examination. Judges were faculty in Commission on Accreditation of Athletic Training Education-approved athletic training education programs and were NATABOC-approved examiners of the former hands-on practical portion of the Examination. For each item, judges provided two rounds of independent difficulty estimates and a third-round group-level consensus estimate; before rounds two and three, the group discussed the estimates from the preceding round. In general, the judges' estimates of test item difficulty did not improve across rounds as predicted. Two-way repeated measures analyses of variance comparing item set mean difficulty estimates by round with the item set mean empirical item difficulty revealed no statistically significant differences across rounds, groups, or their interaction. Moreover, the item set mean difficulty estimates drifted further from the item set mean empirical difficulty with each round, so mean estimation bias and the corresponding effect sizes increased across rounds. In short, no estimation round recovered the empirical item difficulty values significantly better than the others.
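To make the comparison described in this abstract concrete, the sketch below shows one way judged difficulties could be compared with empirical difficulties round by round. It is an illustration under stated assumptions, not the dissertation's analysis: the toy response matrix, the judge counts, and the particular bias and effect-size formulas are all hypothetical.

```python
# A hedged illustration only: toy data, hypothetical variable names, and a simple
# standardized effect size stand in for the dissertation's actual analysis.
import numpy as np

rng = np.random.default_rng(0)

# Empirical item difficulty: proportion of examinees answering each item correctly
# (500 simulated examinees x 33 items, all values invented).
responses = rng.integers(0, 2, size=(500, 33))
empirical_p = responses.mean(axis=0)          # classical p-values per item

# Judges' difficulty estimates on the p-value metric: 3 rounds x 11 judges x 33 items.
estimates = np.clip(empirical_p + rng.normal(0, 0.15, size=(3, 11, 33)), 0, 1)

for r in range(estimates.shape[0]):
    round_mean = estimates[r].mean(axis=0)     # mean estimate per item for this round
    bias = (round_mean - empirical_p).mean()   # mean estimation bias over the item set
    effect = bias / empirical_p.std(ddof=1)    # one simple standardized effect size
    print(f"Round {r + 1}: mean bias = {bias:+.3f}, effect size = {effect:+.2f}")
```

If the judges' estimates improved across rounds, the mean bias and effect size would shrink toward zero; the abstract reports the opposite pattern.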
288 | The construction and validation of a test of technical terms in general education. Underwood, Edward S., January 1954.
Thesis (Ed.D.)--Boston University.
289 | Impact of item parameter drift on test equating and proficiency estimates. Han, Kyung Tyek, 01 January 2008.
When test equating is implemented, item parameter drift (IPD), especially in the linking items of an anchor test design, is expected to introduce error into the measurement. An important question that has not yet been examined, however, is how much IPD can be tolerated before its effects become consequential. To answer this overarching question, three Monte Carlo simulation studies were conducted. The first study, titled 'Impact of unidirectional IPD on test equating and proficiency estimates,' examined the indirect effect of IPD on proficiency estimates through its effect on test equating designs that use linking items containing IPD. The results, presented with regression-line-like plots, provide a comprehensive illustration of the relationship between IPD and its consequences and can serve as a guideline for practitioners when IPD is expected in testing. The second study, titled 'Impact of multidirectional IPD on test equating and proficiency estimates,' investigated the impact of different combinations of linking items with various multidirectional IPD on the test equating procedure for three popular scaling methods (mean-mean, mean-sigma, and the test characteristic curve [TCC] method). It was hypothesized that multidirectional IPD would mainly increase the random error observed in the linking while adding little systematic error. The study confirmed this hypothesis and also found that different combinations of multidirectional IPD produce different levels of impact even when the total amount of IPD is the same. The third study, titled 'Impact of IPD on pseudo-guessing parameter estimates and test equating,' examined how serious the consequences are if c-parameters are not transformed in the test equating procedure when IPD exists. Three new item calibration strategies for placing c-parameter estimates on the same scale across tests were proposed. The results indicated that, under the external linking design, the consequences of IPD did not differ substantially across calibration strategies and scaling methods, but the choice of calibration and scaling method could produce different results under the internal linking design and/or with different cut scores.
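As a rough illustration of the linking step in which IPD matters, the sketch below applies the mean-mean and mean-sigma scaling methods named in the abstract to a handful of anchor-item parameter estimates. The parameter values are invented, and the TCC method is omitted because it requires the full item response functions; this is a sketch under those assumptions, not the simulation design used in the dissertation.

```python
# A hedged sketch: invented anchor-item estimates; only the mean-mean and
# mean-sigma scaling methods are shown. c-parameters are left untransformed here.
import numpy as np

# Anchor-item estimates from the base ("old") and new calibrations.
a_old = np.array([1.1, 0.9, 1.3, 0.8]); b_old = np.array([-0.5, 0.2, 0.9, 1.4])
a_new = np.array([1.0, 0.8, 1.2, 0.7]); b_new = np.array([-0.2, 0.5, 1.3, 1.8])

def mean_sigma(b_old, b_new):
    """Slope/intercept from the means and SDs of the anchor difficulties."""
    A = b_old.std(ddof=1) / b_new.std(ddof=1)
    return A, b_old.mean() - A * b_new.mean()

def mean_mean(a_old, a_new, b_old, b_new):
    """Slope from mean discriminations, intercept from mean difficulties."""
    A = a_new.mean() / a_old.mean()
    return A, b_old.mean() - A * b_new.mean()

for name, (A, B) in [("mean-sigma", mean_sigma(b_old, b_new)),
                     ("mean-mean ", mean_mean(a_old, a_new, b_old, b_new))]:
    b_linked = A * b_new + B          # difficulties placed on the base-form scale
    a_linked = a_new / A              # discriminations placed on the base-form scale
    drift = np.abs(b_linked - b_old)  # large values flag possible IPD in an anchor item
    print(f"{name}: A = {A:.3f}, B = {B:.3f}, |b_linked - b_old| = {np.round(drift, 3)}")
```

Because both methods compute the slope and intercept from summary statistics of the anchor items, drift in even a few linking items shifts A and B, which is the mechanism by which IPD propagates into equated scores.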
290 | Approaches for addressing the fit of item response theory models to educational test data. Zhao, Yue, 01 January 2008.
The study was carried out to accomplish three goals: (1) propose graphical displays of IRT model fit at the item level and suggest fit procedures at the test level that are not affected by large sample sizes, (2) examine the impact of IRT model misfit on proficiency classifications, and (3) investigate the consequences of model misfit in assessing academic growth. For the first goal, the main focus was on more and better graphical procedures for investigating model fit and misfit through residuals and standardized residuals at the item level. In addition, some new graphical procedures and a non-parametric test statistic for investigating fit at the test score level were introduced, with examples. The statistical and graphical methods were applied to a realistic dataset from a high school assessment and the results were reported; more important than the fit results themselves were the procedures that were developed and evaluated. For the second goal, the practical consequences of IRT model misfit for performance classifications and test score precision were examined. With several of the data sets under investigation, test scores were recovered noticeably less well by the misfitting model, and there were practically significant differences in classification accuracy under the model that fit the data less well. For the third goal, the consequences of model misfit in assessing academic growth were examined in terms of test score precision, decision accuracy, and passing rates. The three-parameter logistic/graded response (3PL/GR) models produced more accurate estimates than the one-parameter logistic/partial credit (1PL/PC) models, and the fixed common item parameter method produced results closer to “truth” than linear equating using the mean-sigma transformation. IRT model fit studies have not received the attention they deserve from testing agencies and practitioners. At the same time, although IRT models can almost never fit test data perfectly, there is substantial evidence that they provide an excellent framework for solving practical measurement problems. The importance of this study is that it provides ideas and methods for addressing model fit and, most importantly, highlights approaches for examining the consequences of misfit when judging the suitability of particular IRT models.
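The residual-based item-level fit checks mentioned under the first goal can be illustrated with a short sketch: examinees are grouped by ability estimate and observed proportions correct are compared with 3PL model predictions. The item parameters, sample size, and decile grouping below are assumptions for illustration, not the procedures developed in the dissertation.

```python
# A hedged sketch: the 3PL parameters, simulated abilities, and decile grouping
# are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
a, b, c = 1.2, 0.3, 0.2                 # 3PL item parameters (assumed known)
theta = rng.normal(0.0, 1.0, 2000)      # ability estimates for 2,000 examinees

def p_3pl(theta, a, b, c):
    """Three-parameter logistic item response function (with D = 1.7)."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

responses = rng.binomial(1, p_3pl(theta, a, b, c))   # simulated responses to one item

# Group examinees into ten ability intervals and compare observed with expected.
edges = np.quantile(theta, np.linspace(0, 1, 11))
groups = np.clip(np.digitize(theta, edges[1:-1]), 0, 9)
for g in range(10):
    mask = groups == g
    n = mask.sum()
    observed = responses[mask].mean()
    expected = p_3pl(theta[mask], a, b, c).mean()
    z = (observed - expected) / np.sqrt(expected * (1 - expected) / n)  # standardized residual
    print(f"group {g}: n={n:4d}  observed={observed:.3f}  expected={expected:.3f}  z={z:+.2f}")
```

Plotting these standardized residuals against the ability groups yields the kind of item-level graphical display the abstract describes; consistently large residuals across groups would flag misfit for that item.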