
Diagnostic writing assessment: the development and validation of a rating scale

Alderson (2005) suggests that diagnostic tests should identify strengths and weaknesses in learners' use of language, focus on specific elements rather than global abilities, and provide detailed feedback to stakeholders. However, rating scales used in performance assessment have been repeatedly criticized for being imprecise, for using impressionistic terminology (Fulcher, 2003; Upshur & Turner, 1999; Mickan, 2003), and for often resulting in holistic assessments (Weigle, 2002). The aim of this study was to develop a theoretically based and empirically developed rating scale and to evaluate whether such a scale functions more reliably and validly in a diagnostic writing context than a pre-existing scale with less specific descriptors of the kind usually used in proficiency tests. The existing scale is used in the Diagnostic English Language Needs Assessment (DELNA) administered to first-year students at the University of Auckland.

The study was undertaken in two phases. In Phase 1, 601 writing scripts were subjected to a detailed analysis using discourse-analytic measures, and the results of this analysis served as the basis for the development of the new rating scale. Phase 2 involved the validation of this empirically developed scale: ten trained raters applied both sets of descriptors to the rating of 100 DELNA writing scripts. A quantitative comparison of rater behavior was undertaken using FACETS, a many-facet Rasch measurement program. Questionnaires and interviews were also administered to elicit the raters' perceptions of the efficacy of the two scales.

The results indicate that rater reliability and candidate discrimination were generally higher, and that raters were better able to distinguish between different aspects of writing ability, when the more detailed, empirically developed descriptors were used. The interviews and questionnaires showed that most raters preferred the empirically developed descriptors because they provided more guidance in the rating process. The findings are discussed in terms of their implications for rater training and rating scale development, as well as score reporting in the context of diagnostic assessment.
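For readers unfamiliar with FACETS: the program implements Linacre's many-facet Rasch model, which decomposes each observed rating into additive facet parameters. A standard formulation of the model (a general sketch, not necessarily the exact specification reported in the thesis) gives the log-odds of candidate n receiving score category k rather than k-1 from rater j on criterion i as

\[
\log\frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k
\]

where \(B_n\) is the candidate's ability, \(D_i\) the difficulty of the rating criterion, \(C_j\) the severity of the rater, and \(F_k\) the difficulty of scale step k. Fitting this model yields separate estimates of rater severity and consistency, which is what makes a quantitative comparison of rater behavior across the two scales possible.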

Identifier: oai:union.ndltd.org:ADTP/278218
Date: January 2007
Creators: Knoch, Ute
Publisher: ResearchSpace@Auckland
Source Sets: Australasian Digital Theses Program
Language: English
Detected Language: English
Rights: Whole document restricted. Items in ResearchSpace are protected by copyright, with all rights reserved, unless otherwise indicated (http://researchspace.auckland.ac.nz/docs/uoa-docs/rights.htm). Copyright: The author.
