1

Relatively idiosyncratic: exploring variations in assessors' performance judgements within medical education

Yeates, Peter. January 2013.
Background: Whilst direct-observation, workplace-based (or performance) assessments sit at the conceptual epitome of assessment within medical education, their overall utility is limited by high inter-assessor score variability. We conceptualised this issue as one of problematic judgements by assessors. Existing literature and evidence about judgements within performance appraisal and impression formation, together with the small but evolving literature on raters' cognition within medical education, provided the theoretical context for studying assessors' judgement processes.

Methods and Results: This thesis presents three studies. The first adopted an exploratory approach to studying assessors' judgements in direct-observation performance assessments by asking assessors to describe their thoughts whilst assessing standard videoed performances by junior doctors. Comments and follow-up interviews were analysed qualitatively using grounded theory principles. Results showed that assessors attributed different levels of salience to different aspects of performances, understood the criteria differently (often comparing performances against other trainees) and expressed their judgements in unique narrative language. Consequently, assessors' judgements were comparatively idiosyncratic, or unique.

The two subsequent follow-up studies used internet-based experimental designs to further investigate the comparative judgements demonstrated in study 1. In study 2, participants were primed with either good or poor performances before watching intermediate (borderline) performances. In study 3, a similar design was employed, but participants watched identical performances in either increasing or decreasing order of proficiency. Collectively, the results of these two studies showed that recent experiences influenced assessors' judgements, repeatedly producing a contrast effect (performances were scored unduly differently from the preceding performances). These effects were larger than participants' consistent tendencies to be either lenient or stringent and occurred at multiple levels of performance. The effect appeared robust despite our attempts to reduce participants' reliance on the immediate context. Moreover, assessors appeared to lack insight into the effect on their judgements.

Discussion: Collectively, these results indicate that variation in assessors' scores can be substantially explained by idiosyncrasy in their cognitive representations of the judgement task and by susceptibility to contrast effects arising from comparative judgement. Moreover, assessors appear to be incapable of judging in absolute terms, instead judging normatively. These findings have important implications for theory and practice and suggest numerous further lines of research.
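As a purely illustrative sketch (not part of the thesis), a contrast effect of the kind reported in studies 2 and 3 could be quantified by comparing the scores awarded to the same borderline performance after assessors had seen good versus poor preceding performances; the group sizes, 0-10 scale, and all data below are hypothetical.

```python
# Hypothetical illustration of a contrast effect: scores awarded to the same
# borderline performance after assessors first saw good vs. poor performances.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated 0-10 scores for an identical borderline performance (hypothetical data).
after_good_priming = rng.normal(loc=4.2, scale=1.0, size=30)  # judged harshly by contrast
after_poor_priming = rng.normal(loc=6.1, scale=1.0, size=30)  # judged generously by contrast

t, p = stats.ttest_ind(after_poor_priming, after_good_priming)
contrast = after_poor_priming.mean() - after_good_priming.mean()

print(f"mean score after poor priming: {after_poor_priming.mean():.2f}")
print(f"mean score after good priming: {after_good_priming.mean():.2f}")
print(f"contrast effect (difference): {contrast:.2f}, t = {t:.2f}, p = {p:.3g}")
```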
2

The Impact of Mental Workload on Rater Performance and Behaviour in the Assessment of Clinical Competence

Tavares, Walter. January 2014.
The complexity and broadening of competencies have led to a number of assessment frameworks that advocate the use of rater judgment in the direct observation of clinical performance. The degree to which these assessment processes produce valid scores is therefore vitally dependent on a rater's cognitive ability. A number of theories suggest that many of the cognitive structures needed to complete rating tasks are capacity limited and may therefore become a source of difficulty when rating demands exceed resources. This thesis explores the role of rating demands in the performance and behaviour of raters in the assessment of clinical competence and asks: in what way do the demands associated with rating clinical performance affect rater performance and behaviour? I hypothesized that as rating demands increase, rating performance declines and raters engage in cognitive avoidance strategies in order to complete the task. I tested this hypothesis by manipulating intrinsic and extraneous sources of load for raters in the assessment of clinical performance. Results consistently demonstrated that intrinsic load, specifically broadening raters' focus by increasing the number of dimensions to be considered simultaneously, negatively affected indicators of rating quality. However, extraneous demands failed to produce the same effect in 2 of 3 experiments. When we explored the cognitive strategies raters engage in under high-load conditions, we identified a number of strategies for reducing cognitive work, including idiosyncratically minimizing intrinsic demands (leading to poor inter-rater reliability) and actively eliminating sources of extraneous load, which explains both findings. When we induced extraneous load in a manner that could not be easily minimized by raters, we also found impairments in rater performance, specifically in the provision of feedback. I conclude that rating demands, whether induced intrinsically or by extraneous sources, impair rater performance, affecting both the utility of scores and the opportunity for learner development. Implications for health professions education and future directions are discussed.
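The link between idiosyncratic rating strategies and poor inter-rater reliability can be shown with a small simulation (not drawn from the thesis): when raters share a common focus their scores correlate well across performances, whereas independent, idiosyncratic judgement noise drives the average inter-rater correlation down. All quantities and names below are hypothetical.

```python
# Hypothetical illustration: inter-rater agreement (mean pairwise correlation)
# falls when raters idiosyncratically narrow their focus under high intrinsic load.
import numpy as np

rng = np.random.default_rng(3)
n_performances, n_raters = 20, 6
true_quality = rng.normal(size=n_performances)  # latent quality of each performance

def simulate_scores(noise_sd):
    # Each rater sees the same performances but adds independent judgement noise.
    return np.column_stack(
        [true_quality + rng.normal(scale=noise_sd, size=n_performances) for _ in range(n_raters)]
    )

def mean_pairwise_r(scores):
    r = np.corrcoef(scores, rowvar=False)            # rater-by-rater correlation matrix
    return r[np.triu_indices(n_raters, k=1)].mean()  # average off-diagonal correlation

low_load = simulate_scores(noise_sd=0.4)   # shared focus -> high agreement
high_load = simulate_scores(noise_sd=1.5)  # idiosyncratic focus -> low agreement

print(f"mean inter-rater r, low load:  {mean_pairwise_r(low_load):.2f}")
print(f"mean inter-rater r, high load: {mean_pairwise_r(high_load):.2f}")
```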
3

A comparability study on differences between scores of handwritten and typed responses on a large-scale writing assessment

Rankin, Angelica Desiree. 01 July 2015.
As the use of technology for personal, professional, and learning purposes increases, more and more assessments are transitioning from a traditional paper-based testing format to a computer-based one. During this transition, some assessments are offered in both paper and computer formats in order to accommodate examinees and testing center capabilities. Scores on the paper-based test are often intended to be directly comparable to the computer-based scores, but such claims of comparability are often unsupported by research specific to that assessment. Not only should the scores be examined for differences, but the thought processes used by raters while scoring those assessments should also be studied to better understand why raters might score response modes differently. Previous comparability literature can be informative, but more contemporary, test-specific research is needed to fully support the direct comparability of scores. The goal of this thesis was to form a more complete understanding of why analytic scores on a writing assessment might differ, if at all, between handwritten and typed responses. A representative sample of responses to the writing composition portion of a large-scale high school equivalency assessment was used. Six trained raters each analytically scored approximately six hundred examinee responses. Half of those responses were typed, and the other half were the transcribed handwritten duplicates. Multiple methods were used to examine why differences between response modes might exist: a MANOVA framework was applied to examine score differences between response modes, and systematic analyses of think-alouds and interviews were used to explore differences in rater cognition. The results indicated that response mode was of no practical significance, meaning that domain scores did not depend notably on whether a response was presented as typed or handwritten. Raters, on the other hand, had a more substantial effect on scores. Comments from the think-alouds and interviews suggest that, while scores were not affected by response mode, raters tended to consider certain aspects of typed responses differently than handwritten responses. For example, raters treated typographical errors differently from other conventional errors when scoring typed responses, but not when scoring the handwritten duplicates. Raters also indicated that they preferred scoring typed responses over handwritten ones, but felt they could overcome their personal preferences and score both response modes similarly. Empirical investigation of the comparability of scores, combined with analysis of raters' thought processes, helped to provide a more evidence-based answer to the question of why scores might differ between response modes. Such information could be useful for test developers when deciding which mode options to offer and how best to train raters to score such assessments. The design of the study itself could also be useful to testing organizations and future research endeavors, as a guide for exploring score differences and the human-based reasons behind them.
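A minimal, hypothetical sketch of the kind of MANOVA described above, testing whether a set of analytic domain scores differs by response mode and rater; the domain names, score scale, column names, and simulated data are illustrative assumptions, not the study's actual variables.

```python
# Hypothetical MANOVA: do analytic domain scores differ by response mode and rater?
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(1)
n = 600  # roughly the number of responses each rater scored in the study

df = pd.DataFrame({
    "mode": rng.choice(["typed", "handwritten"], size=n),
    "rater": rng.choice([f"R{i}" for i in range(1, 7)], size=n),
    # Three hypothetical analytic domains scored 1-6.
    "organization": rng.integers(1, 7, size=n),
    "development": rng.integers(1, 7, size=n),
    "conventions": rng.integers(1, 7, size=n),
})

# Multivariate test of mode and rater effects on the set of domain scores.
fit = MANOVA.from_formula("organization + development + conventions ~ mode + rater", data=df)
print(fit.mv_test())
```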
4

The Effect of Raters and Rating Conditions on the Reliability of the Missionary Teaching Assessment

Ure, Abigail Christine. 17 December 2010.
This study investigated how 2 different rating conditions, the controlled rating condition (CRC) and the uncontrolled rating condition (URC), affected rater behavior and the reliability of a performance assessment (PA) known as the Missionary Teaching Assessment (MTA). The CRC gives raters the ability to manipulate (pause, rewind, fast-forward) video recordings of an examinee's performance as they rate, while the URC does not (i.e., the rater must watch the recording straight through without any manipulation). Few studies have compared the effect of these two rating conditions on ratings. Ryan et al. (1995) analyzed the impact of the CRC and URC on the accuracy of ratings, but few, if any, studies have analyzed their impact on reliability. The Missionary Teaching Assessment is a performance assessment used to assess the teaching abilities of missionaries for the Church of Jesus Christ of Latter-day Saints at the Missionary Training Center. In this study, 32 missionaries each taught a 10-minute lesson that was recorded and later rated by trained raters using a rubric containing 5 criteria. Each teaching sample was rated by 4 of 6 raters; 2 of the 4 ratings were obtained using the CRC and 2 using the URC. Camtasia Studio (2010), a screen-capture program, was used to record when raters used any type of manipulation. These recordings were used to analyze whether raters manipulated the recordings and, if so, when and how frequently. Raters also performed think-alouds following a random sample of the ratings performed under the CRC. These data revealed that when raters had access to the CRC they took advantage of it the majority of the time, but they differed in how frequently they manipulated the recordings. The CRC did not add an exorbitant amount of time to the rating process. The reliability of the ratings was analyzed using both generalizability theory (G theory) and many-facets Rasch measurement (MFRM). Results indicated that, in general, the reliability of the ratings obtained under the 2 rating conditions was not statistically significantly different. The implications of these findings are addressed.
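As a purely illustrative sketch of the G-theory portion of such a reliability analysis (simplifying the study's 4-of-6 rater assignment to a fully crossed persons-by-raters design), variance components can be estimated from ANOVA mean squares and combined into a relative generalizability coefficient; all data below are simulated.

```python
# Hypothetical G-theory illustration for a fully crossed persons x raters design:
# estimate variance components from mean squares and compute a relative
# generalizability coefficient for a chosen number of raters.
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_raters = 32, 4  # 32 teaching samples, 4 raters per sample (simplified as crossed)

# Simulated scores: true person effects + rater leniency + residual error.
person_effect = rng.normal(scale=1.0, size=(n_persons, 1))
rater_effect = rng.normal(scale=0.4, size=(1, n_raters))
scores = 5 + person_effect + rater_effect + rng.normal(scale=0.8, size=(n_persons, n_raters))

grand = scores.mean()
ss_p = n_raters * ((scores.mean(axis=1) - grand) ** 2).sum()
ss_r = n_persons * ((scores.mean(axis=0) - grand) ** 2).sum()
ss_pr = ((scores - grand) ** 2).sum() - ss_p - ss_r

ms_p = ss_p / (n_persons - 1)
ms_r = ss_r / (n_raters - 1)
ms_pr = ss_pr / ((n_persons - 1) * (n_raters - 1))

# Variance components for a p x r design with one score per cell.
var_pr_e = ms_pr
var_p = (ms_p - ms_pr) / n_raters
var_r = (ms_r - ms_pr) / n_persons

n_prime = 2  # e.g., 2 ratings per condition, as in the study design
g_coefficient = var_p / (var_p + var_pr_e / n_prime)
print(f"person variance: {var_p:.3f}, rater variance: {var_r:.3f}, residual: {var_pr_e:.3f}")
print(f"relative G coefficient for {n_prime} raters: {g_coefficient:.3f}")
```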
