
Personality and Rater Leniency: Comparison of Broad and Narrow Measures of Conscientiousness and Agreeableness

Grahek, Myranda 05 1900
Performance appraisal ratings provide the basis for numerous employment decisions, including retention, promotion, and salary increases. Understanding the factors that affect the accuracy of these ratings is therefore important to organizations and employees. Leniency, one rater error, is the tendency to assign higher ratings in appraisal than actual performance warrants. The present study examined how the personality factors Agreeableness and Conscientiousness relate to rater leniency, and whether narrower facets of personality account for more variance in rater leniency than the broad factors. The study used undergraduates' (n = 226) evaluations of instructor performance to test its hypotheses. In addition to the personality variables, students' social desirability tendency and attitude toward the instructor were predicted to be related to rater leniency. The hypotheses received partial support. As predicted, the Agreeableness factor and three of its facets (Trust, Altruism, and Tender-Mindedness) were positively related to rater leniency. The hypotheses that the Conscientiousness factor and three of its facets (Order, Dutifulness, and Deliberation) would be negatively related to rater leniency were not supported. In the current sample, the single narrow facet Altruism accounted for more variance in rater leniency than the broad Agreeableness factor. While social desirability did not account for a significant amount of variance in rater leniency, attitude toward the instructor showed a significant positive relationship and accounted for the largest amount of variance in rater leniency.
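
To make the broad-versus-narrow comparison concrete, the sketch below simulates the kind of analysis the abstract describes: comparing the variance in leniency scores explained by a broad factor against one of its facets. All data, effect sizes, and variable names here are illustrative assumptions, not the study's measures or results.

```python
# A minimal sketch (simulated data, not the author's analysis) of why a
# narrow facet can out-predict the broad factor it belongs to: if one facet
# carries the signal, averaging it with unrelated facets dilutes it.
import numpy as np

rng = np.random.default_rng(0)
n = 226  # sample size matching the study's undergraduate raters

# Assume leniency is driven by one facet; the broad factor is a composite
# of that facet and two facets unrelated to leniency.
altruism = rng.normal(0, 1, n)
trust = rng.normal(0, 1, n)
tender_mindedness = rng.normal(0, 1, n)
agreeableness = (altruism + trust + tender_mindedness) / 3
leniency = 0.5 * altruism + rng.normal(0, 1, n)

def r_squared(x, y):
    """R^2 from a simple least-squares regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

print(f"Broad factor R^2: {r_squared(agreeableness, leniency):.3f}")
print(f"Narrow facet R^2: {r_squared(altruism, leniency):.3f}")
# When the facet carries the signal, its R^2 exceeds the diluted factor's.
```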

A Monte Carlo Approach for Exploring the Generalizability of Performance Standards

Coraggio, James Thomas 16 April 2008
While each phase of the test development process is crucial to the validity of an examination, one phase stands out among the others: standard setting. Standard setting is a time-consuming and expensive endeavor, and although it has received more attention in the literature than perhaps any other technical issue in criterion-referenced measurement, little research attention has been given to generalizing the resulting performance standards. Generalizing performance standards has the potential to improve the standard setting process by limiting the number of items rated and the number of individual rater decisions, with profound implications from both a psychometric and a practical standpoint. This study evaluated the extent to which minimal competency estimates derived from a subset of multiple-choice items using the Angoff standard setting method generalize to the larger item set. Individual item-level estimates of minimal competency were simulated from existing and simulated item difficulty distributions. The study examined characteristics of the item sets and of the standard setting process that could affect the ability to generalize a single performance standard. The characteristics of, and relationship between, the two item sets comprised three factors: (a) the item difficulty distributions, (b) the location of the 'true' performance standard, and (c) the number of items randomly drawn in the sample. The characteristics of the standard setting process comprised four factors: (d) the number of raters, (e) the percentage of unreliable raters, (f) the magnitude of 'unreliability' in unreliable raters, and (g) the directional influence of group dynamics and discussion. The aggregated simulation results were evaluated in terms of the location (bias) and variability (mean absolute deviation, root mean square error) of the estimates. The results suggest that using partial item sets may have merit, as the resulting performance standard estimates may 'adequately' generalize to those set with larger item sets. They also suggest that elements such as the distribution of item difficulty parameters and the potential for directional group influence may affect the ability to generalize performance standards and should be carefully considered.
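
The simulation design described above can be illustrated with a small Monte Carlo sketch. The code below assumes a Rasch-style item model, a mean-probability Angoff cut, and illustrative values for the item counts, rater counts, and rater error; it is a sketch in the spirit of the study's design, not the author's actual simulation.

```python
# A minimal Monte Carlo sketch of how well an Angoff cut score set on a
# random subset of items generalizes to the full item set. All parameter
# values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)

def angoff_cut(b, theta_star, n_raters, rater_sd, rng):
    """Mean Angoff estimate: raters judge P(correct) for a minimally
    competent examinee at theta_star, with normally distributed error."""
    p_true = 1 / (1 + np.exp(-(theta_star - b)))        # Rasch probabilities
    noise = rng.normal(0, rater_sd, (n_raters, b.size))
    ratings = np.clip(p_true + noise, 0.05, 0.95)       # plausible bounds
    return ratings.mean()                               # cut as mean P

n_items, n_sample, n_raters, reps = 100, 30, 12, 1000
theta_star, rater_sd = 0.5, 0.10
errors = np.empty(reps)
for r in range(reps):
    b_full = rng.normal(0, 1, n_items)                  # item difficulties
    full_cut = angoff_cut(b_full, theta_star, n_raters, rater_sd, rng)
    b_sub = rng.choice(b_full, n_sample, replace=False)  # sampled subset
    sub_cut = angoff_cut(b_sub, theta_star, n_raters, rater_sd, rng)
    errors[r] = sub_cut - full_cut

# Evaluate the subset-based cut against the full-set cut, as in the study.
print(f"bias = {errors.mean():+.4f}")
print(f"MAD  = {np.abs(errors).mean():.4f}")
print(f"RMSE = {np.sqrt((errors**2).mean()):.4f}")
```

Using a mean-probability cut keeps the subset and full-set standards on the same proportion-correct metric, so the two are directly comparable regardless of how many items are sampled.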

Investigating the Reliability and Validity of Knowledge Structure Evaluations: The Influence of Rater Error and Rater Limitation

Harper-Sciarini, Michelle 01 January 2010
The likelihood of conducting safe operations increases when operators have effectively integrated their knowledge of the operation into meaningful relationships, referred to as knowledge structures (KSs). Unlike knowledge of isolated facts about an operation, well-integrated KSs reflect a deeper understanding. In training environments, however, it is often only the isolated facts that are evaluated. To know whether an operator has formed well-integrated KSs, KS evaluation methods must be employed. Many of these methods, however, require subjective, human-rated evaluations, and such ratings are prone to the negative influence of rater limitations such as biases and cognitive constraints. The extent to which KS evaluations are beneficial therefore depends on the degree to which rater limitations can be mitigated. The main objective of this study was to identify factors that mitigate rater limitations and to test their influence on the reliability and validity of KS evaluations. These factors were identified by delineating a framework that represents how a rater's limitations influence the cognitive processes that occur during evaluation. From this framework, one factor (i.e., operation knowledge) and three mitigation techniques (i.e., frame-of-reference training, reducing the complexity of the KSs, and providing referent material) were identified. Ninety-two participants rated the accuracy of eight KSs over a period of two days. Results indicated that reliability was higher after training. Furthermore, several interactions indicated that the benefits of domain knowledge, referent material, and reduced complexity held only within subsets of the participants. For example, reduced complexity increased reliability only among evaluators with less knowledge of the operation, and referent material increased reliability only for those who scored less complex KSs. Both the practical and theoretical implications of these results are provided.
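
As one concrete way to quantify the reliability of human-rated KS evaluations, the sketch below computes a single-rater intraclass correlation, ICC(2,1), on simulated ratings of eight KSs. The choice of ICC, the rater counts, and the error magnitudes are illustrative assumptions, not the study's method or results.

```python
# A minimal sketch: inter-rater reliability of KS ratings via ICC(2,1)
# (Shrout & Fleiss two-way random effects, single rater), with a 'trained'
# group given smaller simulated rater error.
import numpy as np

rng = np.random.default_rng(7)

def icc_2_1(x):
    """ICC(2,1) for an (n_targets, k_raters) matrix of ratings."""
    n, k = x.shape
    m = x.mean()
    row = x.mean(axis=1, keepdims=True)   # per-target means
    col = x.mean(axis=0, keepdims=True)   # per-rater means
    msr = k * ((row - m) ** 2).sum() / (n - 1)
    msc = n * ((col - m) ** 2).sum() / (k - 1)
    mse = ((x - row - col + m) ** 2).sum() / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

true_accuracy = rng.normal(70, 10, 8)     # eight KSs, as in the study
def ratings(rater_sd, k=10):
    """True KS accuracy plus independent rater error."""
    return true_accuracy[:, None] + rng.normal(0, rater_sd, (8, k))

print(f"untrained raters: ICC = {icc_2_1(ratings(rater_sd=12)):.2f}")
print(f"trained raters:   ICC = {icc_2_1(ratings(rater_sd=4)):.2f}")
```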

Assisting Novice Raters in Addressing the In-Between Scores When Rating Writing

Greer, Brittney 16 June 2013
In the research on rating ESL writing assessments, borderline writing samples are mentioned, but a solution has yet to be offered. Borderline samples are writing samples that do not fit neatly into a single level of the rubric but instead have characteristics of multiple levels. The aim of this thesis is to provide an improved training module in the setting of an Intensive English Program by exposing new raters to borderline samples and to the rating rationale of experienced raters. The purpose of this training is to increase the confidence, consistency, and accuracy of novice raters when rating borderline samples of writing. The training consists of a workbook containing a rubric with instructions for its use, benchmark examples of writing, borderline examples of writing with comments from experienced raters defending the established scores, and a variety of writing samples for practice. The selection of the benchmark and borderline examples was informed by fit statistics from existing datasets that had been analyzed with many-facet Rasch measurement. Eight experienced raters provided rubric-based rationale explaining why each borderline sample received its established score and why the sample could be considered to belong to a different level. To assess the effectiveness of the training workbook, it was piloted by 10 novice raters who rated a series of essays and responded to a survey. Survey results showed that rater confidence increased following the training, but that raters needed more time with the training materials to use them properly. The statistical analyses showed nonsignificant changes, which could be due to limitations of the data collection. Further research on the effectiveness of this training workbook is needed, as is broader discussion in the field of the prevalent issue of rating borderline samples of writing.
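
The fit statistics mentioned above can be illustrated with a small sketch: under a rating-scale Rasch model, an essay whose ratings straddle two rubric levels shows inflated infit/outfit mean squares. The model parameters, thresholds, and the rough "> 1.3" flag below are illustrative assumptions, not values from the thesis datasets.

```python
# A minimal sketch (simulated data) of flagging borderline essays with
# Rasch infit/outfit mean squares; values well above ~1.3 suggest ratings
# that straddle rubric levels.
import numpy as np

rng = np.random.default_rng(3)
CATS = np.arange(4)                        # rubric levels 0-3
TAUS = np.array([-1.0, 0.0, 1.0])          # assumed step thresholds

def rsm_probs(theta, severity):
    """Andrich rating-scale model category probabilities for one
    essay-rater pair."""
    steps = np.concatenate([[0.0], np.cumsum(theta - severity - TAUS)])
    p = np.exp(steps - steps.max())
    return p / p.sum()

def essay_fit(ratings, theta, severities):
    """Infit/outfit mean squares for one essay across raters."""
    e = np.empty(len(severities)); w = np.empty(len(severities))
    for j, sev in enumerate(severities):
        p = rsm_probs(theta, sev)
        e[j] = (CATS * p).sum()                    # expected rating
        w[j] = (((CATS - e[j]) ** 2) * p).sum()    # model variance
    infit = ((ratings - e) ** 2).sum() / w.sum()
    outfit = ((ratings - e) ** 2 / w).mean()
    return infit, outfit

severities = rng.normal(0, 0.3, 8)                 # eight raters
theta = 0.5                                        # essay measure
consistent = np.array([rng.choice(CATS, p=rsm_probs(theta, s))
                       for s in severities])       # model-consistent ratings
borderline = rng.choice([1, 3], 8)                 # straddles two levels
for name, r in [("consistent", consistent), ("borderline", borderline)]:
    infit, outfit = essay_fit(r, theta, severities)
    print(f"{name}: infit={infit:.2f}, outfit={outfit:.2f}")
```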
