1

The Angoff Method and Rater Analysis: Enhancing Cutoff Score Reliability and Accuracy

Baker, Charles E., 1957-
At times called a philosophy and at other times a process, cutting score methodology is an issue routinely encountered by Industrial/Organizational (I/O) psychologists. Published literature on cutting score methodology appears far more frequently in academic settings than in personnel settings, where the potential for lawsuits is typically greater. With the passage of the 1991 Civil Rights Act, it is no longer legal to use within-group scoring, so personnel psychologists must develop more defensible selection methods that fall within established guidelines. Designating cutoff scores with the Angoff method appears to suit many requirements of personnel departments, and several procedures have evolved that suggest the accuracy and reliability of the Angoff method can be enhanced. The current experiment investigated several such procedures and found that rater accuracy methods significantly enhance cutoff score reliability and accuracy.
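The core Angoff computation referred to throughout these abstracts can be sketched briefly. In the unmodified form, each judge estimates, for every item, the probability that a minimally competent candidate answers it correctly; a judge's implied cut score is the sum of those ratings, and the panel cut score is the mean across judges. The judge names and ratings below are hypothetical illustration data, not values from any of these studies.

```python
# Minimal sketch of the (unmodified) Angoff cut-score computation.
from statistics import mean, stdev

# ratings[judge] = per-item probability that a minimally competent
# candidate answers the item correctly (one value per test item)
ratings = {
    "judge_a": [0.6, 0.8, 0.5, 0.9, 0.7],
    "judge_b": [0.5, 0.7, 0.6, 0.8, 0.6],
    "judge_c": [0.7, 0.9, 0.4, 0.9, 0.8],
}

# Each judge's implied cut score is the sum of their item ratings.
judge_cuts = {j: sum(r) for j, r in ratings.items()}

# The panel cut score is the mean of the judges' cut scores; the
# spread across judges is one common index of rating consistency.
cut_score = mean(judge_cuts.values())
spread = stdev(judge_cuts.values())
print(f"cut score: {cut_score:.2f} of 5 items (judge SD {spread:.2f})")
```

Iterative variants, such as the one studied in the last entry below, repeat this rating step after judges see feedback (e.g., p-values) and discuss discrepancies.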
2

A Monte Carlo Approach for Exploring the Generalizability of Performance Standards

Coraggio, James Thomas, 16 April 2008
While each phase of the test development process is crucial to the validity of the examination, one phase tends to stand out: the standard setting process, a time-consuming and expensive endeavor. Although it has received more attention in the literature than any other technical issue in criterion-referenced measurement, little research attention has been given to generalizing the resulting performance standards. Generalizing standards has the potential to improve the standard setting process by limiting the number of items rated and the number of individual rater decisions, with profound implications from both a psychometric and a practical standpoint. This study evaluated the extent to which minimal competency estimates derived from a subset of multiple choice items using the Angoff standard setting method would generalize to the larger item set. Individual item-level estimates of minimal competency were simulated from existing and simulated item difficulty distributions. The study was designed to examine the characteristics of item sets and of the standard setting process that could affect the ability to generalize a single performance standard. The characteristics of, and relationship between, the two item sets included three factors: (a) the item difficulty distributions, (b) the location of the 'true' performance standard, and (c) the number of items randomly drawn in the sample. The characteristics of the standard setting process included four factors: (d) the number of raters, (e) the percentage of unreliable raters, (f) the magnitude of 'unreliability' in unreliable raters, and (g) the directional influence of group dynamics and discussion. The aggregated simulation results were evaluated in terms of the location (bias) and variability (mean absolute deviation, root mean square error) of the estimates.
The simulation results suggest that the model of using partial item sets may have some merit as the resulting performance standard estimates may 'adequately' generalize to those set with larger item sets. The simulation results also suggest that elements such as the distribution of item difficulty parameters and the potential for directional group influence may also impact the ability to generalize performance standards and should be carefully considered.
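The Monte Carlo logic described above can be illustrated with a toy simulation: set a "true" standard from the full item pool, repeatedly simulate a rater panel judging only a random subset, and summarize the subset-based estimates with bias, MAD, and RMSE. The item difficulties, panel size, noise level, and replication count below are illustrative choices, not the study's actual simulation parameters.

```python
# Toy sketch: do Angoff cut scores set on a random subset of items
# generalize to the full item set?
import random

random.seed(0)

N_ITEMS, SUBSET, N_RATERS, N_REPS = 100, 30, 10, 500

# "True" per-item minimal-competency probabilities for the full pool.
true_p = [random.uniform(0.3, 0.9) for _ in range(N_ITEMS)]
true_cut = sum(true_p) / N_ITEMS  # full-set standard (proportion-correct scale)

errors = []
for _ in range(N_REPS):
    subset = random.sample(range(N_ITEMS), SUBSET)
    rater_cuts = []
    for _ in range(N_RATERS):
        # Each rater's estimate = true probability plus rating noise,
        # clipped to the [0, 1] probability scale.
        est = [min(1.0, max(0.0, true_p[i] + random.gauss(0, 0.1)))
               for i in subset]
        rater_cuts.append(sum(est) / SUBSET)
    # Error of the panel's subset-based standard vs the full-set standard.
    errors.append(sum(rater_cuts) / N_RATERS - true_cut)

bias = sum(errors) / N_REPS
mad = sum(abs(e) for e in errors) / N_REPS
rmse = (sum(e * e for e in errors) / N_REPS) ** 0.5
print(f"bias={bias:+.4f}  MAD={mad:.4f}  RMSE={rmse:.4f}")
```

Extending this sketch toward the study's design would mean varying the difficulty distribution, the subset size, the number of raters, and the share and severity of unreliable raters, and adding a systematic shift to model directional group influence.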
3

Effects of Item-Level Feedback on the Ratings Provided by Judges in a Modified-Angoff Standard Setting Study

Peabody, Michael R., 01 January 2014
Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations, and although all cut score decisions are by nature arbitrary, they should not be capricious. Establishing a minimum passing standard is the technical expression of a policy decision, and the information gained through standard setting studies informs these policy decisions. To this end, it is necessary to conduct robust examinations of the methods and techniques commonly applied in standard setting studies in order to better understand issues that may influence policy decisions. The modified-Angoff method remains one of the most popular methods for setting performance standards in testing and assessment. With this method, it is common practice to provide content experts with feedback regarding item difficulties; however, it is unclear how this feedback affects the ratings and recommendations of content experts. Recent research indicates mixed results, noting that the feedback given to raters may or may not alter their judgments depending on the type of data provided, when the data were provided, and how raters collaborated within and between groups. This research examines the effects of item-level feedback on the judgment of raters. The results suggest that the most important factor related to item-level feedback is whether a Subject Matter Expert (SME) was able to correctly answer a question. If so, the SMEs tended to rely on their own inherent sense of item difficulty rather than the data provided, in spite of empirical evidence to the contrary. The results of this research may hold implications for how standard setting studies are conducted with regard to the difficulty and ordering of items, the ability level of the content experts invited to participate, and the types of feedback provided.
4

Measurement of alignment between standards and assessment

Näsström, Gunilla, January 2008
Many educational systems of today are standards-based and aim for alignment, i.e. consistency, among the components of the educational system: standards, teaching and assessment. To conclude whether alignment is sufficiently high, analyses with a useful model are needed. This thesis investigates the usefulness of models for analyzing alignment between standards and assessments, with emphasis on one model: Bloom’s revised taxonomy. The thesis comprises an introduction and five articles that empirically investigate the usefulness of models for alignment analyses. In the first article, the usefulness of different models for analyzing alignment between standards and assessment is theoretically and empirically compared against a number of criteria. The results show that Bloom’s revised taxonomy is the most useful model. The second article investigates the usefulness of Bloom’s revised taxonomy for interpreting standards in mathematics with two differently composed panels of judges, one consisting of teachers and the other of assessment experts. The results show that Bloom’s revised taxonomy is useful for interpreting standards, but that many standards are multi-categorized (placed in more than one category). The results also show higher levels of intra- and inter-judge consistency for assessment experts than for teachers. The third article further investigates the usefulness of Bloom’s revised taxonomy for analyses of alignment between standards and assessment. The results show that the taxonomy is useful for analyses of both standards and assessments. The fourth article studies whether vague and general standards can explain the large proportion of multi-categorized standards in mathematics. The strategy was to divide a set of standards into smaller substandards and then compare the usefulness and inter-judge consistency of categorization with Bloom’s revised taxonomy for undivided and divided standards.
The results show that vague and general standards do not explain the large proportion of multi-categorized standards. Another explanation is related to the nature of mathematics, which often intertwines conceptual and procedural knowledge; this was also studied in the article, and the results indicate that it is a probable explanation. The fifth article focuses on another aspect of alignment between standards and assessment, namely the alignment between performance standards and cut-scores for a specific assessment. The validity of two standard-setting methods, the Angoff method and the borderline-group method, was investigated. The results show that both methods derived reasonable and trustworthy cut-scores, but also that there are potential problems with these methods. In the introductory part of the thesis, the empirical studies are summarized, contextualized and discussed. The discussion relates alignment to validity issues for assessments and relates the obtained empirical results to theoretical assumptions and applied implications. One conclusion of the thesis is that Bloom’s revised taxonomy is useful for analyses of alignment between standards and assessments. Another conclusion is that the two standard-setting methods derive reasonable and trustworthy results. It is preferable if an alignment model can be used both for alignment analyses and in ongoing practice for increasing alignment, and Bloom’s revised taxonomy has the potential to be such a model. This thesis has found the taxonomy useful for alignment analyses, but its usefulness for increasing alignment in ongoing practice remains to be investigated.
5

Using Stratified Item Selection to Reduce the Number of Items Rated in Standard Setting

Smith, Tiffany Nicole 01 January 2011 (has links)
The primary purpose of this study was to evaluate the effectiveness of stratified item sampling in order to reduce the number of items needed in Modified Angoff standard setting studies. Representative subsets of items were extracted from a total of 30 full-length tests based upon content weights, item difficulty, and item discrimination. Cut scores obtained from various size subsets of each test were compared to the full-length test cut score as a measure of generalizability. Applied sampling results indicated that 50% of the full-length test is sufficient to obtain cut scores within one standard error of estimate (SEE) of the full-length test standard, and 70% of the full-length test is sufficient to obtain standards within one percentage point of the full-length test standard. A theoretical sampling procedure indicated that 35% of the full-length test is required to reliably obtain a standard within one SEE of the full-length standard, and 65% of the full-length test is required to fall within one percentage point. The effects of test length, panelist group size, and interrater reliability on the feasibility of stratified item sampling were also examined. However, these standard setting characteristics did not serve as significant predictors of subset generalizability in this study.
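The stratified selection idea above can be sketched concretely: group the item pool into cells by content area and difficulty band, draw the same fraction from each cell so the subset mirrors the full test's blueprint, and compare the subset's Angoff-based cut score to the full-length standard. The item pool, strata labels, and ratings below are made up for demonstration, not taken from the study's 30 tests.

```python
# Illustrative sketch of stratified item selection for standard setting.
import random
from collections import defaultdict

random.seed(1)

# Hypothetical pool: (item_id, content_area, difficulty_band, angoff_rating)
pool = [(i,
         random.choice(["algebra", "geometry", "statistics"]),
         random.choice(["easy", "medium", "hard"]),
         random.uniform(0.3, 0.9))
        for i in range(120)]

def stratified_subset(items, fraction):
    """Draw `fraction` of the items within each (content, difficulty) cell."""
    cells = defaultdict(list)
    for item in items:
        cells[(item[1], item[2])].append(item)
    subset = []
    for cell_items in cells.values():
        k = max(1, round(fraction * len(cell_items)))
        subset.extend(random.sample(cell_items, k))
    return subset

# Cut score = mean Angoff rating (proportion-correct scale).
full_cut = sum(r for *_, r in pool) / len(pool)
half = stratified_subset(pool, 0.5)
half_cut = sum(r for *_, r in half) / len(half)
print(f"full-test cut {full_cut:.3f} vs 50% stratified subset {half_cut:.3f}")
```

Because each blueprint cell is sampled proportionally, the subset preserves the content and difficulty mix, which is what makes the subset-based standard a plausible stand-in for the full-length one.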
6

Influence of Item Response Theory and Type of Judge on a Standard Set Using the Iterative Angoff Standard Setting Method

Hamberlin, Melanie Kidd
The purpose of this investigation was to determine the influence of item response theory and of different types of judges on a standard. The iterative Angoff standard setting method was employed by all judges to determine a cut-off score for a public school district-wide criterion-referenced test. The analysis of variance of the effect of judge type and standard setting method on the central tendency of the standard revealed the existence of an ordinal interaction between judge type and method. Without any knowledge of p-values, one judge group set an unrealistic standard. A significant disordinal interaction was found concerning the effect of judge type and standard setting method on the variance of the standard. A positive covariance was detected between judges' minimum pass level estimates and empirical item information. With both p-values and b-values, judge groups had mean minimum pass levels that were positively correlated (ranging from .77 to .86), regardless of the type of information given to the judges. No differences in correlations were detected between judge types or methods. The generalizability coefficients and phi indices for the 12 judges included in any method or judge type were acceptable (ranging from .77 to .99). The generalizability coefficient and phi index for all 24 judges were quite high (.99 and .96, respectively).
