41

Examination of the Application of Item Response Theory to the Angoff Standard Setting Procedure

Clauser, Jerome Cody 01 September 2013
Establishing valid and reliable passing scores is a vital activity for any examination used to make classification decisions. Although there are many approaches to setting passing scores, this thesis focuses specifically on the Angoff standard setting method, a test-centered approach to estimating performance standards grounded in classical test theory. In the Angoff method, each judge estimates the proportion of minimally competent examinees who will answer each item correctly; these values are summed across items and averaged across judges to arrive at a recommended passing score. Unfortunately, research has shown that the Angoff method has a number of limitations with the potential to undermine both the validity and reliability of the resulting standard, and many of these limitations can be linked to its grounding in classical test theory. The purpose of this study is to determine whether those limitations could be mitigated by a transition to an item response theory (IRT) framework. Item response theory is a modern measurement model that relates examinees' latent ability to their observed test performance; theoretically, a transition to an IRT-based Angoff method could yield more accurate, stable, and efficient passing scores. The research was divided into three studies designed to assess the potential advantages of an IRT-based Angoff method. Study one examined the effect of allowing judges to skip unfamiliar items during the rating process, with the goal of detecting whether passing scores are artificially biased by deficits in the content experts' item-level content knowledge. Study two explored the potential benefit of setting passing scores on an adaptively selected subset of test items, attempting to leverage IRT's score invariance property to estimate passing scores more efficiently.
Finally, study three used a simulation to compare IRT-based standards with traditional Angoff standards, asking whether passing scores set using the IRT Angoff method had greater stability and accuracy than those set using the common true score Angoff method. Together these three studies examined the potential advantages of an IRT-based approach to setting passing scores. The results indicate that the IRT Angoff method does not produce more reliable passing scores than the common Angoff method. The transition to the IRT-based approach does, however, effectively ameliorate two sources of systematic error in the common Angoff method: the first is introduced by requiring that all judges rate all items, and the second arises during the transition from test scores to scaled score passing scores. By eliminating these sources of error, the IRT-based method allows accurate and unbiased estimation of the judges' true opinion of the ability of the minimally capable examinee. Although not all of the theoretical benefits of the IRT Angoff method could be demonstrated empirically, the results of this thesis are extremely encouraging: the IRT Angoff method was shown to eliminate two sources of systematic error, resulting in more accurate passing scores. In addition, this thesis provides a strong foundation for a variety of studies with the potential to aid in the selection, training, and evaluation of content experts. Overall, the findings suggest that applying IRT to the Angoff standard setting method has the potential to offer significantly more valid passing scores.
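The core Angoff computation described above (each judge's ratings summed across items, then averaged across judges) can be sketched in a few lines; the ratings matrix is invented for illustration:

```python
# Hypothetical ratings: rows are judges, columns are items. Each value is a
# judge's estimate of the proportion of minimally competent examinees who
# would answer that item correctly.
ratings = [
    [0.6, 0.8, 0.5, 0.9],  # judge 1
    [0.5, 0.7, 0.6, 0.8],  # judge 2
    [0.7, 0.9, 0.4, 0.8],  # judge 3
]

def angoff_cut_score(ratings):
    """Sum each judge's ratings across items, then average across judges."""
    judge_sums = [sum(judge_ratings) for judge_ratings in ratings]
    return sum(judge_sums) / len(judge_sums)

cut = angoff_cut_score(ratings)  # recommended passing score in raw score points
```

The result is on the raw (number-correct) score metric; dividing by the number of items would express it as a proportion instead.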
42

USING DIFFERENTIAL FUNCTIONING OF ITEMS AND TESTS (DFIT) TO EXAMINE TARGETED DIFFERENTIAL ITEM FUNCTIONING

O'Brien, Erin L. January 2014
No description available.
43

Applying Longitudinal IRT Models to Small Samples for Scale Evaluation

Keum, EunHee 09 August 2016
No description available.
44

Ability Estimation Under Different Item Parameterization and Scoring Models

Si, Ching-Fung B. 05 1900
A Monte Carlo simulation study investigated the effect of scoring format, item parameterization, threshold configuration, and prior ability distribution on the accuracy of ability estimation under various IRT models. Item response data on 30 items from 1,000 examinees were simulated using known item parameters and ability values. The item response data sets were submitted to seven dichotomous or polytomous IRT models with different item parameterizations to estimate examinee ability. The accuracy of ability estimation for a given IRT model was assessed by the recovery rate and the root mean square error. The results indicated that polytomous models produced more accurate ability estimates than the dichotomous models under all combinations of research conditions, as indicated by higher recovery rates and lower root mean square errors. Among the item parameterization models, the one-parameter model outperformed the two-parameter and three-parameter models under all research conditions. Among the polytomous models, the partial credit model yielded more accurate ability estimates than the other three polytomous models; the nominal categories model performed better than the generalized partial credit model and the multiple-choice model, with the multiple-choice model the least accurate. The results further indicated that certain prior ability distributions affected the accuracy of ability estimation; however, no clear ordering of accuracy among the four prior distribution groups emerged, due to an interaction between prior ability distribution and threshold configuration. The recovery rate was lower when test items had categories with unequal threshold distances, when thresholds were concentrated at one end of the ability/difficulty continuum, and when the items were administered to a sample of examinees whose ability distribution was skewed toward that same end of the continuum.
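A minimal sketch of this kind of recovery study, assuming a simple one-parameter Rasch model rather than the seven models compared here (all parameters invented): simulate responses from known abilities, re-estimate ability by maximum likelihood, and summarize recovery with the root mean square error.

```python
import math
import random

random.seed(1)

def p_correct(theta, b):
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def simulate(theta, difficulties):
    """Generate a dichotomous response pattern from a known ability."""
    return [1 if random.random() < p_correct(theta, b) else 0 for b in difficulties]

def mle_theta(responses, difficulties):
    """Grid-search maximum-likelihood ability estimate (kept simple on purpose)."""
    grid = [g / 10.0 for g in range(-40, 41)]
    def loglik(t):
        return sum(math.log(p_correct(t, b)) if x else math.log(1.0 - p_correct(t, b))
                   for x, b in zip(responses, difficulties))
    return max(grid, key=loglik)

def rmse(true_vals, est_vals):
    """Root mean square error between true and estimated abilities."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(true_vals, est_vals)) / len(true_vals))

difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5] * 6          # 30 items
true_thetas = [random.gauss(0.0, 1.0) for _ in range(200)]
estimates = [mle_theta(simulate(t, difficulties), difficulties) for t in true_thetas]
error = rmse(true_thetas, estimates)
```

A production study would use Newton-Raphson or EAP estimation rather than a grid search, but the recovery logic is the same.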
45

How to Score Situational Judgment Tests: A Theoretical Approach and Empirical Test

Whelpley, Christopher E. 01 January 2014
The purpose of this dissertation is to examine how the method used to score a situational judgment test (SJT) affects the validity of the SJT, both in the presence of other predictors and as a single predictor of task performance. To this end, I compared the summed score approach to scoring SJTs with item response theory (IRT) and multivariate item response theory (MIRT) scoring. Using two samples and three sets of analyses, I found that the method used to score SJTs influences the validity of the test and that IRT and MIRT show promise for increasing SJT validity. However, no individual scoring method produced the highest validity across all sets of analyses. In line with previous research, SJTs added incremental validity in the presence of GMA and personality, and, again, the scoring method affected the incremental validity. A relative weights analysis performed for each scoring method across all sets of analyses showed that, depending on the scoring method, SJT scores may account for more criterion variance than either GMA or personality. However, the results were likely influenced by range restriction in the incumbent samples.
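The practical difference between summed scoring and IRT scoring can be illustrated with a small sketch: under a two-parameter logistic (2PL) model, two response patterns with identical summed scores receive different ability estimates because IRT weights items by their discrimination. The item parameters below are hypothetical:

```python
import math

# Hypothetical 2PL parameters (a = discrimination, b = difficulty) for four
# SJT items; values are invented for illustration.
items = [(1.8, -0.5), (0.4, 0.0), (1.5, 0.5), (0.6, 1.0)]

def p_correct(theta, a, b):
    """2PL probability of a keyed (correct) response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def summed_score(pattern):
    """Classical summed score: just count keyed responses."""
    return sum(pattern)

def irt_theta(pattern, items):
    """Grid-search maximum-likelihood ability estimate under the 2PL model."""
    grid = [g / 20.0 for g in range(-80, 81)]
    def loglik(t):
        ll = 0.0
        for x, (a, b) in zip(pattern, items):
            p = p_correct(t, a, b)
            ll += math.log(p) if x else math.log(1.0 - p)
        return ll
    return max(grid, key=loglik)

# Same summed score, different IRT scores:
pattern_hi = [1, 0, 1, 0]  # keyed responses on the highly discriminating items
pattern_lo = [0, 1, 0, 1]  # keyed responses on the weakly discriminating items
```

Both patterns sum to 2, but the estimate for `pattern_hi` is higher because its keyed responses fall on the more discriminating items; this weighting is one mechanism by which the scoring method can change SJT validity.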
46

Multidimensional item response theory observed score equating methods for mixed-format tests

Peterson, Jaime Leigh 01 July 2014
The purpose of this study was to build upon the existing MIRT equating literature by introducing a full multidimensional item response theory (MIRT) observed score equating method for mixed-format exams, because no such method currently exists. At this time, the MIRT equating literature is limited to full MIRT observed score equating methods for multiple-choice-only exams and Bifactor observed score equating methods for mixed-format exams. Given the high frequency with which mixed-format exams are used, and the accumulating evidence that some tests are not purely unidimensional, it was important to present a full MIRT equating method for mixed-format tests. The performance of the full MIRT observed score method was compared with the traditional equipercentile method, the unidimensional IRT (UIRT) observed score method, and the Bifactor observed score method. For the Bifactor methods, group-specific factors were defined according to item format or content subdomain. For the full MIRT methods, two- and four-dimensional models were included, and correlations between latent abilities were either freely estimated or set to zero. All equating procedures were carried out using three end-of-course exams: Chemistry, Spanish Language, and English Language and Composition. For each subject, two separate datasets were created using pseudo-groups in order to have two separate equating criteria; the criteria that served as baselines for comparison were the theoretical Identity procedure and the traditional equipercentile procedure. Several important conclusions were drawn. In general, the multidimensional methods performed better for datasets that evidenced more multidimensionality, whereas the unidimensional methods worked better for unidimensional datasets. In addition, the scale on which scores are reported influenced the comparative conclusions drawn among the studied methods.
For performance classifications, which matter most to examinees, there typically were not large discrepancies among the UIRT, Bifactor, and full MIRT methods. However, this study was limited by its sole reliance on real data, which were not strongly multidimensional and for which the true equating relationship was unknown. Therefore, plans for improvements, including a simulation study introducing a variety of dimensional data structures, are also discussed.
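The equipercentile criterion used for comparison maps each Form X raw score to the Form Y score holding the same percentile rank. A minimal sketch, with invented score distributions:

```python
def percentile_ranks(freqs):
    """Percentile rank at each raw score: fraction below plus half the score's own frequency."""
    n = float(sum(freqs))
    ranks, cum = [], 0
    for f in freqs:
        ranks.append((cum + f / 2.0) / n)
        cum += f
    return ranks

def equate(x_score, x_freqs, y_freqs):
    """Map a Form X score to the Form Y score with the same percentile rank,
    using linear interpolation between Form Y score points."""
    pr = percentile_ranks(x_freqs)[x_score]
    y_pr = percentile_ranks(y_freqs)
    for s in range(len(y_pr) - 1):
        if y_pr[s] <= pr <= y_pr[s + 1]:
            frac = (pr - y_pr[s]) / (y_pr[s + 1] - y_pr[s])
            return s + frac
    return 0.0 if pr < y_pr[0] else float(len(y_pr) - 1)

x_freqs = [2, 5, 10, 8, 5]   # Form X frequencies for raw scores 0..4 (invented)
y_freqs = [1, 4, 8, 10, 7]   # Form Y frequencies; Y is slightly easier here
equated = equate(2, x_freqs, y_freqs)  # a Form X score of 2 maps to ~2.33 on Form Y
```

Because Form Y is easier in this toy example, the same percentile rank sits at a higher Form Y score, which is exactly the adjustment equating is meant to make.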
47

Observed score and true score equating procedures for multidimensional item response theory

Brossman, Bradley Grant 01 May 2010
The purpose of this research was to develop observed score and true score equating procedures to be used in conjunction with the Multidimensional Item Response Theory (MIRT) framework. Currently, MIRT scale linking procedures exist to place item parameter estimates and ability estimates on the same scale after separate calibrations are conducted. These procedures account for indeterminacies in (1) translation, (2) dilation, (3) rotation, and (4) correlation. However, no procedures currently exist to equate number correct scores after parameter estimates are placed on the same scale. This research sought to fill this void in the current psychometric literature. Three equating procedures--two observed score procedures and one true score procedure--were created and described in detail. One observed score procedure was presented as a direct extension of unidimensional IRT observed score equating, and is referred to as the "Full MIRT Observed Score Equating Procedure." The true score procedure and the second observed score procedure incorporated the statistical definition of the "direction of best measurement" in an attempt to equate exams using unidimensional IRT (UIRT) equating principles. These procedures are referred to as the "Unidimensional Approximation of MIRT True Score Equating Procedure" and the "Unidimensional Approximation of MIRT Observed Score Equating Procedure," respectively. Three exams within the Iowa Test of Educational Development (ITED) Form A and Form B batteries were used to conduct UIRT observed score and true score equating, MIRT observed score and true score equating, and equipercentile equating. The equipercentile equating procedure was conducted for the purpose of comparison since this procedure does not explicitly violate the IRT assumption of unidimensionality. 
Results indicated that the MIRT equating procedures performed more similarly to the equipercentile equating procedure than the UIRT equating procedures, presumably due to the violation of the unidimensionality assumption under the UIRT equating procedures. Future studies are expected to address how the MIRT procedures perform under varying levels of multidimensionality (weak, moderate, strong), varying frameworks of dimensionality (simple structure vs. complex structure), and number of dimensions, among other conditions.
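The unidimensional IRT true score equating that these procedures extend can be sketched as follows: find the ability whose Form X true score equals a given raw score, then evaluate the Form Y true score at that ability. The 2PL item parameters below are hypothetical:

```python
import math

def p(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def true_score(theta, items):
    """Expected number-correct score at a given ability."""
    return sum(p(theta, a, b) for a, b in items)

def equate_true_score(x_score, x_items, y_items, lo=-6.0, hi=6.0):
    """Bisection: find the ability whose Form X true score equals x_score,
    then evaluate the Form Y true score at that ability."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if true_score(mid, x_items) < x_score:
            lo = mid
        else:
            hi = mid
    return true_score((lo + hi) / 2.0, y_items)

# Hypothetical 2PL parameters (a, b); Form Y is slightly easier than Form X.
form_x = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.5), (1.1, 1.0)]
form_y = [(1.0, -0.7), (1.2, -0.2), (0.8, 0.3), (1.1, 0.8)]
```

Equating a form to itself returns the input score (a useful sanity check), and a Form X score maps to a slightly higher score on the easier Form Y. The mapping is only defined for raw scores strictly inside the attainable true score range, here 0 to the number of items.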
48

Bringing Situational Judgement Tests to the 21st Century: Scoring of Situational Judgement Tests Using Item Response Theory

Ron, Tom Haim 19 November 2019
No description available.
49

RESPONSE INSTRUCTIONS AND FAKING ON SITUATIONAL JUDGMENT TESTS

Broadfoot, Alison A. 20 October 2006
No description available.
50

Validating hierarchical sequences in the design copying domain using latent trait models.

Burch, Melissa Price. January 1988
The present study was a systematic investigation of hierarchical skill sequences in the design copying domain. The factors associated with possible variations in task difficulty were delineated, and five hierarchies were developed to reflect variations in rule usage, the structuring of responses, the presence of angles, spatial orientations, and stimulus complexity. Three hundred thirty-four subjects aged five through ten years were administered a 25-item design copying test. The data were analyzed using probabilistic models: latent trait models were developed to test the hypothesized skill sequences, and each was statistically compared with alternate models to arrive at a preferred model that adequately represented the data. Results suggested that items with predictable difficulty levels can be developed in this domain based on an analysis of stimulus dimensions and the use of rules for task completion, and that the inclusion of visual cues to guide design copying assists accurate task completion. Implications of the current findings for constructing tests that accurately provide information about children's skill levels were discussed. The presence of hierarchical skill sequences in a variety of ability domains was supported.
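A weaker but illustrative version of such a hierarchy check (the study itself used latent trait models) is to verify that the proportion of children passing each item is non-increasing along the hypothesized sequence. The item names and response data below are invented:

```python
# Hypothetical design-copying data: item name -> pass/fail for eight children.
# Items and responses are invented for illustration.
responses = {
    "vertical_line":  [1, 1, 1, 1, 1, 1, 1, 0],
    "square":         [1, 1, 1, 1, 1, 0, 0, 0],
    "oblique_cross":  [1, 1, 1, 0, 0, 0, 0, 0],
    "complex_figure": [1, 0, 0, 0, 0, 0, 0, 0],
}
hierarchy = ["vertical_line", "square", "oblique_cross", "complex_figure"]

def p_values(responses):
    """Proportion of children passing each item."""
    return {item: sum(xs) / len(xs) for item, xs in responses.items()}

def hierarchy_consistent(hierarchy, responses):
    """True if the proportion passing is non-increasing along the hypothesized sequence."""
    ps = p_values(responses)
    return all(ps[a] >= ps[b] for a, b in zip(hierarchy, hierarchy[1:]))
```

Latent trait models refine this idea by estimating item difficulties on a common scale and testing the ordering statistically rather than descriptively.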
