About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

21

Applying Longitudinal IRT Models to Small Samples for Scale Evaluation

Keum, EunHee 09 August 2016
No description available.
22

How to Score Situational Judgment Tests: A Theoretical Approach and Empirical Test

Whelpley, Christopher E. 01 January 2014
The purpose of this dissertation is to examine how the method used to score a situational judgment test (SJT) affects the validity of the SJT, both in the presence of other predictors and as a single predictor of task performance. To this end, I compared the summed score approach of scoring SJTs with item response theory (IRT) and multidimensional item response theory (MIRT). Using two samples and three sets of analyses, I found that the method used to score SJTs influences the validity of the test and that IRT and MIRT show promise for increasing SJT validity. However, no individual scoring method produced the highest validity across all sets of analyses. In line with previous research, SJTs added incremental validity in the presence of general mental ability (GMA) and personality and, again, the method used to score the SJT affected the incremental validity. A relative weights analysis was performed for each scoring method across all the sets of analyses, showing that, depending on the scoring method, the SJT score may account for more criterion variance than either GMA or personality. However, it is likely that the samples were influenced by range restriction present in the incumbent samples.
23

Multidimensional item response theory observed score equating methods for mixed-format tests

Peterson, Jaime Leigh 01 July 2014
The purpose of this study was to build upon the existing multidimensional item response theory (MIRT) equating literature by introducing a full MIRT observed score equating method for mixed-format exams, because no such methods currently exist. At this time, the MIRT equating literature is limited to full MIRT observed score equating methods for multiple-choice-only exams and Bifactor observed score equating methods for mixed-format exams. Given the high frequency with which mixed-format exams are used and the accumulating evidence that some tests are not purely unidimensional, it was important to present a full MIRT equating method for mixed-format tests. The performance of the full MIRT observed score method was compared with the traditional equipercentile method, the unidimensional IRT (UIRT) observed score method, and the Bifactor observed score method. With the Bifactor methods, group-specific factors were defined according to item format or content subdomain. With the full MIRT methods, two- and four-dimensional models were included, and correlations between latent abilities were either freely estimated or set to zero. All equating procedures were carried out using three end-of-course exams: Chemistry, Spanish Language, and English Language and Composition. For all subjects, two separate datasets were created using pseudo-groups in order to have two separate equating criteria. The specific equating criteria that served as baselines for comparisons with all other methods were the theoretical Identity and the traditional equipercentile procedures. Several important conclusions were made. In general, the multidimensional methods were found to perform better for datasets that evidenced more multidimensionality, whereas unidimensional methods worked better for unidimensional datasets. In addition, the scale on which scores are reported influenced the comparative conclusions made among the studied methods.
For performance classifications, which are most important to examinees, there typically were not large discrepancies among the UIRT, Bifactor, and full MIRT methods. However, this study was limited by its sole reliance on real data which was not very multidimensional and for which the true equating relationship was not known. Therefore, plans for improvements, including the addition of a simulation study to introduce a variety of dimensional data structures, are also discussed.
24

Observed score and true score equating procedures for multidimensional item response theory

Brossman, Bradley Grant 01 May 2010
The purpose of this research was to develop observed score and true score equating procedures to be used in conjunction with the Multidimensional Item Response Theory (MIRT) framework. Currently, MIRT scale linking procedures exist to place item parameter estimates and ability estimates on the same scale after separate calibrations are conducted. These procedures account for indeterminacies in (1) translation, (2) dilation, (3) rotation, and (4) correlation. However, no procedures currently exist to equate number correct scores after parameter estimates are placed on the same scale. This research sought to fill this void in the current psychometric literature. Three equating procedures--two observed score procedures and one true score procedure--were created and described in detail. One observed score procedure was presented as a direct extension of unidimensional IRT observed score equating, and is referred to as the "Full MIRT Observed Score Equating Procedure." The true score procedure and the second observed score procedure incorporated the statistical definition of the "direction of best measurement" in an attempt to equate exams using unidimensional IRT (UIRT) equating principles. These procedures are referred to as the "Unidimensional Approximation of MIRT True Score Equating Procedure" and the "Unidimensional Approximation of MIRT Observed Score Equating Procedure," respectively. Three exams within the Iowa Test of Educational Development (ITED) Form A and Form B batteries were used to conduct UIRT observed score and true score equating, MIRT observed score and true score equating, and equipercentile equating. The equipercentile equating procedure was conducted for the purpose of comparison since this procedure does not explicitly violate the IRT assumption of unidimensionality. 
Results indicated that the MIRT equating procedures performed more similarly to the equipercentile equating procedure than did the UIRT equating procedures, presumably due to the violation of the unidimensionality assumption under the UIRT equating procedures. Future studies are expected to address how the MIRT procedures perform under varying levels of multidimensionality (weak, moderate, strong), varying frameworks of dimensionality (simple structure vs. complex structure), and varying numbers of dimensions, among other conditions.
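Observed score equating under IRT needs a model-implied distribution of number-correct scores. The abstract does not say how that distribution was computed; the sketch below assumes dichotomous items and uses the Lord-Wingersky recursion, the standard choice in unidimensional IRT observed score equating. The function name and the example probabilities are illustrative only.

```python
def number_correct_distribution(probs):
    """Distribution of number-correct scores at one fixed ability.

    probs[i] is the model-implied probability of answering item i
    correctly (ability is a scalar in UIRT, a vector in MIRT).
    Returns a list where entry x is P(number-correct score = x).
    """
    dist = [1.0]  # before any item is added, score 0 has probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for score, mass in enumerate(dist):
            new[score] += mass * (1.0 - p)  # item answered incorrectly
            new[score + 1] += mass * p      # item answered correctly
        dist = new
    return dist


# Two items, each answered correctly with probability 0.5.
example = number_correct_distribution([0.5, 0.5])
```

In a full procedure these conditional distributions would be averaged over the ability distribution for each form, and the two marginal score distributions would then be equated with equipercentile methods.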
25

Bringing Situational Judgement Tests to the 21st Century: Scoring of Situational Judgement Tests Using Item Response Theory

Ron, Tom Haim 19 November 2019
No description available.
26

RESPONSE INSTRUCTIONS AND FAKING ON SITUATIONAL JUDGMENT TESTS

Broadfoot, Alison A. 20 October 2006
No description available.
27

Validating hierarchical sequences in the design copying domain using latent trait models.

Burch, Melissa Price. January 1988
The present study was a systematic investigation of hierarchical skill sequences in the design copying domain. The factors associated with possible variations in task difficulty were delineated. Five hierarchies were developed to reflect variations in rule usage, the structuring of responses, presence of angles, spatial orientations, and stimulus complexity. Three hundred thirty-four subjects aged five through ten years were administered a 25-item design copying test. The data were analyzed using probabilistic models. Latent trait models were developed to test the hypothesized skill sequences. Each latent trait model was statistically compared to alternate models to arrive at a preferred model that would adequately represent the data. Results suggested that items with predictable difficulty levels can be developed in this domain based on an analysis of stimulus dimensions and the use of rules for task completion. The inclusion of visual cues to guide design copying assists accurate task completion. Implications of the current findings for facilitating the construction of tests that accurately provide information about children's skill levels were discussed. The presence of hierarchical skill sequences in a variety of ability domains was supported.
28

A Rasch analysis comparing two Swedish translations of a questionnaire measuring health-related quality of life

Kielén, Martina, Wallentinsson, Emma January 2016
During the 1980s, the non-profit organisation RAND Corporation conducted the two-year Medical Outcomes Study with the goal of creating a comprehensive medical questionnaire. The resulting 116-item questionnaire measures health-related quality of life (HRQoL) topics such as physical, mental and general health. The questionnaire is available as a free resource on their web page. SF-36, which contains 36 of these questions, is distributed for a fee by the US company Quality Metric Inc. The company has translated the questionnaire into several languages, including Swedish, and has also licensed the translations. Registercentrum sydost has made a new Swedish translation of the same questions as in the SF-36. This survey is called RAND-36 and is license-free. Because Quality Metric Inc holds the license for its Swedish translation, the surveys are similar but not identical. This study aims to compare the two HRQoL instruments to determine whether it is possible to replace the licensed questionnaire SF-36 with the license-free RAND-36. The distributions of items with ordinal response options were compared with the Mann-Whitney U test. The test yielded a significant difference for eight items in the measures PF (physical functioning), MH (mental health), VT (vitality) and GH (general health perceptions). The distributions of items with dichotomous response options were compared with the χ² test. The test yielded a significant difference for one item in the measure RE (emotional role functioning). The reliability of the questionnaires was compared using ordinal alpha. In the sample, the reliability of the two surveys is equivalent for MH and VT. The biggest difference between the surveys is in the measure RP (physical role functioning), where RAND-36 meets the requirement that the measure can be used for reliable conclusions on the individual level, a condition that SF-36 does not meet.
The probability of selecting a given response, conditional on the respondent's ability, was compared using Rasch analysis. Wald's test indicated differential item functioning (DIF) for most items within the measures PF, MH, VT and GH.
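For orientation, the idea behind a Rasch-based DIF comparison can be sketched with the dichotomous Rasch model, the simplest case. This is not the authors' analysis, which also covers ordinal items; the function name and all parameter values are hypothetical.

```python
import math


def rasch_probability(theta, difficulty):
    """P(endorsing an item) under the dichotomous Rasch model.

    theta is the respondent's latent trait level and difficulty is the
    item location, both on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# A respondent whose trait level equals the item location.
p_equal = rasch_probability(0.0, 0.0)

# DIF in miniature: the same respondent (theta = 0.5) faces the "same"
# item, but its estimated location differs between two questionnaire
# versions (hypothetical values 0.0 and 1.0).
p_version_a = rasch_probability(0.5, 0.0)
p_version_b = rasch_probability(0.5, 1.0)
```

When an item shows no DIF, its location is the same in both versions and the two probabilities coincide for every trait level. A Wald test asks whether the estimated difference in locations is larger than sampling error would explain.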
29

The Teacher Attitudes toward Homeless Students Scale: Development and Validation

Brown, Jessica January 2012
Thesis advisor: Larry H. Ludlow / Recent estimates suggest there are roughly 1.6 million homeless children and this number is growing (National Center on Family Homelessness, 2011). This trend is particularly worrisome given that homeless children face a number of obstacles within society and education, not the least of which is negative teacher attitudes (Swick, 2000; U.S. Department of Education, 2002). This study's primary research question addressed whether a set of underlying dimensions could be identified and used to effectively measure teacher attitudes toward homeless students. A necessary part of answering this research question involved the development of a measurement scale. Both Classical Test Theory and Item Response Theory analyses aided in the elimination process of items in order to create the final Teacher Attitudes toward Homeless Students (TAHS) assessment, which includes an attitudes scale and subscales, and a related knowledge scale. The final outcome was a set of 43 items, across eight dimensions, which could effectively be used to measure teacher attitudes toward homeless students. Additionally, the findings upheld the principles of Rasch measurement, including unidimensionality, a hierarchical ordering of items, and a continuum of the construct definition. In other words, the findings indicate that the TAHS scale was successfully developed according to explicit a priori measurement criteria. Moreover, additional correlational and regression analyses provided empirical construct and convergent validity evidence for the TAHS scale. It was also found that attitudes differed slightly for teachers of various backgrounds and experiences, but when analyzed collectively these variables were not significantly related to teacher attitudes toward homeless students. Additionally, there was only a weak relationship between teachers' attitudes and their knowledge about homelessness. 
Overall, the TAHS scale allows for reliable and accurate measurement of teacher attitudes toward homeless students, from which valid inferences can be made. The TAHS scale scores and score descriptors can be used to help teachers interpret their attitudes. This has the potential for a direct impact in creating equal educational opportunities for homeless students as teachers become aware of their attitudes and make positive changes. / Thesis (PhD)--Boston College, 2012. / Submitted to: Boston College. Lynch School of Education. / Discipline: Educational Research, Measurement, and Evaluation.
30

Stability and sensitivity of a model-based person-fit index in detecting item pre-knowledge in computerized adaptive test. / 特定模型個人擬合指數在探測預見題目時的穩定性及靈敏度 / CUHK electronic theses & dissertations collection / Te ding mo xing ge ren ni he zhi shu zai tan ce yu jian ti mu shi de wen ding xing ji ling min du

January 2008
After the stability and sensitivity of FLOR were investigated, its application in the CAT environment became the main concern. The present studies found that both the test length and the number of exposed items affect the final value of FLOR. In the fixed length CAT, FLOR has a much stronger sensitivity than lz and CUSUM in detecting item pre-knowledge. The sensitivity of FLOR in the fixed length CAT was the same as that in the fixed length, fixed items test. If the test length could vary, the sensitivity of FLOR in CAT would be slightly weakened. The Adjusted FLOR index could increase the sensitivity. Regarding the effect of ability on the sensitivity of FLOR in CAT, it was found that the abilities of the test takers in CAT did not affect the sensitivity of FLOR and Adjusted FLOR. / Item response theory is a modern test theory. It focuses on the performance of each item. Under this framework, the performance of test takers on a test item can be predicted by a set of abilities. The relationship between the test takers' item performances and the set of abilities underlying item performances can be described by a monotonically increasing function called an item characteristic curve. For various personal reasons, the performances of the test takers may depart from the response patterns predicted by the underlying test model. In order to calculate the extent of departure of these aberrant response patterns, a number of methods have been developed under the theme "person-fit statistics". The degree of aberration is calculated as an index called a person-fit index. In computerized adaptive testing (CAT), test takers with different abilities will answer different numbers of questions, and the difficulties of the items administered to them are usually clustered around the abilities of the test takers. For this reason, the application of person-fit indices in the computerized adaptive testing environment to measure misfit is difficult.
/ The present study also found that FLOR is far more sensitive than other indices in detecting item pre-knowledge. Regarding sensitivity across test takers of different abilities, it was found that the sensitivity of FLOR was the highest among low ability test takers and the weakest among high ability test takers in the fixed length and fixed items tests. However, the sensitivities of FLOR became the same among test takers of different abilities if items with difficulties matching their abilities were used in the tests. The number of beneficiaries among the test takers did not affect the sensitivity of FLOR. Moreover, in a simulation to test the differentiating power of FLOR, it was found that FLOR could effectively differentiate item pre-knowledge from other causes of person misfit (test anxiety, player, random response and challenger). / The present study assessed the stability of FLOR over other variables, which were unrelated to item pre-knowledge. It found that FLOR was stable over the discrimination and difficulty parameters of test items. It was also stable over positions of the exposed items in the test and the initial assignment of prior probability of item pre-knowledge. However, the asymptotes (guessing factor) and the probabilities of item exposure did seriously affect the final values of FLOR. / The present study used the hf plot to assess the sensitivity of the person-fit indices. The hf plot is a plot of hit rate against false alarm rate. A higher hit rate is usually accompanied by a higher false alarm rate. The hf plot provides a good tool for comparing indices by inspecting how quickly the curves rise. A sensitive index should give a faster rise of the curve. In this study, sensitivity of an index was defined as the speed of rise of the hf plot, which is represented by a parameter hftau estimated from the data obtained from the hf plot.
/ When frequent access to the item bank becomes feasible, test takers may memorize blocks of test items and share these items with future test takers. Individuals with prior knowledge of some items may use that information to obtain artificially inflated test scores. FLOR is an index of posterior log-odds ratio used for detecting the use of item pre-knowledge. It can be applied both in the fixed item, fixed length test and the CAT environment. It is a model-based index in which aberrant models are defined in the situation of item pre-knowledge. FLOR describes the likelihood that a response pattern arises from the aberrant models. / Hui Hing-fai. / Adviser: Kit-tai Hau. / Source: Dissertation Abstracts International, Volume: 70-09, Section: A, page: . / Thesis (Ed.D.)--Chinese University of Hong Kong, 2008. / Includes bibliographical references (leaves 108-111). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307.
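The hf plot described in this abstract pairs a hit rate with a false alarm rate at each cutoff of a person-fit index. The sketch below assumes that larger index values signal stronger suspicion of pre-knowledge. The function name, the cutoffs, and the toy index values are hypothetical and not taken from the thesis, and the hftau parameter is not estimated because the abstract does not define it.

```python
def hf_points(aberrant_scores, normal_scores, cutoffs):
    """Return (false_alarm_rate, hit_rate) pairs, one per cutoff.

    A test taker is flagged when the index is at or above the cutoff.
    Hits are flagged test takers who truly had pre-knowledge; false
    alarms are flagged test takers who did not.
    """
    points = []
    for cutoff in cutoffs:
        hits = sum(score >= cutoff for score in aberrant_scores)
        false_alarms = sum(score >= cutoff for score in normal_scores)
        points.append((false_alarms / len(normal_scores),
                       hits / len(aberrant_scores)))
    return points


# Toy index values for four test takers with pre-knowledge and four without.
points = hf_points(aberrant_scores=[2.0, 3.0, 4.0, 5.0],
                   normal_scores=[0.0, 1.0, 2.0, 3.0],
                   cutoffs=[4.0, 2.5, 0.0])
```

Lowering the cutoff moves along the curve from the lower left to the upper right. A more sensitive index reaches a high hit rate while the false alarm rate is still low, which is the "faster rise" the abstract refers to.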
