1 |
An empirical comparison of item response theory and classical test theory item/person statistics
Courville, Troy Gerard (15 November 2004)
In the theory of measurement, there are two competing measurement frameworks, classical test theory and item response theory. The present study empirically examined, using large scale norm-referenced data, how the item and person statistics behaved under the two competing measurement frameworks. The study focused on two central themes: (1) How comparable are the item and person statistics derived from the item response and classical test framework? (2) How invariant are the item statistics from each measurement framework across examinee samples? The findings indicate that, in a variety of conditions, the two measurement frameworks produce similar item and person statistics. Furthermore, although proponents of item response theory have centered their arguments for its use on the property of invariance, classical test theory statistics, for this sample, are just as invariant.
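The classical test theory item statistics compared in this study can be illustrated with a minimal sketch. It assumes dichotomously scored (0/1) responses; the function names, the toy response matrix, and the choice of a corrected item-total correlation as the discrimination index are illustrative assumptions, not taken from the thesis.

```python
from statistics import mean, pstdev

def item_difficulty(responses, item):
    """CTT difficulty (p-value): proportion of examinees answering the item correctly."""
    return mean(row[item] for row in responses)

def item_discrimination(responses, item):
    """Corrected item-total correlation: the item score correlated with
    the total score on the remaining items (Pearson r on 0/1 data)."""
    scores = [row[item] for row in responses]
    rest = [sum(row) - row[item] for row in responses]
    ms, mr = mean(scores), mean(rest)
    cov = mean((s - ms) * (r - mr) for s, r in zip(scores, rest))
    sd = pstdev(scores) * pstdev(rest)
    return cov / sd if sd else float("nan")

# Hypothetical data: rows are examinees, columns are items (1 = correct).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

for i in range(3):
    print(i, item_difficulty(responses, i), round(item_discrimination(responses, i), 3))
```

Both statistics are computed from one particular sample of examinees, which is why their invariance across samples is an empirical question of the kind the study examines.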
|
3 |
REVIEW AND EVALUATION OF RELIABILITY GENERALIZATION RESEARCH
Henchy, Alexandra Marie (01 January 2013)
Reliability Generalization (RG) is a meta-analytic method that examines the sources of measurement error variance in scores across multiple studies that use a certain instrument or group of instruments measuring the same construct (Vacha-Haase, Henson, & Caruso, 2002). Researchers have been conducting RG studies for over 10 years since the method was first discussed by Vacha-Haase (1998). Henson and Thompson (2002) noted that, because RG is not a monolithic technique, researchers can conduct RG studies in a variety of ways and include diverse variables in their analyses. Differing recommendations exist regarding how researchers should retrieve, code, and analyze information when conducting RG studies, and these differences can affect the conclusions drawn from meta-analytic studies (Schmidt, Oh, & Hayes, 2009) like RG. The present study is the first comprehensive review of both current RG practices and RG recommendations. Based upon the findings of prior meta-analytic review papers (e.g., Dieckmann, Malle, & Bodner, 2009), the overarching hypothesis was that there would be differences between current RG practices and best practice recommendations made for RG studies.
Data consisted of 64 applied RG studies and recommendation papers, book chapters, and unpublished papers/conference papers. The characteristics that were examined included how RG researchers: (a) collected studies, (b) organized studies, (c) coded studies, (d) analyzed their data, and (e) reported their results.
The results showed that although applied RG researchers followed some of the recommendations (e.g., RG researchers examined sample characteristics that influenced reliability estimates), there were some recommendations that RG researchers did not follow (e.g., the majority of researchers did not conduct an a priori power analysis). The results can draw RG researchers' attention to areas where there is a disconnect between practice and recommendations, as well as provide a benchmark for assessing future improvement in RG implementation.
|
4 |
Breaking Free from the Limitations of Classical Test Theory: Developing and Measuring Information Systems Scales Using Item Response Theory
Rusch, Thomas, Lowry, Paul Benjamin, Mair, Patrick, Treiblmaier, Horst (03 1900)
Information systems (IS) research frequently uses survey data to measure the interplay between technological systems and human beings. Researchers have developed sophisticated procedures to build and validate multi-item scales that measure latent constructs. The vast majority of IS studies use classical test theory (CTT), but this approach suffers from three major theoretical shortcomings: (1) it assumes a linear relationship between the latent variable and observed scores, which rarely represents the empirical reality of behavioral constructs; (2) the true score either cannot be estimated directly or can be estimated only under assumptions that are difficult to meet; and (3) parameters such as reliability, discrimination, location, or factor loadings depend on the sample being used. To address these issues, we present item response theory (IRT) as a collection of viable alternatives for measuring continuous latent variables by means of categorical indicators (i.e., measurement variables). IRT offers several advantages: (1) it assumes nonlinear relationships; (2) it allows more appropriate estimation of the true score; (3) it can estimate item parameters independently of the sample being used; (4) it allows the researcher to select items that are in accordance with a desired model; and (5) it applies and generalizes concepts such as reliability and internal consistency, and thus allows researchers to derive more information about the measurement process. We use a CTT approach as well as Rasch models (a special class of IRT models) to demonstrate how a scale for measuring hedonic aspects of websites is developed under both approaches. The results illustrate how IRT can be successfully applied in IS research and provide better scale results than CTT. We conclude by explaining the most appropriate circumstances for applying IRT, as well as the limitations of IRT.
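The nonlinear relationship between a latent variable and an observed response can be sketched with the dichotomous Rasch model, in which the probability of endorsing an item is a logistic function of the difference between the person parameter and the item location. The function name and the example values below are illustrative assumptions; the paper itself may use polytomous Rasch-family models rather than this simplest case.

```python
import math

def rasch_probability(theta, b):
    """Dichotomous Rasch model: P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b)),
    where theta is the person parameter and b is the item location (difficulty)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical item located at b = 0.5; the response probability follows
# an S-shaped curve in theta rather than a straight line.
for theta in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0):
    print(theta, round(rasch_probability(theta, 0.5), 3))
```

When the person parameter equals the item location, the probability is exactly 0.5, which is what gives the location parameter its interpretation as item difficulty.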
|
5 |
Health Knowledge & Health Behavior Outcomes in Adolescents with Elevated Blood Pressure
Fitzpatrick, Stephanie L (24 May 2011)
The purpose of this current study was to examine the influence of cardiovascular health knowledge on dietary and physical activity changes in 15-17 year olds with elevated blood pressure. The sample consisted of 167 adolescents randomized into one of three treatment conditions (minimal, moderate, or intense). Each adolescent completed a fitness test (peak VO2), 24-hour dietary recall, 7 Day Activity Recall (kilocalories expended per day), Self-efficacy Questionnaire, and Stages of Change Questionnaire every three months. The Health Knowledge Assessment was given at baseline and at post-intervention. Classical test theory, confirmatory factor analysis, and item response theory frameworks were applied to examine psychometric properties of the Health Knowledge Assessment. Structural equation modeling was used to examine the change in health behaviors and the relationship with health knowledge, self-efficacy, and readiness for change. The 34-item Health Knowledge Assessment had good internal consistency and the items loaded onto a single factor at pretest and posttest. Furthermore, there was a good distribution of easy, moderate, and hard items at pretest, but additional hard items were needed at posttest. There were no treatment condition differences in level of health knowledge at pretest. The intense condition had significantly higher health knowledge than the minimal and moderate conditions at posttest; level of health knowledge for the moderate condition was significantly higher than the minimal condition at posttest. Level of nutrition knowledge at posttest was not associated with any of the dietary intake variables nor was level of exercise knowledge associated with the two physical activity variables at post-intervention. However, there was a marginally significant association between level of nutrition knowledge and nutrition self-efficacy at posttest. 
Nutrition self-efficacy and nutrition readiness for change at posttest were also associated with a decrease in sugar consumption at post-intervention. Implications of this study suggest that a cardiovascular health intervention for adolescents with elevated blood pressure, consisting of group sessions and/or individual sessions over the course of three to six months, was effective in terms of increasing cardiovascular health knowledge, self-efficacy, and readiness for change. Nonetheless, the role that health knowledge plays in health behavior change needs to be further examined.
|
6 |
Partial Credit Models for Scale Construction in Hedonic Information Systems
Mair, Patrick, Treiblmaier, Horst (January 2008)
Information Systems (IS) research frequently uses survey data to measure the interplay between technological systems and human beings. Researchers have developed sophisticated procedures to build and validate multi-item scales that measure real world phenomena (latent constructs). Most studies use the so-called classical test theory (CTT), which suffers from several shortcomings. We first compare CTT to Item Response Theory (IRT) and subsequently apply a Rasch model approach to measure hedonic aspects of websites. The results not only show which attributes are best suited for scaling hedonic information systems, but also introduce IRT as a viable substitute that overcomes several shortcomings of CTT. (author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics
|
7 |
Adapting and Validating a Parent-Completed Assessment: A Cross-Cultural Study of the Ages & Stages Questionnaires: INVENTORY in China and the United States
Xie, Huichao (21 November 2016)
The Chinese government has announced the 2013 Guidelines for developing a national system for early detection of disability among children under 6 years of age. However, given limited resources, challenges exist with the developmental measures required in the 2013 Guidelines. To meet the need for a more accurate and cost-efficient measure for developmental assessment, the Ages & Stages Questionnaires: INVENTORY was translated into Simplified Chinese and validated on a regional sample of 812 Chinese children aged 1-25 months. Psychometric properties were examined, and data from previous studies on the ASQ:INVENTORY in the U.S. were compared to identify differences between the two countries. Results indicated that the Chinese ASQ:INVENTORY was an instrument with sufficient internal consistency, reliability, and validity. It was well accepted by parents and professionals in China. Findings suggested that the Chinese ASQ:INVENTORY provides a promising alternative measure for screening and diagnosing developmental delays in young children in China. Implications for future research and implementation are discussed.
|
8 |
Psychometric Methods to Develop and to Analyze Clinical Measures: A Comparison and Contrast of Rasch Analysis and Classical Test Theory Analysis of the PedsQL 4.0 Generic Core Scales (Parent-report) in a Childhood Cancer Sample
Amin, Leila (10 1900)
Traditionally, measures have been developed using Classical Test Theory (CTT). Modern psychometric methods (e.g. Rasch analysis) are being applied to increase understanding of item-level statistics and to aid in interpreting rating scale scores. This thesis aims to compare and contrast psychometric findings for the PedsQL™ 4.0 Generic Core Scales using CTT and Rasch analysis to determine if a Rasch approach provides information that furthers our understanding of scale scores. The assumptions, advantages and limitations of each psychometric paradigm are presented.
Issues that arise when measuring quality of life are discussed to set the stage for a psychometric analysis of the PedsQL™ in a childhood cancer sample. The PedsQL™ measures child health in terms of physical, social, emotional and school function. The parent-report version was used in a Canadian study of 385 parents of children aged 2 to 17 years on active cancer treatment, and the data were re-analyzed for this thesis. CTT analysis was performed using PASW Statistics and Rasch analysis was performed using RUMM2030.
Internal consistency reliability was higher using CTT (α = 0.93) than Rasch analysis (Person Separation Index = 0.78). Rasch analysis item curves showed that respondents did not discriminate between response categories and that a 3-point scale (vs. 5-point) was preferred. Item curves also indicated most items were free of bias. There are no equivalent visual representations in CTT of how respondents use response categories or of whether items display bias. Both approaches indicate a large ceiling effect associated with the overall score.
Results challenge the internal consistency reliability of the PedsQL™ 4.0. Rasch analysis permits detailed and visually pleasing examination of item-level statistics more effectively than CTT. Research is needed to determine which testing circumstances render Rasch analysis useful and justify the time and resources needed to use both paradigms as complementary tools to maximize understanding of rating scale scores. / Master of Science Rehabilitation Science (MSc)
|
9 |
Evaluating IRT- and CTT-based Methods of Estimating Classification Consistency and Accuracy Indices from Single Administrations
Deng, Nina (01 September 2011)
Three decision consistency and accuracy (DC/DA) methods, the Livingston and Lewis (LL) method, the LEE method, and the Hambleton and Han (HH) method, were evaluated. The purposes of the study were (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied, (2) to investigate the "true" DC/DA indices in various conditions, and (3) to assess the impact of the choice of reliability estimate on the LL method.
Four simulation studies were conducted. Study 1 looked at various test lengths. Study 2 focused on local item dependency (LID). Study 3 checked the consequences of IRT model data misfit and Study 4 checked the impact of using different scoring metrics. Finally, a real data study was conducted where no advantages were given to any models or assumptions.
The results showed that LID and model misfit had a negative impact on the "true" DA index and made all selected methods overestimate the DA index. In contrast, the DC estimates were minimally affected by these factors, although the LL method had poorer estimates in short tests and the LEE and HH methods were less robust to tests with a high level of LID.
Comparing the selected methods, the LEE and HH methods had nearly identical results across all conditions, while the HH method had more flexibility in complex scoring metrics. The LL method was found sensitive to the choice of test reliability estimate. The LL method with Cronbach's alpha consistently underestimated DC estimates while LL with stratified alpha functioned noticeably better with smaller bias and more robustness in various conditions.
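The two reliability estimates contrasted above can be sketched as follows. This is a minimal illustration using the standard textbook formulas for Cronbach's alpha and stratified alpha, not the implementation used in the dissertation; the function names, the toy score matrix, and the assignment of items to strata are all assumptions.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(rows[0])
    item_vars = sum(pvariance([r[i] for r in rows]) for i in range(k))
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_vars / total_var)

def stratified_alpha(rows, strata):
    """Stratified alpha: 1 - sum_j var(subscore_j) * (1 - alpha_j) / total-score variance,
    where each stratum j is a list of item column indices."""
    total_var = pvariance([sum(r) for r in rows])
    penalty = 0.0
    for cols in strata:
        sub = [[r[c] for c in cols] for r in rows]
        sub_var = pvariance([sum(s) for s in sub])
        penalty += sub_var * (1 - cronbach_alpha(sub))
    return 1 - penalty / total_var

# Hypothetical data: 5 examinees, 4 items; items 0-1 and items 2-3 are
# assumed to form two content strata.
rows = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]

print(round(cronbach_alpha(rows), 3))
print(round(stratified_alpha(rows, [[0, 1], [2, 3]]), 3))
```

In this toy example the items cohere more strongly within strata than across them, so stratified alpha comes out higher than Cronbach's alpha. That is the kind of situation in which the choice of reliability estimate would matter for a method that takes reliability as an input.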
Lastly, it is hoped that the software will be made available soon to permit wider use of the HH method. The other methods in the study are already well supported by easy-to-use software.
|
10 |
A Statistical Analysis of Testlets
施焱騰 (Unknown Date)
Working within the framework of classical test theory, we model a testlet with an appropriate probability structure and investigate the computational formulas for the difficulty index and the discrimination index. English test items from the second Basic Competence Test for junior high school students in 2007 are used for empirical verification, and the results are compared with those obtained by the traditional method.
|