321
Parameter Recovery for the Four-Parameter Unidimensional Binary IRT Model: A Comparison of Marginal Maximum Likelihood and Markov Chain Monte Carlo Approaches
Do, Hoan, 26 May 2021
No description available.
322
Počítačové adaptivní testování v kinantropologii: Monte Carlo simulace s využitím physical self description questionnaire / Computerized Adaptive Testing in Kinanthropology: Monte Carlo Simulations Using the Physical Self-Description Questionnaire
Komarc, Martin, January 2017
This thesis introduces computerized adaptive testing (CAT), a novel and increasingly used method of test administration, as applied to the field of Kinanthropology. By adapting a test to an individual respondent's latent trait level, computerized adaptive testing offers numerous theoretical and methodological improvements that can significantly advance testing procedures. The first part of the thesis presents the theoretical and conceptual basis of CAT, together with a brief overview of its historical origins and basic general principles. The discussion necessarily covers Item Response Theory (IRT) to some extent, since IRT is almost exclusively used as the mathematical model in today's CAT applications. The practical application of CAT is then evaluated using Monte Carlo simulations of adaptive administration of the Physical Self-Description Questionnaire (PSDQ) (Marsh, Richards, Johnson, Roche, & Tremayne, 1994), an instrument widely used to assess physical self-concept in the field of sport and exercise psychology. The Monte Carlo simulation of the PSDQ adaptive administration used a real item pool (N = 70) calibrated with a Graded Response Model (GRM; see Samejima, 1969, 1997). The responses to test items were generated based on item...
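To make the simulation design concrete, here is a minimal Python sketch of one adaptive administration under a graded response model. The item pool is randomly generated (the calibrated PSDQ parameters are not reproduced in the abstract), and maximum-information selection with EAP scoring is one common design choice among several; this is illustrative, not the thesis's actual simulation code.

```python
import numpy as np

rng = np.random.default_rng(42)

def grm_probs(theta, a, b):
    """Category probabilities under Samejima's Graded Response Model.
    a: discrimination (scalar); b: ordered thresholds, shape (m - 1,)."""
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # P(X >= k), k = 1..m-1
    p_star = np.concatenate(([1.0], p_star, [0.0]))   # pad P(X >= 0), P(X >= m)
    return p_star[:-1] - p_star[1:]                   # P(X = k), k = 0..m-1

def item_info(theta, a, b):
    """Fisher information of one GRM item at theta, by central differences."""
    eps = 1e-4
    dp = (grm_probs(theta + eps, a, b) - grm_probs(theta - eps, a, b)) / (2 * eps)
    return np.sum(dp ** 2 / grm_probs(theta, a, b))

# Hypothetical pool of 70 six-category items (not the calibrated PSDQ values).
n_items, n_cats = 70, 6
a_pool = rng.uniform(0.8, 2.5, n_items)
b_pool = np.sort(rng.normal(0.0, 1.0, (n_items, n_cats - 1)), axis=1)

def simulate_cat(theta_true, test_length=15):
    """One adaptive administration: maximum-information selection, EAP scoring."""
    grid = np.linspace(-4, 4, 81)
    posterior = np.exp(-0.5 * grid ** 2)              # standard-normal prior
    used, theta_hat = [], 0.0
    for _ in range(test_length):
        info = [item_info(theta_hat, a_pool[j], b_pool[j]) if j not in used
                else -np.inf for j in range(n_items)]
        j = int(np.argmax(info))
        used.append(j)
        # Generate a response from the true theta, then update the posterior.
        x = rng.choice(n_cats, p=grm_probs(theta_true, a_pool[j], b_pool[j]))
        posterior *= [grm_probs(t, a_pool[j], b_pool[j])[x] for t in grid]
        theta_hat = float(np.sum(grid * posterior) / np.sum(posterior))  # EAP
    return theta_hat

print(simulate_cat(theta_true=1.2))  # the estimate should land near 1.2
```

Repeating `simulate_cat` over a grid of true trait values is the usual way such studies evaluate how well a short adaptive test recovers full-length scores.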
323
From OLS to Multilevel Multidimensional Mixture IRT: A Model Refinement Approach to Investigating Patterns of Relationships in PISA 2012 Data
Gurkan, Gulsah, January 2021
Thesis advisor: Henry I. Braun
Secondary analyses of international large-scale assessments (ILSA) commonly characterize relationships between variables of interest using correlations. However, the accuracy of correlation estimates is impaired by artefacts such as measurement error and clustering. Despite advancements in methodology, conventional correlation estimates, or statistical models that do not address this problem, are still commonly used when analyzing ILSA data. This dissertation examines the impact of both the clustered nature of the data and heterogeneous measurement error on the correlations reported between background data and proficiency scales across countries participating in ILSA. To this end, the operating characteristics of competing modeling techniques are explored by means of applications to data from PISA 2012. Specifically, the estimates of correlations between math self-efficacy and math achievement across countries are the principal focus of this study. Sequentially employing four different statistical techniques, a step-wise model refinement approach is used. After each step, the changes in the within-country correlation estimates are examined in relation to (i) the heterogeneity of distributions, (ii) the amount of measurement error, (iii) the degree of clustering, and (iv) country-level math performance. The results show that correlation estimates gathered from two-dimensional IRT models are more similar across countries than conventional and multilevel linear modeling estimates. The strength of the relationship between math proficiency and math self-efficacy is moderated by country mean math proficiency, a finding consistent across all four models even when measurement error and clustering were taken into account. Multilevel multidimensional mixture IRT modeling results support the hypothesis that low-performing groups within countries have a lower correlation between math self-efficacy and math proficiency. A weaker association between math self-efficacy and math proficiency in lower-achieving groups is consistently seen across countries. A multilevel mixture IRT modeling approach sheds light on how this pattern emerges from greater randomness in the responses of lower-performing groups. The findings from this study demonstrate that advanced modeling techniques are not only more appropriate given the characteristics of the data but also provide greater insight into the patterns of relationships across countries.
Thesis (PhD), Boston College, 2021. Submitted to: Boston College, Lynch School of Education. Discipline: Educational Research, Measurement and Evaluation.
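As a small illustration of why measurement error depresses such correlations, the classical Spearman correction for attenuation can be sketched as below. The numbers are invented, not PISA estimates, and the dissertation's latent-variable models handle the problem jointly rather than through this post hoc correction.

```python
import numpy as np

def disattenuate(r_observed, rel_x, rel_y):
    """Spearman's correction for attenuation: the correlation between true
    scores implied by an observed correlation and the two reliabilities."""
    return r_observed / np.sqrt(rel_x * rel_y)

# Illustrative values: an observed r of .45 between a self-efficacy scale
# with reliability .80 and a proficiency scale with reliability .90.
print(disattenuate(0.45, 0.80, 0.90))  # about 0.53
```

Because reliabilities differ across countries, the attenuation differs too, which is one reason conventional correlation estimates are not comparable across countries.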
324
Identifying Unbiased Items for Screening Preschoolers for Disruptive Behavior Problems
Studts, Christina R., Polaha, Jodi, van Zyl, Michiel A., 25 October 2016
Objective: Efficient identification and referral to behavioral services are crucial in addressing early-onset disruptive behavior problems. Existing screening instruments for preschoolers are not ideal for pediatric primary care settings serving diverse populations. Eighteen candidate items for a new brief screening instrument were examined to identify those exhibiting measurement bias (i.e., differential item functioning, DIF) by child characteristics. Method: Parents/guardians of preschool-aged children (N = 900) from four primary care settings completed two full-length behavioral rating scales. Items measuring disruptive behavior problems were tested for DIF by child race, sex, and socioeconomic status using two approaches: item response theory-based likelihood ratio tests and ordinal logistic regression. Results: Of 18 items, eight were identified with statistically significant DIF by at least one method. Conclusions: The bias observed in 8 of 18 items made them undesirable for screening diverse populations of children. These items were excluded from the new brief screening tool.
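A hedged sketch of the ordinal logistic regression side of the DIF testing, in the spirit of a likelihood-ratio comparison of nested models, might look like the following. The data are simulated stand-ins, only uniform DIF is tested here, and statsmodels 0.13+ is assumed for OrderedModel.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)

# Simulated stand-ins for one candidate item: 900 children, a matching
# variable (rest-score), a binary group flag, and a 4-category ordinal
# response with mild uniform DIF built in.
n = 900
rest_score = rng.normal(0, 1, n)
group = rng.integers(0, 2, n)
latent = 1.2 * rest_score + 0.4 * group + rng.logistic(0, 1, n)
item = pd.Series(np.digitize(latent, [-1.0, 0.5, 2.0])).astype(
    pd.CategoricalDtype(ordered=True))

def loglik(exog):
    """Log-likelihood of an ordinal (proportional-odds) logistic model."""
    return OrderedModel(item, exog, distr='logit').fit(
        method='bfgs', disp=False).llf

# Nested models: matching variable only vs. matching variable + group.
ll_reduced = loglik(rest_score[:, None])
ll_full = loglik(np.column_stack([rest_score, group]))

lr = 2 * (ll_full - ll_reduced)       # likelihood-ratio statistic
p = stats.chi2.sf(lr, df=1)           # one extra parameter (uniform DIF)
print(f"LR = {lr:.2f}, p = {p:.4f}")  # non-uniform DIF adds a group x score term
```

A significant group effect after conditioning on the matching variable is the signal that an item functions differently across groups, which is what led to the exclusion of the eight flagged items.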
325
Investigating How Equating Guidelines for Screening and Selecting Common Items Apply When Creating Vertically Scaled Elementary Mathematics Tests
Hardy, Maria Assunta, 09 December 2011
Guidelines for screening and selecting common items for vertical scaling have been adopted from equating. Differences between vertical scaling and equating suggest that these guidelines may not apply to vertical scaling in the same way that they apply to equating. For example, in equating the examinee groups are assumed to be randomly equivalent, whereas in vertical scaling the examinee groups are assumed to possess different levels of proficiency. Equating studies that examined the characteristics of the common-item set stress the importance of careful item selection, particularly when groups differ in ability level. Since cross-level ability differences are expected in vertical scaling, the common items' psychometric characteristics become even more important for a correct interpretation of students' academic growth. This dissertation applied two screening criteria and two selection approaches to investigate how changes in the composition of the linking sets affected the nature of students' growth when creating vertical scales for two elementary mathematics tests. The purpose was to observe how well these equating guidelines transfer to the context of vertical scaling. Two separate datasets were analyzed to observe the impact of manipulating the common items' content area and targeted curricular grade level. The same Rasch scaling method was applied for all variations of the linking set. Both the robust z procedure and a variant of the 0.3-logit difference procedure were used to screen unstable common items from the linking sets. (In vertical scaling, a directional item-difficulty difference must be computed for the 0.3-logit difference procedure.) Different combinations of stable common items were selected to make up the linking sets. The mean/mean method was used to compute the equating constant and linearly transform the students' test scores onto the base scale. A total of 36 vertical scales were created. The results indicated that, although the robust z procedure was a more conservative approach to flagging unstable items, the robust z and the 0.3-logit difference procedures produced similar interpretations of students' growth. The results also suggested that the choice of grade-level-targeted common items affected the estimates of students' grade-to-grade growth, whereas the results regarding the choice of content-area-specific common items were inconsistent: the findings from the Geometry and Measurement dataset indicated that this choice had an impact on the interpretation of students' growth, while the findings from the Algebra and Data Analysis/Probability dataset indicated that it did not appear to significantly affect students' growth. A discussion of the limitations of the study and possible future research is presented.
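To illustrate the screening and linking steps, here is a minimal sketch of the robust z procedure, a simplified variant of the 0.3-logit rule, and mean/mean linking on a Rasch scale. The difficulties are invented, and the cutoffs used (|z| > 1.645; 0.3 logits around the median shift) are one common convention; the dissertation's exact variants may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Rasch difficulties of 20 common items estimated separately in
# two adjacent grades (values illustrative, not from the dissertation).
b_lower = rng.normal(0.0, 1.0, 20)
b_upper = b_lower - 0.6 + rng.normal(0.0, 0.1, 20)  # items look easier one grade up
b_upper[3] += 0.9                                   # plant one unstable (drifting) item

d = b_upper - b_lower                               # directional difficulty differences

# Robust z screening: center at the median, scale by IQR/1.349.
q1, q3 = np.percentile(d, [25, 75])
robust_z = (d - np.median(d)) / ((q3 - q1) / 1.349)
flag_z = np.abs(robust_z) > 1.645

# Simplified 0.3-logit rule: deviation from the median shift beyond 0.3 logits.
flag_logit = np.abs(d - np.median(d)) > 0.3

stable = ~(flag_z | flag_logit)

# Mean/mean linking with the surviving items: on a Rasch scale the equating
# constant is the mean difficulty difference, added to upper-grade estimates.
constant = np.mean(b_lower[stable]) - np.mean(b_upper[stable])
print(f"flagged: {np.where(~stable)[0].tolist()}, constant = {constant:.3f}")
```

Varying which stable items enter the final linking set, as the dissertation does by content area and grade-level target, changes the constant and hence the apparent grade-to-grade growth.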
326
The Development and Validation of a Spanish Elicited Imitation Test of Oral Language Proficiency for the Missionary Training Center
Thompson, Carrie A., 05 June 2013
The Missionary Training Center (MTC), affiliated with the Church of Jesus Christ of Latter-day Saints, needs a reliable and cost-effective way to measure the oral language proficiency of missionaries learning Spanish: it must assess incoming missionaries' Spanish proficiency for training and classroom assignment, as well as provide exit measures of institutional progress. Oral proficiency interviews and semi-direct assessments require highly trained raters, which is costly and time-consuming. The Elicited Imitation (EI) test is a computerized, automated test that measures oral language proficiency by having the participant hear and repeat utterances of varying syllable length in the target language. It is economical and simple to administer and rate. This dissertation outlines the process of creating and scoring an EI test for the MTC. Item Response Theory (IRT) was used to analyze a large bank of EI items, and the 43 best-performing items make up the final version of the MTC Spanish EI test. Questions about which linguistic features (syllable length, grammatical difficulty) contribute to item difficulty were also addressed: regression analysis showed that syllable length predicted item difficulty, whereas grammatical difficulty did not.
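A sketch of the item-difficulty regression described in the last sentence follows. The item features and difficulties are simulated to mirror the reported pattern (syllable length matters, the grammar rating does not) and are not the actual MTC item statistics.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)

# Simulated stand-ins for the 43 EI items: IRT difficulty regressed on
# syllable length and a grammar-difficulty rating.
n_items = 43
syllables = rng.integers(6, 20, n_items)
grammar = rng.integers(1, 4, n_items)                        # 1 = easy .. 3 = hard
difficulty = 0.15 * syllables + rng.normal(0, 0.5, n_items)  # grammar plays no role

X = sm.add_constant(np.column_stack([syllables, grammar]).astype(float))
fit = sm.OLS(difficulty, X).fit()
print(fit.summary(xname=["const", "syllables", "grammar"]))
# Expected: a significant syllable-length slope, a nonsignificant grammar slope.
```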
327
Lasso Regularization for DIF Detection in Graded Response Models
Avila Alejo, Denisse
Previous research has tested the lasso method for DIF detection in dichotomous items, but limited research is available on this technique for polytomous items. This simulation study compares the lasso method to hybrid ordinal logistic regression, testing performance in terms of true positive (TP) and false positive (FP) rates while varying sample size, test length, number of response categories, group balance, DIF proportion, and DIF magnitude. Results showed that the lasso provided better Type I error control with smaller sample sizes, unbalanced groups, and weak DIF. The lasso also exhibited more stable Type I error control when DIF was weak and groups were unbalanced. Lastly, a low DIF proportion contributed to better Type I error control and higher TP rates for both methods.
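For intuition about how the lasso flags DIF, here is a stripped-down sketch that penalizes only the group-related coefficients of a logistic DIF model via proximal gradient descent. It dichotomizes the response for simplicity, whereas the study works with graded (polytomous) responses, and the data and lambda grid are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated dichotomized responses for a single item; the penalization logic
# carries over to the polytomous (graded) case studied in the thesis.
n = 1000
score = rng.normal(0, 1, n)                    # matching criterion (e.g., rest-score)
group = rng.integers(0, 2, n).astype(float)
eta = -0.2 + 1.0 * score + 0.5 * group         # true uniform DIF of 0.5 logits
y = (rng.random(n) < 1 / (1 + np.exp(-eta))).astype(float)

X = np.column_stack([np.ones(n), score, group, score * group])
pen_mask = np.array([False, False, True, True])  # L1 penalty only on DIF terms

def lasso_logistic(X, y, lam, pen_mask, steps=5000, lr=0.1):
    """Proximal gradient (ISTA): logistic loss plus L1 on selected coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ beta))
        beta -= lr * (X.T @ (p - y) / len(y))    # gradient step on the smooth loss
        beta[pen_mask] = np.sign(beta[pen_mask]) * np.maximum(
            np.abs(beta[pen_mask]) - lr * lam, 0.0)  # soft-threshold DIF terms
    return beta

for lam in (0.0, 0.02, 0.2):
    b = lasso_logistic(X, y, lam, pen_mask)
    print(f"lambda={lam:4.2f}  group={b[2]: .3f}  interaction={b[3]: .3f}")
# An item is flagged for DIF when its group/interaction coefficients survive
# the penalty (remain nonzero) at a lambda tuned by, e.g., BIC or CV.
```

Unlike per-item significance tests, the penalty shrinks spurious group effects to exactly zero, which is the mechanism behind the lasso's tighter Type I error control in small or unbalanced samples.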
328
Workplace Social Courage in the United States and India: A Measurement Invariance Study
Sturgis, Grayson D., January 2022
No description available.
329
Respondent and Test Delivery Characteristics that Induce Item Unfolding
Lake, Christopher J., 13 October 2010
No description available.
330
Bayesian Model Checking Methods for Dichotomous Item Response Theory and Testlet Models
Combs, Adam, 02 April 2014
No description available.