1

A comparison of equating/linking using the Stocking-Lord method and concurrent calibration with mixed-format tests in the non-equivalent groups common-item design under IRT

Tian, Feng January 2011 (has links)
Thesis advisor: Larry Ludlow / There has been a steady increase in the use of mixed-format tests, that is, tests consisting of both multiple-choice and constructed-response items, in both classroom and large-scale assessments. This calls for appropriate equating methods for such tests. As Item Response Theory (IRT) has become the mainstream theoretical basis for measurement, different equating methods under IRT have been developed. This study used simulated data to compare the performance of two IRT equating methods: linking following separate calibration (the Stocking-Lord method) and concurrent calibration. The findings show that concurrent calibration generally performs better in recovering item parameters and, more importantly, produces more accurate estimated scores than linking following separate calibration. Limitations and directions for future research are discussed. / Thesis (PhD) — Boston College, 2011. / Submitted to: Boston College. Lynch School of Education. / Discipline: Educational Research, Measurement, and Evaluation.
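The Stocking-Lord method mentioned above places the new form's item parameters on the old form's scale by finding the slope A and intercept B that best align the two forms' test characteristic curves over the common items. A minimal numpy sketch for dichotomous 2PL items follows; the grid search, search ranges, quadrature grid, and item values are illustrative assumptions (operational implementations use a numerical optimizer and also incorporate the constructed-response items' characteristic curves):

```python
import numpy as np

def p2pl(theta, a, b):
    # 2PL item response function evaluated over a grid of theta values
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))

def stocking_lord(a_new, b_new, a_old, b_old, quad=np.linspace(-4, 4, 41)):
    """Grid-search sketch of the Stocking-Lord criterion: find the slope A
    and intercept B minimizing the squared difference between the old
    form's test characteristic curve (TCC) and the TCC of the rescaled new
    form (a/A, A*b + B), summed over the quadrature grid."""
    tcc_old = p2pl(quad, a_old, b_old).sum(axis=1)
    best_A, best_B, best_loss = 1.0, 0.0, np.inf
    for A in np.linspace(0.5, 2.0, 151):        # illustrative search range
        for B in np.linspace(-1.5, 1.5, 151):
            tcc_new = p2pl(quad, a_new / A, A * b_new + B).sum(axis=1)
            loss = np.sum((tcc_old - tcc_new) ** 2)
            if loss < best_loss:
                best_A, best_B, best_loss = A, B, loss
    return best_A, best_B
```

When the common items on the two scales differ by a known linear transformation, the recovered (A, B) should reproduce that transformation.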
2

Investigating the impact of a mixed-format item pool on optimal test designs for multistage testing

Park, Ryoungsun 08 September 2015 (has links)
Multistage testing (MST) has drawn increasing attention as a balanced form of adaptive testing that combines the advantages of fully adaptive computerized adaptive testing (CAT) and paper-and-pencil (P&P) tests. Most previous studies of MST have focused on purely dichotomous or polytomous item formats, although mixing the two item types (i.e., a mixed format) provides desirable psychometric properties by combining the strengths of both. Given the dearth of studies investigating the characteristics of mixed-format MST, the current study conducted a simulation to identify important design factors affecting the measurement precision of mixed-format MST. The study considered several factors: total points (40 and 60), MST structures (1-2-2 and 1-3-3), the proportion of polytomous items (10%, 30%, 50%, and 70%), and the routing module design (purely dichotomous versus a mixture of dichotomous and polytomous items), resulting in 32 total conditions. A total of 100 replications were performed, and 1,000 normally distributed examinees were generated in each replication. The performance of MST was evaluated in terms of the precision of ability estimation across a wide range of the scale. The study found that the longer test produced greater measurement precision, and the 1-3-3 structure performed better than the 1-2-2 structure. In addition, a larger proportion of polytomous items resulted in lower measurement precision through reduced test information during test construction. An interaction between a large proportion of polytomous items and the purely dichotomous routing module design was also identified. Overall, the two factors of test length and MST structure affected the ability estimation, whereas the impact of the proportion of polytomous items and the routing module design mirrored the item pool characteristics. / text
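The routing module's role described above can be pictured with a toy sketch: after the first stage, a provisional ability estimate is compared to cutpoints to pick the second-stage module in a 1-3-3 design. The cutpoint values here are invented for illustration; real designs derive them from the item pool and exposure constraints:

```python
def route(theta_hat, cuts=(-0.5, 0.5)):
    """Sketch of MST routing: choose the easy, medium, or hard second-stage
    module from a provisional theta estimate. Cutpoints are assumed values
    for illustration only."""
    if theta_hat < cuts[0]:
        return "easy"
    if theta_hat < cuts[1]:
        return "medium"
    return "hard"
```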
3

The impact of equating method and format representation of common items on the adequacy of mixed-format test equating using nonequivalent groups

Hagge, Sarah Lynn 01 July 2010 (has links)
Mixed-format tests containing both multiple-choice and constructed-response items are widely used in educational testing. Such tests combine the broad content coverage and efficient scoring of multiple-choice items with the assessment of higher-order thinking skills thought to be provided by constructed-response items. However, the combination of both item formats on a single test complicates the use of psychometric procedures. The purpose of this dissertation was to examine how characteristics of mixed-format tests and the composition of the common-item set impact the accuracy of equating results in the common-item nonequivalent groups design. Operational examinee item responses were used to create two classes of data: (1) operational test forms and (2) pseudo-test forms assembled from portions of operational test forms. Analyses were conducted on three mixed-format tests from the Advanced Placement Examination program: English Language, Spanish Language, and Chemistry. For the operational test form analyses, two factors were investigated: (1) the difference in proficiency between old and new form groups of examinees and (2) the relative difficulty of multiple-choice and constructed-response items. For the pseudo-test form analyses, two additional factors were investigated: (1) the format representativeness of the common-item set and (2) the statistical representativeness of the common-item set. For each study condition, two traditional equating methods, frequency estimation and chained equipercentile equating, and two item response theory (IRT) equating methods, IRT true score and IRT observed score, were considered. There were five main findings from the operational and pseudo-test form analyses. (1) As the difference in proficiency between old and new form groups of examinees increased, bias also tended to increase.
(2) Relative to the criterion equating relationship for a given equating method, increases in bias were typically largest for frequency estimation and smallest for the IRT equating methods. However, it is important to note that the criterion equating relationship was different for each equating method. Additionally, only one smoothing value was analyzed for the traditional equating methods. (3) Standard errors of equating tended to be smallest for IRT observed score equating and largest for chained equipercentile equating. (4) Results for the operational and pseudo-test analyses were similar when the pseudo-tests were constructed to be similar to the operational test forms. (5) Results were mixed regarding which common-item set composition resulted in the least bias.
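The bias and standard-error criteria used in studies like this one can be sketched as follows, assuming a matrix of equated scores across simulation replications and a fixed criterion equating relationship (the function name and array shapes are illustrative):

```python
import numpy as np

def equating_error_summary(estimates, criterion):
    """Per-score-point evaluation statistics: bias (mean deviation from the
    criterion), standard error of equating (SD across replications), and
    RMSE (their quadratic combination). `estimates` has shape
    (replications, score_points); `criterion` has shape (score_points,)."""
    est = np.asarray(estimates, dtype=float)
    crit = np.asarray(criterion, dtype=float)
    bias = est.mean(axis=0) - crit
    see = est.std(axis=0)                 # population SD (ddof=0)
    rmse = np.sqrt(bias ** 2 + see ** 2)  # equals root mean squared error
    return bias, see, rmse
```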
4

Impact of matched samples equating methods on equating accuracy and the adequacy of equating assumptions

Powers, Sonya Jean 01 December 2010 (has links)
This dissertation investigates the interaction of population invariance, equating assumptions, and equating accuracy with group differences. In addition, matched samples equating methods are considered as a possible way to improve equating accuracy with large group differences. Data from one administration of four mixed-format Advanced Placement (AP) Exams were used to create pseudo old and new forms sharing common items. Population invariance analyses were conducted based on levels of examinee parental education using a single group equating design. Old and new form groups with common item effect sizes (ESs) ranging from 0 to 0.75 were created by sampling examinees based on their level of parental education. Equating was conducted for four common item nonequivalent group design equating methods: frequency estimation, chained equipercentile, IRT true score, and IRT observed score. Additionally, groups with ESs greater than zero were matched using three different matching techniques, including exact matching on parental education level and propensity score matching with several other background variables. The accuracy of equating results was evaluated by comparing each equating relationship with an ES greater than zero to the equating relationship where the ES equaled zero. Differences between comparison and criterion equating relationships were quantified using the root expected mean squared difference (REMSD) statistic, classification consistency, and standard errors of equating (SEs). The accuracy of equating results and the adequacy of equating assumptions were compared for unmatched and matched samples. As ES increased, equating results tended to become less accurate and less consistent across equating methods. However, there was relatively little population dependence of equating results, despite large subgroup performance differences.
Large differences between criterion and comparison equating relationships appeared to be caused instead by violations of equating assumptions. As group differences increased, the degree to which frequency estimation and chained equipercentile assumptions held decreased. In addition, all four AP Exams showed some evidence of multidimensionality. Because old and new form groups were selected to differ in terms of their respective levels of parental education, the matching methods that included parental education appeared to improve equating accuracy and the degree to which equating assumptions held, at least for very large ESs.
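The REMSD statistic used above to quantify the discrepancy between comparison and criterion equating relationships can be sketched as a score-frequency-weighted root mean squared difference. This sketch omits the standardization by the score scale's standard deviation that published versions often apply:

```python
import numpy as np

def remsd(eq_comparison, eq_criterion, score_freqs):
    """Root expected mean squared difference between two equating
    relationships, weighting each raw-score point by its relative
    frequency in the examinee group."""
    diff = np.asarray(eq_comparison, float) - np.asarray(eq_criterion, float)
    w = np.asarray(score_freqs, dtype=float)
    w = w / w.sum()                      # normalize to relative frequencies
    return float(np.sqrt(np.sum(w * diff ** 2)))
```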
5

Evaluating equating properties for mixed-format tests

He, Yi 01 May 2011 (has links)
Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are used in many testing programs. The use of multiple formats presents a number of measurement challenges, one of which is how to adequately equate mixed-format tests under the common-item nonequivalent groups (CINEG) design, especially when, due to practical constraints, the common-item set contains only MC items. The purpose of this dissertation was to evaluate how equating properties were preserved for mixed-format tests under the CINEG design. Real data analyses were conducted on 22 equating linkages of 39 mixed-format tests from the Advanced Placement (AP) Examination program. Four equating methods were used: the frequency estimation (FE) method, the chained equipercentile (CE) method, item response theory (IRT) true score equating, and IRT observed score equating. In addition, cubic spline postsmoothing was used with the FE and CE methods. The factors of investigation were the correlation between MC and CR scores, the proportion of common items, the proportion of MC-item score points, and the similarity between alternate forms. Results were evaluated using three equating properties: first-order equity, second-order equity, and the same distributions property. The main findings from this dissertation were as follows: (1) Between the two IRT equating methods, true score equating better preserved first-order equity than observed score equating, and observed score equating better preserved second-order equity and the same distributions property than true score equating. (2) Between the two traditional methods, CE better preserved first-order equity than FE, but in terms of preserving second-order equity and the same distributions property, CE and FE produced similar results. (3) Smoothing helped to improve the preservation of second-order equity and the same distributions property. 
(4) A higher MC-CR correlation was associated with better preservation of first-order equity for both IRT methods. (5) A higher MC-CR correlation was associated with better preservation of second-order equity for IRT true score equating. (6) A higher MC-CR correlation was associated with better preservation of the same distributions property for IRT observed score equating. (7) The proportion of common items, the proportion of MC score points, and the similarity between forms were not found to be associated with the preservation of the equating properties. These results are interpreted in the context of research literature in this area and suggestions for future research are provided.
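IRT observed score equating, one of the four methods compared above, needs the model-implied distribution of number-correct scores at each theta; for dichotomous items this is produced by the Lord-Wingersky recursion, sketched below. Extending it to polytomous CR items follows the same idea with more than two score increments per item:

```python
import numpy as np

def lord_wingersky(p):
    """Given correct-response probabilities p[j] for each dichotomous item
    at a fixed theta, return the distribution of the number-correct score
    (an array of length len(p) + 1)."""
    dist = np.array([1.0])                  # score distribution after 0 items
    for pj in p:
        new = np.zeros(len(dist) + 1)
        new[:-1] += dist * (1.0 - pj)       # item answered incorrectly
        new[1:] += dist * pj                # item answered correctly
        dist = new
    return dist
```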
6

Ability parameter recovery of a computerized adaptive test based on Rasch testlet models

Pak, Seohong 15 December 2017 (has links)
The purpose of this study was to investigate the effects of various testlet characteristics on ability parameter recovery in a computerized adaptive test (CAT). Given the popularity of CATs and the growing use of testlets in exams, whether in mixed format or not, it was important to evaluate various conditions in a testlet-based CAT fitted with testlet response theory models. The manipulated factors were testlet size, testlet effect size, testlet composition, and exam format. The performance under each condition was compared with the true thetas, which were 81 equally spaced points from -3.0 to +3.0. For each condition, 1,000 replications were conducted, and results were evaluated with respect to overall bias, overall standard error, overall RMSE, conditional bias, conditional standard error, conditional RMSE, and conditional passing rate. The conditional results were presented over pre-specified intervals. Several notable conclusions were drawn. Overall, the mean theta estimates over 1,000 replications were close to the true thetas regardless of condition. In terms of aggregated overall RMSE, predictable relationships were found for the four study factors: a larger amount of error was associated with a longer testlet, a bigger effect size, a random composition, and a testlet-only exam format. However, when the aggregated overall bias was considered, only two effects were observed: a large difference among the three testlet length conditions, and almost no difference between the two testlet composition conditions. As expected, conditional SEMs for all conditions showed a U-shape across the theta scale. A noticeable discrepancy occurred only within the testlet length condition: more error was associated with the longest testlet length compared to the short and medium lengths.
Conditional passing rates showed little discrepancy among conditions within each factor, so no particular association was found. In general, a short testlet length, a small testlet effect size, a homogeneous difficulty composition, and a mixed format were all better in terms of the smaller amount of error found in this study. Beyond these main findings, some interaction effects were also observed. When a medium or large testlet effect (i.e., greater than .50) was suspected, it was better to use a short testlet. It was also found that using a mixed-format exam improved accuracy under the random difficulty composition. However, this study was limited by several factors that were held constant across conditions: a fixed-length exam, no content balancing, and uniform testlet effects. Consequently, directions for improving generalizability were also discussed.
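The ability estimation such a CAT performs after each response can be sketched as an EAP update under a plain Rasch model. The standard-normal prior and the 81-point grid from -3 to +3 mirror the study's theta points; the testlet effect term that the Rasch testlet model adds to the exponent is omitted here for brevity:

```python
import numpy as np

def eap_theta(responses, b, quad=np.linspace(-3, 3, 81)):
    """EAP ability estimate under the Rasch model: posterior mean of theta
    over a quadrature grid, with a standard-normal prior. `responses` are
    0/1 item scores and `b` the corresponding item difficulties."""
    prior = np.exp(-0.5 * quad ** 2)        # unnormalized N(0, 1) density
    like = np.ones_like(quad)
    for u, bj in zip(responses, b):
        p = 1.0 / (1.0 + np.exp(-(quad - bj)))
        like *= p if u == 1 else 1.0 - p
    post = prior * like
    post /= post.sum()
    return float(np.sum(quad * post))
```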
7

Drawing stories in digital media: experience in interactive illustrated works that mix animation, comics and video games

VINICIUS JOSE SHINDO MITCHELL 28 May 2024 (has links)
The research aimed to understand the experience and interactivity in illustrated digital objects that mix animation, comics, and video games. The construction of these digital objects, experienced on websites (homepages) or smartphone applications (apps), is based on illustration, a type of work that develops narratives through images drawn alongside text (spoken or written), encoded in products for everyday circulation. Research questions were formulated from observations carried out in the field, such as: Is there an emerging designation for objects that mix animation, comics, and video games? Is there a codified product, with a specific audience and repositories? What are the challenges or advantages in developing this type of work, both on a global scale and within the Brazilian context? What qualities can illustration bring to design in digital media, considering a stylistic diversity that includes hand-drawn alongside digital drawing? The documentary research cataloged 118 objects between March 2021 and February 2024. Thirty interviews were conducted with 17 Brazilian creators and 13 foreigners, natives of Germany, Australia, Colombia, the United States, Spain, the Netherlands, India, Japan, Mexico, Taiwan, and Uganda, among animators, comic artists, designers, and game artists, generating qualitative data discussed in the thesis. 
The research developed analytical frameworks to understand the experience and interactivity in illustrated digital objects, seeking ways to discuss and discern them without categorizing them, a priori, as animation, comics, or video games. Authors from the three main areas covered supported the theoretical discussion: Marina Estela Graça (2006), animation; Thierry Groensteen (2015), comics; and Jesper Juul (2005), video games. The results indicate the lack of a consolidated terminology to designate these works that use mixtures, as well as the existence of obstacles in the production and circulation of the objects. However, the findings suggest that there is significant use of the mixture of formats and of interactivity in illustrated interactive works, in ways that contribute to making meaning alongside texts and illustrations, strategies that can be understood and replicated in the teaching and practice of design in digital media.
8

Mixed-format test score equating: effect of item-type multidimensionality, length and composition of common-item set, and group ability difference

Wang, Wei 01 December 2013 (has links)
Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under the common-item nonequivalent groups design (CINEG). The purpose of this dissertation was to investigate how various test characteristics and examinee characteristics influence CINEG mixed-format test score equating results. Simulated data were used in this dissertation. Simulees' item responses were generated using items selected from one MC item pool and one CR item pool which were constructed based on the College Board Advanced Placement examinations from various subject areas. Five main factors were investigated in this dissertation, including item-type dimensionality, group ability difference, within group ability difference, length and composition of the common-item set, and format representativeness of the common-item set. In addition, the performance of two equating methods, the presmoothed frequency estimation method (PreSm_FE) and the presmoothed chained equipercentile equating method (PreSm_CE), was compared under various conditions. To evaluate equating results, both conditional statistics and overall summary statistics were considered: absolute bias, standard error of equating, and root mean squared error. The difference that matters (DTM) also was used as a criterion for evaluating whether adequate equating results were obtained. The main findings based on the simulation studies are as follows: (1) For most situations, item-type multidimensionality did not have substantial impact on random error, regardless of the common-item set. 
However, its influence on bias depended on the composition of common-item sets; (2) Both the group ability difference factor and the within group ability difference factor had no substantial influence on random error. When group ability differences were simulated, the common-item set with more items or more total score points had less equating error. When a within group ability difference existed, conditions in which there was a balance of different item formats in the common-item set displayed more accurate equating results than did unbalanced common-item sets. (3) The relative performance of common-item sets with various lengths and compositions was dependent on the levels of group ability difference, within group ability difference, and test dimensionality. (4) The common-item set containing only MC items performed similarly to the common-item set with both item formats when the test forms were unidimensional and no within group ability difference existed or when groups of examinees did not differ in proficiency. (5) The PreSm_FE method was more sensitive to group ability difference than the PreSm_CE method. When the within group ability difference was non-zero, the relative performance of the two methods depended on the length and composition of the common-item set. The two methods performed almost the same in terms of random error. The studies conducted in this dissertation suggest that when equating multidimensional mixed-format test forms in practice, if groups of examinees differ substantially in overall proficiency, inclusion of both item formats should be considered for the common-item set. When within group ability differences are likely to exist, balancing different item formats in the common-item set appears to be even more important than the use of a larger number of common items for obtaining accurate equating results. 
Because only simulation studies were conducted in this dissertation, caution should be exercised when generalizing the conclusions to practical situations.
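The "difference that matters" (DTM) criterion cited above can be sketched as the share of raw-score points at which the estimated and criterion equating functions agree within the DTM; 0.5 raw-score points, the commonly used value, is assumed here:

```python
import numpy as np

def proportion_within_dtm(eq_estimated, eq_criterion, dtm=0.5):
    """Fraction of raw-score points where the absolute difference between
    the estimated and criterion equating functions is below the DTM."""
    diff = np.abs(np.asarray(eq_estimated, float) - np.asarray(eq_criterion, float))
    return float(np.mean(diff < dtm))
```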
