1 |
A comparative study of optimal pool design methods in computerized adaptive testing
Hsu, Ying-Ju, 01 May 2017
An efficient pool is critical for CAT administrations. Two approaches have been developed to design an optimal CAT pool: the linear programming method (LP; Veldkamp & van der Linden, 2000, 2010) and the bin-and-union method (BU; Reckase, 2003, 2010). This study manipulated different content balancing approaches and exposure conditions to investigate their impacts on the pool performances of the LP and BU methods under practical testing situations.
The optimal pools were constructed in terms of the specification of an operational fixed-length CAT program and the IRT model employed. This study considered the one-parameter logistic (1PL) model to simulate adaptive test item responses using optimal and operational pools. Several psychometric properties were compared between the pools designed under the LP and BU methods. This research attempted to answer the following question: Under the consideration of content balancing and exposure control, what were the benefits and limitations of the LP and BU methods with respect to the optimal pool design? The results were evaluated in terms of pool characteristics, content constraint management, item exposure control, pool utilization, test reliability, and measurement precision.
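As a rough illustration of the response simulation described above, the sketch below generates dichotomous responses under the 1PL model. The function names, the item difficulties, and the fixed random seed are assumptions made for this example; they are not taken from the study.

```python
import math
import random


def p_correct_1pl(theta, b):
    """Probability of a correct response under the 1PL (Rasch) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def simulate_responses(theta, difficulties, rng):
    """Draw one 0/1 response per item for an examinee with ability theta."""
    return [1 if rng.random() < p_correct_1pl(theta, b) else 0
            for b in difficulties]


# Hypothetical three-item pool and a seeded generator for repeatability.
rng = random.Random(0)
difficulties = [-1.0, 0.0, 1.0]
responses = simulate_responses(0.0, difficulties, rng)
```

In a full simulation, the same routine would be called once per administered item, with the item chosen adaptively from the pool rather than taken from a fixed list.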
Similar pool characteristics were found between the LP and BU methods. With respect to the evaluation criteria, the LP and BU pools exhibited consistent performance. However, compared to the LP pools, the BU pools demonstrated slight superiority under the condition with strict content balancing and exposure control. Given two bin widths (.35 and .70), the pools with a bin-width of .35 exhibited better performance than those with a bin-width of .70 with respect to various evaluation criteria. Especially under the condition with the strict content balancing and exposure control, a bin-width of .35 might be a better option to generate an optimal pool than a bin-width of .70 in order to maintain a higher test on-target rate.
|
2 |
Estimating the Examinee Ability on the Computerized Adaptive Testing Using Adaptive Network-Based Fuzzy Inference System
Chen, Kai-pei, 09 February 2007
Computerized adaptive testing attempts to provide the most suitable question for an examinee, depending on the examinee's ability, to achieve the best result. Although Maximum Likelihood Estimation (MLE) and Bayesian Likelihood Estimation (BLE) have been proposed for ability estimation and have shown good results in the literature, little attention has been paid to situations in which the answer to an item does not conform to the examinee's ability as expected, or to changes in the standard deviation of the ability estimate. We hypothesized that the Adaptive-Network-Based Fuzzy Inference System (ANFIS) can be used to infer a flexible estimate of the examinee's ability automatically by analyzing the examinee's relevant data in a test. Consequently, the study presents a novel learning ability model based on ANFIS, which can adaptively choose questions using Item Response Theory. Taking the item discrimination, difficulty, and guessing parameters and the examinee's ability before answering a question as inputs, the proposed method infers the adjustment to the examinee's ability and updates its value after the question is answered. The ANFIS model for the experiments was developed using MATLAB. The examinees were simulated and the training data were collected under three different situations. Through different combinations of ANFIS fuzzy rules, the adjustment of ability is inferred to improve the accuracy of the estimated ability. The error between the true ability and the estimated ability obtained by the proposed model is compared with those of MLE and BLE. The simulation results show that the estimated ability error of ANFIS is smaller than that of MLE and BLE when the value of the test information is larger. The proposed method could provide a more accurate estimate of the examinee's ability and offer more appropriate questions for examinees.
Keywords: ANFIS, Item Response Theory, Computerized Adaptive Testing
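For orientation, the sketch below shows the kind of Bayesian baseline the abstract compares against: a grid-based posterior-mean (EAP) ability estimate under the three-parameter logistic model with discrimination a, difficulty b, and guessing c. It is not the ANFIS method, and the grid range, the standard normal prior, and all names are assumptions made for illustration.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def eap_estimate(items, responses, n_points=81, lo=-4.0, hi=4.0):
    """Posterior mean of theta on an evenly spaced grid.

    items: list of (a, b, c) tuples; responses: list of 0/1 scores.
    Assumes a standard normal prior on theta (an illustrative choice).
    """
    grid = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    num = den = 0.0
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for (a, b, c), u in zip(items, responses):
            p = p_3pl(theta, a, b, c)
            weight *= p if u == 1 else (1.0 - p)
        num += theta * weight
        den += weight
    return num / den


# Hypothetical five-item test; all-correct vs. all-incorrect patterns.
items = [(1.0, 0.0, 0.2)] * 5
high = eap_estimate(items, [1] * 5)
low = eap_estimate(items, [0] * 5)
```

The estimate is recomputed after each response, which is the update step that the proposed ANFIS model replaces with an inferred ability adjustment.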
|
3 |
The impact of collateral information on ability estimation in an adaptive test battery
Xie, Qing, 01 May 2019
The advantages of administering an adaptive test battery, a collection of multiple adaptive subtests that are specifically tailored to examinees’ abilities, include shortening the subtest length and maintaining the accuracy of individual subtest scores. The test battery can incorporate a range of subjects, though this study focused primarily on Math and Reading.
This study compared different ways of incorporating collateral information (CI), supplementary information beyond examinees’ current test performance, under two frameworks (Unidimensional and Multidimensional computerized adaptive testing). It also investigated the impact of subtest intercorrelations (the relationship between an examinee’s test scores), as well as the sequences of subtest administration on ability estimation in a variable-length adaptive battery. Practical issues including content constraints and item exposure control were also considered.
Findings showed that the CI methods improved measurement efficiency with an acceptable level of measurement precision. The CI was more beneficial when associated with higher intercorrelations among the subtests. Also, the CI was found to be advantageous during the early stages of the subtests which were not taken first. Therefore, the CI may improve the examinee experience by administering items more aligned with their abilities. In addition, the CI should reduce costs for testing organizations by requiring fewer items and possibly saving seat time, while still providing reliable scores. The results should help practitioners decide whether the use of the CI is worthwhile under their particular testing situation.
|
4 |
A comparison of three statistical testing procedures for computerized classification testing with multiple cutscores and item selection methods
Haring, Samuel Heard, 25 June 2014
Computerized classification tests (CCT) have been used in high-stakes assessment settings where the express purpose of the testing is to assign a classification decision (e.g. pass/fail). One key feature of sequential probability ratio test (SPRT)-type procedures is that items are selected to maximize information around the cutscore region of the examinee ability distribution, whereas in a typical CAT items are selected to maximize information at examinees' interim ability estimates. Previous research has examined the effectiveness of computerized adaptive tests (CAT) utilizing classification testing procedures with a single cutscore as well as multiple cutscores (e.g. below basic/proficient/advanced). Several variations of the SPRT procedure have been advanced recently, including a generalized likelihood ratio (GLR). While the GLR procedure has shown evidence of improved average test length while reasonably maintaining classification accuracy, it also introduces unnecessary error. The purpose of this dissertation was to propose and investigate the functionality of a modified GLR (mGLR) procedure which does not incorporate the unnecessary error inherent in the GLR procedure. Additionally, this dissertation explored the use of multiple cutscores and the use of ability-based item selection. This dissertation investigated the performance of three classification procedures (SPRT, GLR, and modified GLR), multiple cutscores, and two test lengths. An additional set of conditions was developed in which an ability-based item selection method was used with the modified GLR. A simulation study was performed to gather evidence of the effectiveness and efficiency of the modified GLR procedure by comparing it to the SPRT and GLR procedures. The study found that the GLR and mGLR procedures were able to yield shorter test lengths, as anticipated. Additionally, the mGLR procedure using ability-based item selection produced even shorter test lengths than the cutscore-based mGLR method.
Overall, the classification accuracy of the procedures was reasonably close. Examination of conditional classification accuracy in the multiple-cutscore conditions showed unexpectedly low values for each of the procedures. Implications and future research are discussed herein.
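The basic SPRT decision rule with a single cutscore can be sketched as follows. This is a minimal illustration of Wald's sequential test applied to two-parameter logistic items, not the GLR or modified GLR procedures studied in the dissertation; the indifference-region half-width `delta`, the error rates, and all function names are assumptions.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def sprt_decision(items, responses, cut, delta=0.5, alpha=0.05, beta=0.05):
    """Classify an examinee as 'pass', 'fail', or 'continue' testing.

    Compares the likelihood of the responses at cut + delta against the
    likelihood at cut - delta (the edges of the indifference region).
    items: list of (a, b) tuples; responses: list of 0/1 scores.
    """
    theta_lo, theta_hi = cut - delta, cut + delta
    llr = 0.0  # running log-likelihood ratio
    for (a, b), u in zip(items, responses):
        p_hi = p_2pl(theta_hi, a, b)
        p_lo = p_2pl(theta_lo, a, b)
        if u == 1:
            llr += math.log(p_hi / p_lo)
        else:
            llr += math.log((1.0 - p_hi) / (1.0 - p_lo))
    upper = math.log((1.0 - beta) / alpha)  # Wald's upper boundary
    lower = math.log(beta / (1.0 - alpha))  # Wald's lower boundary
    if llr >= upper:
        return "pass"
    if llr <= lower:
        return "fail"
    return "continue"


# Hypothetical 20-item test with every item located at the cutscore.
items = [(1.0, 0.0)] * 20
```

With multiple cutscores, one common arrangement is to run a test of this form at each cutscore and combine the decisions; the details of how the dissertation did so are not given in the abstract.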
|
5 |
A comparison of item selection procedures using different ability estimation methods in computerized adaptive testing based on the generalized partial credit model
Ho, Tsung-Han, 17 September 2010
Computerized adaptive testing (CAT) provides a highly efficient alternative to the paper-and-pencil test. By selecting items that match examinees’ ability levels, CAT not only can shorten test length and administration time but it can also increase measurement precision and reduce measurement error.
In CAT, maximum information (MI) is the most widely used item selection procedure. However, the major challenge with MI is the attenuation paradox, which results because the MI algorithm may lead to the selection of items that are not well targeted at an examinee’s true ability level, resulting in more errors in subsequent ability estimates. The solution is to find an alternative item selection procedure or an appropriate ability estimation method. CAT studies have not investigated the association between these two components of a CAT system based on polytomous IRT models.
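To make the maximum information rule concrete, the sketch below picks the unadministered item with the largest Fisher information at the current ability estimate. For brevity it uses the dichotomous two-parameter logistic model rather than the GPCM used in the study, and the pool values and function names are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def select_max_info(theta_hat, pool, administered):
    """Return the index of the most informative unused item at theta_hat.

    pool: list of (a, b) tuples; administered: set of used item indices.
    """
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))


# Hypothetical pool: equal discrimination, difficulties at -2, 0, and 2.
pool = [(1.0, -2.0), (1.0, 0.0), (1.0, 2.0)]
first = select_max_info(0.1, pool, administered=set())
second = select_max_info(1.5, pool, administered={first})
```

Because the rule is evaluated at the interim estimate rather than the true ability, an inaccurate early estimate steers the selection toward poorly targeted items, which is the weakness the paragraph above describes.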
The present study compared the performance of four item selection procedures (MI, MPWI, MEI, and MEPV) across four ability estimation methods (MLE, WLE, EAP-N, and EAP-PS) under the mixed-format CAT based on the generalized partial credit model (GPCM). The test-unit pool and generated responses were based on test-units calibrated from an operational national test that included both independent dichotomous items and testlets. Several test conditions were manipulated: the unconstrained CAT as well as the constrained CAT, in which the CCAT served as the content-balancing procedure and the progressive-restricted procedure with a maximum exposure rate of 0.19 (PR19) served as the exposure control. The performance of various CAT conditions was evaluated in terms of measurement precision, exposure control properties, and the extent of selected-test-unit overlap.
Results suggested that all item selection procedures, regardless of ability estimation methods, performed equally well in all evaluation indices across two CAT conditions. The MEPV procedure, however, was favorable in terms of a slightly lower maximum exposure rate, better pool utilization, and reduced test and selected-test-unit overlap than with the other three item selection procedures when both CCAT and PR19 procedures were implemented. It is not necessary to implement the sophisticated and computing-intensive Bayesian item selection procedures across ability estimation methods under the GPCM-based CAT.
In terms of the ability estimation methods, MLE, WLE, and the two EAP methods, regardless of item selection procedures, did not produce practical differences in any of the evaluation indices across the two CAT conditions. The WLE method, however, generated significantly fewer non-convergent cases than did the MLE method. It was concluded that the WLE method, instead of MLE, should be considered, because non-convergence is less of an issue. The EAP estimation method, on the other hand, should be used with caution unless an appropriate prior θ distribution is specified.
|
6 |
Počítačové adaptivní testování a možnosti jeho využití v psychodiagnostice / Computerized adaptive testing and its use in psychodiagnostic
Dlouhá, Jana, January 2014
The theoretical part of the paper focuses on computerized adaptive testing (CAT) and item response theory (IRT). Also included is a chapter comparing IRT with the commonly used classical test theory (CTT). There is also a brief mention of computerized and online testing, as these types of administration differ in many aspects from conventional paper-and-pencil tests. The goal of this paper was to evaluate the individual ways of eEPI test administration and to compare them with eEPQ tests and self-evaluation. In the practical part the items of the extraversion scale of the Eysenck Personality Inventory (eEPI) were calibrated using a group of 124 respondents. The acquired data were subsequently used to carry out a simulation of computerized adaptive testing, which clearly demonstrated the benefits of this type of testing in comparison to the classical test form. These results were compared with the results of real CAT test administration using the original sample and a new group of respondents (Np=69, Nn=68). The results were highly correlated with the results of the simulated test. Moreover, to verify the validity of the computerized adaptive version of eEOD, the respondents' results in this test were compared with the results in the eEPQ test and in a short self-assessment scale. Finally,...
|
7 |
Teste adaptativo informatizado da Provinha Brasil: a construção de um instrumento de apoio para professores(as) e gestores(as) de escolas / Computerized adaptive test of Provinha Brasil: the construction of a supportive instrument for teachers and school administrators
Catalani, Érica Maria Toledo, 29 March 2019
Esta Tese resulta de um projeto de construção de um Teste Adaptativo Informatizado (TAI) para a versão em papel e lápis da Provinha Brasil (PB), focado na avaliação da proficiência em leitura. O teste da PB Leitura, apesar de possuir elementos de ordem técnica e conceitual para a constituição de uma avaliação educacional e de seu amplo uso por professores dos anos iniciais do ensino fundamental, apresentava limitações que poderiam ser superadas por testes adaptados aos perfis de aprendizagem dos estudantes e com resultados mais fidedignos para apoiar as decisões pedagógicas de professores(as) e gestores(as) escolares. Assim, buscou-se responder à questão: É possível construir um TAI para a versão impressa da PB Leitura que seja ponto de apoio para professores(a) na avaliação de alunos(as) dos anos iniciais do ensino fundamental?. Para a construção dessa ferramenta TAI da PB Leitura foi necessário articular engenheiros de softwares, elaboradores de testes, pesquisadores e profissionais da educação de 15 escolas públicas do município de São Paulo. Para que pudessem participar da construção da ferramenta e da validação dos resultados obtidos, foi realizada formação de professores(as) e gestores(as) educacionais sobre medida educacional, leitura e avaliação. Após a verificação de que os aspectos psicométricos dos itens da versão impressa poderiam ser mantidos para a versão informatizada, o TAI da PB Leitura foi aplicado e os resultados indicaram que ele permitiu testes personalizados aos domínios dos(as) alunos(as), mais rápidos e de menor comprimento, sem prejuízo da precisão. Por apresentar resultados embasados em uma escala com importante interpretação pedagógica, o TAI da PB Leitura se revelou capaz de apoiar a prática avaliativa de professores(as) e gestores(as) e o trabalho pedagógico na alfabetização e no letramento inicial. 
Esse apoio foi potencializado com o acréscimo de uma regra ao critério de parada do TAI, utilizada em testes que visam a classificação do respondente em níveis de resultado. Verificou-se também a necessidade de aprofundar as investigações sobre: a formação de professores(as) na temática da medida e avaliação; a ampliação do banco de itens, com a finalidade de controle de taxas de exposição e balanceamento de conteúdo, e a produção de relatórios pedagógicos. / This thesis results from a project to construct a Computerized Adaptive Test (CAT) for the paper-and-pencil version of Provinha Brasil (PB), focused on the assessment of reading proficiency. The PB Reading test, despite having the technical and conceptual elements of an educational assessment and being widely used by teachers of the initial years of elementary school, presented limitations that could be overcome by tests adapted to the learning profiles of students and with more reliable outcomes to support the pedagogical decisions of teachers and school administrators. Thus, the study sought to answer the question: "Is it possible to create a CAT for the printed version of the PB Reading test that would serve as a point of support for teachers in the assessment of students in the initial years of elementary education?" Creating this CAT tool from the PB Reading test required bringing together software engineers, test designers, researchers and education professionals from 15 public schools in the city of São Paulo. So that they could take part in the creation of the tool and the validation of the results, teachers and educational managers were trained in educational measurement, reading and assessment.
After verifying that the psychometric properties of the printed version items could be kept for the computerized version, the PB Reading CAT was administered, and the results indicated that it allowed tests tailored to the students' proficiency levels that were faster and shorter, without loss of precision. Because its results are based on a scale with an important pedagogical interpretation, the PB Reading CAT proved able to support the assessment practice of teachers and managers and the pedagogical work in literacy instruction and early literacy. This support was strengthened by adding a rule to the CAT stopping criterion, used in tests that aim to classify the respondent into outcome levels. There was also a need to deepen the research on: teacher training in the subject of measurement and assessment; the expansion of the item bank, for the purpose of controlling exposure rates and content balancing, and the production of pedagogical reports.
|
8 |
Teoria e a prática de um teste adaptativo informatizado / Theory and practice of computerized adaptive testing
Sassi, Gilberto Pereira, 10 April 2012
O objetivo deste trabalho é apresentar os conceitos relacionados a Teste Adaptativo Informatizado, ou abreviadamente TAI, para o modelo logístico unidimensional da Teoria de Resposta ao Item. Utilizamos a abordagem bayesiana para a estimação do parâmetro de interesse, chamado de traço latente ou habilidade. Apresentamos os principais algoritmos de seleção de itens em TAI e realizamos estudos de simulação para comparar o desempenho deles. Para comparação, usamos aproximações numéricas para o Erro Quadrático Médio e para o Vício e também calculamos o tempo médio para o TAI selecionar um item. Além disso, apresentamos como instalar e usar a implementação de TAI desenvolvida neste projeto chamada de TAI2U, que foi desenvolvido no VBA-Excel usando uma interface com o R / The main goal of this work is to introduce the concepts related to Computerized Adaptive Testing, or briefly CAT, for the unidimensional three-parameter logistic model of Item Response Theory. We use a Bayesian approach to estimate the parameter of interest, called the latent trait or ability. We present several item selection algorithms and perform simulations to compare them. The comparisons are made in terms of the mean square error, the bias of the trait estimates, the average time for item selection and the average test length. Furthermore, we show how to install and use the CAT implementation developed in this work, called TAI2U, which was built in Microsoft Excel VBA using an interface with the statistical package R
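The two accuracy criteria named in the abstract, bias and mean square error of the ability estimates, can be approximated from simulation output as in the sketch below. The function names and the toy values are assumptions for illustration and are not taken from TAI2U.

```python
def bias(estimates, true_values):
    """Average signed error of the ability estimates."""
    errors = [e - t for e, t in zip(estimates, true_values)]
    return sum(errors) / len(errors)


def mean_square_error(estimates, true_values):
    """Average squared error of the ability estimates."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, true_values)]
    return sum(squared) / len(squared)


# Toy example: two simulated examinees whose estimates are each 1.0 too high.
estimates = [1.0, 3.0]
true_values = [0.0, 2.0]
example_bias = bias(estimates, true_values)
example_mse = mean_square_error(estimates, true_values)
```

In a simulation study these quantities are usually computed over many replications per true ability value, so that each item selection algorithm can be compared across the ability range.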
|
9 |
Comparison Of Linear And Adaptive Versions Of The Turkish Pupil Monitoring System (PMS) Mathematics Assessment
Gokce, Semirhan, 01 July 2012
Before the developments in computer technology, linear test administrations within the classical test theory framework were mostly used in testing practice. These tests contain a set of predefined items covering a wide range of difficulty values in order to collect information from students at various ability levels. However, placing very easy and very difficult items in the same test not only wastes time and effort but also introduces possible extraneous variables into the measurement process, such as the possibility of guessing and the chance of careless errors induced by boredom or frustration. An alternative to administering a linear test is the computerized adaptive test, which adapts the difficulty of the test to the ability level of the examinee. Computerized adaptive tests use item response theory as a measurement framework and have algorithms responsible for item selection, ability estimation, the starting rule, and test termination.
The present study aims to determine the applicability of computerized adaptive testing (CAT) to the Turkish Pupil Monitoring System's (PMS) mathematics assessments. To this end, a live CAT study using only multiple-choice items was designed to investigate whether comparable ability estimates could be obtained. Afterwards, a Monte Carlo simulation study and a post-hoc simulation study were designed to determine the optimum CAT algorithm for the Turkish PMS mathematics assessments. In the simulation studies, both multiple-choice and open-ended items were used, and different scenarios were tested regarding various starting rules, termination criteria, ability estimation methods, and the presence of exposure and content controls.
The results of the study indicate that using the Weighted Maximum Likelihood (WML) ability estimation method, an easy initial item difficulty as the starting rule, and a fixed test reliability termination criterion (a standard error of 0.30 as the termination rule) gives the optimum CAT algorithm for the Turkish PMS mathematics assessment. Additionally, item exposure and content control strategies have a positive impact on providing comparable ability estimates.
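A standard-error termination rule of the kind reported above can be sketched as follows. The example approximates the standard error of the ability estimate as the inverse square root of the accumulated test information under a two-parameter logistic model; the model choice, the item parameters, and the function names are assumptions, and only the 0.30 threshold comes from the abstract.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def standard_error(theta_hat, administered_items):
    """Approximate SE of the ability estimate from test information."""
    info = sum(item_information(theta_hat, a, b) for a, b in administered_items)
    return float("inf") if info == 0.0 else 1.0 / math.sqrt(info)


def should_stop(theta_hat, administered_items, se_target=0.30):
    """Stop the test once the standard error reaches the target."""
    return standard_error(theta_hat, administered_items) <= se_target


# Hypothetical items with a = 2 and b = 0: each contributes information 1.0
# at theta = 0, so the SE after n items is 1 / sqrt(n).
eleven_items = [(2.0, 0.0)] * 11
twelve_items = [(2.0, 0.0)] * 12
```

If ability is scaled to unit variance, a standard error of 0.30 corresponds to a reliability of roughly 1 - 0.30² = 0.91, which is one way to read the "fixed test reliability" wording; that scaling is an assumption here.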
|
10 |
Počítačové adaptivní testování v kinantropologii: Monte Carlo simulace s využitím physical self description questionnaire / Computerized Adaptive Testing In Kinanthropology: Monte Carlo Simulations Using The Physical Self Description Questionnaire
Komarc, Martin, January 2017
This thesis aims to introduce the use of computerized adaptive testing (CAT), a novel and increasingly used method of test administration, applied to the field of Kinanthropology. By adapting a test to an individual respondent's latent trait level, computerized adaptive testing offers numerous theoretical and methodological improvements that can significantly advance testing procedures. In the first part of the thesis, the theoretical and conceptual basis of CAT, as well as a brief overview of its historical origins and basic general principles, are presented. The discussion necessarily includes a description of Item Response Theory (IRT) to some extent, since IRT is almost exclusively used as the mathematical model in today's CAT applications. Practical application of CAT is then evaluated using Monte Carlo simulations involving adaptive administration of the Physical Self-Description Questionnaire (PSDQ) (Marsh, Richards, Johnson, Roche, & Tremayne, 1994), an instrument widely used to assess physical self-concept in the field of sport and exercise psychology. The Monte Carlo simulation of the PSDQ adaptive administration utilized a real item pool (N = 70) calibrated with a Graded Response Model (GRM, see Samejima, 1969, 1997). The responses to test items were generated based on item...
|