31

An Item Response Theory Analysis of the Scales from the International Personality Item Pool and the NEO Personality Inventory-Revised

McBride, Nadine LeBarron 10 August 2001 (has links)
Personality tests are widely used in the field of Industrial/Organizational Psychology; however, few studies have focused on their psychometric properties using Item Response Theory. This paper uses IRT to examine the test information functions (TIFs) of two personality measures: the NEO-PI-R and scales from the International Personality Item Pool (IPIP). Results showed that most scales for both measures provided relatively consistent levels of information and measurement precision across levels of theta (θ). Although the NEO-PI-R provided overall higher levels of information and measurement precision, the IPIP scales were more efficient in that they provided more precision per item. Both measures showed a substantial decrease in precision and information when response scales were dichotomized away from the original 5-point Likert scale format. Implications and further avenues for research are discussed. / Master of Science
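As a point of reference, the following is a minimal sketch of how item and test information functions are computed under a dichotomous 2PL IRT model; the study itself analyzed polytomous (Likert-type) personality items with a model appropriate for graded responses, and every parameter value below is hypothetical.

```python
import numpy as np

def item_information_2pl(theta, a, b):
    """Fisher information of a dichotomous 2PL item: I(theta) = a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

# Hypothetical discrimination (a) and difficulty (b) parameters for a short scale.
a_params = np.array([1.2, 0.8, 1.5, 1.0, 0.9])
b_params = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

theta_grid = np.linspace(-3, 3, 61)

# The test information function (TIF) is the sum of the item information functions;
# the conditional standard error of measurement is 1 / sqrt(TIF).
tif = sum(item_information_2pl(theta_grid, a, b) for a, b in zip(a_params, b_params))
sem = 1.0 / np.sqrt(tif)

for t, i, s in zip(theta_grid[::10], tif[::10], sem[::10]):
    print(f"theta={t:+.1f}  TIF={i:.2f}  SEM={s:.2f}")
```

Dichotomizing a 5-point response scale discards category-level information, which in this framework shows up directly as a lower TIF and a larger standard error at every theta.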
32

A comparison of fixed item parameter calibration methods and reporting score scales in the development of an item pool

Chen, Keyu 01 August 2019 (has links)
The purposes of the study were to compare the relative performances of three fixed item parameter calibration (FIPC) methods in item and ability parameter estimation and to examine how the ability estimates obtained from these methods affect interpretations of reported scales of different lengths. The simulation study was divided into two stages. The first, calibration, stage estimated the parameters of pretest items; it investigated the accuracy of the item parameter estimates and the recovery of the underlying ability distributions for different sample sizes, different numbers of pretest items, and different types of ability distributions under the three-parameter logistic (3PL) model. The second, operational, stage placed the estimated pretest-item parameters on operational forms and used them to score examinees; it investigated the effect that item parameter estimation had on ability estimation and reported scores for the new test forms. The item parameters estimated from the three FIPC methods showed subtle differences: the results of the DeMars method were closer to those of separate calibration with linking than to those of FIPC with simple prior update or FIPC with iterative prior update, while the latter two methods performed similarly. Regarding the experimental factors manipulated in the simulation, sample size influenced the estimation of item parameters. The effect of the number of pretest items on item parameter estimation was strong but ambiguous, likely because it was confounded with changes in both the number and the characteristics of the pretest items across item sets. The effect of the ability distributions on item parameter estimation was less evident than the effects of the other two factors. After the pretest items were calibrated, their parameter estimates were put into operational use, and examinees' abilities were estimated from their responses to the existing operational items and the new items (previously the pretest items) whose parameters had been estimated under the different conditions. Correlations between the ability estimates and the examinees' true abilities were high for forms containing pretest items calibrated with any of the three FIPC methods, suggesting that all three methods estimate item parameters well enough to support satisfactory estimation of examinees' abilities. When considering the scale scores, because the estimated abilities were very similar, differences among scaled scores on the same scale were small; the relative frequencies of examinees classified into performance categories and the classification consistency index likewise showed that interpretations of reported scores were similar across scales. The study provides a comprehensive comparison of FIPC methods for parameter estimation and is intended to help practitioners choose among the methods according to the needs of their testing programs.
When ability estimates were linearly transformed into scale scores, the length of the scale did not affect the statistical properties of the scores; however, it may affect how the scores are subjectively perceived by stakeholders and should therefore be chosen carefully.
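To illustrate the reporting-scale issue raised in the final sentence, here is a minimal sketch of linearly transforming ability estimates onto two reporting scales of different lengths and checking classification agreement; the scale ranges, cut scores, and simulated abilities are all hypothetical and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_hat = rng.normal(0.0, 1.0, size=2000)   # simulated ability estimates

def to_scale(theta, lo, hi, theta_lo=-4.0, theta_hi=4.0):
    """Linearly map theta onto a reported scale [lo, hi] and round to integers."""
    slope = (hi - lo) / (theta_hi - theta_lo)
    scores = lo + slope * (np.clip(theta, theta_lo, theta_hi) - theta_lo)
    return np.rint(scores).astype(int)

# Two hypothetical reporting scales of different lengths.
short_scale = to_scale(theta_hat, 1, 30)
long_scale = to_scale(theta_hat, 100, 500)

# Performance categories are defined by cut scores on theta, mapped through the
# same transformation as the scores themselves.
theta_cuts = np.array([-1.0, 0.0, 1.0])
categories_short = np.digitize(short_scale, to_scale(theta_cuts, 1, 30))
categories_long = np.digitize(long_scale, to_scale(theta_cuts, 100, 500))

# Rounding on a coarse scale can shift a few examinees across a cut score,
# but the classification agreement between the two scales stays very high.
agreement = np.mean(categories_short == categories_long)
print(f"classification agreement across the two scales: {agreement:.3f}")
```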
33

The Effect of Skewed Item Difficulty Distributions on Heuristic Estimates of IRT Item Parameters (original article; Japanese title: 項目困難度の分布の偏りが IRT 項目パラメタの発見的推定値に与える影響)

野口, 裕之, NOGUCHI, Hiroyuki 25 December 1992 (has links)
This entry uses content digitized by the National Institute of Informatics.
34

Stratified item selection and exposure control in unidimensional adaptive testing in the presence of two-dimensional data.

Kalinowski, Kevin E. 08 1900 (has links)
It is not uncommon to use unidimensional item response theory (IRT) models to estimate ability in multidimensional data. Therefore, it is important to understand the implications of summarizing multiple dimensions of ability into a single parameter estimate, especially if effects are confounded when applied to computerized adaptive testing (CAT). Previous studies have investigated the effects of different IRT models and ability estimators by manipulating the relationships between item and person parameters. However, in all cases, the maximum information criterion was used as the item selection method. Because maximum information is heavily influenced by the item discrimination parameter, investigating a-stratified item selection methods is tenable. The current Monte Carlo study compared maximum information, a-stratification, and a-stratification with b blocking item selection methods, alone as well as in combination with the Sympson-Hetter exposure control strategy. The six testing conditions were crossed with three levels of interdimensional item difficulty correlations and four levels of interdimensional examinee ability correlations. Measures of fidelity, estimation bias, error, and item usage were used to evaluate the effectiveness of the methods. Results showed that either stratified item selection strategy is warranted if the goal is to obtain precise estimates of ability when using unidimensional CAT in the presence of two-dimensional data. If the goal also includes limiting bias of the estimate, Sympson-Hetter exposure control should be included. Results also confirmed that Sympson-Hetter is effective in optimizing item pool usage. Given these results, existing unidimensional CAT implementations might consider employing a stratified item selection routine plus Sympson-Hetter exposure control, rather than recalibrating the item pool under a multidimensional model.
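The following is a minimal sketch of a-stratified item selection combined with a Sympson-Hetter exposure filter, intended only to illustrate the mechanics of the strategies compared above; the item pool, exposure-control parameters, and selection details are hypothetical and simplified (for example, the interim ability estimate is held fixed rather than updated after each response).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 200-item pool: discrimination (a) and difficulty (b) parameters.
n_items = 200
a = rng.lognormal(mean=0.0, sigma=0.3, size=n_items)
b = rng.normal(0.0, 1.0, size=n_items)

# Hypothetical Sympson-Hetter exposure-control parameters (in practice these are
# derived iteratively from simulations so maximum exposure stays below a target).
sh_prob = np.full(n_items, 0.8)

# a-stratification: order items by a and split the pool into strata; early CAT
# stages draw from the low-a stratum, later stages from higher-a strata.
n_strata, test_length = 4, 20
strata = np.array_split(np.argsort(a), n_strata)

def select_item(theta_hat, stage, administered):
    """Pick the unused item in the current stratum whose b is closest to theta_hat,
    then apply the Sympson-Hetter probabilistic exposure filter."""
    stratum = strata[min(stage * n_strata // test_length, n_strata - 1)]
    candidates = [i for i in stratum if i not in administered]
    candidates.sort(key=lambda i: abs(b[i] - theta_hat))
    for i in candidates:
        if rng.uniform() < sh_prob[i]:   # Sympson-Hetter: administer with prob P_i
            return int(i)
    return int(candidates[0])            # fall back if every candidate was filtered

# One simulated 20-item adaptive test for an examinee with a fixed interim theta.
administered = []
theta_hat = 0.0
for stage in range(test_length):
    administered.append(select_item(theta_hat, stage, set(administered)))
print("items administered:", administered)
```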
35

Uma abordagem personalizada no processo de seleção de itens em Testes Adaptativos Computadorizados / A personalized approach to the item selection process in Computerized Adaptive Testing

Victor Miranda Gonçalves Jatobá 08 October 2018 (has links)
Computerized Adaptive Testing (CAT) based on Item Response Theory allows more accurate assessments with fewer questions than the classic paper-and-pencil test. Nonetheless, building a CAT involves some key decisions that, when made properly, can further improve the accuracy and efficiency of the estimation of examinees' abilities. One of the main decisions is the choice of the Item Selection Rule (ISR). The classic CAT makes exclusive use of a single ISR. However, these rules have relative advantages that depend on the examinee's ability level and on the stage of the test. Thus, the objective of this work is to reduce the length of dichotomous tests (which consider only whether each answer is correct or incorrect) administered within a classic CAT that uses a single ISR, without significant loss of accuracy in the estimation of examinees' abilities. For this purpose, we create the ALICAT approach, which personalizes the item selection process in a CAT by considering the use of more than one ISR. To apply this approach, we first analyze the performance of different ISRs. A case study on the Mathematics and its Technologies test of the 2012 ENEM shows that Kullback-Leibler information with a posterior distribution (KLP) estimates examinees' abilities better than the Fisher information (F), Kullback-Leibler information (KL), Maximum Likelihood Weighted Information (MLWI), and Maximum Posterior Weighted Information (MPWI) rules.
Previous results in the literature show that a CAT using the KLP rule was able to reduce this test by 46.6% relative to its full length of 45 items with no significant loss of accuracy in estimating examinees' abilities. In this work, we observe that the F and MLWI rules performed better in the early CAT stages for estimating examinees with extreme negative and extreme positive ability levels, respectively. Using these selection rules together, the ALICAT approach reduced the same test by 53.3%.
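A minimal sketch of two of the item selection rules compared in this work, Fisher information (F) and posterior-weighted Kullback-Leibler information (KLP), under a 2PL model follows; it is not the ALICAT implementation, and all item parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_items = 100
a = rng.lognormal(0.0, 0.3, n_items)          # hypothetical discriminations
b = rng.normal(0.0, 1.0, n_items)             # hypothetical difficulties

theta_grid = np.linspace(-4, 4, 81)
prior = np.exp(-0.5 * theta_grid**2)          # standard normal prior (unnormalized)

def p_correct(theta, i):
    return 1.0 / (1.0 + np.exp(-a[i] * (theta - b[i])))

def fisher_rule(theta_hat, available):
    """F rule: maximize item information a^2 * P * (1 - P) at the current estimate."""
    info = [a[i]**2 * p_correct(theta_hat, i) * (1 - p_correct(theta_hat, i))
            for i in available]
    return available[int(np.argmax(info))]

def klp_rule(posterior, theta_hat, available):
    """KLP rule: maximize the posterior-weighted Kullback-Leibler divergence between
    the item response distribution at theta_hat and at every theta on the grid."""
    w = posterior / posterior.sum()
    best, best_val = None, -np.inf
    for i in available:
        p0 = p_correct(theta_hat, i)
        p = p_correct(theta_grid, i)
        kl = p0 * np.log(p0 / p) + (1 - p0) * np.log((1 - p0) / (1 - p))
        val = np.sum(w * kl)
        if val > best_val:
            best, best_val = i, val
    return best

available = list(range(n_items))
print("Fisher pick:", fisher_rule(0.0, available))
print("KLP pick:   ", klp_rule(prior, 0.0, available))
```

Early in a test the ability estimate is unreliable, which is why a globally oriented rule such as KLP (or a rule switch, as ALICAT proposes) can outperform the purely local Fisher criterion.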
37

A Monte Carlo Study Investigating the Influence of Item Discrimination, Category Intersection Parameters, and Differential Item Functioning in Polytomous Items

Thurman, Carol Jenetha 21 October 2009 (has links)
The increased use of polytomous item formats has led assessment developers to pay greater attention to the detection of differential item functioning (DIF) in these items. DIF occurs when an item performs differently for two contrasting groups of respondents (e.g., males versus females) after controlling for differences in the abilities of the groups. Determining whether the difference in performance on an item between two demographic groups is due to between-group differences in ability or to some form of unfairness in the item is a more complex task for a polytomous item, because of its many score categories, than for a dichotomous item. Effective DIF detection methods must be able to locate DIF within each of these score categories. The Mantel, Generalized Mantel-Haenszel (GMH), and Logistic Regression (LR) procedures are three of several DIF detection methods that can test for DIF in polytomous items. There have been relatively few studies on the effectiveness of polytomous procedures for detecting DIF, and of those studies only a very small percentage have examined the efficiency of the Mantel, GMH, and LR procedures when item discrimination magnitudes and category intersection parameters vary and when there are different patterns of DIF (e.g., balanced versus constant) within score categories. This Monte Carlo simulation study compared the Type I error and power of the Mantel, GMH, and OLR (the LR method for ordinal data) procedures when variation occurred in 1) the item discrimination parameters, 2) the category intersection parameters, 3) the DIF patterns within score categories, and 4) the average latent traits of the reference and focal groups. Results of this investigation showed that high item discrimination levels were directly related to increased DIF detection rates. The location of the difficulty parameters was also found to have a direct effect on DIF detection rates. Additionally, depending on item difficulty, DIF magnitudes and patterns within score categories were found to affect DIF detection rates; finally, DIF detection power increased as DIF magnitudes became larger. The GMH outperformed the Mantel and OLR and is recommended for use with polytomous data when item discrimination varies across items.
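For readers unfamiliar with the Mantel procedure for ordinal items, the following is a minimal sketch that stratifies on total score and computes the one-degree-of-freedom Mantel chi-square from simulated data; the data-generating model is hypothetical and far simpler than the study's design, and thin strata are simply skipped.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)

# Simulated data: an ordinal item scored 0-3 for a reference and a focal group,
# with total test score used as the stratifying (matching) variable.
n = 1000
group = rng.integers(0, 2, n)                       # 0 = reference, 1 = focal
total = rng.integers(0, 21, n)                      # matching score (0-20)
item = np.clip(np.round(total / 7 + rng.normal(0, 0.8, n) - 0.3 * group), 0, 3).astype(int)

obs, exp, var = 0.0, 0.0, 0.0
for k in np.unique(total):
    idx = total == k
    y, g = item[idx], group[idx]
    n_f, n_r, n_k = g.sum(), (1 - g).sum(), idx.sum()
    if n_f == 0 or n_r == 0 or n_k < 2:
        continue                                    # skip strata with only one group
    obs += y[g == 1].sum()                          # focal-group item score sum F_k
    exp += n_f * y.sum() / n_k                      # E(F_k) under no DIF
    var += (n_f * n_r) / (n_k**2 * (n_k - 1)) * (n_k * (y**2).sum() - y.sum()**2)

mantel_chi2 = (obs - exp) ** 2 / var                # 1-df chi-square statistic
print(f"Mantel chi-square = {mantel_chi2:.2f}, p = {chi2.sf(mantel_chi2, df=1):.4f}")
```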
38

Assessment of Item Parameter Drift of Known Items in a University Placement Exam

January 2012 (has links)
This study investigated the possibility of item parameter drift (IPD) in a calculus placement examination administered to approximately 3,000 students at a large university in the United States. A single form of the exam was administered continuously for a period of two years, possibly allowing later examinees to have prior knowledge of specific items on the exam. An analysis of IPD was conducted to explore evidence of possible item exposure. Two assumptions concerning item exposure were made: 1) item recall and item exposure are positively correlated, and 2) item exposure results in the items becoming easier over time. Special consideration was given to two contextual item characteristics: 1) item location within the test, specifically items at the beginning and end of the exam, and 2) the use of an associated diagram. The hypotheses stated that these item characteristics would make the items easier to recall and, therefore, more likely to be exposed, resulting in item drift. BILOG-MG 3 was used to calibrate the items and assess IPD. No evidence was found to support the hypotheses that the items located at the beginning of the test or with an associated diagram drifted as a result of item exposure. Three items among the last ten on the exam drifted significantly and became easier, consistent with item exposure. However, in this study, the possible effects of item exposure could not be separated from the effects of other potential factors such as speededness, curriculum changes, better test preparation on the part of subsequent examinees, or guessing. / Dissertation/Thesis / M.A. Educational Psychology 2012
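As an illustration of drift screening in general (not the BILOG-MG analysis used in this study), the following is a minimal sketch that places difficulty estimates from two calibration windows on a common scale with mean-sigma linking and flags items whose difficulty changed unusually; all values are simulated and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical difficulty estimates for 30 items calibrated in two time windows
# (year 1 vs. year 2); three late-position items are made easier in year 2 to
# mimic drift consistent with item exposure.
n_items = 30
b_year1 = rng.normal(0.0, 1.0, n_items)
b_year2 = b_year1 + rng.normal(0.0, 0.08, n_items)
b_year2[-3:] -= 0.6

# Mean-sigma linking: put the year-2 estimates on the year-1 scale so genuine
# drift is not confused with an arbitrary scale shift.
A = b_year1.std() / b_year2.std()
B = b_year1.mean() - A * b_year2.mean()
b_year2_linked = A * b_year2 + B

# Robust-z screen: flag items whose linked change is far from the bulk of items.
diff = b_year2_linked - b_year1
mad = np.median(np.abs(diff - np.median(diff)))
robust_z = (diff - np.median(diff)) / (1.4826 * mad)
for i in np.where(np.abs(robust_z) > 3)[0]:
    print(f"item {i}: delta-b = {diff[i]:+.2f}, robust z = {robust_z[i]:+.1f}  <- possible drift")
```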
39

A Comparison of Three Methods of Detecting Test Item Bias

Monaco, Linda Gokey 05 1900 (has links)
This study compared three methods of detecting test item bias: the chi-square approach, the transformed item difficulties approach, and the Linn-Harnish three-parameter item response approach, which is the only Item Response Theory (IRT) method that can be utilized with relatively small minority samples. The items on two tests, which measured writing and reading skills, were examined for evidence of sex and ethnic bias. Eight sets of samples, four from each test, were randomly selected from the population (N=7287) of sixth, seventh, and eighth grade students enrolled in a large, urban school district in the southwestern United States. Each set of samples, male/female, White/Hispanic, White/Black, and White/White, contained 800 examinees in the majority group and 200 in the minority group. In an attempt to control differences in ability that may have existed between the various population groups, examinees with scores greater or less than two standard deviations from their group's mean were eliminated. Ethnic samples contained equal numbers of each sex. The White/White sets of samples were utilized to provide baseline bias estimates because the tests could not logically be biased against these groups. Bias indices were then calculated for each set of samples with each of the three methods. Findings of this study indicate that the percent agreement between the Linn-Harnish IRT method and the chi-square and transformed difficulties methods is similar to that found in previous studies comparing the latter approaches with other IRT methods requiring large minority samples. Therefore, it appears that the Linn-Harnish IRT approach can be used in lieu of other, more restrictive IRT methods. Ethnic bias appears to exist in the two tests, as measured by the large mean bias indices for the White/Hispanic and White/Black samples. Little sex bias was found, as evidenced by the low mean bias indices of the male/female samples and the fact that the male/female mean bias indices were lower than those of the White/White samples in 33% of the samples.
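The transformed item difficulties (delta-plot) approach mentioned above can be sketched as follows: proportion-correct values for two groups are converted to ETS deltas, the major axis of the resulting scatter is fitted, and each item's perpendicular distance from that axis serves as its bias index. The data below are simulated, and the 1.5 flagging threshold is a conventional choice, not one taken from this study.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

# Hypothetical proportion-correct (p) values for 40 items in a majority and a
# minority group; two items are made relatively harder for the minority group.
n_items = 40
p_maj = rng.uniform(0.3, 0.9, n_items)
p_min = p_maj - 0.05 + rng.normal(0, 0.03, n_items)
p_min[[5, 17]] -= 0.25
p_min = np.clip(p_min, 0.05, 0.95)

# ETS delta transform: delta = 4 * z(1 - p) + 13 (harder items get larger deltas).
d_maj = 4 * norm.ppf(1 - p_maj) + 13
d_min = 4 * norm.ppf(1 - p_min) + 13

# Fit the major (principal) axis of the delta scatter plot.
sx2, sy2 = d_maj.var(), d_min.var()
sxy = np.cov(d_maj, d_min, bias=True)[0, 1]
slope = (sy2 - sx2 + np.sqrt((sy2 - sx2) ** 2 + 4 * sxy**2)) / (2 * sxy)
intercept = d_min.mean() - slope * d_maj.mean()

# Perpendicular distance from the axis is the bias index; large |D| flags an item.
D = (slope * d_maj - d_min + intercept) / np.sqrt(slope**2 + 1)
for i in np.where(np.abs(D) > 1.5)[0]:
    print(f"item {i}: delta distance D = {D[i]:+.2f}  <- flagged")
```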
40

Item Selection for a Structural Priming Task to be used with Spanish-English Bilingual Children with and without Language Impairment

Eagleson, Rebecca 29 October 2010 (has links)
Results from traditional assessment measures used with Spanish-English bilingual children may not be representative of this population's morphosyntactic abilities due to the children's dynamically changing proficiencies in each language. Short-term learning tasks such as structural priming may provide more comprehensive information on bilingual children's morphosyntactic abilities. The purpose of this thesis was to analyze items from the experimental version of the Bilingual English Spanish Assessment-Middle Extension (BESA-ME) in order to select appropriate item types to be used in a structural priming task. The Experimental BESA-ME was administered to 137 children with typical development and 37 children with language impairment between the ages of 7;0 and 9;11. Results revealed that appropriate items for a structural priming task were third-person singular, past tense, and possessives in English, and conditionals, subjunctives, and direct object clitics in Spanish. Depending on the purpose of the structural priming task, additional items also showed potential for use. / text
