241

Um modelo de resposta ao item para grupos múltiplos com distribuições normais assimétricas centralizadas / A multiple group IRT model with skew-normal latent trait distribution under the centred parametrization

Santos, José Roberto Silva dos, 1984- 20 August 2018 (has links)
Orientador: Caio Lucidius Naberezny Azevedo / Dissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica
Abstract: A usual assumption for parameter estimation in item response models (IRM) is that the latent traits follow a normal distribution. However, many works suggest that this assumption does not hold in many cases; see, for example, Micceri (1989) and Bazán et al. (2006). Recently, Azevedo et al. (2011) proposed an IRM with a skew-normal latent trait distribution under the centred parametrization, considering a single group of examinees. In the present work we extend this model to multiple groups. We developed two MCMC algorithms for parameter estimation, using an augmented-data representation of the item response function (IRF); see Albert (1992). The first is a Metropolis-Hastings-within-Gibbs sampler. In the second, we use stochastic representations (creating a hierarchical structure) of the prior distributions of the latent traits and population parameters, which yields known full conditional distributions and thus allows a full Gibbs sampler. We compared the two algorithms using effective sample size as the criterion; see Sahu (2002). The full Gibbs sampler performed best. We also evaluated the impact of the number of examinees per group, the number of items per group, the number of common items, the prior, and the asymmetry of the reference group's distribution on parameter recovery. The results indicate that our approach recovers all parameters well, especially under the Jeffreys prior. Furthermore, the number of items per group and the number of examinees per group had a strong impact on the recovery of latent traits and item parameters, respectively. We analyzed a real data set that shows evidence of asymmetry in the latent trait distributions of some groups; the results obtained with our model confirm the presence of asymmetry in most groups. We also studied diagnostic measures based on the predictive distribution of appropriate discrepancy measures. Finally, we compared the symmetric and asymmetric models using the criteria suggested by Spiegelhalter et al. (2002); the asymmetric model fit better according to all of them. / Mestrado / Estatística / Mestre em Estatística
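The augmented-data device cited above (Albert, 1992) is easy to sketch. The following minimal Python illustration, a toy under stated assumptions rather than the dissertation's sampler, runs one component of a Gibbs cycle for a two-parameter probit IRT model with standard normal traits, holding item parameters fixed at their generating values; the dissertation's algorithms extend this machinery to skew-normal traits, multiple groups, and full item-parameter updates.

```python
# A minimal sketch (not the dissertation's implementation) of Albert's (1992)
# data augmentation for a two-parameter probit IRT model: given the latent
# traits, the augmented variables Z are truncated normals; given Z, the trait
# update is a conjugate normal draw. Item parameters stay fixed at truth here.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)
n_persons, n_items = 200, 20
a = rng.uniform(0.8, 1.6, n_items)                   # discriminations (known here)
b = rng.normal(0.0, 1.0, n_items)                    # difficulties (known here)
theta_true = rng.normal(0.0, 1.0, n_persons)
eta = np.outer(theta_true, a) - b
y = (rng.normal(size=eta.shape) < eta).astype(int)   # simulated 0/1 responses

theta = np.zeros(n_persons)                          # initial trait values
for _ in range(500):
    # 1) Z_ij | theta, y: N(a_j*theta_i - b_j, 1), truncated to agree with y_ij
    mean = np.outer(theta, a) - b
    lo = np.where(y == 1, -mean, -np.inf)            # Z > 0 when the item is correct
    hi = np.where(y == 1, np.inf, -mean)             # Z < 0 when it is not
    z = mean + truncnorm.rvs(lo, hi, size=mean.shape, random_state=rng)
    # 2) theta_i | Z: conjugate normal update with a N(0, 1) prior on the traits
    prec = 1.0 + np.sum(a ** 2)
    mu = (z + b) @ a / prec
    theta = rng.normal(mu, 1.0 / np.sqrt(prec))

print(np.corrcoef(theta, theta_true)[0, 1])          # recovery check (single draw)
```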
242

[en] ALTERNATIVE MODELS FOR PRODUCING A SOCIO-ECONOMIC INDEX: ITEM RESPONSE THEORY / [pt] MÉTODOS ALTERNATIVOS NO CRITÉRIO BRASIL PARA CONSTRUÇÃO DE INDICADORES SÓCIO-ECONÔMICO: TEORIA DA RESPOSTA AO ITEM

VINICIUS RIBEIRO PEREIRA 06 August 2004 (has links)
[en] In Brazil, Item Response Theory (IRT) has been used mainly to produce proficiency indices for students taking part in large-scale educational assessments. However, its various models allow broader applications in the construction of indices, for instance socio-economic indices (SEI), and there are only a few published Brazilian studies on techniques for building SEI, especially ones based on IRT. This work proposes alternative socio-economic classification indicators, beyond the Critério Brasil, built with specific IRT models; these indicators are interpreted and compared with one another and with the official index, known as the Critério Brasil.
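As a concrete, entirely hypothetical illustration of how an IRT model turns ownership items into an index, the sketch below scores one household under a 2PL model by expected a posteriori (EAP) estimation over a normal prior; the item parameters and ownership pattern are invented and are not the Critério Brasil's.

```python
# Hypothetical sketch: a socio-economic index as the EAP of a 2PL latent trait,
# with binary "household owns X" items; parameters are made up for illustration.
import numpy as np

a = np.array([1.2, 0.9, 1.5, 0.7])     # assumed discriminations
b = np.array([-1.0, 0.0, 0.8, 1.5])    # assumed difficulties (rarer goods are "harder")
y = np.array([1, 1, 0, 0])             # one household's ownership pattern

nodes = np.linspace(-4, 4, 81)         # quadrature grid over a N(0, 1) prior
prior = np.exp(-0.5 * nodes ** 2)
p = 1.0 / (1.0 + np.exp(-a * (nodes[:, None] - b)))   # 2PL curves at each node
lik = np.prod(np.where(y == 1, p, 1 - p), axis=1)
post = prior * lik
sei = np.sum(nodes * post) / np.sum(post)             # EAP score = the index
print(f"latent socio-economic index: {sei:.2f}")
```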
243

Investigating Parameter Recovery and Item Information for Triplet Multidimensional Forced Choice Measure: An Application of the GGUM-RANK Model

Lee, Philseok 07 June 2016 (has links)
To control various response biases and rater errors in noncognitive assessment, multidimensional forced choice (MFC) measures have been proposed as an alternative to single-statement Likert-type scales. Historically, MFC measures have been criticized because conventional scoring methods can lead to ipsativity problems that render scores unsuitable for inter-individual comparisons. However, with the recent advent of classical test theory and item response theory scoring methods that yield normative information, MFC measures are surging in popularity and becoming important components of personnel and educational assessment systems. This dissertation presents developments concerning a GGUM-based MFC model, henceforth referred to as GGUM-RANK. Markov chain Monte Carlo (MCMC) algorithms were developed to estimate GGUM-RANK statement and person parameters directly from MFC rank responses, and the efficacy of the new estimation algorithm was examined through computer simulations and an empirical construct validity investigation. Recently derived GGUM-RANK item information functions and information indices were also used to evaluate overall item and test quality for the empirical study and to give insight into differences in scoring accuracy between two-alternative (pairwise preference) and three-alternative (triplet) MFC measures for future work. The dissertation concludes with a discussion of the research findings and potential applications in workforce and educational settings.
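A hedged sketch of the building blocks involved follows: the binary GGUM response function (Roberts, Donoghue, & Laughlin, 2000), a MUPP-style pairwise preference probability (Stark et al., 2005), and a simplified triplet ranking probability formed by normalizing products of pairwise preferences over all orderings. This is an illustration under strong simplifications (a single common trait dimension, invented parameters), not the dissertation's GGUM-RANK likelihood or estimator.

```python
# Simplified sketch of GGUM-based forced-choice scoring for a triplet.
import itertools
import math

def ggum_agree(theta, alpha, delta, tau):
    """P(agree) under the binary GGUM (Roberts, Donoghue, & Laughlin, 2000)."""
    d = theta - delta
    num = math.exp(alpha * (d - tau)) + math.exp(alpha * (2 * d - tau))
    den = 1.0 + math.exp(alpha * 3 * d) + num
    return num / den

def prefer(theta, s, t):
    """P(statement s preferred to t), MUPP-style (Stark et al., 2005)."""
    ps, pt = ggum_agree(theta, *s), ggum_agree(theta, *t)
    return ps * (1 - pt) / (ps * (1 - pt) + (1 - ps) * pt)

def rank_prob(theta, stmts, order):
    """P(observed ranking), here proportional to products of pairwise preferences."""
    def score(perm):
        return math.prod(prefer(theta, stmts[perm[i]], stmts[perm[j]])
                         for i in range(3) for j in range(i + 1, 3))
    total = sum(score(p) for p in itertools.permutations(range(3)))
    return score(order) / total

# statements as (alpha, delta, tau); a single theta is used for all three
# statements purely for simplicity (real MFC statements span dimensions)
stmts = [(1.2, -0.5, -1.0), (1.0, 0.3, -0.8), (0.9, 1.1, -1.2)]
print(rank_prob(theta=0.4, stmts=stmts, order=(0, 1, 2)))
```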
244

Teorie odpovědí na položku a její aplikace v oblasti Národních srovnávacích zkoušek / Item Response Theory and its Application in the National Comparative Exams

Fiřtová, Lenka January 2012 (has links)
Item Response Theory, a psychometric paradigm for test development and evaluation, comprises a collection of models which enable the estimation of the probability of a correct answer to a particular item in the test as a function of the item parameters and the level of a respondent's underlying ability. This paper, written in cooperation with the company Scio, is focused on the application of Item Response Theory in the context of the National Comparative Exams. Its aim is to propose a test-equating procedure which would ensure a fair comparison of respondents' scores in the Test of General Academic Prerequisites regardless of the particular test administration.
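One standard way to realize such a procedure, shown here only as a sketch and not necessarily the method the thesis proposes, is mean-sigma linking: anchor items calibrated in two administrations define a linear transformation that puts one form's parameters, and its examinees' abilities, on the other form's scale.

```python
# Mean-sigma linking sketch with invented anchor-item difficulties.
import numpy as np

b_anchor_A = np.array([-0.8, -0.2, 0.5, 1.1])   # anchor difficulties, form A run
b_anchor_B = np.array([-0.5, 0.1, 0.9, 1.4])    # same items, form B run

A = b_anchor_A.std(ddof=1) / b_anchor_B.std(ddof=1)   # slope of the linking line
K = b_anchor_A.mean() - A * b_anchor_B.mean()         # intercept

b_B = np.array([-1.0, 0.0, 0.7])        # unique form-B items
b_B_on_A = A * b_B + K                  # now comparable across administrations
theta_B_on_A = A * 0.3 + K              # abilities transform the same way
print(b_B_on_A, theta_B_on_A)
```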
245

Accuracy and variability of item parameter estimates from marginal maximum a posteriori estimation and Bayesian inference via Gibbs samplers

Wu, Yi-Fang 01 August 2015 (has links)
Item response theory (IRT) uses a family of statistical models for estimating stable characteristics of items and examinees and defining how these characteristics interact in describing item and test performance. Focusing on the three-parameter logistic IRT model (Birnbaum, 1968; Lord, 1980), the current study examines the accuracy and variability of item parameter estimates from marginal maximum a posteriori estimation via an expectation-maximization algorithm (MMAP/EM) and from the Markov chain Monte Carlo Gibbs sampling (MCMC/GS) approach. The various factors that affect the accuracy and variability of the item parameter estimates are discussed and then evaluated through a large-scale simulation. The factors of interest include the composition and length of tests, the distribution of underlying latent traits, the size of samples, and the prior distributions of the discrimination, difficulty, and pseudo-guessing parameters. The results of the two estimation methods are compared to determine the lower limit--in terms of test length, sample size, test characteristics, and prior distributions of item parameters--at which the methods can satisfactorily recover item parameters and function efficiently in practice. For practitioners, the results help to define limits on the appropriate use of BILOG-MG (which implements MMAP/EM) and to assess the utility of OpenBUGS (which carries out MCMC/GS) for item parameter estimation in practice.
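For orientation, a small sketch of the model and of where the priors under study enter; the trait values are treated as known for illustration, whereas both MMAP/EM and MCMC/GS actually integrate over them, and the specific prior choices below are common examples rather than the study's exact settings.

```python
# Sketch: 3PL item characteristic curve and a single item's log-posterior.
import numpy as np
from scipy import stats

def p3pl(theta, a, b, c):
    """3PL: P(correct) = c + (1 - c) * logistic(1.7 * a * (theta - b))."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def item_log_posterior(a, b, c, theta, y):
    p = p3pl(theta, a, b, c)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))
    logprior = (stats.lognorm.logpdf(a, s=0.5)        # discrimination prior
                + stats.norm.logpdf(b, 0.0, 2.0)      # difficulty prior
                + stats.beta.logpdf(c, 5, 17))        # pseudo-guessing prior
    return loglik + logprior

theta = np.linspace(-3, 3, 1000)                      # abilities, "known" here
y = (np.random.default_rng(0).random(1000) < p3pl(theta, 1.2, 0.0, 0.2)).astype(int)
print(item_log_posterior(1.2, 0.0, 0.2, theta, y))    # evaluated at the truth
```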
246

Towards optimal measurement and theoretical grounding of L2 English elicited imitation: Examining scales, (mis)fits, and prompt features from item response theory and random forest approaches

Ji-young Shin (11560495) 14 October 2021 (has links)
The present dissertation investigated the impact of scales / scoring methods and prompt linguistic features on the measurement quality of L2 English elicited imitation (EI). Scales / scoring methods are an important feature for the validity and reliability of L2 EI tests, but less is known about them (Yan et al., 2016). Prompt linguistic features are also known to influence EI test quality, particularly item difficulty, but item discrimination and corpus-based, fine-grained measures have rarely been incorporated into examining the contribution of prompt linguistic features. The current study addressed these research needs using item response theory (IRT) and random forest modeling.

Data consisted of 9,348 oral responses to forty-eight items, including EI prompts, item scores, and rater comments, collected from 779 examinees of an L2 English EI test at Purdue University. First, the study explored the current and alternative EI scales / scoring methods that measure grammatical / semantic accuracy, focusing on optimal IRT-based measurement qualities (RQ1 through RQ4 in Phase I). Next, the project identified important prompt linguistic features that predict EI item difficulty and discrimination across different scales / scoring methods and proficiency levels, using multi-level modeling and random forest regression (RQ5 and RQ6 in Phase II).

The main findings were (although not limited to): 1) collapsing the exact-repetition and paraphrase categories led to more optimal measurement (i.e., adequacy of item parameter values, category functioning, and model / item / person fit) (RQ1); 2) there were fewer misfitting persons with lower proficiency and a higher frequency of unexpected responses in the extreme categories (RQ2); 3) the inconsistency of qualitatively distinguishing semantic errors and the wide range of grammatical accuracy in the minor-error category contributed to misfit (RQ3); 4) a quantity-based, 4-category ordinal scale outperformed quality-based or binary scales (RQ4); 5) sentence length significantly explained item difficulty only, with little variance explained (RQ5); and 6) corpus-based lexical measures and phrase-level syntactic complexity were important for predicting item difficulty, particularly for the higher ability level (RQ6). The findings have implications for EI scale / item development in human and automatic scoring settings and for L2 English proficiency development.
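A minimal sketch of the Phase II idea, with invented feature names and simulated data standing in for the actual prompt measures: fit a random forest to predict item difficulty from prompt features and inspect variable importances.

```python
# Random forest regression of item difficulty on prompt features (toy data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(48, 3))            # 48 items x 3 hypothetical features
difficulty = 0.8 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.2, 48)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, difficulty)
for name, imp in zip(["sentence_length", "lexical_freq", "phrase_complexity"],
                     rf.feature_importances_):
    print(f"{name}: {imp:.2f}")         # relative importance of each feature
```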
247

A Comparison of Traditional Norming and Rasch Quick Norming Methods

Bush, Joan Spooner 08 1900 (has links)
The simplicity and ease of use of the Rasch procedure are a decided advantage. The test user needs only two numbers: the frequency of persons who answered each item correctly and the Rasch-calibrated item difficulty, usually part of an existing item bank. Norms can be computed quickly for any specific group of interest. In addition, once the selected items from the calibrated bank are normed, any test built from the item bank is automatically norm-referenced. Thus, it was concluded that the Rasch quick-norming procedure is a meaningful alternative to traditional classical true-score norming for test users who desire normative data.
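The quick-norming logic can be sketched in a few lines: with bank-calibrated difficulties held fixed, each raw score maps to a single Rasch measure, so a norm table is one Newton solve per score. The difficulties below are invented for illustration.

```python
# Raw-score-to-measure table under the Rasch model with fixed bank difficulties.
import numpy as np

def theta_for_raw(raw, b, iters=20):
    """Solve sum_i P_i(theta) = raw by Newton's method."""
    theta = 0.0
    for _ in range(iters):
        p = 1 / (1 + np.exp(-(theta - b)))
        theta += (raw - p.sum()) / max(p @ (1 - p), 1e-9)  # Newton step
    return theta

b = np.array([-1.5, -0.5, 0.0, 0.4, 1.2, 2.0])   # calibrated bank difficulties
table = {r: round(theta_for_raw(r, b), 2) for r in range(1, len(b))}
print(table)   # score-to-theta table for any test built from these items
```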
248

Comparing Fountas and Pinnell's Reading Levels to Reading Scores on the Criterion Referenced Competency Test

Walker, Shunda F. 01 January 2016 (has links)
Reading competency is related to individuals' success at school and in their careers. Students who experience significant problems with reading may be at risk of long-term academic and social problems. High-quality measures that determine student progress toward curricular goals are needed for early identification and intervention to improve reading abilities and ultimately prevent subsequent failure in reading. The purpose of this quantitative, nonexperimental, ex post facto study was to determine whether a correlation existed between student achievement scores on the Fountas and Pinnell Reading Benchmark Assessment and reading comprehension scores on the Criterion-Referenced Competency Test (CRCT). Item response theory served as the conceptual framework for examining whether a relationship exists between Fountas and Pinnell Benchmark Instructional Reading Levels and the CRCT reading comprehension scores of students in Grades 3, 4, and 5 in the 2013-2014 school year. Archival data for 329 students in Grades 3-5 were collected and analyzed through Spearman's rank-order correlation. The results showed positive relationships between the scores. The findings promote positive social change by supporting the use of benchmark assessment data to identify at-risk readers early.
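For readers who want the computation, the analysis reduces to a Spearman rank-order correlation; the sketch below uses invented values in place of the archival data.

```python
# Spearman's rank-order correlation between two score sets (toy values).
from scipy.stats import spearmanr

fp_level = [16, 18, 22, 24, 28, 30, 34, 38]      # benchmark reading levels (coded)
crct = [790, 805, 812, 820, 836, 841, 850, 865]  # CRCT reading scores

rho, pval = spearmanr(fp_level, crct)
print(f"rho = {rho:.2f}, p = {pval:.3f}")        # positive rho mirrors the finding
```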
249

Exploring the Item Difficulty and Other Psychometric Properties of the Core Perceptual, Verbal, and Working Memory Subtests of the WAIS-IV Using Item Response Theory

Schleicher-Dilks, Sara Ann 01 January 2015 (has links)
The ceiling and basal rules of the Wechsler Adult Intelligence Scale – Fourth Edition (WAIS-IV; Wechsler, 2008) only function as intended if subtest items proceed in order of difficulty. While many aspects of the WAIS-IV have been researched, there is no literature on subtest item difficulty, and precise item difficulty values are not available. The WAIS-IV was developed within the framework of Classical Test Theory (CTT), in which item difficulty is most often determined using p-values. One limitation of this method is that item difficulty values are sample dependent: both the standard error of measurement, an important indicator of reliability, and the p-values change when the sample changes.

Item Response Theory (IRT) is a different framework within which psychological tests can be created, analyzed, and refined. IRT places items and person ability on the same scale using linear transformations and links item difficulty to person ability; as a result, IRT is said to produce sample-independent statistics. Rasch modeling, a form of IRT, is a one-parameter logistic model appropriate for items with only two response options; it assumes that the only factors affecting test performance are characteristics of the items, such as their difficulty or their relationship to the construct measured by the test, and characteristics of the participants, such as their ability levels. The partial credit model is similar to the standard dichotomous Rasch model, except that it is appropriate for items with more than two response options. Proponents of the standard dichotomous Rasch model argue that it has distinct advantages over both CTT-based methods and other IRT models (Bond & Fox, 2007; Embretson & Reise, 2000; Furr & Bacharach, 2013; Hambleton & Jones, 1993) because of the principle of monotonicity, also referred to as specific objectivity, and the principle of additivity or double cancellation, which "establishes that two parameters are additively related to a third variable" (Embretson & Reise, 2000, p. 148). In other words, under the Rasch model the probability of correctly answering an item is an additive function of the individual's ability, or trait level, and the item's degree of difficulty; as ability increases, so does the individual's probability of answering that item correctly. Because only item difficulty and person ability affect an individual's chance of correctly answering an item, inter-individual comparisons can be made even if individuals did not receive identical items or items of the same difficulty level. This is why Rasch modeling is referred to as test-free measurement.

The purpose of this study was to apply a standard dichotomous Rasch model or partial credit model to the individual items of seven core perceptual, verbal, and working memory subtests of the WAIS-IV (Block Design, Matrix Reasoning, Visual Puzzles, Similarities, Vocabulary, Information, and Arithmetic), together with the Digit Span tasks Digits Forward, Digits Backward, and Digit Sequencing. Results revealed that WAIS-IV subtests fall into one of three categories: optimally ordered, near-optimally ordered, and sub-optimally ordered. The optimally ordered subtests, Digits Forward and Digits Backward, had no disordered items. Near-optimally ordered subtests were those with one to three disordered items and included Digit Sequencing, Arithmetic, Similarities, and Block Design. Sub-optimally ordered subtests consisted of Matrix Reasoning, Visual Puzzles, Information, and Vocabulary, with the number of disordered items ranging from six to 16.

Two major implications of the results were considered: the impact on individuals' scores and the impact on overall test administration time. While the number of disordered items ranged from 0 to 16, the overall impact on raw scores was deemed minimal. Because of where the disordered items occur in the subtests, most individuals are administered all the items that they would be expected to answer correctly. A one-point reduction in any one subtest is unlikely to significantly affect the overall index scores, which are the scores most commonly interpreted in the WAIS-IV; however, a one-point reduction across all subtests may have a more noticeable impact on index scores. In cases where individuals discontinue before having a chance to answer items that were easier, clinicians may consider testing the limits. While this would have no impact on raw scores, it may provide clinicians with a better understanding of individuals' true abilities. Based on the findings of this study, clinicians may consider administering only certain items, chosen by difficulty value, in order to test the limits. This study also found that the start point for most subtests is too easy for most individuals; on some subtests, most individuals may be administered more than 10 items that are too easy for them. Other than increasing overall administration time, it is not clear what impact, if any, this has, but it does suggest the need to reevaluate current start items so that they are a true basal for most people.

Future studies should break standard test administration by ignoring basal and ceiling rules to collect data on more items. To help clarify why some items are more or less difficult than their ordinal rank would suggest, future studies should include a qualitative component in which, after each subtest, individuals are asked to describe what they found easy and difficult about each item. Finally, future research should examine the effects of item ordering on participant performance: while this study revealed that only minimal reductions in index scores are likely to result from prematurely stopping test administration, it is not known whether disordering has other impacts on performance, perhaps by increasing or decreasing an individual's confidence.
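The ordering check at the heart of the study can be sketched directly: compare each item's administration position with its rank by estimated Rasch difficulty and flag disagreements. The difficulty values below are invented for illustration.

```python
# Flag "disordered" items: positions where the empirical difficulty rank
# disagrees with the intended easiest-to-hardest administration order.
import numpy as np

difficulty = np.array([-2.1, -1.4, -1.6, -0.3, 0.2, 0.1, 0.9, 1.8])  # by position
expected = np.arange(len(difficulty))            # intended order: easiest first
actual = difficulty.argsort().argsort()          # empirical difficulty ranks
disordered = np.flatnonzero(expected != actual)
print(f"disordered item positions: {disordered}")   # here, two adjacent swaps
```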
250

Assessing the Absolute and Relative Performance of IRTrees Using Cross-Validation and the RORME Index

DiTrapani, John B. 03 September 2019 (has links)
No description available.
