1 |
Ability Estimation Under Different Item Parameterization and Scoring Models
Si, Ching-Fung B. 05 1900
A Monte Carlo simulation study investigated the effects of scoring format, item parameterization, threshold configuration, and prior ability distribution on the accuracy of ability estimation under various IRT models. Item response data on 30 items from 1,000 examinees were simulated using known item parameters and ability estimates. The simulated data sets were submitted to seven dichotomous or polytomous IRT models with different item parameterizations to estimate examinee ability. The accuracy of ability estimation for a given IRT model was assessed by the recovery rate and the root mean square error. The results indicated that the polytomous models produced more accurate ability estimates than the dichotomous models under all combinations of research conditions, as shown by higher recovery rates and lower root mean square errors. Among the item parameterization models, the one-parameter model outperformed the two-parameter and three-parameter models under all research conditions. Among the polytomous models, the partial credit model yielded more accurate ability estimates than the other three polytomous models. The nominal categories model performed better than the generalized partial credit model and the multiple-choice model, with the multiple-choice model the least accurate. The results further indicated that certain prior ability distributions affected the accuracy of ability estimation; however, no clear order of accuracy among the four prior distribution groups could be identified because of an interaction between prior ability distribution and threshold configuration. The recovery rate was lower when the test items had categories with unequal threshold distances, had thresholds clustered at one end of the ability/difficulty continuum, and were administered to a sample of examinees whose population ability distribution was skewed toward the same end of the continuum.
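The kind of recovery study described above can be sketched in a few lines. This is only an illustrative sketch, not the design used in the thesis: it assumes a one-parameter (Rasch) model, item difficulties drawn uniformly from [-2, 2], abilities drawn from a standard normal distribution, and expected a posteriori (EAP) scoring on a fixed grid. The function names and all numeric settings are hypothetical.

```python
import math
import random


def p_correct(theta, b):
    """Probability of a correct response under a one-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap_estimate(responses, difficulties, grid):
    """EAP ability estimate with a standard normal prior (assumed here)."""
    num = den = 0.0
    for theta in grid:
        log_post = -0.5 * theta * theta  # log N(0, 1) prior, up to a constant
        for u, b in zip(responses, difficulties):
            p = p_correct(theta, b)
            log_post += math.log(p) if u else math.log(1.0 - p)
        weight = math.exp(log_post)
        num += theta * weight
        den += weight
    return num / den


def simulate_rmse(n_examinees=1000, n_items=30, seed=0):
    """Simulate responses from known abilities and return the RMSE of recovery."""
    rng = random.Random(seed)
    difficulties = [rng.uniform(-2.0, 2.0) for _ in range(n_items)]
    grid = [-4.0 + 0.1 * k for k in range(81)]  # ability grid from -4 to 4
    squared_error = 0.0
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)  # true (known) ability
        responses = [1 if rng.random() < p_correct(theta, b) else 0
                     for b in difficulties]
        estimate = eap_estimate(responses, difficulties, grid)
        squared_error += (estimate - theta) ** 2
    return math.sqrt(squared_error / n_examinees)
```

Calling `simulate_rmse()` with different seeds, test lengths, or generating distributions gives a rough feel for how root mean square error responds to the study conditions; comparing models or scoring formats would require swapping in the corresponding response functions.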
|
2 |
Making Diagnostic Thresholds Less Arbitrary
Unger, Alexis Ariana 2011 May 1900
The application of diagnostic thresholds plays an important role in the classification of mental disorders. Despite their importance, many diagnostic thresholds are set arbitrarily, without much empirical support. This paper introduces and analyzes a new, empirically based way of setting diagnostic thresholds for a category of mental disorders that has historically had arbitrary thresholds, the personality disorders (PDs). I analyzed data from over 2,000 participants in the Methods to Improve Diagnostic Assessment and Services (MIDAS) database. Results revealed that functional outcome, as measured by Global Assessment of Functioning (GAF) scores, could be used to identify diagnostic thresholds and that the optimal thresholds along the spectrum of latent severity varied somewhat by PD. Using the item response theory (IRT)-based approach, the optimal threshold for the different PDs ranged from θ = 1.50 to 2.25, and effect sizes ranged from .34 to 1.55. These findings suggest that linking diagnostic thresholds to functional outcomes, and thereby making them less arbitrary, is an achievable goal. This study has introduced a new and uncomplicated way to set diagnostic thresholds empirically while also taking into account that items within diagnostic sets may function differently. Although this is purely an initial demonstration meant to serve as an example, the approach raises the possibility that diagnostic thresholds for all disorders could one day be set on an empirical basis.
|
3 |
Detecting Aberrant Responding on Unidimensional Pairwise Preference Tests: An Application of lz Based on the Zinnes Griggs Ideal Point IRT Model
Lee, Philseok 01 January 2013
This study investigated the efficacy of the lz person fit statistic for detecting aberrant responding with unidimensional pairwise preference (UPP) measures, constructed and scored based on the Zinnes-Griggs (ZG, 1974) IRT model, which has been used for a variety of recent noncognitive testing applications. Because UPP measures are used to collect both "self-" and "other-" reports, I explored the capability of lz to detect two of the most common and potentially detrimental response sets, namely fake good and random responding. The effectiveness of lz was studied using empirical and theoretical critical values for classification, along with test length, test information, the type of statement parameters, and the percentage of items answered aberrantly (20%, 50%, 100%). I found that lz was ineffective in detecting fake good responding, with power approaching zero in the 100% aberrance conditions. However, lz was highly effective in detecting random responding, with power approaching 1.0 in long-test, high-information conditions, and there was no loss of efficacy when marginal maximum likelihood estimates of statement parameters were used in place of the true values. Although empirical critical values for classification provided slightly higher power and more accurate Type I error rates, theoretical critical values, corresponding to a standard normal distribution, gave nearly as good results.
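For readers unfamiliar with lz, the statistic can be sketched as follows. This is a minimal illustration of the standardized log-likelihood person-fit statistic for dichotomous responses, not the code used in the study. It assumes the model-implied probabilities (for a UPP measure, the probability of preferring the first statement in each pair at the respondent's estimated trait level) have already been computed elsewhere; the function name and example values are hypothetical.

```python
import math


def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic (lz).

    responses: 0/1 outcomes for one respondent.
    probs: model-implied probabilities of a 1, each strictly between 0 and 1.
    """
    log_lik = expected = variance = 0.0
    for u, p in zip(responses, probs):
        q = 1.0 - p
        log_lik += u * math.log(p) + (1 - u) * math.log(q)   # observed log-likelihood
        expected += p * math.log(p) + q * math.log(q)        # its expectation
        variance += p * q * math.log(p / q) ** 2             # its variance
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical probabilities for six items at one respondent's trait estimate.
probs = [0.9, 0.8, 0.7, 0.6, 0.9, 0.8]
consistent = [1, 1, 1, 1, 1, 1]   # responses that agree with the model
aberrant = [0, 0, 0, 0, 0, 0]     # responses that contradict the model
lz_consistent = lz_statistic(consistent, probs)
lz_aberrant = lz_statistic(aberrant, probs)
```

Large negative values indicate response patterns that are unlikely under the model; with theoretical critical values, a respondent would be flagged when lz falls below a standard normal cutoff. Note that the variance term is zero when every probability equals 0.5, so such cases need separate handling.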
|
4 |
Algorithms for assessing the quality and difficulty of multiple choice exam questions
Luger, Sarah Kaitlin Kelly January 2016
Multiple Choice Questions (MCQs) have long been the backbone of standardized testing in academia and industry. Correspondingly, there is a constant need for the authors of MCQs to write and refine new questions for new versions of standardized tests, as well as to support performance measurement in the emerging massive open online courses (MOOCs). Research that explores what makes a question difficult, or which questions distinguish higher-performing students from lower-performing students, can aid in the creation of the next generation of teaching and evaluation tools. In the automated MCQ answering component of this thesis, algorithms query for definitions of scientific terms, process the returned web results, and compare the returned definitions to the original definition in the MCQ. This automated method for answering questions is then augmented with a model, based on human performance data from crowdsourced question sets, for analysis of question difficulty as well as the discrimination power of the non-answer alternatives. The crowdsourced question sets come from PeerWise, an open source online college-level question authoring and answering environment. The goal of this research is to create an automated method to both answer and assess the difficulty of multiple choice inverse definition questions in the domain of introductory biology. The results of this work suggest that human-authored question banks provide useful data for building gold standard human performance models. The methodology for building these performance models has value in other domains that test the difficulty of questions and the quality of the exam takers.
|
5 |
Assessment of Competencies Among Doctoral Trainees in Psychology
Price, Samantha 08 1900
The recent shift to a culture of competence has permeated several areas of professional psychology, including competency identification, competency-based education training, and competency assessment. A competency framework has also been applied to various programs and specialty areas within psychology, such as clinical, counseling, clinical health, school, cultural diversity, neuro-, gero-, child, and pediatric psychology. Despite the spread of competency focus throughout psychology, few standardized measures of competency assessment have been developed. To the authors' knowledge, only four published studies on measures of competency assessment in psychology currently exist. While these measures represent significant steps in advancing the assessment of competence, three of them were designed for use with individual programs, and two of those are international (i.e., UK and Taiwan). The current study applied the seminal Competency Benchmarks, via a recently adapted benchmarks form (i.e., Practicum Evaluation form; PEF), to practicum students at the University of North Texas. In addition to traditional supervisor ratings, the present study also involved self-, peer supervisor, and peer supervisee ratings to provide 360-degree evaluations. Item-response theory (IRT) was used to evaluate the psychometric properties of the PEF and inform potential revisions of this form. Supervisor ratings of competency were found to fit the Rasch model specified, lending support to use of the benchmarks framework as assessed by this form. Self- and peer-ratings were significantly correlated with supervisor ratings, indicating that there may be some utility to 360-degree evaluations. Finally, as predicted, foundational competencies were rated as significantly higher than functional competencies, and competencies improved significantly with training. Results of the current study provide clarity about the utility of the PEF and inform our understanding of practicum-level competencies.
|
6 |
An Examination of the Psychometric Properties of the Trauma Inventory for Partners of Sex Addicts (TIPSA)
Stokes, Steven Scott 01 July 2017
This study examined the psychometric properties of the Trauma Inventory for Partners of Sex Addicts (TIPSA). Using the Nominal Response Model (NRM), I examined several aspects of item and option functioning including discrimination, empirical category ordering, and information. Category Boundary Discrimination (CBD) parameters were calculated to determine the extent to which respondents distinguished between adjacent categories. Indistinguishable categories were collapsed through recoding. Empirically disordered response categories were also collapsed through recoding. Findings revealed that recoding solved some technical functioning issues in some items, and also revealed items (and perhaps option anchors) that were probably poorly conceived initially. In addition, nuisance or error variance was reduced only marginally by recoding, and the relative standing of respondents on the trait continuum remained largely unchanged. Items in need of modification or removal were identified, and issues of content validity were discussed.
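As a rough illustration of the quantities involved, the sketch below computes Nominal Response Model category probabilities in the usual slope-intercept form, P_k(θ) ∝ exp(a_k θ + c_k). It is not the analysis code from the study. The second function assumes that Category Boundary Discrimination is taken as the difference between adjacent category slopes, which is one common definition but is not stated in the abstract; all names and parameter values are hypothetical.

```python
import math


def nrm_probabilities(theta, slopes, intercepts):
    """Category probabilities under the Nominal Response Model for one item."""
    logits = [a * theta + c for a, c in zip(slopes, intercepts)]
    largest = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(z - largest) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def category_boundary_discriminations(slopes):
    """Assumed definition: difference between adjacent category slopes.

    Values near zero suggest respondents do not distinguish the two
    categories; negative values suggest empirically disordered categories.
    """
    return [slopes[k + 1] - slopes[k] for k in range(len(slopes) - 1)]


# Hypothetical three-category item.
example_slopes = [-1.0, 0.0, 1.0]
example_intercepts = [0.0, 0.5, 0.0]
example_probs = nrm_probabilities(3.0, example_slopes, example_intercepts)
example_cbd = category_boundary_discriminations(example_slopes)
```

Under this reading, collapsing categories through recoding corresponds to merging adjacent options whose boundary discrimination is close to zero or negative.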
|
7 |
A generalized partial credit FACETS model for investigating order effects in self-report personality data
Hayes, Heather 05 July 2012
Despite its convenience, the process of self-report in personality testing can be affected by a variety of cognitive and perceptual biases. One bias that violates local independence, a core criterion of modern test theory, is the order effect. In this bias, characteristics of an item response are influenced not only by the content of the current item but also by the accumulated exposure to previous, similar-content items. The bias manifests as increasingly stable item responses for items that appear later in a test. Previous investigations of this effect have been rooted in classical test theory (CTT) and have consistently found that item reliabilities, or corrected item-total score correlations, increase with the item's serial position in the test. The purpose of the current study was to examine order effects more rigorously via item response theory (IRT). To this end, the FACETS modeling approach (Linacre, 1989) was combined with the Generalized Partial Credit model (GPCM; Muraki, 1992) to produce a new model, the Generalized Partial Credit FACETS model (GPCFM). The serial position of an item serves as a facet that contributes to the item response, not only via its effect on the item's location on the latent trait continuum, but also via its effect on the item's discrimination. Thus, the GPCFM differs from previous generalizations of the FACETS model (Wang & Liu, 2007) in that the item discrimination parameter is modified to include a serial position effect. This parameter is important because it reflects the extent to which the purported underlying trait is represented in an item score. Two sets of analyses were conducted. First, a simulation study demonstrated effective parameter recovery, though measures of error were affected by sample size for all parameters, by test length and the size of the order effect for trait level estimates, and by an interaction between sample size and test length for item discrimination. Second, with respect to real self-report personality data, the GPCFM demonstrated good fit as well as superior fit relative to competing, nested models, while also identifying order effects in some traits, particularly Neuroticism, Openness, and Agreeableness.
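The base model that the GPCFM extends can be sketched as follows. This shows only the standard Generalized Partial Credit Model category probabilities for a single item; the abstract does not give the functional form of the serial position facet, so no attempt is made to reproduce it here. The function name and parameter values are hypothetical.

```python
import math


def gpcm_probabilities(theta, discrimination, step_difficulties):
    """Category probabilities under the GPCM for one item.

    With m step difficulties the item has m + 1 ordered categories.
    The probability of category k is proportional to
    exp(sum over v <= k of discrimination * (theta - step_difficulties[v])).
    """
    cumulative = [0.0]  # category 0 has an empty sum
    for b in step_difficulties:
        cumulative.append(cumulative[-1] + discrimination * (theta - b))
    largest = max(cumulative)  # subtract the maximum for numerical stability
    weights = [math.exp(z - largest) for z in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical four-category item evaluated at theta = 0.
example = gpcm_probabilities(0.0, 1.2, [-1.0, 0.0, 1.0])
```

Because the discrimination multiplies every step term, a larger value makes the category probabilities change more sharply with the trait level, which is why a serial position effect on discrimination bears on how strongly an item score reflects the underlying trait.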
|
8 |
Teoria de resposta ao item aplicada no ENEM / Item response theory applied to the ENEM
Costa, Sidney Tadeu Santiago 03 March 2017
With the score obtained on the Exame Nacional do Ensino Médio (ENEM), students can apply for places at various public higher-education institutions and government programs, such as the Universidade para Todos program (Prouni) and the Fundo de Financiamento Estudantil (Fies). The ENEM scores its objective questions using a methodology called Item Response Theory (IRT; Teoria de Resposta ao Item, TRI), which differs in several respects from Classical Test Theory (CTT; Teoria Clássica dos Testes, TCT). Under CTT, the main factor determining an examinee's result is the number of correct answers, whereas under IRT it is essential to analyze which answers are correct, in addition to how many. The aim of this work is to explain what IRT is and how the methodology is applied in large-scale assessments.
The work gives a historical account of the logistic models used in IRT and justifies the existence of each parameter in the main model equation. To determine each parameter of the IRT model and to compute each candidate's final score, an optimization procedure called the Maximum Likelihood Method (Método da Máxima Verossimilhança, MMV) is used.
The computational tools used in the work were the R software, with packages developed for IRT applications, and the Visual Basic programming language, used to program functions (macros) in electronic spreadsheets.
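The point that IRT scoring depends on which items are answered correctly, and not only on how many, can be illustrated with a small sketch. It assumes a three-parameter logistic model (discrimination a, difficulty b, guessing c) and a simple grid search for the maximum likelihood ability estimate; the abstract does not state the exact model or optimizer, and the item parameters below are invented for illustration.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct answer under a three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = p_3pl(theta, a, b, c)
        total += math.log(p) if u else math.log(1.0 - p)
    return total


def ml_ability(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Maximum likelihood ability estimate by grid search on [lo, hi]."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


# Hypothetical items as (a, b, c), ordered from easiest to hardest.
items = [(1.0, -1.5, 0.2), (1.2, -0.5, 0.2), (1.5, 0.5, 0.2), (2.0, 1.5, 0.2)]
easy_right = [1, 1, 0, 0]  # two correct answers, on the easy items
hard_right = [0, 0, 1, 1]  # two correct answers, on the hard items
theta_easy = ml_ability(easy_right, items)
theta_hard = ml_ability(hard_right, items)
```

Both patterns have the same number of correct answers, yet the estimates differ: missing the easy items while answering the hard ones correctly is more consistent with guessing, so that pattern receives the lower estimate (here it sits at the lower edge of the search grid). A production implementation would typically use Newton-type optimization and handle such boundary cases explicitly.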
|
9 |
Investigating Parameter Recovery and Item Information for Triplet Multidimensional Forced Choice Measure: An Application of the GGUM-RANK Model
Lee, Philseok 07 June 2016
To control various response biases and rater errors in noncognitive assessment, multidimensional forced choice (MFC) measures have been proposed as an alternative to single-statement Likert-type scales. Historically, MFC measures have been criticized because conventional scoring methods can lead to ipsativity problems that render scores unsuitable for inter-individual comparisons. However, with the recent advent of classical test theory and item response theory scoring methods that yield normative information, MFC measures are surging in popularity and becoming important components of personnel and educational assessment systems. This dissertation presents developments concerning a GGUM-based MFC model, henceforth referred to as the GGUM-RANK. Markov Chain Monte Carlo (MCMC) algorithms were developed to estimate GGUM-RANK statement and person parameters directly from MFC rank responses, and the efficacy of the new estimation algorithm was examined through computer simulations and an empirical construct validity investigation. Recently derived GGUM-RANK item information functions and information indices were also used to evaluate overall item and test quality for the empirical study and to give insights into differences in scoring accuracy between two-alternative (pairwise preference) and three-alternative (triplet) MFC measures for future work. The dissertation concludes with a discussion of the research findings and potential applications in workforce and educational settings.
|
10 |
Modelos multidimensionais da teoria de resposta ao item / Multidimensional models of item response theory
Tiago de Miranda Fragoso 04 August 2010
Educational assessments, evaluations of psychological disorders, and market acceptance surveys are examples of studies that aim to quantify a construct of interest through questionnaires made up of multiple-choice items. Item Response Theory (IRT) is widely used to analyze such data. Several IRT models are already well established in practice, both for dichotomous responses (right/wrong, present/absent, yes/no) and for items with more than two response categories (nominal or ordinal). However, the large majority assume that a single latent trait is sufficient to explain the probability of a response to an item (unidimensional models). Since practical situations are usually characterized by several aptitudes (latent traits) influencing the probability that an individual gives a certain response, multidimensional models are of great importance. In this work, after a review of the literature on the main multidimensional IRT models, one of them is studied in detail: the two-parameter multidimensional logistic model for dichotomous items. Estimation of the item parameters by marginal maximum likelihood and of the latent traits by maximum likelihood is described, as is estimation by Bayesian methods. All methods were implemented in R, compared, and applied to real data sets from the Beck Depression Inventory (BDI) and the Exame Nacional do Ensino Médio (ENEM).
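The response function of the model studied in this work can be sketched briefly. The code below assumes the common slope-intercept parameterization of the multidimensional two-parameter logistic model, in which the probability of a positive response is the logistic function of a·θ + d. The abstract does not give the parameterization, so the function name, symbols, and values are illustrative assumptions.

```python
import math


def m2pl_probability(theta, a, d):
    """Probability of a positive response under a multidimensional 2PL model.

    theta: latent trait values, one per dimension.
    a: item discrimination on each dimension.
    d: item intercept.
    """
    z = sum(a_k * theta_k for a_k, theta_k in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical two-dimensional item with equal discriminations.
balanced = m2pl_probability([0.0, 0.0], [1.0, 1.0], 0.0)
offsetting = m2pl_probability([1.0, -1.0], [1.0, 1.0], 0.0)
```

The two calls return the same probability, which illustrates the compensatory nature of this form: a high level on one latent trait can offset a low level on another. Marginal maximum likelihood and Bayesian estimation both build on this response function.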
|