  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Contributions to Kernel Equating

Andersson, Björn January 2014 (has links)
The statistical practice of equating is needed when scores on different versions of the same standardized test are to be compared. This thesis constitutes four contributions to the observed-score equating framework kernel equating. Paper I introduces the open source R package kequate which enables the equating of observed scores using the kernel method of test equating in all common equating designs. The package is designed for ease of use and integrates well with other packages. The equating methods non-equivalent groups with covariates and item response theory observed-score kernel equating are currently not available in any other software package. In paper II an alternative bandwidth selection method for the kernel method of test equating is proposed. The new method is designed for usage with non-smooth data such as when using the observed data directly, without pre-smoothing. In previously used bandwidth selection methods, the variability from the bandwidth selection was disregarded when calculating the asymptotic standard errors. Here, the bandwidth selection is accounted for and updated asymptotic standard error derivations are provided. Item response theory observed-score kernel equating for the non-equivalent groups with anchor test design is introduced in paper III. Multivariate observed-score kernel equating functions are defined and their asymptotic covariance matrices are derived. An empirical example in the form of a standardized achievement test is used and the item response theory methods are compared to previously used log-linear methods. In paper IV, Wald tests for equating differences in item response theory observed-score kernel equating are conducted using the results from paper III. Simulations are performed to evaluate the empirical significance level and power under different settings, showing that the Wald test is more powerful than the Hommel multiple hypothesis testing method. 
Data from a psychometric licensure test and a standardized achievement test are used to exemplify the hypothesis testing procedure. The results show that the Wald test can lead to different conclusions from those reached with the Hommel procedure.
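The continuization step at the heart of kernel equating can be shown in a minimal Python sketch (illustrative only, not the kequate package; the full method's mean- and variance-preserving linear transformation and log-linear pre-smoothing are omitted). Each discrete score probability is smoothed by a Gaussian kernel of bandwidth h, and a score x on form X is equated to form Y via e(x) = G^{-1}(F(x)):

```python
import math

def kernel_cdf(x, scores, probs, h):
    """Continuized CDF: a Gaussian kernel of bandwidth h at each score point."""
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sum(p * Phi((x - s) / h) for s, p in zip(scores, probs))

def equate(x, scores_x, probs_x, scores_y, probs_y, h=0.6):
    """Equipercentile equating e(x) = G^{-1}(F(x)), with G inverted by bisection."""
    target = kernel_cdf(x, scores_x, probs_x, h)
    lo, hi = min(scores_y) - 4 * h, max(scores_y) + 4 * h
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if kernel_cdf(mid, scores_y, probs_y, h) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Toy data: form Y's score scale is form X's shifted down by one point,
# so a 5 on X should equate to (about) a 4 on Y.
sx = list(range(11)); px = [1.0 / 11.0] * 11
sy = [s - 1 for s in sx]; py = list(px)
print(round(equate(5, sx, px, sy, py), 2))  # prints 4.0
```

In real applications the score probabilities would come from pre-smoothed frequency estimates, and the bandwidth h would itself be selected by a penalty criterion, which is precisely the step papers I and II address.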

Relationships between Missing Response and Skill Mastery Profiles of Cognitive Diagnostic Assessment

Zhang, Jingshun 13 August 2013 (has links)
This study explores the relationship between students’ missing responses on a large-scale assessment and their cognitive skill profiles and characteristics. Data from the 48 multiple-choice items on the 2006 Ontario Secondary School Literacy Test (OSSLT), a high school graduation requirement, were analyzed using the item response theory (IRT) three-parameter logistic model and the Reduced Reparameterized Unified Model, a cognitive diagnostic model. Missing responses were analyzed by item and by student. Item-level analyses examined the relationships among item difficulty, item order, the literacy skills targeted by the item, the cognitive skills required by the item, the percentage of students not answering the item, and other features of the item. Student-level analyses examined the relationships among students’ missing responses, overall performance, cognitive skill mastery profiles, and characteristics such as gender and home language. Most students answered most items: every item was answered by at least 98.8% of students, and 95.5% of students had no missing responses, 3.2% had one, and only 1.3% had more than one. However, whether students responded to items was related to students’ characteristics, including gender, whether the student had an individual education plan, and the language spoken at home, and to item characteristics such as item difficulty and the cognitive skills required to answer the item. Unlike in previous studies of large-scale assessments, missing response rates were not higher for multiple-choice items appearing later in the timed sections. Instead, the first two items in some sections had higher missing response rates. Examination of the student-level missing response rates, however, showed that when students had high numbers of missing responses, these often represented failures to complete a section of the test.
Also, if nonresponse was concentrated in items that required particular skills, the accuracy of the estimates for those skills was lower than for other skills. The results of this study have implications for test designers who seek to improve provincial large-scale assessments, and for teachers who seek to help students improve their cognitive skills and develop test taking strategies.
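As context for the three-parameter logistic model used in the analysis above, a minimal sketch of its item response function (the standard textbook form; the parameter values are hypothetical, not taken from the OSSLT calibration):

```python
import math

def p_3pl(theta, a, b, c):
    """Three-parameter logistic IRT model: probability of a correct response.
    a = discrimination, b = difficulty, c = pseudo-guessing (lower asymptote)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At ability theta equal to the difficulty b, the probability is halfway
# between the guessing floor c and 1.
print(round(p_3pl(0.0, a=1.2, b=0.0, c=0.2), 3))  # prints 0.6
```

A missing response complicates this picture because it is unclear whether to score it as incorrect, omit it from the likelihood, or model it separately, which is why nonresponse concentrated on particular items degrades the skill estimates discussed above.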

The Predictive Validity of the Baskent University Proficiency Exam (BUEPE) Through the Use of the Three-Parameter IRT Model's Ability Estimates

Yegin, Oya Perim 01 January 2003 (has links) (PDF)
The purpose of the present study is to investigate the predictive validity of the BUEPE through the use of the three-parameter IRT model's ability estimates. The study made use of the BUEPE September 2000 data, which included the responses of 699 students. Predictive validity was established using the departmental English courses (DEC) passing grades of a total of 371 students. As a prerequisite analysis, the best-fitting IRT model was determined: first, by checking the assumptions of IRT; second, by analyzing the invariance of ability and item parameters; and third, by interpreting the chi-square statistics. After the prerequisite analyses, the best-fitting model's estimates were correlated with DEC passing grades to investigate the predictive power of the BUEPE on DEC passing grades. The findings indicated that the minimal-guessing assumption of the one- and two-parameter models was not met. In addition, the chi-square statistics indicated a better fit for the three-parameter model, which was therefore selected as the best-fitting model. The predictive validity analyses revealed that the best predictors of DEC passing grades were the three-parameter model ability estimates. The second-best predictor was the ability estimates obtained from sixty high-information items. In third place, BUEPE total scores and the total scores obtained from the sixty high-information items followed with nearly the same correlation coefficients. Among the three sub-tests, the reading sub-test was the best predictor of DEC passing grades.

An IRT model to estimate differential latent change trajectories in a multi-stage, longitudinal assessment

Shim, Hi Shin 08 April 2009 (has links)
Repeated measures designs are widely used in educational and psychological research to compare the changes exhibited in response to a treatment. Traditionally, measures of change are found by calculating difference scores (subtracting the observed initial score from the final score) for each person. However, problems such as the reliability paradox and the meaning of change scores arise from using simple difference scores to study change. A new item response theory model will be presented that estimates latent change scores instead of difference scores, addresses some of the limitations of using difference scores, and provides a direct comparison of the mean latent changes exhibited by different groups (e.g. females versus males). A simulation-based test was conducted to ascertain the viability of the model and results indicate that parameters of the newly developed model can be estimated accurately. Two sets of analyses were performed on the Early Childhood Longitudinal Study-Kindergarten cohort (ECLS-K) to examine differential growth in math ability between 1) male and female students and 2) Caucasian and African American students from kindergarten through fifth grade.

Supplier performance evaluation in supply chains using item response theory

Santos, Kathyana Vanessa Diniz 28 August 2017 (has links)
An adequate selection of suppliers can make a difference in the future of organizations, lowering operating costs, improving product quality and enabling quick responses to customer demands. In the current business context of supply chains, the appropriate choice of suppliers is essential for good management and for maintaining and improving competitive advantages. The objective of this dissertation is therefore to develop a method to evaluate supplier performance in the context of supply chains using item response theory (IRT). To achieve this goal, 60 supplier performance aspects, covering seven dimensions (cost, time, quality, flexibility, innovation, reputation/industry experience and sustainability), were considered in the formulation of a 67-item questionnaire to evaluate supplier performance. The questionnaire made possible the evaluation of 243 supply links of companies across different sectors and 14 Brazilian states. The evaluation results of the 243 supply links were analyzed using IRT's Graded Response Model (GRM). The GRM establishes a difficulty parameter for each category of the presented items, defining performance levels on an interpretable scale that indicates the aspects met by each evaluated relationship and the aspects in which the evaluated company still needs to improve.
Depending on what the client company expects and prioritizes in its suppliers' performance, it can associate scale levels (and the presence and/or absence of certain aspects) with decisions about the relationship with the supplier company (deepening the relationship, requesting changes in behavior or ending the relationship, for example).
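The Graded Response Model used above can be sketched in a few lines (an illustration only, not the authors' estimation code; the parameters are hypothetical): each item has ordered boundary parameters, the cumulative probabilities follow logistic curves, and each category probability is the difference of adjacent cumulatives:

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Samejima's graded response model: P(X = k | theta) for each category k,
    computed as differences of adjacent cumulative (boundary) probabilities."""
    def p_star(b):  # P(response at or above the category bounded by b)
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))
    cum = [1.0] + [p_star(b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# A four-category item with symmetric thresholds around theta = 0.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])  # prints [0.182, 0.318, 0.318, 0.182]
```

The threshold (difficulty) parameters estimated for each response category are what anchor the interpretable performance scale described in the abstract.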

Assessing Dimensionality in Complex Data Structures: A Performance Comparison of DETECT and NOHARM Procedures

January 2011 (has links)
The purpose of this study was to investigate the effect of complex structure on dimensionality assessment in compensatory and noncompensatory multidimensional item response theory (MIRT) models of assessment data, using dimensionality assessment procedures based on conditional covariances (i.e., DETECT) and a factor-analytic approach (i.e., NOHARM). The DETECT-based methods typically outperformed the NOHARM-based methods in both two-dimensional (2D) and three-dimensional (3D) compensatory MIRT conditions. The DETECT-based methods yielded high proportion-correct rates, especially when correlations were .60 or smaller, the data exhibited 30% or less complexity, and the sample size was larger. As the complexity increased and the sample size decreased, performance typically diminished. As the complexity increased, it also became more difficult to label the resulting sets of items from DETECT in terms of the dimensions. DETECT was consistent in the classification of simple items, but less consistent in the classification of complex items. Of the three NOHARM-based methods, χ2G/D and ALR generally outperformed RMSR. χ2G/D was more accurate when N = 500 and complexity levels were 30% or lower. As the number of items increased, ALR performance improved at a correlation of .60 and 30% or less complexity. When the data followed a noncompensatory MIRT model, the NOHARM-based methods, specifically χ2G/D and ALR, were the most accurate of all five methods. The marginal proportions for labeling sets of items as dimension-like were typically low, suggesting that the methods generally failed to label two (three) sets of items as dimension-like in 2D (3D) noncompensatory situations. The DETECT-based methods were more consistent in classifying simple items across complexity levels, sample sizes, and correlations. However, as complexity and correlation levels increased, the classification rates for all methods decreased.
In most conditions, the DETECT-based methods classified complex items as consistently as or more consistently than the NOHARM-based methods. In particular, as complexity, the number of items, and the true dimensionality increased, the DETECT-based methods were notably more consistent than any NOHARM-based method. Despite DETECT's consistency, when data follow a noncompensatory MIRT model, the NOHARM-based methods should be preferred over the DETECT-based methods for assessing dimensionality, due to DETECT's poor performance in identifying the true dimensionality. / Dissertation/Thesis / Ph.D. Educational Psychology 2011
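DETECT's core quantity can be sketched directly (an illustration only, not the DETECT program; the partition search over item clusters and the full procedure's weighting conventions are omitted): the covariance of an item pair conditional on the rest score, averaged over rest-score groups:

```python
from collections import defaultdict

def conditional_covariance(responses, i, j):
    """Average covariance of items i and j conditional on the rest score
    (total score excluding both items) -- the building block DETECT aggregates.
    `responses` is a list of 0/1 response vectors, one per examinee."""
    groups = defaultdict(list)
    for r in responses:
        rest = sum(r) - r[i] - r[j]
        groups[rest].append((r[i], r[j]))
    n = len(responses)
    total = 0.0
    for pairs in groups.values():
        m = len(pairs)
        mi = sum(x for x, _ in pairs) / m
        mj = sum(y for _, y in pairs) / m
        total += (m / n) * sum((x - mi) * (y - mj) for x, y in pairs) / m
    return total

# Tiny illustration: items 0 and 1 agree for every examinee, so their
# rest-score-conditional covariance is positive.
resp = [[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
print(conditional_covariance(resp, 0, 1))  # prints 0.25
```

Under unidimensionality these conditional covariances are near zero; systematically positive values within a cluster of items signal a secondary dimension, which is what the DETECT index aggregates.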

Assessment of Item Parameter Drift of Known Items in a University Placement Exam

January 2012 (has links)
This study investigated the possibility of item parameter drift (IPD) in a calculus placement examination administered to approximately 3,000 students at a large university in the United States. A single form of the exam was administered continuously for a period of two years, possibly allowing later examinees to have prior knowledge of specific items on the exam. An analysis of IPD was conducted to explore evidence of possible item exposure. Two assumptions concerning item exposure were made: 1) item recall and item exposure are positively correlated, and 2) item exposure results in items becoming easier over time. Special consideration was given to two contextual item characteristics: 1) item location within the test, specifically items at the beginning and end of the exam, and 2) the use of an associated diagram. The hypotheses stated that these item characteristics would make the items easier to recall and, therefore, more likely to be exposed, resulting in item drift. BILOG-MG 3 was used to calibrate the items and assess IPD. No evidence was found to support the hypotheses that items located at the beginning of the test or with an associated diagram drifted as a result of item exposure. Three items among the last ten on the exam drifted significantly and became easier, consistent with item exposure. However, in this study, the possible effects of item exposure could not be separated from the effects of other potential factors such as speededness, curriculum changes, better test preparation on the part of subsequent examinees, or guessing. / Dissertation/Thesis / M.A. Educational Psychology 2012
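The drift analysis itself relied on BILOG-MG's model-based calibration; purely as a back-of-the-envelope illustration (the counts below are made up), a two-proportion z statistic can screen an item for becoming easier between early and late administration windows:

```python
import math

def drift_z(correct_early, n_early, correct_late, n_late):
    """Two-proportion z statistic: a crude screen for an item getting easier
    over time (positive z means a higher proportion correct later)."""
    p1, p2 = correct_early / n_early, correct_late / n_late
    p = (correct_early + correct_late) / (n_early + n_late)  # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n_early + 1 / n_late))
    return (p2 - p1) / se

# Hypothetical item: 55% correct in year one, 70% correct in year two.
z = drift_z(550, 1000, 700, 1000)
print(round(z, 2))  # prints 6.93
```

A screen like this cannot, of course, separate exposure from the confounds the abstract lists (speededness, curriculum changes, preparation, guessing); IRT-based IPD analysis conditions on ability precisely to narrow that gap.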

Theory and assessment of psychopathic personality: construction and validity evidence of a self-report instrument for use in the general population

Hauck Filho, Nelson January 2013 (has links)
The aim of the present doctoral thesis was to develop and analyze the psychometric properties of a self-report instrument of psychopathic traits devised for use with nonforensic and noncriminal populations. In order to provide a theoretical framework for the instrument, I developed three main arguments. First, although the problem of psychological measurement remains open to theoretical and empirical scrutiny, psychometric instruments are adequate means of obtaining psychological information. Second, empirical studies support self-report as a useful, reliable method for assessing psychopathic traits when individuals have no incentive to fake their responses. Third, the evidence favors a dimensional model of psychopathy as a more plausible theoretical perspective than a taxonic, categorical model. Three distinct samples were employed in constructing and evaluating the self-report instrument: a pilot sample of undergraduate students (N = 224), a large sample of university students and other individuals from the general population (N = 1,238) and a second sample of undergraduate students (N = 12). I analyzed the data by combining categorical exploratory and confirmatory factor analyses with two item response theory models, which suggested retaining 60 items with excellent psychometric properties. These items comprised three scales (Meanness, Boldness and Disinhibition) and 10 subscales (Antisocial Tendencies, Reward Dependence, Low Self-Control, Social Dominance, Emotional Deficits, Pathological Narcissism, Interpersonal Exploitation, Cynicism, Unconcern and Fearlessness).
The scales and subscales correlated with rumination, behavioral inhibition, behavioral approach, positive and negative affect, and scores on other instruments for assessing psychopathy, namely the Levenson Self-Report Psychopathy scale and the Psychopathy Checklist-Revised. Furthermore, I devised a quantitative-qualitative system to help interpret raw scores on the instrument. The present work offers a free self-report method with excellent psychometric properties to assist Brazilian researchers and professionals in the assessment of psychopathy among individuals from the general population.

Assessment of proficiency in academic English through a computerized adaptive test

Vanessa Rufino da Silva 09 April 2015 (has links)
This work describes the steps taken to convert a linear paper-and-pencil English proficiency test for academic purposes, composed of multiple-choice items administered following the admissible probability measurement procedure (Shuford Jr et al., 1966) and adopted by the graduate program of the Institute of Mathematical Sciences and Computing of the University of São Paulo (ICMC-USP), Brazil, into a computerized adaptive test (TAI-PI) based on an item response theory (IRT) model. Although the program accepts reliable international English-language exams for academic purposes and non-native speakers, such as the TOEFL (Test of English as a Foreign Language), IELTS (International English Language Testing System) and CPE (Cambridge English: Proficiency), requiring them at public universities in Brazil is problematic because of their cost of approximately US$200 to US$300 per exam. The software TAI-PI (computerized adaptive test for English proficiency) was implemented in Java with SQLite as the database engine, and is offered free of charge for the English proficiency assessment of the program's graduate students from October 2013. The statistical methodology employed in constructing TAI-PI was defined considering the history and aims of the evaluation, and adopted Samejima's unidimensional graded response model (Samejima, 1969), the Kullback-Leibler information criterion for item selection, expected a posteriori Bayesian estimation of the latent trait (Baker, 2001) and the shadow test approach (Van der Linden and Pashley, 2010) for imposing test constraints (content and test length, for example) on the test composed for each examinee.
This work presents a description of the test design, the statistical methods employed, the results of real applications of TAI-PI to ICMC graduate students, and validation studies of the new pass/fail classification methodology, highlighting the quality of the proposed approach and the improvement of the exam achieved through the use of IRT and CAT methods.
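The adaptive mechanics described above can be sketched in miniature (a hypothetical illustration: it uses a two-parameter dichotomous model instead of Samejima's graded model, maximum Fisher information in place of the Kullback-Leibler criterion, no shadow-test constraints, and made-up item parameters):

```python
import math

def p2pl(theta, a, b):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """Item information at ability theta for a 2PL item."""
    p = p2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def eap(xs, items):
    """Expected a posteriori ability estimate on a coarse grid, N(0,1) prior."""
    grid = [g / 10.0 for g in range(-40, 41)]
    post = []
    for t in grid:
        w = math.exp(-t * t / 2.0)  # unnormalized standard normal prior
        for (a, b), x in zip(items, xs):
            p = p2pl(t, a, b)
            w *= p if x else 1.0 - p
        post.append(w)
    z = sum(post)
    return sum(t * w for t, w in zip(grid, post)) / z

bank = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 0.5)]  # (a, b) pairs
theta_hat, asked, answers = 0.0, [], []
for _ in range(3):
    # pick the unused item that is most informative at the current estimate
    nxt = max((i for i in range(len(bank)) if i not in asked),
              key=lambda i: fisher_info(theta_hat, *bank[i]))
    asked.append(nxt)
    answers.append(1)  # pretend every answer is correct
    theta_hat = eap(answers, [bank[i] for i in asked])
print(round(theta_hat, 2))  # estimate climbs above zero after three correct answers
```

In TAI-PI the same select-administer-update loop runs with the graded response model, KL-based selection and shadow-test constraints on content and test length.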
