Global ETD Search

1	A comparison of computer-based classification testing approaches using mixed-format tests with the generalized partial credit model Kim, Jiseon 03 December 2010 Classification testing has been widely used to make categorical decisions by determining whether an examinee has a certain degree of ability required by established standards. As computer technologies have developed, classification testing has become more computerized. Several approaches have been proposed and investigated in the context of computer-based classification testing, including: 1) Computerized adaptive test (CAT); 2) Multistage test (MST); 3) Sequential probability ratio test (SPRT), among others. The purpose of this study was to systematically compare the differences in classification decision precision among several testing approaches (i.e., CAT, MST, and SPRT) given three test lengths and three cutoff scores using mixed-format tests based on the generalized partial credit model. The progressive-restricted exposure control procedure and constrained CAT content balancing procedure with test unit types were also incorporated as part of this study. All conditions were evaluated in terms of the classification decision precision and the exposure control property. Overall, this study’s results indicated that all three approaches performed well in terms of classifying people into two categories. The CAT and SPRT approaches produced, on average, comparable results with both performing relatively better than the MST approach in the precision of their classification decision. As the test length increased, the classification decision accuracy generally increased for all approaches; however, the CAT and SPRT approaches yielded more accuracy with the shorter test length. In terms of cutoff scores, predicting classification decision differed according to the location of cutoff scores based on the normal distribution of examinees.
In terms of exposure control properties, the progressive-restricted exposure control procedure with the pre-set maximum test unit exposure rate was implemented effectively into the CAT and SPRT approaches. The CAT approach had, on average, a higher proportion of test units with low test unit exposure rates and produced better results in pool utilization rates than the SPRT approach. Finally, the MST approach administered all test units constructed for the panels for each condition. It had, on average, however, a higher proportion of test units with high test unit exposure rates because computations were based only on the proportion of whole test unit pool used for constructing the MST panels. Computer-based testing Psychometrics
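As a rough illustration of how an SPRT pass/fail decision can be driven by generalized partial credit model (GPCM) responses, the sketch below may help. It is not the study's implementation: the function names, the indifference half-width `delta`, and the error rates `alpha` and `beta` are all assumptions chosen for the example, and exposure control and content balancing are left out entirely.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities under the generalized partial credit model.

    `steps` holds the step parameters b_1..b_m; categories are scored 0..m.
    """
    # Cumulative sums of a * (theta - b_v); category 0 has an empty sum (0).
    cums = [0.0]
    for b in steps:
        cums.append(cums[-1] + a * (theta - b))
    top = max(cums)  # subtract the maximum before exponentiating, for stability
    expo = [math.exp(c - top) for c in cums]
    total = sum(expo)
    return [e / total for e in expo]


def sprt_classify(responses, items, cutoff, delta=0.5, alpha=0.05, beta=0.05):
    """Wald SPRT around a cutoff on the theta scale (hypothetical settings).

    `responses` are observed category scores; `items` is a list of
    (a, steps) tuples in administration order.
    Returns (decision, number_of_items_used).
    """
    theta_lo, theta_hi = cutoff - delta, cutoff + delta
    upper = math.log((1 - beta) / alpha)   # crossing it means "pass"
    lower = math.log(beta / (1 - alpha))   # crossing it means "fail"
    llr = 0.0
    for n, (x, (a, steps)) in enumerate(zip(responses, items), start=1):
        p_hi = gpcm_probs(theta_hi, a, steps)[x]
        p_lo = gpcm_probs(theta_lo, a, steps)[x]
        llr += math.log(p_hi / p_lo)       # accumulate the log-likelihood ratio
        if llr >= upper:
            return "pass", n
        if llr <= lower:
            return "fail", n
    return "undecided", len(responses)
```

With a single step parameter the GPCM reduces to a dichotomous two-parameter logistic item, so the same routine covers both item types in a mixed-format test. The test stops as soon as the accumulated evidence crosses either bound, which is why SPRT-style tests can be short for examinees far from the cutoff.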
2	Resource Limited Testing Center Scheduling For a Web-Based Testing Application Graham, Adam J. 01 January 2012 Testing centers are a useful tool to help instructors deliver computer-based tests, but computer resources are expensive and therefore limited. This paper describes a method by which testing center(s) may use iNetTest, a web-based computer-aided testing system, to house and administer exams. The algorithm discussed in this paper makes it possible for instructors to schedule tests for a given time frame while ensuring that enough computer resources will be available to all of the students. The algorithm prevents the testing center from being overwhelmed with students while attempting to maximize the usage of the valuable computer resources. Computer Based Testing Web-Based Exam Administration Proctoring
3	An examination of computer anxiety related to achievement on paper-and-pencil and computer-based aircraft maintenance knowledge testing of United States Air Force technical training students. McVay, Richard B. 05 1900 The purpose of this study was to determine whether varying levels of computer anxiety have an effect on computer-based testing of United States Air Force technical training students. The first chapter presents an overview of computer-based testing, defines key terms, and identifies questions addressed in the research. The rationale for conducting this study was that little research had been done in this area. The second chapter contains a review of the pertinent literature related to computer-based testing, computer anxiety, test reliability, validity, and gender differences in computer use. Due to the lack of understanding concerning any effects of computer anxiety on computer-based testing, this has been a worthwhile topic to explore, and it makes a significant contribution to the training field. The third chapter describes the qualitative research methodology used to conduct the study. The primary methodology was an analysis of variance comparison for groups of individuals who displayed high or low computer anxiety to their respective mean computer-based or paper-based aircraft maintenance knowledge testing scores. The research population consisted of United States Air Force aircraft maintenance craftsmen students attending training at Sheppard Air Force Base, Texas. The fourth chapter details the findings of the study. The findings indicate that there was no significant difference between the groups of students rated with high computer anxiety and low computer anxiety while testing with computers.
Additionally, no significant differences were detected while testing alternative hypotheses covering differences between groups of students rated with high computer anxiety and low computer anxiety testing by traditional paper-and-pencil methods. Finally, a reference section identifying the literature used in the preparation of this dissertation is also included. Anxiety. Educational tests and measurements. Computer-based testing computer anxiety military training
4	Examinees' Perceptions of the Physical Aspects of the Testing Environment During the National Physical Therapy Examination Donald, Ellen Kroog 04 July 2016 Despite the increasing number of individuals taking computer-based tests, little is known about how examinees perceive computer-based testing environments and the extent to which these testing environments are perceived to affect test performance. The purpose of the present study was to assess the testing environment as perceived by individuals taking the National Physical Therapy Examination (NPTE), a high-stakes licensure examination. Perceptions of the testing environments were assessed using an examinee self-report questionnaire. The questionnaire included items that measured individuals’ preference and perception of specific characteristics of the environment, along with demographic information and one open-ended item. Questionnaires were distributed by email to the 210 accredited physical therapy programs at the time, encouraging programs to forward the instrument by email to the most recent class of physical therapy graduates. Two hundred and sixteen respondents completed the study, representing 101 testing centers in 31 states. Data from these 216 examinees were used to answer four research questions. The first research question focused on the examinees’ environmental preferences for the NPTE testing environment and the relation between these preferences and examinees’ background characteristics (e.g., sex, program GPA, age, online experience, online testing experience, comfort level with online testing, and preferred testing time). A clear preference toward one end of the scale was observed for preferring a quiet room and a desktop area that had a great deal of adjustability. Examinees’ preferences and their demographic characteristics were not strongly related, with the seven demographic variables accounting for < 7% of the variability in examinees’ environmental preferences.
The second research question used the data from multiple examinees nested within the same testing center to examine the within- and between-center variability in examinees’ perceptions of the testing environment and their satisfaction with the environment. Results indicated that the majority of the variance in these variables was within testing centers with average between-center variability equal to .032 for the perception ratings and .078 for the satisfaction ratings. Research questions (RQ) three and four explored whether examinees’ background characteristics (RQ 3) and center characteristics (RQ 4) were significantly related to the 12 environmental perception ratings, 12 satisfaction ratings, and two items representing examinees’ perceptions of the effect of the testing environment on their performance and the likelihood they would choose the same center again. In terms of examinee characteristics, age, online testing experience, and comfort with online testing were the most consistent predictors of the various examinee ratings. The most consistent predictors for the satisfaction ratings were examinees’ online test comfort, online test experience, and age. For center characteristics, the newness of the center and the room density of the center were the most consistent predictors of examinee ratings. For satisfaction ratings, the most consistent predictor was the newness of the center. Center newness was significantly related to the outcome variables related to the size, lighting and sound of the center which may reflect changes in building standards and materials. The results of the study suggest the need for further exploration of the environmental and human factors that may impact individuals taking high stakes examinations in testing centers. Although there may not be an effect on all examinees, there may be subsets of individuals who are more sensitive to the effects of the testing environment on performance. 
Further exploration of the uniformity of testing environments is also needed to minimize error and potential threats to test security. testing environment human factors computer-based testing
5	Examining the Comparative Measurement Value of Technology-Enhanced Items: Moncaleano, Sebastian January 2021 Thesis advisor: Michael Russell / The growth of computer-based testing over the last two decades has motivated the creation of innovative item formats. It is often argued that technology-enhanced items (TEIs) provide better measurement of test-takers’ knowledge, skills, and abilities by increasing the authenticity of tasks presented to test-takers (Sireci & Zenisky, 2006). Despite the popularity of TEIs in operational assessments, there remains little psychometric research on these innovative item formats. Claims regarding their potential to provide better measurement are seldom explored. This dissertation adds to this limited body of research by developing theory and proposing a methodology to compare TEIs to traditional item formats. This study investigated how to judge the comparative measurement value (CMV) of two drag-and-drop technology-enhanced formats (classification and rank-ordering) relative to stem-equivalent multiple-choice items. Items were administered to a sample of adults and results were calibrated using a 2-parameter logistic IRT model. Moreover, the utility of the TEIs was evaluated according to the TEI Utility Framework (Russell, 2016). Four indicators were identified as the most valuable characteristics to judge CMV and then combined into a hierarchical decision protocol. When applied, this protocol provides a CMV judgment and a recommendation of the preferred item format. Applying the protocol to the items revealed that most TEIs examined in this study showed decreased CMV, indicating that in a real-life scenario the multiple-choice format would be favored for most of these item pairs. Recommendations for the use of the CMV protocol and directions of future related research are discussed. / Thesis (PhD) - Boston College, 2021. / Submitted to: Boston College. Lynch School of Education.
/ Discipline: Educational Research, Measurement and Evaluation. assessment comparative measurement value computer-based testing drag-and-drop innovative items technology-enhanced items
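For readers unfamiliar with the 2-parameter logistic (2PL) model mentioned in the abstract above, a minimal sketch follows. The abstract does not name the four CMV indicators, so the item-information comparison here is only one plausible illustration, and the function names and parameter values are hypothetical.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model.

    a is the discrimination parameter and b the difficulty parameter.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_2pl(theta, a, b):
    """Fisher information contributed by a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical calibrated parameters for one stem-equivalent item pair.
mc_item = {"a": 1.4, "b": 0.2}    # multiple-choice version (made-up values)
tei_item = {"a": 0.9, "b": 0.2}   # drag-and-drop version (made-up values)

# Compare how much information each format yields at a given ability level.
theta = 0.2
mc_info = info_2pl(theta, **mc_item)
tei_info = info_2pl(theta, **tei_item)
preferred = "multiple-choice" if mc_info >= tei_info else "technology-enhanced"
```

At an ability equal to the item difficulty, the information is a²/4, so in this made-up pair the more discriminating multiple-choice version carries more information.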
6	Using a Computer-adaptive Test Simulation to Investigate Test Coordinators' Perceptions of a High-stakes Computer-based Testing Program Hogan, Tiffany 10 January 2014 This case study examined the efficiency and precision of computer classification and adaptive testing to elicit responses from test coordinators on implementing a high-stakes computer-based testing program. Test coordinators from five elementary schools located in a Georgia school district participated in the study. The school district administered state-made, high-stakes tests using paper and pencil, and locally developed tests via computer or paper and pencil. A post-hoc simulation program, Comprehensive Simulation of Computerized Adaptive Testing, used 586 student item responses to produce results with a variable termination point and classification termination point. Results from the simulation were analyzed and used in the case study to elicit interview responses from test coordinators. Photographs of computer labs and test schedule documents were collected and analyzed to validate school test coordinators' responses. Test coordinators responded positively to the efficiency and precision of simulation results. Some test coordinators preferred the use of computer-adaptive tests for diagnostic purposes only. Test coordinators' experiences focused on the security, the emotions, and the management of testing. The findings of this study will benefit those interested in implementing a high-stakes, computer-based testing program by recommending that a simulation study be conducted and feedback be solicited from test coordinators prior to an operational test administration. Computer-adaptive test case study computerized classification test computer-based testing
7	Desafios e perspectivas da implementação computacional de testes adaptativos multidimensionais para avaliações educacionais / Challenges and perspectives of implementation of multidimensional adaptive test for educational assessment Piton Gonçalves, Jean 17 December 2012 Educational tests provide measures and indicators that enable evaluations and guide the definition of educational goals, besides supporting selection processes and the formulation of public policies. The evaluation of examinees' performance may consider one or multiple skills and abilities. As an alternative to paper-and-pencil tests, the Computer Based Test (CBT) can assemble, administer and score tests and automatically produce individual and group statistics on examinees' performance. Considering that the examinee has several abilities, the Computer Adaptive Test based on Multidimensional Item Response Theory (MCAT) keeps the same accuracy as a traditional test, building on knowledge of the examinee inferred from the record of responses to previous items. Item selection through Kullback-Leibler between Subsequent Posteriors (K^P) avoids selecting a difficult item for a low-ability examinee, suggesting that K^P is a criterion applicable to educational tests. The literature review revealed: (i) a lack of studies on the K^P criterion; (ii) a lack of studies on operational MCATs in educational contexts for real users; (iii) a shortage of studies and proposals for initial and stopping criteria for MCATs when the number of administered items is variable; and (iv) the absence of Brazilian studies in the area of MCATs.
To bridge these gaps, this doctoral thesis addresses the following research question: What approach makes it possible to use the K^P criterion in operational MCATs for educational contexts, such that the implemented system satisfies the functionality, reliability, efficiency, maintainability and portability criteria of ISO-9126 (which is the basis for evaluating computer-based tests)? The specific objectives of this research were to: (i) implement and validate the K^P selection criterion, comparing it to the usual Bayesian criterion; (ii) propose improvements and calculate the computational processing time of item selection by K^P; (iii) propose initial criteria consistent with the reality and needs of educational assessments; (iv) validate the novel stopping criterion KPIC for MCATs that administer a variable number of items to examinees; (v) develop an architecture that enables MCATs to be delivered via the Web to real users; (vi) discuss theoretical and methodological aspects of the new CBMAT approach via proof of concept, through the implementation of the MADEPT system, which evaluates examinees from the perspective of diagnostic assessment; (vii) evaluate MADEPT according to the international software product standard ISO-9126 and point out the feasibility, viability, difficulties, advantages and limitations of developing CBMATs for the Web environment. The methodology used to answer the research question was to: (i) organize and select the theories, methods, models and results inherent to MCATs; (ii) expand the K^P equation; (iii) implement the MCAT with the K^P selection criterion and the Bayesian methodology for estimation and item selection; (iv) validate K^P and KPIC statistically; (v) implement the CBMAT, with the MCAT as a subsystem; and (vi) evaluate the CBMAT according to ISO-9126.
This research has several results: (i) a broad literature review of the theories, methods and criteria needed for the computational implementation of MCATs; (ii) a reformulation of the equation that expresses selection by K^P for implementation in a scientific programming language; (iii) MCAT simulation studies, with item selection by K^P and stopping by KPIC, showing that K^P is an adequate criterion, recommended when the goal is a test with a small and variable number of administered items, while maintaining adequate bias and high accuracy in ability estimation; (iv) the development of novel algorithms for the initial criteria; (v) the validation of a new architecture that enables MCATs to be delivered via the Web to real users; and (vi) the implementation and ISO-9126 evaluation of the Web-based system MADEPT. We conclude that it is possible to develop an architecture that enables MCATs to be delivered via the Web to real users, using the K^P selection criterion and initial criteria consistent with educational assessments. When the aim is to apply MCATs in real scenarios, item selection by K^P combined with the KPIC stopping criterion provides a shorter and more accurate test than those using the usual Bayesian methodology, with a computational processing time consistent with the characteristics of the multidimensional approach. Adaptive test Computer-based testing Computerized test Educational assessment Multidimensional adaptive test
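To make the idea of selecting items by the Kullback-Leibler divergence between subsequent posteriors more concrete, here is a deliberately simplified sketch. It assumes a single ability dimension on a discrete grid and dichotomous 2PL items, whereas the thesis works with multidimensional models. The function names, the grid, and the direction of the divergence (current posterior relative to updated posterior) are assumptions of this example, not details taken from the abstract.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of a correct response (used here for simplicity)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def expected_kl(posterior, grid, a, b):
    """Expected KL divergence between the current and the updated posterior.

    `posterior` holds normalized weights over `grid`. The expectation is
    taken over the two possible responses, weighted by their posterior
    predictive probabilities.
    """
    total = 0.0
    for x in (0, 1):
        like = [p_2pl(t, a, b) if x == 1 else 1.0 - p_2pl(t, a, b)
                for t in grid]
        pred = sum(w * l for w, l in zip(posterior, like))  # P(x | past)
        updated = [w * l / pred for w, l in zip(posterior, like)]
        kl = sum(w * math.log(w / u)
                 for w, u in zip(posterior, updated) if w > 0.0)
        total += pred * kl
    return total


def select_item(posterior, grid, items):
    """Index of the candidate (a, b) item with the largest expected KL."""
    scores = [expected_kl(posterior, grid, a, b) for a, b in items]
    return max(range(len(items)), key=scores.__getitem__)


def normal_posterior(grid, mean, sd=1.0):
    """Hypothetical helper: normalized normal-shaped weights on the grid."""
    raw = [math.exp(-0.5 * ((t - mean) / sd) ** 2) for t in grid]
    total = sum(raw)
    return [r / total for r in raw]
```

For an examinee whose current posterior is centred at a low ability, a very difficult item is almost certain to be answered incorrectly, so the posterior barely moves and the expected divergence is small. An item near the examinee's ability level scores higher, which matches the behaviour described in the abstract.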
9	Maximizing Information: Applications of Ideal Point Modeling and Innovative Item Design to Personality Measurement Leeson, Heidi Vanessa January 2008 (has links) Recent research has challenged the way in which personality and attitude constructs are measured. Alternatives have been offered as to how non-cognitive responses are modeled, the mode of delivery used when administering such scales, and the impact of technology in measuring personality. Thus, the major purpose of the studies in this thesis concerns two interrelated issues of personality research, namely the way personality responses are best modeled, and the optimal mode by which personality items are presented, together with associated modal issues. Three studies are presented. First, recent developments using an ideal point approach to scale construction are outlined, and an empirical study compares modeling personality items based on an ideal point approach (generalized graded unfolding model; GGUM) and a dominance approach (graded response model; GRM). Second, an extensive review of literature pertaining to the mode effect when transferring paper-and-pencil measures to screen was conducted, in addition to a review of the various types of computerized and innovative items and their associated psychometric information. Finally, nine innovative items were developed using various multimedia features (e.g., video, graphics, and audio) to ascertain the advantages of these methods for presenting items constructed to elicit the response behavior underlying ideal point approaches, namely, typical response behavior. It was found that the dominance IRT model continued to produce superior model-data fit for most items; that more attention needs to be placed on developing principles for constructing ideal point type items; that the web-based version supplied 20% more construct information than the paper version; and that innovative items seem to provide better model-data fit for students with lower personality attributes.
While the innovative items may require more initial outlay in terms of time and development costs, they have the capacity to provide more information regarding test-takers’ personality levels, potentially using fewer items. Ideal Point Theory Innovative items Web-based testing Computer-based testing Personality measurement Psychometrics
