Global ETD Search

1	Partial Credit Models for Scale Construction in Hedonic Information Systems Mair, Patrick; Treiblmaier, Horst January 2008 Information Systems (IS) research frequently uses survey data to measure the interplay between technological systems and human beings. Researchers have developed sophisticated procedures to build and validate multi-item scales that measure real-world phenomena (latent constructs). Most studies use so-called classical test theory (CTT), which suffers from several shortcomings. We first compare CTT to Item Response Theory (IRT) and subsequently apply a Rasch model approach to measure hedonic aspects of websites. The results not only show which attributes are best suited for scaling hedonic information systems, but also introduce IRT as a viable substitute that overcomes several shortcomings of CTT. (author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics
2	An investigation of the optimal test design for multi-stage test using the generalized partial credit model Chen, Ling-Yin 27 January 2011 Although the design of multistage testing (MST) has received increasing attention, previous studies mostly focused on comparing the psychometric properties of MST with those of computerized adaptive testing (CAT) and paper-and-pencil (P&P) tests. Few studies have systematically examined the number of items in the routing test, the number of subtests in a stage, or the number of stages in a test design needed to achieve accurate measurement in MST. Given that no study has identified an ideal MST test design using polytomously scored items, the current study conducted a simulation to investigate the optimal design for MST using the generalized partial credit model (GPCM). Eight different test designs were examined on ability estimation across two routing test lengths (short and long) and two total test lengths (short and long). The item pool and generated item responses were based on items calibrated from a national test consisting of 273 partial credit items. Across all test designs, the maximum information routing method was employed and maximum likelihood estimation was used for ability estimation. Ten samples of 1,000 simulees were used to assess each test design. The performance of each test design was evaluated in terms of the precision of ability estimates, item exposure rate, item pool utilization, and item overlap. The study found that all test designs produced very similar results. Although there were some variations among the eight test structures in the ability estimates, results indicate that their overall performance in achieving measurement precision did not deviate substantially from one another with regard to total test length and routing test length. However, results from the present study suggest that routing test length does have a significant effect on the number of non-convergent cases in MST tests. Short routing tests tended to result in more non-convergent cases, and structures with fewer stages yielded more such cases than structures with more stages. Overall, unlike previous findings, the results of the present study indicate that the MST test structure is less likely to be a factor affecting ability estimation when polytomously scored items are used, based on the GPCM. Keywords: Multistage testing; Generalized partial credit model; Polytomous IRT; Test structures; Routing test length; Educational tests and measurements
3	A comparison of item selection procedures using different ability estimation methods in computerized adaptive testing based on the generalized partial credit model Ho, Tsung-Han 17 September 2010 Computerized adaptive testing (CAT) provides a highly efficient alternative to the paper-and-pencil test. By selecting items that match examinees’ ability levels, CAT can not only shorten test length and administration time but also increase measurement precision and reduce measurement error. In CAT, maximum information (MI) is the most widely used item selection procedure. However, the major challenge with MI is the attenuation paradox, which arises because the MI algorithm may select items that are not well targeted at an examinee’s true ability level, resulting in more errors in subsequent ability estimates. The solution is to find an alternative item selection procedure or an appropriate ability estimation method. CAT studies have not investigated the association between these two components of a CAT system based on polytomous IRT models. The present study compared the performance of four item selection procedures (MI, MPWI, MEI, and MEPV) across four ability estimation methods (MLE, WLE, EAP-N, and EAP-PS) under a mixed-format CAT based on the generalized partial credit model (GPCM). The test-unit pool and generated responses were based on test units calibrated from an operational national test that included both independent dichotomous items and testlets. Two CAT conditions were examined: an unconstrained CAT, and a constrained CAT in which the CCAT served as the content-balancing procedure and the progressive-restricted procedure with a maximum exposure rate of 0.19 (PR19) served as the exposure control. The performance of the various CAT conditions was evaluated in terms of measurement precision, exposure control properties, and the extent of selected-test-unit overlap. Results suggested that all item selection procedures, regardless of ability estimation method, performed equally well on all evaluation indices across the two CAT conditions. The MEPV procedure, however, was favorable in terms of a slightly lower maximum exposure rate, better pool utilization, and less test and selected-test-unit overlap than the other three item selection procedures when both the CCAT and PR19 procedures were implemented. It is not necessary to implement the sophisticated and computationally intensive Bayesian item selection procedures across ability estimation methods under the GPCM-based CAT. In terms of the ability estimation methods, MLE, WLE, and the two EAP methods, regardless of item selection procedure, did not produce practical differences on any evaluation index across the two CAT conditions. The WLE method, however, generated significantly fewer non-convergent cases than the MLE method. It was concluded that the WLE method should be considered instead of MLE, because non-convergence is less of an issue. The EAP estimation method, on the other hand, should be used with caution unless an appropriate prior θ distribution is specified. Keywords: Computerized adaptive testing; Item selection procedure; Ability estimation method; Generalized partial credit model
4	Bewertungskompetenz im Physikunterricht: Entwicklung eines Messinstruments zum Themenfeld Energiegewinnung, -speicherung und -nutzung / Decision-making competencies and Physics education: Development of a questionnaire in the context of generation, storage and use of electric energy Sakschewski, Mark 30 October 2013 This study discusses the development of a test instrument for measuring decision-making competence, specifically the sub-competence of evaluating, deciding and reflecting (Bewerten, Entscheiden und Reflektieren, BER) within the Göttingen model of decision-making competence in the context of sustainable development (Bögeholz 2011), for physics as a secondary-school subject. The selected task contexts describe the generation, storage and use of electrical energy. They thereby tie in with the current public debate on renewable energy and examine the decision-making ability and competence of today's students in this regard. The usability of the test instrument developed in this study was first checked in two preliminary studies before the main survey was carried out as a cross-sectional study in grades 6, 8, 10 and 12 (N = 850 students at Gymnasium schools). Following the approach of Eggert (2008) and Eggert and Bögeholz (2006, 2010), it is designed as a paper-and-pencil test and comprises two decision-making tasks and one reflection task. The empirical data were first coded using a purpose-built scoring guide and then analysed from the perspectives of both classical and probabilistic test theory. The test instrument proved sound with respect to reliability and validity. Item fit parameters show that the empirical data can be represented well by a unidimensional Rasch partial credit model. Among other findings, relationships between BER and the students' school age were demonstrated. BER correlates only weakly with various school grades (including German, mathematics, politics and physics in grade 10), and the BER test result is hardly influenced by reading competence. External link to the test instrument: http://dx.doi.org/10.7477/39:41:17 530 Physics (PPN621336750) Keywords: Decision-making competence; Physics education; Energy; Rasch partial credit model; Socioscientific issues
5	A GLM framework for item response theory models. Reissue of 1994 Habilitation thesis. Hatzinger, Reinhold January 2008 The aim of the monograph is to contribute towards bridging the gap between methodological developments that have evolved in the social sciences, in particular in psychometric research, and methods of statistical modelling in a more general framework. The first part surveys certain special psychometric models (often referred to as the Rasch family of models) that share a common property: separation of parameters describing qualities of the subject under investigation from parameters related to properties of the situation under which the response of a subject is observed. Using conditional maximum likelihood estimation, both types of parameters may be estimated independently of each other. In particular, the Rasch model, the rating scale model, the partial credit model, hybrid types, and linear extensions thereof are treated. The second part reviews basic ideas of generalized linear models (GLMs) as an excellent framework for unifying different approaches and providing a natural, technical background for model formulation, estimation and testing. This is followed by a short introduction to the software package GLIM, chosen to illustrate the formulation of psychometric models in the GLM framework. The third part is the main part of this monograph and shows the application of generalized linear models to psychometric approaches. It gives a unified treatment of Rasch family models in the context of log-linear models and contains some new material on log-linear longitudinal modelling. The last part of the monograph is devoted to showing the usefulness of the latent variable approach in a variety of applications, such as panel, cross-over, and therapy evaluation studies, where standard statistical analysis does not necessarily lead to satisfactory results. (author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics
6	Development of a working memory test for the German Bundeswehr’s online assessment Nagler-Nitzschner, Ursa 09 March 2021 Like most Western armed forces, the Bundeswehr faces both high personnel requirements and a shortage of skilled personnel. Online assessment can optimize the application process so that capable personnel are secured more quickly. Online assessment has various advantages, but it also brings challenges. Probably the biggest of these is minimizing cheating, as online assessment takes place in a largely unsupervised environment. Various approaches are used to counter this problem, such as large item pools, which can counteract the dissemination of solutions on the Internet. However, this approach is associated with high costs. Automatic item generation, on the other hand, makes it possible to create psychometrically high-quality items in a cost-effective and time-efficient manner. For this reason, two working memory tests with automatic item generation were developed and evaluated in the present study for the Bundeswehr’s online assessment, with the aim of achieving high predictive validity for the on-site diagnostics. The first study (N = 330) demonstrated that automatic item generation can be used for the developed working memory tests. Two different temporal variants were also investigated, with the longer stimulus presentation time proving more advantageous. The second study (N = 621) provided evidence of reliability and validity. The tests showed good convergent and discriminant validity. In addition, one of the two tests demonstrated very good predictive validity. Taking all test quality criteria into account, this test was ultimately proposed for use in the Bundeswehr’s online assessment. Thus, the Bundeswehr now has a scientifically grounded working memory test available for its online assessment. Keywords: personnel selection; working memory; military; linear logistic test model (LLTM); linear partial credit model (LPCM); simulation; online assessment; 150 Psychology; CT 1500; CT 6700; CW 4700; CW 9000; ddc:150
7	The application and empirical comparison of item parameters of Classical Test Theory and Partial Credit Model of Rasch in performance assessments Mokilane, Paul Moloantoa 05 1900 This study empirically compares Classical Test Theory (CTT) and the Partial Credit Model (PCM) of Rasch, focusing on the invariance of item parameters. The invariance concept, which is a consequence of the principle of specific objectivity, was tested in both CTT and PCM using the results of learners who wrote the National Senior Certificate (NSC) Mathematics examinations in 2010. The difficulty levels of the test items were estimated from independent samples of learners. The same samples of learners used to calibrate the difficulty levels of the test items in the PCM were also used to calibrate the difficulty levels of the test items in CTT. The difficulty levels of the test items were estimated using RUMM2030 in the case of PCM and SAS in the case of CTT. Both RUMM2030 and SAS are statistical software packages. Analysis of variance (ANOVA) was used to compare the four different design groups of test takers. In cases where the ANOVA showed a significant difference between the means of the design groups, Tukey's grouping was used to establish where the difference came from. The research findings were that the test items' difficulty parameter estimates based on the CTT theoretical framework were not invariant across the different independent sample groups. The overall finding from this study was that the CTT theoretical framework was unable to produce invariant item difficulty parameter estimates. The PCM estimates were very stable in the sense that, for most of the items, there was no significant difference between the means of at least three design groups, and the one that deviated from the rest did not deviate by much. The item parameters of the group that was representative of the population (proportional allocation) and of the group in which the same number of learners (50 learners) was taken from each performance category did not differ significantly for any item except item 6.6 in examination question paper 2. It is apparent that, for the test item parameters to be invariant of the group of test takers in PCM, the group of test takers must be heterogeneous and each performance category needs to be big enough for the proper calibration of item parameters. The higher values of the estimated item parameters in CTT were consistently found in the sample that was dominated by the highly proficient learners in Mathematics ("bad"), and the lowest values were consistently calculated in the design group that was dominated by the less proficient learners. This phenomenon was not apparent in the Rasch model. / Mathematical Sciences / M.Sc. (Statistics) Keywords: CTT; IRT; NSC; Item; Rasch model; Partial Credit Model; Invariance; Specific objectivity; 519.50968; Rasch models -- South Africa; Item response theory; Psychometrics
8	Statistical reasoning at the secondary tertiary interface Wilson, Therese Maree January 2006 Each year thousands of students enrol in introductory statistics courses at universities throughout Australia, bringing with them formal and informal statistical knowledge and reasoning, as well as a wide range of basic numeracy skills, mathematical inclinations and attitudes towards statistics, all of which have the potential to affect their ability to develop statistically. This research develops and investigates measures of each of these components for students at the interface of secondary and tertiary education, and investigates the relationships that exist between them and a range of background variables. The focus of the research is on measuring and analysing levels and abilities in statistical reasoning for a range of students at the tertiary interface, with particular interest also in investigating their basic numeracy skills and how these may or may not link with statistical reasoning, allowing for other variables and factors. Information from three cohorts in an introductory data analysis course, whose focus is real data investigations, provides the basis for the research. This course is compulsory for all students in degree programs associated with the sciences or mathematics. The research discusses and reports on the development of questionnaires to measure numeracy and statistical reasoning, and the students' attitudes and reflections on their prior school experiences with statistics. Students' attitudes are found to be generally positive, particularly with regard to their self-efficacy. They are also in no doubt as to the links that exist between mathematics and statistics. The Numeracy Questionnaire, developed to measure pre-calculus skills relevant to an introductory data analysis course which emphasises real data investigations, demonstrates that many students who have completed a basic algebra and calculus senior school subject struggle with skills which are in the pre-senior curricula. Direct examination of the responses helps to understand where and why difficulties tend to occur. Rasch analysis is used to validate the questionnaire and assist in the description of levels of skill. General linear models demonstrate that a student's numeracy score depends on the result obtained in senior mathematics, whether or not the student is a mathematics student, gender, whether or not higher level mathematics has been studied, self-efficacy and year. The research indicates that either the pre-senior curricula need strengthening or exposure to mathematics beyond the core senior course is required to establish confidence with basic skills, particularly when applied to new contexts and multi-step situations. The Statistical Reasoning Questionnaire (SRQ) is developed for use in the Australian context at the secondary/tertiary interface. As with the Numeracy Questionnaire, detailed examination of the responses provides much insight into the range and features of statistical reasoning at this level. Rasch analyses, both dichotomous and polychotomous, are used to establish the appropriateness of this instrument as a measuring tool at this level. The polychotomous Rasch partial credit model is also used to define a new approach to scoring a statistical reasoning instrument; it enables the development and application of a hierarchical model and measures levels of statistical reasoning appropriate at the school/tertiary interface. General linear models indicate that numeracy is a highly significant predictor of statistical reasoning, allowing for all other variables including tertiary entrance score and students' backgrounds and self-efficacy. Further investigation demonstrates that this relationship is not limited to more difficult or overtly mathematical items on the SRQ. Performance on the end-of-semester component of assessment in the course is shown to depend on statistical reasoning at the beginning of semester as measured by the partial credit model, allowing for all other variables. Because of the dominance of the relationship between statistical reasoning (as measured by the SRQ) and numeracy on entry, some further analysis of the end-of-semester assessment is carried out. This includes noting the higher attrition rates for students with less mathematical backgrounds and lower numeracy. Keywords: statistical reasoning; statistical thinking; statistical literacy; numeracy; secondary/tertiary interface; Rasch analysis; partial credit model; attitudes towards statistics; self-efficacy; assessment; statistical education; introductory data analysis course; mathematical thinking
10	Relationships of Home, Student, School, and Classroom Variables with Mathematics Achievement Miller, Roslyn B 09 December 2016 This study used the TIMSS 2011 International Database to investigate predictors of 8th-grade mathematics achievement across three countries that represent a wide range of cultures and levels of mathematics achievement: Chinese Taipei, Ghana, and the United States. A review of literature on predictors of mathematics achievement yielded variables in four major contexts of learning: a student’s home, beliefs, school, and classroom. The home variables investigated were home possessions for learning, parent education, and parents’ expectations and involvement in their children’s education. The student-beliefs variables were self-confidence in mathematics and the value of mathematics. The school variables were school climate, school resources, administrator leadership, and school socioeconomic status. Finally, the classroom variables were access and equity, curriculum, tools and technology, assessment, and teacher professionalism. A 2-level hierarchical linear model was used to investigate relationships between the predictors for learning mathematics and 8th-grade mathematics achievement. Level 1 represented the relationships among the student-level variables, and Level 2 represented the school-level variables. In Chinese Taipei, statistically significant predictors of mathematics achievement in the final model included variables from the domains of home resources, student beliefs, school climate, and school socioeconomic status. In Ghana, both student-beliefs variables had statistically significant relationships with mathematics achievement, and one school climate variable and one school socioeconomic status variable were each found to be statistically significant. The U.S. had statistically significant predictors in the domains of home resources, student beliefs, school socioeconomic status, classroom-level access and equity, classroom assessment, and teacher professionalism. This study extends previous research in several ways. It includes a review of classic and recent literature regarding predictors of mathematics achievement; 17 scales using the Rasch partial credit model were developed to measure predictors of mathematics achievement; and the results of this study may be used to examine the relationships between the independent variables of this study and middle-grades mathematics achievement in countries similar to the 3 in this study, to reinforce and support variables that contribute to student achievement. Keywords: Grade 8; Ghana; United States; Chinese Taipei; scales; Rasch partial credit model; multilevel modeling; hierarchical linear modeling; HLM; TIMSS; achievement; education; mathematics education; assessment
