1
A comparison of multi-stage and computerized adaptive tests based on the generalized partial credit model
Macken-Ruiz, Candance L. 11 September 2012 (has links)
A multi-stage test (MST) is an alternative design for the delivery of automated tests. While computerized adaptive tests (CAT) have dominated testing for the past three decades, increasing interest has focused on the MST because it offers two advantages that CAT does not: test sponsors and test developers can see an entire test before administration because it is pre-constructed from sets of modules of test items, and within a module examinees may skip forward and back through items and change previously answered items. Because of the dominance of CAT, little research has examined how MST designs differ with regard to the number of items per stage and the routing rules that select the next module once a module has been completed. This research used simulated response data for a large national test and the generalized partial credit model to compare a CAT with MST designs that used one of three stage-length configurations (decreasing, increasing, or equal numbers of items per stage) and one of three routing rules (maximum information, fixed theta, or number-right routing). As anticipated, CAT performed best with respect to estimating proficiency and item pool use. Among the MSTs, the design with increasing numbers of items per stage estimated proficiency best, followed by the designs with decreasing and then equal numbers of items per stage. By routing rule, maximum information performed best and number-right routing performed worst. Only one panel was constructed per MST design, so only limited comparisons of item pool use could be made. Although the MST designs did not perform as well as CAT, the differences in estimating proficiency were not large, implying that the MST design is a viable alternative to CAT. / text
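As a concrete illustration of the simplest of the three routing rules mentioned above, the sketch below shows number-right routing, in which the examinee's number-correct score on the module just completed determines the next module. The module labels and score thresholds are invented for illustration and are not the panel design used in the study.

```python
# Hypothetical sketch of number-right routing in a multi-stage test (MST).
# The thresholds and module labels below are illustrative assumptions.

def route_next_module(number_correct: int, module_length: int) -> str:
    """Choose the next-stage module from the number-correct score on the
    module just completed."""
    proportion = number_correct / module_length
    if proportion < 0.4:        # low performers move to an easier module
        return "easy"
    elif proportion < 0.7:      # middle performers stay at medium difficulty
        return "medium"
    else:                       # high performers move to a harder module
        return "hard"

# Example: 9 of 12 items correct routes the examinee to the hard module.
print(route_next_module(9, 12))
```

Maximum-information routing would instead typically select the module whose items are most informative at the examinee's current proficiency estimate, and fixed-theta routing compares that estimate with preset cut points.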
2
Producing equivalent examination forms : an assessment of the British Columbia Ministry of Education examination construction procedure
MacMillan, Peter D. January 1991 (has links)
Questions have been raised concerning the equivalency of the January, June, and August forms of the British Columbia provincial Grade 12 examinations for a given subject. The procedure for constructing these examinations has been changed as of the 1990/91 school year. The purpose of this study was to duplicate this new procedure and assess the equivalency of the forms that resulted.
An examination construction team, all of whom had previous experience with the British Columbia Ministry of Education's Student Assessment Branch, simultaneously constructed two forms of a Biology 12 examination from a common table of specifications, using a pool of multiple-choice items from previous examinations. A sample of students was obtained in the Okanagan, Thompson, and North Thompson areas of British Columbia. Both forms were administered to each student, as required by the chosen test equating design, Design II (Angoff, 1971). The data sample consisted of responses from 286 students.
The data were analyzed using a classical item analysis (LERTAP; Nelson, 1974) followed by a 2x2 order-by-form fixed-effects ANOVA with repeated measures on the second factor. The item analysis showed that all items on both forms performed satisfactorily, ruling out flawed items as an alternative explanation for the lack of equivalence found. Results showed a significant (p<.05) difference in the means of the two forms, no significant (p>.25) order effect, and a significant (p<.25) order-by-form interaction.
Linear and equipercentile equatings were carried out and yielded very similar results. Equating error, judged using the conditional root mean square error of equating, was 4.86 points (9.35%) for both equatings. Equivalency was also judged with a graphical procedure in which the deviation of the equating function from the identity function was plotted with error bands produced from the standard error of equating. This procedure showed the two forms to be nonequivalent, particularly for lower-scoring students.
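To make the equating step concrete, the following is a minimal sketch of linear equating under the single-group design used above, in which both forms are taken by the same examinees; the score data are invented, and the equipercentile method and the conditional root mean square error are not shown.

```python
# Minimal sketch of linear equating for a single-group design, assuming both
# forms were administered to the same examinees. A form-X score is placed on
# the form-Y scale by matching means and standard deviations:
#   y = mu_Y + (sd_Y / sd_X) * (x - mu_X)
# The score lists below are invented for illustration.

import statistics

def linear_equate(x_score: float, x_scores: list[float], y_scores: list[float]) -> float:
    """Transform a form-X raw score to the form-Y scale."""
    mu_x, mu_y = statistics.mean(x_scores), statistics.mean(y_scores)
    sd_x, sd_y = statistics.pstdev(x_scores), statistics.pstdev(y_scores)
    return mu_y + (sd_y / sd_x) * (x_score - mu_x)

form_x = [28, 35, 41, 33, 30, 38]   # total scores on form X
form_y = [31, 37, 44, 36, 33, 41]   # total scores on form Y (same students)
print(round(linear_equate(35, form_x, form_y), 2))
```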
The source of the nonequivalency was investigated by separating the forms into three subtests according to whether item statistics had been available for the paired items at the time of test construction. The linear equating and the graphical analysis were repeated for each pair of subtests. The pairs of subtests built from item pairs for which difficulty (p) values were available at the time of construction for one or both items were found to be equivalent. In contrast, the pair of subtests built from item pairs for which p values were unavailable for either item was found to be nonequivalent.
It was concluded that the examination construction procedure in its present state cannot be relied on to produce equivalent forms. An experienced examination construction team was unable to accurately match items on difficulty when the items lacked prior item statistics. A necessary requirement for constructing equivalent forms is therefore that item statistics be available at the time of construction. / Education, Faculty of / Graduate
3
A critical analysis of certain standardized tests as determined by scientific study
Unknown Date (has links)
Test makers and test users are giving increasing attention to the critical analysis and evaluation of standardized tests, scales, and inventories. Such analyses are too often so brief and general as to be of little value to those seeking the best measure. It was therefore deemed worthwhile to attempt in this paper a representative critical analysis of each of several types of measures. Certain assumptions and principles basic to this study follow. / "May, 1948." / Typescript. / "Submitted to the Graduate Council of Florida State University in partial fulfillment of the requirements for the degree of Master of Arts under Plan II." / Includes bibliographical references (leaves 43-44).
4
An evaluation of teacher-constructed tests
Canepa, Eano Joseph 01 January 1949 (has links)
It is my purpose, therefore, to present a discussion, a few examples, and an evaluation of the principal types of tests that fall within the concern and domain of the classroom teacher. The discussion is confined to the field of modern languages, and more specifically Spanish, for that is the field for which this investigator is prepared. Therefore, while the findings apply generally to other modern languages, the treatment here is restricted to Spanish.
The need for an investigation into the types of tests a teacher might use or construct arises from the fact that very few extensive investigations of this kind have been made in the field of Spanish. Many investigators have skirted the field lightly, but none has conducted a dedicated study. This investigation is intended to supplement those already made and bring many of them up to date.
5
Computer-assisted item and test pre-analysis: a new direction in qualitative methods
Sales, Clay Alan 08 September 2012 (has links)
To date, the major emphasis in test and item evaluation has been directed toward statistical post-measures that rely heavily on data gathered from administering the instrument. These primarily summative techniques are limited, however, in that they cannot provide information about an item or test before it has been sent for field trials. This research presents a new direction in test and item analysis which, using test- and item-writing heuristics, provides a previously unavailable technology for instrument pre-analysis. The new field of "qualitative item and test pre-analysis" is proposed and described. The implications for the field are discussed, along with specific suggestions for the use of this new technology.
The design and creation of a base-case item and test pre-analysis expert system (ITAX) is also detailed, including the heuristics incorporated, the implementation methodologies, and the system's limitations. The heuristics incorporated into the system detect two varieties of grammatical cues, negation and multiple negation, repetition of phrases within an options list, too few options, inconsistent distractor length, use of all-of-the-above and none-of-the-above, repetition of significant words from the stem in the options, the randomness of multiple-choice answer placement, the balance of true/false items, and the length of true/false items. A comprehensive reference to the system is also provided. / Master of Arts
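A few of the heuristics listed above are simple enough to sketch directly. The following is an illustrative example and not the actual ITAX implementation; the option-count threshold and word-length cutoff are arbitrary assumptions.

```python
# Illustrative sketch (not the ITAX system itself) of a handful of the
# item-writing heuristics described above, applied to one multiple-choice item.

def pre_analyze_item(stem: str, options: list[str], min_options: int = 4) -> list[str]:
    """Return warnings raised by simple pre-analysis checks on an item."""
    warnings = []

    # Presence of too few options.
    if len(options) < min_options:
        warnings.append(f"only {len(options)} options (expected at least {min_options})")

    # Use of all-of-the-above / none-of-the-above.
    for opt in options:
        if opt.strip().lower() in ("all of the above", "none of the above"):
            warnings.append(f"avoid option: {opt!r}")

    # Inconsistent length of distractors (longest more than twice the shortest).
    lengths = [len(opt) for opt in options]
    if max(lengths) > 2 * min(lengths):
        warnings.append("option lengths vary widely; the long option may cue the answer")

    # Repetition of significant words from the stem in an option.
    stem_words = {w.lower() for w in stem.split() if len(w) > 4}
    for opt in options:
        if stem_words & {w.lower() for w in opt.split()}:
            warnings.append(f"option repeats a significant stem word: {opt!r}")

    return warnings

print(pre_analyze_item(
    "Which organelle produces most of a cell's ATP?",
    ["The mitochondrion produces ATP", "Nucleus", "All of the above"],
))
```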
6
Diagnosing Learner Deficiencies in Algorithmic Reasoning
Hubbard, George U. 05 1900 (has links)
It is hypothesized that useful diagnostic information can reside in the wrong answers of multiple-choice tests, and that properly designed distractors can yield indications of misinformation and missing information in algorithmic reasoning on the part of the test taker. In addition to summarizing the literature regarding diagnostic research as opposed to scoring research, this study proposes a methodology for analyzing test results and compares the findings with those from the research of Birenbaum and Tatsuoka and others. The proposed method identifies the conditions of misinformation and missing information, and it contains a statistical compensation for careless errors. Strengths and weaknesses of the method are explored, and suggestions for further research are offered.
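To illustrate the idea of diagnostically keyed distractors, the sketch below maps each distractor to a hypothesized deficiency and flags a deficiency only when it recurs across items, a crude stand-in for the study's statistical compensation for careless errors. The item keys and threshold are invented for illustration and are not Hubbard's actual coding scheme.

```python
# Hypothetical sketch: key each distractor to a suspected deficiency in
# algorithmic reasoning, then tally a student's wrong answers to see which
# deficiency recurs. Requiring at least two hits guards (crudely) against a
# single careless error driving the diagnosis. All keys are invented.

from collections import Counter

# (item number, chosen distractor) -> hypothesized deficiency
DISTRACTOR_KEY = {
    (1, "B"): "off-by-one loop bound",
    (1, "C"): "missing initialization",
    (2, "A"): "off-by-one loop bound",
    (2, "D"): "wrong termination condition",
}

def diagnose(wrong_answers: dict[int, str], min_hits: int = 2) -> list[str]:
    """Return deficiencies indicated by at least `min_hits` wrong answers."""
    hits = Counter(
        DISTRACTOR_KEY[(item, choice)]
        for item, choice in wrong_answers.items()
        if (item, choice) in DISTRACTOR_KEY
    )
    return [deficiency for deficiency, count in hits.items() if count >= min_hits]

print(diagnose({1: "B", 2: "A"}))   # -> ['off-by-one loop bound']
```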
7
Development of a test for scientific literacy and its application in assessing the scientific literacy of matriculants entering universities and technikons in the Western Cape, South Africa
Laugksch, Rüdiger Christian January 1996 (has links)
Bibliography: p. [331]-349. / This exploratory study was conducted against the background of immediate post-apartheid South Africa, in which the social upliftment and improvement of living conditions of all South Africans is regarded as of the highest priority. In a science- and technology-orientated world, science and technology are inextricably linked to this process. The purposes of this study were (a) to determine the level of scientific literacy of matriculants entering tertiary education in the Western Cape for the first time; (b) to describe patterns of scientific literacy levels with respect to selected demographic and other student background variables; and (c) to ascertain which student background variables appear to have the most influence in determining whether matriculants are scientifically literate or not. A survey was deemed appropriate for answering these research questions. Underpinning the study was the development of a pool of scientific literacy test items, from which a criterion-referenced, reliable, valid, and composite scientific literacy test instrument, the Test of Basic Scientific Literacy, could be constructed.
8
A Psychometric Evaluation of Script Concordance Tests for Measuring Clinical Reasoning
Wilson, Adam Benjamin 29 January 2014 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Purpose: Script concordance tests (SCTs) are assessments purported to measure clinical data interpretation. The aims of this research were to (1) test the psychometric properties of SCT items, (2) directly examine the construct validity of SCTs, and (3) explore the concurrent validity of six SCT scoring methods while also considering validity at the item difficulty and item type levels.
Methods: SCT scores from a problem-solving SCT (SCT-PS; n=522) and an emergency medicine SCT (SCT-EM; n=1040) were used to investigate the aims of this research. An item analysis was conducted to optimize the SCT datasets, to categorize items by difficulty and type, and to test for gender biases. A confirmatory factor analysis tested whether SCT scores conformed to a theorized unidimensional factor structure. Exploratory factor analyses examined the effects of six SCT scoring methods on construct validity. The concurrent validity of each scoring method was also tested via a one-way multivariate analysis of variance (MANOVA) and Pearson's product-moment correlations. Repeated-measures analysis of variance (ANOVA) and one-way ANOVA tested the discriminatory power of the SCTs according to item difficulty and type.
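The six scoring methods compared in the study are not enumerated in this abstract; as an assumption, the sketch below shows the commonly used aggregate (panel-based) scheme, in which credit for a response is proportional to the number of expert panelists who chose it and the modal panel response earns full credit.

```python
# Sketch of aggregate (panel-based) SCT scoring, shown only as a plausible
# member of the family of scoring methods compared; the panel data are invented.

from collections import Counter

def aggregate_score(examinee_response: int, panel_responses: list[int]) -> float:
    """Credit = panelists choosing this response / panelists choosing the mode."""
    counts = Counter(panel_responses)
    modal_count = max(counts.values())
    return counts.get(examinee_response, 0) / modal_count

# Ten panelists rated one item on a 5-point Likert scale (-2 .. +2).
panel = [1, 1, 1, 1, 2, 2, 2, 0, 0, -1]
print(aggregate_score(1, panel))    # modal answer earns full credit: 1.0
print(aggregate_score(2, panel))    # partial credit: 0.75
print(aggregate_score(-2, panel))   # no panelist chose it: 0.0
```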
Results: Item analysis identified no gender biases. A combination of moderate model-fit indices and poor factor loadings from the confirmatory factor analysis suggested that the SCTs under investigation did not conform to a unidimensional factor structure. Exploratory factor analyses of six different scoring methods repeatedly revealed weak factor loadings, and the extracted factors consistently explained only a small portion of the total variance. The concurrent validity study showed that all six scoring methods discriminated between medical training levels, despite lower reliability coefficients for the 3-point scoring methods. In addition, examinees' scores as MS4s were significantly (p<0.001) higher than their MS2 scores in all difficulty categories. Cross-sectional analysis of SCT-EM data showed significant differences (p<0.001) between experienced EM physicians, EM residents, and MS4s at each level of difficulty. When considering item type, diagnostic and therapeutic items differentiated between all three training levels, while investigational items could not readily distinguish between MS4s and EM residents.
Conclusions: The results of this research contest the assertion that SCTs measure a single common construct. These findings raise questions about the latent constructs measured by SCTs and challenge the overall utility of SCT scores. The outcomes of the concurrent validity study provide evidence that multiple scoring methods reasonably differentiate between medical training levels. Concurrent validity was also observed when considering item difficulty and item type.