11.
A Comparison of Three Item Selection Methods in Criterion-Referenced Tests / Lin, Hui-Fen, 08 1900
This study compared three methods of selecting the best discriminating test items and the resultant test reliability of mastery/nonmastery classifications. These three methods were (a) the agreement approach, (b) the phi coefficient approach, and (c) the random selection approach.
Test responses from 1,836 students on a 50-item physical science test were used, from which 90 distinct data sets were generated for analysis. These 90 data sets contained 10 replications of the combination of three different sample sizes (75, 150, and 300) and three different numbers of test items (15, 25, and 35).
The results of this study indicated that the agreement approach was appropriate for selecting criterion-referenced test items at the classroom level, while the phi coefficient approach was appropriate at the district and/or state level. The random selection method did not share these selection characteristics and produced the lowest reliabilities when compared with the agreement and phi coefficient approaches.
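The two non-random selection statistics compared above can be sketched in a few lines of Python. This is an illustrative reconstruction, not code from the thesis, and the function and variable names are my own. For each item, the agreement approach scores the fraction of examinees whose item response matches their mastery state, while the phi coefficient approach correlates the dichotomous item score with the dichotomous mastery classification:

```python
import numpy as np

def item_statistics(responses, mastery):
    """Per-item agreement and phi statistics against mastery status.

    responses: (n_examinees, n_items) array of 0/1 item scores.
    mastery:   (n_examinees,) array of 0/1 mastery classifications.
    Returns (agreement, phi), one value per item.
    """
    responses = np.asarray(responses, dtype=float)
    mastery = np.asarray(mastery, dtype=float)

    # Agreement approach: fraction of examinees whose item response
    # matches their mastery state (masters correct, nonmasters incorrect).
    agreement = (responses == mastery[:, None]).mean(axis=0)

    # Phi coefficient: Pearson correlation of two dichotomous variables.
    # (Assumes every item and the mastery split have nonzero variance.)
    a = (responses * mastery[:, None]).mean(axis=0)  # P(correct and master)
    p = responses.mean(axis=0)                       # P(correct), per item
    q = mastery.mean()                               # P(master)
    phi = (a - p * q) / np.sqrt(p * (1 - p) * q * (1 - q))
    return agreement, phi
```

Items would then be ranked on either statistic and the top 15, 25, or 35 retained, mirroring the test lengths studied here.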
12.
The Characteristics and Properties of the Threshold and Squared-Error Criterion-Referenced Agreement Indices / Dutschke, Cynthia F. (Cynthia Fleming), 05 1900
Educators who use criterion-referenced measurement to ascertain an examinee's current level of performance, so that the examinee may be classified as either a master or a nonmaster, need to know the accuracy and consistency of their mastery-state decisions. This study examined the sampling distribution characteristics of two reliability indices that use the squared-error agreement function, Livingston's k^2(X,Tx) and Brennan and Kane's M(C), and of five indices that use the threshold agreement function: Subkoviak's Pc, Huynh's p and k, and Swaminathan's p and k. These seven methods of calculating reliability were also compared under varying conditions of sample size, test length, and criterion (cutoff) score. Computer-generated data provided randomly parallel test forms for N = 2000 cases. From this population, 1000 samples were drawn with replacement, and each of the seven reliability indices was calculated. Descriptive statistics were collected for each sample set and examined for distribution characteristics. In addition, the mean value of each index was compared to the population parameter value of consistent mastery/nonmastery classifications. The results indicated that the sampling distributions of all seven reliability indices approach normality as sample size increases. Huynh's p was the most accurate estimate of the population parameter, with the smallest degree of negative bias. Swaminathan's p was the next best estimate, but it has the disadvantage of requiring two test administrations, while Huynh's p requires only one.
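Although the seven indices differ in how they are estimated, the threshold agreement function they share is easy to state: classify each examinee as master or nonmaster at the cutoff on two randomly parallel forms, then summarize consistency as raw agreement and as a chance-corrected kappa. The sketch below is illustrative only, and assumes scores from two administrations are available (the single-administration indices of Subkoviak and Huynh estimate the same quantity from one form):

```python
import numpy as np

def threshold_agreement(scores1, scores2, cutoff):
    """Raw agreement p0 and kappa for mastery classifications made
    from two parallel test forms at the same cutoff score."""
    m1 = np.asarray(scores1) >= cutoff
    m2 = np.asarray(scores2) >= cutoff

    p0 = np.mean(m1 == m2)  # proportion consistently classified

    # Chance agreement from the marginal mastery proportions
    # (assumes the classification is not unanimous on both forms).
    p1, p2 = m1.mean(), m2.mean()
    pc = p1 * p2 + (1 - p1) * (1 - p2)
    kappa = (p0 - pc) / (1 - pc)
    return p0, kappa
```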
13.
A study of the effect of criterion-referencing on teaching, learning and assessment in secondary schools / Kerrison, Terence Michael, January 1996
Education / Master of Education
14.
Criterion validity of the Indiana Basic Competency Skills Test for third graders / Morgan, M. Sue, January 1988
The purpose of this study was to assess the criterion validity of the Indiana Basic Competency Skills Test (IBCST) by exploring the relationships between scores obtained on the IBCST and (a) student gender, (b) teacher-assigned letter grades, (c) scores obtained on the Otis-Lennon School Ability Test (OLSAT), and (d) scores obtained on the Stanford Achievement Test (SAT). The subjects were 300 third grade students enrolled in a small mid-Indiana school system. Data collected included gender, age, IBCST scores, OLSAT scores, SAT scores, and teacher-assigned letter grades in reading and mathematics. An alpha level of .01 was used in each statistical analysis. Gender differences were investigated by comparing the relative IBCST pass/fail (p/f) frequencies of males and females and the boys' and girls' correct answers on the IBCST Reading and Math tests. Neither the chi-square analysis of p/f frequencies nor the multivariate analysis of variance of the IBCST scores disclosed significant gender differences. Therefore, subsequent correlational analyses were done with pooled data. The relationship of teacher-assigned letter grades to IBCST p/f levels was studied with nonparametric and parametric statistical techniques. The 2 x 3 chi-squares computed between IBCST performance and letter grades in reading and math were significant, and the analyses of variance of the data yielded similar results: teacher grades were related to IBCST performance. Multiple regression analyses were used to study the relationships between the IBCST and OLSAT performances. Significant multiple R-squares of approximately .30 were obtained in each analysis: scholastic aptitude was related to IBCST performance. Canonical correlation analyses were used to explore the relationships between the reading and mathematics sections of the IBCST and SAT. Both analyses yielded a single significant, meaningful canonical correlation coefficient.
The canonical variable loadings suggested that the IBCST Reading and Math composites, as well as the SAT composites, were expressions of general achievement. Thus, levels of achievement on the criterion-referenced IBCST and the norm-referenced SAT were related. The results of the study support the criterion validity of the IBCST, with traditional methods of assessment as criteria. / Department of Educational Psychology
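The p/f-by-grade analyses in this study rest on the Pearson chi-square test of independence for a contingency table. As an illustrative sketch (the function name and example table are hypothetical, not data from the study), a 2 x 3 cross of pass/fail against letter grade would be tested like this:

```python
import numpy as np

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table, e.g. IBCST pass/fail (rows) by letter grade
    A/B/C (columns)."""
    obs = np.asarray(table, dtype=float)
    row = obs.sum(axis=1, keepdims=True)   # row totals, shape (r, 1)
    col = obs.sum(axis=0, keepdims=True)   # column totals, shape (1, c)
    exp = row @ col / obs.sum()            # expected counts under independence
    stat = ((obs - exp) ** 2 / exp).sum()
    df = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    return stat, df
```

The statistic is then compared against the chi-square distribution with the returned degrees of freedom at the study's .01 alpha level.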
15.
A classroom-based investigation into the potential of a computer-mediated criterion-referenced test as an evaluation instrument for the assessment of primary end user spreadsheet skills / Benn, Kenneth Robert Andrew, January 1994
The demand for innovative end users of information technology is increasing along with the proliferation of computer equipment within the workplace. This has resulted in increasing demands being made upon educational institutions responsible for the education of computer end users. The demands placed upon the teachers are particularly high. Large class groups and limited physical resources make the task especially difficult. One of the most time-consuming, yet important, tasks is that of student evaluation. Effectively assessing the practical work of information technology students requires intensive study of the storage media upon which the students' efforts have been saved. The purpose of this study was to assess the suitability of criterion-referenced testing techniques applied to the evaluation of end user computing students. Objective questions were administered to the students using Question Mark, a computer-managed test delivery system which enabled quick and efficient management of scoring and data manipulation for empirical analysis. The study was limited to the classroom situation and the assessment of primary spreadsheet skills. In order to operate within these boundaries, empirical techniques were used which enabled the timeous analysis of the students' test results. The findings of this study proved to be encouraging. Computer-mediated criterion-referenced testing techniques were found to be sufficiently reliable for classroom practice when used to assess primary spreadsheet skills. The validation of the assessment technique proved to be problematic because of the constraints imposed by normal classroom practice, as well as the lack of an established methodology for evaluating spreadsheet skills.
However, sufficient evidence was obtained to warrant further research aimed at assessing the use of computer-mediated criterion-referenced tests to evaluate information technology end user learning in situations beyond the boundaries of the classroom, such as a national certification examination.
16.
The development and validation of a preassessment instrument for the criterion referenced curriculum / Coston, Caroline Ann, January 1983
No description available.
17.
Criterion-referenced assessment for modern dance education / MacIntyre, Christine Campbell, January 1985
This study monitored the conceptualisation, implementation and evaluation of criterion-referenced assessment for Modern Dance by two teachers specifically chosen because they represented the two most usual stances in current teaching, i.e. one valuing dance as part of a wider, more general education, the other as a performance art. The Review of Literature investigated the derivation of these differences and identified the kinds of assessment criteria which would be relevant in each context. It then questioned both the timing of the application of the criteria and the benefits and limitations inherent in using a pre-active or re-active model. Lastly, it examined the philosophy of criterion-referenced assessment and thereafter formulated the main hypothesis, i.e. "That criterion-referenced assessment is an appropriate and realistic method for Modern Dance in schools". Both the main and sub-hypotheses were tested through Case Study/Collaborative Action research. In this method of investigation the teachers' actions were the primary focus of study, while the researcher played a supportive but ancillary role. The study has three sections. The first describes the process experienced by the teachers as they identified their criteria for assessment and put their new strategy into action. It shows the problems which arose and the steps taken to resolve them. It gives exemplars of the assessment instruments which were designed and evaluates their use. It highlights the differences in the two approaches to dance and the different competencies required of the teachers if their criterion-referenced strategy was to reflect adequately and validly the important features of their course. In the second section the focus moves from the teachers to the pupils. Given that the pupils had participated in different programmes of dance, the study investigates what criteria the pupils spontaneously use and what criteria they can be taught to use. It does this through the introduction of self-assessment in each course. In this way the pupils' observations and movement analyses were made explicit; through discussion, completing specially prepared leaflets and using video, they were recorded and compared. Finally, the research findings were circulated to a larger number of teachers to find out to what extent their concerns and problems had been anticipated by the first two, and to discover whether they, without extensive support, could also mount a criterion-referenced assessment strategy with an acceptable amount of effort and within a realistic period of time. Given that they could, the final question concerned the evaluations of all the participants, i.e. teachers, parents and pupils: would this extended group similarly endorse the strategy and strengthen the claim that criterion-referenced assessment was a valid and beneficial way of assessing Modern Dance in schools?
18.
Principals' Opinions on the Impact of High-Stakes Testing on Teaching and Learning in the Public Elementary Schools in the State of Utah / Hadley, Raylene Jo, 03 December 2010
The No Child Left Behind Act of 2001 (NCLB) brought high-stakes testing to the forefront of American public education. With its call for teachers and schools to be accountable for academic performance, NCLB has focused the spotlight on yearly progress, as measured by students' test scores. Issues associated with this charge include the questionable reliability of tests, the variation evident in state standards, and the consequences an emphasis on high-stakes testing may have on teaching and learning in the classroom. The purpose of this study was to investigate the consequences of high-stakes testing on teaching and learning in public elementary schools in Utah from the vantage point of school principals. Although policymakers assume a direct correlation between increased test scores and academic achievement, this study went beyond test scores. Analysis of semi-structured interviews with 12 principals, selected through purposive sampling from both Title 1 and non-Title 1 schools, revealed both positive and negative themes. Principals appreciated the focus and collaboration that NCLB testing encourages among teachers, but they disliked the impact of poor test scores on faculty morale. Unlike respondents in previous studies, principals did not feel that NCLB diminished creativity in the classroom; they did worry, however, about the validity of scores as a measure of student learning, particularly in the case of a one-time, year-end test.
19.
A Criterion-Referenced Analysis of Form F of the Standardized Bible Content Tests of the American Association of Bible Colleges / Gaede, Charles S. (Charles Samuel), 12 1900
The purposes of this study were to (a) analyze subjects' responses from Form F of the Standardized Bible Content Tests of the American Association of Bible Colleges by factor analysis and the Rasch measurement model and (b) determine the dimensionality of Form F, its correlations with the Literal, Anti-literal, and Mythological Scales, and the best criterion-referenced test design of Form F using Rasch measurement procedures. Volunteers from a purposefully selected sample of nine colleges from the American Association of Bible Colleges participated in the study. One research instrument, comprising five demographic questions, the Standardized Bible Content Test Form F, and the Literal, Anti-literal, and Mythological Scales, was administered to 179 volunteer graduating seniors. Frequencies and percentages of responses were computed for the demographic questions. Mean scores on the Literal, Anti-literal, and Mythological Scales were computed by gender and religious affiliation. Principal components analysis of Form F with varimax rotation and list-wise deletion of missing data was used to assess the dimensionality of Form F. Correlations between scores on the Literal, Anti-literal and Mythological Scales and scores from the principal components analysis of Form F were computed. Dunn's multiple comparison procedures were used to test for statistical significance. Rasch-model measurement analysis of the scales extracted by principal components analysis was performed to obtain suggested target description, test design, variable definition, and item calibration.
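Full Rasch calibration requires iterative estimation with specialized software, but the core idea, placing item difficulties on a common logit scale from the log-odds of an incorrect response, can be illustrated with the simple PROX-style starting approximation below. This is a sketch of the general technique under the assumption of complete 0/1 response data, not the procedure used in the study:

```python
import numpy as np

def prox_item_difficulties(responses):
    """Approximate Rasch item difficulties using the classic PROX
    starting values: the centered log-odds of each item's proportion
    incorrect (harder items get higher logit values)."""
    p = np.asarray(responses, dtype=float).mean(axis=0)  # item p-values
    logits = np.log((1 - p) / p)                         # log-odds of failure
    return logits - logits.mean()                        # center scale at 0
```

Rasch software refines these starting values iteratively and also estimates person abilities on the same logit scale, which is what makes criterion-referenced test design and item targeting possible.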
20.
Influence of Item Response Theory and Type of Judge on a Standard Set Using the Iterative Angoff Standard Setting Method / Hamberlin, Melanie Kidd, 08 1900
The purpose of this investigation was to determine the influence of item response theory and different types of judges on a standard. The iterative Angoff standard setting method was employed by all judges to determine a cut-off score for a public school district-wide criterion-referenced test. The analysis of variance of the effect of judge type and standard setting method on the central tendency of the standard revealed an ordinal interaction between judge type and method. Without any knowledge of p-values, one judge group set an unrealistic standard. A significant disordinal interaction was found concerning the effect of judge type and standard setting method on the variance of the standard. A positive covariance was detected between judges' minimum pass level estimates and empirical item information. With both p-values and b-values, judge groups had mean minimum pass levels that were positively correlated (ranging from .77 to .86), regardless of the type of information given to the judges. No differences in correlations were detected between different judge types or different methods. The generalizability coefficients and phi indices for the 12 judges included in any method or judge type were acceptable (ranging from .77 to .99). The generalizability coefficient and phi index for all 24 judges were quite high (.99 and .96, respectively).
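The aggregation step of the Angoff method is simple to illustrate. In this hypothetical sketch (the ratings are made up, not the study's data), each judge rates, for every item, the probability that a minimally competent examinee would answer correctly; a judge's cut score is the sum of those minimum pass levels, and the panel standard is their mean across judges, whose spread feeds the variance comparisons reported in the study:

```python
import numpy as np

def angoff_cut_score(mpl_estimates):
    """Panel cut score from Angoff minimum pass level (MPL) ratings.

    mpl_estimates: (n_judges, n_items) array of judged probabilities
    that a minimally competent examinee answers each item correctly.
    Returns the mean cut score across judges and its sample standard
    deviation (the between-judge spread).
    """
    mpl = np.asarray(mpl_estimates, dtype=float)
    per_judge = mpl.sum(axis=1)          # each judge's implied cut score
    return per_judge.mean(), per_judge.std(ddof=1)
```

In the iterative variant used here, judges see normative information (such as p-values or b-values) between rounds and may revise their MPLs before the final aggregation.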