421 |
The role of parents in the continuous assessment of learners
Madondo, Sipho Eric Sihle January 2002
Submitted to the Faculty of Education in fulfillment of the requirement for the degree of Master of Education in the Department of Curriculum and Instructional Studies at the University of Zululand, 2002. / The present study examines the role of parents in the continuous assessment of learners. The first aim was to ascertain the extent to which parents play an active role in the continuous assessment of their children. The second aim was to ascertain the extent to which parents understand the significance of continuous assessment. The third aim was to determine whether parents' biographical factors such as gender, age, type of parent, and academic qualification, as well as learner's grade, have any influence on the active role parents play in the continuous assessment of their children. The last aim was to determine whether parents' biographical factors such as gender, age, type of parent, and academic qualification, as well as learner's grade, have any influence on parents' understanding of the significance of continuous assessment. To this end, a questionnaire was administered to a randomly selected sample of one hundred and eighty-four parents.
The findings reveal that parents differ in the extent to which they play an active role in the continuous assessment of their children. A very high percentage (72.3%) of parents report an above average level of active role. The findings show that parents differ in the extent to which they understand the significance of continuous assessment. A very high percentage (65.2%) of parents report an above average level of understanding of the significance of continuous assessment. The findings also indicate that parents' personal variables such as age, type of parent, and academic qualification, as well as learner's grade, have no influence on the active role parents play in the continuous assessment of their children. The final findings show that, with the exception of learner's grade, the variables gender, age, type of parent and academic qualification have no influence on parents' understanding of the significance of continuous assessment. A very high percentage (72.2%) of parents with learners in grade 8 report an above average level of understanding of the significance of continuous assessment, compared with 57.5% of parents with learners in grade 7.
On the basis of the findings of this study, recommendations to the Department of Education and Culture, as well as for directing future research were made. / National Research Fund
|
422 |
Transformative Teacher Evaluation: Self Evaluation for High Performing Teachers
Sosanya-Tellez, Carla Ann 01 January 2010
Public schools are in crisis, as educators and legislators seek to provide high quality education to diverse students in a measurement-driven environment. The public educator's moral imperative is to assure that all children are literate when they leave school so they can thrive in our democracy (Dewey, 1944; Freire, 1998a; Giroux & Giroux, 2004). Yet, the achievement gap persists, as poor African-American and Latino students under-perform as compared to white middle-class students (Ladson-Billings & Tate, 1995). Additionally, public school teachers are predominately middle-class and White, while they teach increasingly diverse children of poverty. In legislation, student assessment, teacher licensure, and research-based curricula have taken center stage. Teacher evaluation is noticeably absent (Danielson, 2002; Iwanicki, 1990; No Child Left Behind Act, 2002). Teacher evaluation is static and mired in politics; it has not historically helped improve school (Peterson, 2000). Investigating teacher evaluation's potential as an overlooked tool to improve teaching for all teachers and students in public school is urgent in this climate. As Stronge and Tucker (2003) asserted, "Without capable, highly qualified teachers in America's classrooms, no educational reform process can possibly succeed" (p. 3). This problem-based learning dissertation addresses a real problem in practice: how to make teacher evaluation meaningful for high-performing teachers. This study explores Wood's (1998) call for a move from traditional to transformative evaluation. Ten high performing teachers field-tested a self-evaluation handbook. They explored study options designed to help them critically reflect on their own teaching, connect with students, reflect, and set new goals. This work shows promise to help teachers and students engage in a more democratic, caring and loving public place we call school. This work is timely. 
After all, "When all is said and done, what matters most for students' learning are the commitments and capacities of their teachers" (Darling-Hammond, 1997, p. 293).
|
423 |
Establishing local norms for adaptive behavior of Hmong children using the Texas Environmental Adaptation Measure (TEAM)
Miles, Winona Cound 01 January 1990
Statement of problem. Assessment of adaptive behavior is a mandated component of the information necessary to make special education decisions and includes consideration of cultural and environmental expectations placed upon the child. Adaptive behavior scales currently in use do not include many ethnic minorities in their standardization, bringing into question their validity when used on non-majority students. Establishing local norms for ethnic minority groups allows children to be compared to their peer group when determining their level of adaptive functioning. Sources of data. This research study, which used the Texas Environmental Adaptation Measure-Adaptive Behavior Scale (TEAM-ABS), was based on a random sample of 100 Hmong students (49 girls, 51 boys) who completed first grade during the 1989-90 school year in the Stockton (CA) Unified School District. A mean, standard deviation, and standard error were calculated for the total sample. Alpha reliability coefficients were computed for the total and subscale scores of the TEAM-ABS, and means and standard deviations for each test item were calculated. In addition, 51 families completed the structured interview portion of the TEAM. This descriptive information is presented in frequency and percentage format. Conclusions reached. Local norms with a mean of 98, S.D. of 15 and SEM of 4.74 provide the information needed to make the TEAM useful as a measure of adaptive behavior with Hmong first graders between the ages of 6 and 8. Because of the homogeneity of the Hmong, these norms should be valid for use with Hmong children in other locations. Recommendations for further research include expanding norms to include kindergarten and second grade Hmong students, and other ethnic subgroups. Test-retest reliability studies and further concurrent and predictive validity research are needed to make the TEAM a more generally useful measure.
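The reported S.D. of 15 and SEM of 4.74 can be related through the classical test theory formula SEM = SD * sqrt(1 - reliability). The abstract does not say how the SEM was obtained or report the reliability value, so the formula choice and the function names below are assumptions; the sketch only shows that the two reported figures are consistent with a reliability of roughly 0.90.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def implied_reliability(sd, sem):
    """Invert the same formula to get the reliability implied by SD and SEM."""
    return 1.0 - (sem / sd) ** 2


# Values reported in the abstract: S.D. = 15, SEM = 4.74.
# The reliability below is implied under the assumed formula, not reported.
print(round(implied_reliability(15, 4.74), 3))
print(round(standard_error_of_measurement(15, 0.90), 2))
```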
|
424 |
Toward optimizing learner feedback during instructional materials development : exploring a methodology for the analysis of verbal data
Carroll, M. Jane January 1988
No description available.
|
425 |
The effect of test characteristics on aberrant response patterns in computer adaptive testing
Rizavi, Saba M 01 January 2001
The advantages that computer adaptive testing offers over linear tests have been well documented. The Computer Adaptive Test (CAT) design is more efficient than the linear test design, as fewer items are needed to estimate an examinee's proficiency to a desired level of precision. In the ideal situation, a CAT will result in examinees answering a different number of items according to the stopping rule employed. Unfortunately, the realities of testing conditions have necessitated the imposition of time and minimum test length limits on CATs. Such constraints might place a burden on the CAT test taker, resulting in aberrant response behaviors by some examinees. Occurrence of such response patterns results in inaccurate estimation of examinee proficiency levels. This study examined the effects of test lengths, time limits and the interaction of these factors with the examinee proficiency levels on the occurrence of aberrant response patterns. The focus of the study was on the aberrant behaviors caused by rushed guessing due to restrictive time limits. Four different testing scenarios were examined: fixed length performance tests with and without content constraints, fixed length mastery tests, and variable length mastery tests without content constraints. For each of these testing scenarios, the effects of two test lengths, five different timing conditions and the interaction between these factors with three ability levels on ability estimation were examined. For fixed and variable length mastery tests, decision accuracy was also examined in addition to estimation accuracy. Several indices were used to evaluate the estimation and decision accuracy for different testing conditions. The results showed that changing time limits had a significant impact on the occurrence of aberrant response patterns conditional on ability. Increasing test length had a negligible, if not negative, effect on ability estimation when rushed guessing occurred.
In performance testing, high ability examinees suffered the most, while in classification testing middle ability examinees suffered the most. The decision accuracy was considerably affected in the case of variable length classification tests.
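To make the idea of a stopping rule concrete, the sketch below counts how many items are administered before the standard error of the ability estimate reaches a target, subject to minimum and maximum length limits. It assumes a Rasch model, evaluates information at a known ability, and administers items in a fixed order; adaptive item selection, ability estimation, time limits and rushed guessing are all left out, and none of the names or values come from the dissertation.

```python
import math


def rasch_information(theta, b):
    """Fisher information of one Rasch item with difficulty b at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)


def items_needed(theta, difficulties, target_se, min_items=0, max_items=None):
    """Administer items in order until SE <= target_se, within length limits.

    Returns (number of items administered, standard error reached).
    """
    if not difficulties:
        raise ValueError("difficulties must contain at least one item")
    if max_items is None:
        max_items = len(difficulties)
    info = 0.0
    n, se = 0, float("inf")
    for n, b in enumerate(difficulties[:max_items], start=1):
        info += rasch_information(theta, b)
        se = 1.0 / math.sqrt(info)
        if n >= min_items and se <= target_se:
            break  # precision target met and minimum length satisfied
    return n, se


# Hypothetical pool of 30 items, all of difficulty 0.
pool = [0.0] * 30
print(items_needed(0.0, pool, target_se=0.5))  # well-targeted examinee
print(items_needed(2.0, pool, target_se=0.5))  # poorly targeted examinee
```

With this pool, an examinee at the items' difficulty reaches the target well before the pool is exhausted, while an examinee far from it runs into the length cap first, which is one way a variable length test ends up with different test lengths for different examinees.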
|
426 |
The application of data envelopment analysis to publicly funded K–12 education in Massachusetts in order to evaluate the effectiveness of the Massachusetts Education Reform Act of 1993 in improving educational outcomes
Hall, Andrew D. J 01 January 2005
The Charnes Cooper Rhodes ratio DEA model ("CCR") is used, with panel data from a large sample of Massachusetts' school districts, to test three propositions concerning the Massachusetts Education Reform Act of 1993 ("MERA"). First, did the degree of positive correlation between Socio-Economic Status ("SES") and educational outcomes decrease; second, did educational opportunity become more equal among towns in Massachusetts; and finally, were education standards raised overall? The CCR model is a Linear Programming method that estimates a convex production function using Koopmans' (1951) definition of technical efficiency and the radial measurements of efficiency proposed by Farrell (1957). It has been widely used in Education Production Function research. The pursuit, through state and federal courts, of equitable funding, allied to the belief that smaller class sizes improve outcomes, has made K-12 education expensive. The belief that outcomes are in constant decline has led to calls for "Accountability" and to "Standards" reform. Standards reform was combined, in MERA, with reform of state aid formulas and additional state funding, to ensure a minimum basic level of education pursuant to the decision of the Massachusetts Supreme Court in McDuffy v. Robertson. The one certain relationship revealed by decades of research is a strong positive correlation between SES and outcomes. If MERA ensured a higher basic level of education, then the correlation between SES and outcomes should have weakened as the education of less well SES-endowed children improved. The CCR model was used first to measure "correlation" between multiple input and multiple output variables. Strong positive correlation was shown to exist and it appeared to strengthen rather than weaken. Next, the CCR model was used to determine whether there were changes in the distribution of per pupil expenditures and, lastly, to determine whether outcomes improved after MERA.
The analysis suggested that the distribution of expenditures improved but that outcomes deteriorated. This deterioration seems to be closely related to the changes in the proportion of all students, in a grade, actually taking the tests. There is little evidence that MERA achieved anything and no basis upon which to argue that it achieved nothing.
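As a rough illustration of what a CCR efficiency score means, the sketch below handles only the special case of one input and one output, where the constant-returns CCR score reduces to each unit's output-to-input ratio divided by the best ratio observed. The dissertation's multiple-input, multiple-output analysis requires solving a linear program per district, which is not attempted here; the district figures are invented.

```python
def ccr_efficiency_single(inputs, outputs):
    """CCR (constant returns to scale) efficiency for one input and one output.

    Each unit's score is its output/input ratio relative to the best ratio
    in the sample, so frontier units score 1.0 and all others score below 1.0.
    """
    if len(inputs) != len(outputs) or not inputs:
        raise ValueError("inputs and outputs must be non-empty and equal length")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical districts: per pupil spending (input) and a test score index (output).
spending = [10.0, 20.0, 30.0]
scores = [5.0, 8.0, 15.0]
print(ccr_efficiency_single(spending, scores))
```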
|
427 |
Evaluation of IRT anchor test designs in test translation studies
Bollwark, John Anthony 01 January 1991
Translating measurement instruments from one language to another is a common way of adapting them for use in a population other than those for which the instruments were designed. This technique is particularly useful in helping to (1) understand the similarities and differences that exist between populations and (2) provide unbiased testing opportunities across different segments of a single population. To help insure that a translated instrument is valid for these purposes, it is essential that the equivalence of the original and translated instrument be established. One focus of this thesis was to provide a review of the history, problems and techniques associated with establishing the translation equivalence of measurement instruments. In addition, this review provided support for the use of item response theory (IRT) in translation equivalence studies. The second and main focus of this thesis was to investigate anchor test designs when using IRT in translation equivalence studies. Simulated data were used to determine the anchor test length required to provide adequate scaling results under conditions similar to those that are likely to be found in a translation equivalence study. These conditions included (1) relatively small samples and (2) examinee ability distribution overlaps that are more representative of vertical rather than horizontal scaling situations. The effects of these two variables on the anchor test design required to provide adequate scaling results were also investigated. 
The main conclusions from this research concerning the scaling of IRT ability and item parameters are: (1) larger examinee samples with larger ability overlaps should be used whenever possible, (2) under ideal scaling conditions of larger examinee samples with larger ability overlaps, relatively good scaling results can be obtained with anchor tests consisting of as few as 5 items (although the use of such short anchor tests is not recommended), and (3) anchor test lengths of at least 10 items should provide adequate scaling results, but longer anchor tests, consisting of well-translated items, should be used if possible. Finally, suggestions for further research on establishing translation equivalence were provided.
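The role of the anchor items can be sketched with the mean/sigma linking method, which uses the anchor items' difficulty estimates from two separate calibrations to find a linear transformation between the two scales. The abstract does not say which scaling method the thesis used, so mean/sigma is an assumption here, and the anchor values are invented.

```python
import statistics


def mean_sigma_constants(b_source, b_target):
    """Return (A, B) such that b_target is approximately A * b_source + B.

    b_source and b_target are difficulty estimates of the same anchor items
    from the two calibrations (for example, original and translated forms).
    """
    if len(b_source) != len(b_target) or len(b_source) < 2:
        raise ValueError("need at least two anchor items, paired across forms")
    slope = statistics.stdev(b_target) / statistics.stdev(b_source)
    intercept = statistics.mean(b_target) - slope * statistics.mean(b_source)
    return slope, intercept


def to_target_scale(value, slope, intercept):
    """Place an ability or difficulty from the source scale on the target scale."""
    return slope * value + intercept


# Hypothetical five-item anchor test.
anchor_source = [-1.0, 0.0, 1.0, 2.0, 0.5]
anchor_target = [1.2 * b + 0.3 for b in anchor_source]
A, B = mean_sigma_constants(anchor_source, anchor_target)
print(round(A, 3), round(B, 3))
print(round(to_target_scale(0.0, A, B), 3))
```

With few anchor items the standard deviations in the slope are estimated from very little data, which is consistent with the abstract's caution against very short anchor tests.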
|
428 |
Obtaining norm-referenced scores from criterion-referenced tests: An analysis of estimation errors
Tucker, Charlene Gower 01 January 1991
One customized testing model equates a criterion-referenced test (CRT) to a norm-referenced test (NRT) so that performance on the CRT can produce an estimate of performance on the NRT. The error associated with these estimated norms is not well understood. The purpose of this study was to examine the extent and nature of error present in these normative scores. In two subject areas and at three grade levels, actual NRT scores were compared to NRT scores which were estimated from a CRT. The estimation error was analyzed for individual scores and for group means at different parts of the score distribution. For individuals, the mean absolute difference between the actual NRT scores and the estimated NRT scores was approximately five raw score points on a 60-item reading subtest and approximately two points on a 30-item mathematics subtest. A comparison of the standard errors of substitution showed that individual differences were similar whether a parallel form or a CRT estimate was substituted for the NRT score. The bias present in the estimation of NRT scores from a CRT for groups of examinees is shown by the mean difference between the estimated and actual NRT scores. For all subtests, mean differences were less than one score point, indicating that group data can be accurately obtained through the use of this model. To examine the accuracy of estimation at different parts of the score distribution, the data was divided into three score groups (low, middle, and high) and, subsequently, into deciles. After correcting for a regression effect, mean group differences between actual NRT scores and those estimated from a CRT were fairly consistent for groups at different parts of the distribution. Individual scores, however, were most accurate at the upper end of the score distribution with a decline in accuracy as the score level decreased. In conclusion, this study offers evidence that NRT scores can be estimated from performance on a CRT with reasonable accuracy. 
However, generalizability of these results to other sets of tests or other populations is unknown. It is recommended that similar research be pursued under varying conditions.
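The two error summaries described above, the mean absolute difference for individuals and the mean signed difference (bias) for groups, can be computed as in the following sketch. The scores are invented to show how sizeable individual errors can coexist with near-zero group bias; the function name is hypothetical.

```python
def estimation_error_summary(actual, estimated):
    """Return (mean absolute difference, mean signed difference).

    The first summarizes error for individual examinees; the second is the
    bias in the group mean (estimated minus actual).
    """
    if len(actual) != len(estimated) or not actual:
        raise ValueError("score lists must be non-empty and equal length")
    diffs = [e - a for a, e in zip(actual, estimated)]
    n = len(diffs)
    mean_abs = sum(abs(d) for d in diffs) / n
    bias = sum(diffs) / n
    return mean_abs, bias


# Hypothetical raw scores: actual NRT scores and NRT scores estimated from a CRT.
actual_nrt = [40, 45, 50, 55]
estimated_nrt = [45, 40, 55, 50]
print(estimation_error_summary(actual_nrt, estimated_nrt))
```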
|
429 |
Factors influencing the performance of the Mantel-Haenszel procedure in identifying differential item functioning
Clauser, Brian Errol 01 January 1993
The Mantel-Haenszel (MH) procedure has emerged as one of the methods of choice for identification of differentially functioning test items (DIF). Although there has been considerable research examining its performance in this context, important gaps remain in the knowledge base for effectively applying this procedure. This investigation is an attempt to fill these gaps with the results of five simulation studies. The first study is an examination of the utility of the two-step procedure recommended by Holland and Thayer in which the matching criterion used in the second step is refined by removing items identified in the first step. The results showed that using the two-step procedure is associated with a reduction in the Type II error rate. In the second study, the capability of the MH procedure to identify uniform DIF was examined. The statistic was used to identify simulated DIF in items with varying levels of difficulty and discrimination and with differing levels of difference in difficulty between groups. The results indicated that when difference in difficulty was held constant, poorly discriminating items and items that were very difficult were less likely to be identified by the procedure. In the third study, the effects of sample size were considered. In spite of the fact that the MH procedure has been repeatedly recommended for use with small samples, the results of this study suggest that samples below 200 per group may be inadequate. Performance with larger samples was satisfactory and improved as samples increased. The fourth study is an examination of the effects of score group width on the statistic. Holland and Thayer recommended that n + 1 score groups should be used for matching (where n is the number of items). Since then, various authors have suggested that there may be utility in using fewer (wider) score groups. It was shown that use of this variation on the MH procedure could result in dramatically increased type I error rates. 
In the final study, a simple variation on the MH statistic which may allow it to identify non-uniform DIF was examined. The MH statistic's inability to identify certain types of non-uniform DIF items has been noted as a major shortcoming. Use of the variation resulted in identification of many of the simulated non-uniform DIF items with little or no increase in the type I error rate.
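For readers unfamiliar with the statistic, the sketch below computes the Mantel-Haenszel common odds ratio across matched score groups and its transformation to the ETS delta scale (MH D-DIF = -2.35 ln alpha). It covers only the effect size estimate, not the chi-square significance test or the two-step purification studied in the dissertation, and the counts are invented.

```python
import math


def mh_common_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio over matched score groups.

    Each table is (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
    for one score group on the matching criterion.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score group contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined for these tables")
    return numerator / denominator


def mh_d_dif(tables):
    """MH D-DIF on the ETS delta scale; negative values favor the reference group."""
    return -2.35 * math.log(mh_common_odds_ratio(tables))


# Hypothetical counts for two score groups on one studied item.
no_dif = [(20, 10, 20, 10), (30, 5, 30, 5)]
with_dif = [(30, 10, 20, 20)]
print(mh_common_odds_ratio(no_dif), mh_common_odds_ratio(with_dif))
print(round(mh_d_dif(with_dif), 2))
```

Using fewer, wider score groups amounts to merging tables before this computation, which is the variation the fourth study found could inflate type I error rates.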
|
430 |
Optimal test designs with content balancing and variable target information functions as constraints
Lam, Tit Loong 01 January 1993
Optimal test design involves the application of an item selection heuristic to construct a test that fits a target information function, so that the standard error of the test can be controlled at different regions of the ability continuum. The real data simulation study assessed the efficiency of binary programming in optimal item selection by comparing the degree to which the obtained test information approximated different target information functions against that of a manual heuristic. The effects of imposing a content balancing constraint were studied in conventional, two-stage and adaptive tests designed using the automated procedure. Results showed that the automated procedure improved upon the manual procedure significantly when a uniform target information function was used. However, when a peaked target information function was used, the improvement over the manual procedure was marginal. Both procedures were affected by the distribution of the item parameters in the item pool. The degree to which the examinee empirical scores were recovered was lower when a content balancing constraint was imposed in the conventional test designs. The effect of uneven item parameter distribution in the item pool was shown by the poorer recovery of the empirical scores at the higher regions of the ability continuum. Two-stage tests were shown to limit the effects of content balancing. Content balanced adaptive tests using optimal item selection were shown to be efficient in empirical score recovery, especially in maintaining equiprecision in measurement over a wide ability range despite the imposition of a content balancing constraint in the test design. The study has implications for implementing automated test designs in school systems supported by hardware and expertise in measurement theory, and it addresses the issue of content balancing using optimal test designs within an adaptive testing framework.
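Fitting a test to a target information function can be sketched as follows. The dissertation used binary programming; the code below substitutes a simple greedy heuristic, assumes a two-parameter logistic model, and ignores content balancing, so it illustrates the objective rather than the study's procedure. The item pool and targets are invented.

```python
import math


def item_information(theta, a, b):
    """Information of a 2PL item (discrimination a, difficulty b) at theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def greedy_select(pool, thetas, target, test_length):
    """Greedily pick items that most reduce the shortfall from the target.

    pool is a list of (a, b) pairs; target gives the desired test information
    at each point in thetas. Returns (selected indices, information reached).
    """
    selected = []
    current = [0.0] * len(thetas)
    remaining = list(range(len(pool)))

    def shortfall_if_added(i):
        a, b = pool[i]
        return sum(
            max(0.0, t - (c + item_information(th, a, b)))
            for th, t, c in zip(thetas, target, current)
        )

    for _ in range(min(test_length, len(pool))):
        best = min(remaining, key=shortfall_if_added)
        remaining.remove(best)
        selected.append(best)
        a, b = pool[best]
        current = [c + item_information(th, a, b) for th, c in zip(thetas, current)]
    return selected, current


# Hypothetical four-item pool and a uniform target at three ability points.
pool = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.0)]
chosen, info = greedy_select(pool, thetas=[-1.0, 0.0, 1.0],
                             target=[0.4, 0.4, 0.4], test_length=2)
print(chosen, [round(v, 3) for v in info])
```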
|