11.
Evaluator contextual responsiveness / Azzam, Tarek, January 2007 (has links)
Thesis (Ph. D.)--UCLA, 2007. / Vita. Includes bibliographical references (leaves 179-185).
12.
Parent satisfaction in a summer enrichment program evaluation: year two / Wartenburg, Kim Michelle. January 2005 (has links)
Thesis (Ed.S.)--Marshall University, 2005. / Title from document title page. Includes abstract. Document formatted into pages: contains v, 31 p. Bibliography: p. 18-20.
13.
Student Choice and Student Engagement / Travis, Joellyn Marie 12 December 2017 (has links)
The focus of this study was school transformation to accommodate “new literacies, skills, and dispositions that students need to flourish in a networked world” (Richardson, 2016, p. ix). Many schools operate within a traditional model developed during the Industrial Revolution to fit the need for efficiency and compliance (Robinson & Aronica, 2015). However, according to Robinson and Aronica (2015), “These systems are inherently unsuited to the wholly different circumstances of the twenty-first century” (p. xxiii). The purpose of this study was to determine whether student choice of where to sit, or of the type of seating, positively impacts student engagement. Observations were conducted in classrooms to identify whether students had a choice in where they sat; the types of seating available; and whether each student was engaged, compliant, or off-task as defined by a scoring guide. There was a significant positive difference in the engagement level of students who had a choice in where they sat compared with students who were assigned seats. There was also a significant positive difference in the engagement level of students who were offered flexible seating options compared with students seated in traditional desks or at tables with chairs. There are many opportunities to learn from this study and to change educational practices based on the theoretical framework on student engagement and the decline in student engagement reported in Gallup polls (Gallup, 2016). The findings of this study bring additional awareness to student engagement and the factors that affect learning in the classroom.
14.
A Program Evaluation of Check and Connect for Successful School Completion / Riggans-Curtis, Nicole 30 June 2017 (has links)
School leaders at an urban public high school implemented the Check and Connect (C&C) program in 2010–2011 to improve student engagement outcomes for at-risk students. No formal program evaluation of C&C had been conducted in the 2012–2013, 2013–2014, and 2014–2015 school years to show whether the program was effective. The purpose of this study was to investigate the relationship between successful school completion and participation in the C&C program. A quantitative, quasi-experimental program evaluation was conducted to determine whether C&C participation and student-related variables including cohort, gender, ethnicity, socioeconomic status, and truancy predicted students’ successful school completion. Archival data of students eligible for graduation (N = 668) were analyzed using chi-square tests and logistic regression. Results showed that the model, including C&C participation and all student-related variables, was significant in explaining the variance in successful school completion. Follow-up analyses revealed that C&C participation was associated with a significantly higher likelihood of school completion only for the 2013 graduation cohort, for female students, and for low-truancy students, suggesting a need for further investigation of the program’s implementation strategy. An evaluation report was developed with recommendations to evaluate C&C for implementation fidelity and to consider the use of observable indicators to recruit students for C&C participation who may require targeted or intensive interventions for successful school completion. This endeavor may contribute to positive social change by informing stakeholders of C&C’s effectiveness, helping leaders make future decisions about how to approach program implementation and evaluation, and increasing successful school completion.
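The abstract above reports chi-square tests of completion status against program participation. A minimal sketch of the Pearson chi-square statistic for a 2x2 participation-by-completion table (the counts below are invented for illustration, not the study's data):

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table
    [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: rows = participant / non-participant,
# columns = completed / did not complete.
stat = chi_square_2x2(90, 30, 60, 60)  # 16.0
```

The statistic would then be compared against a chi-square distribution with one degree of freedom; the study's follow-up subgroup analyses used logistic regression rather than this simple test.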
15.
Equating tests under the Generalized Partial Credit Model / Swediati, Nonny 01 January 1997 (has links)
The efficacy of several procedures for equating tests when scoring is based on the Partial Credit item response theory model was investigated. A simulation study was conducted to investigate the effect of several factors on the accuracy of equating for tests calibrated using the Partial Credit Model. The factors manipulated were the number of anchor items, the difference in the ability distributions of the examinee groups that take alternate forms of a test, the sample size of the groups taking the tests, and the equating method. The data for this study were generated according to the Generalized Partial Credit model. Test lengths of 5 and 20 items were studied. The number of items in the anchor test ranged from two to four for the five-item test, while the number of anchor items ranged from two to eight for the twenty-item test. Two levels of sample size (500 and 1000) and two levels of ability distribution (equal and unequal) were studied. The equating methods studied were four variations of the Mean and Sigma method and the characteristic curve method. The results showed that the characteristic curve method was the most accurate equating method under all conditions studied. The second most effective method was the Mean and Sigma method that used all of the step difficulty parameter estimates in the computation of the equating constants. In general, all equating methods produced reasonably accurate equating with long tests and a large number of anchor items when there was no mean difference in ability between the two groups. When there was a large ability difference between the two groups of examinees taking the test, item parameters were estimated poorly, particularly in short tests, and this in turn affected the equating methods adversely.
The conclusion is that poor parameter estimation makes it difficult to equate tests which are administered to examinee groups that differ greatly in ability, especially when the tests are relatively short and when the number of anchor items is small.
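The Mean and Sigma method mentioned above derives a linear transformation from the means and standard deviations of the anchor items' difficulty estimates in the two calibrations. A minimal sketch with invented anchor difficulties (in the study this idea is applied to step difficulty parameters from Partial Credit calibrations, in several variations):

```python
def mean_sigma_constants(anchor_x, anchor_y):
    """Return slope A and intercept B of the linear transformation
    b_y = A * b_x + B that places Form X difficulty estimates on the
    Form Y scale, using anchor-item estimates from both calibrations."""
    mx = sum(anchor_x) / len(anchor_x)
    my = sum(anchor_y) / len(anchor_y)
    sx = (sum((b - mx) ** 2 for b in anchor_x) / len(anchor_x)) ** 0.5
    sy = (sum((b - my) ** 2 for b in anchor_y) / len(anchor_y)) ** 0.5
    A = sy / sx
    B = my - A * mx
    return A, B

# Invented anchor difficulty estimates for illustration:
bx = [-1.0, 0.0, 1.0, 2.0]   # Form X calibration
by = [-0.8, 0.2, 1.2, 2.2]   # Form Y calibration
A, B = mean_sigma_constants(bx, by)  # A = 1.0, B = 0.2
```

Once A and B are in hand, every Form X parameter estimate is rescaled with the same transformation, which is why poor anchor-item estimates (as with short tests and large ability differences) degrade the whole equating.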
16.
An investigation of alternative approaches to scoring multiple response items on a certification exam / Ma, Xiaoying 01 January 2004 (has links)
Multiple-response (MR) items are items that have more than one correct answer. This item type is often used in licensure and achievement tests to accommodate situations where identification of a single correct answer no longer suffices or where multiple steps are required in solving a problem. MR items can be scored either dichotomously or polytomously. Polytomous scoring of MR items often employs some type of option weighting to assign differential point values to each of the response options. Weights for each option are defined a priori by expert judgment or derived empirically from item analysis. Studies examining the reliability and validity of differential option weighting methods have been based on classical test theory. Little or no research has been done to examine the usefulness of item response theory (IRT) models for deriving empirical weights, or to compare the effectiveness of different option weighting methods. The purposes of this study, therefore, were to investigate polytomous scoring methods for MR items and to evaluate the impact different scoring methods may have on the reliability of the test scores and on item and test information functions, as well as on measurement efficiency and classification accuracy. Results from this study indicate that polytomous scoring of the MR items did not significantly increase the reliability of the test, nor did it drastically increase the test information functions, probably because two-thirds of the items were multiple-choice items scored the same way across comparisons. However, a substantial increase in the test information function at the lower end of the score scale was observed under the polytomous scoring schema. With respect to classification accuracy, the results were inconsistent across different samples; therefore, further study is needed.
In summary, findings from this study suggest that polytomous scoring of MR items has the potential to increase the efficiency of measurement (as shown by the increase in test information functions) and the accuracy of classification. Realizing these advantages, however, will be contingent on the quality and quantity of the MR items on the test. Further research is needed to evaluate the quality of the MR items and its effect on the effectiveness of polytomous scoring.
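The contrast between dichotomous (all-or-nothing) and option-weighted polytomous scoring of an MR item can be sketched as follows; the item, its key, and the option weights below are hypothetical, not taken from the study:

```python
def score_mr_dichotomous(selected, keyed):
    """All-or-nothing scoring: 1 only if the selected options exactly
    match the keyed options, otherwise 0."""
    return 1 if set(selected) == set(keyed) else 0

def score_mr_weighted(selected, weights):
    """Option-weighted polytomous scoring: sum the a priori weight of
    each selected option, with the total floored at zero."""
    return max(0, sum(weights.get(opt, 0) for opt in selected))

# Hypothetical item: options A and C are keyed; distractors are penalized.
weights = {"A": 1, "B": -1, "C": 1, "D": -1}
score_mr_dichotomous(["A"], ["A", "C"])  # 0: partial knowledge earns nothing
score_mr_weighted(["A"], weights)        # 1: partial credit
score_mr_weighted(["A", "C"], weights)   # 2: full credit
```

The extra score points at the low end of the scale are one intuition for why the study observed more test information there under polytomous scoring.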
17.
An evaluation of automated scoring programs designed to score essays / Khaliq, Shameem Nyla 01 January 2004 (links)
The number of performance assessment tasks has increased over the years because some constructs are best assessed in this manner. Though there are benefits to using performance tasks, there are also drawbacks. The problems with performance assessments include scoring time, scoring costs, and problems with human raters. One solution for overcoming the drawbacks of performance assessments is the use of automated scoring programs. There are several automated scoring programs designed to score essays and other constructed responses. Much research has been conducted on these programs by their developers; however, relatively little research has used external criteria to evaluate automated programs. The purpose of this study was to evaluate two popular automated scoring programs. The programs were evaluated with respect to several criteria: the percent of exact and adjacent agreements, kappas, correlations, differences in score distributions, discrepant scoring, analysis of variance, and generalizability theory. The scoring results from the two automated scoring programs were compared to the scores from operational scoring and from an expert panel of judges. The results indicated close similarity between the two scoring programs in how they scored the essays. However, the results also revealed some subtle, but important, differences between the programs. One program exhibited higher correlations and agreement indices with both the operational and expert committee scores, although the magnitude of the difference was small. Differences were also noted in the scores assigned to fake essays designed to trick the programs into providing a higher score. These results were consistent for both the full set of 500 scored essays and the subset of essays reviewed by the expert committee. Overall, both automated scoring programs performed well on the criteria; however, one program did slightly better.
The G-studies indicated that there were small differences among the raters and that the amount of error in the models was reduced as the number of human raters and automated scoring programs was increased. In summary, the results suggest automated scoring programs can approximate scores given by human raters, but they differ with respect to proximity to operational and expert scores and in their ability to identify dubious essays.
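Exact and adjacent agreement, two of the evaluation criteria listed above, can be sketched in a few lines; the score vectors below are invented for illustration, not drawn from the study's 500 essays:

```python
def exact_and_adjacent_agreement(scores1, scores2):
    """Return the proportion of essays on which two raters assign
    identical scores (exact agreement) and the proportion on which
    their scores differ by at most one point (adjacent agreement,
    inclusive of exact matches)."""
    pairs = list(zip(scores1, scores2))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent

# Hypothetical human and machine scores on an eight-essay sample:
human = [4, 3, 5, 2, 4, 3, 1, 6]
machine = [4, 3, 4, 2, 5, 3, 3, 6]
exact, adjacent = exact_and_adjacent_agreement(human, machine)
# exact = 0.625, adjacent = 0.875
```

Kappa and correlation would be computed on the same paired scores; the study supplemented these agreement indices with ANOVA and generalizability analyses.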
18.
Using performance level descriptors to ensure consistency and comparability in standard-setting / Khembo, Dafter January 01 January 2004 (has links)
The need for fair and comparable performance standards in high-stakes examinations cannot be overstated. For examination results to be comparable over time, uniform performance standards need to be applied to different cohorts of students taking different forms of the examination. The motivation to conduct a study on maintenance of the Malawi School Certificate of Education (MSCE) performance standards arose following the observation by the Presidential Commission of Enquiry into the MSCE Results that the examination was producing fluctuating results whose cause could not be identified and explained, except by blaming the standard-setting procedure in use. This study was conducted with the following objectives: (1) to see if use of performance level descriptors could ensure consistency in examination standards; (2) to assess the role of training of judges in standard setting; and (3) to examine the impact of judges' participation in scoring students' written answers prior to being involved in setting examination standards. Maintaining examination standards over years means assessing different cohorts of students taking different forms of the examination using common criteria. In this study, common criteria, in the form of performance level descriptors, were developed and applied to the 2002 and 2003 MSCE Mathematics examinations, using the item score string estimation (ISSE) standard-setting method. Twenty MSCE mathematics experts were purposively identified and trained to use the method. Results from the study demonstrated that performance level descriptors, especially when used in concert with test equating, can greatly help to determine grading standards that can be maintained from year to year, by reducing variability in performance standards due to ambiguity about what it means to achieve each grade category.
It was also shown in this study that preparing judges to set performance standards is an important factor in producing quality standard-setting results. At the same time, the results did not support a recommendation for judges to gain experience as scorers prior to participating in standard-setting activities.
19.
Validity issues in standard setting / Meara, Kevin Charles 01 January 2001 (has links)
Standard setting is an important yet controversial aspect of testing. In credentialing, pass-fail decisions must be made to determine who is competent to practice in a particular profession. In education, decisions based on standards can have tremendous consequences for students, parents, and teachers. Standard setting is controversial due to the judgmental nature of the process. In addition, the nature of testing is changing: with the increased use of computer-based testing and new item formats, test-centered methods may no longer be applicable. How are testing organizations currently setting standards? How can organizations gather validity evidence to support their standards? This study consisted of two parts. The purpose of the first part was to learn about the procedures credentialing organizations use to set standards on their primary exam. A survey was developed and mailed to 98 credentialing organizations. Fifty-four percent of the surveys were completed and returned. The results indicated that most organizations used a modified Angoff method; however, no two organizations used exactly the same procedure. In addition, the use of computer-based testing (CBT) and new item formats has increased during the past ten years. The results were discussed in terms of ways organizations can alter their procedures to gather additional validity evidence. The purpose of the second part was to conduct an evaluation of the standard-setting process used by a state department of education. Two activities were conducted: first, the documentation was evaluated, and second, secondary data analyses (i.e., contrasting groups analysis and cluster analysis) were conducted on data made available by the state. The documentation and the contrasting groups analysis indicated that the standards were set with care and diligence. The results of the contrasting groups analysis, however, also indicated that the standards in some categories might be a bit high.
In addition, some of the score categories were somewhat narrow in range. The information covered in this paper might be useful for practitioners who must validate the standards they create.
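The contrasting groups analysis mentioned above locates a cut score between the score distributions of examinees judged competent and those judged not competent. One simple convention, sketched here with invented scores and classifications (the study's actual procedure may differ), takes the midpoint between the two group medians:

```python
def contrasting_groups_cut(master_scores, nonmaster_scores):
    """Toy contrasting-groups cut score: the midpoint between the
    median score of examinees judged competent and the median score
    of examinees judged not competent. This is one of several
    conventions for locating where the two distributions cross."""
    def median(xs):
        s = sorted(xs)
        n = len(s)
        mid = n // 2
        return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2
    return (median(master_scores) + median(nonmaster_scores)) / 2

# Invented teacher judgments and test scores:
masters = [78, 82, 85, 90, 74]
nonmasters = [55, 60, 64, 58, 70]
cut = contrasting_groups_cut(masters, nonmasters)  # 71.0
```

Comparing such an empirically derived cut against the operational cut score is one way a standard in a given category can be flagged as "a bit high."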
20.
Development and evaluation of test assembly procedures for computerized adaptive testing / Robin, Frederic 01 January 2001 (has links)
Computerized adaptive testing provides a flexible and efficient framework for the assembly and administration of on-demand tests. However, the development of practical test assembly procedures that can ensure desired measurement, content, and security objectives for all individual tests has proved difficult. To address this challenge, desirable test specifications, such as minimum test information targets, minimum and maximum test content attributes, and item exposure limits, were identified. Five alternative test assembly procedures were then implemented, and extensive computerized adaptive testing simulations were conducted under various test security and item pool size conditions. All five procedures were modeled on the weighted deviation model and optimized to produce the most acceptable compromise between testing objectives. As expected, the random (RD) and maximum information (MI) test assembly procedures resulted in the least acceptable tests, producing either the most informative but least secure and efficient tests or the most efficient and secure but least informative tests, illustrating the need for compromise between competing objectives. The combined maximum information item selection and Sympson-Hetter unconditional exposure control procedure (MI-SH) allowed a more acceptable compromise between testing objectives but demonstrated only moderate levels of test security and efficiency. The more sophisticated combined maximum information and Stocking and Lewis conditional exposure control procedure (MI-SLC) demonstrated both high levels of test security and efficiency while providing acceptable measurement. Results obtained with the combined maximum information and stochastic conditional exposure control procedure (MI-SC) were similar to those obtained with MI-SLC.
However, MI-SC offers the advantage of not requiring extensive preliminary simulations and allows more flexibility in the removal or replacement of faulty items from operational pools. The importance of including minimum test information targets in the testing objectives was supported by the relatively large variability of test information observed across all the test assembly procedures. Failure to take this problem into account when test assembly procedures are operationalized is likely to result in the administration of sub-standard tests to many examinees. Concerning pool management, it was observed that increasing pool size beyond what is needed to satisfy all testing objectives actually reduced testing efficiency.
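The Sympson-Hetter procedure referenced above caps item exposure by administering a selected item only with a probability k_i fixed in advance through simulation. A toy sketch of one selection step combining maximum-information ordering with these exposure parameters (the item pool, information values, and k parameters are all invented):

```python
import random

def select_item(available, info, exposure_k, rng):
    """One step of maximum-information item selection with
    Sympson-Hetter exposure control: walk the pool in descending
    information order and administer item i with probability k_i,
    its exposure-control parameter derived beforehand by simulation."""
    for item in sorted(available, key=lambda i: info[i], reverse=True):
        if rng.random() < exposure_k[item]:
            return item
    # Degenerate fallback if every candidate is passed over:
    return max(available, key=lambda i: info[i])

info = {"a": 2.1, "b": 1.8, "c": 0.9}          # Fisher information at theta
exposure_k = {"a": 0.3, "b": 0.9, "c": 1.0}    # caps the most informative item
rng = random.Random(0)
picks = [select_item({"a", "b", "c"}, info, exposure_k, rng) for _ in range(1000)]
```

Over many simulated examinees, item "a" is administered on roughly 30% of steps despite always being the most informative candidate, which is the trade of measurement precision for security that the abstract describes.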