
Equating tests under the Generalized Partial Credit Model

Swediati, Nonny 01 January 1997 (has links)
The efficacy of several procedures for equating tests scored under the Partial Credit item response theory model was investigated. A simulation study was conducted to investigate the effect of several factors on the accuracy of equating for tests calibrated using the Partial Credit Model. The factors manipulated were the number of anchor items, the difference in the ability distributions of the examinee groups that take alternate forms of a test, the sample size of the groups taking the tests, and the equating method. The data for this study were generated according to the Generalized Partial Credit Model. Test lengths of 5 and 20 items were studied. The number of anchor items ranged from two to four for the five-item test and from two to eight for the twenty-item test. Two levels of sample size (500 and 1000) and two levels of ability distribution (equal and unequal) were studied. The equating methods studied were four variations of the Mean and Sigma method and the characteristic curve method. The results showed that the characteristic curve method was the most accurate equating method under all conditions studied. The second most effective method was the Mean and Sigma method that used all of the step difficulty parameter estimates in the computation of the equating constants. In general, all equating methods produced reasonably accurate equating with long tests and a large number of anchor items when there was no mean difference in the ability of the two groups. When there was a large ability difference between the two groups of examinees, item parameters were estimated poorly, particularly in short tests, and this in turn affected the equating methods adversely. The conclusion is that poor parameter estimation makes it difficult to equate tests administered to examinee groups that differ greatly in ability, especially when the tests are relatively short and the number of anchor items is small.
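For readers unfamiliar with the models named in this abstract, the following minimal sketch (not the study's code; all parameter values are invented) shows the Generalized Partial Credit Model category probabilities and the Mean and Sigma variant that uses all step difficulty estimates from the anchor items:

```python
# A minimal sketch of GPCM category probabilities and a mean/sigma scale
# transformation, the two ingredients the study combines. Illustrative only.
import numpy as np

def gpcm_probs(theta, a, b_steps):
    """P(X = k | theta) for k = 0..m under the GPCM.
    a       : item discrimination
    b_steps : step difficulty parameters b_1..b_m
    """
    # Cumulative sums of a*(theta - b_k); category 0 has an empty sum (0).
    z = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(b_steps)))))
    ez = np.exp(z - z.max())          # stabilize before normalizing
    return ez / ez.sum()

def mean_sigma_constants(b_old, b_new):
    """Slope A and intercept B placing the new-form scale onto the old one,
    using all step difficulty estimates from the anchor items (the variant
    the study found second most effective)."""
    b_old, b_new = np.asarray(b_old), np.asarray(b_new)
    A = b_old.std() / b_new.std()
    B = b_old.mean() - A * b_new.mean()
    return A, B

# Example: a 3-category anchor item and a small set of anchor step difficulties.
print(gpcm_probs(theta=0.5, a=1.2, b_steps=[-0.4, 0.8]))
A, B = mean_sigma_constants(b_old=[-0.5, 0.1, 0.9], b_new=[-0.3, 0.4, 1.2])
print(f"A = {A:.3f}, B = {B:.3f}")   # rescale: b* = A*b + B, a* = a/A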

An investigation of alternative approaches to scoring multiple response items on a certification exam

Ma, Xiaoying 01 January 2004 (has links)
Multiple-response (MR) items are items that have more than one correct answer. This item type is often used in licensure and achievement tests to accommodate situations where identification of a single correct answer no longer suffices or where multiple steps are required in solving a problem. MR items can be scored either dichotomously or polytomously. Polytomous scoring of MR items often employs some type of option weighting to assign differential point values to each of the response options. Weights for each option are defined a priori by expert judgments or derived empirically from item analysis. Studies examining the reliability and validity of differential option weighting methods have been based on classical test theory. Little or no research has been done to examine the usefulness of item response theory (IRT) models for deriving empirical weights, or to compare the effectiveness of different option weighting methods. The purposes of this study, therefore, were to investigate polytomous scoring methods for MR items and to evaluate the impact different scoring methods may have on the reliability of the test scores and on item and test information functions, as well as on measurement efficiency and classification accuracy. Results from this study indicate that polytomous scoring of the MR items did not significantly increase the reliability of the test, nor did it increase the test information functions drastically, probably because two thirds of the items were multiple-choice items scored the same way across comparisons. However, a substantial increase in the test information function at the lower end of the score scale was observed under the polytomous scoring schemes. With respect to classification accuracy, the results were inconsistent across different samples; therefore, further study is needed. In summary, findings from this study suggest that polytomous scoring of MR items has the potential to increase the efficiency of measurement (as shown by the increase in test information functions) and the accuracy of classification. Realizing these advantages, however, will be contingent on the quality and quantity of the MR items on the test. Further research is needed to evaluate the quality of MR items and their effect on the effectiveness of polytomous scoring.
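As an illustration of the two scoring rules contrasted in this study, here is a hedged sketch; the key, option weights, and response below are invented, whereas the study derived its weights from expert judgment or empirical item analysis:

```python
# Dichotomous (all-or-nothing) versus polytomous (option-weighted) scoring
# of a single multiple-response item. All values are illustrative.
from typing import Set, Dict

def score_dichotomous(response: Set[str], key: Set[str]) -> int:
    """All-or-nothing: 1 only if the selected options match the key exactly."""
    return int(response == key)

def score_polytomous(response: Set[str], weights: Dict[str, float]) -> float:
    """Option weighting: sum the a-priori weight of every selected option.
    Distractors carry weight 0 (or a negative value, if penalties are used);
    the floor at 0 keeps the item score non-negative."""
    return max(0.0, sum(weights.get(opt, 0.0) for opt in response))

key = {"A", "C", "D"}
weights = {"A": 1.0, "C": 1.0, "D": 0.5, "B": -0.5}   # invented weights
resp = {"A", "C", "B"}
print(score_dichotomous(resp, key))     # 0   -- one miss forfeits everything
print(score_polytomous(resp, weights))  # 1.5 -- partial knowledge still counts
```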

An evaluation of automated scoring programs designed to score essays

Khaliq, Shameem Nyla 01 January 2004 (has links)
The number of performance assessment tasks has increased over the years because some constructs are best assessed in this manner. Though there are benefits to using performance tasks, there are also drawbacks. The problems with performance assessments include scoring time, scoring costs, and problems with human raters. One solution for overcoming the drawbacks of performance assessments is the use of automated scoring programs. There are several automated scoring programs designed to score essays and other constructed responses. Much research has been conducted on these programs by their developers; however, relatively little research has used external criteria to evaluate automated programs. The purpose of this study was to evaluate two popular automated scoring programs. The programs were evaluated with respect to several criteria: the percent of exact and adjacent agreements, kappas, correlations, differences in score distributions, discrepant scoring, analysis of variance, and generalizability theory. The scoring results from the two automated scoring programs were compared to the scores from operational scoring and from an expert panel of judges. The results indicated close similarity between the two scoring programs in how they scored the essays. However, the results also revealed some subtle, but important, differences between the programs. One program exhibited higher correlations and agreement indices with both the operational and expert committee scores, although the magnitude of the difference was small. Differences were also noted in the scores assigned to fake essays designed to trick the programs into providing a higher score. These results were consistent for both the full set of 500 scored essays and the subset of essays reviewed by the expert committee. Overall, both automated scoring programs did well when judged against the criteria; however, one program did slightly better. The G-studies indicated that there were small differences among the raters and that the amount of error in the models was reduced as the number of human raters and automated scoring programs was increased. In summary, the results suggest that automated scoring programs can approximate scores given by human raters, but that they differ in their proximity to operational and expert scores and in their ability to identify dubious essays.
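The agreement criteria listed in this abstract can be made concrete with a short sketch; the score vectors below are invented, and the study's actual analyses also included score distributions, ANOVA, and generalizability theory:

```python
# Exact agreement, adjacent agreement, Cohen's kappa, and Pearson correlation
# between a human rater and an automated scoring program on a 6-point scale.
import numpy as np

def agreement_stats(x, y, n_points=6):
    x, y = np.asarray(x), np.asarray(y)
    exact = np.mean(x == y)
    adjacent = np.mean(np.abs(x - y) <= 1)          # within one score point
    # Cohen's kappa: observed agreement corrected for chance agreement,
    # with chance computed from the two raters' marginal distributions.
    cats = np.arange(1, n_points + 1)
    px = np.array([np.mean(x == c) for c in cats])
    py = np.array([np.mean(y == c) for c in cats])
    pe = (px * py).sum()
    kappa = (exact - pe) / (1 - pe)
    r = np.corrcoef(x, y)[0, 1]
    return dict(exact=exact, adjacent=adjacent, kappa=kappa, r=r)

human   = [4, 3, 5, 2, 4, 3, 6, 1, 4, 3]
program = [4, 3, 4, 2, 5, 3, 6, 2, 4, 4]
print(agreement_stats(human, program))
```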

Using performance level descriptors to ensure consistency and comparability in standard-setting

Khembo, Dafter January 01 January 2004 (has links)
The need for fair and comparable performance standards in high-stakes examinations cannot be overstated. For examination results to be comparable over time, uniform performance standards need to be applied to different cohorts of students taking different forms of the examination. The motivation to conduct a study on maintaining the Malawi School Certificate of Education (MSCE) performance standards arose from the observation by the Presidential Commission of Enquiry into the MSCE Results that the examination was producing fluctuating results whose cause could not be identified or explained, except by blaming the standard-setting procedure in use. This study was conducted with the following objectives: (1) to see whether the use of performance level descriptors could ensure consistency in examination standards; (2) to assess the role of training of judges in standard setting; and (3) to examine the impact of judges' participation in scoring students' written answers prior to being involved in setting examination standards. Maintaining examination standards over years means assessing different cohorts of students taking different forms of the examination using common criteria. In this study, common criteria, in the form of performance level descriptors, were developed and applied to the 2002 and 2003 MSCE Mathematics examinations, using the item score string estimation (ISSE) standard-setting method. Twenty MSCE mathematics experts were purposively identified and trained to use the method. Results from the study demonstrated that performance level descriptors, especially when used in concert with test equating, can greatly help in determining grading standards that can be maintained from year to year, by reducing the variability in performance standards due to ambiguity about what it means to achieve each grade category. The study also showed that preparing judges to set performance standards is an important factor in producing quality standard-setting results. At the same time, the results did not support a recommendation for judges to gain experience as scorers prior to participating in standard-setting activities.

Perceptions of contemporary effects of colonialism among educational professionals in Ghana

Fletcher, Kingsley Atterh 01 January 2013 (has links)
This study examined perceptions of contemporary effects of colonialism among education professionals in Ghana, and the extent to which education professionals express awareness of colonialism in Ghanaian school systems and contemporary Ghanaian society. An overview of the literature in Critical Race Theory, Social Justice Education Theory, Oppression Theory, and Post-Colonial Theory provided the theoretical foundation that guided this study. Five factors emerged from this literature review as a framework for analysis of the study data: discourse, cultural imperialism, linguistic hegemony, racism and internalized racism, and oppression. The study participants included education policy makers, administrators, counselors, teachers, and teacher educators in the educational system of Ghana. Thirty-two individual interviews and six focus groups comprising twenty-seven participants were conducted, in which educators described their perspectives on Ghanaian society and Ghanaian educational systems in their own words in response to a predetermined set of twelve questions. A document analysis established a baseline of data regarding the curriculum of Ghanaian schools as presented in curriculum guides, textbooks, policy statements, handbooks, and reports that describe the educational systems in Ghana today. Ghanaian educators expressed the most awareness of colonial legacies related to cultural imperialism, linguistic hegemony, internalized oppression, and discourse. The findings suggest that educational professionals in Ghana demonstrate limited awareness of the colonial legacies of racism and internalized racism, sexism, classism, ethnoreligious oppression, and neocolonialism.

Validity issues in standard setting

Meara, Kevin Charles 01 January 2001 (has links)
Standard setting is an important yet controversial aspect of testing. In credentialing, pass-fail decisions must be made to determine who is competent to practice in a particular profession. In education, decisions based on standards can have tremendous consequences for students, parents, and teachers. Standard setting is controversial due to the judgmental nature of the process. In addition, the nature of testing is changing: with the increased use of computer-based testing and new item formats, test-centered methods may no longer be applicable. How are testing organizations currently setting standards? How can organizations gather validity evidence to support their standards? This study consisted of two parts. The purpose of the first part was to learn about the procedures credentialing organizations use to set standards on their primary exam. A survey was developed and mailed to 98 credentialing organizations; fifty-four percent of the surveys were completed and returned. The results indicated that most organizations used a modified Angoff method; however, no two organizations used exactly the same procedure. In addition, the use of computer-based testing (CBT) and new item formats has increased during the past ten years. The results were discussed in terms of ways organizations can alter their procedures to gather additional validity evidence. The purpose of the second part was to evaluate the standard-setting process used by a state department of education. Two activities were conducted: first, the documentation was evaluated; second, secondary data analyses (i.e., a contrasting groups analysis and a cluster analysis) were conducted on data made available by the state. The documentation and the contrasting groups analysis indicated that the standards were set with care and diligence. The contrasting groups results, however, also indicated that the standards in some categories might be a bit high, and that some of the score categories were somewhat narrow in range. The information covered in this paper might be useful for practitioners who must validate the standards they create.
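The contrasting groups analysis used here as secondary validity evidence can be sketched as follows; the score distributions and master/non-master classifications below are simulated, not the state's data:

```python
# Contrasting groups: place the cutscore where the score distributions of
# judged masters and non-masters cross, i.e. where total misclassification
# is smallest. All data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(1)
non_masters = rng.normal(48, 8, 400)   # invented score distributions
masters     = rng.normal(62, 8, 600)

def contrasting_groups_cut(lo, hi, grid=np.arange(0, 101)):
    """Pick the score minimizing total misclassification:
    non-masters at or above the cut plus masters below it."""
    losses = [np.sum(lo >= c) + np.sum(hi < c) for c in grid]
    return grid[int(np.argmin(losses))]

print(contrasting_groups_cut(non_masters, masters))  # near the distributions' crossing
```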

Development and evaluation of test assembly procedures for computerized adaptive testing

Robin, Frederic 01 January 2001 (has links)
Computerized adaptive testing provides a flexible and efficient framework for the assembly and administration of on-demand tests. However, the development of practical test assembly procedures that can ensure desired measurement, content, and security objectives for all individual tests has proved difficult. To address this challenge, desirable test specifications, such as minimum test information targets, minimum and maximum test content attributes, and item exposure limits, were identified. Five alternative test assembly procedures were then implemented, and extensive computerized adaptive testing simulations were conducted under various test security and item pool size conditions. All five procedures were modeled on the weighted deviation model and optimized to produce the most acceptable compromise among testing objectives. As expected, the random (RD) and maximum information (MI) test assembly procedures resulted in the least acceptable tests, producing either the most informative but least secure and efficient tests or the most efficient and secure but least informative tests, illustrating the need for compromise between competing objectives. The combined maximum information item selection and Sympson-Hetter unconditional exposure control procedure (MI-SH) allowed a more acceptable compromise between testing objectives but demonstrated only moderate levels of test security and efficiency. The more sophisticated combination of maximum information item selection with Stocking and Lewis conditional exposure control (MI-SLC) demonstrated both high levels of test security and efficiency while providing acceptable measurement. Results obtained with the combined maximum information and stochastic conditional exposure control procedure (MI-SC) were similar to those obtained with MI-SLC; however, MI-SC offers the advantage of not requiring extensive preliminary simulations, and it allows more flexibility in removing or replacing faulty items in operational pools. The importance of including minimum test information targets among the testing objectives was supported by the relatively large variability of test information observed for all the test assembly procedures. Failure to take this problem into account when test assembly procedures are operationalized is likely to result in the administration of sub-standard tests to many examinees. Concerning pool management, it was observed that increasing pool size beyond what is needed to satisfy all testing objectives actually reduced testing efficiency.
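A simplified sketch of the MI-SH idea described above, under stated assumptions: 2PL items, a flat illustrative exposure parameter, and no content constraints (the study's implementation was built on the weighted deviation model and is considerably richer):

```python
# Maximum-information item selection with a Sympson-Hetter exposure filter:
# rank eligible items by Fisher information at the current ability estimate,
# then let each candidate item pass a probabilistic exposure "lottery".
import numpy as np

rng = np.random.default_rng(7)
a = rng.uniform(0.8, 2.0, 200)     # invented 2PL discriminations
b = rng.normal(0.0, 1.0, 200)      # invented 2PL difficulties
exposure = np.full(200, 0.25)      # SH control parameters; in practice these
                                   # are tuned through preliminary simulations

def info_2pl(theta, a, b):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

def select_item_mi_sh(theta, administered):
    """Return the most informative eligible item that passes its SH lottery."""
    ranked = np.argsort(-info_2pl(theta, a, b))   # most informative first
    for j in ranked:
        if j in administered:
            continue
        if rng.random() < exposure[j]:            # Sympson-Hetter filter
            return j
    return -1   # pool exhausted (practically impossible with sane parameters)

print(select_item_mi_sh(theta=0.3, administered=set()))
```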

The new/given index: A measure to explore, evaluate, and monitor eDiscourse in educational conferencing applications

Welts, Dana Raymond 01 January 2002 (has links)
This dissertation addresses the limited measures available for comparative linguistic analysis across spoken, written, and eDiscourse environments and proposes a new measure, the new/given index. The new/given construct of Halliday and Clark is reviewed, as well as the relevant literature on eDiscourse and other persistent electronic communication. A data set of writing samples, face-to-face meeting transcripts, and electronic conferences is assembled and used to test and validate the new/given index. The data are reviewed and scored by raters for new and given material, and the rater scores are compared with the scores generated by the new/given index software parser. The data suggest that the new/given index reliably reports the presence of new and given information in processed text and provides a measure of the efficiency with which this text is resolved or grounded in discourse. The data are further processed by the software parser, and aggregate new/given indices for the data types are generated. This analysis reveals statistically significant differences among the new/given indices of written text, transcriptions of face-to-face discussion, and eDiscourse conferencing transcripts. Finally, a qualitative analysis based on interviews with the creators of the data set explores their experience in the eDiscourse conferencing environment and the relation between individual behavior in a group problem-solving situation and an individual's new/given index in an eDiscourse environment. The study concludes with suggestions for the application of the new/given index in eDiscourse and other persistent electronic communication environments.
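The parser itself is the dissertation's own software, but the underlying idea can be sketched crudely; the stopword list and tokenization below are invented simplifications, not the actual parsing rules:

```python
# A deliberately crude sketch: treat a content word as "given" once it has
# already appeared in the discourse, "new" otherwise, and report the share
# of new content tokens per turn of a conference.
import re

STOPWORDS = {"the", "a", "an", "is", "are", "was", "to", "of", "and", "in", "it"}

def new_given_index(turns):
    seen, scores = set(), []
    for turn in turns:
        tokens = [w for w in re.findall(r"[a-z']+", turn.lower())
                  if w not in STOPWORDS]
        new = [w for w in tokens if w not in seen]   # first mention = new
        scores.append(len(new) / len(tokens) if tokens else 0.0)
        seen.update(tokens)                          # everything becomes given
    return scores

conference = [
    "The parser flags new information in each posting.",
    "The parser then re-reads each posting for given information.",
]
print(new_given_index(conference))  # first turn mostly new, second mostly given
```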

Investigation of the validity of the Angoff standard setting procedure for multiple-choice items

Mattar, John D 01 January 2000 (has links)
Setting passing standards is one of the major challenges in the implementation of valid assessments for high-stakes decision making in testing situations such as licensing and certification. If high-stakes pass-fail decisions are to be made from test scores, the passing standards must be valid for the assessment itself to be valid. Multiple-choice test items continue to play an important role in measurement, and the Angoff (1971) procedure continues to be widely used to set standards on multiple-choice examinations. This study focuses on the internal consistency, or underlying validity, of Angoff standard-setting ratings. The Angoff procedure requires judges to estimate the proportion of borderline candidates who would answer each test question correctly. If the judges are successful at estimating the difficulty of items for borderline candidates, that suggests an underlying validity to the procedure. This study examines this question by evaluating the relationships among Angoff standard-setting ratings and actual candidate performance on professional certification tests. For each test, a borderline group of candidates was defined as those near the cutscore. The analyses focus on three aspects of judges' ratings with respect to item difficulties for the borderline group: accuracy, correlation, and variability. The results of this study provide some evidence for the validity of the Angoff standard-setting procedure. For two of the three examinations studied, judges were accurate and consistent in rating the difficulty of items for borderline candidates. However, the study also shows that the procedure may be less successful in some applications. These results indicate that the procedure can be valid, but that its validity should be checked for each application; practitioners should not assume that the Angoff method is valid. The results also reveal some limitations of the procedure even when the overall results are positive: judges are less successful at rating very difficult or very easy test items. The validity of the Angoff procedure may be enhanced by further study of methods designed to ameliorate those limitations.
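The three aspects examined (accuracy, correlation, variability) reduce to simple computations once judge ratings and borderline-group p-values are in hand; the numbers below are invented:

```python
# Comparing mean Angoff ratings against the empirical proportion-correct of
# a borderline group: signed bias (accuracy) and rank agreement (correlation).
import numpy as np

angoff = np.array([0.70, 0.55, 0.80, 0.45, 0.60, 0.75])        # judge means
p_borderline = np.array([0.66, 0.52, 0.85, 0.38, 0.63, 0.71])  # observed p

bias = np.mean(angoff - p_borderline)        # do judges over/under-estimate?
r = np.corrcoef(angoff, p_borderline)[0, 1]  # do judges order items correctly?
spread = angoff.std()                        # variability of panel ratings
print(f"bias = {bias:+.3f}, correlation = {r:.3f}, spread = {spread:.3f}")
```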

The Role of Vocabulary Knowledge, Syntactic Awareness and Metacognitive Awareness in Reading Comprehension of Adult English Language Learners

Unknown Date (has links)
The importance of vocabulary knowledge, syntactic awareness, and metacognitive awareness in reading comprehension has been established in first language research. By contrast, fewer studies have documented the role of these components in the reading comprehension of English language learners (ELLs) in the field of second language (L2) research. This study focused on an L2-only model to examine the role of L2 vocabulary knowledge, syntactic awareness, and metacognitive awareness of reading strategies in L2 reading comprehension with 278 Chinese college students majoring in English. First, confirmatory factor analysis and structural equation modeling were used to (1) evaluate whether vocabulary knowledge, syntactic awareness, and metacognitive awareness were distinguishable psychological constructs, and (2) examine the strength of the relation between each construct and reading comprehension. Second, the following questions were addressed: (1) whether poor L2 readers are inferior to good L2 readers in syntactic awareness, vocabulary knowledge, and metacognitive awareness of reading strategies (MANCOVA was used to address this question); (2) whether the correlations among vocabulary knowledge, syntactic awareness, and metacognitive awareness differed for poor and good L2 readers; and (3) whether the relation of each of the three constructs to reading comprehension differed across the poor-reader and good-reader groups. The multigroup analyses were conducted using structural equation modeling. The participants were 278 undergraduates, native speakers of Chinese enrolled as English majors at three Chinese universities. Those with TOEFL reading scores in the sample's top and bottom 25% were identified as good and poor readers, respectively. Eight assessments were administered concurrently, with two measures each of vocabulary knowledge, syntactic awareness, metacognitive awareness, and reading comprehension. Vocabulary knowledge was assessed using the Vocabulary Level Test (Nation, 1990) and the Depth of Vocabulary Knowledge Measure (Dian & Mary, 2004). The Sentence Combination Subtest of the Test of Adolescent and Adult Language (Hammill, Brown, Larsen & Wiederholt, 2007) and the Syntactic Awareness Questionnaire (Layton, Robinson & Lawson, 1998) were used as indicators of syntactic awareness. The Metacognitive Reading Strategies Questionnaire (Taraban, Kerr & Ryneason, 2004) and the Metacognitive Reading Awareness Inventory (Miholic, 1994) assessed the construct of metacognitive awareness of reading strategies. Reading ability was assessed using the Test of English as a Foreign Language (TOEFL) Reading Comprehension Subtest (Schedl, Thomas & Way, 1995) and the Gray Silent Reading Test (Third Edition; Blalock & Weiderholt, 2000). These were all paper-and-pencil, group-administered assessments, which participants completed in a counterbalanced order. Confirmatory factor analysis suggested that a two-factor model of Vocabulary Knowledge/Syntactic Awareness and Metacognitive Awareness offered the best fit to the data. Structural equation modeling indicated that 87% of the variance in reading comprehension was explained by the Vocabulary Knowledge/Syntactic Awareness and Metacognitive Awareness factors taken together; however, Vocabulary Knowledge/Syntactic Awareness had a stronger relationship to reading comprehension than metacognitive awareness did.
MANCOVA indicated significant differences between poor and good readers on both constructs. Multigroup analyses using structural equation modeling suggested that the correlation between Vocabulary Knowledge/Syntactic Awareness and Metacognitive Awareness was the same across the poor-reader and good-reader groups. Similarly, the pattern of relations of Vocabulary Knowledge/Syntactic Awareness and Metacognitive Awareness to reading comprehension remained constant across the poor-reader and good-reader groups. / A Dissertation submitted to the Department of Educational Psychology and Learning Systems in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester, 2008. / April 14, 2008. / Metacognitive, Syntax, Reading, Vocabulary, English Language Learners / Includes bibliographical references. / Alysia D. Roehrig, Professor Directing Dissertation; Richard K. Wagner, Outside Committee Member; Akihito Kamata, Committee Member; Beth M. Phillips, Committee Member.
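As a hedged illustration of how a figure like "87% of the variance explained" is read off such a model, the sketch below uses simulated data and ordinary least squares standing in for the study's structural equation modeling:

```python
# Regress a reading-comprehension composite on two (correlated) factor scores
# and take R^2 as the proportion of variance jointly explained. Simulated data.
import numpy as np

rng = np.random.default_rng(42)
n = 278
vk_sa = rng.normal(size=n)                           # Vocab/Syntactic factor
meta = 0.5 * vk_sa + rng.normal(scale=0.9, size=n)   # correlated second factor
reading = 0.85 * vk_sa + 0.20 * meta + rng.normal(scale=0.35, size=n)

X = np.column_stack([np.ones(n), vk_sa, meta])       # intercept + two factors
beta, *_ = np.linalg.lstsq(X, reading, rcond=None)
resid = reading - X @ beta
r2 = 1 - resid.var() / reading.var()
print(f"R^2 = {r2:.2f}")   # high, with the VK/SA path dominating by design
```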
