  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
171

The design of a multiple evaluation system for secondary school teachers [Die ontwerp van 'n meervoudige evalueringstelsel vir onderwysers aan die sekondêre skool]

Grobler, Bernardus Rudolf 04 November 2014 (has links)
D.Ed. (Educational Management) / Please refer to full text to view abstract
172

Small sample IRT item parameter estimates

Setiadi, Hari 01 January 1997 (has links)
Item response theory (IRT) has great potential for solving many measurement problems. The success of specific IRT applications can be obtained only when the fit between the model and the test data is satisfactory. But model fit is not the only concern. Many tests are administered to relatively small numbers of examinees. If sample sizes are small, item parameter estimates will be of limited usefulness. There appear to be a number of ways that estimation might be improved. The purpose of this study was to investigate IRT parameter estimation using several promising small sample procedures. Computer simulation was used to generate the data. Two item banks were created with items described by a three parameter logistic model. Tests of length 30 and 60 items were simulated; examinee samples of 100, 200, and 500 were used in item calibration. Four promising models and associated estimation procedures were selected: (1) the one-parameter logistic model, (2) a modified one-parameter model in which a constant value for the "guessing parameter" was assumed, (3) a non-parametric three parameter model (called "Testgraf"), and (4) a one-parameter Bayesian model (with a variety of priors on the item difficulty parameter). Several criteria were used in evaluating the estimates. The main results were that (1) the modified one-parameter model seemed to consistently lead to the best estimates of item difficulty and examinee ability compared to the Rasch model and the non-parametric three-parameter model and related estimation procedures (the finding was observed across both test lengths and all three sample sizes and seemed to be true with both normal and rectangular distributions of ability), (2) the Bayesian estimation procedures with reasonable priors led to comparable results to the modified one-parameter model, and (3) the results with Testgraf, for the smallest sample of 100, typically led to the poorest results. Future studies seem justified to (1) replicate the findings with more relevant evaluation criteria, (2) determine the source of the problem with Testgraf and small samples/short tests, and (3) further investigate the utility of Bayesian estimation procedures.
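As an aside for readers unfamiliar with the models named above, a minimal sketch of how dichotomous response data are typically generated under the three-parameter logistic model is given below (Python; the item bank and sample values are illustrative assumptions, not the parameters used in the dissertation).

```python
import numpy as np

rng = np.random.default_rng(42)

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model:
    P = c + (1 - c) / (1 + exp(-1.7 * a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

# Illustrative item bank: 30 items with discrimination a, difficulty b, guessing c.
n_items, n_examinees = 30, 100               # a small-sample condition, as in the study
a = rng.uniform(0.5, 2.0, n_items)
b = rng.normal(0.0, 1.0, n_items)
c = np.full(n_items, 0.2)

theta = rng.normal(0.0, 1.0, n_examinees)    # normal ability distribution
prob = p_3pl(theta[:, None], a, b, c)        # examinees x items probability matrix
responses = (rng.uniform(size=prob.shape) < prob).astype(int)
print(responses.shape, responses.mean())     # (100, 30) and overall proportion correct
```

The modified one-parameter model in (2) corresponds to holding the discrimination constant across items and fixing the guessing value at an assumed constant rather than estimating it per item.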
173

Accuracy of parameter estimation on polytomous IRT models

Park, Chung 01 January 1997 (has links)
Procedures based on item response theory (IRT) are widely accepted for solving various measurement problems that cannot be solved using classical test theory (CTT) procedures. The desirable features of dichotomous IRT models over CTT are well known and have been documented by Hambleton, Swaminathan, and Rogers (1991). However, dichotomous IRT models are inappropriate for situations where items need to be scored in more than two categories. For example, most scoring rubrics for performance assessments require examinees' responses to be scored in ordered categories. In addition, polytomous IRT models are useful for assessing an examinee's partial knowledge or levels of mastery. However, the successful application of polytomous IRT models to practical situations depends on the availability of reasonable and well-behaved estimates of the parameters of the models. Therefore, in this study, the behavior of estimators of parameters in polytomous IRT models was examined. In the first study, factors that affected the accuracy, variance, and bias of the marginal maximum likelihood (MML) estimators in the generalized partial credit model (GPCM) were investigated. Overall, the results showed that the MML estimators of the parameters of the GPCM, as obtained through the computer program PARSCALE, performed well under various conditions. However, there was considerable bias in the estimates of the category parameters under all conditions investigated. The average bias did not decrease when sample size and test length increased, and the bias contributed to large RMSE in the estimation of category parameters. Further studies are needed to examine the effect of this bias on the estimation of ability, the development of item banks, and adaptive testing based on polytomous IRT models. In the second study, the effectiveness of Bayesian procedures for estimating parameters in the GPCM was examined. The results showed that Bayesian procedures provided more accurate estimates of parameters with small data sets. Priors on the slope parameters, while having only a modest effect on the accuracy of estimation of slope parameters, had a very positive effect on the accuracy of estimation of the step difficulty parameters.
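For reference, the generalized partial credit model studied above is usually written as follows (a standard statement of the model, not reproduced from the dissertation), where $a_i$ is the slope of item $i$ and $b_{iv}$ are its step (category) parameters:

\[
P_{ik}(\theta) \;=\; \frac{\exp\!\left(\sum_{v=0}^{k} a_i(\theta - b_{iv})\right)}
{\sum_{h=0}^{m_i} \exp\!\left(\sum_{v=0}^{h} a_i(\theta - b_{iv})\right)},
\qquad k = 0, 1, \ldots, m_i,
\]

with the convention that the sum in the numerator is zero when $k = 0$. The category parameters whose bias is discussed above are the $b_{iv}$.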
174

Linking multiple-choice and constructed-response items to a common proficiency scale

Bastari, B 01 January 2000 (has links)
Tests consisting of both multiple-choice and constructed-response items have gained in popularity in recent years. The evidence shows that many assessment programs have administered these two item formats in the same test. However, linking these two item formats on a common scale has not been thoroughly studied. Even though several methods for linking scales under item response theory (IRT) have been developed, many studies have addressed only multiple-choice items, and only a few have addressed constructed-response items. No linking studies have addressed both item formats in the same assessment. The purpose of this study was to investigate the effects of several factors on the accuracy of linking item parameter estimates onto a common scale using the combination of the three-parameter logistic (3-PL) model for multiple-choice items with the graded response model (GRM) for constructed-response items. Within an anchor-test design, the factors considered were: (1) test length, (2) proportion of items of each format in the test, (3) anchor test length, (4) sample size, (5) ability distributions, and (6) method of equating. The data for dichotomous and polytomous responses for unique and anchor items were simulated to vary as a function of these factors. The main findings were as follows: the constructed-response items had a large influence on parameter estimation for both item formats. Generally, the slope parameters were estimated with small bias but large variance. Threshold parameters were also estimated with small bias but large variance for constructed-response items; however, the opposite results were obtained for multiple-choice items. For the guessing parameter estimates, the recovery was relatively good. The coefficients of transformation were also relatively well estimated. Overall, it was found that the following conditions led to more effective results: (1) a long test, (2) a large proportion of multiple-choice items in the test, (3) a long anchor test, (4) a large sample size, (5) no ability differences between the groups used in linking the two tests, and (6) the method of concurrent calibration. At the same time, more research will be necessary to expand the conditions under which linking of item formats to a common scale is evaluated, such as the introduction of multidimensional data.
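One common way to place separately calibrated item parameter estimates on a common scale, the mean/sigma transformation, is sketched below (Python). It is shown only to illustrate what the coefficients of transformation mentioned above are; it is not necessarily the equating method used in the dissertation, and the anchor-item values are hypothetical.

```python
import numpy as np

# Hypothetical difficulty estimates for the same anchor items calibrated
# separately on a base form (X) and a new form (Y); the values are illustrative.
b_anchor_x = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_anchor_y = np.array([-0.9, -0.1, 0.5, 1.1, 1.9])

# Mean/sigma linking: find coefficients A and B such that b_x is approximately
# A * b_y + B. The same transformation then rescales abilities
# (theta_x = A * theta_y + B) and slopes (a_x = a_y / A) onto the base scale.
A = b_anchor_x.std(ddof=1) / b_anchor_y.std(ddof=1)
B = b_anchor_x.mean() - A * b_anchor_y.mean()
print(f"slope A = {A:.3f}, intercept B = {B:.3f}")
```

Concurrent calibration, which the study found most effective, avoids this separate transformation step by estimating all item parameters in a single joint run.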
175

Measurements of student understanding on complex scientific reasoning problems

Izumi, Alisa Sau-Lin 01 January 2004 (has links)
While there has been much discussion of cognitive processes underlying effective scientific teaching, less is known about the nature of responses to assessments targeting processes of scientific reasoning specific to biology content. This study used multiple-choice (m-c) and short-answer essay student responses to evaluate progress in higher-order reasoning skills. In a pilot investigation of student responses on a non-content-based test of scientific thinking, it was found that some students showed a pre-post gain on the m-c test version while showing no gain on a short-answer essay version of the same questions. This result led to a subsequent research project focused on differences between alternate versions of tests of scientific reasoning. Using m-c and written responses from biology tests targeted toward the skills of (1) reasoning with a model and (2) designing controlled experiments, test score frequencies, factor analysis, and regression models were analyzed to explore test format differences. Understanding these format differences is important for developing practical ways to identify student gains in scientific reasoning. The overall results suggested test format differences. Factor analysis revealed three interpretable factors: m-c format, genetics content, and model-based reasoning. Frequency distributions on the m-c and open-explanation portions of the hybrid items revealed that many students answered the m-c portion of an item correctly but gave inadequate explanations. In other instances, students answered the m-c portion incorrectly yet gave a sufficient explanation, or answered the m-c portion correctly while still providing a poor explanation. When the test scores were used as predictors of non-associated student measures (VSAT, MSAT, high school grade point average, or final course grade), they accounted for close to zero percent of the variance. Overall, these results point to the importance of using multiple methods of testing and of further research and development in the area of assessment of scientific reasoning.
176

Content validity of independently constructed curriculum-based examinations

Chakwera, Elias Watson Jani 01 January 2004 (has links)
This study investigated the content validity of two independently constructed tests based on the Malawi School Certificate History syllabus. The key question was: To what extent do independently constructed examinations equivalently sample items from the same content and cognitive domains? This question was meant to examine the assumption that tests based on the same syllabus produce results that can be interpreted in a similar manner in certification or promotion decisions, without regard to the examination the examinees took. In Malawi, such a study was important in providing evidence to justify the use of national examination results in placement and selection decisions. Based on Cronbach's (1971) proposal, two teams of three teachers were drawn from six schools that were purposefully selected to participate in this study. Each team constructed a test using the Malawi School Certificate of Education (MSCE) History syllabus. The two tests were put together in a common mock examination, which was piloted before the final form. Two hundred examinees from the participating schools took the common mock examination. Paired scores from the two tests, along with the same examinees' scores on MSCE History 1A, were used to test the mean difference of the dependent samples and to compare variances. Subject matter experts' ratings were used to evaluate the content and cognitive relevance of the items in the test. The findings indicate that the MSCE syllabus was a well-defined operational universe of admissible observations because the independently constructed tests equivalently tapped the same content. Their mean difference was not statistically different from zero, and the mean of the squared difference scores was less than the sum of the split-half error variances. It was therefore concluded that the two independently constructed tests were statistically equivalent. The two tests were also found to be statistically equivalent to the 2003 MSCE History 1A. However, the presence of stray items indicated syllabus looseness that needed redress to improve content coverage. Inadequacy in the rating of cognitive levels was noted as a problem for further research. The need to improve examinations was advocated in view of their great influence on instruction and assessment decisions and practices.
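A rough sketch of the kind of dependent-samples analysis described above is given below (Python); the paired scores and the split-half reliabilities are assumed for illustration and do not come from the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical paired scores for 200 examinees on the two independently
# constructed tests (the values are simulated for illustration only).
test_a = rng.normal(55, 10, 200)
test_b = test_a + rng.normal(0, 4, 200)

# (1) Test whether the mean difference of the dependent samples is zero.
t_stat, p_value = stats.ttest_rel(test_a, test_b)

# (2) Equivalence check described in the abstract: the mean of the squared
# difference scores should not exceed the sum of the split-half error variances.
# The split-half reliabilities (.90 and .88) are assumed here for illustration.
msd = np.mean((test_a - test_b) ** 2)
err_var_a = test_a.var(ddof=1) * (1 - 0.90)
err_var_b = test_b.var(ddof=1) * (1 - 0.88)
equivalent = msd <= err_var_a + err_var_b

print(f"t = {t_stat:.2f}, p = {p_value:.3f}, MSD = {msd:.1f}, equivalent: {equivalent}")
```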
177

Evaluating the effects of several multi-stage testing design variables on selected psychometric outcomes for certification and licensure assessment

Zenisky, April L 01 January 2004 (has links)
Computer-based testing is becoming popular with credentialing agencies because new test designs are possible, and the evidence is clear that these new designs can increase the reliability and validity of candidate scores and pass/fail decisions. Research on multi-stage testing (MST) to date suggests that the measurement quality of MST results is comparable to that of full-fledged computer-adaptive tests and improved over computerized fixed-form tests. MST's promise lies in this potential for improved measurement with greater control over test-form construction than other adaptive approaches provide. Recommending use of the MST design and advising how best to set up the design, however, are two different things. The purpose of the current simulation study was to advance an established line of research on MST methodology by enhancing understanding of how several important design variables affect outcomes for high-stakes credentialing. Modeling of the item bank, the candidate population, and the statistical characteristics of test items reflected an operational credentialing exam's conditions. The studied variables were module arrangement (4 designs), amount of overall test information (4 levels), distribution of information over stages (2 variations), strategies for between-stage routing (4 levels), and pass rates (3 levels), for 384 conditions in total. Results showed that high levels of decision accuracy (DA) and decision consistency (DC) were consistently observed, even when test information was reduced by as much as 25%. No differences due to the choice of module arrangement were found. With high overall test information, results were optimal when test information was divided equally among stages; with reduced test information, gathering more information at Stage 1 provided the best results. Generalizing simulation study findings is always problematic: in practice, psychometric models never completely explain candidate performance, and with MST there is always the potential psychological impact on candidates if shifts in test difficulty are noticed. At the same time, two findings stand out in this research: (1) with limited amounts of overall test information, it may be best to capitalize on the available information with accurate branching decisions early, and (2) there may be little statistical advantage in raising test information much above 10, as gains in reliability and validity appear minimal.
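For readers unfamiliar with the outcome measures, the sketch below (Python) illustrates how decision accuracy and decision consistency are typically computed from simulated pass/fail classifications; it is a generic illustration, not the MST simulation design used in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(11)

# Illustrative simulees: true abilities and a pass/fail cut score on the theta scale.
n, cut = 5000, 0.0
theta = rng.normal(0.0, 1.0, n)
true_pass = theta >= cut

# Observed scores from two parallel simulated administrations, each with error.
obs_1 = theta + rng.normal(0.0, 0.3, n)
obs_2 = theta + rng.normal(0.0, 0.3, n)
pass_1, pass_2 = obs_1 >= cut, obs_2 >= cut

decision_accuracy = np.mean(pass_1 == true_pass)      # agreement with true status
decision_consistency = np.mean(pass_1 == pass_2)      # agreement across replications
print(f"DA = {decision_accuracy:.3f}, DC = {decision_consistency:.3f}")
```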
178

An item modeling approach to descriptive score reports

Huff, Kristen Leigh 01 January 2003 (has links)
One approach to bridging the gap between cognitively principled assessment, instruction, and learning is to provide the score user with meaningful details about the examinee's test performance. Several researchers have demonstrated the utility of modeling item characteristics, such as difficulty, in light of item features and the cognitive skills required to solve the item, as a way to link assessment and instructional feedback. The next generation of the Test of English as a Foreign Language (TOEFL) will be launched in 2005, with new task types that integrate listening, reading, writing, and speaking, the four modalities of language. Evidence-centered design (ECD) principles are being used to develop tasks for the new TOEFL assessment. ECD provides a framework within which to design tasks, to link information gathered from those tasks back to the target of inference through the statistical model, and to evaluate each facet of the assessment program in terms of its connection to the test purpose. One of the primary goals of the new exam is to provide users with a score report that describes the English language proficiencies of the examinee. The purpose of this study was to develop an item difficulty model as the first step in generating descriptive score reports for the new TOEFL assessment. Task model variables resulting from the ECD process were used as the independent variables, and item difficulty estimates were used as the dependent variable in the item difficulty model. Tree-based regression was used to estimate the nonlinear relationships between the item and stimulus features and item difficulty. The proposed descriptive score reports capitalized on the item features that accounted for the most variance in item difficulty. The validity of the resulting proficiency statements was theoretically supported by the links among the task model variables and student model variables evidenced in the ECD task design shells, and empirically supported by the item difficulty model. Future research should focus on improving the predictors in the item difficulty model, determining the most appropriate proficiency estimate categories, and comparing item difficulty models across major native language groups.
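The sketch below (Python, using scikit-learn) illustrates the general idea of tree-based regression of item difficulty on task features; the feature names and values are hypothetical stand-ins, not the ECD task model variables used in the study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)

# Hypothetical task features for 200 items (names and values are illustrative):
# column 0: stimulus length in words, 1: vocabulary level, 2: inference required (0/1).
X = np.column_stack([
    rng.integers(100, 700, 200),
    rng.integers(1, 6, 200),
    rng.integers(0, 2, 200),
])
# IRT item difficulty estimates serve as the dependent variable; here they are
# simulated as a noisy function of the features purely for illustration.
b = 0.002 * X[:, 0] + 0.3 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.4, 200) - 1.5

# A regression tree captures nonlinear relations and interactions between
# item features and difficulty without assuming a linear functional form.
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=20).fit(X, b)
print("R^2 on the training items:", round(tree.score(X, b), 3))
```

Features that dominate the tree's splits are the natural candidates for the descriptive statements on a score report, since they account for the most variance in difficulty.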
179

Educating for Democratic Citizenship: An Analysis of the Role of Teachers in Implementing Civic Education Policy in Madagascar

Unknown Date (has links)
In democratizing states around the world, civic education programs have long formed a critical component of government and donor strategy to support the development of civil society and strengthen citizens' democratic competencies, encompassing the knowledge, attitudes and skills required for them to become informed and actively engaged participants in the economic and social development of their country. Such programs, however, have had limited success. Despite research that has identified critical components of successful democratic civic education programs, including the use of learner-centered methods and experiential civic learning opportunities rooted in real-world contexts, these programs continue to produce weak results. This study targets an under-examined link in the policy-to-practice chain: the teachers themselves. By applying a qualitative, grounded theory approach to analyze interview and observation data collected from public primary schools, teacher training institutes and other key sites in Madagascar where best practices in civic education have recently been adopted, this research presents original insight into the ways in which teachers conceptualize and execute their role as civic educator in a democratizing state. The impact of training and the diverse obstacles emerging from political and economic underdevelopment are examined and analyzed. Emerging from this analysis, a new approach to conceptualizing civic education programs is proposed in which a direct ('front-door') and an indirect ('back-door') approach to the development of democracy through civic education are assigned equal credence as legitimate, situationally-appropriate alternatives to utilize in the effort to strengthen political institutions, civil society and citizen participation in developing democracies around the world. / A Dissertation submitted to the Department of Educational Leadership and Policy Studies in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester, 2010. / October 27, 2010. / Democracy, Civic Education, Citizenship, Teacher Training, Madagascar, Learner-Centered Pedagogy, Active Methods, Democratization, Sub-Saharan Africa / Includes bibliographical references. / Peter Easton, Professor Directing Dissertation; Jim Cobbe, University Representative; Sande Milton, Committee Member; Jeff Milligan, Committee Member.
180

Weighting procedures for robust ability estimation in item response theory

Skorupski, William P 01 January 2004 (has links)
Methods of ability parameter estimation in educational testing are subject to the biases inherent in various estimation procedures. This is especially true in the case of tests whose properties do not meet the asymptotic assumptions of estimation procedures like Maximum Likelihood Estimation. The item weighting procedures in this study were developed as a means to improve the robustness of such ability estimates. A series of procedures to weight the contribution of items to examinees' scores are described and empirically tested using a simulation study under a variety of reasonable conditions. Item weights are determined to minimize the contribution of some items while simultaneously maximizing the contribution of others. These procedures differentially weight the contribution of items to examinees' scores, by accounting for either (1) the amount of information with respect to trait estimation, or (2) the relative precision of item parameter estimates. Results indicate that weighting by item information produced ability estimates that were moderately less biased at the tails of the ability distribution and had substantially lower standard errors than scores derived from a traditional item response theory framework. Areas for future research using this scoring method are suggested.
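A minimal sketch of the first weighting rationale mentioned above, weighting items by their information at a provisional ability estimate, is given below (Python, using the 2PL model for simplicity); the parameter values are illustrative and this is not the dissertation's exact procedure.

```python
import numpy as np

def p_2pl(theta, a, b):
    """Probability of a correct response under the two-parameter logistic model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: I = a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

# Hypothetical item parameters, response pattern, and provisional ability estimate.
a = np.array([0.8, 1.2, 1.6, 0.5, 2.0])
b = np.array([-1.0, -0.3, 0.2, 0.9, 1.4])
responses = np.array([1, 1, 0, 1, 0])
theta_hat = 0.3

# Items that are more informative at the provisional ability estimate receive
# larger weights, so they contribute more to the weighted score.
weights = item_information(theta_hat, a, b)
weights = weights / weights.sum()
weighted_score = float(np.sum(weights * responses))
print(f"weighted proportion-correct score = {weighted_score:.3f}")
```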
