91

Equating high-stakes educational measurements: A study of design and consequences

Chulu, Bob Wajizigha 01 January 2006
The practice of equating educational and psychological tests to create comparable and interchangeable scores is becoming increasingly appealing to testing and credentialing agencies. However, the Malawi National Examinations Board (MANEB) and many other testing organizations in Africa and Europe do not conduct equating, and the consequences of not equating tests have not been clearly documented. Furthermore, there are no suitable equating designs for some agencies to employ because they administer tests annually to different examinee populations and disclose all items after each administration. Therefore, the purposes of this study were to: (1) determine whether it was necessary to equate MANEB tests; (2) investigate the consequences of not equating educational tests; and (3) explore the possibility of using an external anchor test, administered separately from the target tests, to equate scores. The study used 2003, 2004, and 2005 Primary School Leaving Certificate (PSLCE) Mathematics scores for two randomly equivalent groups of eighth-grade examinees drawn from 12 primary schools in the Zomba district in Malawi. In the first administration, group A took the 2004 test while group B took the 2003 form. In the second administration, both groups took an external anchor test, and five weeks later both took the 2005 test. Data were analyzed using identity and log-linear methods, t-tests, decision consistency analyses, classification consistency analyses, and by computing reduction-in-uncertainty and root mean square difference indices. Both linear and post-smoothed equipercentile methods were used to equate test scores. The study revealed that: (1) score distributions and test difficulties were dissimilar across test forms, signifying that equating is necessary; (2) classifications of students into grade categories across forms were different before equating but similar after equating; and (3) the external anchor test design performed in the same way as the random groups design. The results suggest that MANEB should equate test scores to improve the consistency of decisions and to match score distributions and difficulty levels across forms. Given the current policy of exam disclosure, the use of an external anchor test administered separately from the operational form is recommended for equating scores.
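The linear and equipercentile methods mentioned in this abstract are standard score-equating techniques. As a minimal illustration (a sketch, not the author's actual analysis), the code below implements random-groups linear equating, which maps a score on form X onto the form Y scale by matching the two score distributions' means and standard deviations; the variable names and toy data are hypothetical.

```python
import numpy as np

def linear_equate(x, scores_x, scores_y):
    """Random-groups linear equating: map a form-X score x onto the
    form-Y scale via e(x) = mu_Y + (sd_Y / sd_X) * (x - mu_X)."""
    mu_x, sd_x = np.mean(scores_x), np.std(scores_x, ddof=1)
    mu_y, sd_y = np.mean(scores_y), np.std(scores_y, ddof=1)
    return mu_y + (sd_y / sd_x) * (x - mu_x)

# Hypothetical example: group A took form X, group B took form Y.
rng = np.random.default_rng(0)
form_x = rng.normal(30, 8, 500).round()   # form X is harder (lower mean)
form_y = rng.normal(34, 7, 500).round()
print(linear_equate(30, form_x, form_y))  # form-Y equivalent of an X score of 30
```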
92

Small-sample item parameter estimation in the three-parameter logistic model: Using collateral information

Keller Stowe, Lisa Ann 01 January 2002
The appeal of computer adaptive testing (CAT) is growing in the licensure, credentialing, and educational fields. A major promise of CAT is more efficient measurement of an examinee's ability. However, for CAT to be successful, a large calibrated item bank is essential. Because item selection depends on the proper calibration of items and accurate estimation of the item information functions, obtaining accurate and stable estimates of item parameters is paramount. However, concerns about item exposure and test security require item parameter estimation with much smaller samples than are recommended. Therefore, the development of methods for small-sample estimation is essential. The purpose of this study was to investigate a technique for improving small-sample estimation of item parameters, as well as recovery of item information functions, by using auxiliary information about items in the estimation process. A simulation study was conducted to examine the improvements in both item parameter and item information recovery. Several different conditions were simulated, including sample size, test length, and quality of collateral information. The collateral information was used to set prior distributions on the item parameters. Several prior distributions were placed on both the a- and b-parameters and were compared to each other as well as to the default options in BILOG. The results indicate that with relatively good collateral information, nontrivial gains in both item parameter and item information recovery can be made. The current literature on automatic item generation indicates that such information is available for the prediction of item difficulty. The largest improvements were in the bias of both the a-parameters and the information functions. The implication is that more accurate item selection can occur, leading to more accurate estimates of examinee ability.
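The three-parameter logistic (3PL) model referenced here gives the probability of a correct response as a function of examinee ability and the item's discrimination (a), difficulty (b), and pseudo-guessing (c) parameters. Below is a minimal sketch of the model together with an illustrative log-prior of the kind that could encode collateral information; the specific distributions and hyperparameters are assumptions, not the study's settings.

```python
import numpy as np
from scipy.stats import lognorm, norm

def p_3pl(theta, a, b, c, D=1.7):
    """3PL item characteristic curve: probability of a correct response."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def log_prior(a, b, mu_b=0.0, sd_b=1.0, sd_log_a=0.5):
    """Illustrative log-prior on item parameters: lognormal on a, normal
    on b centered at a predicted difficulty mu_b (e.g., derived from
    collateral item features). Hyperparameters are hypothetical."""
    return lognorm.logpdf(a, s=sd_log_a) + norm.logpdf(b, mu_b, sd_b)

print(p_3pl(theta=0.0, a=1.2, b=-0.5, c=0.2))  # average ability, moderately easy item
```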
93

Evaluating the consistency and accuracy of proficiency classifications using item response theory

Li, Shuhong 01 January 2006
As demanded by the No Child Left Behind (NCLB) legislation, state-mandated testing has increased dramatically, and almost all of these tests report examinees' performance in terms of several ordered proficiency categories. Like licensure exams, these assessments often have high-stakes consequences, such as graduation requirements and school accountability. These tests must therefore be of high quality, and the quality of these instruments can be assessed, in part, through decision accuracy (DA) and decision consistency (DC) indices. With the popularization of IRT, an increasing number of tests are adopting IRT for test development, test score equating, and other data analyses, which naturally calls for approaches to evaluating DA and DC within the framework of IRT. However, it is still common to see all data analyses carried out in IRT while the reported DA and DC indices are derived within the framework of CTT. This situation points to the need to explore ways of quantifying DA and DC under IRT. The current project addressed several possible methods for estimating DA and DC in the framework of IRT, with a specific focus on tests involving both dichotomous and polytomous items. It consisted of several simulation studies in which all of the IRT methods introduced were evaluated with simulated data, and all of the methods were also applied in a real-data context to demonstrate their application in practice. Overall, the results provided evidence supporting the use of the three IRT methods introduced in this project for estimating DA and DC indices in most of the simulated situations; in most cases the three IRT methods produced results that were close to the "true" DA and DC values and consistent with (sometimes even better than) those from the commonly used Livingston and Lewis (L&L) method. The IRT methods appeared more robust to distribution shape than to test length. Implications for educational measurement and some directions for future studies in this area are also discussed.
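One well-known way to estimate decision accuracy under IRT, in the spirit of Rudner's classification accuracy index (not necessarily one of the three methods studied in this dissertation), treats each examinee's ability estimate as normally distributed with the conditional standard error and computes the probability of landing in the observed proficiency category. A minimal sketch with hypothetical cut scores:

```python
import numpy as np
from scipy.stats import norm

def classification_accuracy(theta_hat, se, cuts):
    """For each examinee, the probability that an estimate distributed
    N(theta_hat, se) falls in the same category as theta_hat itself;
    averaging over examinees gives an overall DA estimate."""
    edges = np.concatenate(([-np.inf], cuts, [np.inf]))
    cat = np.searchsorted(cuts, theta_hat)          # observed category
    lo, hi = edges[cat], edges[cat + 1]             # category boundaries
    p_correct = norm.cdf(hi, theta_hat, se) - norm.cdf(lo, theta_hat, se)
    return p_correct.mean()

# Hypothetical data: 1000 abilities, constant SE, two cut scores.
theta = np.random.default_rng(1).normal(0, 1, 1000)
se = np.full_like(theta, 0.3)
print(classification_accuracy(theta, se, cuts=np.array([-0.5, 0.8])))
```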
94

USING RESIDUAL ANALYSES TO ASSESS ITEM RESPONSE MODEL-TEST DATA FIT (MEASUREMENT TESTING)

MURRAY, LINDA NORINE 01 January 1985
Statistical tests are commonly used for studying item response model-test data fit, but many of these tests have well-known problems associated with them. The biggest concern is the confounding of sample size in the interpretation of fit results. In this study, the fit of three item response models was investigated using a different approach: exploratory residual procedures. These residual techniques rely on judgment for interpreting the size and direction of discrepancies between observed and expected examinee performance. The objectives of the study were to investigate whether exploratory procedures involving residuals are valuable for judging model-data fit, and to examine the fit of the one-parameter, two-parameter, and three-parameter logistic models to National Assessment of Educational Progress (NAEP) and Maryland Functional Reading Test (MFRT) data. The objectives were addressed by determining whether judgments about model-data fit change when different variations of residuals are used in the analysis, and by examining fit at the item, ability, and overall test level using plots and simple summary statistics. Reasons for model misfit were sought by analyzing associations between the residuals and important item variables. The results showed that statistics based on average raw and standardized residuals both provided useful fit information, but that the statistics based on standardized residuals presented a more accurate picture of model-data fit and therefore provided the best overall fit information. Other results revealed that with the NAEP and MFRT types of items, failure to consider variations in item discriminating power resulted in the one-parameter model providing substantially poorer fits to the data sets. Also, guessing on difficult NAEP multiple-choice items affected the degree of model-data fit. The main recommendation from the study is that, because residual analyses provide substantial amounts of empirical evidence about fit, practitioners should consider these procedures as one of several strategies to employ when addressing the goodness-of-fit question.
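The raw and standardized residuals discussed above compare observed and model-predicted performance. A minimal sketch of both quantities for dichotomous responses, computed for a group of examinees on one item (the toy data are illustrative, not from the study):

```python
import numpy as np

def residuals(u, p):
    """Raw and standardized residuals for 0/1 responses u with
    model-predicted probabilities p; standardizing divides by the
    binomial standard deviation sqrt(p * (1 - p))."""
    raw = u - p
    std = raw / np.sqrt(p * (1.0 - p))
    return raw, std

# Hypothetical: responses to one item from examinees in one ability group.
u = np.array([1, 0, 1, 1, 0, 1])
p = np.array([0.8, 0.6, 0.7, 0.9, 0.5, 0.75])  # model predictions
raw, std = residuals(u, p)
print(raw.mean(), std.mean())  # average residuals for the group
```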
95

INVESTIGATION OF JUDGES' ERRORS IN ANGOFF AND CONTRASTING-GROUPS CUT-OFF SCORE METHODS (STANDARD SETTING, MASTERY TESTING, CRITERION-REFERENCED TESTING)

ARRASMITH, DEAN GORDON 01 January 1986
Methods for specifying cut-off scores for a criterion-referenced test usually rely on judgments about item content and/or examinees. Comparisons of cut-off score methods have found that different methods result in different cut-off scores. This dissertation focuses on understanding why and how cut-off score methods differ. The importance of this understanding is reflected in practitioners' need to choose appropriate cut-off score methods and to understand and control inappropriate factors that may influence the cut-off scores. First, a taxonomy of cut-off score methods was developed, identifying the generic categories of setting cut-off scores. Second, the research focused on three approaches for estimating the errors associated with setting cut-off scores: generalizability theory, item response theory (IRT), and bootstrap estimation. These approaches were applied to the Angoff and Contrasting-groups cut-off score methods. For the Angoff method, the IRT index of consistency and analyses of the differences between judges' ratings and expected test item difficulty provided useful information for reviewing specific test items that judges rated inconsistently. In addition, the generalizability theory and bootstrap estimates were useful as overall estimates of the errors in judges' ratings. For the Contrasting-groups method, the decision accuracy of the classroom cut-off scores was useful for identifying classrooms in which the classification of students may need to be reviewed by teachers. The bootstrap estimate from the pooled sample of students provided a useful overall estimate of the errors in the resulting cut-off score. This investigation can be extended in several ways: there is a need to understand the magnitude of errors in relation to the precision with which judges are able to rate test items or classify examinees; better ways of reporting and dealing with judges' inconsistencies need to be developed; and the analysis of errors needs to be extended to other cut-off score methods. Finally, these procedures can provide the operational criterion against which improvements and comparisons of cut-off score procedures can be evaluated.
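In the Angoff method, each judge estimates, for every item, the probability that a minimally competent examinee would answer it correctly; a judge's implied cut score is the sum of those ratings, and the panel's cut score is typically the mean across judges. A minimal sketch with hypothetical ratings, including a simple spread statistic of the kind that could flag items judges rated inconsistently:

```python
import numpy as np

# Hypothetical ratings: 5 judges x 10 items, each entry the judged
# probability that a minimally competent examinee answers correctly.
rng = np.random.default_rng(2)
ratings = rng.uniform(0.3, 0.9, size=(5, 10))

judge_cuts = ratings.sum(axis=1)      # each judge's implied cut score
cut_score = judge_cuts.mean()         # panel cut score (mean over judges)
item_spread = ratings.std(axis=0)     # high spread flags items on which
                                      # judges disagreed
print(round(cut_score, 2), item_spread.round(2))
```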
96

A theory-driven evaluation of a legal advice and training programme at a women and children's centre in Cape Town

Behardien, Nasreen January 2011
This study was undertaken to articulate and evaluate the programme theory and implementation of the Legal Advice and Training (LAT) Programme, a publicly funded programme established in 2004 at the Saartjie Baartman Centre for Women and Children in Cape Town. The LAT Programme is a behavioural-change programme that aims to increase the accessibility of legal services and justice for female victims of domestic violence.
97

A formative evaluation of the University of Cape Town's emerging student leaders programme

Mukoza, Stella Maris Kyobula January 2010
This report presents the findings of a formative evaluation of the University of Cape Town's Emerging Student Leaders Programme (ESLP). The ESLP is a student leadership development programme aimed at equipping aspiring student leaders with leadership competencies. The goal of the programme is to prepare participants for leadership roles and positions so that they can exercise effective leadership in positions taken up after university.
98

A Formative Evaluation of the Dream Toolkit component of the Be the Dream Programme

Bhebe, Brilliant 21 February 2019
Positive youth development (PYD) programmes are needed in the South African context, where youth struggle with many socio-economic challenges including poverty, unemployment, alcohol and drug abuse, teenage pregnancy, violent behaviour, and high rates of school dropout. These programmes aim to promote personal and interpersonal development outcomes for at-risk youth so that they can lead better, purpose-driven lives. This dissertation presents the findings of a formative evaluation conducted for the Dream Toolkit component of the Be the Dream Programme, a PYD programme implemented by the Dream Factory Foundation (DFF) in Cape Town. Three evaluations were performed: a programme theory evaluation, an implementation evaluation, and a short-term outcome evaluation. A combination of qualitative and quantitative research methods was used to answer the evaluation questions posed. Overall, the findings indicate that: a) the programme theory of the Dream Toolkit is consistent with best-practice programmes, and the causal logic of the programme was deemed plausible; b) programme participants were highly satisfied with the programme services; c) the programme was implemented with limited fidelity; and d) the majority of learners demonstrated relatively high self-esteem and career-decidedness outcome levels. While the evaluation yielded positive results, the evaluator was able to make a number of recommendations and highlight important considerations for DFF to improve the Dream Toolkit. This evaluation contributes to the limited research on implementation and programme-theory-driven evaluations in the PYD context.
99

A formative evaluation of the James House programme for orphans and vulnerable children

Mutenheri, Hellen January 2014
The increasing burden of care and support for children orphaned or made vulnerable by HIV/AIDS remains a critical and challenging issue, particularly in the South African context. A number of community-based interventions have been put in place to provide both material and psychosocial support. This dissertation is a theory-driven process evaluation of a programme offering care and support to orphans and vulnerable children (OVCs). The programme is run by James House, a non-governmental organization whose main objectives are to meet the basic needs of children in its service area, to protect them from abuse and exploitation, and to prevent family breakdowns that would lead to the institutionalisation of children. James House implements a nationally accredited model of care for OVCs called Isibindi. The James House approach involves direct support to OVCs and indirect support through referrals to complementary services. This dissertation presents the results of a formative evaluation of the James House Isibindi programme, offering insight into the programme's implementation and improvement.
100

Breaking cycles of violence, one wave at a time : a formative evaluation of the Waves for Change Surf Therapy programme

Snelling, Matthew January 2016
This dissertation presents a formative evaluation of the Waves for Change Surf Therapy Programme, comprising both a process evaluation and an outcome evaluation. Waves for Change used surfing as a means of engaging children and adolescents thought to be at risk of long-term social exclusion; this engagement was necessary in order to deliver a psychosocial curriculum. Waves for Change aimed to use this curriculum to enhance psychosocial wellbeing and to reduce antisocial behaviour and association with antisocial peers. Five evaluation questions were generated using programme documents and a rapid evidence assessment. These concerned whether the programme was capable of enhancing psychosocial wellbeing and reducing antisocial behaviour and association with antisocial peers; the evaluation also sought to determine whether the programme was correctly targeted and delivered with fidelity. An intention-to-treat analysis was conducted within a randomised controlled trial using 115 primary school students from Masiphumelele, Khayelitsha, and Lavender Hill. Further, 88 interviews were conducted with programme beneficiaries, and 15 coaches underwent performance reviews. The programme was found to be suitably targeted, but delivery was not achieved with fidelity to the programme design. There were a number of reasons for this, including inadequate completion of programme tasks by coaches and inadequate attendance by children and adolescents. As a result, children and adolescents received less than half of the psychosocial curriculum and did not show a change on the outcomes of interest. However, this evaluation suggested that the programme is feasible, pending improvements.
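An intention-to-treat analysis compares groups as randomised, regardless of how much of the programme participants actually received, which preserves the benefits of randomisation when attendance is irregular. A minimal sketch of the idea with hypothetical outcome data (not the study's data or measures):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
# Hypothetical post-programme wellbeing scores, analysed by assigned
# group even though many assigned participants attended irregularly.
treatment = rng.normal(52, 10, 58)
control = rng.normal(50, 10, 57)
t, p = ttest_ind(treatment, control)
print(f"t = {t:.2f}, p = {p:.3f}")  # ITT estimate of the programme effect
```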
