31 |
The Development of a Model to Analyze the Relationship of Selected School Inputs to Specific Output in a Sample of Ohio School Districts: A Systems Approach
Winfield, Eugene W. January 1970 (has links)
No description available.
|
32 |
INTERCULTURAL SENSITIVITY: THEORY DEVELOPMENT, INSTRUMENT CONSTRUCTION AND PRELIMINARY VALIDATION
MYENI, ANNIE DUMISILE 01 January 1983 (has links)
First, a theoretical framework for the understanding of intercultural sensitivity was developed. George Kelly's personal construct theory was applied in the definition and elaboration of this construct; the theory was selected after a review of various approaches to understanding the construct. Based on the developed framework, an instrument was then constructed to measure intercultural sensitivity, or a person's potential to adapt successfully in cross-cultural situations. This instrument, the Survey of Intercultural Constructs (SIC), is intended as a research tool to be used with people undergoing cross-cultural training. It is general rather than culture-specific, and is applicable in a wide variety of cultural situations and with different types of people. The SIC is based on the notion that intercultural behavior can be explained in part by differences in personalities or construction systems. Personal construct theory states that people look at others through constructs they create or choose, and then test against reality. A construct is a way in which at least two things are similar and contrast with a third. To analyze people's cognitive processes, information is needed about the content and structure of their construction systems. The SIC elicits the constructs a person applies to people of the same and of other cultures. A preliminary version of the SIC was developed and tried out on 50 people. The data obtained were used primarily to improve the draft instrument, and a few preliminary validity studies were also conducted with it. The preliminary version of the SIC was reviewed by an expert in the field of tests and measurements. His comments, together with comments obtained from the tryout sample, were used in the development of the second version. A review of the second version by 13 experts in the area of cross-cultural training led to the development of the final version of the instrument. No validity or reliability studies were conducted with the final version; therefore such studies are needed, and recommendations to that effect are made.
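A minimal sketch of this kind of triadic elicitation, assuming hypothetical role figures and a canned response rather than the actual SIC elements: each triad of figures is presented, the respondent states how two are alike and how the third differs, and that similarity/contrast pair is recorded as one bipolar construct.

from itertools import combinations

# Hypothetical role figures; the real SIC uses its own element list.
elements = [
    "a person from my culture I admire",
    "a person from my culture I find difficult",
    "a person from another culture I know well",
    "a person from another culture I have just met",
]

def elicit_constructs(elements, ask):
    """Present each triad and record the bipolar construct it yields.
    `ask` is any callable taking a triad and returning a
    (similarity_pole, contrast_pole) pair, e.g. an interviewer prompt."""
    constructs = []
    for triad in combinations(elements, 3):
        similarity_pole, contrast_pole = ask(triad)
        constructs.append({"triad": triad,
                           "similar": similarity_pole,
                           "contrast": contrast_pole})
    return constructs

# Example with a canned response standing in for a respondent's answer.
demo = elicit_constructs(
    elements,
    ask=lambda triad: ("open to new customs", "keeps to familiar ways"))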
|
33 |
Pretest item calibration within the computerized adaptive testing environment
Slater, Sharon Cadman 01 January 2001 (has links)
An issue of primary concern for computerized adaptive testing (CAT) is that of maintaining viable item pools. There is an increased need for items within the CAT framework, which places greater demand on item calibration procedures. This dissertation addressed the important problem of calibrating pretest items within the framework of CAT. The study examined possible ways to incorporate additional information available in the CAT environment into item parameter estimation, with the intent of improving the accuracy of item parameter estimates. Item parameter estimates were obtained in a number of ways, including: using five different Bayesian priors, four sample sizes, two different fixed abilities for calibration, and two sampling strategies. All variables were compared in a simulation study. Results for sample size were not surprising: as sample size decreased, the error in the estimates increased. Also as expected, fixing true abilities resulted in more accurate item parameter estimates than fixing an estimate of ability for calibration. Bayesian priors affected item parameter estimates differently, depending on the sampling strategy used. In the random pretesting strategy, more general priors produced the best results. For the focused pretesting strategy, the item-specific priors produced the best results if the priors were good, and the worst results if the priors were poor. When comparing results for the random and focused sampling strategies in terms of item difficulty, the random conditions produced slightly more accurate estimates than the focused conditions for the majority of items. However, the focused conditions produced much better estimates of item difficulty for very easy and very difficult items. The random conditions resulted in far more accurate estimates of item discrimination than the focused conditions. In conclusion, the focused samples used in the study appear to have been too focused. Future research could investigate different ways of sampling examinees to ensure that sufficient variability is obtained for better estimation of item discrimination. Ways of translating judgmental information about items into numerical priors for estimation are another area in need of more study. Finally, an interesting and useful extension of this work would be to examine the effect of poor item parameter estimates on ability estimation.
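A minimal sketch of one way such a calibration can be set up, assuming a two-parameter logistic model, fixed abilities, and normal priors (the study's actual models, priors, and estimation software are not specified here): the pretest item's parameters are found by maximizing the log-posterior with examinee abilities held at their CAT values.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def map_calibrate_2pl(theta, y, prior_mu_loga=0.0, prior_sd_loga=0.5,
                      prior_mu_b=0.0, prior_sd_b=2.0):
    """MAP estimate of (a, b) for one pretest item under the 2PL,
    with examinee abilities `theta` held fixed (e.g., at CAT estimates).
    Responses `y` are 0/1.  The priors here are illustrative assumptions."""
    def neg_log_posterior(params):
        log_a, b = params
        a = np.exp(log_a)
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        p = np.clip(p, 1e-9, 1 - 1e-9)
        log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
        log_prior = (norm.logpdf(log_a, prior_mu_loga, prior_sd_loga)
                     + norm.logpdf(b, prior_mu_b, prior_sd_b))
        return -(log_lik + log_prior)
    result = minimize(neg_log_posterior, x0=np.array([0.0, 0.0]),
                      method="L-BFGS-B")
    log_a_hat, b_hat = result.x
    return np.exp(log_a_hat), b_hat

# Simulated example: 500 examinees, true a = 1.2, b = 0.3.
rng = np.random.default_rng(0)
theta = rng.normal(size=500)
p_true = 1 / (1 + np.exp(-1.2 * (theta - 0.3)))
responses = rng.binomial(1, p_true)
a_hat, b_hat = map_calibrate_2pl(theta, responses)

Varying the prior means and standard deviations in this sketch is the analogue of the general versus item-specific priors compared in the study.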
|
34 |
Assessing adaptation equivalence in cross-lingual and cross-cultural assessment using linear structural equations models
Purwono, Urip 01 January 2004 (has links)
Making a test available in more than one language version has become a common practice in the fields of psychology and education. When comparisons of the populations taking the parent and the adapted versions of the test are to be made, the equivalence of the constructs measured by the tests must be established. Structural equation modeling (SEM) offers a unified approach for examining equivalence between the parent- and adapted-language versions of a test by examining the equivalence of the constructs measured by the two versions. While these procedures have the potential for yielding more direct information regarding whether the original and adapted versions of an assessment instrument are equivalent, a study investigating the power and Type I error rate of the procedures in the context of adaptation equivalence is not yet available. The present study is an attempt to fill this void. Three separate simulation studies were conducted to evaluate the effectiveness of the SEM approach for investigating test adaptation equivalence. In the first study, the accuracy of the estimation procedure was investigated. In the second study, the Type I error rate of the procedure in identifying invariance in the parameters across two subgroups was investigated. In the third study, the power of the procedure in identifying differences in mean (kappa) and structural (lambda) parameters across two subgroups was investigated. The results of the first study indicated that the kappa and lambda parameters could be recovered with a sufficient degree of accuracy with sample sizes on the order of 500. The Type I error rates for the kappa and lambda parameters were similar; with a sample size larger than 500, the Type I error rate approached the nominal levels. The power of the procedure in detecting differences increased with sample size and with the magnitude of the difference in the parameters between the subgroups. For the kappa parameters, a sample of size 600 was required to detect a difference of .35 standardized units with a probability of .75. For the lambda parameters, a difference of .2 in factor loading was detectable with a sample size of 300 with a probability of .9.
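In the multiple-group factor-analytic notation the abstract uses, a sketch of the model underlying such invariance tests (standard SEM notation; the study's exact specification may differ) is

    x^{(g)} = \tau^{(g)} + \Lambda^{(g)} \xi^{(g)} + \delta^{(g)},   with   E(\xi^{(g)}) = \kappa^{(g)},

where g indexes the language version of the test. Adaptation equivalence for the structural and mean parameters then amounts to testing \Lambda^{(1)} = \Lambda^{(2)} and \kappa^{(1)} = \kappa^{(2)}, typically with chi-square difference tests between constrained and unconstrained models; the power and Type I error rates reported above concern tests of exactly this kind.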
|
35 |
THE FIT OF EMPIRICAL DATA TO TWO LATENT TRAIT MODELS
HUTTEN, LEAH R 01 January 1981 (has links)
The study explored the fit of empirical data to the Rasch and three-parameter logistic latent trait models, focusing on the relationship between deviations from latent trait model assumptions and fit. The study also investigated estimation precision for small-sample and short-test conditions and evaluated parameter estimation costs for the two latent trait models. Rasch and three-parameter abilities and item parameters were estimated for twenty-five 40-item tests having 1000 examinees. These estimated parameters were substituted for true parameters to make predictions about number-correct score distributions, using a theorem by Lord (1980) relating ability to the conditional distribution of number-correct scores. Predicted score distributions were compared with observed score distributions by Kolmogorov-Smirnov and chi-square measures, and with graphical techniques. The importance of three latent trait model assumptions (unidimensionality, equality of item discrimination indices, and no guessing) was assessed with correlation analyses. Estimation precision for short 20-item tests and for small samples of 250 examinees was evaluated with correlation methods and by assessing average absolute differences between estimates. Simple summary statistics were gathered to evaluate computer cost and time for parameter estimation with each model. The Rasch and the three-parameter models both demonstrated close fit to the majority of data studied. Eighty percent of tests fit both models quite well, and only one predicted test distribution deviated significantly from the observed score distribution. Results obtained with the chi-square measure were less favorable toward the models than the Kolmogorov-Smirnov assessments had been; this outcome was attributed to the apparent sensitivity of the chi-square statistic to lack of normality in score distributions. Graphic results clearly supported the statistical measures of fit, leading to the conclusion that latent trait models adequately describe empirical test data. Overall, the Rasch model fit the data as well as the three-parameter model. The average K-S statistic for the 25 tests was 1.304 for the Rasch model and 1.289 for the three-parameter model. The latter model fit the data better than the Rasch model for 65 percent of the tests, yet the differences in fit between the models were not significant. The chi-square measure and graphical tests supported these results. Lack of unidimensionality was the primary cause of misfit of data to the models. Correlations between fit statistics and indices of unidimensionality were significant at the .05 probability level for both the Rasch and three-parameter models. When item discrimination parameters were unequal, fit of data to both models was impaired, and when guessing was present (though the guessing parameter was not well estimated even with samples of 1000), fit of data to both latent trait models tended to be distorted. Ability estimates from short 20-item tests were quite precise, especially for the Rasch model. Correlations between ability estimates from the 20-item and longer tests were .923 for the Rasch estimates and .866 for the three-parameter estimates. Difficulty estimates made from small 250-examinee samples were also quite precise, but estimates of other item parameters from small samples tended not to be very accurate. Although small-sample item discrimination estimates were reasonable, estimates of the guessing parameter were very poor. The results suggest that at least 1000 examinees are required to obtain precise estimates with the three-parameter model.
The average cost for estimating Rasch item parameters and abilities was only $12.50 for 1000 examinees, in contrast to $35.12 for the three-parameter model, but when item parameters were known in advance and only abilities were estimated, these cost differences disappeared.
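A sketch of the prediction step behind these comparisons, assuming the three-parameter logistic form with arbitrary (not estimated) parameters and a normal quadrature for ability: the conditional number-correct distribution at each ability is built by the standard recursion, averaged over abilities, and its cumulative form compared with the observed score distribution by a Kolmogorov-Smirnov-type statistic. The Rasch case is the same sketch with all discriminations fixed at 1 and guessing at 0.

import numpy as np

def p_3pl(theta, a, b, c):
    """Three-parameter logistic item response function (1.7 scaling)."""
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

def conditional_score_dist(p_items):
    """Lord-Wingersky recursion: distribution of number-correct scores
    at a fixed ability, given per-item correct probabilities."""
    dist = np.array([1.0])
    for p in p_items:
        new = np.zeros(len(dist) + 1)
        new[:-1] += dist * (1 - p)
        new[1:] += dist * p
        dist = new
    return dist

def predicted_score_dist(a, b, c, thetas, weights):
    """Marginal predicted score distribution over quadrature points."""
    marginal = np.zeros(len(a) + 1)
    for theta, w in zip(thetas, weights):
        marginal += w * conditional_score_dist(p_3pl(theta, a, b, c))
    return marginal / marginal.sum()

# Illustrative 40-item test with arbitrary parameters.
rng = np.random.default_rng(1)
a = rng.uniform(0.5, 2.0, 40)
b = rng.normal(0.0, 1.0, 40)
c = np.full(40, 0.2)
thetas = np.linspace(-4, 4, 41)
weights = np.exp(-thetas ** 2 / 2)
weights /= weights.sum()
pred = predicted_score_dist(a, b, c, thetas, weights)
# pred[k] is the model-implied proportion scoring k of 40; its CDF is
# what a Kolmogorov-Smirnov-type statistic compares to the observed CDF.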
|
36 |
An empirical comparison of the Bookmark and modified Angoff standard setting methods and the impact on student classification
Hauger, Jeffrey B 01 January 2007 (has links)
No Child Left Behind has increased the importance of properly classifying students into performance level categories because of the ramifications associated with not making Adequate Yearly Progress (AYP). States have the opportunity to create their own standards and conduct their own standard setting sessions. Two of the more popular methods used are the Angoff method and the Bookmark method. Reckase (2005) simulated both methods and found that the Bookmark method produced negative bias while the Angoff method did not produce any bias. This study simulated the Angoff and Bookmark methods similarly to Reckase's (2005) article and also added a different simulated Bookmark method, intended to simulate the Bookmark method more accurately. The study included six independent variables: standard setting method, cutscores, central tendency, number of panelists, item density, and bookmark placement. The second part of the study applied the results of the simulations to real data to determine the impact on student classification under the different conditions. Overall, the results of the simulation study indicated that the simulated Angoff method was able to recover the parameters extremely well, while the second simulated Bookmark method recovered the item parameters better than the original simulated Bookmark method. However, in certain conditions, the second simulated Bookmark method was able to recover the item parameters as well as the Angoff method. The simulated cutscores were then used to place students into performance level categories, with results examined by students' ethnicity, gender, socioeconomic status, and their interactions. The results indicated that the simulated Angoff method and the second simulated Bookmark method were most similar when the median was used as the central tendency for the Bookmark method and the panelists' error was large. The simulated Angoff method was the most robust method compared with the two simulated Bookmark methods. The implications and suggested future research are discussed.
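A sketch of the Bookmark cut-point computation that such a simulation has to reproduce, assuming the two-parameter logistic model, a response-probability criterion of 0.67, and arbitrary item parameters (the study's own parameters, RP value, and panelist error model are not given here): items are ordered by the ability at which a borderline examinee reaches the RP criterion, and a panelist's cutscore is the location of the item just before the bookmarked page. Panelist cutscores are then aggregated with a mean or median, the central-tendency conditions varied in the study.

import numpy as np

def rp_location_2pl(a, b, rp=0.67):
    """Ability at which P(correct) = rp under the 2PL:
    solve rp = 1 / (1 + exp(-a * (theta - b))) for theta."""
    return b + np.log(rp / (1 - rp)) / a

def bookmark_cutscore(a, b, bookmark_page, rp=0.67):
    """Cutscore implied by one panelist's bookmark in an ordered item map.
    `bookmark_page` is the 1-based page of the first item the borderline
    examinee is NOT expected to answer correctly with probability rp;
    the cut is taken at the location of the preceding item."""
    locations = np.sort(rp_location_2pl(np.asarray(a), np.asarray(b), rp))
    return locations[bookmark_page - 2]

# Illustrative booklet of 30 items with arbitrary parameters.
rng = np.random.default_rng(2)
a = rng.uniform(0.6, 1.8, 30)
b = rng.normal(0.0, 1.0, 30)
theta_cut = bookmark_cutscore(a, b, bookmark_page=12)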
|
37 |
Exploring the impact of teachers' participation in an assessment-standards alignment study
Martone, Andrea 01 January 2007 (has links)
This study explored the impact of teachers' participation in an assessment-standards alignment study as a way to gain a deeper understanding of an assessment, the underlying standards, and how these components relate to the participants' approach to instruction. Alignment research is one means to demonstrate the connection between assessment, standards, and instruction. If these components work together to deliver a consistent message about the topics on which students are taught and assessed, students will have the opportunity to learn and demonstrate their acquired knowledge and skills. Six participants applied Norman Webb's alignment methodology to understand the degree of alignment between an assessment, the Massachusetts Adult Proficiency Test for Math (MAPT for Math), and state standards, the Massachusetts Adult Basic Education Curriculum Framework for Mathematics and Numeracy (Math ABE standards). Through item-objective matches, alignment was examined in terms of categorical concurrence, depth-of-knowledge consistency, range-of-knowledge correspondence, and balance of representation. The study also used observations, discussions, open-response survey questions, and a focus group discussion to understand how the alignment process influenced the participants' view of the assessment, the standards, and their approach to instruction. Results indicated that the MAPT for Math is well aligned to the Math ABE standards on three of the four dimensions. Specific recommendations for improvements to the MAPT for Math and the Math ABE standards are presented. The study also found that the alignment process influenced the participants' view of the standards, the assessment, and their approach to instruction. Additionally, the study highlighted ways to improve the alignment process to make the results more meaningful for teachers and test developers. This study indicated the value of ensuring that an assessment is well aligned to the standards on which it is based. Findings also showed the value added when teachers are involved in an in-depth examination of an assessment and the standards on which that assessment is based. Teachers are the conduit through which the next generation is guided; thus it is critical that teachers understand what they are being asked to teach their students and how that can be assessed on a well-designed assessment.
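Two of the four Webb criteria named above lend themselves to a short computational sketch, assuming hypothetical item-objective matches from a single reviewer (the MAPT items and Math ABE objectives are not reproduced): categorical concurrence asks whether each standard receives a minimum number of matched items, and balance of representation summarizes how evenly the matched items spread across the objectives that were hit. The balance index below uses a commonly cited form of Webb's formula.

from collections import Counter

def categorical_concurrence(item_matches, min_items=6):
    """Categorical concurrence: does each standard receive at least
    `min_items` matched items?  `item_matches` maps item id to a list of
    (standard, objective) matches.  Webb's usual criterion is six items."""
    per_standard = Counter(std for matches in item_matches.values()
                           for std, _ in matches)
    return {std: count >= min_items for std, count in per_standard.items()}

def balance_of_representation(item_matches, standard):
    """Commonly cited form of Webb's balance index for one standard:
    1 - (sum_k |1/O - I_k/H|) / 2, where O is the number of objectives
    hit under the standard, I_k the items hitting objective k, and H
    the total item hits for the standard."""
    hits = Counter(obj for matches in item_matches.values()
                   for std, obj in matches if std == standard)
    O, H = len(hits), sum(hits.values())
    if H == 0:
        return None
    return 1 - sum(abs(1 / O - i_k / H) for i_k in hits.values()) / 2

# Hypothetical matches from one reviewer (item -> matched objectives).
matches = {
    "item01": [("Number Sense", "N1")],
    "item02": [("Number Sense", "N1"), ("Number Sense", "N3")],
    "item03": [("Geometry", "G2")],
    "item04": [("Number Sense", "N2")],
}
print(categorical_concurrence(matches, min_items=2))
print(balance_of_representation(matches, "Number Sense"))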
|
38 |
A comparison of item response theory true score equating and item response theory-based local equating
Keller, Robert R. 01 January 2007 (has links)
The need to compare students across different test administrations, or across different test forms within the same administration, plays a key role in most large-scale testing programs. In order to make such comparisons, these tests must be placed on the same scale. Placing test forms onto the same scale not only allows results from different forms to be compared to each other, but also facilitates placing scores from different forms onto a common reporting scale. The statistical method used to place test scores onto a common metric is called equating. Estimated true equating, one of the conditional equating methods described by van der Linden (2000), has been shown to be a dramatic improvement over classical equipercentile equating under some conditions (van der Linden, 2006). The purpose of the study is to investigate, through simulation, the relative performance of estimated true equating and IRT true score equating under a variety of conditions that are known to impact equating accuracy, namely anchor test length, data misfit, scaling method, and examinee ability distribution. The results are evaluated based on the root mean squared error (RMSE) and bias of the equating functions, as well as decision accuracy when placing examinees into performance categories. A secondary research question, the relative performance of the scaling methods, is also investigated. The results indicate that estimated true equating shows tremendous promise, with dramatically lower bias and RMSE values when compared to IRT true score equating. However, this promise does not bear out when looking at examinee classification. Despite the lack of significant gains in decision accuracy, this new equating method reduces the error attributable to the equating functions themselves and therefore deserves further scrutiny. The results fail to indicate a clear choice of scaling method for use with either equating method. Practitioners still must rely on the growing body of evidence and consider the nature of their own testing programs and the abilities of their examinee population when choosing a scaling method.
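A sketch of the IRT true score equating baseline against which the local method is compared, assuming the two-parameter logistic model and already-linked item parameters (the study's own models, linking procedure, and local-equating details are not shown): a form X number-correct score is mapped to the ability at which the form X test characteristic curve equals that score, and then to the form Y expected score at that ability. Local (estimated true) equating instead conditions the conversion on each examinee's ability estimate; only the traditional baseline is sketched here.

import numpy as np
from scipy.optimize import brentq

def tcc(theta, a, b):
    """Test characteristic curve: expected number-correct score
    at ability `theta` under the 2PL."""
    return np.sum(1.0 / (1.0 + np.exp(-np.asarray(a) * (theta - np.asarray(b)))))

def irt_true_score_equate(score_x, a_x, b_x, a_y, b_y):
    """Map a form X number-correct score to its form Y equivalent:
    invert the form X TCC to find theta, then evaluate the form Y TCC.
    Scores outside the invertible range of the TCC are left to the caller."""
    theta = brentq(lambda t: tcc(t, a_x, b_x) - score_x, -8.0, 8.0)
    return tcc(theta, a_y, b_y)

# Illustrative 30-item forms with arbitrary (already linked) parameters.
rng = np.random.default_rng(3)
a_x, b_x = rng.uniform(0.7, 1.6, 30), rng.normal(0.0, 1.0, 30)
a_y, b_y = rng.uniform(0.7, 1.6, 30), rng.normal(0.1, 1.0, 30)
equated = [irt_true_score_equate(s, a_x, b_x, a_y, b_y) for s in range(5, 26)]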
|
39 |
A Study of the Impact of Funding on Growth and Development of Selected School and Colleges of Allied Health
Dwyer, Kathleen Marie January 1983 (has links)
No description available.
|
40 |
An evaluation of the new twelve-year school plan for South Carolina
Hopson, Raymond W. January 1947 (has links)
No description available.
|