  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

EQUATING AND CALIBRATION TECHNIQUE FOR TEST PROGRAM IN THAILAND

Unknown Date (has links)
Source: Dissertation Abstracts International, Volume: 37-01, Section: A, page: 0245. / Thesis (Ph.D.)--The Florida State University, 1975.
112

THE INVESTIGATION OF SELF-EVALUATION PROCEDURES FOR IDENTIFYING INSTRUCTIONAL NEEDS OF TEACHERS

Unknown Date (has links)
Source: Dissertation Abstracts International, Volume: 37-10, Section: A, page: 6428. / Thesis (Ph.D.)--The Florida State University, 1976.
113

EXPECTED PRODUCTIVITY CURVES FOR INSTRUCTION FOR A SYSTEM OF HUMAN RESOURCES ACCOUNTING IN HIGHER EDUCATION

Unknown Date (has links)
Source: Dissertation Abstracts International, Volume: 38-04, Section: A, page: 2069. / Thesis (Ph.D.)--The Florida State University, 1977.
114

THE EFFECT OF STRUCTURAL CHARACTERISTICS UPON ITEM DIFFICULTIES IN STANDARDIZED READING COMPREHENSION TESTS

Unknown Date (has links)
Source: Dissertation Abstracts International, Volume: 38-04, Section: A, page: 2070. / Thesis (Ph.D.)--The Florida State University, 1977.
115

The stability of item-parameter estimates across time: A comparison of item response models and selected estimation procedures

Unknown Date (has links)
This study examined the stability of item parameter estimates across time using (a) different item response models and (b) different estimation procedures. These stabilities were examined in the context of item banking. According to Item Response Theory, the item parameter estimates should not differ from administration to administration. Any differences in the item parameter estimates may be attributed, among other factors, to changes in the emphasis of school curricula over time, to the use of an inappropriate model, or to the error associated with the procedures used to estimate the item parameters. / The factors and their levels in this investigation were (a) model: one-, two-, and three-parameter models, with a common value for guessing; (b) estimation procedure for item parameters: Joint Maximum Likelihood (JML) using the LOGIST computer program, and Marginal Maximum Likelihood (MML) using the BILOG computer program. / The test used in this study was the SSAT-II Mathematics test. Data were obtained from the March 1985 and March 1986 test administrations. This test was administered to approximately 100,000 10th-grade students in Florida, who had to pass it to graduate. The Mathematics test consisted of 75 items, of which 49 were common to both the 1985 and 1986 forms. The analyses were performed on four representative samples, each consisting of 1,000 first-time takers. / Results showed that, regardless of the model and estimation procedure used, parameter estimate changes between the 1985 and 1986 test administrations were, on average, significantly larger than parameter estimate changes between the two 1985 samples. However, the changes were not spread across most of the items: only two items demonstrated significant change beyond what was expected from sampling fluctuation. / The one-parameter model produced significantly lower mean differences between ICCs than the two- and modified three-parameter models.
This pattern of differences was similar for both the JML and the MML estimation procedures. The MML estimation procedure produced significantly smaller mean differences than the JML estimation procedure. / Source: Dissertation Abstracts International, Volume: 49-06, Section: A, page: 1436. / Major Professor: F. J. King. / Thesis (Ph.D.)--The Florida State University, 1988.
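The models compared above share the logistic item characteristic curve. As a minimal sketch (the standard 3PL functional form with hypothetical item parameters; this is an illustration, not the LOGIST or BILOG code used in the study), the mean absolute gap between an item's ICCs from two administrations is one way to quantify parameter drift:

```python
import math

def icc(theta, a=1.0, b=0.0, c=0.0):
    """3PL item characteristic curve: P(theta) = c + (1-c)/(1+exp(-a(theta-b))).
    Setting c=0 gives the 2PL; additionally fixing a common a gives the 1PL."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def mean_icc_difference(params_t1, params_t2, grid=None):
    """Average absolute gap between two ICCs over an ability grid --
    one summary of how much an item's curve changed between administrations."""
    if grid is None:
        grid = [i / 10.0 for i in range(-30, 31)]  # theta from -3 to +3
    return sum(abs(icc(t, *params_t1) - icc(t, *params_t2))
               for t in grid) / len(grid)

# Hypothetical item whose difficulty drifts from b=0.0 to b=0.3 between years.
drift = mean_icc_difference((1.0, 0.0, 0.2), (1.0, 0.3, 0.2))
```

A stable item would show a near-zero mean ICC difference between the two calibrations.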
116

The differences among reliability estimates of randomly parallel tests and their effects on Tucker non-random group linear equating

Unknown Date (has links)
The purpose of the study was to investigate the differences among reliability estimates of randomly parallel tests and their effects on Tucker non-random group linear equating. / The study was conducted using a sample of 988 tenth graders. One hundred 20-item tests were randomly generated from the students' response files. These tests were called "current" forms. Each of these randomly parallel forms was equated to each of five reference forms. / It was found that differences between reliabilities of current and reference tests had an effect on the accuracy of Tucker equating. The equating error was systematic and predictable. Larger differences in the reliabilities of test forms tended to produce larger errors. / Given an arbitrary unweighted error of .50, or a weighted error of .15, on the standard T-scale, a range of acceptable differences in reliability estimates was proposed for Tucker equating. Differences in the reliability of the current and reference tests of less than .025 produced negligible equating errors. / Source: Dissertation Abstracts International, Volume: 52-03, Section: A, page: 0890. / Major Professor: Jacob G. Beard. / Thesis (Ph.D.)--The Florida State University, 1991.
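Tucker linear equating, the method studied above, can be sketched from the standard synthetic-population formulas for the non-equivalent-groups anchor-test design (a textbook-style illustration with hypothetical weights and data, not the study's actual computations):

```python
def tucker_linear_equating(x1, v1, y2, v2, w1=0.5):
    """Tucker linear equating: group 1 takes form X plus anchor V,
    group 2 takes form Y plus anchor V. Returns a function mapping
    form-X scores onto the form-Y scale."""
    def pcov(a, b):  # population covariance (and variance when a is b)
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

    w2 = 1.0 - w1
    g1 = pcov(x1, v1) / pcov(v1, v1)   # regression slope of X on V, group 1
    g2 = pcov(y2, v2) / pcov(v2, v2)   # regression slope of Y on V, group 2
    dmu = sum(v1) / len(v1) - sum(v2) / len(v2)
    dvar = pcov(v1, v1) - pcov(v2, v2)
    # Synthetic-population means and variances
    mu_x = sum(x1) / len(x1) - w2 * g1 * dmu
    mu_y = sum(y2) / len(y2) + w1 * g2 * dmu
    var_x = pcov(x1, x1) - w2 * g1**2 * dvar + w1 * w2 * g1**2 * dmu**2
    var_y = pcov(y2, y2) + w1 * g2**2 * dvar + w1 * w2 * g2**2 * dmu**2
    slope = (var_y / var_x) ** 0.5
    return lambda x: mu_y + slope * (x - mu_x)
```

When the two groups and forms are statistically identical, the resulting function is the identity, as expected of an equating.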
117

Ridge regression: Application to educational data

Unknown Date (has links)
Ridge regression is a regression technique developed to remedy the problem of multicollinearity in regression analysis. The major problem with multicollinearity is that it causes high variances in the estimates of the regression coefficients. The ridge model introduces some bias into the regression equation in order to reduce the variance of the estimators. The purposes of this study were to demonstrate the application of the ridge regression model to educational data and to compare the characteristics and performance of the ridge method and the least squares method. Four types of ridge regression were compared with the least squares method: ridge trace, generalized, ordinary, and directed ridge. / The sample consisted of 141 public schools in Dade County, Florida. The dependent variables were the students' average scores in mathematical computation and reading comprehension. Six variables representing teacher and student characteristics were employed as predictors. The performance of the ridge and least squares methods was compared in terms of the confidence interval of an individual estimator and the predictive accuracy of the whole model. Since statistical inference for the ridge method has not been completely developed, the bootstrap technique, with a sample size of twenty, was used to calculate the confidence interval of each estimator. / The study resulted in a successful application of ridge regression to school-level data, in which it was found that (1) ridge regression yielded a smaller confidence interval for every estimated regression coefficient and (2) ridge regression produced higher predictive accuracy than ordinary least squares. / Since the results were based on just one particular set of data, it cannot be guaranteed that ridge always outperforms the least squares method. / Source: Dissertation Abstracts International, Volume: 49-03, Section: A, page: 0487. / Major Professor: F. J. King.
/ Thesis (Ph.D.)--The Florida State University, 1988.
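The ridge estimator contrasted with least squares above has a simple closed form, beta = (X'X + lambda*I)^(-1) X'y. A minimal sketch with simulated collinear predictors (hypothetical data, not the Dade County school data) shows the shrinkage effect:

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimator: solves (X'X + lam*I) beta = X'y.
    lam=0 recovers ordinary least squares; lam>0 trades a little bias
    for lower coefficient variance when predictors are collinear."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Two nearly collinear predictors -- the setting where OLS variances blow up.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=50)])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=50)

beta_ols = ridge(X, y, 0.0)    # unstable under collinearity
beta_ridge = ridge(X, y, 1.0)  # shrunken, more stable estimates
```

Increasing `lam` shrinks the coefficient vector toward zero, which is exactly the bias-for-variance trade the abstract describes.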
118

Effects of local dependence in achievement tests on IRT ability estimation

Unknown Date (has links)
The effects of commonly occurring violations of the assumption of test items' local independence were investigated in this study. Item responses to college-level communications and mathematics tests were simulated using a multidimensional item response theory model. These data sets were then assigned different degrees of dependency as defined by Ackerman's model and the effects on the one and three parameter models' ability estimates were found. The results suggest caution in interpretation of the unidimensional ability estimate when extreme dependency is present with heterogeneous subtests. / Source: Dissertation Abstracts International, Volume: 49-06, Section: A, page: 1439. / Major Professor: Jacob Beard. / Thesis (Ph.D.)--The Florida State University, 1988.
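Data like those described above can be generated from a compensatory two-dimensional 2PL. The sketch below uses the generic MIRT form with hypothetical item parameters (it is not Ackerman's exact specification): when abilities are weakly correlated and items load on different dimensions, a unidimensional analysis sees the leftover dimension as local dependence.

```python
import math
import random

def sim_m2pl(n_persons=1000, items=None, rho=0.0, seed=1):
    """Simulate 0/1 responses from a compensatory two-dimensional 2PL:
    P = 1 / (1 + exp(-(a1*t1 + a2*t2 - d))).
    rho is the correlation between the two abilities; a low rho with items
    split across dimensions induces local dependence from a
    unidimensional point of view."""
    rng = random.Random(seed)
    if items is None:
        # (a1, a2, d): first two items load on dim 1, last two on dim 2
        items = [(1.2, 0.0, 0.0), (1.0, 0.0, 0.5),
                 (0.0, 1.2, 0.0), (0.0, 1.0, -0.5)]
    data = []
    for _ in range(n_persons):
        t1 = rng.gauss(0.0, 1.0)
        t2 = rho * t1 + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
        row = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(a1 * t1 + a2 * t2 - d)))
               else 0
               for a1, a2, d in items]
        data.append(row)
    return data
```

Varying `rho` controls the degree of dependency, analogous to the "degrees of dependency" manipulated in the study.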
119

The Impact of Unbalanced Designs on the Performance of Parametric and Nonparametric DIF Procedures: A Comparison of Mantel Haenszel, Logistic Regression, SIBTEST, and IRTLR Procedures

Unknown Date (has links)
The current study examined the impact of unbalanced sample sizes between focal and reference groups on the Type I error rates and DIF detection rates (power) of five DIF procedures (MH, LR, general IRTLR, IRTLR-b, and SIBTEST). Five factors were manipulated. Four governed data generation: sample size, DIF magnitude, group mean ability difference (impact), and the studied item's difficulty. The fifth was the DIF method factor, comprising MH, LR, general IRTLR, IRTLR-b, and SIBTEST. A repeated-measures ANOVA, with the DIF method factor as the within-subjects variable, was performed to compare the performance of the five DIF procedures and to examine their interactions with the other factors. For each data-generation condition, 200 replications were made. Type I error rates for the MH and IRTLR DIF procedures were close to or below the 5% nominal level across sample size levels. On average, the Type I error rates for IRTLR-b and SIBTEST were 5.7% and 6.4%, respectively. In contrast, the LR DIF procedure appeared to have a higher Type I error rate, ranging from 5.3% to 8.1% with an average of 6.9%. For the rejection rate under DIF conditions, i.e., the DIF detection rate, IRTLR-b showed the highest rate, followed by SIBTEST, with averages of 71.8% and 68.4%, respectively. Overall, the impact of unbalanced sample sizes between reference and focal groups on DIF detection showed a similar tendency for all methods: detection rates generally increased as the total sample size increased. In practice, IRTLR-b, which showed the best DIF detection rates while controlling the Type I error rate, should be the choice when model-data fit is reasonable. If non-IRT DIF methods are considered, MH or SIBTEST could be used, depending on which type of error (Type I or II) is of greater concern.
/ A Dissertation submitted to the Department of Educational Psychology and Learning Systems in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester 2017. / November 6, 2017. / Includes bibliographical references. / Insu Paek, Professor Directing Dissertation; Fred Huffer, University Representative; Betsy Jane Becker, Committee Member; Yanyun Yang, Committee Member.
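Of the procedures compared above, Mantel-Haenszel is the most compact to illustrate. A sketch of the standard MH common odds ratio and its ETS delta transformation follows (the score-level tables here are contrived, not from the simulation):

```python
import math

def mh_delta(tables):
    """Mantel-Haenszel common odds ratio and ETS delta scale for DIF.
    Each table is (A, B, C, D) at one matched total-score level k:
    A = reference correct, B = reference incorrect,
    C = focal correct,     D = focal incorrect.
    alpha_MH = sum(A*D/T) / sum(B*C/T);  delta_MH = -2.35 * ln(alpha_MH).
    alpha > 1 (delta < 0) indicates the item favors the reference group."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)

# No DIF: odds of success are equal for both groups at every score level.
alpha_null, delta_null = mh_delta([(20, 10, 20, 10), (30, 10, 30, 10)])
# DIF favoring the reference group at a single score level.
alpha_dif, delta_dif = mh_delta([(30, 10, 20, 20)])
```

On the ETS delta scale, values near 0 are negligible DIF; larger absolute values flag items for review.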
120

The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring

Unknown Date (has links)
Since researchers began investigating automatic scoring systems for writing assessments, they have studied the relationships between human and machine scoring and have proposed evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used to assess the relatedness of human and automated essay scoring, and to examine the impact of rater variability on inter-rater agreement. To implement the investigations, my study consists of two parts: an empirical study and a simulation study. Based on the results from the empirical study, the overall effects for inter-rater agreement were .63 and .99 for exact and adjacent proportions of agreement, .48 for kappas, and between .75 and .78 for correlations. Additionally, significant differences existed between 6-point scales and the other scales (i.e., 3-, 4-, and 5-point scales) for correlations, kappas, and proportions of agreement. Moreover, based on the results for the simulated data, the highest agreements and lowest discrepancies were achieved in the matched rater distribution pairs. Specifically, the means of the exact and adjacent proportions of agreement, kappa and weighted kappa values, and correlations were .58, .95, .42, .78, and .78, respectively, while the average standardized mean difference was .0005 in the matched rater distribution pairs. Acceptable values for inter-rater agreement as evaluation criteria for automated essay scoring, the impact of rater variability on inter-rater agreement, and relationships among the inter-rater agreement indices are discussed. / A Dissertation submitted to the Department of Educational Psychology and Learning Systems in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Fall Semester 2017. / November 10, 2017. / Automated Essay Scoring, Inter-Rater Agreement, Meta-Analysis, Rater Variability / Includes bibliographical references.
/ Betsy Jane Becker, Professor Directing Dissertation; Fred Huffer, University Representative; Insu Paek, Committee Member; Qian Zhang, Committee Member.
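The agreement indices compared in the study can be illustrated with a minimal sketch of Cohen's kappa, with optional quadratic weights (hypothetical ratings for a human rater and a scoring engine; not the study's data or code):

```python
def cohens_kappa(r1, r2, categories, weighted=False):
    """Cohen's kappa (or quadratic-weighted kappa) for two raters,
    e.g., a human rater and an automated essay-scoring engine.
    kappa = (p_o - p_e) / (1 - p_e), with quadratic agreement weights
    w_ij = 1 - ((i - j) / (k - 1))**2 when weighted=True."""
    k, n = len(categories), len(r1)
    idx = {c: i for i, c in enumerate(categories)}

    def w(i, j):
        return 1.0 - ((i - j) / (k - 1)) ** 2 if weighted else float(i == j)

    # Observed (weighted) agreement between the two score vectors
    p_o = sum(w(idx[a], idx[b]) for a, b in zip(r1, r2)) / n
    # Chance agreement expected from each rater's marginal distribution
    m1 = [sum(1 for a in r1 if idx[a] == i) / n for i in range(k)]
    m2 = [sum(1 for b in r2 if idx[b] == i) / n for i in range(k)]
    p_e = sum(m1[i] * m2[j] * w(i, j) for i in range(k) for j in range(k))
    return (p_o - p_e) / (1.0 - p_e)
```

Exact agreement is simply `p_o` with unweighted matching; kappa corrects that proportion for chance, which is why its benchmark values in the abstract sit below the raw agreement proportions.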
