Global ETD Search

11	A comparison of the precision of scores from fixed-form mastery tests constructed using item response theory optimal item selection and conventional item selection strategies Unknown Date The study investigated the relative precision of ability estimates and mastery classifications resulting from fixed-form mastery achievement tests constructed using item response theory (IRT) optimal item selection and conventional item selection strategies. Two optimal item selection strategies (optimal and content optimal) and two conventional strategies (content representative and random) were used. The four item selection strategies were applied to four simulated data sets that differed in underlying dimensionality, in order to investigate the effects of violations of the IRT unidimensionality assumption on ability estimation. Each data set represented simulated student responses to item banks of 200 items, with 10 items measuring each of 20 objectives. / The results of the study indicated that the optimal and content optimal item selection strategies provided higher levels of measurement precision at the mastery criterion than the conventional strategies, but the differences diminished sharply as the data departed further from unidimensionality. Mastery classification error was lower for the optimal strategies when the data were unidimensional or generally unidimensional. The optimal and content optimal strategies demonstrated comparable measurement and classification precision. The results implied a small practical effect from applying IRT optimal item selection to generally unidimensional data. / Source: Dissertation Abstracts International, Volume: 54-02, Section: A, page: 0494. / Major Professor: Jacob G. Beard. / Thesis (Ph.D.)--The Florida State University, 1993. Education, Tests and Measurements
12	A COMPARISON OF EQUATING ERROR IN LINEAR AND RASCH MODEL TEST EQUATING METHODS Unknown Date This study was designed to measure the relative effectiveness of two models for equating test difficulty: the Linear Model (Design IV) and the Rasch Model. The comparison was made in the context of an increased mean and a decreased standard deviation on the second administration of a test. The Linear Model was expected to be less effective than the Rasch Model because it includes the mean and standard deviation in its formulation, whereas the Rasch Model does not. The relative effectiveness of the Rasch Model was also expected to increase as the difference between test means increased. / Two types of anchor tests were evaluated: one consisted of items of moderate difficulty, the other of items with extreme difficulties. The anchor test with extreme difficulties was viewed as a 'worst case' specification, so the moderate set was expected to perform better than the extreme set. / The results indicated that when the entire score scale is of concern, the Rasch Model is superior to the Linear Model with both the moderate and the extreme anchor tests. In the region of the score scale typically used for cut-off scores on minimal competency tests, the Rasch Model is superior in producing small levels of bias, but the Linear Model produces lower levels of error. The equating discrepancies produced by the Rasch Model average near zero but fluctuate relatively widely around that point. Linear Model equating discrepancies show a negative bias but fluctuate less than those of the Rasch Model. The Linear Model was more affected by changes in mean and standard deviation than was the Rasch Model. In the region of the score scale typically used for cut-off scores, the difference in results between the two anchor tests is small enough that latitude in selecting common items is warranted. / Source: Dissertation Abstracts International, Volume: 45-09, Section: A, page: 2847. / Thesis (Ph.D.)--The Florida State University, 1984. Education, Tests and Measurements
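The expectation that the Linear Model would be more sensitive to shifts in mean and standard deviation follows from how the two methods are formulated. The following is a minimal sketch that assumes simple linear equating from known form moments and Rasch true-score equating after a mean-shift of common-item difficulties. These are not necessarily the exact procedures used in the study, and the estimation of synthetic moments from the anchor test in Design IV is omitted. All function names and numbers are hypothetical.

```python
import math
from statistics import mean


def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Linear equating: a form-X score maps to the form-Y score with the same z-score,
    so the result depends directly on both forms' means and standard deviations."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)


def rasch_shift(anchor_b_x, anchor_b_y):
    """Mean-shift linking constant that places form-X difficulties on the form-Y scale,
    computed from the common (anchor) items calibrated separately on each form."""
    return mean(anchor_b_y) - mean(anchor_b_x)


def rasch_expected_score(theta, difficulties):
    """Rasch test characteristic curve: expected number-correct score at theta."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)


def theta_for_score(score, difficulties, lo=-10.0, hi=10.0):
    """Invert the test characteristic curve by bisection; needs 0 < score < len(difficulties)."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if rasch_expected_score(mid, difficulties) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def rasch_true_score_equate(x, b_x, b_y, anchor_b_x, anchor_b_y):
    """Find the ability whose expected form-X score is x, then report the expected
    form-Y score at that ability. Score means and SDs never enter."""
    shift = rasch_shift(anchor_b_x, anchor_b_y)
    b_x_on_y = [b + shift for b in b_x]
    theta = theta_for_score(x, b_x_on_y)
    return rasch_expected_score(theta, b_y)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(linear_equate(35, mean_x=30, sd_x=5, mean_y=32, sd_y=4))
    b_x = [-1.0, 0.0, 1.0]
    b_y = [-0.5, 0.5, 1.5]  # a harder hypothetical form Y
    print(rasch_true_score_equate(1.5, b_x, b_y, anchor_b_x=[0.0], anchor_b_y=[0.0]))
```

Changing the score distribution's mean or standard deviation changes `linear_equate` directly. `rasch_true_score_equate` depends only on the item difficulties and the anchor-based shift.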
13	THE EFFECTIVENESS OF COMPUTERIZED COACHING FOR SCHOLASTIC APTITUDE TEST IN INDIVIDUAL AND GROUP MODES (CAI) Unknown Date This study investigated (1) the effectiveness of computer coaching in improving SAT scores and (2) whether the effectiveness of computer coaching differs when it is implemented individually or in a small-group (3 to 4) process that allowed peer interaction. A posttest-only control group design was used. The differences in posttest scores between the two treatment groups and the uncoached control group were used to determine the effects of coaching. The 93 subjects were 9th-, 10th-, and 11th-grade geometry students at a high school in Santa Rosa County, Florida. Significant differences in means were found for the following: (1) mathematics computer coaching to improve SAT scores based on a strategy of individual computer usage; (2) mathematics computer coaching to improve SAT scores based on a strategy of small-group (3 to 4) usage that allowed peer interaction within the group; and (3) verbal computer coaching to improve SAT scores based on a strategy of small-group (3 to 4) usage that allowed peer interaction within the group. / Source: Dissertation Abstracts International, Volume: 45-09, Section: A, page: 2847. / Thesis (Ph.D.)--The Florida State University, 1984. Education, Tests and Measurements
14	THE EFFECTS OF CLASS SIZE ON STUDENT ACHIEVEMENT IN HIGH SCHOOL SCIENCE Unknown Date This study describes the effects on student and class achievement of the interactions between class size and student ability, class ability, teacher qualifications, instructional method, and heterogeneity of student ability. Data on 1,022 high school students enrolled in 50 life science and physical science classes were obtained from the 1978-1979 nationwide field study of the Individualized Science Instructional System (ISIS). A multilevel analysis approach was used to analyze the interaction effects on student-level achievement, and a traditional multiple linear regression approach was used to analyze the interaction effects on class-level achievement. Results of the student- and class-level analyses indicated that class size did interact with class ability and teacher qualifications. A weak interaction between class size and heterogeneity of student ability was also found. No interactions between class size and student ability or instructional method were found. The author concludes that class size should be studied in a multivariate context, with particular attention to interactions between class size and other student- and class-level variables. / Source: Dissertation Abstracts International, Volume: 44-06, Section: A, page: 1764. / Thesis (Ph.D.)--The Florida State University, 1983. Education, Tests and Measurements
15	MEASUREMENT OF THE CONSTRUCT OF READING COMPREHENSION: IMPLICATIONS FOR TESTING IN ENGLISH-AS-A-SECOND-LANGUAGE Unknown Date How reading comprehension relates to the different cultural or language groups who speak English is an interesting topic that has not been fully explored. The purpose of this study was to provide psycholinguistic and statistical evidence on the reading abilities in English of native speakers compared with nonnative speakers of English. Reading skills believed to underlie reading comprehension were hypothesized to be important elements of a person's ability to perform a reading comprehension task (as measured by the cloze test). The hypothesized crucial reading skills were word meaning, decoding, anaphoric reference, and sentence syntax. / The general procedure of the study was to (a) develop a construct (model) of reading comprehension from statistical analyses of data obtained by administering tests of reading skills to native speakers of English and (b) assess the validity of that construct for nonnative students by comparing the reading performance of nonnative speakers with that of native speakers. The comparison employed path analysis, Hotelling's T-square, and discriminant analysis. / The data for native speakers came from a study by Roblyer (1978); the subjects were 119 ninth-grade students from two Leon County high schools. The additional data for this study were gathered from 100 ninth- and tenth-grade students at Luchetti High School in San Juan, Puerto Rico. The test scores of these native and nonnative speakers of English were compared. Path analysis indicated that for native speakers, decoding, anaphoric reference, and sentence syntax had significant direct effects on reading comprehension (cloze test). For nonnative speakers, however, decoding and sentence syntax directly influenced reading comprehension, but anaphoric reference had hardly any effect. This difference was attributed to psycholinguistic factors. Hotelling's T-square revealed a significant difference between the mean vectors of the reading tests for the two groups, and discriminant analysis revealed that the differences (discrimination) between the two groups could be attributed to differences in scores on tests of word meaning and decoding. / Source: Dissertation Abstracts International, Volume: 44-06, Section: A, page: 1765. / Thesis (Ph.D.)--The Florida State University, 1983. Education, Tests and Measurements
16	THE DEVELOPMENT OF QUESTIONS FOR EVALUATION RESEARCH IN SELECTED STATE AGENCIES Unknown Date A qualitative study of the nature, development, and utility of questions for evaluation research was conducted within several State of Florida agencies. Most of the work took place within the Office of Evaluation and Management Review, Department of Health and Rehabilitative Services (DHRS), and the Planning and Evaluation Section, Office of Planning and Budgeting, Executive Office of the Governor (EOG-OPB). / Research activities were organized around three general issues. The first was what types of questions are being asked by individual agencies. To answer this question, a classification scheme was developed and applied to 114 questions drawn from 44 studies conducted by the agencies participating in the study. Each question was classified as (1) description of program, (2) description of clients, (3) impact, (4) research, (5) policy issues, or (6) cost. This analysis showed that about 75% of the questions fell into one of the first three categories. / The second activity was a qualitative investigation into how questions are actually developed within evaluation units. It involved a number of interviews with individuals who participate in the question development process. In addition to the interviews, a survey was used to obtain information about the relative input of participants in the question development process and about the evaluators and their backgrounds. This investigation identified no single developmental path for evaluation questions; rather, a variety of formats were used within each office. / The final activity concerned the utility of evaluation questions. First, principal users of evaluation reports were interviewed to assess their views of studies prepared by the offices participating in this study. Then, several workshops were conducted with evaluation personnel to expose them to a formal question identification procedure (The Evaluation Framework) designed for use in state agencies. Results of these two activities showed (1) that users were generally satisfied with evaluation studies but not fully in agreement with evaluators about what types of questions should be included, and (2) that workshop participants were generally in favor of the methods and approach to evaluation presented in the Evaluation Framework but concerned about its lack of flexibility. / Source: Dissertation Abstracts International, Volume: 44-09, Section: A, page: 2743. / Thesis (Ph.D.)--The Florida State University, 1983. Education, Tests and Measurements
17	UTILIZATION OF EVALUATION INFORMATION: A CASE STUDY APPROACH INVESTIGATING FACTORS RELATED TO EVALUATION UTILIZATION IN A LARGE STATE AGENCY Unknown Date This investigation measured evaluation utilization in a large state agency and used a case study approach to investigate conditions related to the utilization of evaluation reports. One significant contribution of this study was the development of a strategy to measure the influence of evaluation information on decisions and the implementation status of recommendations. The measurement strategy produced scale values for reports, enabling evaluation reports to be rated comparatively in terms of utilization. The ratings identified high-use and low-use reports. / Contrasting high-use and low-use reports provided a basis for assessing the potency of various utilization predictors reported in the literature. The results indicated that relevance to decision-making is a major factor influencing utilization in this context. Other variables clearly supported by this study included political and organizational circumstances, a focus on manipulable variables, and user characteristics. Some evidence also supported user involvement in study formulation, credibility of information, evaluator credibility in terms of program knowledge, and quality as important variables. / The findings also indicated that the content of evaluation information is an important factor to consider in investigations of utilization. Recommendations in high-use studies were less variable in content than those in low-use studies, and they tended to focus on program eligibility issues, service improvements, or improvements in management. Surprisingly, the findings suggest that policy-oriented recommendations and recommendations requiring interprogram or interagency action, even though harder to implement, had more influence on decisions to implement the recommendations than the less challenging recommendations requiring action only by program managers. / The validity of the findings is enhanced by the procedures used to ensure reliability of the data and by the researcher's prior experience with the agency studied. The applicability of these findings to a general theory of utilization, however, is limited by the restricted setting of this investigation. Further research is recommended to extend the investigation of utilization, through contrasts of high- and low-use studies, to organizations of other sizes and types. / Source: Dissertation Abstracts International, Volume: 47-05, Section: A, page: 1704. / Thesis (Ph.D.)--The Florida State University, 1986. Education, Tests and Measurements
18	THE EFFECT OF ITEM DIFFICULTY DISTRIBUTION SHAPE ON THE PRECISION OF MEASUREMENT AT A PASSING SCORE (RASCH MODEL, TARGETING, ITEM RESPONSE THEORY, CLASSIFICATION ERRORS, MASTERY TESTING) Unknown Date This study sought to determine the loss in measurement precision, at a passing ability and at other abilities of interest, that results from using nonoptimal but reasonable and relevant item difficulty distribution shapes. / Five alternatives to the optimal peaked distribution shape were compared on the basis of their precision relative to the optimal peaked distribution. The distribution shapes were represented by ten tests built in two ways. Five tests, referred to here as real tests, were constructed using item difficulties from an existing minimum competency mastery test. The other five tests were generated from a simulated, infinitely large item bank. / The relative precision curves produced by the alternative item combinations were compared to determine which distribution shape yielded the greatest precision in the region of the passing ability. As an empirical check, actual person-item responses were used to estimate abilities and mastery levels on each of the five real tests. Mastery classifications from the original long test were used to identify the misclassifications made by each of the real tests. / The distributions centered on the passing score yielded similar error rates but differed in the pattern of classification errors made. The numbers of false passes and false fails were related to whether the test's area of maximum precision was above or below the passing ability. This implies that a test builder may, to some extent, control the types of classification errors made as well as their number. / Source: Dissertation Abstracts International, Volume: 47-01, Section: A, page: 0159. / Thesis (Ph.D.)--The Florida State University, 1985. Education, Tests and Measurements
19	THE EFFECTS OF VARIABLE ENTRY ON BIAS AND INFORMATION OF THE BAYESIAN ADAPTIVE TESTING PROCEDURE Unknown Date This study investigated the effects of fixed and variable entry procedures on the bias and information of a Bayesian adaptive test. Neither the fixed nor the variable entry procedure produced biased ability estimates on average. Both procedures did, however, produce biased ability estimates at the extremes of the ability distribution. Both procedures produced peaked and asymmetric information curves rather than the ideal flat curves. Relative efficiency curves indicated that neither procedure was more efficient than the other at any point along the ability continuum. The two procedures chose different item subsets for administration. In almost half the cases, the variable entry procedure required more items to reach termination. / Source: Dissertation Abstracts International, Volume: 47-08, Section: A, page: 3013. / Thesis (Ph.D.)--The Florida State University, 1986. Education, Tests and Measurements
20	THE EFFECT OF SAMPLE SIZE ON ERROR PRODUCED BY TUCKER AND RASCH EQUATING METHODS UNDER COMMON ITEMS NONRANDOM GROUPS DESIGN Unknown Date The purpose of the present study was twofold: (1) to determine the relationship between sample size and the equating error produced by the Tucker and Rasch methods, and (2) to compare the efficiency of the two methods with small samples. Equating error was examined at selected points on the raw score scale corresponding to the 20th, 40th, 60th, and 80th percentiles, and as an average over all examinees and all score points, for five sample sizes: 25, 50, 75, 100, and 500. / The results indicated that the relationship between equating error and sample size was approximately linear and negative. With small samples, the Rasch method generally produced slightly more error and bias than the Tucker method. For the data used in the study, the expected equating error for the Rasch method decreased at higher selected scores, whereas for the Tucker method it increased as the selected scores deviated from the average score. A minimum number of examinees for equating with each of the two methods was suggested, along with directions for further investigation. / Source: Dissertation Abstracts International, Volume: 47-12, Section: A, page: 4369. / Thesis (Ph.D.)--The Florida State University, 1986. Education, Tests and Measurements
