291

Item parameter drift as an indication of differential opportunity to learn: An exploration of item flagging methods & accurate classification of examinees

Sukin, Tia M 01 January 2010 (has links)
The presence of outlying anchor items is an issue faced by many testing agencies. The decision to retain or remove an item is a difficult one, especially when item removal makes the content representation of the anchor set questionable. Additionally, the reason for the aberrance is not always clear; if an item's performance has changed because of improvements in instruction, then removing the anchor item may not be appropriate and might produce misleading conclusions about examinee proficiency. This study was conducted in two parts, a simulation and an empirical data analysis, both of which investigated the effect on examinee classification of the decision to remove or retain aberrant anchor items. Three detection methods were explored: (1) the delta plot, (2) IRT b-parameter plots, and (3) the RPU method. In the simulation study, the degree of aberrance and the examinee ability distribution were manipulated, and five aberrant-item schemes were employed. In the empirical analysis, archived statewide science achievement data suspected of differential opportunity to learn between administrations were re-analyzed using the various item parameter drift detection methods. The results of both the simulation and the empirical study support eliminating flagged items from the linking set when a matrix-sampling design with a large anchor is used. Although neither the delta plot nor the IRT b-parameter plot method produced results that overwhelmingly support its use, it is recommended that both be employed in practice until further research is conducted on alternatives such as the RPU method: classification accuracy increases when flagged items are removed, and growth is most often not misrepresented by doing so.
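The flagging step behind a b-parameter comparison can be sketched in a few lines. The rule below (standardize each item's difficulty change against the anchor set's mean shift and flag items beyond a 2-SD cutoff) is an illustrative assumption, not the dissertation's exact procedure:

```python
import math

def flag_drifting_anchors(b_old, b_new, threshold=2.0):
    """Flag anchor items whose IRT difficulty (b) estimates drift between
    two administrations. Each item's difficulty change is standardized
    against the anchor set's mean shift; items beyond `threshold` SD units
    are flagged as potential drift. Illustrative only: b-parameter plot
    methods proper work from a scatter plot of linked estimates."""
    diffs = [new - old for old, new in zip(b_old, b_new)]
    mean = sum(diffs) / len(diffs)
    var = sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1)
    sd = math.sqrt(var)
    return [i for i, d in enumerate(diffs)
            if sd > 0 and abs(d - mean) / sd > threshold]
```

With made-up difficulties where the fourth item jumps between administrations, only that item is flagged; the remaining items share a common small shift that the standardization absorbs.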
292

Examination of the application of item response theory to the Angoff standard setting procedure

Clauser, Jerome Cody 01 January 2013 (has links)
Establishing valid and reliable passing scores is a vital activity for any examination used to make classification decisions. Although there are many approaches to setting passing scores, this thesis focuses specifically on the Angoff standard setting method. The Angoff method is a test-centric, classical test theory based approach to estimating performance standards: each judge estimates the proportion of minimally competent examinees who will answer each item correctly, and these values are summed across items and averaged across judges to arrive at a recommended passing score. Unfortunately, research has shown that the Angoff method has a number of limitations that can undermine both the validity and the reliability of the resulting standard, many of which can be linked to its grounding in classical test theory. The purpose of this study is to determine whether these limitations could be mitigated by a transition to an item response theory (IRT) framework. Item response theory is a modern measurement model that relates examinees' latent ability to their observed test performance; theoretically, an IRT-based Angoff method could yield more accurate, stable, and efficient passing scores. The methodology was divided into three studies designed to assess the potential advantages of an IRT-based Angoff method. Study one examined the effect of allowing judges to skip unfamiliar items during the rating process, to detect whether passing scores are artificially biased by deficits in the content experts' item-level content knowledge. Study two explored the potential benefit of setting passing scores on an adaptively selected subset of test items, attempting to leverage IRT's score invariance property to estimate passing scores more efficiently.
Finally, study three compared IRT-based standards to traditional Angoff standards in a simulation study, to determine whether passing scores set using the IRT Angoff method had greater stability and accuracy than those set using the common true-score Angoff method. Together these three studies examined the potential advantages of an IRT-based approach to setting passing scores. The results indicate that the IRT Angoff method does not produce more reliable passing scores than the common Angoff method. The transition to the IRT-based approach does, however, effectively ameliorate two sources of systematic error in the common Angoff method: the first arises from requiring all judges to rate all items, and the second is introduced in the conversion from test scores to scaled-score passing scores. By eliminating these sources of error, the IRT-based method allows accurate and unbiased estimation of the judges' true opinion of the ability of the minimally capable examinee. Although not all of the theoretical benefits of the IRT Angoff method could be demonstrated empirically, the results of this thesis are encouraging: the method eliminated two sources of systematic error, yielding more accurate passing scores. The thesis also provides a foundation for studies that could aid the selection, training, and evaluation of content experts. Overall, the findings suggest that applying IRT to the Angoff standard setting method has the potential to offer significantly more valid passing scores.
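The two cut-score computations contrasted in this abstract can be sketched briefly. The classical Angoff calculation follows the description above; the Rasch logit inversion used for the IRT-based variant is a simplifying assumption for illustration, not necessarily the model the thesis employed:

```python
import math

def angoff_cut_score(ratings):
    """Classical (true-score) Angoff cut score. ratings[j][i] is judge j's
    estimated probability that a minimally competent examinee answers item
    i correctly; item ratings are summed within each judge and then
    averaged across judges, giving a raw-score standard."""
    judge_sums = [sum(judge) for judge in ratings]
    return sum(judge_sums) / len(judge_sums)

def irt_angoff_theta(ratings, b):
    """Sketch of an IRT-based Angoff: invert each rating through a Rasch
    item (theta = b + ln(p / (1 - p))) and average the implied thetas,
    placing the cut score on the latent ability scale. The Rasch mapping
    is an illustrative assumption."""
    thetas = [bi + math.log(p / (1.0 - p))
              for judge in ratings
              for p, bi in zip(judge, b)]
    return sum(thetas) / len(thetas)
```

A rating of 0.5 inverts to a theta equal to the item's difficulty, which is why the IRT version is insensitive to which items a judge happens to rate: each rating contributes an estimate on the same latent scale.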
293

Application of item response theory models to the algorithmic detection of shift errors on paper and pencil tests

Cook, Robert Joseph 01 January 2013 (has links)
On paper-and-pencil multiple-choice tests, the potential for examinees to mark their answers in incorrect locations presents a serious threat to the validity of test score interpretations. When an examinee skips one or more items (i.e., answers out of sequence) but fails to reflect the size of that skip on the answer sheet, the result can be a string of misaligned responses called shift errors. Shift errors can cause correct answers to be marked as incorrect, leading to possible underestimation of an examinee's true ability. Despite the movement toward computerized testing in recent years, paper-and-pencil multiple-choice tests remain pervasive in many high-stakes assessment settings, including K-12 testing (e.g., MCAS) and college entrance exams (e.g., SAT), leaving a continuing need to address issues that arise within this format. Techniques for detecting aberrant response patterns are well established but do little to identify the reasons for the aberrance, limiting options for addressing the misfitting patterns. While some work has been done to detect and address specific forms of aberrant response behavior, little has been done on shift error detection, leaving great room for improvement in addressing this source of aberrance. The ability to accurately detect such construct-irrelevant errors, and either adjust scores to reflect examinee ability more accurately or flag examinees with inaccurate scores for removal from the dataset and retesting, would improve the validity of important decisions based on test scores and could improve model fit by allowing more accurate item parameter and ability estimation. The purpose of this study is to investigate new algorithms for shift error detection that employ IRT models to determine probabilistically whether misfitting patterns are likely to be shift errors.
The study examines a matrix of detection algorithms, probabilistic models, and person parameter methods, testing combinations of these factors for their selectivity (i.e., true positives vs. false positives), sensitivity (i.e., true shift errors detected vs. undetected), and robustness to parameter bias, all under a carefully manipulated, multifaceted simulation environment. The investigation addresses the following questions, applicable across detection methods, bias reduction procedures, shift conditions, and ability levels, but stated generally as: (1) How sensitively and selectively can an IRT-based probabilistic model detect shift errors across the full range of probabilities under specific conditions? (2) How robust is each detection method to the parameter bias introduced by shift errors? (3) How well does the detection method detect shift errors compared to other, more general indices of person fit? (4) What is the impact on bias of making the proposed corrections to detected shift errors? And (5) To what extent does shift error, as detected by the method, occur within an empirical data set? Results show that the proposed methods can detect shift errors at reasonably high rates with only a minimal number of false positives, that detection improves for longer shift errors, and that examinee ability is a major determinant of the effectiveness of the detection techniques. Although some detection ability is lost to person parameter bias, the loss is minimal for all but the shortest shift errors. Application to empirical data also proved effective, though discrepancies in projected total counts suggest that refinements to the technique are needed. A general person-fit statistic proved completely ineffective at detecting examinees with shift errors, underscoring the value of shift-error-specific detection methods.
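The core idea of an IRT-based shift-error test can be sketched as a likelihood comparison: score a suspect run of answers against the key as marked and against the key displaced by one position, and ask which alignment the model finds more plausible at the examinee's ability. The Rasch model, the one-position displacement, and all names below are simplifying assumptions for illustration, not the dissertation's algorithms:

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def loglik(scored, theta, b):
    """Log-likelihood of a scored (True/False) response segment."""
    ll = 0.0
    for x, bi in zip(scored, b):
        p = rasch_p(theta, bi)
        ll += math.log(p if x else 1.0 - p)
    return ll

def shift_evidence(answers, key, b, theta, start):
    """Log-likelihood gain from explaining the responses from `start`
    onward as a one-position shift error (answers intended for items
    start+1, start+2, ...). Positive values favor the shift-error
    hypothesis over the as-marked alignment."""
    seg = answers[start:len(key) - 1]
    as_marked = loglik([a == k for a, k in zip(seg, key[start:])],
                       theta, b[start:])
    shifted = loglik([a == k for a, k in zip(seg, key[start + 1:])],
                     theta, b[start + 1:])
    return shifted - as_marked
```

For an able examinee whose answers from item 3 onward match the key displaced by one position, the evidence is positive at the true shift point and negative at a wrong one, which is the selectivity/sensitivity trade-off the study measures.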
294

Consistency of Single Item Measures Using Individual-Centered Structural Analyses

Iaccarino, Stephanie 12 1900 (has links)
Estimating reliability for single-item motivational measures presents challenges, particularly when the constructs are expected to vary across time (e.g., effort, self-efficacy, emotions). We explored an innovative approach for estimating the reliability of single-item motivational measures by defining reliability as consistency in interpreting the meaning of items. We applied a psychometric approach that identifies meaning systems from distances between items, operationalizing meaning systems as participants' ordinally ranked responses to the items. We investigated the feasibility of this approach using responses from 193 undergraduate participants in an Introduction to Biology course to five single items assessing motivational constructs, collected through thirteen weekly questionnaires. Partitioning around medoids (PAM) analysis was used to identify an optimal solution, from which systems of meaning (SOMs) were identified by the investigator. Transitions from SOM to SOM were tracked across time for each individual in the sample, and consistency groupings based on the percentage of consecutively repeated SOMs were computed for each individual. Results suggested that from an optimal eight-cluster solution, six SOMs emerged. While moderate numbers of transitions from SOM to SOM occurred, a small minority of participants consecutively repeated the same SOM across time and were placed in the high-consistency group; participants with moderate and low percentages were placed in lower consistency groups accordingly. These results provide preliminary evidence supporting the approach, particularly for highly consistent participants whose reliability might be misrepresented by conventional single-item reliability methods. Implications of the proposed approach and propositions for future research are included. / Educational Psychology
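The PAM clustering step referred to above can be sketched minimally. For illustration, the medoid search below is an exhaustive enumeration over a precomputed distance matrix rather than the BUILD/SWAP heuristic real PAM implementations use, and the toy distances stand in for the item-distance data described in the abstract:

```python
import itertools

def pam(dist, k):
    """Minimal partitioning-around-medoids: choose the k medoids that
    minimize the total distance from every point to its nearest medoid,
    then assign each point to that medoid. `dist` is a symmetric n x n
    distance matrix. Exhaustive search, so only suitable for tiny
    problems; illustrative sketch only."""
    n = len(dist)

    def cost(medoids):
        return sum(min(dist[i][m] for m in medoids) for i in range(n))

    best = min(itertools.combinations(range(n), k), key=cost)
    labels = [min(best, key=lambda m: dist[i][m]) for i in range(n)]
    return list(best), labels
```

Unlike k-means, the cluster centers are actual observations (medoids), which is what makes the resulting clusters interpretable as systems of meaning anchored in real response patterns.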
295

Web-based self and peer assessment

Gauthier, Geneviève January 2004 (has links)
No description available.
296

Construct and criterion-related validity of the Draw a Person: a quantitative scoring system for normal, reading disabled, and developmentally handicapped children

Smith, Jean Marie January 1987 (has links)
No description available.
297

District Level Achievement Gap Between the Distribution of Caucasian and African American District Means on the 2003/2004 Ohio 4th Grade Reading Proficiency Exam

Ellis, Joann Almyra January 2005 (has links)
No description available.
298

Construct validity study of the Myers-Briggs type indicator

Kaye, Gail Leslie January 1989 (has links)
No description available.
299

A validity test of the Dunn-Rankin Reward Preference Inventory

Landschulz, Mary Ellen January 1978 (has links)
No description available.
300

An achievement test in drawing for measuring junior high school graduates

Leffel, George H. January 1940 (has links)
No description available.
