131

A procedure for developing a common metric in item response theory when parameter posterior distributions are known

Baldwin, Peter 01 January 2008 (has links)
Because item response theory (IRT) models are arbitrarily identified, independently estimated parameters must be transformed to a common metric before they can be compared. To accomplish this, the transformation constants must be estimated, and because these estimates are imperfect, there is a propagation-of-error effect when transforming parameter estimates. However, this error propagation is typically ignored, and estimates of the transformation constants are treated as true when transforming parameter estimates to a common metric. To address this shortcoming, a procedure is proposed and evaluated that accounts for the uncertainty in the transformation constants when adjusting for differences in metric. This procedure utilizes random draws from model parameter posterior distributions, which are available when IRT models are estimated using Markov chain Monte Carlo (MCMC) methods. Given two test forms with model parameter vectors ΛY and ΛX, the proposed procedure works by sampling the posteriors of ΛY and ΛX, estimating the transformation constants using these two samples, and transforming sample X to the scale of sample Y. This process is repeated N times, where N is the desired number of transformed posterior draws. A simulation study was conducted to evaluate the feasibility and success of the proposed strategy compared to the traditional strategy of treating scaling constant estimates as error-free. Results were evaluated by comparing the observed coverage probabilities of the transformed posteriors to their expectation. The proposed strategy yielded equal or superior coverage probabilities compared to the traditional strategy for 140 of the 144 comparisons made in this study (97%). Conditions included four methods of estimating the scaling constants and three anchor lengths.
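
The procedure described above can be illustrated with a short sketch. The following Python fragment is illustrative only, not the author's code: it assumes a mean/sigma estimate of the transformation constants (the dissertation compares four estimation methods) and hypothetical arrays of MCMC draws of anchor-item difficulties.

    import numpy as np

    def transform_posterior_draws(anchor_Y, anchor_X, params_X, n_draws=1000, seed=0):
        """Place form-X difficulty draws on the form-Y metric, draw by draw.

        anchor_Y, anchor_X : (n_posterior, n_anchor) arrays of MCMC draws of the
            anchor-item difficulties on forms Y and X.
        params_X : (n_posterior, n_items_X) array of draws of all form-X difficulties.
        """
        rng = np.random.default_rng(seed)
        transformed = []
        for _ in range(n_draws):
            i = rng.integers(anchor_Y.shape[0])      # one posterior draw from form Y
            j = rng.integers(anchor_X.shape[0])      # one posterior draw from form X
            y, x = anchor_Y[i], anchor_X[j]
            A = y.std(ddof=1) / x.std(ddof=1)        # mean/sigma slope estimate
            B = y.mean() - A * x.mean()              # mean/sigma intercept estimate
            transformed.append(A * params_X[j] + B)  # transform the sampled form-X draw
        return np.asarray(transformed)

Each returned row is one transformed posterior draw, so the uncertainty in A and B propagates into the transformed parameters rather than being ignored.
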
132

A Bayesian testlet response model with covariates: A simulation study and two applications

Baldwin, Su G 01 January 2008 (has links)
Understanding the relationship between person, item, and testlet covariates and person, item, and testlet parameters may offer considerable benefits to both test development and test validation efforts. The Bayesian TRT models proposed by Wainer, Bradlow, and Wang (2007) offer a unified structure within which model parameters may be estimated simultaneously with model parameter covariates. This unified approach represents an important advantage of these models: theoretically correct modeling of the relationship between covariates and their respective model parameters. Analogous analyses can be performed via conventional post-hoc regression methods; however, the fully Bayesian framework offers an important advantage over the conventional post-hoc methods by reflecting the uncertainty of the model parameters when estimating their relationship to covariates. The purpose of this study was twofold. The first aim was to conduct a basic simulation study to investigate the accuracy and effectiveness of the Bayesian TRT approach in estimating the relationship of covariates to their respective model parameters. Additionally, the Bayesian TRT results were compared to post-hoc regression results, where the dependent variable was the point estimate of the model parameter of interest. The second aim was an empirical study that applied the Bayesian TRT model to two real data sets: the Step 3 component of the United States Medical Licensing Examination (USMLE), and the Posttraumatic Growth Inventory (PTGI) by Tedeschi and Calhoun (1996). The findings of both the simulation and empirical studies suggest that the Bayesian TRT performs very similarly to the post-hoc approach. Detailed discussion is provided and potential future studies are suggested in Chapter 5.
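
As a concrete picture of the kind of model involved, the following is a minimal data-generating sketch under assumed values, not the Wainer, Bradlow, and Wang estimation code: a 2PL testlet model in which person ability depends on a single covariate with an assumed regression weight of 0.5.

    import numpy as np

    rng = np.random.default_rng(1)
    n_persons, n_items, n_testlets = 500, 20, 4
    testlet_of_item = rng.integers(n_testlets, size=n_items)     # testlet membership of each item

    x = rng.normal(size=n_persons)                               # person covariate
    theta = 0.5 * x + rng.normal(scale=np.sqrt(1 - 0.5**2), size=n_persons)  # ability depends on x
    a = rng.lognormal(mean=0.0, sigma=0.3, size=n_items)         # item discriminations
    b = rng.normal(size=n_items)                                 # item difficulties
    gamma = rng.normal(scale=0.4, size=(n_persons, n_testlets))  # person-by-testlet effects

    # P(correct) = logistic(a_i * (theta_p - b_i - gamma_{p, d(i)}))
    logit = a * (theta[:, None] - b - gamma[:, testlet_of_item])
    responses = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

In the Bayesian TRT approach the weight on x would be estimated jointly with the item, person, and testlet parameters; the post-hoc alternative regresses point estimates of theta on x after the fact.
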
133

Measuring teacher effectiveness using student test scores

Soto, Amanda Corby 01 January 2013 (has links)
Comparisons within states of school performance or student growth, as well as teacher effectiveness, have become commonplace. Since the advent of the Growth Model Pilot Program in 2005, many states have adopted growth models for both evaluative (to measure teacher performance or for accountability) and formative (to guide instructional practice and curricular or programmatic choices) purposes. Growth model data, as applied to school accountability and teacher evaluation, are generally used as a mechanism to determine whether teachers and schools are functioning to move students toward curricular proficiency and mastery. Teacher evaluation based on growth data is an increasingly popular practice in the states, and the launch of cross-state assessment consortia in 2014 will introduce data that could support this approach to teacher evaluation on a larger scale. For the first time, students in consortium member states will be taking shared assessments and being held accountable for shared curricular standards, setting the stage to quantify and compare teacher effectiveness based on student test scores across states. States' voluntary adoption of the Common Core State Standards and participation in assessment consortia speak to a new level of support for collaboration in the interest of improved student achievement. The possibility of using these data to build effectiveness and growth models that cross state lines is appealing, as states and schools might be interested in demonstrating their progress toward full student proficiency based on the CCSS. By utilizing consortium assessment data in place of within-state assessment data for teacher evaluation, it would be possible to describe the performance of one state's teachers in reference to the performance of their own students, teachers in other states, and the consortium as a whole. In order to examine what might happen if states adopt a cross-state evaluation model, the consistency of teacher effectiveness rankings based on the Student Growth Percentile (SGP) model and a value-added model is compared for teachers in two jurisdictions, Massachusetts and Washington, D.C., both members of the Partnership for Assessment of Readiness for College and Careers (PARCC) assessment consortium. The teachers are first evaluated based on their students within their state, and again when that state is situated within a sample representing students in the other member states. The purpose of the current study is to explore the reliability of teacher effectiveness classifications, as well as the validity of inferences made from student test scores to guide teacher evaluation. The results indicate that two of the models currently in use, SGPs and a covariate-adjusted value-added model, do not provide particularly reliable results in estimating teacher effectiveness, with more than half of the teachers being inconsistently classified in the consortium setting. The validity of the model inferences is also called into question, as neither model demonstrates a strong correlation with student test score change as estimated by a value table. The results are outlined and discussed in relation to each model's reliability and validity, along with the implications for the use of these models in making high-stakes decisions about teacher performance.
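
For readers who want a feel for the consistency question being studied, the sketch below uses entirely hypothetical data, not the study's SGP or value-added estimates, and checks how often two correlated effectiveness measures place the same teachers in the same quintile.

    import numpy as np

    def quintile(scores):
        cuts = np.percentile(scores, [20, 40, 60, 80])
        return np.digitize(scores, cuts)                  # 0 = bottom quintile, 4 = top

    rng = np.random.default_rng(2)
    measure_a = rng.normal(size=300)                                # e.g., a median SGP per teacher
    measure_b = 0.6 * measure_a + rng.normal(scale=0.8, size=300)   # a correlated value-added-style estimate

    agreement = np.mean(quintile(measure_a) == quintile(measure_b))
    print(f"Exact quintile agreement: {agreement:.2f}")

Even with a moderate correlation between the two measures, exact agreement is typically well below one, which is the flavor of the inconsistency reported above.
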
134

The effects of dimensionality and item selection methods on the validity of criterion-referenced test scores and decisions

Dirir, Mohamed Awil 01 January 1993 (has links)
Many of the measurement models currently used in testing require that the items that make up the test span a unidimensional space. The assumption of unidimensionality is difficult to satisfy in practice since item pools are arguably multidimensional. Among the causes of test multidimensionality is the presence of minor dimensions (such as test motivation, speed of performance, and reading ability) beyond the dominant ability the test is supposed to measure. The consequences of violating the assumption of unidimensionality may be serious. Different item selection procedures, when used for constructing tests, will have unknown and differential effects on the reliability and validity of tests. The purposes of this research were (1) to review research on test dimensionality, (2) to investigate the impact of test dimensionality on the ability estimation and the decision accuracy of criterion-referenced tests, and (3) to examine the effects of the interaction of item selection methods with test dimensionality and content categories on ability estimation and decision accuracy of criterion-referenced tests. The empirical research consisted of two parts: in Part A, three item pools with different dimensionality structures were generated for two different tests. Four item selection methods were used to construct tests from each item pool, and the ability estimates and the decision accuracies of the 12 resulting tests were compared for each test. In Part B, real data were used as an item bank, and four item selection methods were used to construct short tests from the item bank. The measurement precision and the decision accuracies of the resulting tests were compared. It was found that the strength of minor dimensions affects the precision of the ability estimation and decision accuracy of mastery tests, and that optimal item selection methods perform better than other item selection methods, especially when test data are not unidimensional. The differences in measurement precision and decision accuracy among data with different degrees of multidimensionality and among the different item selection methods were statistically and practically significant. An important implication of the study results for practitioners is that the presence of minor dimensions in a test may lead to the misclassification of examinees, and hence limit the usefulness of the test.
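
One strategy of the kind compared here, optimal item selection for a mastery decision, can be sketched as follows; the item pool and the cut score are hypothetical, and 2PL items are assumed.

    import numpy as np

    def fisher_info_2pl(theta, a, b):
        """Item information for 2PL items at ability theta."""
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        return a**2 * p * (1 - p)

    rng = np.random.default_rng(3)
    a = rng.lognormal(0.0, 0.3, size=200)     # pool discriminations
    b = rng.normal(size=200)                  # pool difficulties
    cut = 0.5                                 # mastery cut score on the theta scale

    info_at_cut = fisher_info_2pl(cut, a, b)
    selected = np.argsort(info_at_cut)[::-1][:40]   # 40 items maximizing information at the cut

Selection methods of this kind concentrate measurement precision near the cut score, where mastery classification decisions are most error-prone.
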
135

An investigation of the effects of conditioning on two ability estimates in DIF analyses when the data are two-dimensional

Mazor, Kathleen Michele 01 January 1993 (has links)
Differential item functioning (DIF) is present when examinees of the same ability, but belonging to different groups, have differing probabilities of success on an item. Traditionally, DIF detection procedures have been implemented conditioning on total test score. However, if there are group differences on the abilities underlying test performance, and total score is used as the matching criterion, multidimensional item impact may be incorrectly identified as DIF. This study sought to confirm earlier research demonstrating that multidimensional item impact may be identified as DIF, and then to determine whether conditioning on multiple ability estimates would improve item classification accuracy. Data were generated to simulate responses for 1000 reference group members and 1000 focal group members to two-dimensional tests. The focal group mean on the second ability was one standard deviation lower than the reference group mean. The dimensional structure of the tests, the discrimination of the items, and the correlation between the two abilities were varied. Logistic regression and Mantel-Haenszel DIF analyses were conducted using total score as the matching criterion. As anticipated, substantial numbers of items were identified as DIF. Items were then selected into subtests based on item measurement direction. The logistic regression procedure was re-implemented, with subtest scores substituted for total score. In the majority of the conditions simulated, this change in criterion resulted in substantial reductions in Type I errors. The magnitude of the reductions was related to the dimensional structure of the test and the discrimination of the items. Finally, DIF analyses of two real data sets were conducted using the same procedures. For one of the two tests, substituting subtest scores for total score resulted in a reduction in the number of items identified as DIF. These results suggest that multidimensionality in a data set may have a significant impact on the results of DIF analyses. If total score is used as the matching criterion, very high Type I error rates may be expected under some conditions. By conditioning on subtest scores in lieu of total score in logistic regression analyses, it may be possible to substantially reduce the number of Type I errors, at least in some circumstances.
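
The change of matching criterion described above amounts to swapping one regressor in the logistic regression DIF test. A minimal sketch, assuming a 0/1 response matrix, a group indicator, and the availability of statsmodels (responses, group, k, and subtest_idx are hypothetical names):

    import numpy as np
    import statsmodels.api as sm

    def lr_uniform_dif(item, criterion, group):
        """Likelihood-ratio test of uniform DIF for one studied item."""
        X0 = sm.add_constant(criterion.reshape(-1, 1))                # matching criterion only
        X1 = sm.add_constant(np.column_stack([criterion, group]))     # criterion + group membership
        m0 = sm.Logit(item, X0).fit(disp=0)
        m1 = sm.Logit(item, X1).fit(disp=0)
        return 2 * (m1.llf - m0.llf)             # compare to chi-square with 1 df

    # Conditioning on total score:
    #   g2 = lr_uniform_dif(responses[:, k], responses.sum(axis=1), group)
    # Conditioning on a subtest score (items measuring the same direction as item k):
    #   g2 = lr_uniform_dif(responses[:, k], responses[:, subtest_idx].sum(axis=1), group)
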
136

Quantitative and Qualitative Analyses of Readmission to an Institutional Setting for People with Intellectual Disabilities

Srivorakiat, Laura January 2013 (has links)
No description available.
137

Determination of Change in Online Monitoring of Longitudinal Data: An Evaluation of Methodologies

Jokinen, Jeremy D. January 2015 (has links)
No description available.
138

The Performance of Local Dependence Indices with Psychological Data

Houts, Carrie Rena 16 December 2011 (has links)
No description available.
139

Assessing the Absolute and Relative Performance of IRTrees Using Cross-Validation and the RORME Index

DiTrapani, John B. 03 September 2019 (has links)
No description available.
140

Improving the Detection of Narcissistic Transformational Leaders with the Multifactor Leadership Questionnaire: An Item Response Theory Analysis

Martin, Dale Frederick Hosking 01 January 2011 (has links)
Narcissistic transformational leaders have inflicted severe physical, psychological, and financial damage on individuals, institutions, and society. The Multifactor Leadership Questionnaire (MLQ) has shown promise for early detection of narcissistic leadership tendencies, but selection criteria have not been established. The purpose of this quantitative research was to determine if item response theory (IRT) could advance the detection of narcissistic leadership tendencies using an item-level analysis of the 20 transformational leadership items of the MLQ. Three archival samples of subordinates from Israeli corporate and athletic organizations were combined (N = 1,703) to assess IRT data assumptions, the comparative fit of competing IRT models, item discrimination and difficulty, and theta reliabilities within the trait range. Compared to the generalized graded unfolding model, the graded response model had slightly more category points within the 95% confidence interval and consistently lower χ²/df item fit indices. Items tended to be easier yet more discriminating than average, and five items were identified as candidates for modification. IRT item marginal reliability was .94 (slightly better than the classical test theory reliability of .93), and IRT ability prediction had a .96 reliability within a trait range from -1.7 to 1.3 theta. Based on eight invariant item parameters, selection criteria of category fairly often (3) or above on attributed idealized influence items and sometimes (2) or below on individual consideration items were suggested. A test case demonstrated how narcissistic tendencies could be detected with these criteria. The study can contribute to positive social change by informing improved selection processes that more effectively screen candidates for key leadership roles that directly impact the wellbeing of individuals and organizations.
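
The marginal reliability figure quoted above can be computed from the test information function. The sketch below uses hypothetical graded response model parameters, not the study's calibrated MLQ values, and assumes a standard normal ability prior; it illustrates the computation rather than reproducing the reported .94.

    import numpy as np

    def grm_item_info(theta, a, thresholds):
        """Fisher information of one graded response model item at ability theta."""
        # Boundary curves P*_k = logistic(a*(theta - b_k)), with P*_0 = 1 and P*_{m+1} = 0.
        p_star = np.concatenate(([1.0], 1 / (1 + np.exp(-a * (theta - thresholds))), [0.0]))
        p_cat = p_star[:-1] - p_star[1:]               # category probabilities
        dp_star = a * p_star * (1 - p_star)            # derivatives of the boundary curves
        d_cat = dp_star[:-1] - dp_star[1:]
        return np.sum(d_cat**2 / np.clip(p_cat, 1e-12, None))

    rng = np.random.default_rng(4)
    thetas = np.linspace(-4, 4, 161)
    weights = np.exp(-thetas**2 / 2)
    weights /= weights.sum()                           # standard normal quadrature weights

    items = [(rng.lognormal(0.2, 0.3), np.sort(rng.normal(size=4))) for _ in range(20)]
    test_info = np.array([sum(grm_item_info(t, a, b) for a, b in items) for t in thetas])

    sem2 = 1 / test_info                               # conditional error variance
    marginal_reliability = 1 - np.sum(weights * sem2)  # assumes prior ability variance of 1
    print(round(marginal_reliability, 3))
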
