Global ETD Search

131	Investigating How Equating Guidelines for Screening and Selecting Common Items Apply When Creating Vertically Scaled Elementary Mathematics Tests Hardy, Maria Assunta 09 December 2011 (has links) (PDF) Guidelines to screen and select common items for vertical scaling have been adopted from equating. Differences between vertical scaling and equating suggest that these guidelines may not apply to vertical scaling in the same way that they apply to equating. For example, in equating the examinee groups are assumed to be randomly equivalent, but in vertical scaling the examinee groups are assumed to possess different levels of proficiency. Equating studies that examined the characteristics of the common-item set stress the importance of careful item selection, particularly when groups differ in ability level. Since in vertical scaling cross-level ability differences are expected, the common items' psychometric characteristics become even more important in order to obtain a correct interpretation of students' academic growth. This dissertation applied two screening criteria and two selection approaches to investigate how changes in the composition of the linking sets impacted the nature of students' growth when creating vertical scales for two elementary mathematics tests. The purpose was to observe how well these equating guidelines were applied in the context of vertical scaling. Two separate datasets were analyzed to observe the impact of manipulating the common items' content area and targeted curricular grade level. The same Rasch scaling method was applied for all variations of the linking set. Both the robust z procedure and a variant of the 0.3-logit difference procedure were used to screen unstable common items from the linking sets. (In vertical scaling, a directional item-difficulty difference must be computed for the 0.3-logit difference procedure.) Different combinations of stable common items were selected to make up the linking sets. 
The mean/mean method was used to compute the equating constant and linearly transform the students' test scores onto the base scale. A total of 36 vertical scales were created. The results indicated that, although the robust z procedure was a more conservative approach to flagging unstable items, the robust z and the 0.3-logit difference procedure produced similar interpretations of students' growth. The results also suggested that the choice of grade-level-targeted common items affected the estimates of students' grade-to-grade growth, whereas the results regarding the choice of content-area-specific common items were inconsistent. The findings from the Geometry and Measurement dataset indicated that the choice of content-area-specific common items had an impact on the interpretation of students' growth, while the findings from the Algebra and Data Analysis/Probability dataset indicated that the choice of content-area-specific common items did not appear to significantly affect students' growth. A discussion of the limitations of the study and possible future research is presented. vertical scaling common-item design equating linking content and construct representation item stability robust z 0.3-logit difference Item Response Theory Rasch scaling Educational Psychology
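The screening and linking steps described in entry 131 can be illustrated with a minimal Python sketch. It assumes Rasch difficulty estimates for the same common items from two separate calibrations, a commonly used robust z formulation (each difficulty difference minus the median difference, divided by 0.74 times the interquartile range), a 1.645 flagging cutoff, and a mean/mean constant computed as the difference in mean common-item difficulty. The function names, the cutoff, and the data are illustrative assumptions and are not taken from the dissertation.

```python
from statistics import mean, median, quantiles

def robust_z(base, new):
    """Robust z for each common item's difficulty difference (base - new)."""
    diffs = [b - n for b, n in zip(base, new)]
    q1, _, q3 = quantiles(diffs, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; robust z is undefined")
    med = median(diffs)
    return [(d - med) / (0.74 * iqr) for d in diffs]

def stable_items(base, new, cutoff=1.645):
    """Indices of common items whose |robust z| does not exceed the cutoff."""
    return [i for i, z in enumerate(robust_z(base, new)) if abs(z) <= cutoff]

def mean_mean_constant(base, new, keep):
    """Mean/mean linking constant computed from the retained common items."""
    return mean(base[i] for i in keep) - mean(new[i] for i in keep)

# Hypothetical difficulties (in logits) for six common items.
base_scale = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
new_scale = [-1.5, -1.1, -0.7, -0.05, 0.35, 1.9]

keep = stable_items(base_scale, new_scale)            # the last item is flagged
constant = mean_mean_constant(base_scale, new_scale, keep)

# Place an ability estimated on the new scale onto the base scale.
theta_on_base_scale = 0.25 + constant
```

Under the Rasch model only a shift is needed, so the linear transformation reduces to adding the constant. In this sketch, changing which indices are passed as `keep` is what manipulating the composition of the linking set amounts to.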
132	Validating hierarchical sequences in the design copying domain using latent trait models. Burch, Melissa Price. January 1988 (has links) The present study was a systematic investigation of hierarchical skill sequences in the design copying domain. The factors associated with possible variations in task difficulty were delineated. Five hierarchies were developed to reflect variations in rule usage, the structuring of responses, presence of angles, spatial orientations, and stimulus complexity. Three hundred thirty-four subjects aged five through ten years were administered a 25-item design copying test. The data were analyzed using probabilistic models. Latent trait models were developed to test the hypothesized skill sequences. Each latent trait model was statistically compared to alternate models to arrive at a preferred model that would adequately represent the data. Results suggested that items with predictable difficulty levels can be developed in this domain based on an analysis of stimulus dimensions and the use of rules for task completion. The inclusion of visual cues to guide design copying assists accurate task completion. Implications of the current findings for facilitating the construction of tests which accurately provide information about children's skill levels were discussed. The presence of hierarchical skill sequences in a variety of ability domains was supported. Drawing, Psychology of. Space perception in children. Item response theory.
133	Mindfulness In Parenting Questionnaire (MIPQ): Development and validation of a measure of mindful parenting McCaffrey, Stacey 01 January 2015 (has links) Mindful parenting has been defined as “paying attention to your child and your parenting in a particular way: intentionally, here and now, and non-judgmentally” (Kabat-Zinn & Kabat-Zinn, 1997). Although it is hypothesized that increasing mindful parenting improves parent and child functioning, the development of a measure of mindful parenting is needed to support this assumption. The aim of the present study was to develop and psychometrically evaluate a measure of mindful parenting (the Mindfulness In Parenting Questionnaire: MIPQ) for use with mothers and fathers of both children and adolescents, ranging in age from 2 to 16 years old. The current study contained three phases. First, content experts in the area of mindfulness and parenting provided content for preliminary items. Second, parents participated in cognitive interviewing in order to reduce measurement error and improve the psychometric properties of the measure. The third and final phase consisted of large-scale data collection to explore the psychometric properties of the new MIPQ. Two hundred and three parents recruited from academic and after-school programs in South Florida completed the MIPQ, along with measures of intrapersonal mindfulness, parenting behavior, parenting style, and a demographics questionnaire. The Partial Credit Model, which evidenced significantly better fit than the Rating Scale Model, was used to evaluate the MIPQ using WINSTEPS 3.74.01. The MIPQ was iteratively refined based on statistical and clinical considerations, resulting in a 28-item measure with 4 response categories. Further, results supported a two-factor mindful parenting construct. 
The first factor (Parental Self-Efficacy) reflects a parent’s self-efficacy, as well as nonreactivity and awareness within the parenting role, while the second factor (Being in the Moment with the Child) pertains to the child, and reflects present-centered attention, empathic understanding, and acceptance of the child. Factors were correlated (r = .67) and explained 42.3% and 43.4% of the variance, respectively. Correlations between the MIPQ and parenting style, parenting practices, practice of mindfulness, and participant demographics provided support for convergent and discriminant validity. The MIPQ exhibited a positive and weak correlation with the MAAS, indicating that interpersonal and intrapersonal mindfulness are related, but separate and distinct constructs. Limitations and directions for future research are discussed. item response theory measurement mindful parenting modern test theory Psychology
134	Study of information specific and relational processing through advertising messaging frameworks Barbeisch, Victoria Elizabeth 28 July 2014 (has links) Utilizing the information garnered from research on information processing in the two elaboration types (i.e., item-specific and relational processing), this research examines the influence of gender and advertising narrative effectiveness. Advertising effectiveness is determined by recall and perception from exposure to relational and item-specific developed narratives. Included are literature reviews, supporting data and analysis, results, discussion, and speculation about differing outcomes based on the study conducted. Item-specific Relational Processing Narrative-type Gender Perception Recall
135	En Raschanalys för att jämföra två svenska översättningar av en enkät som mäter hälsorelaterad livskvalitet [A Rasch analysis comparing two Swedish translations of a questionnaire measuring health-related quality of life] Kielén, Martina, Wallentinsson, Emma January 2016 (has links) During the 1980s the non-profit organisation RAND Corporation conducted the two-year Medical Outcomes Study with the goal of creating a comprehensive medical questionnaire. The resulting 116-item questionnaire measures health-related quality of life (HRQoL) topics such as physical, mental and general health. The questionnaire is available as a free resource on their web page. SF-36, which contains 36 of these questions, is distributed for a fee by the US company Quality Metric Inc. The company has translated the questionnaire into several languages, including Swedish, and has also licensed the translations. Registercentrum sydost has made a new Swedish translation of the same questions as in the SF-36. This survey is called RAND-36 and is license-free. Because Quality Metric Inc holds the license for its Swedish translation, the surveys are similar but not identical. This study aims to compare the aforementioned HRQoL instruments to determine whether it is possible to replace the licensed questionnaire SF-36 with the license-free RAND-36. The distributions of items with ordinal response options were compared with the Mann-Whitney U test. The test yielded a significant difference for eight items in the measures PF (physical functioning), MH (mental health), VT (vitality) and GH (general health perceptions). The distributions of items with dichotomous response options were compared with the χ² test. The test yielded a significant difference for one item in the measure RE (emotional role functioning). The reliability of the questionnaires was compared with ordinal alpha. In the sample, the reliability of MH and VT is equivalent across the two surveys. 
The biggest difference between the surveys is in the measure RP (physical role functioning), where RAND-36 meets the requirement that the measure can support reliable conclusions at the individual level, a condition that SF-36 cannot meet. The probability of giving a particular response, given the respondent's ability, was compared using Rasch analysis. Wald's test indicated differential item functioning (DIF) for most items within the measures PF, MH, VT and GH. Rasch analysis item response theory SF-36 RAND-36
136	The Teacher Attitudes toward Homeless Students Scale: Development and Validation Brown, Jessica January 2012 (has links) Thesis advisor: Larry H. Ludlow / Recent estimates suggest there are roughly 1.6 million homeless children and this number is growing (National Center on Family Homelessness, 2011). This trend is particularly worrisome given that homeless children face a number of obstacles within society and education, not the least of which is negative teacher attitudes (Swick, 2000; U.S. Department of Education, 2002). This study's primary research question addressed whether a set of underlying dimensions could be identified and used to effectively measure teacher attitudes toward homeless students. A necessary part of answering this research question involved the development of a measurement scale. Both Classical Test Theory and Item Response Theory analyses aided in the elimination process of items in order to create the final Teacher Attitudes toward Homeless Students (TAHS) assessment, which includes an attitudes scale and subscales, and a related knowledge scale. The final outcome was a set of 43 items, across eight dimensions, which could effectively be used to measure teacher attitudes toward homeless students. Additionally, the findings upheld the principles of Rasch measurement, including unidimensionality, a hierarchical ordering of items, and a continuum of the construct definition. In other words, the findings indicate that the TAHS scale was successfully developed according to explicit a priori measurement criteria. Moreover, additional correlational and regression analyses provided empirical construct and convergent validity evidence for the TAHS scale. It was also found that attitudes differed slightly for teachers of various backgrounds and experiences, but when analyzed collectively these variables were not significantly related to teacher attitudes toward homeless students. 
Additionally, there was only a weak relationship between teachers' attitudes and their knowledge about homelessness. Overall, the TAHS scale allows for reliable and accurate measurement of teacher attitudes toward homeless students from which valid inferences can be made. The TAHS scale scores and score descriptors can be used to help teachers interpret their attitudes. This has the potential for a direct impact in creating equal educational opportunities for homeless students as teachers become aware of their attitudes and make positive changes. / Thesis (PhD) — Boston College, 2012. / Submitted to: Boston College. Lynch School of Education. / Discipline: Educational Research, Measurement, and Evaluation. attitudes homeless item response theory measurement Rasch teacher
137	The Impact of Differential Item Functioning of MCAS Mathematics Exams on Immigrant Students and Communities Suarez Munist, Octavio Nestor January 2011 (has links) Thesis advisor: Walt Haney / Migration is now a major component of globalization. The combination of better economic opportunities and lower fertility rates in developed nations suggests that the current migratory wave will last for many decades to come (United Nations Population Fund, 2007). In the U.S., immigration over the last thirty years has significantly changed the face of the workforce and the classroom. At the state level, Massachusetts has been one of the top immigrant-receiving states in the Union. Since the 1990s, Massachusetts has been implementing a policy of standardized testing for accountability and graduation. The Massachusetts Comprehensive Assessment System (MCAS) is a set of standardized, norm-referenced tests administered to comply with the test-based accountability provisions of the 1993 No Child Left Behind federal legislation (NCLB). Used today for high-stakes decisions such as NCLB accountability as well as high school graduation requirements, MCAS has raised a number of validity concerns. Differential item functioning analysis, a technique to statistically identify potentially biased items in tests, has not been used to challenge the validity of the tests, although it can provide new insights into test bias that were not previously available. This dissertation investigates the presence of differential item functioning in MCAS between native students and immigrant students. It identifies one test, the 2008 Grade 3 MCAS Mathematics test, as having a significant number of items exhibiting differential functioning and compares the original test version to a purified test version with these items removed. The purified test version results in larger test score improvements for immigrants as well as other non-mainstream students. 
These alternative test scores are sufficiently large to affect the determination of NCLB-based performance status for many schools and districts that are comparatively poorer and more diverse than the average. While the lack of more precise data on immigrants and other characteristics of the data set reduce the definiteness of the results, there is ample cause for concern about the presence of differential item functioning-based bias on MCAS and the need to further study this phenomenon as NCLB-based accountability determinations impact a growing number of schools, districts and communities. / Thesis (EdD) — Boston College, 2011. / Submitted to: Boston College. Lynch School of Education. / Discipline: Educational Research, Measurement, and Evaluation. Bias Differential Item Functioning Immigrants Massachusetts MCAS Standardized Tests
138	A Bayesian Analysis of a Multiple Choice Test Luo, Zhisui 24 April 2013 (has links) In a multiple choice test, examinees gain points based on how many correct responses they give. However, this traditional grading assumes that the questions in the test are replications of each other. We apply an item response theory model to estimate students' abilities, taking the items' features into account, in a midterm test. Our Bayesian logistic item response theory model studies the relation between the probability of getting a correct response and three parameters. One parameter measures the student's ability and the other two measure an item's difficulty and its discriminatory feature. In this model the ability and the discrimination parameters are not identifiable. To address this issue, we construct a hierarchical Bayesian model to nullify the effects of non-identifiability. A Gibbs sampler is used to make inference and to obtain posterior distributions of the three parameters. For a "nonparametric" approach, we implement the item response theory model using a Dirichlet process mixture model. This new approach enables us to grade and cluster students based on their "ability" automatically. Although the Dirichlet process mixture model has very good clustering properties, it suffers from expensive and complicated computations. A slice sampling algorithm has been proposed to address this issue. We apply our methodology to a real dataset obtained on a multiple choice test from WPI's Applied Statistics I (Spring 2012) that illustrates how a student's ability relates to the observed scores. Dirichlet Process Mixture Markov Chain Monte Carlo Item Response
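The model in entry 138 relates the probability of a correct response to a student's ability and an item's difficulty and discrimination. As a minimal sketch, assuming the standard two-parameter logistic form P = 1 / (1 + exp(-a(θ - b))) (the abstract does not state the exact parameterization), the following shows why such parameters are not identifiable without further constraints: multiplying ability and difficulty by a constant while dividing discrimination by the same constant leaves every response probability unchanged. The function names and numeric values are illustrative assumptions.

```python
import math

def p_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL model.

    theta: student ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def rescale(theta, a, b, c):
    """Multiply ability and difficulty by c and divide discrimination by c."""
    return theta * c, a / c, b * c

# Hypothetical parameter values.
theta, a, b = 0.8, 1.5, -0.2

p_original = p_correct(theta, a, b)
p_rescaled = p_correct(*rescale(theta, a, b, c=2.0))

# Both parameter sets imply the same probability, so the data alone
# cannot distinguish between them.
print(math.isclose(p_original, p_rescaled))
```

Priors or scale constraints on the ability distribution are one common way to pin down the scale; the hierarchical Bayesian model mentioned in the abstract serves that purpose there, although its exact form is not given.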
139	Stability and sensitivity of a model-based person-fit index in detecting item pre-knowledge in computerized adaptive test. / 特定模型個人擬合指數在探測預見題目時的穩定性及靈敏度 / CUHK electronic theses & dissertations collection / Te ding mo xing ge ren ni he zhi shu zai tan ce yu jian ti mu shi de wen ding xing ji ling min du January 2008 (has links) After the stability and sensitivity of FLOR were investigated, its application in the CAT environment became the main concern. The present studies found that both the test length and the number of exposed items affect the final value of FLOR. In the fixed-length CAT, FLOR has a much stronger sensitivity than lz and CUSUM in detecting item pre-knowledge. The sensitivity of FLOR in the fixed-length CAT was the same as that in the fixed-length, fixed-items test. If the test length could vary, the sensitivity of FLOR in CAT would be slightly weakened. The Adjusted FLOR index could increase the sensitivity. Regarding the effect of ability on the sensitivity of FLOR in CAT, it was found that the abilities of the test takers in CAT did not affect the sensitivity of FLOR and Adjusted FLOR. / Item response theory is a modern test theory. It focuses on the performance of each item. Under this framework, the performance of test takers on a test item can be predicted by a set of abilities. The relationship between the test takers' item performances and the set of abilities underlying item performances can be described by a monotonically increasing function called an item characteristic curve. For various personal reasons, the performances of the test takers may depart from the response patterns predicted by the underlying test model. In order to calculate the extent of departure of these aberrant response patterns, a number of methods have been developed under the theme "person-fit statistics". The degree of aberration is calculated as an index called a person-fit index. 
In computerized adaptive testing (CAT), test takers with different abilities answer different numbers of questions, and the difficulties of the items administered to them are usually clustered around the abilities of the test takers. For this reason, applying person-fit indices in the computerized adaptive testing environment to measure misfit is difficult. / The present study also found that FLOR has much greater sensitivity than other indices in detecting item pre-knowledge. Regarding sensitivity across test takers of different abilities, it was found that the sensitivity of FLOR was the highest among low-ability test takers and the weakest among high-ability test takers in the fixed-length, fixed-items tests. However, the sensitivities of FLOR became the same across test takers of different abilities if items with difficulties matching their abilities were used in the tests. The number of beneficiaries among the test takers did not affect the sensitivity of FLOR. Moreover, in a simulation to test the differentiating power of FLOR, it was found that FLOR could effectively differentiate item pre-knowledge from other causes of person misfit (test anxiety, player, random response and challenger). / The present study assessed the stability of FLOR over other variables, which were unrelated to item pre-knowledge. It found that FLOR was stable over the discrimination and difficulty parameters of test items. It was also stable over positions of the exposed items in the test and the initial assignment of prior probability of item pre-knowledge. However, the asymptotes (guessing factor) and the probabilities of item exposure did seriously affect the final values of FLOR. / The present study used the hf plot to assess the sensitivity of the person-fit indices. The hf plot is a plot of hit rate against false alarm rate. A higher hit rate is usually accompanied by a higher false alarm rate. 
The hf plot provides a good tool for comparing indices by inspecting how quickly the curves rise. A sensitive index should give a faster rise of the curve. In this study, sensitivity of an index was defined as the speed of rise of the hf plot, which is represented by a parameter hftau estimated from the data obtained from the hf plot. / When frequent access to the item bank becomes feasible, test takers may memorize blocks of test items and share these items with future test takers. Individuals with prior knowledge of some items may use that information to get high scores, in the sense that their test scores have been artificially inflated. FLOR is an index of posterior log-odds ratio used for detecting the use of item pre-knowledge. It can be applied both in the fixed-item, fixed-length test and the CAT environment. It is a model-based index in which aberrant models are defined in the situation of item pre-knowledge. FLOR describes the likelihood that a response pattern arises from the aberrant models. / Hui Hing-fai. / Adviser: Kit-tai Hau. / Source: Dissertation Abstracts International, Volume: 70-09, Section: A, page: . / Thesis (Ed.D.)--Chinese University of Hong Kong, 2008. / Includes bibliographical references (leaves 108-111). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. / School code: 1307. Computer adaptive testing
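Entry 139 describes the hf plot as a plot of hit rate against false alarm rate. As a minimal sketch, assuming a person-fit index for which larger values indicate stronger evidence of item pre-knowledge, and simulated test takers whose true status is known, the points of such a plot could be computed as follows. The function name, index values, and thresholds are hypothetical, and the hftau parameter mentioned in the abstract is not reproduced because its definition is not given there.

```python
def hf_points(index_values, has_preknowledge, thresholds):
    """Return (false_alarm_rate, hit_rate) pairs, one per threshold.

    A test taker is flagged when the index value exceeds the threshold.
    """
    aberrant = [v for v, flag in zip(index_values, has_preknowledge) if flag]
    normal = [v for v, flag in zip(index_values, has_preknowledge) if not flag]
    if not aberrant or not normal:
        raise ValueError("need at least one test taker in each group")
    points = []
    for t in thresholds:
        hit_rate = sum(v > t for v in aberrant) / len(aberrant)
        false_alarm_rate = sum(v > t for v in normal) / len(normal)
        points.append((false_alarm_rate, hit_rate))
    return points

# Hypothetical index values; True marks simulated item pre-knowledge.
scores = [2.1, 1.4, 0.3, -0.5, 0.9, -1.2, 0.1, -0.8]
labels = [True, True, True, False, False, False, False, False]

# Lowering the threshold raises both the hit rate and the false alarm rate.
points = hf_points(scores, labels, thresholds=[1.0, 0.0, -1.0])
```

Plotting the pairs for two indices on the same axes allows the kind of visual comparison the abstract describes: the index whose curve reaches a high hit rate at a lower false alarm rate is the more sensitive one.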
140	Potential test information for multidimensional tests Jonas, Katherine Grace 01 August 2017 (has links) Test selection in psychological assessment is guided, both explicitly and implicitly, by how informative tests are with regard to a trait of interest. Most existing formulations of test information are sensitive to subpopulation variation, with the result that test information will vary from sample to sample. Recently, measures of test information have been developed that quantify the potential informativeness of the test. These indices are defined by the properties of the test, as distinct from the properties of the sample or examinee. As of yet, however, measures of potential information have been developed only for unidimensional tests. In practice, psychological tests are often multidimensional. Furthermore, multidimensional tests are often used to estimate one specific trait among many. This study develops measures of potential test information for multidimensional tests, as well as measures of marginal potential test information, that is, test information with regard to one trait within a multidimensional test. In Study 1, the performance of the metrics was tested in data simulated from unidimensional, first-order multidimensional, second-order, and bifactor models. In Study 2, measures of marginal and multidimensional potential test information are applied to a set of neuropsychological data collected as part of Rush University's Memory and Aging Project. In simulated data, marginal and multidimensional potential test information were sensitive to the changing dimensionality of the test. In observed neuropsychological data, five traits were identified. Verbal abilities were most closely correlated with probable dementia. Both indices of marginal potential test information identify the Mini Mental Status Exam as the best measure of that trait. 
More broadly, greater marginal potential test information calculated with regard to verbal abilities was associated with greater criterion validity. These measures allow for the direct comparison of two multidimensional tests that assess the same trait, facilitating test selection and improving the precision and validity of psychological assessment. item response theory measurement psychometrics test information Psychology
