131. Alternative estimation approaches for some common Item Response Theory models
Sabouri, Pooneh, 1980- (06 January 2011)
In this report we give a brief introduction to Item Response Theory models and multilevel models. The general assumptions of two classical Item Response Theory models, the 1PL and the 2PL, are discussed. We follow the discussion by introducing a multilevel framework for these two Item Response Theory models. We explain Bock and Aitkin's (1981) work on estimating item parameters for these two models. Finally, we illustrate these models with LSAT exam data and two statistical software packages: the R project and Stata.
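As a rough illustration of the estimation approach described in this report, the sketch below (in Python rather than the R or Stata code used in the report, with purely illustrative item parameters) shows the 2PL item response function and the marginal likelihood that Bock and Aitkin's (1981) marginal maximum likelihood procedure maximizes by integrating ability over a normal prior on a quadrature grid.

```python
# Minimal sketch of the 2PL item response function and the marginal
# likelihood maximized by Bock and Aitkin's (1981) EM algorithm.
# Item parameters and the quadrature grid here are illustrative only.
import numpy as np

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def marginal_loglik(responses, a, b, n_quad=21):
    """Marginal log-likelihood of a response matrix (persons x items),
    integrating ability over a standard normal prior on a fixed grid,
    as in marginal maximum likelihood estimation."""
    nodes = np.linspace(-4, 4, n_quad)               # ability grid
    weights = np.exp(-0.5 * nodes**2)
    weights /= weights.sum()                         # normalized N(0,1) weights
    # P[q, j]: probability of success at node q for item j
    P = p_2pl(nodes[:, None], a[None, :], b[None, :])
    loglik = 0.0
    for u in responses:                              # loop over persons
        like_q = np.prod(P**u * (1 - P)**(1 - u), axis=1)
        loglik += np.log(np.sum(weights * like_q))
    return loglik

# Toy example: 5 items (the 1PL is the special case with all a's equal)
rng = np.random.default_rng(0)
a = np.array([1.0, 1.2, 0.8, 1.5, 1.0])
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
theta = rng.normal(size=200)
responses = (rng.random((200, 5)) < p_2pl(theta[:, None], a, b)).astype(int)
print(marginal_loglik(responses, a, b))
```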
132. An evaluation of item difficulty and person ability estimation using the multilevel measurement model with short tests and small sample sizes
Brune, Kelly Diane (08 June 2011)
Recently, researchers have reformulated Item Response Theory (IRT) models into multilevel models to evaluate clustered data appropriately. Using a multilevel model to obtain item difficulty and person ability parameter estimates that correspond directly with IRT models’ parameters is often referred to as multilevel measurement modeling. Unlike conventional IRT models, multilevel measurement models (MMM) can handle the addition of predictor variables, appropriately model clustered data, and can be estimated using non-specialized computer software, including SAS. For example, a three-level model can model the repeated measures (level one) of individuals (level two) who are clustered within schools (level three).
Limitations in terms of the minimum sample size and number of test items that permit reasonable recovery of one-parameter logistic (1-PL) IRT model parameters have not been examined for either the two- or three-level MMM. Researchers (Wright and Stone, 1979; Lord, 1983; Hambleton and Cook, 1983) have found that sample sizes under 200 and fewer than 20 items per test result in poor model fit and poor parameter recovery for dichotomous 1-PL IRT models with data that meet model assumptions.
This simulation study tested the performance of the two-level and three-level MMM under various conditions that included three sample sizes (100, 200, and 400), three test lengths (5, 10, and 20), three level-3 cluster sizes (10, 20, and 50), and two generated intraclass correlations (.05 and .15).
The study demonstrated that use of the two- and three-level MMMs leads to somewhat divergent results for item difficulty and person-level ability estimates. The mean relative item difficulty bias was lower for the three-level model than for the two-level model. The opposite was true for the person-level ability estimates, with a smaller mean relative parameter bias for the two-level model than for the three-level model. There was no difference between the two- and three-level MMMs in the school-level ability estimates. Modeling clustered data appropriately, having a minimum total sample size of 100 to accurately estimate level-2 residuals and a minimum total sample size of 400 to accurately estimate level-3 residuals, and having at least 20 items will help ensure valid statistical test results.
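For readers unfamiliar with this kind of simulation design, the following minimal sketch (Python with illustrative values, not the study's code) generates dichotomous responses for one cell of such a design: items nested in persons nested in schools, with the school-level variance set from the target intraclass correlation and the 1-PL/Rasch form of the MMM assumed.

```python
# Sketch of one replication for a three-level MMM simulation cell:
# items (level 1) nested in persons (level 2) nested in schools (level 3).
# The Rasch / 1-PL form is used; all condition values are illustrative.
import numpy as np

def simulate_three_level(n_schools=20, persons_per_school=20, n_items=10,
                         icc=0.15, seed=0):
    rng = np.random.default_rng(seed)
    # Decompose unit ability variance into between- and within-school parts
    # so that the intraclass correlation equals `icc`.
    school_effect = rng.normal(0, np.sqrt(icc), size=n_schools)
    person_effect = rng.normal(0, np.sqrt(1 - icc),
                               size=(n_schools, persons_per_school))
    theta = school_effect[:, None] + person_effect        # person ability
    difficulty = np.linspace(-1.5, 1.5, n_items)          # item difficulties
    # 1-PL response probabilities and simulated dichotomous responses
    logit = theta[:, :, None] - difficulty
    p = 1.0 / (1.0 + np.exp(-logit))
    y = (rng.random(p.shape) < p).astype(int)             # school x person x item
    return y, theta, difficulty

y, theta, difficulty = simulate_three_level()
print(y.shape, y.mean())   # (20, 20, 10) and overall proportion correct
```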
133. Effects of sample size, ability distribution, and the length of Markov Chain Monte Carlo burn-in chains on the estimation of item and testlet parameters
Orr, Aline Pinto (25 July 2011)
Item Response Theory (IRT) models are the basis of modern educational measurement. In order to increase testing efficiency, modern tests make ample use of groups of questions associated with a single stimulus (testlets). This violates the IRT assumption of local independence. However, a set of measurement models, testlet response theory (TRT), has been developed to address such dependency issues. This study investigates the effects of varying sample sizes and Markov Chain Monte Carlo burn-in chain lengths on the accuracy of estimation of a TRT model’s item and testlet parameters. The following outcome measures are examined: descriptive statistics, Pearson product-moment correlations between known and estimated parameters, and indices of measurement effectiveness for final parameter estimates.
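The sketch below is a deliberately simplified, hypothetical illustration of what "burn-in chain length" means in this context: a random-walk Metropolis sampler for a single Rasch item difficulty with abilities treated as known, whose early draws are discarded before the posterior is summarized. The full TRT estimation samples all item and testlet parameters jointly and is not reproduced here.

```python
# Much-simplified illustration of MCMC burn-in: a random-walk Metropolis
# sampler for one Rasch item difficulty with person abilities taken as known.
import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(size=500)                      # "known" person abilities
true_b = 0.7
y = (rng.random(500) < 1 / (1 + np.exp(-(theta - true_b)))).astype(int)

def log_post(b):
    p = 1 / (1 + np.exp(-(theta - b)))
    # Bernoulli log-likelihood plus a standard normal prior on the difficulty
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) - 0.5 * b**2

n_draws, burn_in = 5000, 1000
draws = np.empty(n_draws)
b = 3.0                                           # deliberately poor starting value
for t in range(n_draws):
    proposal = b + rng.normal(0, 0.2)
    if np.log(rng.random()) < log_post(proposal) - log_post(b):
        b = proposal
    draws[t] = b

kept = draws[burn_in:]                            # discard the burn-in draws
print(f"posterior mean {kept.mean():.2f}, true difficulty {true_b}")
```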
134. A National Survey on Prescribers' Knowledge of and Their Source of Drug-Drug Interaction Information-An Application of Item Response Theory
Ko, Yu (January 2006)
OBJECTIVES: (1) To assess prescribers' ability to recognize clinically significant DDIs, (2) to examine demographic and practice factors that may be associated with prescribers' DDI knowledge, and (3) to evaluate prescribers' perceived usefulness of various DDI information sources. METHODS: This study used a questionnaire mailed to a national sample of prescribers selected on the basis of their history of DDI prescribing, which was determined using data from a pharmacy benefit manager covering over 50 million lives. The questionnaire included 14 drug-drug pairs that tested prescribers' ability to recognize clinically important DDIs and five 5-point Likert-type questions that assessed prescribers' perceived usefulness of DDI information provided by various sources. Demographic and practice characteristics were collected as well. Rasch analysis was used to evaluate the knowledge and usefulness questions. RESULTS: Completed questionnaires were obtained from 950 prescribers (overall response rate: 7.9%). The number of drug pairs correctly classified by the prescribers ranged from zero to thirteen, with a mean of 6 pairs (42.7%). The percentage of prescribers who correctly classified specific drug pairs ranged from 18.2% for warfarin-cimetidine to 81.2% for acetaminophen with codeine-amoxicillin. Half of the drug-pair questions were answered "not sure" by over one-third of the respondents; two of these pairs were contraindicated combinations. Rasch analysis of the knowledge and usefulness questions revealed satisfactory model-data fit, with person reliabilities of 0.72 and 0.61, respectively. A multiple regression analysis revealed that specialists were less likely than generalists to correctly identify interactions. Other important predictors of DDI knowledge included having seen harm caused by a DDI and the extent to which the risk of DDIs affected the prescribers' drug selection. ANOVA with the post-hoc Scheffe test indicated that prescribers considered DDI information provided by "other" sources to be more useful than that provided by computerized alert systems. CONCLUSIONS: This study suggests that prescribers' DDI knowledge may be inadequate. The study found that, for the drug interactions evaluated, generalists performed better than specialists. In addition, this study presents an application of IRT analysis to knowledge and attitude measurement in health science research.
135. Using Hierarchical Generalized Linear Modeling for Detection of Differential Item Functioning in a Polytomous Item Response Theory Framework: An Evaluation and Comparison with Generalized Mantel-Haenszel
Ryan, Cari Helena (16 May 2008)
In the field of education, decisions are influenced by the results of various high stakes measures. Investigating the presence of differential item functioning (DIF) in a set of items helps ensure that results from these measures are valid. For example, if an item measuring math self-efficacy is identified as having DIF, this indicates that some characteristic (e.g., gender) other than the latent trait of interest may be affecting an examinee’s score on that particular item. The use of hierarchical generalized linear modeling (HGLM) enables the modeling of items nested within examinees, with person-level predictors added at level 2 for DIF detection. Unlike traditional DIF detection methods that require a reference and focal group, HGLM allows the modeling of a continuous person-level predictor. This means that instead of dichotomizing a continuous variable associated with DIF into a focal and reference group, the continuous variable can be added at level 2. Further benefits of HGLM are discussed in this study. This study is an extension of work done by Williams and Beretvas (2006), in which the use of HGLM with polytomous items (PHGLM) for DIF detection was illustrated. In the Williams and Beretvas study, the PHGLM was compared with the generalized Mantel-Haenszel (GMH) for DIF detection, and the two were found to perform similarly. A Monte Carlo simulation study was conducted to evaluate HGLM’s power to detect DIF and its associated Type I error rates, using the constrained form of Muraki’s Rating Scale Model (Muraki, 1990) as the generating model. The two methods were compared when DIF was associated with a continuous variable that was dichotomized for the GMH and used as a continuous person-level predictor with the PHGLM. Of additional interest in this study was the comparison of HGLM’s performance with that of the GMH under a variety of DIF and sample size conditions. Results showed that sample size, sample size ratio, and DIF magnitude substantially influenced the power performance of both the GMH and HGLM. Furthermore, the power performance associated with the GMH was comparable to HGLM for conditions with large sample sizes. The mean performance for both DIF detection methods showed good Type I error control.
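As a rough, dichotomous stand-in for the logic described above (not the study's PHGLM or GMH code), the sketch below tests one studied item for DIF against a continuous person-level predictor by comparing nested logistic regressions in which the rest score serves as the matching variable; the data, item parameters, and use of statsmodels are all illustrative assumptions.

```python
# Sketch of DIF detection for one studied item with a continuous person-level
# variable: an ordinary logistic regression stands in for the hierarchical
# (HGLM) formulation, with the rest score as the ability proxy. A significant
# contribution of the person variable beyond the rest score signals DIF.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
n, n_items = 1000, 20
theta = rng.normal(size=n)
z = rng.normal(size=n)                       # continuous person-level predictor
b = np.linspace(-1.5, 1.5, n_items)
p = 1 / (1 + np.exp(-(theta[:, None] - b)))
p[:, 0] = 1 / (1 + np.exp(-(theta - b[0] + 0.6 * z)))   # item 1 has DIF on z
y = (rng.random((n, n_items)) < p).astype(int)

rest = y[:, 1:].sum(axis=1)                  # matching variable (rest score)
studied = y[:, 0]
X0 = sm.add_constant(np.column_stack([rest]))        # no-DIF model
X1 = sm.add_constant(np.column_stack([rest, z]))     # adds the person predictor
m0 = sm.Logit(studied, X0).fit(disp=0)
m1 = sm.Logit(studied, X1).fit(disp=0)
lr = 2 * (m1.llf - m0.llf)                   # likelihood-ratio test, 1 df
print(f"LR = {lr:.1f}, p = {stats.chi2.sf(lr, df=1):.4f}")
```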
136. A Monte Carlo Study Investigating Missing Data, Differential Item Functioning, and Effect Size
Garrett, Phyllis Lorena (12 August 2009)
The use of polytomous items in assessments has increased over the years, and as a result, the validity of these assessments has been a concern. Differential item functioning (DIF) and missing data are two factors that may adversely affect assessment validity. Both factors have been studied separately, but DIF and missing data are likely to occur simultaneously in real assessment situations. This study investigated the Type I error and power of several DIF detection methods and methods of handling missing data for polytomous items generated under the partial credit model. The Type I error and power of the Mantel and ordinal logistic regression were compared using within-person mean substitution and multiple imputation when data were missing completely at random. In addition to assessing the Type I error and power of DIF detection methods and methods of handling missing data, this study also assessed the impact of missing data on the effect size measures associated with the Mantel (the standardized mean difference) and with ordinal logistic regression (the R-squared measure). Results indicated that the performance of the Mantel and ordinal logistic regression depended on the percent of missing data in the data set, the magnitude of DIF, and the sample size ratio. The Type I error for both DIF detection methods varied based on the missing data method used to impute the missing data. Power to detect DIF increased as DIF magnitude increased, but there was a relative decrease in power as the percent of missing data increased. Additional findings indicated that the percent of missing data, DIF magnitude, and sample size ratio also influenced the effect size measures associated with the Mantel and ordinal logistic regression. The effect size values for both DIF detection methods generally increased as DIF magnitude increased, but as the percent of missing data increased, the effect size values decreased.
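To make the data side of this design concrete, here is a minimal sketch (illustrative values only) that generates partial credit model responses, imposes missing-completely-at-random deletion, and applies within-person mean substitution; multiple imputation and the Mantel and ordinal logistic regression DIF tests themselves are not shown.

```python
# Sketch: partial credit model (PCM) responses for a short polytomous test,
# MCAR deletion, and within-person mean substitution. All values illustrative.
import numpy as np

rng = np.random.default_rng(3)

def pcm_probs(theta, deltas):
    """Category probabilities (0..m) for one PCM item with step difficulties deltas."""
    steps = np.concatenate(([0.0], np.cumsum(theta - np.asarray(deltas))))
    num = np.exp(steps)
    return num / num.sum()

n, n_items, n_cats = 500, 6, 4
deltas = rng.normal(0, 0.8, size=(n_items, n_cats - 1))   # step difficulties
theta = rng.normal(size=n)
data = np.array([[rng.choice(n_cats, p=pcm_probs(t, deltas[j]))
                  for j in range(n_items)] for t in theta], dtype=float)

# Impose 15% missing-completely-at-random deletion
mask = rng.random(data.shape) < 0.15
data[mask] = np.nan

# Within-person mean substitution: replace each person's missing items
# with the mean of that person's observed items.
person_means = np.nanmean(data, axis=1, keepdims=True)
imputed = np.where(np.isnan(data), person_means, data)
print(f"missing: {mask.mean():.2%}, any NaN left: {np.isnan(imputed).any()}")
```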
137. The Impact of Multidimensionality on the Detection of Differential Bundle Functioning Using SIBTEST
Raiford-Ross, Terris (12 February 2008)
In response to public concern over fairness in testing, conducting a differential item functioning (DIF) analysis is now standard practice for many large-scale testing programs (e.g., the Scholastic Aptitude Test, intelligence tests, licensing exams). As highlighted by the Standards for Educational and Psychological Testing, the legal and ethical need to avoid bias when measuring examinee abilities is essential to fair testing practices (AERA-APA-NCME, 1999). Likewise, the development of statistical and substantive methods of investigating DIF is crucial to the goal of designing fair and valid educational and psychological tests. Douglas, Roussos, and Stout (1996) introduced the concept of item bundle DIF and the implications of differential bundle functioning (DBF) for identifying the underlying causes of DIF. Since then, several studies have demonstrated DIF/DBF analyses within the framework of “unintended” multidimensionality (Oshima & Miller, 1992; Russell, 2005). Russell (2005), in particular, examined the effect of secondary traits on DBF/DTF detection. Like Russell, this study created item bundles by including multidimensional items on a simulated test designed in theory to be unidimensional. Simulating reference group members to have a higher mean ability than the focal group on the nuisance secondary dimension resulted in DIF for each of the multidimensional items, which, when examined together, produced differential bundle functioning. The purpose of this Monte Carlo simulation study was to assess the Type I error and power performance of SIBTEST (Simultaneous Item Bias Test; Shealy & Stout, 1993a) for DBF analysis under various conditions with simulated data. The variables of interest included sample size and the ratio of reference to focal group sample sizes, the correlation between primary and secondary dimensions, the magnitude of DIF/DBF, and angular item direction. Results showed SIBTEST to be quite powerful in detecting DBF and to control Type I error for almost all of the simulated conditions. Specifically, power rates were .80 or above for 84% of all conditions, and the average Type I error rate was approximately .05. Furthermore, the combined effect of the studied variables on SIBTEST power and Type I error rates provided much needed information to guide further use of SIBTEST for identifying potential sources of differential item/bundle functioning.
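A minimal sketch of the data-generation idea described above, with illustrative values: an item's discrimination vector is set by its angular direction between the primary and nuisance dimensions, the two abilities are correlated, and the reference group is shifted upward on the nuisance dimension, producing a group difference in success rates only for items that load on that dimension. SIBTEST itself is not reimplemented here.

```python
# Sketch of "unintended" multidimensionality for DBF simulation: a compensatory
# two-dimensional 2PL item whose direction is given by an angle away from the
# primary dimension, with correlated abilities and a reference-group advantage
# on the nuisance (second) dimension only. All values are illustrative.
import numpy as np

def mirt2pl_prob(theta, angle_deg, mdisc=1.2, d=0.0):
    """Compensatory 2D 2PL probability; the angle measures how far the item
    points away from the primary dimension toward the nuisance dimension."""
    angle = np.deg2rad(angle_deg)
    a = mdisc * np.array([np.cos(angle), np.sin(angle)])
    return 1 / (1 + np.exp(-(theta @ a + d)))

def sample_abilities(n, rho, nuisance_shift=0.0, seed=None):
    """Bivariate normal abilities with correlation rho; the shift moves the
    group mean on the nuisance (second) dimension only."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal([0.0, nuisance_shift], cov, size=n)

ref = sample_abilities(1000, rho=0.3, nuisance_shift=0.5, seed=10)   # reference group
foc = sample_abilities(1000, rho=0.3, nuisance_shift=0.0, seed=11)   # focal group
for angle in (0, 30, 60):                     # 0 = purely unidimensional item
    gap = mirt2pl_prob(ref, angle).mean() - mirt2pl_prob(foc, angle).mean()
    print(f"angle {angle:>2} deg: reference-minus-focal P(correct) = {gap:.3f}")
```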
138. Detecting Inaccurate Response Patterns in Korean Military Personality Inventory: An Application of Item Response Theory
Hong, Seunghwa (16 December 2013)
There are concerns regarding the risk of inaccurate responses in personality data. Inaccurate responses negatively affect decisions in individual selection contexts. In the military context especially, personality scores that include inaccurate responses can result in the selection of inappropriate personnel or allow enlistment dodgers to avoid their military duty. This study conducted an IRT-based person-fit analysis of dichotomous responses to the Korean Military Personality Inventory. The 2PL model was applied to the data, and the person-fit index l_z was used to detect aberrant respondents. Based on each respondent's l_z value, potentially inaccurate respondents were identified, and person response curves (PRCs) were assessed to diagnose possible sources of the aberrant response patterns. The results with these empirical military data show that person-fit analysis using l_z is an applicable and practical method for detecting inaccurate response patterns in personnel selection contexts based on personality measurement.
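A minimal sketch of the l_z person-fit statistic under the 2PL model (following Drasgow, Levine, and Williams, 1985); the item parameters and response patterns are illustrative, not taken from the Korean Military Personality Inventory.

```python
# Sketch of the standardized person-fit statistic l_z under the 2PL model:
# the observed pattern log-likelihood is centered and scaled by its model-
# implied mean and variance at the person's ability. Illustrative values only.
import numpy as np

def lz(responses, theta, a, b):
    """l_z for one person: standardized log-likelihood of the response pattern."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    l0 = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    variance = np.sum(p * (1 - p) * (np.log(p / (1 - p))) ** 2)
    return (l0 - expected) / np.sqrt(variance)

a = np.array([1.1, 0.9, 1.4, 1.0, 1.2, 0.8, 1.3, 1.0])
b = np.array([-1.5, -1.0, -0.5, 0.0, 0.3, 0.7, 1.2, 1.8])   # easy to hard
theta = 0.4
consistent = np.array([1, 1, 1, 1, 1, 0, 0, 0])   # misses only the hardest items
aberrant   = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # misses easy, passes hard items
print(f"consistent pattern: l_z = {lz(consistent, theta, a, b):.2f}")
print(f"aberrant pattern:   l_z = {lz(aberrant, theta, a, b):.2f}")  # large negative
```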
139. Measuring Dementia of the Alzheimer Type More Precisely
Lowe, Deborah Anne (14 March 2013)
Alzheimer’s disease (AD) progressively impairs cognitive and functional abilities. Research on pharmacological treatment of AD is shifting to earlier forms of the disease, including preclinical stages. However, assessment methods traditionally used in clinical research may be inappropriate for these populations. The Alzheimer Disease Assessment Scale-cognitive (ADAS-cog), a commonly used cognitive battery in AD research, is most sensitive in the moderate range of cognitive impairment. It focuses on immediate recall and recognition aspects of memory rather than retention and delayed recall. As clinical trials for dementia continue to focus on prodromal stages of AD, instruments need to be retooled to focus on cognitive abilities more prone to change in the earliest stages of the disease. One such domain is delayed recall, which is differentially sensitive to decline in the earliest stages of AD. A supplemental delayed recall subtest for the ADAS-cog is commonly implemented, but we do not know precisely where along the spectrum of cognitive dysfunction this subtest yields incremental information beyond what is gained from the standard ADAS-cog. An item response theory (IRT) approach can analyze this in a psychometrically rigorous way. This study’s aims are twofold: (1) to examine where along the AD spectrum the delayed recall subtest yields optimal information about cognitive dysfunction, and (2) to determine if adding delayed recall to the ADAS-cog can improve prediction of functional outcomes, specifically patients’ ability to complete basic and instrumental activities of daily living.
Results revealed differential functioning of ADAS-cog subtests across the dimension of cognitive impairment. The delayed recall subtest provided optimal information and increased the ADAS-cog’s measurement precision in the relatively mild range of cognitive dysfunction. Moreover, the addition of delayed recall to the ADAS-cog, consistent with my hypothesis, increased covariation with instrumental but not basic activities of daily living. These findings provide evidence that the delayed recall subtest slightly improves the ADAS-cog’s ability to capture information about cognitive impairment in the mild range of severity and thereby improves prediction of instrumental functional deficits.
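The following dichotomous 2PL sketch illustrates the measurement-precision argument in general terms, with purely hypothetical parameters rather than ADAS-cog estimates: adding a unit targeted at milder impairment lowers the standard error of measurement mainly in that region of the continuum.

```python
# Sketch of test information and standard error under the 2PL model: four
# "standard" units centered in moderate impairment, plus one added unit
# targeted at the milder range. Parameters are hypothetical, not ADAS-cog.
import numpy as np

def info_2pl(theta, a, b):
    """Fisher information contributed by one 2PL unit at trait level theta."""
    p = 1 / (1 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

theta = np.linspace(-3, 3, 7)                    # impairment continuum
a_base = np.array([1.2, 1.0, 1.4, 1.1])          # "standard" units
b_base = np.array([0.5, 1.0, 1.5, 0.8])          # centered in the moderate range
a_new, b_new = 1.3, -0.8                         # added unit targeting the mild range

base_info = sum(info_2pl(theta, a, b) for a, b in zip(a_base, b_base))
total_info = base_info + info_2pl(theta, a_new, b_new)
for t, i0, i1 in zip(theta, base_info, total_info):
    # Standard error of measurement is 1 / sqrt(information)
    print(f"theta {t:+.1f}: SE {1/np.sqrt(i0):.2f} -> {1/np.sqrt(i1):.2f}")
```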
140. Item response theory and factor analysis applied to the Neuropsychological Symptom Scale (NSS)
Lutz, Jacob T. (21 July 2012)
The Neuropsychological Symptom Inventory (NSI; Rattan, Dean, & Rattan, 1989), a self-report measure of psychiatric and neurological symptoms, was revised to be presented in an electronic format. This revised instrument, the Neuropsychological Symptom Scale (NSS; Dean, 2010), was administered to 1,141 adult volunteers from a medium-sized Midwestern university. The collected data were subjected to an exploratory factor analysis, which suggested three primary factors related to emotional, cognitive, and somatosensory functioning. The items on the NSS were then organized into three subscales reflecting these areas of functioning. A fourth, experimental subscale was also created to facilitate the collection of data on items that did not load on any of the three primary subscales. Item Response Theory (IRT) and Classical Test Theory (CTT) approaches were then applied and compared as means of developing standard scores for the three primary subscales of the NSS. The results of these analyses are provided, along with recommendations for the further development of the NSS as an assessment tool.
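As a generic illustration of the two scoring approaches compared here (with simulated items, not NSS data), the sketch below contrasts a classical test theory sum score with an IRT expected a posteriori (EAP) ability estimate for the same subscale.

```python
# Sketch contrasting CTT and IRT scoring for one simulated subscale:
# a raw sum score versus a 2PL EAP estimate over a quadrature grid.
# Items and parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(5)
n, n_items = 300, 12
a = rng.uniform(0.8, 1.6, n_items)
b = rng.normal(0, 1, n_items)
theta_true = rng.normal(size=n)
p = 1 / (1 + np.exp(-a * (theta_true[:, None] - b)))
y = (rng.random((n, n_items)) < p).astype(int)

# CTT: raw sum score
sum_score = y.sum(axis=1)

# IRT: expected a posteriori (EAP) estimate over a quadrature grid
nodes = np.linspace(-4, 4, 61)
prior = np.exp(-0.5 * nodes**2); prior /= prior.sum()
P = 1 / (1 + np.exp(-a * (nodes[:, None] - b)))           # grid x items
like = np.prod(P**y[:, None, :] * (1 - P)**(1 - y[:, None, :]), axis=2)
post = like * prior
eap = (post * nodes).sum(axis=1) / post.sum(axis=1)

print(f"r(sum, EAP) = {np.corrcoef(sum_score, eap)[0, 1]:.3f}")
print(f"r(true, sum) = {np.corrcoef(theta_true, sum_score)[0, 1]:.3f}, "
      f"r(true, EAP) = {np.corrcoef(theta_true, eap)[0, 1]:.3f}")
```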