221

The influence of sample size, effect size, and percentage of DIF items on the performance of the Mantel-Haenszel and logistic regression DIF identification procedures.

Kennedy, Michael. January 1994
The frequent use of standardized tests for admission, advancement, and accreditation has increased public awareness of measurement issues, in particular test and item bias. The logistic regression (LR) and Mantel-Haenszel (MH) procedures are relatively new methods of detecting item bias, or differential item functioning (DIF), in tests. The performance of these two procedures has been compared in only a few studies. In the present study, sample size, effect size, and the percentage of DIF items in the test were manipulated in order to compare detection rates of uniform DIF by the LR and MH procedures. Simulated data with known amounts of DIF were used to evaluate the effects of these variables on DIF detection rates. In detecting uniform DIF, the LR procedure had a slight advantage over the MH procedure, at the cost of increased false-positive rates. The p-value difference was a more accurate measure of the amount of DIF than the b-value difference. (Abstract shortened by UMI.)
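For context, the MH procedure aggregates 2x2 (group by correct/incorrect) tables across total-score strata. Below is a minimal Python sketch of the MH chi-square with continuity correction; the function name, the simulated data, and the choice of the raw total score as the matching variable are assumptions for illustration, not details taken from the thesis.

```python
# A minimal sketch of the Mantel-Haenszel uniform-DIF statistic, assuming
# dichotomous (0/1) item scores and the total test score as the matching
# variable. All names and data here are illustrative.
import numpy as np

def mantel_haenszel_chi2(item, total, group):
    """MH chi-square (with continuity correction) for one studied item.

    item  : 0/1 responses to the studied item
    total : total test scores used to stratify examinees into ability levels
    group : 0 = reference group, 1 = focal group
    """
    a_sum, exp_sum, var_sum = 0.0, 0.0, 0.0
    for k in np.unique(total):                        # one 2x2 table per level
        idx = total == k
        a = np.sum((item == 1) & (group == 0) & idx)  # reference correct
        b = np.sum((item == 0) & (group == 0) & idx)  # reference incorrect
        c = np.sum((item == 1) & (group == 1) & idx)  # focal correct
        d = np.sum((item == 0) & (group == 1) & idx)  # focal incorrect
        n = a + b + c + d
        if n < 2:
            continue                                  # skip degenerate strata
        n_r, n_f = a + b, c + d                       # group margins
        m1, m0 = a + c, b + d                         # correct/incorrect margins
        a_sum += a
        exp_sum += n_r * m1 / n                       # E(a) under no DIF
        var_sum += n_r * n_f * m1 * m0 / (n**2 * (n - 1))
    return (abs(a_sum - exp_sum) - 0.5) ** 2 / var_sum

# Illustrative use with random data (no real DIF expected):
rng = np.random.default_rng(1)
group = rng.integers(0, 2, 1000)
resp = rng.integers(0, 2, (1000, 20))
chi2 = mantel_haenszel_chi2(resp[:, 0], resp.sum(axis=1), group)
print(f"MH chi-square: {chi2:.2f}")   # compare to the chi2(1) critical value 3.84
```

Values exceeding 3.84 would flag an item at the .05 level; operational uses typically also report an effect size such as the MH delta.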
222

Distribution and power of selected item bias indices: A Monte Carlo study.

Ibrahim, Abdul K. January 1992
This study examines the following DIF procedures: Transformed Item Difficulty (TID), Full chi-square, Mantel-Haenszel chi-square, Mantel-Haenszel delta, logistic regression, SOS2, SOS4, and Lord's chi-square, under three sample sizes, two test lengths, four cases of item discrimination arrangement, and three item difficulty levels. The study is in two parts: the first examines the distributions of the indices under null (no-bias) conditions; the second deals with the power of the procedures to detect known bias in simulated test data. Agreement among the procedures is also addressed. Lord's chi-square performed very well: its detection rates were very good, and its percentiles were not affected by discrimination level or test length. In retrospect, one would like to know how well it might do at smaller sample sizes. When the tabled values were used, it performed equally well in detecting bias and improved in reducing false-positive rates. Of the other indices, the Mantel-Haenszel and logistic regression indices seemed best. The Camilli chi-square had a number of problems; its tabled values were not at all useful for detecting bias. The TID was somewhat better but has no significance test associated with it, so one would need to rely on baseline studies to use it. For uniform bias, either the Mantel-Haenszel chi-square or logistic regression is recommended, while for nonuniform bias logistic regression is appropriate. Notably, Lord's chi-square was effective for detecting either kind of bias. Sample size is known to be related to chi-square values; for each of the chi-square indices, the observed values were considerably lower than the tabled values. Of course, these were conditions where no bias was present except what might be randomly induced in data generation. Perhaps it is in those instances where bias is truly present that larger sample sizes make biased items easier to identify. Certainly, the proportion of biased items detected was greater at large sample sizes for the Camilli chi-square, Mantel-Haenszel chi-square, and logistic regression chi-squares.
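For context on the logistic regression index recommended above, the common formulation (in the spirit of Swaminathan and Rogers, 1990) regresses the item response on the matching score, group membership, and their interaction, then tests the group and interaction terms with a likelihood-ratio statistic. Below is a hedged Python sketch using statsmodels; the data generation and all variable names are illustrative assumptions, not details from the study.

```python
# A sketch of the logistic-regression DIF test: a 2-df likelihood-ratio
# comparison of the full model (score + group + interaction) against the
# score-only baseline. Simulated data are illustrative only.
import numpy as np
import statsmodels.api as sm

def lr_dif_test(item, total, group):
    X_full = sm.add_constant(np.column_stack([total, group, total * group]))
    full = sm.Logit(item, X_full).fit(disp=0)
    base = sm.Logit(item, sm.add_constant(total)).fit(disp=0)
    # Joint test of uniform (group) and nonuniform (interaction) DIF
    return 2 * (full.llf - base.llf)

rng = np.random.default_rng(7)
n = 1000
group = rng.integers(0, 2, n)
theta = rng.normal(0, 1, n)
# Induce uniform DIF: the item is 0.5 logits harder for the focal group
p = 1 / (1 + np.exp(-(theta - 0.5 * group)))
item = rng.binomial(1, p)
# Total score = studied item plus 19 DIF-free items of middling difficulty
rest = rng.binomial(1, 1 / (1 + np.exp(-theta[:, None])), (n, 19))
total = item + rest.sum(axis=1)
lr_stat = lr_dif_test(item, total, group)
print(f"LR chi-square (2 df): {lr_stat:.2f}")  # ~5.99 critical at alpha = .05
```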
223

A correlational study of the links between anticipation, preparation, and self-assessment and performance on a scholastic achievement examination.

Fournier, Charles. January 1991
Abstract Not Available.
224

Assessing test dimensionality using two approximate chi-square statistics.

De Champlain, André F. January 1992
Abstract Not Available.
225

An empirical study of the consistency of differential item functioning detection.

Brown, Paulette C. January 1992
Examinees' total scores on any given standardized test are used to provide reliable and objective information about the overall performance of test takers. When the probability of successfully responding to a test item is not the same for examinees at the same ability level but from different groups, the item functions differentially in favour of one group over the other. This type of problem, defined as differential item functioning (DIF), creates a disadvantage for members of certain subgroups of test takers. Test items need to be accurate and valid measures for all groups, because test results may be used to make significant decisions that affect the future opportunities available to test takers. Thus, DIF is an issue of concern in the field of educational measurement. The purpose of this study was to investigate how well the Mantel-Haenszel (MH) and logistic regression (LR) procedures perform in identifying items that function differentially across gender groups and regional groups. The research questions concerned three issues: (1) the detection rates for DIF items and for items that did not exhibit DIF, (2) the agreement between the MH and LR methods in the detection of DIF items, and (3) the effectiveness of these indices across sample sizes and over replications. (Abstract shortened by UMI.)
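One simple way to quantify the agreement question in (2) is to compare the flag decisions made by the two procedures across items, using raw percent agreement plus a chance-corrected index such as Cohen's kappa. The sketch below runs on fabricated flag vectors; it illustrates the computation only and is not the analysis reported in the thesis.

```python
# Percent agreement and Cohen's kappa between two sets of item flags,
# with hypothetical data standing in for MH and LR decisions.
import numpy as np

def agreement(flags_mh, flags_lr):
    flags_mh, flags_lr = np.asarray(flags_mh), np.asarray(flags_lr)
    p_obs = np.mean(flags_mh == flags_lr)            # raw percent agreement
    p_mh, p_lr = flags_mh.mean(), flags_lr.mean()
    p_exp = p_mh * p_lr + (1 - p_mh) * (1 - p_lr)    # agreement by chance
    return p_obs, (p_obs - p_exp) / (1 - p_exp)      # (agreement, kappa)

# Hypothetical flags for 40 items from the two procedures:
rng = np.random.default_rng(3)
mh = rng.random(40) < 0.15
lr = mh ^ (rng.random(40) < 0.10)   # LR mostly agrees, with some flips
p_obs, kappa = agreement(mh, lr)
print(f"agreement = {p_obs:.2f}, kappa = {kappa:.2f}")
```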
226

Detecting DIF in polytomous item responses.

Tian, Fang. January 1999
The use of polytomously scored items in educational tests is becoming increasingly common with new types of assessments such as performance assessment. It is believed that performance assessment provides more equitable approaches to testing than traditional multiple-choice tests. However, some forms of performance assessment may be more likely than conventional tests to be influenced by construct-irrelevant factors, resulting in differential item functioning (DIF). Several methods have been proposed for DIF assessment with polytomous item responses, such as the Mantel procedure (Mantel, 1963), the generalized Mantel-Haenszel procedure (GMH) (Mantel & Haenszel, 1959), the polytomous extension of the standardization approach (STND) (Potenza & Dorans, 1995), the polytomous extension of SIBTEST (Chang, Mazzeo, & Roussos, 1996), and logistic discriminant function analysis (LDFA) (Miller & Spray, 1993). The purpose of this study was to investigate and compare the performance of the Mantel, GMH, STND, SIBTEST, and LDFA (LDFA-Uniform and LDFA-Nonuniform) procedures in detecting DIF with polytomous item responses under a variety of conditions. A simulation study was conducted to evaluate their Type I error and power in detecting DIF when the properties of the items were known. The factors considered were group ability distribution differences (three levels), test length (two levels), sample size (three levels), sample size ratio (two levels), item discrimination or the item-discrimination parameter difference between reference and focal groups (three levels), and DIF condition (null DIF, constant uniform DIF, balanced uniform DIF, or nonuniform DIF) (four levels). Based on the findings, all procedures had good and comparable Type I error control when the groups were equal in ability, when the groups had a small ability difference and item discrimination was not high, or when the groups had a large ability difference and item discrimination was low. Both LDFA-Nonuniform and GMH can be used to detect nonuniform DIF, LDFA-Nonuniform being more powerful when the two groups had equal ability distributions and GMH more powerful when the distributions differed. Finally, the results indicated that the LDFA procedure is preferable to the other procedures. (Abstract shortened by UMI.)
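Of the procedures compared, LDFA reverses the usual regression: group membership, rather than the item response, is the outcome, which lets the item score enter as a predictor even when it is polytomous. The sketch below follows the general logic of Miller and Spray (1993); whether their exact specification matches this nested-model likelihood-ratio form is an assumption here, as are the simulated data and variable names.

```python
# A hedged sketch of logistic discriminant function analysis (LDFA) for one
# polytomous item: group is predicted from total score, then the item score
# (uniform DIF) and the item-by-total interaction (nonuniform DIF) are added.
import numpy as np
import statsmodels.api as sm

def ldfa(item, total, group):
    base = sm.Logit(group, sm.add_constant(total)).fit(disp=0)
    uni = sm.Logit(group, sm.add_constant(
        np.column_stack([total, item]))).fit(disp=0)
    non = sm.Logit(group, sm.add_constant(
        np.column_stack([total, item, item * total]))).fit(disp=0)
    chi2_uniform = 2 * (uni.llf - base.llf)     # 1 df: item term
    chi2_nonuniform = 2 * (non.llf - uni.llf)   # 1 df: interaction term
    return chi2_uniform, chi2_nonuniform

rng = np.random.default_rng(11)
n = 800
group = rng.integers(0, 2, n)
theta = rng.normal(0, 1, n)
# A 0-4 item score with mild uniform DIF favouring the focal group
item = np.clip(np.round(theta + 0.3 * group + rng.normal(0, 1, n)) + 2, 0, 4)
total = item + rng.normal(10 + 4 * theta, 2, n)  # rough polytomous total score
u, nu = ldfa(item, total, group)
print(f"uniform chi2 = {u:.2f}, nonuniform chi2 = {nu:.2f}")
```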
227

Assessing the performance of the approximate chi-square and Stout's T statistics with different test structures.

Pang, Xiao L. January 1999
The assessment of the dimensionality underlying the responses to a set of test items is a fundamental issue in applying correct IRT models to examinee ability estimation and test result interpretation. Currently, three assessment methods have been shown to be particularly promising: the original Stout's T (Stout's T1) and the refined Stout's T (Stout's T2) (Nandakumar, 1987, 1993), both based on Stout's essential unidimensionality assumption (Stout, 1987), and the approximate chi-square (De Champlain & Gessaroli, 1992), derived from McDonald's nonlinear factor analysis and based on the weak principle of local independence (McDonald, 1981). However, the three indices have been tested only under limited research conditions. The purpose of this study was to assess and compare the Type I error rates and power of the approximate chi-square, Stout's T1, and Stout's T2 in assessing the dimensionality of a set of item responses. The variables used in the Type I error study were test length (L) (40 and 80 items), sample size (N) (500, 1,000, and 2,000), and item discrimination (a) (.7, 1.0, and 1.4). A 2 x 3 x 3 design was created, and 100 replications were carried out for each cell. In the power study, in addition to the three variables used in the Type I error study, test structure (two-dimensional simple and two-dimensional complex) and dimension correlation (r = .0, .4, .57, and .7) were varied. For both the simple and the complex structure, a dimension ratio of 3:1 was set. Again, 100 replications were carried out for each combination of conditions, for a total of 14,400 generated data sets. According to the results obtained in this study, each index possesses certain advantages and drawbacks. The approximate chi-square had a Type I error rate of zero over all conditions and excellent power with the two-dimensional simple structure. Stout's T1 and Stout's T2, on the other hand, had higher Type I error rates than the approximate chi-square, ranging from zero to 12%, excellent power with the two-dimensional simple structure, and better power than the approximate chi-square with the two-dimensional complex structure. However, Stout's T1 and Stout's T2 must be used with great caution, given the unsatisfactory power they showed with the two-dimensional complex structure in many cases. (Abstract shortened by UMI.)
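Studies of this kind generate the null (unidimensional) data from a standard IRT model. As a rough illustration of one cell of the Type I error design, the following Python sketch simulates 2PL item responses at a fixed discrimination level; the difficulty distribution, seed, and function name are assumptions for demonstration, not the thesis's generation settings.

```python
# Simulate unidimensional 2PL responses for one Type I error condition.
# Parameter choices here are illustrative, not those of the study.
import numpy as np

def simulate_2pl(n_examinees, n_items, a=0.7, rng=None):
    rng = rng or np.random.default_rng()
    theta = rng.normal(0, 1, n_examinees)        # single latent trait
    b = rng.uniform(-2, 2, n_items)              # item difficulties
    # P(correct) under the 2PL model (logistic form, scaling constant omitted)
    p = 1 / (1 + np.exp(-a * (theta[:, None] - b[None, :])))
    return (rng.random((n_examinees, n_items)) < p).astype(int)

# One illustrative cell: L = 40 items, N = 1000 examinees, a = 1.0
data = simulate_2pl(1000, 40, a=1.0, rng=np.random.default_rng(5))
print(data.shape, data.mean())   # (1000, 40) and average proportion correct
```

Each dimensionality index would then be computed on 100 such replicated data sets per cell, and the rejection rate compared with the nominal alpha.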
228

Identification of gifted students using a neuropsychophysiological paradigm: An exploratory study.

Shaffer, Dianna. January 1999
The development of an accurate assessment paradigm that identifies students as capable of high performance in a variety of areas is urgently needed in education. Current strategies tend to identify only those students who are intellectually or academically gifted; gifted students who possess other strengths, or pupils who exhibit atypical behavior, are often overlooked or misidentified in the process. A few educational theorists and practitioners believe that current practice should be supplemented by some of the testing protocols typically associated with the field of neuropsychophysiology (i.e., neurology, psychology, physiology). This study develops a neuropsychophysiological paradigm that differentiates high-average students, gifted students, and gifted students with perceived behavior problems. Sixty-six students, 10 to 12 years of age, participated in the study. After being screened for incidents of psychopathology, each student completed 14 timed and untimed relaxation and performance conditions while being monitored by EEG and biofeedback technology. The data were analysed quantitatively using three non-parametric tests. The results suggest that high-average students, gifted students, and gifted students with perceived behavior problems can be differentiated using a neuropsychophysiological paradigm. Foremost among the conclusions is the suggestion that gifted students may excel in two domains, intellectual and psychophysiological, and that gifted students with perceived behavior problems may be a legitimate subgroup of the larger population.
229

A study of the elements for evaluating clinical teaching in nursing at the college level.

Pharand, Denyse. January 1999
The primary aim of the present study is to specify the intent of the process of evaluating clinical teaching, that is, the elements that may constitute the object of the evaluation, based on the perceptions of nursing professors and students at the college level. The components of the clinical teaching process and the notion of role are also examined as complementary topics. The conceptual framework of this study is based on the complementarity of research on clinical teaching in nursing: some of it addressing the definition of the clinical teaching process, the rest dealing with the evaluation of this aspect of teaching. The research was conducted with six professors and six students. First, the 12 participants were observed directly in their clinical teaching settings; each of them was then interviewed. To verify the accuracy of the data analysis, two validation steps were carried out. The first involved two nursing professors not associated with the study, who reviewed the coded interview protocols along with the corresponding concept maps. The second validation step was carried out with each of the participants. The results indicate that the privileged relationship between the professor and her students lies at the heart of the clinical teaching process. This conception of the clinical teaching process introduces the elements for evaluating clinical teaching, because the professor, who is accountable for maintaining the quality of the relationship with her students, must develop five main competencies. These competencies are (a) human, (b) pedagogical, (c) technical, (d) professional nursing, and (e) organizational. The main competencies are given concrete form by satellite competencies, which are in turn spelled out by indicators. In addition to laying the groundwork for the development of evaluation instruments, the present research should benefit both nursing professors and the administrators of teaching institutions. For the former, the competencies identified as evaluation elements offer a valuable guide for preparing for their clinical teaching role. For the latter, the results of this study provide a precise frame of reference for initiating a process for evaluating clinical teaching whose intent could be formative or summative. The qualitative and analytical approach used in this research could also serve as a generic model for developing the construct of the practical-teaching evaluation process for other professions. (Abstract shortened by UMI.)
230

A comparison of the sample invariance of item statistics from the classical test model, item response model, and structural equation model: A case study of real response data.

Breithaupt, Krista J. January 2001
The sample dependency of item statistics and the practical importance of alternative models for test scores are evaluated in this case study using real data. The hypothesized superiority of the item response model (IRM), attributed to the sample invariance of its item statistics, is tested against a classical test theory (CTT) model and a structural equation model (SEM) for responses to the Center for Epidemiologic Studies Depression (CES-D) scale. Sample invariance of item statistics is tested in 10 random samples of 500 people, and across gender, age, and different health groups. Practical implications are considered in a comparison of score reliability and individual rankings under each test model. Item estimates from a 2-parameter logistic IRM were compared with classical item difficulty and discrimination estimates, and with item regression path estimates from a single-factor SEM. An intraclass correlation coefficient (ICC) was calculated to evaluate the level of absolute agreement between item statistics in each condition. Item statistics from all test models were very similar across random samples, indicating a high level of invariance; however, IRM threshold parameters were the least sensitive to sampling. Greater variance was found among item statistics from all test models across groups that differed in age, health, or gender, with IRM discrimination estimates the most stable across contrasting groups. Rankings assigned to individuals were most similar when CTT scores and linearly transformed IRM scores were compared; the largest variation in individual rankings occurred when SEM factor scores were compared with CTT scores in the higher score ranges. The reliability estimate for factor scores based on the SEM was highest overall. However, IRM optimal scores, and the modified reliability estimate based on them, provide a more accurate estimate of average measurement error. This evidence supports the hypothesis of improved score precision when tests are constructed and scored using IRM techniques. Nevertheless, rankings based on individual CES-D scores were very similar across the CTT, IRM, and SEM techniques, so CTT or SEM scoring is a reasonable alternative to the IRM when norm-referenced score interpretations are based on CES-D scores.
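On the CTT side of such a comparison, item difficulty is the proportion answering correctly and discrimination is commonly the corrected item-total point-biserial. The sketch below computes both in two random half-samples of simulated data as a crude invariance check; the thesis used an intraclass correlation for absolute agreement, for which the plain Pearson correlation here is only a simplified stand-in, and all data and names are illustrative.

```python
# Classical item difficulty (proportion correct) and corrected item-total
# point-biserial discrimination, compared across two random half-samples.
import numpy as np

def ctt_item_stats(resp):
    p = resp.mean(axis=0)                       # classical difficulty
    total = resp.sum(axis=1)
    disc = np.array([                           # corrected point-biserial
        np.corrcoef(resp[:, j], total - resp[:, j])[0, 1]
        for j in range(resp.shape[1])
    ])
    return p, disc

rng = np.random.default_rng(9)
theta = rng.normal(0, 1, 1000)
b = rng.uniform(-1.5, 1.5, 20)
resp = (rng.random((1000, 20))
        < 1 / (1 + np.exp(-(theta[:, None] - b)))).astype(int)
half = rng.permutation(1000)
p1, d1 = ctt_item_stats(resp[half[:500]])
p2, d2 = ctt_item_stats(resp[half[500:]])
print(f"difficulty agreement r = {np.corrcoef(p1, p2)[0, 1]:.2f}")
print(f"discrimination agreement r = {np.corrcoef(d1, d2)[0, 1]:.2f}")
```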
