21 |
Trend Estimation in Large-Scale Assessments under Differential Item Functioning [Trendschätzung in Large-Scale Assessments bei differenziellem Itemfunktionieren]. Sachse, Karoline A., 27 February 2020.
Differential item functioning violates a prerequisite of trend estimation via the linking of cross-sectional large-scale assessments. Such violations can degrade the properties of trend estimators and thereby limit the interpretability of trend estimates. Embedded within an overarching framework, this dissertation presents three individual contributions that examine the effects of differential item functioning of different origins.
The first article examines the interactions of linking designs and linking methods with items that function differentially, and unsystematically so, across countries and over time. The choice of linking design can be of great importance, whereas the performance differences between common linking methods were marginal. In addition, the exclusion of differentially functioning items, an approach frequently used in practice, led to a loss of efficiency.
The second contribution quantifies the uncertainty in trend estimation that results from items functioning unsystematically differently across countries and over time, and incorporates this uncertainty into the calculation of the trends' standard errors.
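One common way to operationalize this idea, sketched here under the assumption that the trend is a difference of country means between two assessment cycles (the thesis's exact estimator is not reproduced here), is to add a linking-error component to the sampling variances:

\[
SE\left(\hat{\mu}_{t_2}-\hat{\mu}_{t_1}\right)=\sqrt{SE_{\mathrm{sampling},t_1}^{2}+SE_{\mathrm{sampling},t_2}^{2}+\hat{\sigma}_{\mathrm{link}}^{2}},
\]

where \(\hat{\sigma}_{\mathrm{link}}^{2}\) estimates the variance induced by item-by-country-by-time interactions, for example obtained by resampling the link items.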
The third article focuses on differential item functioning induced by missing values and by nonresponse mechanisms that change over time. When the missing values were treated inappropriately, the trend estimators lost their unbiasedness, their consistency, and their efficiency.
In sum, this dissertation identifies and emphasizes that, depending on the type of differential item functioning, effective ways of dealing with it exist under the investigated conditions, and that these can at least partially counteract potential limitations so that trend estimates can still be interpreted validly.
|
22 |
Psychometric Analysis of the Physical Education Items of the Brazilian National High School Exam (ENEM) via Classical Test Theory [Análise psicométrica dos itens de educação física do Exame Nacional do Ensino Médio (ENEM) via teoria clássica dos testes]. Leandro Araújo de Sousa, 31 January 2017.
nÃo hà / Nos Ãltimos anos tem crescido a importÃncia das avaliaÃÃes em larga escala no contexto brasileiro, com destaque nesse cenÃrio o Exame Nacional do Ensino MÃdio (ENEM). Com sua reformulaÃÃo em 2009, competÃncias e habilidades da Ãrea de EducaÃÃo FÃsica tÃm sido inseridas na matriz de referÃncia desse exame. Nesse mesmo ano à alterado tambÃm o mÃtodo de anÃlise dos resultados, realizado a partir da Teoria ClÃssica dos Testes (TCT), passando a ser utilizada a Teoria de Resposta ao Item (TRI), sob justificativa de ser mais adequada por permitir a comparabilidade dos resultados. Com isso, esta pesquisa objetivou analisar os itens de EducaÃÃo FÃsica do ENEM dos anos de 2009 a 2014 a partir da TCT. Para tanto, utilizou-se os microdados do exame disponibilizados pelo Instituto Nacional de Estudos e Pesquisas Educacionais AnÃsio Teixeira (INEP). Foram analisados os seguintes parÃmetros mÃtricos: validade, fidedignidade, dificuldade e discriminaÃÃo. Utilizou-se como recurso o software SPSS, versÃo 20.0. Os itens apresentaram bons valores de correlaÃÃo e adequaÃÃo da amostra de itens. Apresentaram escores de comunalidade e cargas fatoriais inadequados para composiÃÃo da prova. A AnÃlise Fatorial ExploratÃria apresentou baixa explicaÃÃo da variÃncia considerando apenas um fator, mesmo a anÃlise grÃfica (scree plot) indicando a unidimensionalidade do teste. Os valores de fidedignidade da prova foram bons, nÃo havendo influÃncia dos itens de EducaÃÃo FÃsica. A dificuldade e discriminaÃÃo apresentaram valores aceitÃveis em quase todos os anos. No entanto, em 2014 a prova nÃo apresentou unidimensionalidade, considerando a variÃncia explicada, bem como na anÃlise grÃfica. Neste ano, os itens apresentaram alta dificuldade e baixa discriminaÃÃo. Dessa forma, conclui-se que as provas de Linguagens e CÃdigos do ENEM apresentaram dificuldades de comprovaÃÃo da unidimensionalidade, embora, tenha apresentado boa precisÃo, com exceÃÃo de 2014 e alguns itens de EducaÃÃo FÃsica do exame nÃo apresentaram parÃmetros adequados. Tais fatores podem comprometer a validade da medida e consequentemente dos resultados desse exame. / In recent years the importance of large-scale evaluations in the Brazilian context has grown, with emphasis in this scenario on the National High School Examination (ENEM). With its reformulation in 2009, the skills and abilities of the Physical Education have been inserted in the reference matrix of this exam. In that same year, the method of analysis of the results, based on the Classical Tests Theory (CTT), was also changed, using the Item Response Theory (IRT), under justification of being more adequate to allow the comparability of the Results. With this, this research aimed to analyze the Physical Education items of ENEM from the years 2009 to 2014 from the CTT. For that, we used the microdata of the exam provided by the National Institute of Studies and Educational Research AnÃsio Teixeira (INEP). The following metric parameters were analyzed: validity, reliability, difficulty and discrimination. SPSS software version 20.0 was used as a resource. The items presented good correlation values and adequacy of the item sample. They presented scores of commonality and factorial loads inadequate for the composition of the test. The Exploratory Factor Analysis presented low explanation of the variance considering only one factor, even the scree plot indicating that the test is unidimensionality. The reliability values of the test were good, with no influence of physical education items. 
The difficulty and discrimination presented values acceptable in almost every year. However, in 2014 the test did not present unidimensionality, considering the explained variance, as well as in the graphic analysis. This year, the items presented high difficulty and low discrimination. Thus, it is concluded that the Language and Codes tests of the ENEM presented difficulties in proving the unidimensionality, although it presented good accuracy, with the exception of 2014 and some Physical Education items of the exam did not present adequate parameters. Such factors may compromise the validity of the measure and consequently the results of such examination.
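As an illustration of the two CTT item statistics named above, difficulty (proportion correct) and discrimination (corrected item-total correlation), here is a minimal Python sketch with made-up dichotomous response data; the thesis itself used SPSS, so this is only an assumed equivalent, not the original analysis:

```python
import numpy as np

# Hypothetical 0/1-scored responses generated from a single latent ability,
# so that items correlate as they would on a real test.
rng = np.random.default_rng(0)
ability = rng.normal(0.0, 1.0, (500, 1))      # 500 examinees
item_difficulty = rng.normal(0.0, 1.0, 10)    # 10 items
prob = 1.0 / (1.0 + np.exp(-(ability - item_difficulty)))
responses = (rng.random((500, 10)) < prob).astype(int)

total = responses.sum(axis=1)

for j in range(responses.shape[1]):
    item = responses[:, j]
    p = item.mean()                        # CTT difficulty: proportion correct
    rest = total - item                    # total score excluding this item
    r_it = np.corrcoef(item, rest)[0, 1]   # corrected item-total correlation
    print(f"item {j + 1:2d}: p = {p:.2f}, r_it = {r_it:.2f}")

# Cronbach's alpha as the reliability estimate for the whole set of items.
k = responses.shape[1]
alpha = k / (k - 1) * (1 - responses.var(axis=0, ddof=1).sum() / total.var(ddof=1))
print(f"Cronbach's alpha = {alpha:.2f}")
```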
|
23 |
Setting Accommodation and Item Difficulty. Lin, Pei-Ying, 31 August 2012.
This study used multilevel measurement modeling to examine the differential difficulties of math and reading items for Grade 6 students participating in Ontario’s provincial assessment in 2005-2006, in relation to whether they received a setting accommodation, had a learning disability (LD), and spoke a language in addition to English. Both differences in difficulty between groups of students for all items (impact) and for individual items (differential item functioning) were examined.
Students' language backgrounds (whether they spoke a language in addition to English) were not significantly related to item difficulty. Math and reading items were relatively more difficult for accommodated students with LD than for non-accommodated students with LD, and this difference in overall impact was larger for math items than for reading items. Overall, non-accommodated students without LD outperformed students with LD (accommodated or not) as well as accommodated students without LD.
It is important to note that, because this was an operational test administration, students were assigned to receive accommodations by their schools based on their individual needs. It is, therefore, not possible to separate the effect of the setting accommodation on item difficulty from the effects of other differences between the accommodated and non-accommodated groups. The differences in math and reading item difficulties between accommodated and non-accommodated students with LD may be due in part to factors such as comorbidity of LD and attention deficit hyperactivity disorder (ADHD) or a possible mismatch between the setting accommodation and the areas of disability. Moreover, the results support the underarousal/optimal-stimulation hypothesis for the use of setting accommodations rather than the inhibitory control and attention premise.
After controlling for the impact across all items of setting accommodation and LD, several math and reading items were found to exhibit differential item functioning (DIF). The possible sources of DIF were (1) math items that were not adherent to specific item-writing rules and (2) reading items targeting different types of comprehension.
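The study used multilevel measurement models; as a simpler, hedged stand-in for how DIF can be flagged after conditioning on overall ability, the sketch below uses the well-known logistic-regression DIF screen with simulated data (all variable names and data are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data for one item: a total test score, group membership
# (1 = accommodated students with LD, 0 = non-accommodated), and a response.
rng = np.random.default_rng(1)
n = 1000
score = rng.normal(0.0, 1.0, n)
group = rng.integers(0, 2, n)

# Simulate uniform DIF: the item is harder for the focal group at equal ability.
true_logit = 1.2 * score - 0.5 * group
item = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

# Logistic-regression DIF screen (Swaminathan & Rogers): after conditioning on
# the total score, a significant group effect indicates uniform DIF and a
# significant score-by-group interaction indicates nonuniform DIF.
X = sm.add_constant(np.column_stack([score, group, score * group]))
fit = sm.Logit(item, X).fit(disp=0)
print(fit.summary(xname=["const", "score", "group", "score_x_group"]))
```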
This study also found that the linguistic features of math items (total words, total sentences, average word length, monosyllabic words for math) and reading items (word frequency, average sentence length, and average words per sentence for reading) were associated with math and reading item difficulties for students with different characteristics. The total sentences and average word length in a math item as well as total words in a reading item significantly predicted the achievement gap between groups. Therefore, the linguistic features should be taken into account when assessments are developed and validated for examinees with varied characteristics.
|
24 |
Understanding Reading Comprehension Performance in High School Students. Kwiatkowska-White, Bozena, 28 August 2012.
The ability to extract meaning from text is an important skill, yet many students struggle to comprehend what they read effectively. Compared with research on younger students, research on the reading comprehension of adolescents (Grades 4-12) is scarce. The goal of this dissertation was to increase our understanding of the factors that underlie poor reading comprehension in this older group. The dissertation includes two studies drawn from a sample of 137 fifteen-year-old high school students. Study One utilized archival data from government-mandated tests of reading achievement administered to 78 students in Grades 3, 6, and 10, and results from a commercially available test of reading comprehension administered in Grade 10. This longitudinal study examined the prevalence of the stability, cumulative-growth, and compensatory models of reading comprehension development. Probabilities of later-grade reading achievement categorization conditioned on earlier-grade reading achievement were computed, the prevalence of developmental paths was estimated, and tests of regression to the mean were conducted. Overall findings suggest considerable stability across time.
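A minimal sketch of the conditional-probability step described above, using hypothetical achievement categories rather than the study's actual data:

```python
import numpy as np
import pandas as pd

# Hypothetical achievement categories for the same students at two grades.
rng = np.random.default_rng(2)
levels = np.array(["below", "at", "above"])
grade3 = rng.choice(levels, size=200, p=[0.3, 0.5, 0.2])
# Simulate strong stability: most students keep their earlier category.
keep = rng.random(200) < 0.7
grade10 = np.where(keep, grade3, rng.choice(levels, size=200))

# P(Grade 10 category | Grade 3 category): a row-normalized transition table.
table = pd.crosstab(pd.Series(grade3, name="grade3"),
                    pd.Series(grade10, name="grade10"), normalize="index")
print(table.round(2))  # a strong diagonal is evidence for the stability model
```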
Study Two examined the specificity of the comprehension weaknesses of fifteen-year-old readers whose comprehension skills fall below those expected from their word-reading skill and nonverbal ability (unexpected poor comprehenders). Regression analyses identified unexpected poor comprehenders and two contrast groups (expected average and unexpected good comprehenders). Characteristics of unexpected poor comprehenders were examined after controlling for word-reading accuracy, phonological decoding, reading rate, nonverbal ability, and vocabulary. Findings indicate that a critical disadvantage of unexpected poor comprehenders lies in their weak vocabulary, and that comprehension difficulties related to identifying details and main ideas in summary writing remain when vocabulary is controlled. Implications for interpreting previous research and informing future research are discussed.
Results of both studies are discussed with respect to the nature of the reading comprehension construct, the identification and remediation of reading comprehension difficulties, and the assessment of reading comprehension. Thesis (Ph.D., Education), Queen's University, 2012.
|
25 |
Examining organizational learning conditions and student outcomes using the Programme for International Student Assessment (PISA): A Canada and Saskatchewan school context. 2015.
The purpose was to investigate the relationship between Canadian and Saskatchewan PISA 2009 reading performance and organizational learning (OL) conditions as perceived by students and principals when selected student and school characteristics were taken into consideration. Gender, Aboriginal status, and socioeconomic status were the student characteristics that were considered. School size, urban versus rural school community, proportion of students self-identified as Aboriginal, and school average socioeconomic status were school characteristics taken into consideration.
A nationally representative sample of 978 schools and 23,207 15-year-old students across the ten Canadian provinces participated in PISA 2009. Within this sample, 1,997 students and 99 schools were from Saskatchewan.
Principal components analyses were conducted to produce components for calculating two composite OL indices: a Student OL Index based on the Canadian and OECD PISA student questionnaires, and a School OL Index based on the OECD PISA school questionnaire. Subsequently, two hierarchical linear modelling analyses were employed to examine the association of the student-level and school-level OL indices with reading performance. Across Canadian and Saskatchewan schools, students' perceptions of OL conditions were positively associated with reading performance in the presence of the selected student and school characteristics. Except for one school-level OL component (the principal's perspective on school culture/environment) in the Canadian model, school-level OL conditions were not significantly associated with reading performance in the presence of student and school characteristics.
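A hedged sketch of this two-step approach (compressing questionnaire items into a composite index with PCA, then fitting a two-level students-within-schools model), using synthetic data and hypothetical variable names:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic student-level data: four questionnaire items, a school id,
# and a reading score on a PISA-like scale. All names are hypothetical.
rng = np.random.default_rng(3)
n = 2000
df = pd.DataFrame(rng.normal(size=(n, 4)), columns=["q1", "q2", "q3", "q4"])
df["school"] = rng.integers(0, 100, n)
df["reading"] = 500.0 + 20.0 * df["q1"] + rng.normal(0.0, 80.0, n)

# Step 1: compress the questionnaire items into a composite OL index
# (first principal component of the standardized items).
items = StandardScaler().fit_transform(df[["q1", "q2", "q3", "q4"]])
df["ol_index"] = PCA(n_components=1).fit_transform(items).ravel()

# Step 2: two-level model (students nested in schools) with a random
# school intercept, regressing reading performance on the OL index.
fit = smf.mixedlm("reading ~ ol_index", df, groups=df["school"]).fit()
print(fit.summary())
```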
After adjusting for the student and contextual characteristics incorporated in the modelling, average reading performance was comparable across Canadian and Saskatchewan schools (528 and 523, respectively). Variance decomposition of the final models indicated that the selected student and school characteristics, together with students' perceptions of OL conditions, explained 55% of the school-level variance in reading achievement in Canada and 68% in Saskatchewan.
The findings from this study supported the hypothesis that OL conditions are associated with student achievement. Additionally, it was noted that the effect of OL conditions was of similar magnitude to that of the socioeconomic status effect. Furthermore, the findings from this study further emphasized the importance of the student voice within the school OL framework.
|
26 |
Relationships between Missing Response and Skill Mastery Profiles of Cognitive Diagnostic Assessment. Zhang, Jingshun, 13 August 2013.
This study explores the relationship between students’ missing responses on a large-scale assessment and their cognitive skill profiles and characteristics. Data from the 48 multiple-choice items on the 2006 Ontario Secondary School Literacy Test (OSSLT), a high school graduation requirement, were analyzed using the item response theory (IRT) three-parameter logistic model and the Reduced Reparameterized Unified Model, a Cognitive Diagnostic Model. Missing responses were analyzed by item and by student. Item-level analyses examined the relationships among item difficulty, item order, literacy skills targeted by the item, the cognitive skills required by the item, the percent of students not answering the item, and other features of the item. Student-level analyses examined the relationships among students’ missing responses, overall performance, cognitive skill mastery profiles, and characteristics such as gender and home language.
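For reference, the three-parameter logistic model mentioned above specifies the probability that examinee \(i\) answers item \(j\) correctly as (standard form; the thesis's exact parameterization is not given in the abstract):

\[
P(X_{ij}=1\mid\theta_i)=c_j+(1-c_j)\,\frac{1}{1+\exp\left(-a_j(\theta_i-b_j)\right)},
\]

with item discrimination \(a_j\), difficulty \(b_j\), pseudo-guessing parameter \(c_j\), and latent ability \(\theta_i\).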
Most students answered most items: every item was answered by at least 98.8% of students, and 95.5% of students had no missing responses, 3.2% had one missing response, and only 1.3% had more than one. However, whether students responded to items was related to student characteristics, including gender, whether the student had an individual education plan, and the language spoken at home, and to item characteristics such as difficulty and the cognitive skills required to answer the item.
Unlike in previous studies of large-scale assessments, the missing response rates were not higher for multiple-choice items appearing later in the timed sections. Instead, the first two items in some sections had higher missing response rates. Examination of the student-level missing response rates, however, showed that when students had high numbers of missing responses, these often represented failures to complete a section of the test. Also, if nonresponse was concentrated in items that required particular skills, the accuracy of the estimates for those skills was lower than for other skills.
The results of this study have implications for test designers who seek to improve provincial large-scale assessments, and for teachers who seek to help students improve their cognitive skills and develop test taking strategies.
|
27 |
Cut Once, Measure Everywhere: The Stability of Percentage of Students Above a Cut Score. Hollingshead, Lynne Marguerite, 26 July 2010.
Large-scale assessment results for schools, school boards/districts, and entire provinces or states are commonly reported as the percentage of students achieving a standard, that is, the percentage of students scoring above the cut score that defines the standard on the assessment scale. Recent research has shown that this method of reporting is sensitive to small changes in the cut score, especially when comparing results across years or between groups. This study extends that work by investigating the effects of reporting-group size on the stability of results. For each of ten group sizes, 1,000 samples were drawn with replacement from the May 2009 Ontario Grade 6 Assessment of Reading, Writing and Mathematics. The results showed that for small group sizes, analogous to small schools, the reported percentages are highly unstable, and extreme caution must be taken when interpreting differences observed between years or groups.
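A minimal sketch of the resampling logic described above, with a synthetic score distribution and an arbitrary cut score standing in for the actual assessment data:

```python
import numpy as np

rng = np.random.default_rng(4)
population = rng.normal(300.0, 50.0, 100_000)  # synthetic score distribution
cut = 300.0                                    # hypothetical cut score

for group_size in (10, 30, 100, 500, 2000):
    # 1000 resamples with replacement, mirroring the study's design.
    samples = rng.choice(population, size=(1000, group_size), replace=True)
    pct_above = (samples > cut).mean(axis=1) * 100.0
    lo, hi = np.percentile(pct_above, [2.5, 97.5])
    print(f"group size {group_size:5d}: middle 95% of resampled "
          f"percentages spans {lo:.1f} to {hi:.1f}")
```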
|