Global ETD Search

11	A Hierarchy of Grammatical Difficulty for Japanese EFL Learners: Multiple-Choice Items and Processability Theory Nishitani, Atsuko January 2012 This study investigated the difficulty order of 38 grammar structures obtained from a Rasch analysis of multiple-choice items. The order was compared with the order predicted by processability theory and with the order in which the structures appear in junior and senior high school textbooks in Japan. Because processability theory is based on natural speech data, a sentence repetition test was also conducted so that its result could be compared with the order obtained from the multiple-choice tests and the order predicted by processability theory. The participants were 872 Japanese university students, whose TOEIC scores ranged from 200 to 875. The difficulty order of the 38 structures was displayed according to their Rasch difficulty estimates: the most difficult structure was the subjunctive, and the easiest was the present perfect with "since" in the sentence. The order was not in accord with the order predicted by processability theory, and the difficulty order derived from the sentence repetition test was not accounted for by processability theory either. In other words, the results suggest that processability theory accounts only for natural speech data, and not for elicited data. Although the order derived from the repetition test differed from the order derived from the written tests, the two correlated strongly when the repetition test used ungrammatical sentences. This study tentatively concluded that the students could have used their implicit knowledge when answering the written tests, but it is also possible that students used their explicit knowledge when correcting ungrammatical sentences in the repetition test. The difficulty order of grammatical structures derived from this study was not in accord with the order in which the structures appear in junior and senior high school textbooks in Japan. 
Their correlation was extremely low, which suggests that there is no empirical basis for textbook writers' policy regarding the ordering of grammar items. This study also demonstrated how difficult it is to write items that test knowledge of the same grammar point and show similar Rasch difficulty estimates. Even when the vocabulary and the sentence positions were carefully controlled and two items looked parallel to teachers, the items often displayed very different difficulty estimates. A questionnaire was administered concerning such items, and the students' responses suggested that they look at the items differently than teachers do, and that what they notice and how they interpret it strongly influence item difficulty. Teachers and test-writers should be aware that it is difficult to write items that produce similar difficulty estimates and that their own intuition or experience might not be the best guide for writing effective grammar test items. It is recommended to pilot test items to get statistical information about item functioning, and to collect qualitative data from students using a think-aloud protocol, interviews, or a questionnaire. / CITE/Language Arts / Educational Tests & Measurements; English as a Second Language; A Rasch Analysis; Grammar Tests; Item Difficulty; Multiple Choice; Processability Theory; Sentence Repetition
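For readers unfamiliar with the "Rasch difficulty estimates" mentioned in the abstract above, the standard dichotomous Rasch model can be written as follows. The symbols are assumptions introduced here for illustration and do not come from the thesis: $\theta_n$ is the ability of person $n$, $b_i$ is the difficulty of item $i$, and $X_{ni}$ is 1 for a correct response and 0 otherwise.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

Under this model a person whose ability equals an item's difficulty ($\theta_n = b_i$) has a 0.5 probability of answering correctly, so ordering items by $b_i$ gives a difficulty hierarchy of the kind the abstract describes.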
12	Investigation of the Predictive Validity of Concentration Tests [Untersuchung zur prädiktiven Validität von Konzentrationstests] Schumann, Frank 12 September 2016 This thesis examined the validity of attention and concentration tests. The focus was on the influence of several critical variables on predictive validity in these tests, in particular item difficulty and item homogeneity, test length and test course, test diversification, and validity in the context of a real personnel selection. Across five studies, these variables were systematically varied and analysed for their predictive validity in the (retrograde and concurrent) prediction of school and academic performance (Realschule, Abitur, Vordiplom/Bachelor). Because the sample consisted of students (i.e. was relatively homogeneous in performance), the correlations were expected to be somewhat underestimated. However, because validity in this thesis was determined "comparatively" for particular tests or experimental conditions, this should not matter. Study 1 (N = 106) first examined how difficult the items in an arithmetic concentration test should be to ensure good predictions. To this end, easy and more difficult items were compared with respect to their correlation with the criterion. The easy and the more difficult test variants turned out to be about equally predictive. Study 2 (N = 103) examined the role of test length by comparing the predictive validity of a short and a long version of an arithmetic concentration test. The short version proved more valid than the long version, and validity in the long version decreased over the course of the test. 
Study 3 (N = 388) focused on test diversification, examining whether intelligence should be measured with a single matrix test (Wiener Matrizen-Test, WMT) or with a test battery (Intelligenz-Struktur-Test, I-S-T 2000 R) to ensure good predictive validity. The results clearly favour the matrix test, which was about as valid as the test battery but more economical to administer. Studies 4 (N = 105) and 5 (N = 97) examined predictive validity for the prediction of school performance in the context of a real personnel selection situation. While the large test batteries, the Wilde-Intelligenz-Test 2 (WIT-2) and the Intelligenz-Struktur-Test 2000 R (I-S-T 2000 R), predicted only moderately well, the Komplexer Konzentrationstest (KKT), in particular the KKT arithmetic test, was an excellent predictor of school and academic performance. On the basis of these findings, recommendations and practical guidance were given for the strategic use of test instruments in professional diagnostic practice. / concentration test; item difficulty; item homogeneity; test length; test course; test diversification; tests of attention and concentration; diversity of tests; ddc:150; achievement test; attention test; validity
13	The development and evaluation of Africanised items for multicultural cognitive assessment Bekwa, Nomvuyo Nomfusi 01 1900 "Nothing in life is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less." (Marie Curie) Debates about how best to test people from different contexts and backgrounds continue to hold the spotlight of testing and assessment. In an effort to contribute to the debates, the purpose of the study was to develop and evaluate the viability and utility of nonverbal figural reasoning ability items that were inspired by African cultural artefacts such as African material prints, art, decorations, beadwork, paintings, et cetera. The research was conducted in two phases: phase 1 focused on the development of the new items, while phase 2 was used to evaluate them. The aims of the study were to develop items inspired by African art and cultural artefacts in order to measure general nonverbal figural reasoning ability; to evaluate the viability of the items in terms of their appropriateness in representing the African art and cultural artefacts, specifically to determine the face and content validity of the items from a cultural perspective; and to evaluate the utility of the items in terms of their psychometric properties. These elements were investigated using the exploratory sequential mixed method research design, with a quantitative component embedded in phase 2. For sampling purposes, the sequential mixed method sampling design and non-probability sampling strategies were used, specifically the purposive and convenience sampling methods. The data collection methods included interviews with a cultural expert and a colour-blind person, open-ended questionnaires completed by school learners, and test administration to a group of 946 participants undergoing a sponsored basic career-related training and guidance programme. 
Content analysis was used for the qualitative data, while statistical analysis mainly based on the Rasch model was utilised for the quantitative data. The results of phase 1 were positive and provided support for further development of the new items, and based on this feedback, 200 new items were developed. This final pool of items was then used for phase 2, the evaluation of the new items. The statistical analysis of the new items indicated acceptable psychometric properties of the general reasoning ("g" or fluid ability) construct. The item difficulty values (p-values) for the new items were determined using classical test theory (CTT) analysis and ranged from 0.06 (most difficult item) to 0.91 (easiest item). Rasch analysis showed that the new items were unidimensional and that they were adequately targeted to the level of ability of the participants, although there were elements that would need to be improved. The reliability of the new items was determined using the Cronbach alpha reliability coefficient (α) and the person separation index (PSI), and both methods indicated similar indices of internal consistency (α = 0.97; PSI = 0.96). Gender-related differential item functioning (DIF) was investigated, and the majority of the new items did not indicate any significant differences between the gender groups. Construct validity was determined from the relationship between the new items and the Learning Potential Computerised Adaptive Test (LPCAT), which uses traditional item formats to measure fluid ability. The correlations between the total score of the new items and the LPCAT pre- and post-tests were 0.616 and 0.712 respectively. The new items were thus confirmed to be measuring fluid ability using nonverbal figural reasoning ability items. Overall, the results were satisfactory in indicating the viability and utility of the new items. 
The main limitation of the research was that, because the sample was not representative of the South African population, there was limited scope for generalisation. This led to a further limitation, namely that it was not possible to conduct important DIF analyses for various other subgroups. Further research has been recommended to build on this initiative. / Industrial and Organisational Psychology / Multicultural cognitive assessment; Fluid ability; Item development; African art and cultural artefacts; Culture-fair assessment; Item analysis; Rasch analysis; Reliability; Validity; Item difficulty; Differential item functioning
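The two classical statistics named in the abstract above, the CTT item difficulty (p-value, the proportion of test-takers answering an item correctly) and Cronbach's alpha, can be illustrated with a minimal sketch. The function names and the tiny response matrix are hypothetical and are not taken from the thesis; the sketch assumes dichotomous (0/1) scoring with one row per person and one column per item.

```python
from statistics import pvariance


def item_difficulty(responses):
    """CTT p-value per item: the proportion of persons who answered it correctly."""
    n_persons = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_persons for i in range(n_items)]


def cronbach_alpha(responses):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(responses[0])
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 4 persons (rows) by 3 items (columns), 1 = correct.
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_difficulty(example))
print(cronbach_alpha(example))
```

A low p-value marks a hard item and a high p-value an easy one, which is how the reported range of 0.06 to 0.91 should be read. The sketch uses population variances throughout; using sample variances instead gives the same alpha, because the same factor appears in the numerator and the denominator of the variance ratio.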
14	Investigation of the Predictive Validity of Concentration Tests: A Chronometric Approach to Examining the Role of Item Difficulty, Test Length, and Test Diversification [Untersuchung zur prädiktiven Validität von Konzentrationstests: Ein chronometrischer Ansatz zur Überprüfung der Rolle von Itemschwierigkeit, Testlänge, und Testdiversifikation] Schumann, Frank 06 June 2016 This thesis examined the validity of attention and concentration tests. The focus was on the influence of several critical variables on predictive validity in these tests, in particular item difficulty and item homogeneity, test length and test course, test diversification, and validity in the context of a real personnel selection. Across five studies, these variables were systematically varied and analysed for their predictive validity in the (retrograde and concurrent) prediction of school and academic performance (Realschule, Abitur, Vordiplom/Bachelor). Because the sample consisted of students (i.e. was relatively homogeneous in performance), the correlations were expected to be somewhat underestimated. However, because validity in this thesis was determined "comparatively" for particular tests or experimental conditions, this should not matter. Study 1 (N = 106) first examined how difficult the items in an arithmetic concentration test should be to ensure good predictions. To this end, easy and more difficult items were compared with respect to their correlation with the criterion. The easy and the more difficult test variants turned out to be about equally predictive. Study 2 (N = 103) examined the role of test length by comparing the predictive validity of a short and a long version of an arithmetic concentration test. The short version proved more valid than the long version, and validity in the long version decreased over the course of the test. 
Study 3 (N = 388) focused on test diversification, examining whether intelligence should be measured with a single matrix test (Wiener Matrizen-Test, WMT) or with a test battery (Intelligenz-Struktur-Test, I-S-T 2000 R) to ensure good predictive validity. The results clearly favour the matrix test, which was about as valid as the test battery but more economical to administer. Studies 4 (N = 105) and 5 (N = 97) examined predictive validity for the prediction of school performance in the context of a real personnel selection situation. While the large test batteries, the Wilde-Intelligenz-Test 2 (WIT-2) and the Intelligenz-Struktur-Test 2000 R (I-S-T 2000 R), predicted only moderately well, the Komplexer Konzentrationstest (KKT), in particular the KKT arithmetic test, was an excellent predictor of school and academic performance. On the basis of these findings, recommendations and practical guidance were given for the strategic use of test instruments in professional diagnostic practice. Contents: 1 Introduction and aims; 2 Assessment of concentration ability; 2.1 Historical context; 2.2 Cognitive modelling; 2.3 Psychometric modelling; 3 Predictive validity of concentration tests; 3.1 Reliability, construct validity, criterion validity; 3.2 Construction and validation strategies; 3.3 Derivation of the research question; 4 Description of the questionnaires and tests; 5 Empirical part; 5.1 Study 1: Item difficulty; 5.1.1 Method; 5.1.2 Results; 5.1.3 Discussion; 5.2 Study 2: Test lengthening and test course; 5.2.1 Method; 5.2.2 Results; 5.2.3 Discussion; 5.3 Study 3: Test diversification; 5.3.1 Method; 5.3.2 Results; 5.3.3 Discussion; 5.4 Study 4: Validity in a real selection situation (I-S-T 2000 R); 5.4.1 Method; 5.4.2 Results; 5.4.3 Discussion; 5.5 Study 5: Validity in a real selection situation (WIT-2); 5.5.1 Method; 5.5.2 Results; 5.5.3 Discussion; 
6 Discussion; 6.1 Are difficult tests better than easy tests?; 6.2 Are long tests better than short tests?; 6.3 Are test batteries better than single tests?; 6.4 Are tests also valid under "real" conditions?; 6.5 Validity under real conditions: generalisation; 7 Theoretical implications; 8 Practical consequences; 9 References; Appendix / info:eu-repo/classification/ddc/150; ddc:150
