41
Investigating the Effects of Rater's Second Language Learning Background and Familiarity with Test-Taker's First Language on Speaking Test Scores
Zhao, Ksenia, 01 March 2017
Prior studies suggest that raters' familiarity with test-takers' first language (L1) can be a source of bias in rating speaking tests. However, there is still no consensus among researchers on how and to what extent that familiarity affects scores. This study investigates raters' performance, focusing not only on how raters' second language (L2) proficiency level interacts with examinees' L1, but also on whether raters' teaching experience has any effect on the scores. Speaking samples of 58 ESL learners at different proficiency levels, with L1s of Spanish (n = 30) and three Asian languages (Korean, n = 12; Chinese, n = 8; Japanese, n = 8), were rated by 16 trained raters with varying levels of Spanish proficiency (Novice to Advanced) and different amounts of teaching experience (from one to more than 10 semesters). The ratings were analyzed using Many-Facet Rasch Measurement (MFRM). The results suggest that extensive rater training can be quite effective: neither raters' familiarity with examinees' L1 nor their teaching experience had a significant effect on the scores. Even after training, however, the raters still exhibited different degrees of leniency/severity. The main conclusion of this study is therefore that even trained raters may rate consistently differently from one another. The recommendation is to (a) provide further rater training and calibration and/or (b) use MFRM with fair averages to compensate for this variance.
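For readers less familiar with MFRM, the following is a minimal sketch of the many-facet Rasch model commonly used for such rater analyses; the notation is illustrative and the thesis's exact model specification may differ.

```latex
% A common three-facet rating-scale formulation of the many-facet Rasch model.
% Notation is illustrative, not taken from the thesis.
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
% B_n : ability of examinee n
% D_i : difficulty of item or rating criterion i
% C_j : severity of rater j
% F_k : difficulty of rating-category step k relative to step k-1
```

In this framework, a fair average re-expresses each examinee's measure as the score expected from a rater of average severity, which is how MFRM can compensate for residual rater leniency/severity differences.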
42
Rethinking Vocabulary Size Tests: Frequency Versus Item Difficulty
Hashimoto, Brett James, 01 June 2016
For decades, vocabulary size tests have been built on the idea that if a test-taker knows enough words at a given frequency level, based on a list derived from a corpus, they will also know other words of approximately that frequency as well as all words that are more frequent. However, many vocabulary size tests are based on corpora that are as much as 70 years old and that may be ill-suited for these tests. Based on these potentially problematic areas, the following research questions were asked. First, to what degree would a vocabulary size test based on a large, contemporary corpus be reliable and valid? Second, would it be more reliable and valid than previously designed vocabulary size tests? Third, do words within 1,000-word frequency bands vary in their item difficulty? To answer these questions, 403 ESL learners took the Vocabulary of American English Size Test (VAST), a test based on a word list generated from the Corpus of Contemporary American English (COCA). This thesis shows that the COCA word list might be better suited for measuring vocabulary size than the lists used in previous vocabulary size assessments. As a 450-million-word corpus, COCA far surpasses any corpus used in previously designed vocabulary size tests in size, balance, and representativeness. The vocabulary size test built from the COCA list was both highly valid and highly reliable according to a Rasch-based analysis: Rasch person reliability and separation were calculated to be 0.96 and 4.62, respectively. However, the most significant finding of this thesis is that frequency ranking in a word list is not as good a predictor of item difficulty in a vocabulary size assessment as researchers had previously assumed. The Pearson correlation between frequency ranking in the COCA list and item difficulty for 501 items taken from the 5,000 most frequent words was 0.474 (r^2 = 0.225), meaning that frequency rank accounted for only 22.5% of the variability in item difficulty. The correlation decreased greatly when item difficulty was correlated against 1,000-word frequency bands, to a weak r = 0.306 (r^2 = 0.094), meaning that 1,000-word frequency bands account for only 9.4% of the variance. Because frequency is not a highly accurate predictor of item difficulty, it is important to reconsider how vocabulary size tests are designed.
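As a rough illustration of the correlation analysis summarized above, the sketch below computes a Pearson correlation and its r^2 between frequency rank and item difficulty; the data are invented placeholders, not the thesis's data.

```python
# Illustrative only: correlating corpus frequency rank with Rasch item difficulty.
# The values below are synthetic placeholders, not data from the thesis.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
freq_rank = np.arange(1, 502)                                  # ranks 1..501 (hypothetical items)
item_difficulty = 0.002 * freq_rank + rng.normal(0, 0.8, 501)  # noisy, weakly rank-related difficulties

r, p = pearsonr(freq_rank, item_difficulty)
print(f"r = {r:.3f}, r^2 = {r**2:.3f}, p = {p:.3g}")
# r^2 is the proportion of variance in item difficulty accounted for by frequency
# rank; an r of 0.474 corresponds to r^2 = 0.225, i.e., only 22.5% explained.
```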
43
A comparison of two vocabulary tests used with normal and delayed preschool children
Safadi, Lynn, 01 January 1990
The purpose of this study was to determine whether a difference exists between the mean standard scores of the Peabody Picture Vocabulary Test-Revised (PPVT-R) (Dunn and Dunn, 1981) and the Expressive One-Word Picture Vocabulary Test (EOWPVT) (Gardner, 1979) for children in several diagnostic categories. The subjects were 45 preschool children ranging in age from 36 to 47 months, divided into groups of normal children, expressively language-delayed (ELD) children, and normal children with a history of expressive language delay (HELD).
44
The assessment of phonological processes: a comparison of connected-speech samples and single-word production tests
Pinkerton, Susan A., 01 January 1990
The purpose of this study was to determine whether single-word elicitation procedures used in the assessment of phonological processes would yield results highly similar to those obtained through connected speech. Connected-speech sampling provides a medium for natural production with coarticulatory influence, but it can be time-consuming and impractical for clinicians maintaining heavy caseloads or working with highly unintelligible children. Elicitation through single words requires less time than a connected-speech sample and may be more effective with highly unintelligible children because the context is known, but it lacks the influence of surrounding words. Given the inherent differences between these two methods of elicitation, the relative effectiveness of single-word and connected-speech sampling is an important consideration for clinicians operating under severe time constraints who require an efficient and effective means of assessing phonological processes.
45
A Comparison of Developmental Sentence Score Patterns in Three Groups of Preschool Children
Riback, Michelle Lynn, 01 December 1992
Researchers have successfully identified specific patterns of expressive language development as they appear in children developing language normally, but little research has identified particular patterns of expressive language in children who display expressive language disorders or delays. Longitudinal studies of expressively language-impaired children indicate that linguistic, educational, and social impairments exist long after the language impairment was first identified (Aram, Eckelman and Nation, 1984; Aram and Nation, 1980; Fundudis, Kolvin and Garside, 1979; Stark, Berstein, Condino, Bender, Tallal and Catts, 1984). If patterns of delayed or disordered language development can be identified and labeled in the early stages of language development, strategies for assessment and intervention can be made more efficient and the effects of early language impairment on later academic achievement may be prevented.

The present study was part of the Portland Language Project, a longitudinal study of early language delay. Lee's Developmental Sentence Scoring (DSS) was used to attempt to identify syntactic patterns used by children exhibiting early language delay. The DSS is a standardized measure for analyzing children's standard English expressive language abilities in eight grammatical categories: 1) indefinite pronouns; 2) personal pronouns; 3) main verbs; 4) secondary verbs; 5) negatives; 6) conjunctions; 7) interrogative reversals; and 8) wh-questions. Using the DSS, specific syntactic areas of deficit can be identified by analyzing an audiotaped speech sample. A comparison of expressive language in the eight DSS subcategories was completed among three groups of preschool children: 1) children developing language normally (the NL group); 2) children who did not meet criteria for normal language development at 20 months but later fell within the normal range of language development as measured by the DSS (Lee, 1974), referred to as the history of expressive language delay (HELD) group; and 3) children who did not meet criteria for normal language development at 20 months and again did not meet criteria for normal language development as measured by the DSS (Lee, 1974) at later ages, referred to as the expressive language delay (ELD) group.

The purpose of this study was to determine whether significant differences exist in each of the eight DSS subcategory group scores between children identified as expressively language delayed and those identified as developing language normally at ages three and four. At age three, significant differences were found among the three groups in all eight subcategory scores of the DSS. By age four, significant differences were found between the delayed group and the normally developing group only in the main verb and personal pronoun categories, and there were no significant differences between the normally developing and history-of-delay groups on any of the eight categories. The delayed group thus exhibited marked improvement, with its expressive language deficits narrowing to a specific area of language. The present study suggests that children with early language delay appear to "catch up" with normal peers in most areas of syntactic production by age four.

The DSS (Lee, 1974) provides information about specific areas of syntactic development, but because of its length and complexity it is not a tool that practicing clinicians often use. A study such as this may help the practicing clinician quickly screen a preschool child in a specific syntactic category, such as verb marking, in order to check for possible early language delay. In addition to its clinical usefulness, this study opens the door to future research in syntactic development: it could be expanded to examine the specific verb markers used by the delayed subjects, which may lead to more efficient identification and remediation of early language delays.
46
Interactional competence in paired speaking tests: role of paired task and test-taker speaking ability in co-constructed discourse
Kley, Katharina, 01 May 2015
This dissertation centers on the under-researched construct of interactional competence, which refers to features of jointly constructed discourse. When applied to the testing of speaking skills in a second language, interactional competence refers to features of the discourse that the two students produce together, rather than the speaking ability or performance of each person individually. This dissertation describes the construct of interactional competence in a low-stakes, paired speaking test setting targeted at students in their second year of German instruction at the college level.
The purpose of this study is two-fold. First, the study analyzes the conversational resources that are co-constructed in the test discourse to maintain mutual understanding, which is considered the basis for interactional competence. Second, the study examines the impact of task (jigsaw task and discussion task) and speaking ability-level combination (same and different ability) in the test-taker pair on the co-constructed test discourse and thus on the deployment of the conversational resources to maintain intersubjectivity. In that respect, this study also seeks to analyze how the identified conversational resources are involved in establishing and negotiating language ability identities that are displayed in the test discourse.
Conversation analytic conventions were used to investigate the interactional resources that test takers deploy to maintain mutual understanding. The procedures of repair (self-repair in response to other-initiated repair, inter-turn delays, and misunderstandings as well as other-repair in conjunction with word search activities) that emerged from the inductive analysis of the test discourse have broadened the conceptualization of interactional competence in the context of paired speaking assessments.
Frequency distributions of the interactional resources were created to provide a better understanding of the impact of task and ability-level combination on the co-constructed repair procedures. The rationale behind this analysis is the general understanding of language testers that both resources and context influence test performance. The findings from the quantitative analysis suggest that there are more similarities than differences in repair use across the jigsaw task and the discussion task. In addition, even though some trends in the co-construction of repair procedures may be attributed to the higher or lower speaking ability of the test takers, the relationship between the ability-level combination in the pair and the use of repair seems to be rather variable.
Finally, to learn more about the interrelationship between test takers’ speaking ability and interactional competence, this dissertation also approached speaking ability in terms of test takers’ co-constructed language ability identities that are displayed in the test discourse. By means of single case analyses, the study provided a detailed picture of the relationship between language ability identities and the procedures of repair, both of which are co-constructed at the discourse level. The findings from the conversation analysis show that the speaker who provides the repair is usually able to position himself or herself as the more competent or proficient speaker in the test discourse.
47
An Argument-based Validity Inquiry into the Empirically-derived Descriptor-based Diagnostic (EDD) Assessment in ESL Academic Writing
Kim, Youn-Hee, 13 August 2010
This study built and supported arguments for the use of diagnostic assessment in English as a second language (ESL) academic writing. In the two-phase study, a new diagnostic assessment scheme, called the Empirically-derived Descriptor-based Diagnostic (EDD) checklist, was developed and validated for use in small-scale classroom assessment. The checklist assesses ESL academic writing ability using empirically-derived evaluation criteria and estimates skill parameters in a way that overcomes the problems associated with the number of items in diagnostic models. Interpretations of and uses for the EDD checklist were validated using five assumptions: (a) that the empirically-derived diagnostic descriptors that make up the EDD checklist are relevant to the construct of ESL academic writing; (b) that the scores derived from the EDD checklist are generalizable across different teachers and essay prompts; (c) that performance on the EDD checklist is related to performance on other measures of ESL academic writing; (d) that the EDD checklist provides a useful diagnostic skill profile for ESL academic writing; and (e) that the EDD checklist helps teachers make appropriate diagnostic decisions and has the potential to positively impact teaching and learning ESL academic writing.
Using a mixed-methods research design, four ESL writing experts created the EDD checklist from 35 descriptors of ESL academic writing. These descriptors had been elicited from nine ESL teachers’ think-aloud verbal protocols, in which they provided diagnostic feedback on ESL essays. Ten ESL teachers utilized the checklist to assess 480 ESL essays and were interviewed about its usefulness. Content reviews from ESL writing experts and statistical dimensionality analyses determined that the underlying structure of the EDD checklist consists of five distinct writing skills: content fulfillment, organizational effectiveness, grammatical knowledge, vocabulary use, and mechanics. The Reduced Reparameterized Unified Model (Hartz, Roussos, & Stout, 2002) then demonstrated the diagnostic quality of the checklist and produced fine-grained writing skill profiles for individual students. Overall teacher evaluation further justified the validity claims for the use of the checklist. The pedagogical implications of the use of diagnostic assessment in ESL academic writing were discussed, as were the contributions that it would make to the theory and practice of second language writing instruction and assessment.
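For context, here is a compact statement of the reduced RUM item response function as it is commonly presented in the diagnostic classification modeling literature; the notation is illustrative and may differ from the parameterization used in the thesis.

```latex
% Reduced Reparameterized Unified Model (reduced RUM), common presentation.
P(X_{ij} = 1 \mid \boldsymbol{\alpha}_j)
  = \pi_i^{*} \prod_{k=1}^{K} \left( r_{ik}^{*} \right)^{\, q_{ik}\,(1 - \alpha_{jk})}
% X_{ij}      : score of student j on checklist item i
% \alpha_{jk} : mastery indicator (1 or 0) of skill k for student j
% q_{ik}      : Q-matrix entry linking item i to skill k
% \pi_i^{*}   : probability of success when all skills required by item i are mastered
% r_{ik}^{*}  : penalty (0 < r^{*} < 1) applied for each required but unmastered skill
```

Estimating these parameters for each checklist descriptor is what yields the kind of skill mastery profiles (e.g., content fulfillment, organization, grammar, vocabulary, mechanics) reported for individual students.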
49
The Effects Of Varied Text Structures And Response Formats On The Reading Comprehension
Yilik, Mehmet Ali, 01 December 2006
This research study examines the effects of varied text structures and response formats on Turkish university students' reading comprehension test performance. More precisely, it investigates the effects of awareness of rhetorical organization on reading comprehension and on the testing of comprehension through different procedures.
First, a short review of the relevant research on text structure and response formats and their effects on reading comprehension is presented. Then, the results of a reading experiment are given. In this experiment, four groups of upper-intermediate level EFL students (100 students) read two English passages written in "description" and "cause-effect" rhetorical organization formats. Their comprehension of the texts was then tested through a cloze procedure and a multiple-choice test. The tests were carried out on first-year university students from different departments at the English Language Department of Başkent University during the 2006-2007 Academic Year Fall Semester.
After the research period finished, the collected data were transferred into MS Excel and SPSS spreadsheets and analyzed using paired-samples t-tests. Based on this analysis, the hypotheses formulated for the study were weighed against the results to determine whether they were confirmed or rejected.
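A minimal sketch of a paired-samples t-test of this kind is shown below; the scores are invented placeholders rather than the study's data.

```python
# Illustrative paired-samples t-test comparing comprehension scores on two text
# structures for the same learners. Scores are hypothetical, not the study's data.
import numpy as np
from scipy.stats import ttest_rel

description_scores  = np.array([14, 12, 15, 11, 13, 16, 12, 14, 13, 15], dtype=float)
cause_effect_scores = np.array([11, 10, 13, 10, 12, 14, 11, 12, 11, 13], dtype=float)

t_stat, p_value = ttest_rel(description_scores, cause_effect_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A p-value below .05 would indicate a significant difference in comprehension
# between the two text structures for the same group of test-takers.
```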
The study showed a significant difference in subjects' reading comprehension test performance across the varied text structures. However, there was no significant difference in performance across the different response formats. Finally, the thesis ends with an interpretation and discussion of the results of the study.
50
Gaps-In-Noise and pitch pattern sequence tests: norms for Mandarin-speaking adolescents
Chang, Man-si, Menzie (張汶詩), January 2010
Speech and Hearing Sciences; Master of Science in Audiology (published or final version)