1

Assessing interactional competence : the case of school-based speaking assessment in Hong Kong

Lam, Ming Kei January 2015 (has links)
In recent decades, the field of assessing speaking has seen an increasing emphasis on ‘interaction’. In defining the construct of interactional competence (IC), both the theoretical formulation and empirical evidence suggest that the competence is co-constructed and context-specific. This poses a multitude of conundrums for language testing practitioners and researchers, one of which is the extent to which we can extrapolate candidates’ performance in the target non-testing context from their performance in a test. This thesis takes up these issues in the case of the Group Interaction (GI) task in the School-based Assessment (SBA) for the Hong Kong Diploma of Secondary Education Examination (HKDSE). Validation studies on the SBA Group Interaction task to date have generated somewhat contradictory results as to whether the task elicits authentic oral language use. Moreover, studies to date have not compared students’ interactions under different task implementation conditions (such as the amount of preparation time), nor have they investigated in detail what exactly students do during preparation time and how that might affect their subsequent assessed interaction. This study explores what kinds of interactional features constitute interactional competence, how IC is co-constructed in discourse, and what complexities there might be in assessing the competence through a group interaction task. It also investigates whether the SBA GI task elicits authentic oral language use, and how the task implementation condition of preparation time might influence the validity of the task. Video-recordings of the assessed group interactions were obtained from two schools, with students given extended preparation time in one school but not the other. The assessed group interactions are analyzed using a Conversation Analytic approach, supplemented by data from mock assessments and stimulated recall interviews with student-candidates and teacher-raters. This study contributes to the construct definition of interactional competence, specifying its components and the particular ways they are performed in discourse. Drawing on findings about students’ overhearer-oriented talk, it also problematizes the assumption that a group interaction task necessarily elicits and assesses candidates’ competence for interacting in a peer group only. More specifically to the SBA GI task, this study has produced evidence that group interactions with and without extended preparation time are qualitatively different, and has identified some of the ways in which extended preparation time might compromise the task’s validity in assessing interactional competence.
2

Establishing the validity of reading-into-writing test tasks for the UK academic context

Chan, Sathena Hiu Chong January 2013 (has links)
The present study aimed to establish a test development and validation framework for reading-into-writing tests, in order to improve the accountability of using the integrated task type to assess test takers' ability in Academic English. The study applied Weir's (2005) socio-cognitive framework to investigate three components of test validity (context validity, cognitive validity and criterion-related validity) for two common types of reading-into-writing test tasks: an essay task with multiple verbal inputs and an essay task with multiple verbal and non-verbal inputs. Through a literature review and a series of pilot studies, a set of contextual and cognitive parameters was defined at the pilot phase for explicitly describing the features of the target academic writing tasks and the cognitive processes required to complete these tasks successfully. A mixed-methods approach was used in the main study to establish the context, cognitive and criterion-related validity of the reading-into-writing test tasks. First, for context validity, expert judgement and automated textual analysis were applied to examine the degree of correspondence between the contextual features (overall task setting and input text features) of the reading-into-writing test tasks and those of the target academic writing tasks. For cognitive validity, a cognitive process questionnaire was developed to help participants report the processes they employed on the two reading-into-writing test tasks and on two real-life academic tasks. A total of 443 questionnaires from 219 participants were collected. The analysis of cognitive validity included three strands: 1) the cognitive processes involved in real-life academic writing, 2) the extent to which these processes were elicited by the reading-into-writing test tasks, and 3) the underlying structure of the processes elicited by the reading-into-writing test tasks. A range of descriptive, inferential and factor analyses were performed on the questionnaire data. The participants' scores on the real-life academic and reading-into-writing test tasks were collected for correlational analyses to investigate the criterion-related validity of the test tasks. The findings of the study support the context, cognitive and criterion-related validity of the integrated reading-into-writing task type. In terms of context validity, the two reading-into-writing tasks largely resembled the overall task setting, the input text features and the linguistic complexity of the input texts of the real-life tasks in a number of important ways. Regarding cognitive validity, the results revealed 11 cognitive processes involved in 5 phases of real-life academic writing, as well as the extent to which these processes were elicited by the test tasks. Both reading-into-writing test tasks elicited from high-achieving and low-achieving participants most of these cognitive processes to a similar extent as the participants employed them on the real-life tasks; the medium-achieving participants tended to employ these processes more on the real-life tasks than on the test tasks. The results of exploratory factor analysis showed that both test tasks were largely able to elicit from the participants the same underlying cognitive processes as the real-life tasks did. Lastly, for criterion-related validity, the correlations between the two reading-into-writing test scores and academic performance reported in this study were stronger than most figures previously reported in the literature.
To the best of the researcher's knowledge, this is the first study to validate two types of reading-into-writing test tasks in terms of all three validity components. The results provide empirical evidence that reading-into-writing tests can successfully operationalise the appropriate contextual features of academic writing tasks and the cognitive processes required in real-life academic writing under test conditions, and that reading-into-writing test scores demonstrate a promising correlation with target academic performance. The results have important implications for university admissions officers and other stakeholders; in particular, they demonstrate that the integrated reading-into-writing task type is a valid option when considering language teaching and testing for academic purposes. The study also puts forward a test framework with explicit contextual and cognitive parameters for language teachers, test developers and future researchers who intend to develop valid reading-into-writing test tasks for assessing academic writing ability and to conduct validity studies on this integrated task type.
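The criterion-related validity evidence described above rests on correlating test-task scores with scores on the corresponding real-life academic tasks. The following is a minimal sketch of that kind of check; the scores and variable names are hypothetical, not the study's data.

```python
# Hedged sketch of a criterion-related validity check: correlate integrated
# reading-into-writing test scores with real-life academic writing scores.
# The score lists below are invented for illustration only.
from scipy.stats import pearsonr

test_scores     = [55, 62, 48, 71, 66, 58, 74, 69, 52, 80]  # reading-into-writing test task
academic_scores = [58, 65, 50, 70, 68, 55, 78, 72, 54, 83]  # real-life academic writing task

r, p = pearsonr(test_scores, academic_scores)
print(f"criterion-related validity: r = {r:.2f}, p = {p:.4f}")
```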
3

Investigating Prompt Difficulty in an Automatically Scored Speaking Performance Assessment

Cox, Troy L. 14 March 2013 (has links) (PDF)
Speaking assessments for second language learners have traditionally been expensive to administer because of the cost of rating the speech samples. To reduce the cost, many researchers are investigating the potential of using automatic speech recognition (ASR) as a means to score examinee responses to open-ended prompts. This study examined the potential of using ASR timing fluency features to predict speech ratings and the effect of prompt difficulty in that process. A speaking test with ten prompts representing five different intended difficulty levels was administered to 201 subjects. The speech samples obtained were then (a) rated by human raters holistically, (b) rated by human raters analytically at the item level, and (c) scored automatically using PRAAT to calculate ten different ASR timing fluency features. The ratings and scores of the speech samples were analyzed with Rasch measurement to evaluate the functionality of the scales and the separation reliability of the examinees, raters, and items. Three ASR timing fluency features best predicted human speaking ratings: speech rate, mean syllables per run, and number of silent pauses. However, only 31% of the score variance was predicted by these features. The significance of this finding is that those fluency features alone likely provide insufficient information to predict human-rated speaking ability accurately. Furthermore, neither the item difficulties calculated by the ASR nor those rated analytically by the human raters aligned with the intended item difficulty levels. The misalignment of the human raters with the intended difficulties led to a further analysis, which found that it was problematic for raters to use a holistic scale at the item level. However, modifying the holistic scale to one that examined whether the response to the prompt was at level resulted in a significant correlation (r = .98, p < .01) between the item difficulties calculated analytically by the human raters and the intended difficulties. This result supports the hypothesis that item prompts matter for obtaining quality speech samples. As test developers seek to use ASR to score speaking assessments, caution is warranted to ensure that score differences are due to examinee ability and not to the prompt composition of the test.
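As an illustration of the kind of timing fluency features described above, here is a minimal sketch that assumes syllable onsets and silent-pause intervals have already been extracted for one response (e.g., by a Praat script or forced aligner); the input format and the 0.25 s pause threshold are assumptions, not the study's actual pipeline.

```python
# Hedged sketch: compute the three timing fluency features the study found most
# predictive (speech rate, mean syllables per run, number of silent pauses).
# The syllable onsets and pause intervals are illustrative values for a single
# short response; a real pipeline would derive them from the audio itself.

def timing_fluency_features(syllable_onsets, pauses, total_duration, min_pause=0.25):
    """pauses: list of (start, end) silence intervals in seconds.
    Only silences of at least min_pause seconds count as silent pauses."""
    silent_pauses = [(s, e) for s, e in pauses if e - s >= min_pause]
    speech_rate = len(syllable_onsets) / total_duration   # syllables per second of response time
    n_runs = len(silent_pauses) + 1                        # speech runs separated by silent pauses
    mean_syllables_per_run = len(syllable_onsets) / n_runs
    return {
        "speech_rate": round(speech_rate, 2),
        "mean_syllables_per_run": round(mean_syllables_per_run, 2),
        "n_silent_pauses": len(silent_pauses),
    }

features = timing_fluency_features(
    syllable_onsets=[0.4, 0.7, 1.1, 1.5, 2.3, 2.6, 3.0, 3.8, 4.2, 4.9],  # 10 syllables
    pauses=[(1.6, 2.2), (3.1, 3.7)],                                     # two silences >= 0.25 s
    total_duration=5.0,
)
print(features)  # {'speech_rate': 2.0, 'mean_syllables_per_run': 3.33, 'n_silent_pauses': 2}
```

The sketch stops at feature extraction; in the study, such features were then related to human ratings through Rasch measurement and prediction models.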
4

DEVELOPMENT OF FLUENCY, COMPLEXITY, AND ACCURACY IN SECOND LANGUAGE ORAL PROFICIENCY: A LONGITUDINAL STUDY OF TWO INTERNATIONAL TEACHING ASSISTANTS IN THE U.S.

Qiusi Zhang (16641342) 27 July 2023 (has links)
I collected two types of data throughout Weeks 1-14, with the original purpose of enhancing teaching and learning in ENGL620. The data included weekly assignment recordings and weekly surveys.

The primary data were students’ speech data, collected through 14 weekly timed speaking assessments conducted from Week 1 to Week 14. Each assignment was made available on Monday at midnight and had to be completed and submitted by Sunday at midnight. The assignments were delivered, and responses collected, using Extempore (www.extemporeapp.com), a website specifically designed to support oral English assessment and practice.

To conduct more comprehensive assessments of students’ performances, I incorporated two OEPT item types into the weekly assignments: PROS and CONS (referred to as “PC”) and LINE GRAPH (referred to as “LG”). See Appendix B for the assignment items. The PC item presented challenging scenarios ITAs may encounter and required test-takers to make a decision and discuss the pros and cons associated with that decision. An example item is: “Imagine you have a student who likes to come to your office hours but often talks about something irrelevant to the course. What would you do in this situation? What are the pros and cons associated with the decision?” The LG item asked students to describe a line graph showing two or three lines and to provide possible reasons behind those trends. It can be argued that the two tasks targeted slightly different language abilities and background knowledge. The two item types were selected because they represent two key skills that the OEPT tests: the PC task focused on stating one’s decision and presenting an argument within a personal context, while the LG item assessed students’ ability to describe visual information and to engage in discussion of broader topics such as gender equality, employment, economic growth, and college policy. The PC and LG items are the most difficult items in the test (Yan et al., 2019), so progress on the two tasks can be a good indicator of improvement in the speaking skills required in this context. All items were either taken from retired OEPT items or developed by the researcher following the specifications for OEPT item development. In particular, the items were designed to avoid assuming prior specific knowledge and to ensure that students could discuss them without excessive cognitive load.

For each task, students were allocated 2 minutes for preparation and a maximum of 2 minutes to deliver their response to the assigned topic. The responses were monologic, resembling short classroom presentations. During the preparation time, participants were permitted to take notes. Each item allowed only one attempt, in order to capture students’ online production of speech and their use of language resources. Table 2 presents the descriptive statistics of the responses.

The PC prompt was deliberately kept the same for Week 2 and Week 12, which were randomly selected as time points at the beginning and end of the semester. Using the same prompt at these two stages serves two purposes. First, it provides a valuable perspective for analyzing growth over time, adding depth to the results and conclusions through additional evidence and triangulation. Second, it addresses one of the challenges identified by Ortega and Iberri-Shea (2005) for studies with multiple data collection points: keeping the prompt consistent minimizes potential variation in task difficulty or topic-related factors.

After completing each speaking assignment, students were asked to rate the level of difficulty of each item on a scale of 1 (Very Easy) to 5 (Very Difficult). They were also asked to fill out a weekly survey in Qualtrics, containing six questions about the frequency of their English language use outside the classroom and their focus on language skills in the previous and upcoming week; these questions were included because they tap potential contributing factors to changes in performance over the semester. Refer to Appendix C for the survey questions.
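Since the thesis tracks the development of fluency, complexity, and accuracy across these 14 weekly responses, a longitudinal analysis would ultimately reduce each response to CAF-style measures and compare them over time and by item type. The sketch below is purely illustrative: the data, the field names, and the choice of speech rate as the measure are assumptions, not the thesis's actual coding scheme.

```python
# Hedged sketch: track a simple fluency measure (speech rate) per week,
# separated by item type (PC vs. LG). Data and field names are invented.
from collections import defaultdict

# (week, item_type, syllables_produced, speaking_time_in_seconds)
responses = [
    (1, "PC", 210, 110), (1, "LG", 185, 115),
    (2, "PC", 220, 112), (2, "LG", 190, 118),
    (12, "PC", 260, 115), (12, "LG", 235, 117),
]

speech_rate = defaultdict(dict)
for week, item_type, syllables, seconds in responses:
    speech_rate[item_type][week] = syllables / seconds  # syllables per second

for item_type, by_week in sorted(speech_rate.items()):
    trend = ", ".join(f"week {w}: {rate:.2f}" for w, rate in sorted(by_week.items()))
    print(f"{item_type}: {trend}")
```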
