1 |
Fluency Features and Elicited Imitation as Oral Proficiency Measurement
Christensen, Carl V. 07 July 2012
The objective and automatic grading of oral language tests has been the subject of significant research in recent years, but several obstacles stand in the way of achieving this goal. Recent work has suggested that a testing technique called elicited imitation (EI) can be used to accurately approximate global oral proficiency. This testing methodology, however, does not incorporate some fundamental aspects of language such as fluency. Other work has suggested another testing technique, simulated speech (SS), as a supplement to EI that can provide automated fluency metrics. In this work, I investigate combining fluency features extracted from SS tests with EI test scores to predict oral language proficiency more accurately. I also investigate the role of EI as an oral language test and the optimal method of extracting fluency features from SS sound files. Results demonstrate that EI scores combined with SS fluency features more effectively predict hand-scored SS test item scores. Finally, I discuss implications of this work for future automated oral testing scenarios.
|
2 |
Investigating Prompt Difficulty in an Automatically Scored Speaking Performance Assessment
Cox, Troy L. 14 March 2013
Speaking assessments for second language learners have traditionally been expensive to administer because of the cost of rating the speech samples. To reduce the cost, many researchers are investigating the potential of using automatic speech recognition (ASR) as a means to score examinee responses to open-ended prompts. This study examined the potential of using ASR timing fluency features to predict speech ratings and the effect of prompt difficulty in that process. A speaking test with ten prompts representing five different intended difficulty levels was administered to 201 subjects. The speech samples obtained were then (a) rated by human raters holistically, (b) rated by human raters analytically at the item level, and (c) scored automatically using PRAAT to calculate ten different ASR timing fluency features. The ratings and scores of the speech samples were analyzed with Rasch measurement to evaluate the functionality of the scales and the separation reliability of the examinees, raters, and items. Three ASR timing fluency features best predicted human speaking ratings: speech rate, mean syllables per run, and number of silent pauses. However, these features predicted only 31% of the score variance. The significance of this finding is that those fluency features alone likely provide insufficient information to predict human-rated speaking ability accurately. Furthermore, neither the item difficulties calculated by the ASR nor those rated analytically by the human raters aligned with the intended item difficulty levels. The misalignment of the human raters with the intended difficulties led to a further analysis, which found that it was problematic for raters to use a holistic scale at the item level.
However, modifying the holistic scale to a scale that examined whether the response to the prompt was at-level resulted in a significant correlation (r = .98, p < .01) between the item difficulties calculated analytically by the human raters and the intended difficulties. This result supports the hypothesis that item prompts are important when it comes to obtaining quality speech samples. As test developers seek to use ASR to score speaking assessments, caution is warranted to ensure that score differences are due to examinee ability and not the prompt composition of the test.
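The three timing features named in the abstract above (speech rate, mean syllables per run, and number of silent pauses) can be illustrated with a minimal sketch. This is not the study's PRAAT procedure: the `Run` record, the `fluency_features` function, and the 0.25-second pause threshold are all assumptions introduced here for illustration, and the input is assumed to be a list of already-detected speech runs with syllable counts.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Run:
    """A stretch of continuous speech (hypothetical record, not PRAAT output)."""
    start: float      # seconds from the start of the recording
    end: float        # seconds from the start of the recording
    syllables: int    # syllables counted inside this stretch


def fluency_features(runs: List[Run], total_duration: float,
                     min_pause: float = 0.25) -> Dict[str, float]:
    """Compute three timing features from detected speech runs.

    min_pause is an assumed threshold: gaps shorter than this are not
    treated as silent pauses, so the runs on either side are merged.
    """
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")

    merged: List[Run] = []
    for run in sorted(runs, key=lambda r: r.start):
        if merged and run.start - merged[-1].end < min_pause:
            # Gap too short to count as a pause: extend the previous run.
            prev = merged[-1]
            merged[-1] = Run(prev.start, max(prev.end, run.end),
                             prev.syllables + run.syllables)
        else:
            merged.append(Run(run.start, run.end, run.syllables))

    total_syllables = sum(r.syllables for r in merged)
    n_runs = len(merged)
    return {
        # Syllables per second over the whole response, pauses included.
        "speech_rate": total_syllables / total_duration,
        # Average number of syllables between silent pauses.
        "mean_syllables_per_run": total_syllables / n_runs if n_runs else 0.0,
        # Each boundary between merged runs is one silent pause.
        "silent_pauses": max(n_runs - 1, 0),
    }


if __name__ == "__main__":
    sample = [Run(0.0, 1.0, 4), Run(1.1, 2.0, 3), Run(2.6, 4.0, 5)]
    print(fluency_features(sample, total_duration=4.0))
```

In the sample, the 0.1-second gap falls below the assumed threshold and is merged, while the 0.6-second gap counts as one silent pause. Different threshold choices would change all three features, which is one reason such values are best read as illustrative.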
|