
Scoring L2 Chinese speaking performance: linking scores to candidate performance. / Scoring Chinese as second language speaking performance / 漢語作為第二語言的口語表現評分研究: 連接分數與考生表現 / Han yu zuo wei di er yu yan de kou yu biao xian ping fen yan jiu: lian jie fen shu yu kao sheng biao xian

The major objective of the thesis is to link scores to candidate performance in scoring speaking performance for Chinese as a second language (L2 Chinese). To this end, the thesis comprises three coordinated studies: Study One, Study Two and Study Three. Study One employs Traditional Scoring to obtain traditional scores, Study Two develops Confidence Scoring to produce confidence scores, and Study Three compares Traditional Scoring and Confidence Scoring.

In Study One, the relationship between traditional scores and candidate performance was examined. Seven features, representing four major categories in the L2 Chinese speaking construct, were employed. Speech samples of 66 candidates on an L2 Chinese speaking test (9 minutes of speech each) were analyzed in terms of the seven features, using correlations and standard multiple regression. Results indicated that, first, each of the seven features was significantly correlated with the traditional scores, producing large or medium effect sizes; second, 79% and 77% of the variance in the scores could be explained by the features involved in two regression analyses respectively. Study One therefore provides empirical evidence for linking scores to candidate performance to validate the assessment of L2 Chinese speaking proficiency.

Two problems were, however, identified in Study One when linking scores to candidate performance using Traditional Scoring: indistinction between adjacent levels and overlap between scales. To address these two problems, Study Two proposed a new approach, Confidence Scoring, which yields raw confidence scores between two adjacent levels across three rating scales. Since raw confidence scores had to be transformed into an exact confidence score for score interpretation and use, membership functions and rule bases were applied and a Confidence Scoring Algorithm was developed. A pilot study was subsequently conducted in Study Two to try out Confidence Scoring and to make an initial comparison between Traditional Scoring and Confidence Scoring. Results for scoring dependability and correlations with Key Message Points (KMPs) indicated that Confidence Scoring outperformed Traditional Scoring.

In Study Three, a mixed methods study was conducted to provide a more comprehensive picture in comparing Traditional Scoring and Confidence Scoring. Quantitative score data comprised traditional scores and confidence scores from five raters. Qualitative interview data encompassed the five raters’ perceptions of the scoring process using Traditional Scoring and Confidence Scoring. The analysis of quantitative score data indicated that, in relation to Traditional Scoring, a closer link between scores and candidate performance was established through Confidence Scoring.
The investigation of qualitative interview data found that Confidence Scoring was based on and developed from Traditional Scoring. More importantly, while Confidence Scoring embraced Traditional Scoring, it provided a more flexible way of acknowledging and incorporating raters’ confidence in scoring speaking performance.

The contribution of the thesis therefore rests on conceptual understanding, empirical evidence and methodological innovation in linking scores to candidate performance, in the context of validating the assessment of L2 Chinese speaking proficiency.

Detailed summary in vernacular field only.

Jin, Tan.

Thesis (Ph.D.)--Chinese University of Hong Kong, 2012.

Includes bibliographical references (leaves 161-175).

Abstract also in Chinese; appendixes include Chinese.

ABSTRACT --- p.I
摘要 (Abstract in Chinese) --- p.III
ACKNOWLEDGEMENTS --- p.IV
FIGURES --- p.IX
TABLES --- p.X
ABBREVIATIONS --- p.XI
PUBLICATIONS --- p.XII
Journal articles (podcast) --- p.XII
Conference proceedings --- p.XII
Chapter 1 --- INTRODUCTION --- p.1
Chapter 1.1 --- Context of the research --- p.1
Chapter 1.2 --- Research questions --- p.4
Chapter 1.2.1 --- Research Question 1 (Study One) --- p.5
Chapter 1.2.2 --- Research Question 2 (Study Two) --- p.5
Chapter 1.2.3 --- Research Question 3 (Study Three) --- p.5
Chapter 1.3 --- Research design overview --- p.5
Chapter 1.3.1 --- Study One: Traditional Scoring --- p.6
Chapter 1.3.2 --- Study Two: Confidence Scoring --- p.6
Chapter 1.3.3 --- Study Three: Traditional Scoring and Confidence Scoring --- p.8
Chapter 1.4 --- Potential Contribution --- p.10
Chapter 2 --- REVIEW OF THE LITERATURE --- p.12
Chapter 2.1 --- Early Development: Linking scores to expert experience --- p.16
Chapter 2.1.1 --- Expert experience: The “native speaker” benchmark --- p.16
Chapter 2.1.2 --- Practice perspective: (I)ELTS (1986 & 1989) --- p.17
Chapter 2.2 --- Major Contribution: Linking scores to rater perception --- p.19
Chapter 2.2.1 --- Teacher/Rater interpretation: “scaling descriptors” --- p.20
Chapter 2.2.2 --- Rater judgment: “binary comparisons” --- p.21
Chapter 2.2.3 --- Practice perspective: IELTS revision (1998-2001) --- p.23
Chapter 2.3 --- Work in Progress: Linking scores to candidate performance --- p.26
Chapter 2.3.1 --- Identifying features from rater perception --- p.28
Chapter 2.3.2 --- Identifying features from documents/rating scales --- p.30
Chapter 2.3.3 --- Practice perspective: TOEFL iBT and IELTS (operational) --- p.31
Chapter 2.4 --- The L2 Chinese context and identifying L2 Chinese features --- p.35
Chapter 2.4.1 --- Pronunciation --- p.38
Chapter 2.4.2 --- Fluency --- p.39
Chapter 2.4.3 --- Vocabulary --- p.40
Chapter 2.4.4 --- Grammar --- p.40
Chapter 2.5 --- Traditional Scoring and problems of “indistinction” and “overlap” --- p.41
Chapter 2.6 --- Summary --- p.46
Chapter 3 --- STUDY ONE: TRADITIONAL SCORING --- p.48
Chapter 3.1 --- Introduction --- p.48
Chapter 3.1.1 --- Traditional Scoring --- p.48
Chapter 3.1.2 --- Research Question 1 --- p.49
Chapter 3.2 --- Method --- p.50
Chapter 3.2.1 --- Instrument: An L2 Chinese speaking test --- p.50
Chapter 3.2.2 --- Participants --- p.53
Chapter 3.2.3 --- Coding --- p.54
Chapter 3.2.4 --- Statistical analysis --- p.56
Chapter 3.3 --- Results --- p.58
Chapter 3.3.1 --- Correlations --- p.58
Chapter 3.3.2 --- Standard multiple regression --- p.59
Chapter 3.4 --- Discussion --- p.61
Chapter 3.5 --- Summary --- p.74
Chapter 4 --- STUDY TWO: CONFIDENCE SCORING --- p.76
Chapter 4.1 --- Introduction --- p.76
Chapter 4.1.1 --- Confidence Scoring --- p.77
Chapter 4.1.2 --- Research Question 2 --- p.80
Chapter 4.2 --- Confidence Scoring design --- p.82
Chapter 4.2.1 --- Raw confidence scores of adjacent levels --- p.82
Chapter 4.2.2 --- Raw confidence scores from different scales --- p.88
Chapter 4.2.3 --- Raw confidence scores to a confidence score --- p.90
Chapter 4.2.4 --- Score interpretation and use --- p.97
Chapter 4.3 --- Pilot study --- p.98
Chapter 4.3.1 --- Candidates and instruments --- p.98
Chapter 4.3.2 --- Coding system --- p.99
Chapter 4.3.3 --- Confidence scores and traditional scores --- p.100
Chapter 4.4 --- Discussion --- p.104
Chapter 4.5 --- Summary --- p.106
Chapter 5 --- STUDY THREE: TRADITIONAL SCORING AND CONFIDENCE SCORING --- p.108
Chapter 5.1 --- Introduction --- p.108
Chapter 5.1.1 --- Mixed methods: The convergent parallel design --- p.109
Chapter 5.1.2 --- Research Question 3 --- p.110
Chapter 5.2 --- Method --- p.111
Chapter 5.2.1 --- Quantitative score data --- p.111
Chapter 5.2.2 --- Qualitative interview data --- p.112
Chapter 5.3 --- Analysis --- p.114
Chapter 5.3.1 --- Quantitative data analysis --- p.114
Chapter 5.3.2 --- Qualitative data analysis --- p.121
Chapter 5.4 --- Results and Findings --- p.123
Chapter 5.4.1 --- Quantitative results --- p.123
Chapter 5.4.2 --- Qualitative findings --- p.126
Chapter 5.5 --- Discussion --- p.141
Chapter 5.6 --- Summary --- p.145
Chapter 6 --- GENERAL DISCUSSION AND CONCLUSION --- p.147
Chapter 6.1 --- Study One: Traditional Scoring --- p.147
Chapter 6.1.1 --- Constructing rating scales based on candidate performance --- p.148
Chapter 6.1.2 --- Establishing a potential alignment of L2 speaking tests --- p.148
Chapter 6.2 --- Study Two: Confidence Scoring --- p.151
Chapter 6.2.1 --- Applying Confidence Scoring in other educational contexts --- p.151
Chapter 6.2.2 --- Developing computation package for Confidence Scoring --- p.152
Chapter 6.3 --- Study Three: Traditional Scoring and Confidence Scoring --- p.153
Chapter 6.4 --- Limitations --- p.154
Chapter 6.5 --- Conclusion --- p.155
Chapter 6.6 --- Future agendas: Where are we heading? --- p.157
Chapter 6.6.1 --- Investigating more features representing the construct --- p.158
Chapter 6.6.2 --- Applying Confidence Scoring to different contexts --- p.159
Chapter 6.6.3 --- Combining automated scoring and raters’ scoring --- p.160
REFERENCES --- p.161
APPENDICES --- p.176
Appendix 1 --- p.176
Appendix 2 --- p.177
Appendix 3 --- p.184
Appendix 4 --- p.188
Appendix 5 --- p.192
Appendix 6 --- p.193
Appendix 7 --- p.195
Appendix 8 --- p.199
Appendix 9 --- p.203
Appendix 10 --- p.207

Identifier: oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_328429
Date: January 2012
Contributors: Jin, Tan., Chinese University of Hong Kong Graduate School. Division of Education.
Source Sets: The Chinese University of Hong Kong
Language: English, Chinese
Detected Language: English
Type: Text, bibliography
Format: electronic resource, remote, 1 online resource (xii, 210 leaves) : ill. (chiefly col.)
Rights: Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
