<p>The present dissertation investigated the impact of scales / scoring methods and prompt linguistic features on the measurement quality of L2 English elicited imitation (EI). Scales / scoring methods are central to the validity and reliability of L2 EI tests, yet they remain underexplored (Yan et al., 2016). Prompt linguistic features are also known to influence EI test quality, particularly item difficulty, but item discrimination and corpus-based, fine-grained measures have rarely been incorporated into examinations of their contribution. The current study addressed these needs using item response theory (IRT) and random forest modeling.</p><p>Data consisted of 9,348 oral responses
to forty-eight items, including EI prompts, item scores, and rater comments, which
were collected from 779 examinees of an L2 English EI test at Purdue
University. First, the study explored the current and alternative EI scales / scoring
methods that measure grammatical / semantic accuracy, focusing on optimal IRT-based
measurement qualities (RQ1 through RQ4 in Phase I). Next, the project
identified important prompt linguistic features that predict EI item difficulty
and discrimination across different scales / scoring methods and proficiency levels, using
multi-level modeling and random forest regression (RQ5 and RQ6 in Phase
II).</p><p>The main findings were (though not limited to) the following: 1) collapsing the exact repetition and paraphrase categories led to more optimal measurement (i.e., adequate item parameter values, category functioning, and model / item / person fit) (RQ1); 2) there were fewer misfitting persons with lower proficiency and a higher frequency of unexpected responses in the extreme categories (RQ2); 3) the inconsistency of qualitatively distinguishing semantic errors and the wide range of grammatical accuracy in the minor error category contributed to misfit (RQ3); 4) a quantity-based, 4-category ordinal scale outperformed quality-based or binary scales (RQ4); 5) sentence length significantly explained item difficulty only, with little variance explained (RQ5); and 6) corpus-based lexical measures and phrase-level syntactic complexity were important in predicting item difficulty, particularly at the higher ability level (RQ6). The findings have implications for EI scale / item development in human and automatic scoring settings, as well as for L2 English proficiency development.</p>
Identifier | oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/16807516 |
Date | 14 October 2021 |
Creators | Ji-young Shin (11560495) |
Source Sets | Purdue University |
Detected Language | English |
Type | Text, Thesis |
Rights | CC BY 4.0 |
Relation | https://figshare.com/articles/thesis/Towards_optimal_measurement_and_theoretical_grounding_of_L2_English_elicited_imitation_Examining_scales_mis_fits_and_prompt_features_from_item_response_theory_and_random_forest_approaches/16807516 |