
Towards optimal measurement and theoretical grounding of L2 English elicited imitation: Examining scales, (mis)fits, and prompt features from item response theory and random forest approaches

<p>The present dissertation investigated
the impact of scales / scoring methods and prompt linguistic features on the
measurement quality of L2 English elicited imitation (EI). Scales / scoring
methods are important to the validity and reliability of L2 EI tests, but
relatively little is known about their effects (Yan et al., 2016). Prompt
linguistic features are also known to influence EI test quality, particularly
item difficulty, but item discrimination and corpus-based, fine-grained measures
have rarely been incorporated into examinations of the contribution of prompt
linguistic features. The current study addressed these research needs using
item response theory (IRT) and random forest modeling.</p><p>Data consisted of 9,348 oral responses
to forty-eight items, including EI prompts, item scores, and rater comments,
collected from 779 examinees of an L2 English EI test at Purdue University.
First, the study explored the current and alternative EI scales / scoring
methods that measure grammatical / semantic accuracy, focusing on optimal
IRT-based measurement qualities (RQ1 through RQ4 in Phase I). Next, the project
identified important prompt linguistic features that predict EI item difficulty
and discrimination across different scales / scoring methods and proficiency
levels, using multi-level modeling and random forest regression (RQ5 and RQ6 in Phase
II).</p><p>The main findings included
(though not limited to) the following: 1) collapsing the exact repetition and
paraphrase categories led to more optimal measurement (i.e., adequate item
parameter values, category functioning, and model / item / person fit) (RQ1);
2) there were fewer misfitting persons, who showed lower proficiency and a
higher frequency of unexpected responses in the extreme categories (RQ2);
3) the inconsistency of qualitatively distinguishing semantic errors and the
wide range of grammatical accuracy within the minor error category contributed
to misfit (RQ3); 4) a quantity-based, four-category ordinal scale outperformed
quality-based or binary scales (RQ4); 5) sentence length significantly
explained item difficulty only, with little variance explained (RQ5); and
6) corpus-based lexical measures and phrase-level syntactic complexity were
important for predicting item difficulty, particularly at the higher ability
level (RQ6). The findings have implications for EI scale / item development in
human and automatic scoring settings and for L2 English proficiency
development.</p>

DOI: 10.25394/pgs.16807516.v1
Identifier: oai:union.ndltd.org:purdue.edu/oai:figshare.com:article/16807516
Date: 14 October 2021
Creators: Ji-young Shin (11560495)
Source Sets: Purdue University
Detected Language: English
Type: Text, Thesis
Rights: CC BY 4.0
Relation: https://figshare.com/articles/thesis/Towards_optimal_measurement_and_theoretical_grounding_of_L2_English_elicited_imitation_Examining_scales_mis_fits_and_prompt_features_from_item_response_theory_and_random_forest_approaches/16807516
