
Readability Assessment with Pre-Trained Transformer Models: An Investigation with Neural Linguistic Features

Ma, Chuchu. January 2022.
Readability assessment (RA) is the task of assigning a score or grade to a document that measures how difficult the document is to read. RA originated in language education research, where it was used to classify reading materials for language learners, and was later applied to other tasks, such as supporting automatic text simplification.

This thesis aims to improve how Transformer models are used for RA. The motivation is the “pipeline” effect (Tenney et al., 2019) observed in pretrained Transformers: lexical, syntactic, and semantic features are best encoded by different layers of the model.

After a preliminary test of a basic RA model that resembles previous work, we proposed several methods to improve its performance: using a Transformer layer other than the last one, concatenating or mixing the outputs of all layers, and using syntax-augmented Transformer layers. We evaluated these methods on three datasets: WeeBit, OneStopEnglish, and CommonLit.

We observed that the improvements correlated clearly with dataset characteristics. On OneStopEnglish and CommonLit, we achieved absolute improvements of 1.2% in F1 score and 0.6% in Pearson’s correlation coefficient, respectively. We also show that an n-gram frequency-based baseline, which is simple but was not reported in previous work, achieves superior performance on the classification datasets (WeeBit and OneStopEnglish), motivating further research on vocabulary-based lexical features for RA.
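The abstract names three ways of reading features out of a pretrained Transformer instead of relying only on its last layer: picking a single (possibly intermediate) layer, concatenating the outputs of all layers, or mixing them. The sketch below illustrates those three strategies on toy per-layer hidden states in plain Python. It is not the thesis's implementation. Mean pooling over tokens, the function names, and the reading of "mixing" as an ELMo-style softmax-weighted scalar mix with a scale factor `gamma` are all assumptions made for illustration. In a real model, `hidden_states` would be the per-layer encoder outputs, and the pooled vector would feed a classification or regression head.

```python
import math
from typing import List, Sequence

Vector = List[float]
Layer = List[Vector]  # one hidden vector per token


def mean_pool(layer: Layer) -> Vector:
    """Average token vectors into one document vector (assumed pooling choice)."""
    if not layer:
        raise ValueError("layer has no tokens")
    dim = len(layer[0])
    return [sum(tok[d] for tok in layer) / len(layer) for d in range(dim)]


def select_layer(hidden_states: Sequence[Layer], k: int) -> Vector:
    """Strategy 1: use layer k (e.g. an intermediate one) instead of the last."""
    return mean_pool(hidden_states[k])


def concat_layers(hidden_states: Sequence[Layer]) -> Vector:
    """Strategy 2: concatenate pooled vectors of all layers (length = layers * dim)."""
    out: Vector = []
    for layer in hidden_states:
        out.extend(mean_pool(layer))
    return out


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the per-layer weights."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(hidden_states: Sequence[Layer], weights: Sequence[float],
               gamma: float = 1.0) -> Vector:
    """Strategy 3 (assumed ELMo-style mix): gamma * sum_l softmax(w)_l * pooled_l.

    In training, `weights` and `gamma` would be learned parameters; here they
    are passed in explicitly.
    """
    if len(weights) != len(hidden_states):
        raise ValueError("need exactly one weight per layer")
    probs = softmax(weights)
    pooled = [mean_pool(layer) for layer in hidden_states]
    dim = len(pooled[0])
    return [gamma * sum(probs[l] * pooled[l][d] for l in range(len(pooled)))
            for d in range(dim)]


if __name__ == "__main__":
    # Toy hidden states: 3 layers, 2 tokens, 2 dimensions (illustrative values only).
    layers = [
        [[1.0, 0.0], [3.0, 2.0]],  # pooled -> [2.0, 1.0]
        [[0.0, 4.0], [2.0, 0.0]],  # pooled -> [1.0, 2.0]
        [[5.0, 5.0], [1.0, 1.0]],  # pooled -> [3.0, 3.0]
    ]
    print(select_layer(layers, 1))              # [1.0, 2.0]
    print(concat_layers(layers))                # [2.0, 1.0, 1.0, 2.0, 3.0, 3.0]
    print(scalar_mix(layers, [0.0, 0.0, 0.0]))  # approximately [2.0, 2.0]
```

With equal weights, the scalar mix reduces to the plain average of the pooled layers. When the weights are learned jointly with the classifier, the model can shift weight toward whichever layers best encode the lexical or syntactic information a given dataset rewards, which is the intuition behind the pipeline effect cited above.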
