
Standardising Text Complexity: Applying and Optimising a Multi-Scale Probit Model for Cross-Source Classification and Ranking in Swedish

The increasing accessibility of texts has highlighted the need to differentiate them by complexity. For instance, individuals with reading disabilities such as dyslexia often find complex texts more challenging. Similarly, teachers may wish to use texts of different complexity for different grade levels. These scenarios underline the need for a method that classifies texts by difficulty level. Here, text complexity refers to the characteristics of a text that determine its difficulty, independent of the reader. A significant challenge is the scarcity of Swedish texts suitable for traditional text complexity classification methods.

The Multi-Scale Probit model uses a Bayesian approach to classify and rank texts of varying complexity from multiple sources. This thesis applies the Multi-Scale Probit model to linguistic features of Swedish easy-to-read books and investigates data augmentation and feature regularisation as methods for optimising text complexity assessment. Multi-Scale and Single-Scale Probit models are trained on different proportions of the training data and then compared. The results indicate that the Multi-Scale Probit model outperforms a baseline model and that the multi-scale approach generally surpasses the single-scale approach. The first optimisation method shows that data augmentation is a viable way to improve performance with the available data. The second shows that adding a feature selection step can improve both the performance and the computational efficiency of the model. Overall, the findings suggest that the Multi-Scale Probit model is an effective method for classifying and ranking new texts, although there is room for further improvement in performance.
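The abstract does not give the model's equations, so the following is only a rough illustration of the "multi-scale" idea under stated assumptions. It assumes a generic ordered-probit formulation: each text receives one latent complexity score x'β that is shared by all sources, while each source keeps its own increasing cut points that map that score onto the source's own level scale. The Bayesian part is represented by a list of hypothetical posterior draws whose predictions are simply averaged. All function names, source names, feature values and draws are hypothetical, and the thesis's actual specification (priors, sampler, features) may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: a generic ordered probit with source-specific cut points and a
# shared coefficient vector; this is an illustrative sketch, not the thesis code.


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function (also handles +/-inf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def latent_score(features: Sequence[float], beta: Sequence[float]) -> float:
    """Shared latent complexity s = x'beta, on the same axis for every source."""
    return sum(x * b for x, b in zip(features, beta))


def class_probabilities(score: float, cutpoints: Sequence[float]) -> List[float]:
    """Ordered probit: P(y = k) = Phi(tau_k - s) - Phi(tau_{k-1} - s),
    with tau_0 = -inf and tau_K = +inf; cutpoints must be increasing."""
    bounds = [-math.inf, *cutpoints, math.inf]
    cdf = [normal_cdf(b - score) for b in bounds]
    return [cdf[k + 1] - cdf[k] for k in range(len(cdf) - 1)]


# One hypothetical posterior draw: shared beta plus cut points per source.
Draw = Tuple[List[float], Dict[str, List[float]]]


def posterior_class_probabilities(
    features: Sequence[float], source: str, draws: List[Draw]
) -> List[float]:
    """Monte Carlo average of the level probabilities over posterior draws,
    expressed on the chosen source's own level scale."""
    total: List[float] = []
    for beta, cutpoints in draws:
        probs = class_probabilities(latent_score(features, beta), cutpoints[source])
        total = probs if not total else [t + p for t, p in zip(total, probs)]
    return [t / len(draws) for t in total]


def posterior_mean_score(features: Sequence[float], draws: List[Draw]) -> float:
    """Posterior mean latent complexity; comparable across sources for ranking."""
    return sum(latent_score(features, beta) for beta, _ in draws) / len(draws)


# Usage with made-up numbers: two draws, two sources with 3 and 4 levels.
draws: List[Draw] = [
    ([0.8, 0.3], {"source_a": [-0.5, 0.5], "source_b": [-1.0, 0.0, 1.0]}),
    ([1.0, 0.2], {"source_a": [-0.4, 0.6], "source_b": [-1.2, 0.1, 0.9]}),
]
texts = {"text_1": [0.2, -1.0], "text_2": [1.5, 0.4], "text_3": [-0.7, 0.1]}

probs = posterior_class_probabilities(texts["text_2"], "source_b", draws)
ranking = sorted(texts, key=lambda name: posterior_mean_score(texts[name], draws))
print(probs, ranking)
```

Because every source's levels are read off the same latent axis, texts labelled under different sources can, in this sketch, be ranked together even though their level scales differ; classification into a particular source's levels then depends only on that source's cut points.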

Identifier: oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-205491
Date: January 2024
Creators: Andersson, Elsa
Publisher: Linköpings universitet, Institutionen för datavetenskap (Linköping University, Department of Computer and Information Science)
Source Sets: DiVA Archive at Uppsala University
Language: English
Detected Language: English
Type: Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format: application/pdf
Rights: info:eu-repo/semantics/openAccess
