Global ETD Search

Standardising Text Complexity : Applying and Optimising a Multi-Scale Probit Model for Cross-Source Classification and Ranking in Swedish

The increasing accessibility of texts has highlighted the need to differentiate them by complexity. For instance, individuals with reading disabilities such as dyslexia often struggle more with complex texts, and teachers may want texts of varying complexity for different grade levels. These scenarios underscore the need for a method that classifies texts by difficulty level. Here, text complexity refers to the characteristics of a text that determine its difficulty, independent of the reader. A significant challenge is the scarcity of Swedish texts suitable for traditional text complexity classification methods. The Multi-Scale Probit model uses a Bayesian approach to classify and rank texts of varying complexity drawn from multiple sources. This thesis applies the Multi-Scale Probit model to linguistic features of Swedish easy-to-read books and investigates data augmentation and feature regularisation as optimisation methods for text complexity assessment. Multi-Scale and Single-Scale Probit models are trained on different proportions of the training data and then compared. The results indicate that the Multi-Scale Probit model outperforms a baseline model and that the multi-scale approach generally surpasses the single-scale approach. The first optimisation method shows that data augmentation is a viable way to improve performance using the available data. The second reveals that a feature selection step can improve both the performance and the computational efficiency of the model. Overall, the findings suggest that the Multi-Scale Probit model is an effective method for classifying and ranking new texts, although there is room for further performance improvements.
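The abstract describes the Multi-Scale Probit model only at a high level. The sketch below illustrates one plausible formulation, built on our own assumptions rather than the thesis's exact specification: every text gets a linear latent complexity score whose coefficients (`beta`) are shared by all sources, while each source keeps its own ordered levels separated by source-specific cut-points (an ordered probit per source). All function names, features and parameter values are hypothetical. Only the unnormalised log posterior that a Bayesian sampler (for example MCMC) would target is shown, not the sampler itself.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical formulation (an assumption, not taken from the thesis): one latent
# complexity score per text, shared across sources, with each source keeping its
# own ordinal scale defined by source-specific cut-points.


def std_normal_cdf(z: float) -> float:
    """Phi(z) via the error function; math.erf(+/-inf) returns +/-1.0."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def latent_score(features: Sequence[float], beta: Sequence[float]) -> float:
    """Latent complexity: a linear combination of (standardised) linguistic features."""
    return sum(x * b for x, b in zip(features, beta))


def level_probabilities(score: float, cutpoints: Sequence[float]) -> List[float]:
    """Ordered-probit probabilities for levels 0..K-1, given K-1 increasing cut-points.

    P(level = k) = Phi(tau_k - score) - Phi(tau_{k-1} - score),
    with tau_{-1} = -inf and tau_{K-1} = +inf.
    """
    bounds = [-math.inf, *cutpoints, math.inf]
    cdf = [std_normal_cdf(b - score) for b in bounds]
    return [cdf[k + 1] - cdf[k] for k in range(len(cdf) - 1)]


def log_posterior(
    data: Sequence[Tuple[str, Sequence[float], int]],
    beta: Sequence[float],
    cutpoints_by_source: Dict[str, Sequence[float]],
    prior_sd: float = 1.0,
) -> float:
    """Unnormalised log posterior over (beta, cut-points).

    Likelihood: each (source, features, level) triple is scored against the
    cut-points of its own source, while beta is shared by all sources.
    Prior: independent Normal(0, prior_sd^2) on each coefficient in beta;
    the cut-points get an implicit flat prior in this sketch.
    """
    log_lik = 0.0
    for source, features, level in data:
        probs = level_probabilities(latent_score(features, beta), cutpoints_by_source[source])
        log_lik += math.log(max(probs[level], 1e-300))  # guard against log(0)
    log_prior = sum(-0.5 * (b / prior_sd) ** 2 for b in beta)
    return log_lik + log_prior


def rank_texts(texts: Dict[str, Sequence[float]], beta: Sequence[float]) -> List[str]:
    """Rank texts from least to most complex by their latent score."""
    return sorted(texts, key=lambda name: latent_score(texts[name], beta))


if __name__ == "__main__":
    # Hypothetical standardised features: [mean sentence length, mean word length].
    beta = [0.8, 0.5]
    cutpoints = {"source_A": [-0.5, 0.5], "source_B": [0.0]}  # 3 levels vs 2 levels
    texts = {"easy": [-1.0, -1.0], "medium": [0.2, 0.0], "hard": [1.5, 1.0]}
    print(rank_texts(texts, beta))  # ['easy', 'medium', 'hard']
    print(level_probabilities(latent_score(texts["hard"], beta), cutpoints["source_A"]))
```

Under the same assumptions, a single-scale variant would replace `cutpoints_by_source` with one shared set of cut-points applied to all texts after mapping their labels onto a common scale. Ranking depends only on the latent score, so new texts can be ordered without assigning them to any particular source's scale.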

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-205491

Computer Sciences (Swedish subject heading: Datavetenskap (datalogi))

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-205491
Date	January 2024
Creators	Andersson, Elsa
Publisher	Linköpings universitet, Institutionen för datavetenskap
Source Sets	DiVA Archive at Uppsala University
Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
