
Automatic Readability Detection for Modern Standard Arabic

Research on automatic readability prediction has grown over the last decade and has shown that a range of machine learning methods can address the problem effectively. Most of this work has applied machine learning to readability prediction for English, while Modern Standard Arabic (MSA) has received little attention. Here I describe a system that uses machine learning to predict the readability of MSA text automatically. I gathered a corpus of 179 documents annotated with Interagency Language Roundtable (ILR) levels and extracted lexical and discourse features from each document. I then applied the Tilburg Memory-Based Learning (TiMBL) system to these features to predict the ILR level of each document, using 10-fold cross-validation for both 3-level and 5-level classification tasks and an 80/20 split for a 5-level classification task. I measured performance using the F-score. For the 3-level and 5-level classifications, my system achieved F-scores of 0.719 and 0.519, respectively. I discuss the implications of these results and possibilities for future development.
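The evaluation setup described in the abstract (a memory-based classifier scored by 10-fold cross-validation and F-score) can be illustrated with a minimal sketch. This is not the thesis's code and does not use TiMBL: it substitutes a plain k-nearest-neighbor classifier written in standard-library Python, and the corpus is synthetic. The two features, the function names, the choice of k = 3, and the use of a macro-averaged F-score are all assumptions made for illustration; only the document count of 179 and the 3-level setting are taken from the abstract.

```python
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Memory-based prediction: majority label among the k nearest stored examples."""
    # Rank stored examples by squared Euclidean distance to the query vector.
    nearest = sorted(
        train,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def macro_f1(gold, pred):
    """Unweighted mean of per-class F-scores (an assumed averaging scheme)."""
    scores = []
    for lab in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


def cross_validate(data, folds=10, k=3, seed=0):
    """Hold out each fold once, predict it from the rest, and score all predictions."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    gold, pred = [], []
    for i in range(folds):
        test = shuffled[i::folds]
        train = [ex for j, ex in enumerate(shuffled) if j % folds != i]
        for features, label in test:
            gold.append(label)
            pred.append(knn_predict(train, features, k))
    return macro_f1(gold, pred)


def make_synthetic_corpus(n_docs=179, levels=3, seed=0):
    """Synthetic stand-in for an annotated corpus; the features are hypothetical."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_docs):
        level = rng.randrange(levels)
        # Two made-up numeric features that loosely increase with the level.
        features = (level + rng.gauss(0, 0.5), 0.5 * level + rng.gauss(0, 0.5))
        data.append((features, level))
    return data


if __name__ == "__main__":
    corpus = make_synthetic_corpus()
    print(f"macro F-score (synthetic data): {cross_validate(corpus):.3f}")
```

Because the data are synthetic, the printed score says nothing about the reported results of 0.719 and 0.519; the sketch only shows how fold-wise predictions can be pooled and scored.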

Identifier: oai:union.ndltd.org:BGMYU2/oai:scholarsarchive.byu.edu:etd-4982
Date: 19 March 2014
Creators: Forsyth, Jonathan Neil
Publisher: BYU ScholarsArchive
Source Sets: Brigham Young University
Detected Language: English
Type: text
Format: application/pdf
Source: Theses and Dissertations
Rights: http://lib.byu.edu/about/copyright/
