Return to search

Spelling Normalization of English Student Writings

Spelling normalization is the task to normalize non-standard words into standard words in texts, resulting in a decrease in out-of-vocabulary (OOV) words in texts for natural language processing (NLP) tasks such as information retrieval, machine translation, and opinion mining, improving the performance of various NLP applications on normalized texts. In this thesis, we explore different methods for spelling normalization of English student writings including traditional Levenshtein edit distance comparison, phonetic similarity comparison, character-based Statistical Machine Translation (SMT) and character-based Neural Machine Translation (NMT) methods. An important improvement of our implementation is that we develop an approach combining Levenshtein edit distance and phonetic similarity methods with added components of frequency count and compound splitting and it is evaluated as a best approach with 0.329% accuracy improvement and 63.63% error reduction on the original unnormalized test set.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-361925
Date January 2018
CreatorsHONG, Yuchan
PublisherUppsala universitet, Institutionen för lingvistik och filologi
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0017 seconds