Spelling Normalization of English Student Writings

Spelling normalization is the task of converting non-standard words in a text into their standard forms. It reduces the number of out-of-vocabulary (OOV) words and thereby improves the performance of natural language processing (NLP) applications such as information retrieval, machine translation, and opinion mining on the normalized text. In this thesis, we explore different methods for spelling normalization of English student writings: traditional Levenshtein edit distance comparison, phonetic similarity comparison, character-based Statistical Machine Translation (SMT), and character-based Neural Machine Translation (NMT). Our main contribution is an approach that combines the Levenshtein edit distance and phonetic similarity methods with added frequency-count and compound-splitting components. It is evaluated as the best approach, with a 0.329% accuracy improvement and a 63.63% error reduction over the original unnormalized test set.
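To make the edit-distance idea concrete, the following is a minimal sketch of Levenshtein-based normalization with a frequency count used to break ties. It is not the thesis implementation: the function names, the `max_distance` threshold, and the toy lexicon with its counts are assumptions made purely for illustration, and the phonetic similarity and compound-splitting components are left out.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein edit distance between strings a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalize(word: str, freq: Counter, max_distance: int = 2) -> str:
    """Map word to the closest lexicon entry, breaking ties by frequency.

    The max_distance threshold is an illustrative assumption; words with
    no candidate within that distance are returned unchanged.
    """
    if word in freq or not freq:
        return word
    # Sort key: smaller distance first, then higher frequency.
    candidates = [(levenshtein(word, w), -count, w) for w, count in freq.items()]
    distance, _, best = min(candidates)
    return best if distance <= max_distance else word


# Toy frequency lexicon (hypothetical counts, for illustration only).
lexicon = Counter({"because": 50, "believe": 30, "friend": 40, "fried": 5})

if __name__ == "__main__":
    for token in ["becuase", "freind", "friend", "xyzzyq"]:
        print(token, "->", normalize(token, lexicon))
```

In this sketch, "freind" is equally distant from "friend" and "fried", so the higher frequency count decides the outcome; this is one plausible way a frequency component can disambiguate candidates that edit distance alone cannot separate.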

http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-361925

spelling normalization

English student writings

phonetic similarity comparison

Levenshtein edit distance

General Language Studies and Linguistics

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-361925
Date	January 2018
Creators	HONG, Yuchan
Publisher	Uppsala universitet, Institutionen för lingvistik och filologi
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
