Perso-Arabic script is the official writing system in Iran. Romanized transcriptions, based on the phonology of Persian, have been used extensively in electronic communication, especially on the Internet. Converting between these two writing systems has been a topic of interest in Natural Language Processing. As in Machine Translation, such conversions can be applied at different grammatical levels, such as the sentence, phrase, or word level. In this thesis, taking Dabire as the standard Romanized transcription, we introduce two approaches to conversion at the word level. The lexicon-based approach uses finite-state technology for bidirectional conversion between Perso-Arabic and Dabire. The second approach uses association analysis for statistical conversion from Perso-Arabic to Dabire.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-61029 |
Date | January 2010 |
Creators | Yaesoubi, Maziar |
Publisher | Linköpings universitet, Institutionen för datavetenskap |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/masterThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |