We discuss several issues that arise in building a finite-state morphological analyzer for Urdu, in particular potential ambiguity and non-concatenative morphology. Our approach allows for an underlyingly similar treatment of both Urdu and Hindi via a cascade of finite-state transducers that transliterates the very different scripts into a common ASCII transcription system. Because this transliteration system is built with the same XFST tools in which the common Urdu/Hindi morphological analyzer is implemented, no compatibility problems arise.
Identifier | oai:union.ndltd.org:Potsdam/oai:kobv.de-opus-ubp:2715 |
Date | January 2008 |
Creators | Bögel, Tina, Butt, Miriam, Hautli, Annette, Sulger, Sebastian |
Publisher | Universität Potsdam, Extern. Extern |
Source Sets | Potsdam University |
Language | English |
Detected Language | English |
Type | InProceedings |
Format | application/pdf |
Rights | http://opus.kobv.de/ubp/doku/urheberrecht.php |