A Probabilistic Morphological Analyzer for Syriac

We show that a carefully crafted probabilistic morphological analyzer significantly outperforms a reasonable, naive baseline for Syriac. Syriac is an under-resourced Semitic language for which no language tools such as morphological analyzers are available. Such tools are widely used to support the annotation of morphologically complex languages. We introduce novel data-driven models for segmentation, dictionary linkage, and morphological tagging, and connect them in a joint pipeline to create a probabilistic morphological analyzer that requires only labeled data. We explore the performance of this model with varying amounts of training data and find that with about 34,500 tokens it can outperform the baseline trained on over 99,000 tokens, achieving an accuracy of just over 80%. When trained on all available training data, the joint model achieves 86.47% accuracy, a 29.7% reduction in error rate relative to the baseline.
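The closing figure in the abstract is a relative reduction in error rate, not a difference in accuracy points. The short Python sketch below shows that arithmetic. The function names are hypothetical, and the baseline accuracy it prints is not stated in the abstract: it is back-computed from the two reported numbers (86.47% and 29.7%), so treat it as an approximation subject to rounding in the reported figures.

```python
def relative_error_reduction(baseline_acc, model_acc):
    """Fraction of the baseline's errors removed by the model.

    Both accuracies are percentages in [0, 100].
    """
    baseline_err = 100.0 - baseline_acc
    model_err = 100.0 - model_acc
    return (baseline_err - model_err) / baseline_err


def implied_baseline_accuracy(model_acc, reduction):
    """Invert the formula above: recover the baseline accuracy (percent)
    from the model accuracy and the relative error reduction."""
    model_err = 100.0 - model_acc
    baseline_err = model_err / (1.0 - reduction)
    return 100.0 - baseline_err


# Figures reported in the abstract.
MODEL_ACC = 86.47   # percent
REDUCTION = 0.297   # 29.7% relative reduction in error rate

# Inferred value, NOT reported in the abstract.
implied = implied_baseline_accuracy(MODEL_ACC, REDUCTION)
print(f"Implied baseline accuracy: {implied:.2f}%")
```

Under these assumptions the implied baseline accuracy comes out near 80.75%, which is consistent with the abstract's statement that the model needs "just over 80%" accuracy to outperform the baseline.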

segmentation

dictionary linkage

morphological tagging

Identifier	oai:union.ndltd.org:BGMYU2/oai:scholarsarchive.byu.edu:etd-3199
Date	08 July 2010
Creators	McClanahan, Peter J.
Publisher	BYU ScholarsArchive
Source Sets	Brigham Young University
Detected Language	English
Type	text
Format	application/pdf
Source	Theses and Dissertations
Rights	http://lib.byu.edu/about/copyright/
