Normalizace pravopisu v arabských dialektech / Orthography Standardization in Arabic Dialects

Abstract
Christian Cayralat, Charles University

Spontaneous orthography in Arabic dialects is one of the biggest obstacles facing Dialectal Arabic NLP applications. The Arab world has a wide array of these widely spoken, only recently written, non-standard, low-resource varieties, and this thesis presents a detailed account of this relatively overlooked phenomenon. It sets out to show that continuously creating additional noise-free, manually standardized corpora of Dialectal Arabic does not free us from the shackles of non-standard (spontaneous) orthography. Because real-world data will most often come in a noisy format, the thesis also investigates ways to reduce the amount of noise in textual data. As a proof of concept, we restrict ourselves to one dialectal variety, namely Lebanese Arabic. The thesis also strives to gain a better understanding of the nature of the noise and its distribution. All of this is done by leveraging various spelling correction and morphological tagging neural architectures in a multi-task setting, and by annotating a Lebanese Arabic corpus for spontaneous orthography standardization and for morphological segmentation and tagging, among other features. Additionally, a detailed taxonomy of spelling inconsistencies for...

http://www.nusl.cz/ntk/nusl-451841

Identifier	oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:451841
Date	January 2021
Creators	Cayralat, Christian
Contributors	Zeman, Daniel; Straňák, Pavel
Source Sets	Czech ETDs
Language	English
Detected Language	English
Type	info:eu-repo/semantics/masterThesis
Rights	info:eu-repo/semantics/restrictedAccess
