Processing Spontaneous Orthography

Ramy Eskander, Nizar Habash, Owen Rambow, Nadi Tomeh

2013 (modified: 16 Jul 2019), HLT-NAACL 2013

Abstract: When a language or language variant has no standard orthography, written texts display a variety of orthographic choices. This is problematic for natural language processing (NLP) because it creates spurious data sparseness. We study the transformation of spontaneously spelled Egyptian Arabic into a conventionalized orthography that we have previously proposed for NLP purposes. We show that a two-stage process can reduce divergences from this standard by 69%, making subsequent processing of Egyptian Arabic easier.
