Processing Spontaneous Orthography

2013 (modified: 16 Jul 2019), HLT-NAACL 2013, Readers: Everyone
Abstract: When a language or language variant has no standard orthography, written texts display a variety of orthographic choices. This is problematic for natural language processing (NLP) because it creates spurious data sparseness. We study the transformation of spontaneously spelled Egyptian Arabic into a conventionalized orthography that we previously proposed for NLP purposes. We show that a two-stage process can reduce divergences from this standard by 69%, making subsequent processing of Egyptian Arabic easier.