On the Word Boundaries of Emergent Languages Based on Harris's Articulation Scheme

Keywords: Emergent Communication, Emergent Language, Unsupervised Word Segmentation, Harris's Articulation Scheme, Compositionality
TL;DR: This paper investigates whether Harris's articulation scheme (HAS) also holds in emergent languages.
Abstract: This paper shows that emergent languages in signaling games lack meaningful word boundaries in terms of Harris's Articulation Scheme (HAS), a universal property of natural language. Emergent languages are artificial communication protocols that arise among agents. However, it is not obvious whether such simulated languages have the same properties as natural language. In this paper, we test whether they satisfy HAS, which states that word boundaries in natural language can be obtained solely from phonemes. We apply HAS-based word segmentation and check whether emergent languages have meaningful word segments. The experiment suggests that they do not, although they meet some preconditions for HAS. This reveals a gap between emergent and natural languages that remains to be bridged: the standard signaling game satisfies prerequisites but still lacks some necessary ingredients.
