Abstract: We propose a character-level perspective for understanding and visualizing languages in their textual representation in computing, using relative line length and character vocabulary size, computed from parallel corpora, as parameters. We find an emergent pattern that places languages in a natural, continuous order. We highlight some of the outlier languages and discuss the opportunities and challenges ahead for character- and byte-level development in language technology.
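The abstract names its two parameters without defining them precisely. The following is a minimal Python sketch of how they might be computed, under stated assumptions that are not taken from the source. "Relative line length" is assumed to be the mean length of a language's lines divided by the mean length of the aligned lines in a reference language (English here, by assumption). "Character vocabulary size" is assumed to be the number of distinct characters. Characters are counted as NFC-normalized Unicode code points; grapheme clusters or UTF-8 bytes would be equally plausible units. The function names and the toy corpus are hypothetical.

```python
import unicodedata
from statistics import mean


def char_vocab_size(lines: list[str]) -> int:
    """Count distinct Unicode code points (after NFC normalization) across all lines.

    Assumption: a "character" is a code point, not a grapheme cluster or a byte.
    """
    vocab: set[str] = set()
    for line in lines:
        vocab.update(unicodedata.normalize("NFC", line))
    return len(vocab)


def relative_line_length(lines: list[str], reference: list[str]) -> float:
    """Mean code-point length of `lines` divided by that of the aligned `reference` lines.

    Assumption: both lists are line-aligned translations from a parallel corpus.
    """
    if len(lines) != len(reference):
        raise ValueError("parallel corpora must have the same number of aligned lines")
    ours = mean(len(unicodedata.normalize("NFC", line)) for line in lines)
    ref = mean(len(unicodedata.normalize("NFC", ref_line)) for ref_line in reference)
    return ours / ref


if __name__ == "__main__":
    # Hypothetical two-line parallel corpus, for illustration only.
    corpus = {
        "eng": ["Hello world.", "Good morning."],
        "deu": ["Hallo Welt.", "Guten Morgen."],
    }
    for lang, lines in corpus.items():
        rel = relative_line_length(lines, corpus["eng"])
        print(lang, round(rel, 3), char_vocab_size(lines))
```

Each language then corresponds to one point (relative line length, vocabulary size), and plotting these points across many languages is one way to visualize an ordering of the kind the abstract describes. On a realistic corpus, both values depend on the chosen unit (code points, graphemes, or bytes) and on the normalization form.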