A Statistical Typology of (Textual) Language in Finer Granularity

Anonymous

16 May 2021 (modified: 05 May 2023), ACL ARR 2021 May Blind Submission, Readers: Everyone
Abstract: We propose a new, character-level perspective for understanding and visualizing language in its textual representation in computing, using two parameters measured on parallel corpora: relative line length and character vocabulary size. We find an emergent pattern in which languages fall into a natural, continuous order. We highlight some of the outlier languages and discuss the opportunities and challenges ahead for character- and byte-level development in language technology.