Keywords: Zipf's law, mutual information, Markov order estimation, Kolmogorov complexity
TL;DR: We present an impossibility result, called a theorem about facts and words, which pertains to a general communication system.
Abstract: We present an impossibility result, called a theorem about facts and words, which pertains to a general communication system. The theorem states that the number of distinct words used in a finite text is roughly greater than the number of independent elementary persistent facts described in the same text. In particular, the theorem can be related to Zipf's law, power-law scaling of mutual information, and power-law-tailed learning curves. The assumptions of the theorem are: a finite alphabet, a linear sequence of symbols, complexity that does not decrease in time, an entropy rate that can be estimated, and a finite inverse complexity rate.
In-person Presentation: no
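As a side note, the Zipf's-law connection mentioned in the abstract can be illustrated with a minimal simulation (not the paper's construction): when word ranks follow a Zipf distribution, the number of distinct words in a text of length n grows sublinearly, roughly as a power law n^β with β < 1 (Heaps' law). The exponent and sample sizes below are assumptions chosen purely for illustration.

```python
# Illustrative sketch, assuming i.i.d. word ranks drawn from a Zipf law;
# real texts are not i.i.d., but the sublinear vocabulary growth is the
# qualitative phenomenon the theorem's bound relates to.
import numpy as np

rng = np.random.default_rng(0)
a = 2.0  # Zipf exponent (hypothetical choice for this demo)
text = rng.zipf(a, size=100_000)  # token stream of word ranks

def vocab_size(n: int) -> int:
    """Number of distinct words among the first n tokens."""
    return len(set(text[:n].tolist()))

for n in (1_000, 10_000, 100_000):
    print(n, vocab_size(n))
```

Multiplying the text length by 10 multiplies the vocabulary by far less than 10, which is the Heaps-law signature of Zipfian texts.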