Compression Represents Intelligence Linearly

Published: 10 Jul 2024 · Last Modified: 26 Aug 2024 · COLM · CC BY 4.0
Research Area: Evaluation, Science of LMs
Keywords: compression, language models, linear correlation
TL;DR: Viewing language models as compressors, we show that their "intelligence," as reflected by benchmark scores, is linearly correlated with their compression capability.
Abstract: There is a long-standing belief that learning to compress well leads to intelligence. Recently, language modeling has been shown to be equivalent to compression, which offers a compelling rationale for the success of large language models (LLMs): developing more advanced language models essentially enhances compression, which in turn facilitates intelligence. Despite such appealing discussions, little empirical evidence exists for the interplay between compression and intelligence. In this work, we examine their relationship in the context of LLMs, treating LLMs as data compressors. Because "intelligence" is an abstract concept, we adopt average downstream benchmark scores as a surrogate, specifically targeting intelligence related to knowledge and commonsense, coding, and mathematical reasoning. Across 12 benchmarks, our study brings together 31 public LLMs that vary in size and originate from diverse organizations. Remarkably, we find that LLMs' intelligence -- as reflected by benchmark scores -- correlates almost **linearly** with their ability to compress external text corpora. These results provide concrete evidence supporting the belief that superior compression indicates greater intelligence. Furthermore, our findings suggest that compression efficiency, as an unsupervised metric derived from raw text corpora, serves as a reliable evaluation measure that is linearly associated with model capabilities. This work advocates for the adoption of compression performance as a stable, flexible, and reliable metric for evaluating LLMs. We open-source our compression corpora as well as our data-collection pipelines to help future researchers assess compression properly.
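The metric the abstract describes, compression efficiency on raw text, follows from the equivalence between language modeling and arithmetic coding: a model that assigns probability p to the next token can encode it in roughly -log2 p bits, so the average cross-entropy loss converts directly into bits per character (BPC). Below is a minimal sketch of that conversion, assuming a Hugging Face causal LM; the function name, the GPT-2 checkpoint, and the single-window truncation are illustrative choices, not the paper's actual pipeline.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def bits_per_character(model, tokenizer, text, max_len=1024):
    """Estimate the compression rate (bits per character) of `text` under `model`.

    Via arithmetic coding, a language model compresses a token stream to about
    -log2 p(token | context) bits per token, so the mean negative log-likelihood
    is itself a compression rate.
    """
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_len)
    input_ids = enc.input_ids
    with torch.no_grad():
        # With labels == inputs, the model returns the mean shifted
        # cross-entropy in nats per predicted token.
        loss = model(input_ids, labels=input_ids).loss.item()
    n_pred = input_ids.numel() - 1            # the first token has no prediction
    total_bits = loss * n_pred / math.log(2)  # nats -> bits
    return total_bits / len(text)             # normalize by raw characters

# Illustrative usage:
# tok = AutoTokenizer.from_pretrained("gpt2")
# lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
# print(bits_per_character(lm, tok, open("corpus.txt").read()))
```

Given such BPC values for a collection of models, the linear relationship reported above corresponds to a strong (negative) Pearson correlation between BPC and average benchmark score, e.g. `numpy.corrcoef(bpc, scores)[0, 1]`.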
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 817