With a Little Help from Gzip: Text Classification with No Training

09 Jul 2022 (modified: 13 Nov 2022) · OpenReview Anonymous Preprint Blind Submission
TL;DR: gzip for text classification
Abstract: In text classification, neural network methods are often overkill. Compressor-based methods are simpler, but previous work in this area has failed to achieve results comparable to neural network methods. In this paper, we combine a simple compressor like gzip with a k-nearest-neighbor classifier for text classification. Without any training, pre-training, or fine-tuning, our method achieves results competitive with deep learning methods on seven datasets, and it even outperforms BERT and sentence-BERT on one dataset. In addition, we demonstrate the robustness of our method in the few-shot setting.
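The abstract only states that a compressor is combined with a k-nearest-neighbor classifier; a minimal sketch of how such a pipeline is commonly built uses the normalized compression distance (NCD) as the distance metric. The NCD formula and the helper names below are assumptions for illustration, not details taken from the abstract:

```python
import gzip

def clen(s: str) -> int:
    # Length of the gzip-compressed byte string: a practical stand-in
    # for the (uncomputable) Kolmogorov complexity of s.
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    # Normalized compression distance (assumed metric, common in
    # compressor-based classification): texts that share substrings
    # compress better together, yielding a smaller distance.
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def knn_classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    # Rank labeled training texts by NCD to the query,
    # then take a majority vote among the k nearest.
    ranked = sorted(train, key=lambda pair: ncd(query, pair[0]))
    top_labels = [label for _, label in ranked[:k]]
    return max(set(top_labels), key=top_labels.count)
```

Note that no parameters are learned anywhere: the "model" is just the training texts themselves plus gzip, which is what makes the method training-free.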