Compact Embedding of Binary-coded Inputs and Outputs using Bloom Filters

Joan Serrà; Alexandros Karatzoglou

18 Aug 2025 (modified: 21 Jul 2022) | Submitted to ICLR 2017 | Readers: Everyone

TL;DR: Bloom embeddings allow a compact and accurate representation of high-dimensional binary inputs and/or outputs

Abstract: The size of neural network models that deal with sparse inputs and outputs is often dominated by the dimensionality of those inputs and outputs. Large models with high-dimensional inputs and outputs are difficult to train due to the limited memory of graphics processing units, and difficult to deploy on mobile devices with limited hardware. To address these difficulties, we propose Bloom embeddings, a compression technique that can be applied to the input and output of neural network models dealing with sparse high-dimensional binary-coded instances. Bloom embeddings are computationally efficient and do not seriously compromise the accuracy of the model at compression ratios of up to 1/5. In some cases, they even improve on the original accuracy, with relative increases of up to 12%. We evaluate Bloom embeddings on 7 data sets and compare them against 4 alternative methods, obtaining favorable results.
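
The abstract does not spell out the encoding itself, so the following is only a minimal sketch of how a standard Bloom filter could compress a sparse binary vector, not the authors' implementation. The function names (`bloom_positions`, `bloom_encode`, `bloom_contains`), the use of salted SHA-256 as the hash family, and the sizes `d`, `m`, and `k` are all assumptions chosen for illustration; `m / d = 1/5` simply mirrors the compression ratio mentioned above.

```python
import hashlib


def bloom_positions(index, m, k):
    """Return k positions in [0, m) for an item index.

    Assumption: k hash functions are simulated by salting SHA-256 with a seed.
    """
    positions = []
    for seed in range(k):
        digest = hashlib.sha256(f"{seed}:{index}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions


def bloom_encode(active_indices, m, k):
    """Encode a sparse binary vector, given by its active indices, into m bits."""
    bits = [0] * m
    for index in active_indices:
        for pos in bloom_positions(index, m, k):
            bits[pos] = 1
    return bits


def bloom_contains(bits, index, k):
    """Membership check: True if all k positions of `index` are set.

    False positives are possible; false negatives are not.
    """
    m = len(bits)
    return all(bits[pos] == 1 for pos in bloom_positions(index, m, k))


# Hypothetical sizes: original dimensionality d, compressed size m, k hashes.
d, m, k = 1000, 200, 4
active = [3, 17, 42]           # indices of the ones in the d-dimensional vector
bits = bloom_encode(active, m, k)
```

In this sketch, the `m`-dimensional vector `bits` would stand in for the original `d`-dimensional binary vector at the network's input. Every active index is always recovered by `bloom_contains`, while inactive indices may occasionally collide, which is the usual accuracy cost of a Bloom filter as `m` shrinks.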

Conflicts: telefonica.com
