Compact Embedding of Binary-coded Inputs and Outputs using Bloom Filters

Joan SerrĂ , Alexandros Karatzoglou

Nov 03, 2016 (modified: Jan 12, 2017) ICLR 2017 conference submission
  • Abstract: The size of neural network models that deal with sparse inputs and outputs is often dominated by the dimensionality of those inputs and outputs. Large models with high-dimensional inputs and outputs are difficult to train due to the limited memory of graphics processing units, and difficult to deploy on mobile devices with limited hardware. To address these difficulties, we propose Bloom embeddings, a compression technique that can be applied to the input and output of neural network models dealing with sparse high-dimensional binary-coded instances. Bloom embeddings are computationally efficient, and do not seriously compromise the accuracy of the model up to 1/5 compression ratios. In some cases, they even improve over the original accuracy, with relative increases up to 12%. We evaluate Bloom embeddings on 7 data sets and compare them against 4 alternative methods, obtaining favorable results. We also discuss a number of further advantages of Bloom embeddings, such as 'on-the-fly' constant-time operation, zero or marginal space requirements, training time speedups, and the fact that they do not require any change to the core model architecture or training configuration.
  • TL;DR: Bloom embeddings compactly represent sparse high-dimensional binary-coded instances without compromising accuracy
  • Keywords: Applications, Deep learning, Unsupervised Learning
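The core idea of a Bloom embedding, as the abstract describes it, can be sketched as follows: a sparse high-dimensional binary instance is compressed into a much smaller binary vector by hashing each active feature index with several hash functions, Bloom-filter style. The sketch below is a minimal illustration under stated assumptions; the function name, the use of salted MD5 as the hash family, and the parameter names (`m`, `k`) are illustrative choices, not the paper's exact scheme.

```python
import hashlib

def bloom_embed(active_indices, m, k):
    """Compress a sparse binary vector, given by its set of active
    feature indices, into an m-dimensional binary vector by setting
    the bits chosen by k hash functions (simulated here with salted
    MD5; the paper's actual hash family may differ)."""
    out = [0] * m
    for idx in active_indices:
        for seed in range(k):
            h = hashlib.md5(f"{seed}:{idx}".encode()).hexdigest()
            out[int(h, 16) % m] = 1
    return out

# A hypothetical instance with 3 active features out of, say, 1000,
# compressed to a 20-dimensional binary code:
code = bloom_embed([5, 42, 987], m=20, k=2)
print(len(code), sum(code))  # m bits total; at most k * 3 bits set
```

Note that the mapping is computed on the fly in constant time per active feature and needs no stored projection matrix, which matches the "zero or marginal space requirements" the abstract mentions; decoding the model's compressed output back to the original label space is the harder part and is what the paper evaluates.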