Cat2Vec: Learning Distributed Representation of Multi-field Categorical Data

Ying Wen, Jun Wang, Tianyao Chen, Weinan Zhang

Nov 04, 2016 (modified: Nov 04, 2016) ICLR 2017 conference submission readers: everyone
  • Abstract: This paper presents a method for learning distributed representations of multi-field categorical data, a common data format in applications such as recommender systems, social link prediction, and computational advertising. The success of non-linear models such as factorisation machines and boosted trees has demonstrated the potential of exploring interactions among inter-field categories. Inspired by Word2Vec, the distributed representation for natural language, we propose the Cat2Vec (categories to vectors) model. In Cat2Vec, a low-dimensional continuous vector is automatically learned for each category in each field. The interactions among inter-field categories are then explored by different neural gates, and the most informative ones are selected by pooling layers. In our experiments, by exploring the interactions between pairwise categories over layers, the model attains a considerable improvement over state-of-the-art models on supervised learning tasks such as click prediction, while capturing the most significant interactions from the data.
  • TL;DR: An unsupervised pairwise interaction model for learning distributed representations of multi-field categorical data
  • Keywords: Unsupervised Learning, Deep learning, Applications
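
The pipeline outlined in the abstract (one embedding per category per field, pairwise gates across fields, then pooling to keep the most informative interactions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of an element-wise product as the gate, the use of the L2 norm as the pooling score, and the random (untrained) embeddings are all assumptions made for illustration only.

```python
import itertools
import random


def make_embeddings(field_vocab_sizes, dim, seed=0):
    """Build one embedding table per field: tables[field][category_id] -> vector.

    Assumption: vectors are randomly initialised here; in a real model they
    would be learned during training.
    """
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(size)]
        for size in field_vocab_sizes
    ]


def product_gate(u, v):
    """Assumed gate: element-wise product of two category vectors."""
    return [a * b for a, b in zip(u, v)]


def sum_gate(u, v):
    """Alternative assumed gate: element-wise sum of two category vectors."""
    return [a + b for a, b in zip(u, v)]


def k_max_pool(vectors, k):
    """Keep the k vectors with the largest L2 norm (an assumed scoring rule)."""
    return sorted(vectors, key=lambda vec: sum(x * x for x in vec), reverse=True)[:k]


def interact_and_pool(sample, tables, gate=product_gate, k=2):
    """Embed one sample, gate every pair of fields, and pool the top-k results."""
    # Look up the embedding of the active category in each field.
    vecs = [tables[field][cat] for field, cat in enumerate(sample)]
    # Apply the gate to every pair of distinct fields.
    pairs = [gate(u, v) for u, v in itertools.combinations(vecs, 2)]
    return k_max_pool(pairs, k)


# Hypothetical data: three fields with 3, 4 and 2 categories respectively.
tables = make_embeddings(field_vocab_sizes=[3, 4, 2], dim=4)
top = interact_and_pool(sample=[2, 0, 1], tables=tables, k=2)
```

Here `sample=[2, 0, 1]` stands for one record whose active category is index 2 in the first field, 0 in the second, and 1 in the third. Stacking several gate-and-pool stages would correspond to exploring interactions "over layers"; how the stages are actually composed and trained is left to the paper itself.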