Cat2Vec: Learning Distributed Representation of Multi-field Categorical Data

Ying Wen; Jun Wang; Tianyao Chen; Weinan Zhang

26 Jul 2025 (modified: 21 Jul 2022); Submitted to ICLR 2017; Readers: Everyone

Abstract: This paper presents a method for learning distributed representations of multi-field categorical data, a common data format with applications such as recommender systems, social link prediction, and computational advertising. The success of non-linear models such as factorisation machines and boosted trees has shown the potential of exploring interactions among inter-field categories. Inspired by Word2Vec, the distributed representation for natural language, we propose the Cat2Vec (categories to vectors) model. In Cat2Vec, a low-dimensional continuous vector is automatically learned for each category in each field. The interactions among inter-field categories are then explored by different neural gates, and the most informative ones are selected by pooling layers. In our experiments, by exploring the interactions between pairs of categories layer by layer, the model attains a substantial improvement over state-of-the-art models on a supervised learning task (click prediction) while capturing the most significant interactions in the data.
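
The pipeline the abstract describes (one embedding per category, a gate applied to pairs of categories from different fields, and a pooling step that keeps the most informative interactions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the element-wise product gate, the sum-of-entries pooling score, the field sizes, the embedding dimension, and all function names are assumptions chosen only for illustration, and the embedding tables are random rather than learned.

```python
import itertools
import numpy as np

def embed(sample, tables):
    """Look up one vector per field; sample[i] is the category index in field i."""
    return [tables[i][c] for i, c in enumerate(sample)]

def pairwise_gate(vectors, gate=np.multiply):
    """Apply a gate to every pair of field vectors (assumed here: element-wise product)."""
    return [gate(a, b) for a, b in itertools.combinations(vectors, 2)]

def k_max_pool(vectors, k):
    """Keep the k interaction vectors with the largest score (assumed here: sum of entries)."""
    scores = np.array([v.sum() for v in vectors])
    top = np.argsort(-scores, kind="stable")[:k]
    return [vectors[i] for i in top]

rng = np.random.default_rng(0)
field_sizes = [5, 3, 4]   # hypothetical number of categories in each field
dim = 8                   # hypothetical embedding size
tables = [rng.normal(size=(n, dim)) for n in field_sizes]  # random, not learned

sample = [2, 0, 3]                        # one category index per field
vectors = embed(sample, tables)           # 3 vectors of length dim
interactions = pairwise_gate(vectors)     # 3 fields give 3 pairs
selected = k_max_pool(interactions, k=2)  # keep the 2 highest-scoring interactions
```

Stacking the gate and pooling steps, with `selected` fed back in as the next layer's `vectors`, would correspond to exploring interactions over several layers; how the real model scores and trains these interactions is not specified in the abstract.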

TL;DR: An unsupervised pairwise interaction model for learning distributed representations of multi-field categorical data

Conflicts: ucl.ac.uk, sjtu.eud.cn

Keywords: Unsupervised Learning, Deep learning, Applications
