Keywords: customer embeddings, tabular, general purpose, autoencoder, representation learning, reconstruction loss, entity embedding, entity representation, contractive autoencoder, dimensionality reduction, latent space, representation, feature regularization, variational autoencoder
TL;DR: We introduce DeepCAE, an enhanced multi-layer contractive autoencoder, and benchmark autoencoder architectures in a general-purpose tabular data embedding framework for reconstruction and downstream performance, achieving a 34% reconstruction improvement over stacked CAEs.
Abstract: Recent advances in representation learning have successfully leveraged the underlying domain-specific structure of data across various fields. However, representing diverse and complex entities stored in tabular format within a latent space remains challenging.
In this paper, we introduce DeepCAE, a novel method for calculating the regularization term for multi-layer contractive autoencoders (CAEs). Additionally, we formalize a general-purpose entity embedding framework and use it to empirically show that DeepCAE outperforms all other tested autoencoder variants in both reconstruction and downstream prediction performance. Notably, when compared to a stacked CAE across 13 datasets, DeepCAE achieves a 34% improvement in reconstruction error.
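The abstract does not spell out the regularizer, so for readers unfamiliar with contractive autoencoders, the sketch below illustrates one plausible reading of "calculating the regularization term for multi-layer CAEs": penalizing the squared Frobenius norm of the Jacobian of the *entire* encoder, accumulated layer by layer via the chain rule, rather than summing per-layer penalties as a stacked CAE does. The class name `MultiLayerCAE`, the layer sizes, the sigmoid activations, and the loss weighting are our assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiLayerCAE(nn.Module):
    """Hypothetical multi-layer contractive autoencoder (illustrative sketch only)."""

    def __init__(self, dims):  # e.g. dims = [64, 32, 8]: input -> hidden -> code
        super().__init__()
        n = len(dims) - 1
        self.enc = nn.ModuleList([nn.Linear(dims[i], dims[i + 1]) for i in range(n)])
        self.dec = nn.ModuleList([nn.Linear(dims[i + 1], dims[i]) for i in reversed(range(n))])

    def encode(self, x):
        # Accumulate the Jacobian of the *whole* encoder via the chain rule:
        # J <- diag(sigma'(a_k)) @ W_k @ J for each layer k, starting from J = I.
        jac = torch.eye(x.shape[1], device=x.device).expand(x.shape[0], -1, -1)
        h = x
        for layer in self.enc:
            h = torch.sigmoid(layer(h))
            d = (h * (1.0 - h)).unsqueeze(2)              # sigma'(a) as (batch, out, 1)
            jac = (d * layer.weight.unsqueeze(0)) @ jac   # shape: (batch, out, in_dim)
        return h, jac

    def forward(self, x):
        z, jac = self.encode(x)
        h = z
        for layer in self.dec[:-1]:
            h = torch.sigmoid(layer(h))
        return self.dec[-1](h), jac                       # linear output layer

def cae_loss(model, x, lam=1e-3):
    # Reconstruction error plus the contractive penalty ||J_f(x)||_F^2,
    # where J_f is the Jacobian of the full encoder (not a sum of per-layer terms).
    recon, jac = model(x)
    return ((recon - x) ** 2).mean() + lam * (jac ** 2).sum(dim=(1, 2)).mean()

model = MultiLayerCAE([64, 32, 8])
x = torch.rand(16, 64)   # a batch of 16 tabular rows
cae_loss(model, x).backward()
```

By contrast, a stacked CAE sums per-layer penalties of the form ||diag(sigma'(a_k)) W_k||_F^2, which ignores how contraction composes across layers; penalizing the full encoder Jacobian is one natural way a multi-layer formulation could improve on that, under the assumptions stated above.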
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9893