Keywords: Symbolic autoencoding, self-supervised learning, discrete auto-encoding, discrete representation learning, straight-through gradient estimation
TL;DR: A paradigm based on straight-through gradient estimation for connecting sequence-to-sequence models and training them end-to-end, similar to an autoencoder whose hidden representation is discrete and sequential, i.e., sentences from a language.
Abstract: Traditional language models (LMs) excel at next-token prediction in text sequences but often struggle with transduction
tasks involving distinct symbolic systems, particularly when parallel data is scarce or nonexistent. This issue is
even more pronounced in domains dealing with complex, non-natural language sequences, such as audio signals,
protein structures, or biological sequences, where the strengths of LMs in natural language do not directly translate.
To address this challenge, we introduce symbolic autoencoding ($\Sigma$AE), a self-supervised framework
designed to exploit the wealth of non-parallel data alongside limited parallel data. $\Sigma$AE connects two
generative models through a discrete bottleneck layer and optimizes the entire system end-to-end: an unsupervised
reconstruction loss on all data trains the system so that the sequence generated at the discrete bottleneck can be
read out as the transduced input sequence, while a supervised loss trains the two models separately on the subset
of labeled parallel data. To allow optimization in the presence of discrete symbols, we use a family of straight-through
gradient estimators. We demonstrate the effectiveness of $\Sigma$AE on four sequence-to-sequence
transduction tasks, showing that it significantly outperforms strong baselines in weakly supervised settings.
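
The straight-through estimators mentioned in the abstract can be illustrated with a minimal sketch: the forward pass emits a hard, discrete choice (here an argmax one-hot), while the backward pass routes gradients through the soft distribution as if no discretization had occurred. All names below are illustrative and are not the paper's actual implementation.

```python
import numpy as np

def ste_forward(logits):
    """Forward pass: hard one-hot selection (the discrete symbol),
    plus the cached softmax distribution used for the backward pass."""
    one_hot = np.zeros_like(logits)
    one_hot[np.argmax(logits)] = 1.0
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()
    return one_hot, probs

def ste_backward(grad_output, probs):
    """Backward pass: pretend the forward was the softmax, i.e. use the
    softmax Jacobian-vector product instead of the (zero) argmax gradient."""
    dot = np.dot(grad_output, probs)
    return probs * (grad_output - dot)
```

In an autograd framework the same trick is usually written as `soft + stop_gradient(hard - soft)`, so the hard symbol flows forward while the soft gradient flows backward.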
Submission Number: 32