A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention

Grégoire Mialon; Dexiong Chen; Alexandre d'Aspremont; Julien Mairal

Published: 12 Jan 2021, Last Modified: 22 Jun 2025 · ICLR 2021 Poster · Readers: Everyone

Keywords: bioinformatics, optimal transport, kernel methods, attention, transformers

Abstract: We address the problem of learning on sets of features, motivated by the need to perform pooling operations on long biological sequences of varying sizes, with long-range dependencies and possibly little labeled data. To address this challenging task, we introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost. Our aggregation technique admits two useful interpretations: it may be seen as a mechanism related to attention layers in neural networks, or as a scalable surrogate of a classical optimal transport-based kernel. We experimentally demonstrate the effectiveness of our approach on biological sequences, achieving state-of-the-art results for protein fold recognition and chromatin profile detection, and, as a proof of concept, we show promising results for processing natural language sequences. We provide an open-source implementation of our embedding, which can be used alone or as a module in larger learning models, at https://github.com/claying/OTK.
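
To make the aggregation step in the abstract concrete, here is a minimal NumPy sketch of pooling a variable-size set against a fixed-size reference through an entropy-regularized transport plan. It is an illustration under stated assumptions, not the OTK implementation. The function names `sinkhorn_plan` and `ot_pool`, the Gaussian similarity, the uniform marginals, the value of `eps`, and the rescaling by `p` are all choices made for this example. The reference `z` is a fixed array here, whereas the paper trains it.

```python
import numpy as np


def sinkhorn_plan(K, n_iter=100):
    """Sinkhorn iterations for a positive similarity matrix K of shape (n, p).

    Assumes uniform marginals: each input carries mass 1/n and each
    reference element receives mass 1/p.
    """
    n, p = K.shape
    a = np.full(n, 1.0 / n)  # mass on input elements
    b = np.full(p, 1.0 / p)  # mass on reference elements
    v = np.ones(p)
    for _ in range(n_iter):
        u = a / (K @ v)      # rescale rows toward marginal a
        v = b / (K.T @ u)    # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]


def ot_pool(x, z, eps=1.0, n_iter=100):
    """Pool a set x of shape (n, d) into shape (p, d) using a reference z of shape (p, d).

    Hypothetical helper: the Gaussian similarity and the factor p
    (which makes each output row a weighted average of the inputs)
    are assumptions of this sketch.
    """
    sq_dist = ((x[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dist / eps)
    plan = sinkhorn_plan(K, n_iter)
    p = z.shape[0]
    return p * plan.T @ x


# Sets of different sizes map to the same output shape.
rng = np.random.default_rng(0)
z = rng.normal(size=(3, 4))            # reference with p = 3 elements
short_set = rng.normal(size=(5, 4))
long_set = rng.normal(size=(50, 4))
print(ot_pool(short_set, z).shape, ot_pool(long_set, z).shape)  # (3, 4) (3, 4)
```

In this sketch the transport plan plays a role loosely comparable to attention weights: each reference element gathers a weighted average of the input elements, with the marginal constraints forcing every reference element to receive the same total mass.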

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

One-sentence Summary: We propose a new, trainable embedding for large sets of features such as biological sequences, and demonstrate its effectiveness.

Supplementary Material: zip

Code: [claying/OTK](https://github.com/claying/OTK)

Data: [GLUE](https://paperswithcode.com/dataset/glue), [SST](https://paperswithcode.com/dataset/sst), [SST-2](https://paperswithcode.com/dataset/sst-2)

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2006.12065/code)
