Enhancing the Transformer with explicit relational encoding for math problem solving

Imanol Schlag; Paul Smolensky; Roland Fernandez; Nebojsa Jojic; Jürgen Schmidhuber; Jianfeng Gao

25 Sept 2019 (modified: 22 Jun 2025) | ICLR 2020 Conference Blind Submission | Readers: Everyone

TL;DR: Our Tensor-Product Transformer sets a new state of the art on the recently-introduced Mathematics Dataset containing 56 categories of free-form math word-problems.

Abstract: We incorporate Tensor-Product Representations within the Transformer in order to better support the explicit representation of relation structure. Our Tensor-Product Transformer (TP-Transformer) sets a new state of the art on the recently-introduced Mathematics Dataset containing 56 categories of free-form math word-problems. The essential component of the model is a novel attention mechanism, called TP-Attention, which explicitly encodes the relations between each Transformer cell and the other cells from which values have been retrieved by attention. TP-Attention goes beyond linear combination of retrieved values, strengthening representation-building and resolving ambiguities introduced by multiple layers of regular attention. The TP-Transformer's attention maps give better insights into how it is capable of solving the Mathematics Dataset's challenging problems. Pretrained models and code will be made available after publication.
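The abstract describes TP-Attention only at a high level, so the following is a minimal single-head sketch under stated assumptions rather than the authors' implementation. It assumes that each cell computes an additional "role" vector alongside the usual query, key, and value, and that the role is bound to the attention-retrieved value by an elementwise (Hadamard) product, which can be read as a compressed form of tensor-product binding. The function name `tp_attention_head`, the weight names, and all dimensions are hypothetical and chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def tp_attention_head(X, Wq, Wk, Wv, Wr):
    """One attention head with an assumed role-binding step.

    X  : (n_cells, d_model) input representations, one row per cell.
    Wq, Wk, Wv, Wr : (d_model, d_head) projection matrices.
    Wr is the assumed extra projection producing the role vectors.
    """
    Q, K, V, R = X @ Wq, X @ Wk, X @ Wv, X @ Wr

    # Standard scaled dot-product attention: each cell retrieves a
    # linear combination of the values of the other cells.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    A = softmax(scores, axis=-1)
    filler = A @ V

    # Assumed binding step: the retrieved value (filler) is combined
    # with the retrieving cell's own role vector, so the result is no
    # longer a plain linear combination of values.
    bound = filler * R
    return bound, filler, A


rng = np.random.default_rng(0)
n_cells, d_model, d_head = 5, 16, 8
X = rng.normal(size=(n_cells, d_model))
Wq, Wk, Wv, Wr = (rng.normal(size=(d_model, d_head)) for _ in range(4))

bound, filler, A = tp_attention_head(X, Wq, Wk, Wv, Wr)
print(bound.shape)  # one bound vector of size d_head per cell
```

In this sketch, two cells that retrieve the same linear combination of values still end up with different outputs when their role vectors differ, which is one way to read the abstract's claim that the mechanism resolves ambiguities left by regular attention.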

Keywords: Tensor Product Representation, Transformer, Mathematics Dataset, Attention

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/enhancing-the-transformer-with-explicit/code)
