Semantic Representations of Mathematical Expressions in a Continuous Vector Space

Published: 02 Sept 2023, Last Modified: 17 Sept 2024. Accepted by TMLR.
Abstract: Mathematical notation makes up a large portion of STEM literature, yet finding semantic representations for formulae remains a challenging problem. Because mathematical notation is precise, and its meaning changes significantly with small character shifts, the methods that work for natural text do not necessarily work well for mathematical expressions. This work describes an approach for representing mathematical expressions in a continuous vector space. We use the encoder of a sequence-to-sequence architecture, trained on visually different but mathematically equivalent expressions, to generate vector representations (or embeddings). We compare this approach with a structural approach that considers visual layout to embed an expression and show that our proposed approach is better at capturing mathematical semantics. Finally, to expedite future research, we publish a corpus of equivalent transcendental and algebraic expression pairs.
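The distinction the abstract draws between visual form and mathematical meaning can be illustrated with a minimal sketch. This is not the paper's method or code: the helper `numerically_equivalent`, its parameters, and the example expressions are all hypothetical. It uses a simple numerical spot check to show one pair that looks different but is equivalent, and one pair that differs by a single character but is not.

```python
import math
import random

def numerically_equivalent(f, g, trials=100, lo=0.1, hi=2.0, tol=1e-9, seed=0):
    """Heuristic check (hypothetical helper): do f and g agree at random points in [lo, hi]?

    Agreement on samples suggests equivalence but does not prove it.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(lo, hi)
        if not math.isclose(f(x), g(x), rel_tol=tol, abs_tol=tol):
            return False
    return True

# Visually different, mathematically equivalent: sin(2x) and 2 sin(x) cos(x)
lhs = lambda x: math.sin(2 * x)
rhs = lambda x: 2 * math.sin(x) * math.cos(x)

# Visually one character apart, mathematically different: x^2 and x^3
sq = lambda x: x ** 2
cube = lambda x: x ** 3

equivalent_pair = numerically_equivalent(lhs, rhs)
one_char_apart = numerically_equivalent(sq, cube)
print(equivalent_pair, one_char_apart)
```

A representation driven by surface layout would tend to place `x^2` near `x^3` and far from a rewritten but equivalent form; the abstract's claim is that training on equivalent pairs pushes the embeddings toward the opposite behavior.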
Submission Length: Regular submission (no more than 12 pages of main content)
Code: https://github.com/mlpgroup/expemb
Supplementary Material: zip
Assigned Action Editor: ~Yonatan_Bisk1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1003