- Keywords: machine learning, nlp, transformer, tree, structural, seq2seq, code2seq, source code, code, AST, abstract syntax tree, code summarization, machine translation
- TL;DR: We incorporate tree-structured network topologies into transformers by enhancing self-attention with relative position representations that capture relative movements between nodes, outperforming the SoTA on several code-to-sequence tasks by up to 6%.
- Abstract: We suggest two extensions to incorporate syntactic information into transformer models operating on linearized trees (e.g. abstract syntax trees). First, we use self-attention with relative position representations to consider structural relationships between nodes using a representation that encodes movements between any pair of nodes, and demonstrate how those movements can be computed efficiently on the fly. Second, we train the network to predict the lowest common ancestor of node pairs using a new structural loss function. We apply both methods to source code summarization tasks, where we outperform the state-of-the-art by up to 6% F1. On natural language machine translation, our models yield competitive results. We also consistently outperform sequence-based transformers, and demonstrate that our method yields representations that are more closely aligned to the AST structure.
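The relative position between two nodes in a tree can be summarized as the movement from one to the other: the number of up-steps to their lowest common ancestor, followed by the number of down-steps to the target. The following is a minimal illustrative sketch of that idea (not the paper's actual implementation), assuming nodes are stored with parent pointers:

```python
# Illustrative sketch: encode the relative position between two tree nodes
# as (up, down) = (#steps from `a` up to LCA(a, b), #steps from the LCA down to `b`).
# The representation and function names here are assumptions for demonstration.

def ancestors(parent, node):
    """Return the path from `node` up to the root, inclusive."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path

def tree_movement(parent, a, b):
    """Return (up, down): up-steps from `a` to LCA(a, b), down-steps to `b`."""
    # Depth of each ancestor of `a`, with the root at depth 0.
    depth = {n: i for i, n in enumerate(reversed(ancestors(parent, a)))}
    depth_a = len(depth) - 1
    # Walk `b` upward until we reach an ancestor of `a`: that node is the LCA.
    down, node = 0, b
    while node not in depth:
        node = parent[node]
        down += 1
    return depth_a - depth[node], down
```

For example, in a tree where node 0 is the root with children 1 and 2, and node 1 has children 3 and 4, moving from node 3 to node 4 is one step up (to their LCA, node 1) and one step down, i.e. `(1, 1)`. A self-attention layer can then embed each distinct `(up, down)` pair as a learned relative position vector.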