BiG-Transformer: Integrating Hierarchical Features for Transformer via Bipartite Graph

IJCNN 2020
Abstract: Self-attention based models such as the Transformer have achieved great success on a wide range of Natural Language Processing tasks. However, the traditional fixed, fully-connected structure faces several challenges in practice, such as computational redundancy, fixed granularity, and poor interpretability. In this paper, we present BiG-Transformer, which employs attention with a bipartite-graph structure to replace the fully-connected self-attention mechanism in the Transformer. Specifically, the two parts of the graph are designed to integrate hierarchical semantic information, and two types of connection are proposed to fuse information from different positions. Experiments on four tasks show that BiG-Transformer achieves better performance than Transformer-like models and Recurrent Neural Networks.
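
The abstract does not specify how the two node partitions or the two connection types are constructed, so the sketch below is only a rough illustration of the general idea, not the paper's method: attention is restricted to the edges of a bipartite graph between token-level nodes and higher-level segment nodes, instead of the fully-connected token-to-token pattern. The function name `bipartite_attention` and the `assign` mapping are hypothetical and assumed for this example.

```python
import torch
import torch.nn.functional as F


def bipartite_attention(tokens, segment_nodes, assign):
    """Illustrative bipartite-graph attention (assumed, not the paper's exact design).

    tokens:        (n, d) token-level node features
    segment_nodes: (m, d) higher-level (e.g., phrase/segment) node features
    assign:        (n,)   index of the segment each token belongs to
    Attention is allowed only along bipartite edges (token <-> its segment).
    Assumes every segment has at least one assigned token.
    """
    n, d = tokens.shape
    m = segment_nodes.shape[0]
    scale = d ** 0.5

    # Bipartite adjacency: edge (i, j) iff token i is assigned to segment j.
    adj = torch.zeros(n, m, dtype=torch.bool)
    adj[torch.arange(n), assign] = True

    # Token nodes attend over segment nodes (queries = tokens, keys/values = segments).
    scores = tokens @ segment_nodes.t() / scale              # (n, m)
    scores = scores.masked_fill(~adj, float("-inf"))         # keep only bipartite edges
    tok_out = F.softmax(scores, dim=-1) @ segment_nodes      # (n, d)

    # Segment nodes attend over their member tokens (reverse direction).
    scores_t = segment_nodes @ tokens.t() / scale            # (m, n)
    scores_t = scores_t.masked_fill(~adj.t(), float("-inf"))
    seg_out = F.softmax(scores_t, dim=-1) @ tokens           # (m, d)

    return tok_out, seg_out


# Toy usage: 6 tokens grouped into 2 segment-level nodes.
tokens = torch.randn(6, 16)
segments = torch.randn(2, 16)
assign = torch.tensor([0, 0, 0, 1, 1, 1])
tok_out, seg_out = bipartite_attention(tokens, segments, assign)
```

Because each token attends only to the segment nodes it is connected to (and vice versa), the attention cost scales with the number of bipartite edges rather than with the square of the sequence length, which is the kind of saving over fully-connected self-attention that the abstract alludes to.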