Abstract: Long document understanding is a challenging problem in natural language understanding. Because of the high computational cost of attention, most current transformer-based models rely only on textual information when computing attention. To address these issues in long document understanding, we explore new approaches that use different position-aware attention masks and evaluate their performance on several benchmarks. Experimental results show that our models have an advantage on long document understanding across various evaluation metrics. Furthermore, our approach modifies only the attention module of the transformer, so it can be flexibly detached and plugged into any other transformer-based solution with ease.
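The abstract does not specify the exact form of the position-aware masks, but the general mechanism of masking attention by token position can be sketched as follows. This is a minimal illustration, assuming a simple distance-based band mask (a hypothetical choice, not necessarily the one used in the paper), applied inside standard scaled dot-product attention:

```python
import numpy as np

def position_aware_attention(q, k, v, window=2):
    """Scaled dot-product attention with a position-aware mask.

    Each token attends only to positions within `window` of itself.
    The band mask here is one hypothetical instance of a
    position-aware attention mask; other distance- or
    structure-based masks can be swapped in the same way.
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)

    # Position-aware mask: True where |i - j| <= window.
    idx = np.arange(seq_len)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window

    # Suppress attention to positions outside the mask.
    scores = np.where(mask, scores, -1e9)

    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the mask is applied only inside the attention computation, the surrounding transformer layers are untouched, which is what makes this kind of modification easy to detach or plug into other transformer-based models.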