On Layer Normalization in the Transformer Architecture
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu
2020 (modified: 09 Sept 2021)
ICML 2020 (Readers: Everyone)
Abstract: The Transformer is widely used in natural language processing tasks. To train a Transformer, however, one usually needs a carefully designed learning rate warm-up stage, which is shown to be crucial...
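The warm-up stage the abstract refers to ramps the learning rate up from zero at the start of training before any decay is applied. A minimal sketch of a linear warm-up schedule, with `base_lr` and `warmup_steps` values chosen purely for illustration (the paper's experiments may use different settings):

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=4000):
    """Linearly scale the learning rate from 0 to base_lr over the first
    warmup_steps updates, then hold it at base_lr.

    Decay after warm-up (e.g. inverse square root) varies by setup and
    is omitted here.
    """
    return base_lr * min(step / warmup_steps, 1.0)

# The rate grows linearly during warm-up, then plateaus:
print(warmup_lr(0))      # 0.0
print(warmup_lr(2000))   # 0.0005 (halfway through warm-up)
print(warmup_lr(4000))   # 0.001  (full base rate)
```

In practice such a schedule is typically passed to the optimizer via a per-step scheduler (e.g. a lambda-based scheduler in a deep learning framework) rather than applied by hand.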