Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent

2021 (modified: 11 Nov 2021), EMNLP (1) 2021
William Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah A. Smith. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021.