Catformer: Designing Stable Transformers via Sensitivity Analysis

Published: 2021, Last Modified: 12 May 2023, ICML 2021, Readers: Everyone
Abstract: Transformer architectures are widely used, but training them is non-trivial, requiring custom learning rate schedules, scaling terms, residual connections, careful placement of submodules such as n...
