Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie E. Everett, Alexander A. Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith
Published: 01 Jan 2024, Last Modified: 15 May 2025
ICLR 2024
CC BY-SA 4.0