Keywords: efficient, pretraining, LLMs
TL;DR: We detail a preliminary attempt to train a GPT3-quality model for under 10K USD.
Abstract: We describe a system for pretraining a 4B-parameter model, called CloverLM, that targets zero-shot performance similar to the standard GPT3-175B / OPT-175B models in a highly cost-effective manner. Our approach combines multiple known techniques: 1) accurate native NVFP4 training via the Quartet II algorithm; 2) training on high-quality data from the CLIMB dataset; 3) several model- and framework-specific optimizations. While we claim no technical novelty, it is notable that we can reach OPT-175B-level accuracy on multiple-choice zero-shot tasks in pure NVFP4 using approximately 1600 B300 GPU hours for the main run, at an estimated cost between USD 5,600 (spot) and USD 10,000 (on-demand) on a sustainable neocloud provider. Our code is openly available.
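The cost range quoted in the abstract implies a per-GPU-hour price, which can be backed out with simple arithmetic. The sketch below is only illustrative: it treats "approximately 1600" GPU hours as an exact figure, and the constant and function names are hypothetical, not taken from the authors' code.

```python
# Figures quoted in the abstract; "approximately 1600" is treated as exact here.
GPU_HOURS = 1600            # B300 GPU hours for the main run
COST_SPOT_USD = 5600        # estimated total cost at spot pricing
COST_ON_DEMAND_USD = 10000  # estimated total cost at on-demand pricing


def implied_hourly_rate(total_cost_usd, gpu_hours):
    """Return the per-GPU-hour price implied by a total cost and GPU-hour count."""
    return total_cost_usd / gpu_hours


spot_rate = implied_hourly_rate(COST_SPOT_USD, GPU_HOURS)
on_demand_rate = implied_hourly_rate(COST_ON_DEMAND_USD, GPU_HOURS)

print(f"spot: {spot_rate:.2f} USD per GPU hour")
print(f"on-demand: {on_demand_rate:.2f} USD per GPU hour")
```

Under these assumptions the quoted totals correspond to roughly USD 3.50 (spot) and USD 6.25 (on-demand) per B300 GPU hour; actual provider pricing may differ.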
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 224