GPT-NeoX-20B: An Open-Source Autoregressive Language Model

Sidney Black; Stella Biderman; Eric Hallahan; Quentin Gregory Anthony; Leo Gao; Laurence Golding; Horace He; Connor Leahy; Kyle McDonell; Jason Phang; Michael Martin Pieler; USVSN Sai Prashanth; Shivanshu Purohit; Laria Reynolds; Jonathan Tow; Ben Wang; Samuel Weinbach

Published: 09 Apr 2022, Last Modified: 04 Aug 2025, BigScience #5, Readers: Everyone

Keywords: scaling laws, language modeling, pretraining, open source

TL;DR: We train GPT-NeoX-20B, a 20-billion-parameter autoregressive language model; its code and weights will be made available to the public.

Abstract: We introduce GPT-NeoX-20B, a 20-billion-parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license. To the best of our knowledge, it is the largest dense autoregressive model with publicly available weights at the time of submission. In this work, we describe GPT-NeoX-20B's architecture and training and evaluate its performance. We open-source the training and evaluation code, as well as the model weights, at https://github.com/EleutherAI/gpt-neox.
