Model name,Link to HF,"Params, B",Input hidden size,Vocabulary size,"Max capacity, tokens","PG-19, Gain, tokens","PG-19, CE Reduction","Random, Gain, tokens","Random, CE Reduction","Text CE (512 tokens, PG-19)","Estimated gain, tokens","PG-19, Gain / Max capacity","Random, Gain / Max capacity"
Pythia-160M,EleutherAI/pythia-160m,0.16,768,50304,787,71,396,61,501,1817.89,31.03591981,0.09011584734,0.07791398367
Pythia-410M,EleutherAI/pythia-410m,0.41,1024,50304,1049,81,431,77,630,1609.67,45.94921823,0.07750089974,0.0733065091
Pythia-1.4B,EleutherAI/pythia-1.4b,1.40,2048,50304,2098,158,793,144,1108,1486.78,131.6882576,0.07530837736,0.06882613728
Pythia-2.8B,EleutherAI/pythia-2.8b,2.80,2560,50304,2623,150,740,134,1024,1424.14,128.4005274,0.05723436679,0.05120969661
OPT-1.3B,facebook/opt-1.3b,1.30,2048,50272,2098,132,713,129,1066,1557.95,100.9399732,0.06300748288,0.06162532175
OLMo-1B,allenai/OLMo-1B-0724-hf,1.00,2048,50304,2098,406,1901,257,1852,1470.72,846.8432803,0.1936569223,0.1226382626
Sh.Llama-1.3B,princeton-nlp/Sheared-LLaMA-1.3B,1.30,2048,32000,2190,384,1835,315,1893,1400.06,845.747309,0.1751975968,0.1439123116
Llama-3.2-1B,meta-llama/Llama-3.2-1B,1.00,2048,128256,1931,426,2120,295,2265,1490.97,1131.615697,0.2207045223,0.152711787
Llama-3.2-3B,meta-llama/Llama-3.2-3B,3.00,3072,128256,2897,720,3292,457,3383,1383.60,6704.461063,0.2486680235,0.1577348604
Llama-3.1-8B,meta-llama/Llama-3.1-8B,8.00,4096,128256,3862,1094,4866,623,4541,1266.66,66792.07392,0.2832858023,0.1613597587