Language Models Are Better Than Humans at Next-token Prediction

TMLR Paper 2191 Authors

13 Feb 2024 (modified: 28 Mar 2024) · Under review for TMLR
Abstract: Current language models are considered to have sub-human capabilities at natural language tasks like question-answering or writing code. However, causal language models are not trained to perform well at these tasks; they are trained to accurately predict the next token given previous tokens in tokenized text. It is not clear whether language models are better or worse than humans at next-token prediction. To try to answer this question, we performed two distinct experiments to directly compare humans and language models on this front: one measuring top-1 accuracy and the other measuring perplexity on OpenWebText. In both experiments, we find humans to be consistently \emph{worse} than relatively small language models like GPT-Neo-1.3B or GPT-2-large at next-token prediction.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~W_Ronny_Huang1
Submission Number: 2191