Fifty shapes of BLiMP: syntactic learning curves in language models are not uniform, but sometimes unruly
Track: Extended abstract
Keywords: learning curves, grammar acquisition, small models
TL;DR: Performance on grammatical benchmarks does not always increase over the course of training, but sometimes deteriorates.
Abstract: Syntactic learning curves in LMs are usually reported as stable and power law-shaped. By analyzing the learning curves of different LMs on various syntactic phenomena, using small, self-trained Llama models and larger, pre-trained Pythia models, we show that while many phenomena do follow typical power-law curves, others exhibit S-shaped, U-shaped, or erratic patterns. Certain syntactic paradigms remain challenging even for large models. Moreover, most phenomena show similar curves across their concrete paradigms, but the existence of diverging patterns and oscillations indicates that average curves mask important developmental differences.
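The abstract itself contains no code; below is a minimal, illustrative sketch of how a per-paradigm syntactic learning curve could be traced using publicly available Pythia checkpoints and the BLiMP dataset on Hugging Face. The model size, checkpoint steps, paradigm name, and sample size are assumptions for illustration, not the authors' actual experimental setup.

```python
# Hedged sketch: trace a BLiMP learning curve across public Pythia checkpoints.
# Model size, checkpoint steps, and the paradigm name are illustrative assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

def sentence_logprob(model, tokenizer, sentence):
    """Sum of token log-probabilities of a sentence under the model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    return log_probs.gather(1, ids[0, 1:].unsqueeze(1)).sum().item()

def blimp_accuracy(model, tokenizer, paradigm="anaphor_gender_agreement", n=100):
    """Fraction of minimal pairs where the grammatical sentence scores higher."""
    data = load_dataset("blimp", paradigm, split=f"train[:{n}]")
    correct = sum(
        sentence_logprob(model, tokenizer, ex["sentence_good"])
        > sentence_logprob(model, tokenizer, ex["sentence_bad"])
        for ex in data
    )
    return correct / len(data)

# One accuracy point per intermediate checkpoint yields the learning curve.
model_name = "EleutherAI/pythia-160m"  # assumed model size
tokenizer = AutoTokenizer.from_pretrained(model_name)
curve = {}
for step in [1000, 8000, 64000, 143000]:  # assumed checkpoint steps
    model = AutoModelForCausalLM.from_pretrained(model_name, revision=f"step{step}")
    curve[step] = blimp_accuracy(model, tokenizer, n=50)
print(curve)
```

Plotting such per-paradigm accuracies against training step is one way the power-law, S-shaped, U-shaped, or erratic trajectories described in the abstract could be distinguished.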
Submission Number: 2