Fifty shapes of BLiMP: syntactic learning curves in language models are not uniform, but sometimes unruly
Track: Extended abstract
Keywords: learning curves, grammar acquisition, small models
TL;DR: Performance on grammatical benchmarks does not always increase over the course of training, but sometimes deteriorates.
Abstract: Syntactic learning curves in LMs are usually reported as stable and power law-shaped. By analyzing the learning curves of different LMs on various syntactic phenomena, using small, self-trained Llama models and larger, pre-trained Pythia models, we show that while many phenomena do follow typical power-law curves, others exhibit S-shaped, U-shaped, or erratic patterns. Certain syntactic paradigms remain challenging even for large models. Moreover, most phenomena show similar curves across their concrete paradigms, but the existence of diverging patterns and oscillations indicates that average curves mask important developmental differences.
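The abstract itself contains no code; below is a minimal, illustrative sketch of how a per-paradigm syntactic learning curve could be traced using publicly available Pythia checkpoints and the BLiMP dataset on Hugging Face. The model size, checkpoint steps, paradigm name, and sample size are assumptions for illustration, not the authors' actual experimental setup.

```python
# Hedged sketch: trace a BLiMP learning curve across public Pythia checkpoints.
# Model size, checkpoint steps, and the paradigm name are illustrative assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

def sentence_logprob(model, tokenizer, sentence):
    """Sum of token log-probabilities of a sentence under the model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    return log_probs.gather(1, ids[0, 1:].unsqueeze(1)).sum().item()

def blimp_accuracy(model, tokenizer, paradigm="anaphor_gender_agreement", n=100):
    """Fraction of minimal pairs where the grammatical sentence scores higher."""
    data = load_dataset("blimp", paradigm, split=f"train[:{n}]")
    correct = sum(
        sentence_logprob(model, tokenizer, ex["sentence_good"])
        > sentence_logprob(model, tokenizer, ex["sentence_bad"])
        for ex in data
    )
    return correct / len(data)

# One accuracy point per intermediate checkpoint yields the learning curve.
model_name = "EleutherAI/pythia-160m"  # assumed model size
tokenizer = AutoTokenizer.from_pretrained(model_name)
curve = {}
for step in [1000, 8000, 64000, 143000]:  # assumed checkpoint steps
    model = AutoModelForCausalLM.from_pretrained(model_name, revision=f"step{step}")
    curve[step] = blimp_accuracy(model, tokenizer, n=50)
print(curve)
```

Plotting such per-paradigm accuracies against training step is one way the power-law, S-shaped, U-shaped, or erratic trajectories described in the abstract could be distinguished.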
Submission Number: 2