Transformers Learning Contrafactives

Authors: INRIA Sémagramme 2025 SIR01 Submission 3 Authors

18 Jul 2025 (modified: 21 Jul 2025) · INRIA Sémagramme 2025 SIR01 Submission · CC BY 4.0
Keywords: attitude ascriptions, language universals, transformer models
TL;DR: We investigate the purported language universal that natural languages lack contrafactive attitude verbs, exploring a wider range of data distributions than previous work.
Abstract: No natural language is known to have contrafactive attitude verbs, yet factives are common across natural languages. Several experiments by Strohmaier and Wimmer (2022, 2023, 2025) attempt to explain this asymmetry via a learnability difference, using transformers as model learners, but they do not explore empirically grounded data distributions. We fill this gap, further improving the overall quality of the training data distributions using linear programming. Our results confirm Strohmaier and Wimmer's (2025) conclusion that there is no learnability difference in production, while establishing the impact of differences in data distributions.
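The abstract mentions using linear programming to improve the quality of the training data distributions. The sketch below is not the authors' code; it is a minimal illustration, under hypothetical assumptions, of how one might use a linear program (via `scipy.optimize.linprog`) to decide how often to use each sentence template so that the generated corpus hits target token counts for factive, non-factive, and contrafactive verbs. The verb classes, per-template counts, and targets are invented for illustration.

```python
# A minimal sketch (not the authors' method) of reweighting a template-based
# training corpus with linear programming so that verb-class token counts
# match a target distribution. All numbers below are hypothetical.
import numpy as np
from scipy.optimize import linprog

# Rows: verb classes (factive, non-factive, contrafactive).
# Columns: sentence templates. Entry (i, j) = tokens of class i that one use
# of template j contributes to the corpus.
A_eq = np.array([
    [1.0, 0.0, 2.0, 1.0],   # factive tokens per template use
    [0.0, 1.0, 1.0, 0.0],   # non-factive tokens per template use
    [1.0, 1.0, 0.0, 2.0],   # contrafactive tokens per template use
])

# Target token counts for each verb class in the generated training set.
b_eq = np.array([10_000.0, 10_000.0, 10_000.0])

# Objective: minimise the total number of generated sentences.
c = np.ones(A_eq.shape[1])

# Each template may be used a non-negative number of times.
result = linprog(c, A_eq=A_eq, b_eq=b_eq,
                 bounds=[(0, None)] * len(c), method="highs")

if result.success:
    print("template usage counts:", np.round(result.x, 1))
else:
    print("no feasible reweighting:", result.message)
```

In practice one would round the resulting usage counts to integers (or solve an integer program) before sampling the corpus; this sketch only shows the distribution-matching step.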
Submission Number: 3