Transformer-based Speech Model Learns Well as Infants and Encodes Abstractions through Exemplars in the Poverty of the Stimulus Environment

Yi Yang, Yiming Wang, Jiahong Yuan

Published: 17 Jan 2025, Last Modified: 06 Feb 2025; COLING 2025 (Proceedings of the 31st International Conference on Computational Linguistics, Long Paper); Readers: Everyone; License: CC BY 4.0

Abstract: Infants are capable of learning language, predominantly through speech and associations, in impoverished environments, a phenomenon known as the Poverty of the Stimulus (POS). Is this ability uniquely human, the product of an innate linguistic predisposition, or can it be learned empirically, through potential linguistic structures, from sparse and noisy exemplars? As an early exploratory work, we systematically designed a series of tasks, scenarios, and metrics to simulate the POS. We found that the speech model wav2vec 2.0, with weights pretrained on an English corpus, can learn well in noisy and sparse Mandarin environments. We then tested various hypotheses and observed three pieces of evidence for abstraction: label correction, categorical patterns, and clustering effects. We concluded that models can encode hierarchical linguistic abstractions through exemplars in POS environments. We hope this work offers new insights into language acquisition from a speech perspective and inspires further research.