How to Build a Toddler Lexical Network

Published: 01 Jan 2022, Last Modified: 19 Feb 2025
Venue: CogSci 2022
License: CC BY-SA 4.0
Author(s): Weber, Jennifer; Colunga, Eliana

Abstract: Understanding child language development requires accurately representing children’s lexicons. However, past work modeling children’s lexical-semantic structure has typically relied on adult norms and corpora. The present work uses Word2Vec embeddings trained on a newly created toddler-directed language corpus. Distributional approaches like Word2Vec calculate similarities by taking into account not just when words occur together, but also when words occur in similar contexts. Using network centrality measures, a network built from Word2Vec embeddings predicted normed word acquisition from 16 to 30 months more accurately than a network built from sliding-window co-occurrences. We also compared predictions from the Word2Vec toddler network, a network created by training Word2Vec on typical adult input, and a model trained using both corpora. The toddler-only network outperformed the other two, indicating the importance of selecting language sources that reflect the population of interest. The present results reveal a promising new direction in understanding toddler word learning.
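
The two kinds of network contrasted in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the two-dimensional `toy_vectors` are hypothetical stand-ins for trained Word2Vec embeddings, the cosine cutoff of 0.7 and the window size of 2 are arbitrary choices, and degree centrality is assumed only as one simple example of a network centrality measure (the abstract does not say which measures were used).

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_network(vectors, threshold):
    """Link two words when the cosine similarity of their vectors is >= threshold."""
    edges = {word: set() for word in vectors}
    for w1, w2 in combinations(sorted(vectors), 2):
        if cosine(vectors[w1], vectors[w2]) >= threshold:
            edges[w1].add(w2)
            edges[w2].add(w1)
    return edges


def cooccurrence_network(tokens, window):
    """Link two distinct words that fall inside the same run of `window` tokens."""
    edges = defaultdict(set)
    for i, word in enumerate(tokens):
        edges[word]  # make sure every word is a node, even without neighbours
        for other in tokens[i + 1 : i + window]:
            if other != word:
                edges[word].add(other)
                edges[other].add(word)
    return dict(edges)


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(edges)
    return {w: (len(nbrs) / (n - 1) if n > 1 else 0.0) for w, nbrs in edges.items()}


# Hypothetical 2-d vectors standing in for Word2Vec embeddings (assumption).
toy_vectors = {
    "dog": [1.0, 0.0],
    "cat": [0.9, 0.1],
    "ball": [0.0, 1.0],
    "cup": [0.1, 0.9],
    "milk": [0.6, 0.6],
}

embedding_net = similarity_network(toy_vectors, threshold=0.7)
centrality = degree_centrality(embedding_net)

# Sliding-window baseline on a toy utterance (window of 2 = adjacent words only).
tokens = "the dog wants the ball".split()
window_net = cooccurrence_network(tokens, window=2)
```

In this toy setup, "milk" sits between the animal words and the object words in vector space, so it links to every other word and receives the highest degree centrality. In a study like the one described, centrality scores of this kind would then serve as predictors of normed age of acquisition. The sliding-window network, by contrast, links only words that appear near each other in the text.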