Rethinking Developmental Curricula for Contrastive Visual Learning

TMLR Paper 6602 Authors

21 Nov 2025 (modified: 26 Feb 2026) · Under review for TMLR · CC BY 4.0
Abstract: While large machine learning models have achieved remarkable results, they still fall short of the efficiency and adaptability characteristic of human perception. Motivated by prior work that draws inspiration from infant visual development, we examine whether commonly used developmental proxies confer measurable benefits under controlled experimental conditions. Within a virtual environment, we systematically modulated four dynamic factors, namely image blur, lighting complexity, avatar movement speed, and scene complexity, as developmentally inspired components of a learning curriculum. However, none of these factor-wise curricula improved downstream classification performance compared with a stable training corpus. We then replicated the experiments on the real-world SAYCam dataset, varying movement speed and scene complexity independently, and observed consistent results. These findings suggest that, under the present training regime and evaluation suite, these particular factor-wise curricula do not inherently confer learning advantages. More broadly, the results contextualize claims that developmental-like progression benefits learning per se and highlight the need for more principled curriculum design mechanisms. Our results offer a new perspective on curriculum design for self-supervised learning.
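A factor-wise developmental curriculum of the kind described above can be sketched as a schedule that eases a single augmentation parameter over pretraining. The sketch below uses Gaussian-blur strength as the factor; the function name, endpoint values, and linear schedule are illustrative assumptions, not the paper's actual implementation:

```python
def blur_sigma(epoch: int, total_epochs: int,
               sigma_start: float = 4.0, sigma_end: float = 0.1) -> float:
    """Linearly anneal a Gaussian-blur sigma across pretraining.

    A "developmental" schedule starts blurry (coarse infant-like vision)
    and sharpens over training; swapping the endpoints gives an
    "anti-developmental" control, and a constant sigma gives the stable
    baseline the curricula are compared against.
    """
    frac = epoch / max(total_epochs - 1, 1)
    return sigma_start + frac * (sigma_end - sigma_start)
```

The returned sigma would then parameterize the blur transform in the contrastive augmentation pipeline at each epoch; the same pattern applies to the other factors (lighting, movement speed, scene complexity) with their own parameterizations.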
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We have substantially revised the manuscript to clarify the scope of our claims, strengthen experimental support, and improve the organization and interpretability of the results in response to the reviewers' feedback. All major revisions are marked in blue in the manuscript.

**Clarification of Motivation and Developmental Framing**

To address concerns regarding our use of the term “developmental curriculum” and its connection to infant perceptual development:
- We revised the Abstract and Introduction to avoid implying direct transfer from human developmental processes to neural network training, and clarified that infant development serves as conceptual inspiration rather than mechanistic justification.
- We added an explicit definition of “developmental curriculum” in the Introduction before introducing our hypothesis.
- We expanded the limitations section to emphasize that blur, lighting, movement speed, and complexity are operational proxies rather than comprehensive models of human development.

**Generalizability and Strength of Claims**

To address concerns about generalizability and broad conclusions:
- We refined language throughout the Introduction, Discussion, and Conclusion to avoid broad negative claims about “developmental curricula” in general and to state explicitly that our conclusions may not extend to all contrastive learning variants.
- We clarified that our findings are specific to the factors, schedule designs, and computational settings examined.
- We added additional experiments using BYOL (Bootstrap Your Own Latent) in Appendix C; developmental schedules similarly fail to yield consistent improvements under this negative-free framework.
**Experimental Settings and Convergence**

To address concerns about limited training budgets and potential underfitting:
- We added Appendix D, analyzing convergence by comparing the loss reduction in the final training epochs with the total loss reduction over the full pretraining schedule (House100K).
- For MoCo on House100K, we tested a larger memory bank (65,536) and observed similar conclusions (Appendix E).
- For SAYCam-S, we clarified that the 12-epoch pretraining follows the original setting. We are currently running additional experiments with doubled pretraining epochs (complexity factor, late-stage/developmental/anti-developmental modes); these are in progress and will be incorporated within two weeks.
- We explicitly note that conclusions may be conditioned on computational resources, and that larger-scale extensions remain future work.

**Structure, Visualization, and Clarity**

To improve clarity and interpretability:
- We added a concise summary paragraph at the beginning of the Experimental section to clarify the logic and progression of the experiments.
- We added Figure 3 (movement speed visualization) and revised Figure 4 (developmental sequence).
- We added Section 6.2 to discuss discrepancies with prior work.
- We reorganized Section 6.3 into structured subsections that discuss potential reasons for the null results.
- We corrected minor inconsistencies (e.g., SAYCam-S, ADev., formatting issues).
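The convergence check described for Appendix D, comparing loss reduction in the final epochs against the total reduction over the full schedule, can be sketched as a simple ratio. The function name and the interpretation thresholds are illustrative assumptions, not the paper's exact analysis:

```python
def convergence_ratio(losses: list[float], tail: int = 5) -> float:
    """Fraction of the total loss reduction that occurred in the
    last `tail` epochs (requires tail < len(losses)).

    A value near 0 suggests training has largely converged; a value
    near 1 suggests the loss was still dropping steeply at the end,
    i.e. possible underfitting under the training budget.
    """
    total = losses[0] - losses[-1]
    recent = losses[-tail - 1] - losses[-1]
    return recent / total if total else float("nan")
```

For example, a loss curve that has flattened out yields a small ratio, while a curve still descending steadily at the final epoch yields a ratio close to 1.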
Assigned Action Editor: ~Steffen_Schneider1
Submission Number: 6602