Keywords: data scaling law, meta-learning, representation learning, learning theory
Abstract: Pre-training has become a fundamental paradigm in modern machine learning. One of its key empirical benefits is that downstream sample complexity decreases as the scale of pre-training data increases. In this paper, we introduce complexity minimization, a novel meta-representation learning framework designed to enable theoretical analysis of this scaling behavior. Our end-to-end theoretical analysis proves that an explicitly constructed algorithm within this framework achieves a downstream convergence rate whose exponent improves with pre-training data size, providing a rigorous proof of achievability for scaling-law-type behavior. Empirically, we demonstrate that incorporating complexity regularization into existing meta-learning methods consistently improves downstream sample efficiency.
Submission Number: 73