Self-Improving Foundation Models Without Human Supervision

Published: 03 Dec 2024, Last Modified: 03 Dec 2024 · ICLR 2025 Workshop Proposals · CC BY 4.0
Keywords: foundation models, self-improvement, RL, synthetic data
TL;DR: A workshop on developing algorithms and training methods for self-improvement of foundation models
Abstract: As foundation models (FMs) scale, they face a data bottleneck: the growth of high-quality internet data is unable to keep pace with their training needs. This is already most apparent with text data, has been a persistent problem in domains such as embodied intelligence, and is expected to soon affect other modalities as well. ***Self-improvement***, a paradigm where models train on synthetic data produced by the same or other models, offers a promising solution. This paradigm differs from both supervised learning, which relies on curated human data, and reinforcement learning (RL), which depends on external rewards. Self-improvement frameworks require models to self-curate their training data, often using imperfect learned verifiers, which poses unique challenges. This workshop will explore algorithms for self-improvement, covering topics such as synthetic data, multi-agent and multi-modal systems, weak-to-strong generalization, inference-time self-supervision, and theoretical limits.
Submission Number: 103