Let Large Language Models Find the Data to Train Themselves

26 Sept 2024 (modified: 23 Jan 2025) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Self-improving, Synthetic Data, Large Language Models
TL;DR: We pioneer the idea of automating the data search process for training LLMs, a task currently handled by expert human effort, taking a further step towards fully automated self-improving AI systems capable of continuous learning and adaptation.
Abstract: The current iterative development process for large language models (LLMs) is heavily data-centric, relying on human researchers and engineers to manually analyze model performance and determine what data to acquire for further training. However, this human-supervised approach is costly and may fail to identify optimal training signals. Its scalability is further limited as models become increasingly capable and may eventually exceed human intelligence. To address these issues, we propose an automated framework that enables models to autonomously discover and strategically acquire the most valuable training data to enhance their performance. Within this self-improving loop, models can invoke APIs to crawl and/or generate tailored datasets from various resources and environments, and then retrain themselves. The data selection decisions are shaped by reinforcement feedback signals that reward performance gains while penalizing computational overhead. This formulation incentivizes models to develop self-knowledge about their strengths and areas for improvement in order to select training data efficiently. Empirical results demonstrate that LLMs operating within our framework autonomously and strategically acquire valuable training data, improving their performance across a variety of skills as evaluated on 1,000 diverse in-house test tasks and three public benchmarks.
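As a rough illustration of the loop the abstract describes (data acquisition, retraining, and a reinforcement signal that trades off performance gain against compute), here is a minimal Python sketch. The names `DataPlan`, `lambda_cost`, and the callables are assumptions introduced for illustration only and do not reflect the authors' actual implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of the reward formulation in the abstract: the model is
# rewarded for performance gains and penalized for the compute spent acquiring
# data. All names here are illustrative assumptions, not the paper's code.

@dataclass
class DataPlan:
    source: str        # e.g. an API to crawl, or a prompt for synthetic generation
    num_examples: int  # how much data the model chose to acquire

def reward(performance_gain: float, compute_cost: float, lambda_cost: float = 0.1) -> float:
    """Reinforcement signal: reward gains, penalize computational overhead."""
    return performance_gain - lambda_cost * compute_cost

def self_improvement_step(
    evaluate: Callable[[], float],         # scores the current model on held-out tasks
    acquire: Callable[[DataPlan], float],  # executes the plan, returns its compute cost
    retrain: Callable[[], None],           # further trains the model on the acquired data
    plan: DataPlan,
) -> float:
    """One iteration of the self-improving loop: acquire data, retrain, score the choice."""
    baseline = evaluate()
    cost = acquire(plan)
    retrain()
    return reward(evaluate() - baseline, cost)
```

In this sketch, the value returned by `self_improvement_step` would be fed back to shape the model's future data-selection decisions, mirroring the reward/penalty formulation stated in the abstract.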
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6455